Hajnal Andréka and István Németi on Unity of Science: From Computing to Relativity Theory Through Algebraic Logic (Outstanding Contributions to Logic, 19) 3030641864, 9783030641863

This book features more than 20 papers that celebrate the work of Hajnal Andréka and István Németi. It illustrates an in

English Pages 527 [513] Year 2021

Table of contents :
Preface
Contents
Contributors
Part I Computing
1 Algebraic Logic and Knowledge Bases
1.1 Introduction
1.2 Preliminaries
1.2.1 Basic Notions and Notations
1.2.2 Halmos Categories and Halmos Algebras
1.2.3 The Galois Correspondence
1.3 Knowledge Base Model
1.3.1 What Is Knowledge?
1.3.2 Category of Knowledge Description F_Θ(ℋ)
1.3.3 Category of Knowledge Content D_Θ(ℋ)
1.3.4 The Knowledge Functor Ct_ℋ
1.3.5 Definition of a Knowledge Base
1.4 Knowledge Bases Equivalences
1.4.1 Informationally Equivalent Knowledge Bases
1.4.2 LG-equivalent and LG-isotypic Knowledge Bases
1.4.3 LG-Equivalence and Informational Equivalence of Knowledge Bases
References
2 Guarded Ontology-Mediated Queries
2.1 Introduction
2.1.1 Rule-Based Ontology-Mediated Queries
2.1.2 Guardedness to the Rescue
2.2 Preliminaries
2.3 Query Evaluation
2.3.1 From Eval(G,CQ) to Satisfiability for the Guarded Fragment
2.4 Query Containment
2.4.1 Atomic Queries
2.4.2 From Conjunctive Queries to Atomic Queries
2.5 First-Order Rewritability
2.5.1 Atomic Queries
2.5.2 From Conjunctive Queries to Atomic Queries
2.6 Reasoning over Finite Instances
2.7 Conclusions
References
3 Semiring Provenance for Guarded Logics
3.1 Introduction
3.2 Modal Logic and the Guarded Fragment
3.3 Semiring Provenance for First-Order Logic and Acyclic Games
3.3.1 Commutative Semirings
3.3.2 Provenance for First-Order Logic
3.3.3 Provenance Analysis for Acyclic Games
3.3.4 Provenance Analysis via Model-Checking Games
3.4 Provenance Analysis for Modal Logic and the Guarded Fragment
3.5 Algorithmic Analysis
3.6 A More Abstract View of Guarded Logics
3.7 Guarded Negation First-Order Logic
3.7.1 Provenance Analysis for GNF
3.7.2 Model Checking Games for GNF* and Their Provenance Analysis
3.7.3 Algorithmic Analysis
References
4 Implicit Partiality of Signature Morphisms in Institution Theory
4.1 Introduction
4.1.1 Institution Theory
4.1.2 From Total to Partial Signature Morphisms in Institution Theory
4.1.3 Contributions and Structure of the Paper
4.2 Category-Theoretic and Other Preliminaries
4.2.1 Categories
4.2.2 Partial Functions
4.2.3 3/2-categories
4.2.4 Various Colimits in 3/2-categories
4.3 3/2-institutions
4.3.1 Institutions
4.3.2 3/2-institutions: Definition
4.3.3 3/2-institutions: Examples
4.3.4 3/2-institutional Seeds
4.3.5 Model Amalgamation in 3/2-institutions
4.3.6 Theory Morphisms in 3/2-institutions
4.3.7 Lifting Properties
4.4 Theory Blending in 3/2-institutions
4.4.1 Computational Creativity and Conceptual Blending
4.4.2 Theory Blending in 3/2-institutions
4.5 Theory Changes
4.5.1 The Problem of Merging Software Changes
4.5.2 Theory Changes
4.6 Conclusions
References
5 The Four Essential Aristotelian Syllogisms, via Substitution and Symmetry
5.1 Dedication
5.2 The Main Theorem
5.3 Aristotle
5.4 From Aristotle to the 19th Century
5.5 20th Century Treatments of the Aristotelian Syllogisms
5.6 The Aristotelian Syllogisms
5.6.1 Moods
5.6.2 Figures
5.7 Obversion and Conversion
5.8 Contraposition
5.9 Proof of Theorem 1
5.10 Conclusion
References
6 Adding Guarded Constructions to the Syllogistic
6.1 Introduction
6.2 Technical Preliminaries
6.3 Lower Complexity Bounds
6.4 Upper Complexity Bounds
6.5 Proof-Theoretic Consequences
References
7 The Significance of Relativistic Computation for the Philosophy of Mathematics
7.1 A Short Refresher on the RTM Model
7.2 RTM and Mathematical Knowledge
7.3 “RTM Proofs” and the Problem of Mathematical Explanation
7.4 The Theoretical Virtues of the RTM Model
7.5 Concluding Remarks
References
Part II Algebraic Logic
8 Generalized Quantifiers Meet Modal Neighborhood Semantics
8.1 Introduction: Quantifiers and Neighborhoods
8.2 Locality and Conservativity
8.2.1 Locality in Modal Semantics
8.2.2 Conservativity and Domain Restriction for Quantifiers
8.3 Invariance and Simulation
8.3.1 Modal Logic and Invariance
8.3.2 Invariance and Generalized Quantifiers
8.4 Modal Logics of Quantifiers
8.4.1 Modal Logic of Permutation-Invariant Quantifiers
8.4.2 Imposing More Conditions
8.4.3 Modal Logics of Specific Quantifiers
8.5 Conclusion
References
9 On the Semilattice of Modal Operators and Decompositions of the Discriminator
9.1 Introduction
9.2 Notation and First Definitions
9.2.1 Boolean Algebras
9.3 Modal Algebras
9.4 The Semilattice of Modal Operators
9.5 Decomposing Discriminators
9.6 Proper Companions
References
10 Modal Logics that Bound the Circumference of Transitive Frames
10.1 Algebraic Logic and Logical Algebra
10.2 Grzegorczyk and Löb
10.3 Clusters and Cycles
10.4 Models and Valid Schemes
10.5 Logics and Canonical Models
10.6 Finite Model Property for K4ℂ_n
10.7 Extensions of K4ℂ_n
10.7.1 Seriality
10.7.2 S4ℂ_n
10.7.3 Linearity
10.7.4 Simple Final Clusters
10.7.5 Degenerate Final Clusters
10.8 Models on Irresolvable Spaces
10.9 Generating Varieties of Algebras
References
11 Undecidability of Algebras of Binary Relations
11.1 Introduction
11.2 Definitions
11.3 Main Results
11.4 Some Earlier Results
11.5 Tiling
11.6 Partial Group Embedding
11.7 Extending the Results
References
12 On the Representation of Boolean Magmas and Boolean Semilattices
12.1 Introduction
12.2 Representable Boolean Magmas
12.3 Representable Boolean Semilattices
12.4 Constructions of Representable Boolean Magmas
12.5 Appendix: Known Representations for 8-Element Boolean Semilattices
References
13 Canonical Relativized Cylindric Set Algebras and Weak Associativity
13.1 Introduction
13.2 Definition of
13.3 Canonical Extensions
13.4 Algebras of Binary Relations
13.5 Characterizing WA
13.6 The Relativized Cylindric Set Algebra of a Suitable Structure
13.7 The Suitable Structure of a WA
13.8 The Complex Algebra of a Suitable Structure
13.9 Relation-Algebraic Reducts
13.10 Cylindric-Relativized Representation
13.11 Relativized Relational Representation
13.12 Elementary Laws of WA
References
14 Blow Up and Blur Constructions in Algebraic Logic
14.1 Introduction
14.2 The Algebras and Some Basic Concepts
14.3 Non-atom Canonicity of Infinitely Many Varieties Between CAn and RCAn
14.3.1 Clique Guarded Semantics
14.3.2 Blowing up and Blurring Finite Rainbow Cylindric Algebras
14.3.3 An Application on Omitting Types for the Clique Guarded Fragment of Ln
References
Part III Relativity Theory
15 Freeing Structural Realism from Model Theory
15.1 Introduction
15.2 Predicate Functors
15.3 Tractarian Geometry
15.4 Cylindric Algebras
15.5 Conclusion
References
16 In the Footsteps of Hilbert: The Andréka-Németi Group’s Logical Foundations of Theories in Physics
16.1 Introduction
16.2 Hilbert’s Axiomatic Approach to the Sciences
16.3 The Programme of the Andréka-Németi Group and Its Close Correspondence to Hilbert’s Dynamic Methodology
16.4 Conclusion: The Hilbert-Andréka-Németi View of the Unity of Science
References
17 General Relativity as a Collection of Collections of Models
17.1 Introduction
17.2 Preliminaries
17.3 Possibility
17.4 Inextendibility
17.5 Singularities
17.6 Conclusion
References
18 Why Not Categorical Equivalence?
18.1 Introduction
18.2 Categorically Equivalent Theories
18.3 Interlude: Concerns I Will Not Pursue
18.4 Category Structure and Ideology
18.5 The ‘G’ Property
18.6 Where Do We Go from Here?
18.7 Conclusion
References
19 Time Travelling in Emergent Spacetime
19.1 Introduction
19.2 Global Hyperbolicity and Energy Conditions
19.2.1 At the Classical Level
19.3 Theories of Quantum Gravity
19.3.1 Semi-classical Quantum Gravity
19.3.2 Causal Set Theory
19.3.3 Loop Quantum Gravity
19.3.4 String Theory
19.4 Emergent Time Travel?
19.4.1 Causal Set Theory as a Cosmological Theory
19.4.2 Loop Quantum Gravity as a Cosmological Theory
19.4.3 Quantum Gravity as ‘Astrophysics’
19.5 Conclusions
References
Appendix A From Computing to Relativity Theory Through Algebraic Logic: A Joint Scientific Autobiography
Large-Scale AI Program for the Hungarian Power System (c. 1966–1970)
Software Department, Theorem Prover, General Logic, Universal Algebra (c. 1970–1976)
Categorical Injectivity Logic, Partial Algebras (c. 1976–1983)
Nonstandard-Time Semantics for Dynamic Logic of Programs (c. 1978–1986)
Algebraic Logic, Tarski's School, Cylindric and Relation Algebras (1971– )
Logic Graduate School, the Amsterdam–Budapest–London Triangle (c. 1991–1998)
Relativity Theory, Relativistic Computing, Methodology of Science (c. 1998– )
Appendix B Joint Annotated Bibliography of Hajnal Andréka and István Németi
Books
Books Edited
Dissertations
Publications (Articles, Book Chapters, Other)

Outstanding Contributions to Logic 19

Judit Madarász Gergely Székely Editors

Hajnal Andréka and István Németi on Unity of Science From Computing to Relativity Theory Through Algebraic Logic

Outstanding Contributions to Logic Volume 19

Editor-in-Chief Sven Ove Hansson, Division of Philosophy, KTH Royal Institute of Technology, Stockholm, Sweden

Outstanding Contributions to Logic puts focus on important advances in modern logical research. Each volume is devoted to a major contribution by an eminent logician. The series will cover contributions to logic broadly conceived, including philosophical and mathematical logic, logic in computer science, and the application of logic in linguistics, economics, psychology, and other specialized areas of study. A typical volume of Outstanding Contributions to Logic contains:
• A short scientific autobiography by the logician to whom the volume is devoted
• The volume editor’s introduction. This is a survey that puts the logician’s contributions in context, discusses their importance and shows how they connect with related work by other scholars
• The main part of the book will consist of a series of chapters by different scholars that analyze, develop or constructively criticize the logician’s work
• Response to the comments, by the logician to whom the volume is devoted
• A bibliography of the logician’s publications
Outstanding Contributions to Logic is published by Springer as part of the Studia Logica Library. This book series is also a sister series to Trends in Logic and Logic in Asia: Studia Logica Library. All books are published simultaneously in print and online. This book series is indexed in SCOPUS. Proposals for new volumes are welcome. They should be sent to the editor-in-chief [email protected]

More information about this series at http://www.springer.com/series/10033

Editors
Judit Madarász, Alfréd Rényi Institute of Mathematics, Budapest, Hungary
Gergely Székely, Alfréd Rényi Institute of Mathematics, Budapest, Hungary; University of Public Service, Budapest, Hungary

ISSN 2211-2758 ISSN 2211-2766 (electronic) Outstanding Contributions to Logic ISBN 978-3-030-64186-3 ISBN 978-3-030-64187-0 (eBook) https://doi.org/10.1007/978-3-030-64187-0 © Springer Nature Switzerland AG 2021 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. Cover credit: The T-shirt of István Németi on the cover photo was designed by Csilla and Péter Németi. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Preface

This volume has been prepared in appreciation of the truly outstanding contributions of Hajnal Andréka and István Németi to logic. One who does not know them or their joint careers and working style may wonder why this is a joint volume, but for anyone who knows them well or who reads in their joint autobiography how their careers have been closely intertwined since at least as far back as 1971, it should be clear that having one volume is a natural choice. Not only do they write most of their publications jointly, but they also give conference talks and university lectures together. Their talks are like a kind of dance. They frequently switch roles spontaneously while speaking, even during the course of a single talk, flawlessly and smoothly complementing each other’s trains of thought. They are almost always together. One rarely sees one of them without the other. Somebody once said about them that “they are like a single soul in two bodies” and we have to admit it is quite an accurate description! We are both collaborators and former students of Hajnal and István. However, they are more than mere teachers and collaborators for us. They took us under their wings as undergraduate students and continued advising and supervising us during graduate school and even as postgraduates. They treated us, just like all of their students, like family members. They often invited us to their home to work and discuss ideas sitting at the round table in their living room. These were round-table discussions in every sense, in which everyone was treated equally and ideas were welcomed from every participant. For us, they are also mentors and good friends: preparing this volume has been a great honor for us. We are truly grateful to them for their continuous support and the so many seeds of insight they have planted in our minds. For Hajnal and István, looking for deeper understandings and connecting different research areas has always played a central role. 
Therefore, we have invited researchers working in various areas related to logic to write chapters of this book representing the research-area-integrating and insight-seeking spirit of Hajnal and István’s work. This volume contains 19 invited chapters organized into 3 parts: Computing, Algebraic Logic, and Relativity Theory, corresponding, in chronological order, to the 3 main areas of research where Hajnal and István have contributed the most. In the autobiography of Hajnal and István at the end of this volume, one can find how these areas are interconnected in their research.

We are grateful to the series editor Sven Ove Hansson and the whole Editorial Board of this series for the opportunity to publish this volume in the series Outstanding Contributions to Logic. We are also grateful to Ram Prasad Chandrasekar and Christi Lue for their help and patience.

Budapest, Hungary
June 2020

Judit Madarász
Gergely Székely

Contents

Part I Computing
1 Algebraic Logic and Knowledge Bases
   Elena Aladova, Boris Plotkin, and Tatjana Plotkin
2 Guarded Ontology-Mediated Queries
   Pablo Barceló, Gerald Berger, Georg Gottlob, and Andreas Pieris
3 Semiring Provenance for Guarded Logics
   Katrin M. Dannert and Erich Grädel
4 Implicit Partiality of Signature Morphisms in Institution Theory
   Răzvan Diaconescu
5 The Four Essential Aristotelian Syllogisms, via Substitution and Symmetry
   Vaughan R. Pratt
6 Adding Guarded Constructions to the Syllogistic
   Ian Pratt-Hartmann
7 The Significance of Relativistic Computation for the Philosophy of Mathematics
   Krzysztof Wójtowicz

Part II Algebraic Logic
8 Generalized Quantifiers Meet Modal Neighborhood Semantics
   Johan van Benthem and Dag Westerståhl
9 On the Semilattice of Modal Operators and Decompositions of the Discriminator
   Ivo Düntsch, Wojciech Dzik, and Ewa Orłowska
10 Modal Logics that Bound the Circumference of Transitive Frames
   Robert Goldblatt
11 Undecidability of Algebras of Binary Relations
   Robin Hirsch, Ian Hodkinson, and Marcel Jackson
12 On the Representation of Boolean Magmas and Boolean Semilattices
   Peter Jipsen, M. Eyad Kurd-Misto, and James Wimberley
13 Canonical Relativized Cylindric Set Algebras and Weak Associativity
   Roger D. Maddux
14 Blow Up and Blur Constructions in Algebraic Logic
   Tarek Sayed Ahmed

Part III Relativity Theory
15 Freeing Structural Realism from Model Theory
   Neil Dewar
16 In the Footsteps of Hilbert: The Andréka-Németi Group’s Logical Foundations of Theories in Physics
   Giambattista Formica and Michèle Friend
17 General Relativity as a Collection of Collections of Models
   JB Manchak
18 Why Not Categorical Equivalence?
   James Owen Weatherall
19 Time Travelling in Emergent Spacetime
   Christian Wüthrich

Appendix A: From Computing to Relativity Theory Through Algebraic Logic: A Joint Scientific Autobiography
Appendix B: Joint Annotated Bibliography of Hajnal Andréka and István Németi

Contributors

Elena Aladova, Department of Mathematics, Federal University of Rio Grande do Norte, University Campus Lagoa Nova, CEP Natal/RN, Brazil
Pablo Barceló, Institute for Mathematical and Computational Engineering, Pontificia Universidad Católica de Chile & IMFD Chile, Santiago, Chile
Johan van Benthem, Institute for Logic, Language and Computation, University of Amsterdam, Amsterdam, The Netherlands; Department of Philosophy, Stanford University, Stanford, CA, USA; School of Humanities, Tsinghua University, Beijing, China
Gerald Berger, Institute of Logic and Computation, TU Wien, Vienna, Austria
Katrin M. Dannert, Mathematische Grundlagen der Informatik, RWTH Aachen University, Aachen, Germany
Neil Dewar, Munich Center for Mathematical Philosophy, Ludwig-Maximilians-Universität München, Munich, Germany
Răzvan Diaconescu, Simion Stoilow Institute of Mathematics of the Romanian Academy, București, Romania
Ivo Düntsch, School of Mathematics and Computer Science, Fujian Normal University, Fuzhou, Fujian, China; Department of Computer Science, Brock University, St Catharines, ON, Canada
Wojciech Dzik, Institute of Mathematics, University of Silesia, Katowice, Poland
M. Eyad Kurd-Misto, Chapman University, Orange, CA, USA
Giambattista Formica, Pontifical Urbaniana University, Vatican City, Italy
Michèle Friend, UCCS, Univ. Lille Nord-Europe, CNRS, Centrale Lille, Univ. Artois, Lille, France; George Washington University, Washington, D.C., USA
Robert Goldblatt, School of Mathematics and Statistics, Victoria University of Wellington, Wellington, New Zealand
Georg Gottlob, Institute of Logic and Computation, TU Wien, Vienna, Austria; Department of Computer Science, University of Oxford, Oxford, UK
Erich Grädel, Mathematische Grundlagen der Informatik, RWTH Aachen University, Aachen, Germany
Robin Hirsch, Department of Computer Science, University College London, London, UK
Ian Hodkinson, Department of Computing, Imperial College London, London, UK
Marcel Jackson, Department of Mathematics and Statistics, La Trobe University, Bundoora, VIC, Australia
Peter Jipsen, Chapman University, Orange, CA, USA
Roger D. Maddux, Department of Mathematics, Iowa State University, Ames, IA, USA
JB Manchak, Department of Logic and Philosophy of Science, University of California, Irvine, CA, USA
Ewa Orłowska, National Institute of Telecommunications, Warszawa, Poland
Andreas Pieris, School of Informatics, University of Edinburgh, Edinburgh, UK
Boris Plotkin, Department of Mathematics, Hebrew University of Jerusalem, Jerusalem, Israel
Tatjana Plotkin, Department of Computer Science, Bar Ilan University, Ramat Gan, Israel
Vaughan R. Pratt, Computer Science Department, Stanford University, Stanford, CA, USA
Ian Pratt-Hartmann, Department of Computer Science, University of Manchester, Manchester, UK; Instytut Informatyki, Uniwersytet Opolski, Opole, Poland
Tarek Sayed Ahmed, Department of Mathematics, Cairo University, Cairo, Egypt
James Owen Weatherall, Department of Logic and Philosophy of Science, University of California, Irvine, CA, USA
Dag Westerståhl, Department of Philosophy, Stockholm University, Stockholm, Sweden; School of Humanities, Tsinghua University, Beijing, China
James Wimberley, Chapman University, Orange, CA, USA
Krzysztof Wójtowicz, Faculty of Philosophy, Warsaw University, Warsaw, Poland
Christian Wüthrich, Department of Philosophy, University of Geneva, Geneva, Switzerland

Part I Computing

Chapter 1 Algebraic Logic and Knowledge Bases

Elena Aladova, Boris Plotkin, and Tatjana Plotkin

Abstract Knowledge base theory provides an important example of a field where applications of universal algebra and algebraic logic look very natural, and where their interaction with practical problems arising in computer science might be very productive. In this paper we study the equivalence problem for knowledge bases. Our interest is to find out how informational equivalence is related to the logical description of knowledge. The main objects of study in this paper are logically-geometrically equivalent (LG-equivalent) and LG-isotypic knowledge bases. We will see that these notions give us a good characterization of knowledge bases.

Keywords Algebraic logic · Knowledge base · Halmos algebras · Galois correspondence · Syntax · Semantics · Equivalence of models · Equivalence of knowledge bases · Category · Ultrafilter

To Hajnal Andréka and István Németi with admiration.

E. Aladova, Department of Mathematics, Federal University of Rio Grande do Norte, University Campus Lagoa Nova, CEP Natal/RN 59078-970, Brazil, e-mail: [email protected]
B. Plotkin, Department of Mathematics, Hebrew University of Jerusalem, 91904 Jerusalem, Israel, e-mail: [email protected]
T. Plotkin (B), Department of Computer Science, Bar Ilan University, 5290002 Ramat Gan, Israel, e-mail: [email protected]

© Springer Nature Switzerland AG 2021. J. Madarász and G. Székely (eds.), Hajnal Andréka and István Németi on Unity of Science, Outstanding Contributions to Logic 19, https://doi.org/10.1007/978-3-030-64187-0_1

1.1 Introduction

Knowledge base and database theories provide an important example of a field where applications of universal algebra and algebraic logic look very natural. Historically, theoretical research in databases started at the beginning of the sixties. The classical work of Codd [7] laid the foundations of relational databases. Since then, advances in database theory have been tightly related to mathematical logic, the theory of algorithms and general algebra. Plotkin in [16] proposed a mathematical model of a database and gave a formal definition of the concept of database equivalence. As a result, the algebraic model of a knowledge base has been introduced and developed, see [16, 17, 19].

We should emphasize that the model of a knowledge base under consideration involves various ideas of universal algebra, algebraic logic and algebraic geometry. The main peculiarity of this approach is that a database and a knowledge base are treated as certain algebraic and model-theoretical structures. There are many specialized knowledge bases, and it is desirable to determine their characteristic properties without referring to their complicated structure and without studying their detailed architecture. In this regard, our aim is to distinguish logical invariants of knowledge bases which rigidly determine them. The selection of logical characterizations of a knowledge base is motivated by the knowledge base model in question. We consider queries to a knowledge base as sentences in some logical language, and in order to operate with them properly one should employ logical tools. Moreover, we convert logic into algebra. This conversion allows us to study syntactical logical notions and questions by using more transparent algebraic and geometric structures, i.e., via semantics.

Our main tool is Halmos algebras, which were presented under the name of polyadic algebras in [9], and introduced in [16] for the multi-sorted case and an arbitrary variety of algebras. These generalizations are important for logical geometry and database applications. In plain words, Halmos algebras are intended to replace the logical system by an algebraic system equivalent to the original logic. This way of thinking is well known for Boolean algebras: thinking about propositional sentences as elements of a Boolean algebra is folklore.
The principal goal of the algebraization of first-order logic is to do essentially the same for logic enriched by quantifiers and predicates. Thus, the appropriate algebra (in the sense of algebraization of first-order logic) should be a Boolean algebra equipped with quantifiers (defined abstractly as operations) and with operations of replacement of variables. This requires additional axioms, and the resulting algebraic object is called a Halmos algebra. Note that there are other algebraizations of first-order logic; see, for example, [5, 6, 8, 10]. A survey of algebraizations of quantifier logics is [15]. We have chosen Halmos algebras as the tool appropriate for our goals.

In this paper we discuss the equivalence problem for knowledge bases. We focus our attention on a special kind of equivalence, namely, informational equivalence. Informally, two knowledge bases are informationally equivalent if and only if all information that can be retrieved from one knowledge base can also be obtained from the other one, and vice versa. Our main goal is to find out how informational equivalence is related to the logical description of a knowledge base.

1.2 Preliminaries

In this section we present the main apparatus needed in what follows. We give a brief review of basic notions and notations from universal algebraic geometry. In particular, we define Halmos algebras and Halmos categories, and construct the Galois correspondence, which plays an important role in all considerations. The material of this section can be found in [17, 19]; see also [2, 20, 21].

1.2.1 Basic Notions and Notations

Let $X^0 = \{x_1, \dots, x_n, \dots\}$ be an infinite set of variables. Denote by $\Gamma$ the collection of all finite subsets $X = \{x_{i_1}, \dots, x_{i_k}\}$ of $X^0$. Let $\Theta$ be a variety of algebras, that is, a class of algebras satisfying a set of identities. We denote by $Var(H)$ the variety generated by the algebra $H$. Denote by $W(X)$ the free algebra in the variety $\Theta$ with free generating set $X$, $X \in \Gamma$. All free algebras $W(X) \in \Theta$ form a category of free algebras $\Theta^0$ with homomorphisms $s : W(X) \to W(Y)$ as morphisms, $X, Y \in \Gamma$.

By a model $\mathcal{H}$ we mean a triple $(H, \Psi, f)$, where $H$ is an algebra from $\Theta$, $\Psi$ is a set of relation symbols $\varphi$, and $f$ is an interpretation of all $\varphi$ in $H$ (see, for instance, [13, 16]).

Let $H$ be an algebra in $\Theta$ and $X = \{x_1, \dots, x_n\}$. A point $(a_1, \dots, a_n)$ from the $n$-th Cartesian power of $H$ can be represented as a map $\mu : X \to H$ such that $a_i = \mu(x_i)$. This map can be extended to a homomorphism of algebras $\mu : W(X) \to H$. Thus, the point $(a_1, \dots, a_n)$ can also be viewed as a homomorphism $\mu : W(X) \to H$. Denote by $Hom(W(X), H)$ the set of all homomorphisms from $W(X)$ to $H$. We will regard $Hom(W(X), H)$ as an affine space.

All affine spaces $Hom(W(X), H)$ with various $X \in \Gamma$ constitute the category $\Theta^0(H)$ of affine spaces, with morphisms $\tilde{s} : Hom(W(X), H) \to Hom(W(Y), H)$, one for each homomorphism of free algebras $s : W(Y) \to W(X)$. The map $\tilde{s}$ is defined as $\tilde{s}(\mu) = \mu s$, where $\mu : W(X) \to H$ and $\tilde{s}(\mu) : W(Y) \to H$. The categories $\Theta^0$ and $\Theta^0(H)$ are very important in what follows. Moreover, it is known [14] that the categories $\Theta^0$ and $\Theta^0(H)$ are contravariantly isomorphic if and only if $Var(H) = \Theta$.
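
The passage from points to homomorphisms, and the induced map between affine spaces, can be illustrated with a minimal Python sketch. It rests on a strong simplifying assumption that is not made in the text: the variety Θ is taken to have no operations at all, so that the free algebra W(X) is just the set X and a homomorphism W(X) → H is any map X → H. The names `hom` and `s_tilde`, the two-element algebra and the particular map `s` are hypothetical and chosen only for illustration.

```python
from itertools import product

def hom(X, H):
    """All points mu: X -> H as dicts (this is Hom(W(X), H) when W(X) = X)."""
    variables = sorted(X)
    return [dict(zip(variables, values))
            for values in product(sorted(H), repeat=len(variables))]

def s_tilde(s, mu):
    """Induced map on points: s sends Y into X, mu sends X into H,
    and the result is the composite mu∘s, sending Y into H."""
    return {y: mu[s[y]] for y in s}

# Assumed toy data: a two-element "algebra" and two finite sets of variables.
H = {0, 1}
X = {"x1", "x2"}
s = {"y1": "x1", "y2": "x2", "y3": "x1"}   # s: W(Y) -> W(X) with Y = {y1, y2, y3}

mu = {"x1": 0, "x2": 1}                     # the point (0, 1) of H^2
image = s_tilde(s, mu)                      # a point of H^3

print(len(hom(X, H)), image)
```

Note the reversal of direction: `s` goes from Y to X, while the induced map sends points over X to points over Y. This is the contravariance mentioned in the text.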

1.2.2 Halmos Categories and Halmos Algebras

Halmos categories were introduced in [17, 19]. They are related to first-order logic in a way analogous to the relationship between Boolean algebras and propositional logic. Such an approach allows us to use techniques and structures of algebraic logic (see [9, 16]). The immediate advantage is that we can view queries to a knowledge base and replies to these queries as objects of the same nature, i.e., objects of Halmos categories. Then the transition from query to reply can be treated as a functor (for details see Sect. 1.3).

1.2.2.1 Extended Boolean Algebras

We start from the notion of an existential quantifier on a Boolean algebra. Let B be a Boolean algebra. Existential quantifier on B is an unary operation ∃ : B → B such that the following conditions hold: 1. ∃ 0 = 0, 2. a ≤ ∃a, 3. ∃(a ∧ ∃b) = ∃a ∧ ∃b. Universal quantifier ∀ : B → B is dual to ∃ : B → B, they are related by ∀a = ¬(∃(¬a)). Let Θ be an arbitrary variety of algebras and W (X ) the free algebra in Θ with the free generating set X (i.e., Θ can be variety of groups, of associative algebras, etc.). Definition 1 Let a set of variables X = {x1 , . . . , xn } and a set of relations Ψ be given. A Boolean algebra B is called an extended Boolean algebra over W (X ) relative to Ψ , if 1. the existential quantifier ∃x is defined on B for all x ∈ X , and ∃x∃y = ∃y∃x for all x, y ∈ X ; 2. to every relation symbol ϕ ∈ Ψ of arity n ϕ and a collection of elements w1 , . . . , wn ϕ from W (X ) there corresponds a nullary operation (a constant) of the form ϕ(w1 , . . . , wn ϕ ) in B. Thus, the signature L X of an extended Boolean algebra consists of the Boolean connectives, existential quantifiers ∃x and constants ϕ(w1 , . . . , wn ϕ ): L X = {∨, ∧, ¬, ∃x, M X }, where M X is a set of all ϕ(w1 , . . . , wn ϕ ). We provide two important examples of extended Boolean algebras. Example 1 Let a model H = (H, Ψ, f ) be given. Let Bool(W (X ), H ) be the Boolean algebra of all subsets in H om(W (X ), H ). One can equip the Boolean algebra Bool(W (X ), H ) with the structure of an extended Boolean algebra (cf. [16]). First, we define quantifiers ∃x, x ∈ X , on Bool(W (X ), H ). We set μ ∈ ∃x A if and only if there exists ν ∈ A such that μ(y) = ν(y) for every y ∈ X , y = x. It can be checked that ∃x defined in such a way is, indeed, an existential quantifier. According to Definition 1, given a relation symbol ϕ ∈ Ψ of arity m and a collection of elements w1 , . . . , wm from W (X ) we shall define a constant in


$\mathrm{Bool}(W(X), H)$. Denote this constant by $[\varphi(w_1, \dots, w_m)]_{\mathcal{H}}$ and define it as the element of $\mathrm{Bool}(W(X), H)$ consisting of all points $\mu : W(X) \to H$ satisfying the relation $\varphi(w_1, \dots, w_m)$. Thus, the Boolean algebra $\mathrm{Bool}(W(X), H)$ is equipped with the structure of an extended Boolean algebra. We denote this extended Boolean algebra by $\mathrm{Hal}^X_\Theta(\mathcal{H})$.

Example 2 Another important example of an extended Boolean algebra is the algebra of formulas $\Phi_0(X) = L_X/\tau_X$, where $L_X$ is the absolutely free algebra in the signature $L_X$ over the set $M_X$, and $\tau_X$ is a congruence relation (the Lindenbaum–Tarski congruence) on $L_X$ defined by the rule: $u \,\tau_X\, v$ if and only if $(u \to v) \wedge (v \to u)$, where $u, v \in L_X$. Boolean operations and quantifiers on $\Phi_0(X)$ are naturally inherited from $L_X$. For more details see [1, 2, 17].
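The quantifier of Example 1 can be tried out on a small finite case. The following Python sketch is only an illustration under simplifying assumptions that are not part of the text: a point is identified with the tuple of its values on two variables, the carrier is the three-element set {0, 1, 2}, and the relation "x < y" and the names `exists`, `forall`, `LESS`, `check_axioms` are hypothetical choices made for the example.

```python
from itertools import product

H = (0, 1, 2)                 # carrier of a small illustrative algebra (assumption)
X = ("x", "y")                # variables; a point mu is the tuple (mu(x), mu(y))
POINTS = frozenset(product(H, repeat=len(X)))

def exists(var, A):
    """Points that agree with some point of A on every variable except `var`."""
    i = X.index(var)
    return frozenset(
        mu for mu in POINTS
        if any(all(mu[j] == nu[j] for j in range(len(X)) if j != i) for nu in A)
    )

def forall(var, A):
    """Dual quantifier: complement of `exists` applied to the complement of A."""
    return POINTS - exists(var, POINTS - A)

# The constant attached to the hypothetical relation "x < y".
LESS = frozenset(mu for mu in POINTS if mu[0] < mu[1])

def check_axioms(A, B, var):
    """Check the three defining conditions of an existential quantifier."""
    return (
        exists(var, frozenset()) == frozenset()                  # exists 0 = 0
        and A <= exists(var, A)                                  # a <= exists a
        and exists(var, A & exists(var, B))
            == exists(var, A) & exists(var, B)                   # third condition
    )
```

In this toy setting `exists("x", LESS)` collects the points whose second coordinate is positive, and `forall("x", LESS)` is empty because no value of y exceeds every element of the carrier.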

1.2.2.2 Halmos Categories

Now we define a Halmos category, which plays a crucial role in our approach (cf. [16, 17, 20]).

Definition 2 A category $H$ is a Halmos category if:

1. Every object of $H$ has the form $H(X)$, where $H(X)$ is an extended Boolean algebra over $W(X)$.
2. The morphisms in $H$ correspond to morphisms in the category $\Theta^0$. To every morphism $s : W(X) \to W(Y)$ in $\Theta^0$ there corresponds a morphism $s_* : H(X) \to H(Y)$ in $H$ such that
   a. the transitions $W(X) \to H(X)$ and $s \to s_*$ determine a covariant functor from $\Theta^0$ to $H$;
   b. $s_* : H(X) \to H(Y)$ is a homomorphism of the corresponding Boolean algebras.
3. Let homomorphisms $s, s^1, s^2 : W(X) \to W(Y)$ and the corresponding morphisms $s_*, s^1_*, s^2_* : H(X) \to H(Y)$ be given, and let $a \in H(X)$. The identities controlling the interaction of morphisms with quantifiers are as follows:
   3.1. $s^1_* \exists x\, a = s^2_* \exists x\, a$, if $s^1(x') = s^2(x')$ for every $x' \in X$ such that $x' \ne x$;
   3.2. $s_* \exists x\, a = \exists(s(x))(s_* a)$, if $s(x) = y$ and $y$ is a variable which does not belong to the support of $s(x')$ for every $x' \in X$ such that $x' \ne x$. For the definition of support see [16]. In plain words, this condition means that $y$ does not occur in the shortest expression of the element $s(x') \in W(Y)$ through the elements of $Y$.
4. Let $\varphi$ be a relation symbol from $\Psi$ of arity $n_\varphi$, let $w_1, \dots, w_{n_\varphi}$ be a collection of elements from $W(X)$ and let $\varphi(w_1, \dots, w_{n_\varphi})$ be the corresponding constant in $H(X)$. The identities controlling the interaction of morphisms with constants are as follows: $s_*(\varphi(w_1, \dots, w_{n_\varphi})) = \varphi(s(w_1), \dots, s(w_{n_\varphi}))$,


where $s : W(X) \to W(Y)$ is a homomorphism of free algebras and $s_* : H(X) \to H(Y)$ is the corresponding morphism.

Let us give some explanations concerning Definition 2. Axioms (3.1) and (3.2), which may look strange, are not as complicated as they seem at first glance. They collect the usual and intuitively clear relations between quantification and transformations of variables in first-order logic. For example, axiom (3.1) concerns homomorphisms $s^1$ and $s^2$ that coincide on $X \setminus \{x\}$. This corresponds to the familiar fact that a quantified variable in a formula can be replaced by another one. Axiom (3.2) says that if the variable $y = s(x)$ does not belong to the support of $s(x') \in W(Y)$ for all $x' \in X$ different from $x$, then it is possible to quantify over $y$.

Suppose that the set of relation symbols $\Psi$ contains the symbol of the equality predicate "$\equiv$". Let $B$ be an extended Boolean algebra over $W(X)$ with respect to $\Psi$. One can define an equality on $B$ as a binary predicate $\equiv : W(X) \times W(X) \to B$, which takes a pair $w, w' \in W(X)$ to the constant $w \equiv w'$ in $B$, subject to the conditions:

1. $(w_1 \equiv w_1') \le (w_1' \equiv w_1)$;
2. $w \equiv w$ is the unit of the algebra $B$;
3. $(w_1 \equiv w_2) \wedge (w_2 \equiv w_3) \le (w_1 \equiv w_3)$;
4. for every $n$-ary operation $\omega \in \Omega$, where $\Omega$ is the signature of the variety $\Theta$, we have $(w_1 \equiv w_1') \wedge \dots \wedge (w_n \equiv w_n') \le (\omega(w_1, \dots, w_n) \equiv \omega(w_1', \dots, w_n'))$.

Thus, the equality on the extended Boolean algebra $B$ is a symmetric, reflexive and transitive predicate which respects all operations on $W(X)$. We can write the identities controlling the interaction of morphisms with the constants of the form $w \equiv w'$, $w, w' \in W(X)$, as follows:

(a) $s_*(w \equiv w') = (s(w) \equiv s(w'))$;
(b) $(s^x_w)_* a \wedge (w \equiv w') \le (s^x_{w'})_* a$, where $a \in H(X)$ and $s^x_w : W(X) \to W(X)$ is defined by $s^x_w(x) = w$ and $s^x_w(x') = x'$ for $x' \ne x$.

Item (a) is exactly condition (4) from Definition 2; item (b) follows from (4) and the properties of the equality predicate "$\equiv$".

In this way one can add further axioms to item (4) of Definition 2 whenever specific properties of the predicates on $B$ corresponding to relation symbols from $\Psi$ are known.

1.2.2.3 Halmos Algebras

It is known that categories and multi-sorted algebras are tightly connected. In this section we present the Halmos category as a multi-sorted Halmos algebra. Such a representation is useful for several reasons; for instance, it allows us to speak about congruences on algebras. Information about multi-sorted algebras can be found in [12, 16].

In Sect. 1.2.2.1 we have already defined the signature $L_X$ of an extended Boolean algebra: $L_X = \{\vee, \wedge, \neg, \exists x, M_X\}$, where $M_X$ is the set of all $\varphi(w_1, \dots, w_{n_\varphi})$, $\varphi \in \Psi$, $w_1, \dots, w_{n_\varphi} \in W(X)$, $X \in \Gamma$. We extend $L_X$ by the symbol $s_*$ and denote the new signature by $L_\Theta$. Thus, $L_\Theta = \{\vee, \wedge, \neg, \exists x, M_X, s_*\}$. In the signature $L_\Theta$ the symbol $s_*$ is reserved for multi-sorted operations of the type $(X; Y)$, where $X, Y$ run through $\Gamma$. For the definition of the type of a multi-sorted operation see [12, 16].

Definition 3 We call an algebra $H = (H(X), X \in \Gamma)$ in the signature $L_\Theta$ a Halmos algebra if

1. Every domain $H(X)$ is an extended Boolean algebra in the signature $L_X$.
2. To every homomorphism of free algebras $s : W(X) \to W(Y)$ there corresponds the operation $s_* : H(X) \to H(Y)$ of the type $(X; Y)$ in such a way that
   2.1 the map $s_* : H(X) \to H(Y)$ is a homomorphism of the corresponding Boolean algebras;
   2.2 for given $s^1 : W(X) \to W(Y)$, $s^2 : W(Y) \to W(Z)$ and $a \in H(X)$ we have $s^2_*(s^1_*(a)) = (s^2 s^1)_*(a)$.
3. The identities controlling the interaction of the operations $s_*, s^1_*, s^2_* : H(X) \to H(Y)$ with quantifiers and constants repeat the ones from Definition 2, items (3) and (4).

1.2.2.4 Examples of Halmos Categories

The next two examples of Halmos categories are connected with the examples of extended Boolean algebras from Sect. 1.2.2.1.

Example 3 Category $\mathrm{Hal}_\Theta(\mathcal{H})$. Objects of this category are the extended Boolean algebras $\mathrm{Hal}^X_\Theta(\mathcal{H})$ from Example 1 for various $X \in \Gamma$. Morphisms $s_* : \mathrm{Hal}^X_\Theta(\mathcal{H}) \to \mathrm{Hal}^Y_\Theta(\mathcal{H})$ are defined as follows:

$$s_* A = \{\mu : W(Y) \to H \mid \mu s \in A\},$$

where $A \subset \mathrm{Hom}(W(X), H)$ and $s : W(X) \to W(Y)$.


Remark 1 A homomorphism $s : W(X) \to W(Y)$ gives rise to a map $\tilde{s} : \mathrm{Hom}(W(Y), H) \to \mathrm{Hom}(W(X), H)$ by the rule $\tilde{s}(\mu) = \mu s$, which is a morphism in the category of affine spaces $\Theta^0(\mathcal{H})$ (see Sect. 1.2.1). If $A$ is a subset of $\mathrm{Hom}(W(X), H)$, then $s_* A$ is the full pre-image of $A$ under $\tilde{s}$.

Example 4 Category $\widetilde{\Phi}$. We define the Halmos category $\widetilde{\Phi}$ in terms of multi-sorted algebras, which is more convenient. Consider once again the signature $L_\Theta = \{\vee, \wedge, \neg, \exists x, M_X, s_*\}$, $x \in X$, $X \in \Gamma$. Denote by $M = (M_X, X \in \Gamma)$ the multi-sorted set with the domains $M_X$, where $M_X$ is the set of all $\varphi(w_1, \dots, w_{n_\varphi})$, $\varphi \in \Psi$, $w_1, \dots, w_{n_\varphi} \in W(X)$.

Each formula $\varphi(w_1, \dots, w_{n_\varphi}) \in M_X$ is a formula of length zero and of sort $X$. Let $u$ be a formula of length $n$ and sort $X$, and let $x \in X$. Then $\neg u$ and $\exists x u$ are formulas of the same sort $X$ and of length $n + 1$. Now let $u_1$ and $u_2$ be formulas of the same sort $X$ and of lengths $n_1$ and $n_2$, respectively. Then the formulas $u_1 \vee u_2$ and $u_1 \wedge u_2$ have length $n_1 + n_2 + 1$ and sort $X$. For a given $s : W(X) \to W(Y)$, the formula $s_* u$ is a formula of length $n + 1$ and of sort $Y$. By induction, one can define lengths and sorts of arbitrary formulas.

Let $L_X$ be the set of all formulas of the sort $X$ constructed in this way. Denote by $L = (L_X, X \in \Gamma)$ the multi-sorted algebra in the signature $L_\Theta$ with domains $L_X$. By construction, the algebra $L$ is the absolutely free algebra of formulas over the multi-sorted set $M = (M_X, X \in \Gamma)$. Define on $L$ the multi-sorted Lindenbaum–Tarski congruence $\tau = (\tau_X, X \in \Gamma)$, which works on the components $\tau_X$ as follows: $u \,\tau_X\, v$ if and only if $(u \to v) \wedge (v \to u)$, where $u, v \in L_X$. This means that two formulas $u$ and $v$ of the same sort are declared equivalent if each of them is derivable from the other in the theory of multi-sorted Halmos algebras.

We define the Halmos algebra of formulas $\widetilde{\Phi}$ as the quotient algebra of the absolutely free algebra of formulas $L$ by the Lindenbaum–Tarski congruence $\tau$, that is, $\widetilde{\Phi} = L/\tau$. It can be written as $\widetilde{\Phi} = (\Phi(X), X \in \Gamma)$, where $\Phi(X) = L_X/\tau_X$. By Definition 3, each component $\Phi(X)$ of the multi-sorted Halmos algebra $\widetilde{\Phi}$ is an extended Boolean algebra of the sort $X$ in the signature $L_X$. Thus, simultaneously, we define the category $\widetilde{\Phi}$ with objects of the form $\Phi(X)$ and morphisms $s_* : \Phi(X) \to \Phi(Y)$, $X, Y \in \Gamma$. The next remark is connected with Remark 1.

Remark 2 Let a homomorphism $s : W(X) \to W(Y)$ be given. In parallel to the map $\tilde{s} : \mathrm{Hom}(W(Y), H) \to \mathrm{Hom}(W(X), H)$, we define a map from the set of subsets of $\Phi(Y)$ to the set of subsets of $\Phi(X)$. We denote it by the same symbol $\tilde{s}$ and define it as $\tilde{s} T = \{u \in \Phi(X) \mid s_* u \in T\}$, where $T \subset \Phi(Y)$. Then $\tilde{s} T$ is the full pre-image of $T$ under $s_*$.

The Halmos categories $\widetilde{\Phi}$ and $\mathrm{Hal}_\Theta(\mathcal{H})$ are tightly connected via the homomorphism of extended Boolean algebras

$$\mathrm{Val}^X_{\mathcal{H}} : \Phi(X) \to \mathrm{Bool}(W(X), H).$$

Intuitively, the image of a formula $u \in \Phi(X)$ under the homomorphism $\mathrm{Val}^X_{\mathcal{H}}$ is the value of $u$ in the algebra $H$, i.e. $\mathrm{Val}^X_{\mathcal{H}} u$ is the set of points in $\mathrm{Hom}(W(X), H)$ satisfying $u$. For details see [17, 20].
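The description of s∗A as a full pre-image (Example 3 and Remark 1) can be made concrete in a finite toy case. The sketch below rests on assumptions that go beyond the text: a homomorphism s is represented only by a dictionary sending each variable of X to a variable of Y (a special case of a homomorphism of free algebras), points are tuples of values, and all identifiers (`s_tilde`, `s_star`, `collapse`, `swap`, `EQ`, `LESS`) are hypothetical.

```python
from itertools import product

H = (0, 1, 2)                          # small illustrative carrier (assumption)
X = ("x", "y")                         # source variables
Y = ("u", "v")                         # target variables

def points(variables):
    """All points over the given variables, encoded as tuples of values."""
    return frozenset(product(H, repeat=len(variables)))

def s_tilde(s, mu):
    """Send a point mu over Y to the point mu∘s over X."""
    value = dict(zip(Y, mu))
    return tuple(value[s[x]] for x in X)

def s_star(s, A):
    """Full pre-image of A (a set of points over X) under s_tilde."""
    return frozenset(mu for mu in points(Y) if s_tilde(s, mu) in A)

# Two hypothetical variable substitutions s : X -> Y.
collapse = {"x": "u", "y": "u"}        # identifies both variables with u
swap = {"x": "v", "y": "u"}            # exchanges the roles of the variables

EQ = frozenset(p for p in points(X) if p[0] == p[1])     # value set of "x = y"
LESS = frozenset(p for p in points(X) if p[0] < p[1])    # value set of "x < y"
```

Under `collapse`, the set `EQ` pulls back to every point over Y and `LESS` pulls back to the empty set, while `s_star(swap, LESS)` consists of the points with v < u. Pre-images commute with complements and unions, which is the Boolean-homomorphism property required of s∗ in this toy setting.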

1.2.3 The Galois Correspondence

In this section we introduce a correspondence between sets of formulas in the algebra $\Phi(X)$ and subsets of points of the affine space $\mathrm{Hom}(W(X), H)$. In this connection we define the notion of the logical kernel of a point. This notion is closely related to the notion of a type from model theory [13]. Let $\mu$ be a point of the affine space $\mathrm{Hom}(W(X), H)$.

Definition 4 The logical kernel $\mathrm{LKer}(\mu)$ of a point $\mu$ is the set of all formulas $u \in \Phi(X)$ which hold true at the point $\mu$, that is,

$$\mathrm{LKer}(\mu) = \{u \in \Phi(X) \mid \mu \in \mathrm{Val}^X_{\mathcal{H}}(u)\}.$$

Remark 3 The logical kernel $\mathrm{LKer}(\mu)$ of a point $\mu$ is a Boolean ultrafilter (maximal filter) in the algebra $\Phi(X)$ (see [18]).

Let $T$ be a set of formulas from $\Phi(X)$. We define a set of points $T^L_{\mathcal{H}}$ in $\mathrm{Hom}(W(X), H)$ as

$$T^L_{\mathcal{H}} = \{\mu : W(X) \to H \mid T \subset \mathrm{LKer}(\mu)\}.$$

That is, $T^L_{\mathcal{H}}$ is the set of all points $\mu \in \mathrm{Hom}(W(X), H)$ satisfying all formulas from $T \subset \Phi(X)$.

Take a set of points $A \subset \mathrm{Hom}(W(X), H)$ and define a set of formulas $A^L_{\mathcal{H}}$ in $\Phi(X)$:

$$A^L_{\mathcal{H}} = \{u \in \Phi(X) \mid A \subset \mathrm{Val}^X_{\mathcal{H}}(u)\}.$$

The set $A^L_{\mathcal{H}}$ is the set of all formulas $u \in \Phi(X)$ which hold true at all points of $A$. The correspondence between sets of formulas and sets of points defined above is a Galois correspondence (see [11]). In the case of a Galois correspondence one can speak about Galois closures. In particular, the subsets $T^L_{\mathcal{H}} \subset \mathrm{Hom}(W(X), H)$ and $A^L_{\mathcal{H}} \subset \Phi(X)$ are Galois-closed.


We call the subset $T^L_{\mathcal{H}} \subset \mathrm{Hom}(W(X), H)$ the definable set presented by the set of formulas $T$. Note that the set $A^L_{\mathcal{H}}$ is a Boolean filter in the algebra $\Phi(X)$; it is called an $\mathcal{H}$-closed filter. The Galois correspondence constructed above gives a bijection between definable sets in $\mathrm{Hom}(W(X), H)$ and $\mathcal{H}$-closed filters in the extended Boolean algebra $\Phi(X)$.

Proposition 1 ([18]) The intersection of $\mathcal{H}$-closed filters is an $\mathcal{H}$-closed filter.

The next proposition describes one more property of $\mathcal{H}$-closed filters ([1, 17]).

Proposition 2 Let a homomorphism of free algebras $s : W(X) \to W(Y)$ be given. If $T$ is an $\mathcal{H}$-closed filter in $\Phi(Y)$, then $\tilde{s} T$ is an $\mathcal{H}$-closed filter in $\Phi(X)$.

The following proposition describes the relation between the Galois correspondence and morphisms in the categories $\mathrm{Hal}_\Theta(\mathcal{H})$ and $\widetilde{\Phi}$ (see [1, 17]).

Proposition 3 Let $T$ be a set of formulas from $\Phi(X)$, $A$ be a set of points in $\mathrm{Hom}(W(X), H)$ and $s : W(X) \to W(Y)$ be a homomorphism of free algebras. Then

1. $(s_* T)^L_{\mathcal{H}} = s_* T^L_{\mathcal{H}}$,
2. $s_* A^L_{\mathcal{H}} \subseteq (s_* A)^L_{\mathcal{H}}$.

Corollary 1 If $A$ is a definable set in $\mathrm{Hom}(W(X), H)$ then $s_* A$ is also a definable set.

A similar relation holds between definable sets, $\mathcal{H}$-closed filters and the maps $\tilde{s}$ (see [1, 17]).

Proposition 4 Let $T$ be a set of formulas from $\Phi(Y)$ and $A$ be a set of points in $\mathrm{Hom}(W(Y), H)$. Let a homomorphism of free algebras $s : W(X) \to W(Y)$ be given. Then

1. $\tilde{s}(A^L_{\mathcal{H}}) = (\tilde{s} A)^L_{\mathcal{H}}$,
2. $\tilde{s}(T^L_{\mathcal{H}}) \subseteq (\tilde{s} T)^L_{\mathcal{H}}$.
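The two maps of the Galois correspondence can be explored on a finite toy model. The sketch below is illustrative only and relies on assumptions not made in the text: the algebra of formulas is replaced by a small hand-picked sample of four formulas given directly by their value sets, points are pairs over the carrier {0, 1, 2}, and the names `VAL`, `points_of`, `formulas_of`, `lker` are hypothetical. Relative to such a sample a logical kernel is of course not an ultrafilter; only the closure behaviour is illustrated.

```python
from itertools import product

H = (0, 1, 2)                                   # illustrative carrier (assumption)
POINTS = frozenset(product(H, repeat=2))        # points (mu(x), mu(y))

# A finite, hypothetical sample of formulas in the variables x, y.
# Each name is mapped to its value set, i.e. the points satisfying it.
VAL = {
    "x<y":  frozenset(p for p in POINTS if p[0] < p[1]),
    "x=y":  frozenset(p for p in POINTS if p[0] == p[1]),
    "x<=y": frozenset(p for p in POINTS if p[0] <= p[1]),
    "true": POINTS,
}

def points_of(T):
    """Analogue of T^L: the points satisfying every formula in T."""
    return frozenset(p for p in POINTS if all(p in VAL[u] for u in T))

def formulas_of(A):
    """Analogue of A^L: the sampled formulas true at every point of A."""
    return frozenset(u for u in VAL if A <= VAL[u])

def lker(p):
    """Analogue of the logical kernel of a single point."""
    return formulas_of(frozenset([p]))
```

For the one-point set A = {(0, 1)}, `formulas_of(A)` contains "x<y", "x<=y" and "true", and `points_of(formulas_of(A))` is the whole value set of "x<y". This is the Galois closure of A within the sample, and applying `formulas_of` once more returns the same set of formulas.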

1.3 Knowledge Base Model

In this section we introduce the concept of a knowledge base model.

1.3.1 What Is Knowledge?

Let us start with a discussion of what knowledge is. Speaking about knowledge, we proceed from its representation in three components: the subject area of knowledge, the description of knowledge, and the content of knowledge.


The subject area of knowledge is presented by a model $\mathcal{H} = (H, \Psi, f)$, where $H$ is an algebra in a fixed variety of algebras $\Theta$, $\Psi$ is a set of relation symbols $\varphi$, and $f$ is an interpretation of each symbol $\varphi$ in $H$.

The description of knowledge is the syntactical component of knowledge. From the algebraic viewpoint, a description of knowledge is a set of formulas $T$; more precisely, it is an $\mathcal{H}$-filter in the algebra of formulas $\Phi(X)$, $X = \{x_1, \dots, x_n\}$.

The content of knowledge is a subset of $H^n$, where $H^n$ is the Cartesian power of $H$. Each content of knowledge $A$ corresponds to a description of knowledge $T \subset \Phi(X)$, $|X| = n$. If we regard $H^n$ as an affine space, then this correspondence can be treated geometrically via the Galois correspondence.

In order to describe the dynamic nature of a knowledge base, two categories and a functor are introduced: the category of knowledge description $F_\Theta(\mathcal{H})$, the category of knowledge content $D_\Theta(\mathcal{H})$ and the knowledge functor $\mathrm{Ct}_{\mathcal{H}}$. These categories are defined using the machinery of logical geometry (see [2, 22]).

Speaking about logic, we presuppose that it has two important components: syntax and semantics. Applying techniques and notions of algebraic logic (see [9, 16], Sect. 1.2.2), we assign the appropriate algebraic structures to syntax and semantics: the category of knowledge description $F_\Theta(\mathcal{H})$ and the category of knowledge content $D_\Theta(\mathcal{H})$, respectively.

Note that this paper takes a slightly different approach to the construction of the categories $F_\Theta(\mathcal{H})$ and $D_\Theta(\mathcal{H})$ than [2, 22]. In particular, it gives a more appropriate connection between syntax and semantics, i.e., between the category of knowledge description $F_\Theta(\mathcal{H})$ and the category of knowledge content $D_\Theta(\mathcal{H})$. Namely, the new approach gives a dual isomorphism between $F_\Theta(\mathcal{H})$ and $D_\Theta(\mathcal{H})$ (see [1, 4] and Fig. 1.1). A detailed description of the categories $F_\Theta(\mathcal{H})$ and $D_\Theta(\mathcal{H})$ and of the knowledge functor is given in [1]. Let us recall these constructions.

Fig. 1.1 The knowledge functor CtH : dual isomorphism


1.3.2 Category of Knowledge Description $F_\Theta(\mathcal{H})$

An object $F^X_\Theta(\mathcal{H})$ of the category $F_\Theta(\mathcal{H})$ is the lattice of all $\mathcal{H}$-closed filters in the algebra $\Phi(X)$, $X \in \Gamma$.

Remark 4 The usual set-theoretical union of $\mathcal{H}$-closed filters need not be an $\mathcal{H}$-closed filter. In order to obtain a lattice of $\mathcal{H}$-closed filters in $\Phi(X)$ we need a new operation

$$T_1 \,\overline{\cup}\, T_2 = (T_1 \cup T_2)^{LL}_{\mathcal{H}}.$$

Then all $\mathcal{H}$-closed filters in $\Phi(X)$ form a lattice with respect to the operations $\overline{\cup}$ and $\cap$ (for details see [1, 17, 18]).

Let a homomorphism $s : W(X) \to W(Y)$ and $\mathcal{H}$-closed filters $T_1 \subset \Phi(X)$ and $T_2 \subset \Phi(Y)$ be given. We say that a map $[s_*] : T_1 \to T_2$ is admissible if $s_* T_1 \subseteq T_2$. Recall that $s_*$ is a map between $\mathcal{H}$-closed filters in $\Phi(X)$ and $\Phi(Y)$ induced by the corresponding morphism of the category $\widetilde{\Phi}$ (see Sect. 1.2.2).

A morphism $[s_*] : F^X_\Theta(\mathcal{H}) \to F^Y_\Theta(\mathcal{H})$ between the objects $F^X_\Theta(\mathcal{H})$ and $F^Y_\Theta(\mathcal{H})$ is determined if $[s_*] : T_1 \to T_2$ is admissible for every $T_1 \in F^X_\Theta(\mathcal{H})$. We define the composition of morphisms $[s^1_*] : F^X_\Theta(\mathcal{H}) \to F^Y_\Theta(\mathcal{H})$ and $[s^2_*] : F^Y_\Theta(\mathcal{H}) \to F^Z_\Theta(\mathcal{H})$ as follows: $[s^2_*] \circ [s^1_*] = [s^2_* s^1_*]$.

1.3.3 Category of Knowledge Content $D_\Theta(\mathcal{H})$

An object $D^X_\Theta(\mathcal{H})$ of the category $D_\Theta(\mathcal{H})$ is the lattice of all definable sets in the affine space $\mathrm{Hom}(W(X), H)$, $X \in \Gamma$.

Let a homomorphism $s : W(X) \to W(Y)$ and definable sets $A_2 \subset \mathrm{Hom}(W(Y), H)$ and $A_1 \subset \mathrm{Hom}(W(X), H)$ be given. We say that a map $[\tilde{s}] : A_2 \to A_1$ is admissible if $\tilde{s} A_2 \subseteq A_1$. Recall that $\tilde{s}$ is a map between definable sets in $\mathrm{Hom}(W(Y), H)$ and $\mathrm{Hom}(W(X), H)$ induced by the corresponding morphism of the category of affine spaces $\Theta^0(\mathcal{H})$ (see Sect. 1.2.1).

A morphism $[\tilde{s}] : D^Y_\Theta(\mathcal{H}) \to D^X_\Theta(\mathcal{H})$ between the objects $D^Y_\Theta(\mathcal{H})$ and $D^X_\Theta(\mathcal{H})$ is determined if $[\tilde{s}] : A_2 \to A_1$ is admissible for every $A_2 \in D^Y_\Theta(\mathcal{H})$. We define the composition of morphisms $[\tilde{s}^1] : D^Z_\Theta(\mathcal{H}) \to D^Y_\Theta(\mathcal{H})$ and $[\tilde{s}^2] : D^Y_\Theta(\mathcal{H}) \to D^X_\Theta(\mathcal{H})$ as follows: $[\tilde{s}^2] \circ [\tilde{s}^1] = [\tilde{s}^2 \tilde{s}^1]$.

1.3.4 The Knowledge Functor $\mathrm{Ct}_{\mathcal{H}}$

The category of knowledge description $F_\Theta(\mathcal{H})$ and the category of knowledge content $D_\Theta(\mathcal{H})$ are related by the knowledge functor


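$$\mathrm{Ct}_{\mathcal{H}} : F_\Theta(\mathcal{H}) \to D_\Theta(\mathcal{H}),$$

which is defined on objects by

$$\mathrm{Ct}_{\mathcal{H}}(F^X_\Theta(\mathcal{H})) = D^X_\Theta(\mathcal{H}),$$

and on morphisms by

$$\mathrm{Ct}_{\mathcal{H}}([s_*]) = [\tilde{s}],$$

where $s : W(X) \to W(Y)$ is a given homomorphism of free algebras. Moreover, if $[s_*] : F^X_\Theta(\mathcal{H}) \to F^Y_\Theta(\mathcal{H})$ is a morphism in $F_\Theta(\mathcal{H})$ such that $[s_*] : T_1 \to T_2$, then $[\tilde{s}] : D^Y_\Theta(\mathcal{H}) \to D^X_\Theta(\mathcal{H})$ is a morphism in $D_\Theta(\mathcal{H})$ defined by the rule $[\tilde{s}] : (T_2)^L_{\mathcal{H}} \to (T_1)^L_{\mathcal{H}}$.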

1.3.5 Definition of a Knowledge Base

We are now in a position to define a knowledge base model. Let a model $\mathcal{H} = (H, \Psi, f)$ be given.

Definition 5 A knowledge base $KB = KB(H, \Psi, f)$ is a triple $(F_\Theta(\mathcal{H}), D_\Theta(\mathcal{H}), \mathrm{Ct}_{\mathcal{H}})$, where $F_\Theta(\mathcal{H})$ is the category of knowledge description, $D_\Theta(\mathcal{H})$ is the category of knowledge content, and $\mathrm{Ct}_{\mathcal{H}} : F_\Theta(\mathcal{H}) \to D_\Theta(\mathcal{H})$ is the contravariant functor.

Remark 5 We will use the term "a knowledge base" instead of the more precise "a knowledge base model".

One can say that the knowledge base model defined above is a sort of automaton (see [16]), where queries are objects of the category of knowledge description $F_\Theta(\mathcal{H})$ and replies are objects of the category of knowledge content $D_\Theta(\mathcal{H})$. For such an automaton to be viewed as a knowledge base, a connection with particular data is assumed. These data are held in the subject area presented by a model $\mathcal{H} = (H, \Psi, f)$.

The knowledge functor $\mathrm{Ct}_{\mathcal{H}}$ gives a dynamical passage between queries and replies, namely, between the categories $F_\Theta(\mathcal{H})$ and $D_\Theta(\mathcal{H})$. Moreover, this passage is a one-to-one correspondence.

Theorem 1 ([4]) The knowledge functor $\mathrm{Ct}_{\mathcal{H}}$ gives rise to a contravariant isomorphism between the category of knowledge description $F_\Theta(\mathcal{H})$ and the category of knowledge content $D_\Theta(\mathcal{H})$.

1.4 Knowledge Bases Equivalences

There are various ways to make the notion of equivalence of knowledge bases precise. We are interested in a special kind of equivalence called informational equivalence.


1.4.1 Informationally Equivalent Knowledge Bases

Fix a variety of algebras $\Theta$, algebras $H_1$ and $H_2$ from $\Theta$ and a set of relation symbols $\Psi$. Let two models $\mathcal{H}_1 = (H_1, \Psi, f_1)$ and $\mathcal{H}_2 = (H_2, \Psi, f_2)$ be given. Take the corresponding knowledge bases $KB(\mathcal{H}_1)$ and $KB(\mathcal{H}_2)$.

Definition 6 Knowledge bases $KB(\mathcal{H}_1)$ and $KB(\mathcal{H}_2)$ are called informationally equivalent if the categories of knowledge description $F_\Theta(\mathcal{H}_1)$ and $F_\Theta(\mathcal{H}_2)$ are isomorphic.

In view of Theorem 1, the categories of knowledge description $F_\Theta(\mathcal{H}_1)$ and $F_\Theta(\mathcal{H}_2)$ are isomorphic if and only if the categories of knowledge content $D_\Theta(\mathcal{H}_1)$ and $D_\Theta(\mathcal{H}_2)$ are isomorphic. Thus, one can formulate Definition 6 in terms of isomorphism of the categories of knowledge content.

From now on our goal is to recognize the informational equivalence of knowledge bases in the sense of Definition 6. Our main interest is to find out how informational equivalence is related to the logical description of knowledge bases. In this connection, the notions of logically-geometrically equivalent (LG-equivalent), LG-isotypic and logically automorphically equivalent knowledge bases have been defined. In this paper we concentrate on LG-equivalent and LG-isotypic knowledge bases. Logically automorphically equivalent knowledge bases are considered in [3].

1.4.2 LG-equivalent and LG-isotypic Knowledge Bases

The notion of LG-equivalent (logically-geometrically equivalent) knowledge bases is based on a geometrical approach, whereas LG-isotypic knowledge bases are defined using logical tools. As we will see later, however, these notions give the same description of knowledge bases (see Theorem 3). Both concepts, LG-equivalence and LG-isotypeness of knowledge bases, are defined via LG-equivalence and LG-isotypeness of the subject areas of the given knowledge bases, that is, via LG-equivalence and LG-isotypeness of the corresponding models.

1.4.2.1 LG-equivalence and LG-isotypeness of Models

Informally, we would like to define logically-geometrical equivalence of models $\mathcal{H}_1 = (H_1, \Psi, f_1)$ and $\mathcal{H}_2 = (H_2, \Psi, f_2)$ in such a way that the corresponding subject area algebras $H_1$ and $H_2$ have equal possibilities with respect to the solution of logical formulas from any set of formulas $T$. This goal gives rise to the following definition.

Definition 7 The models $\mathcal{H}_1 = (H_1, \Psi, f_1)$ and $\mathcal{H}_2 = (H_2, \Psi, f_2)$ are called LG-equivalent if for any $X \in \Gamma$ and $T \subset \Phi(X)$ we have


$$T^{LL}_{\mathcal{H}_1} = T^{LL}_{\mathcal{H}_2}.$$

In other words, LG-equivalence means that, for every $X \in \Gamma$, any $\mathcal{H}_1$-closed filter in $\Phi(X)$ is $\mathcal{H}_2$-closed, and conversely.

The concept of LG-isotypic models is defined using model-theoretical tools. Let a language $L$ and a model $\mathcal{H}$ be given (for definitions see [13]).

Definition 8 A consistent set $T$ of $L$-formulas $u(x_1, \dots, x_n)$ with free variables among $x_1, \dots, x_n$ is called an $n$-type.

Definition 9 An $n$-type $T$ is complete if for every $L$-formula $u(x_1, \dots, x_n)$ with free variables among $x_1, \dots, x_n$ either $u(x_1, \dots, x_n)$ or $\neg u(x_1, \dots, x_n)$ is in $T$.

Let $a = (a_1, \dots, a_n)$ be a point of $H^n$. Denote by $\mathrm{tp}^{\mathcal{H}}(a)$ the set of all formulas $u(x_1, \dots, x_n)$ in free variables $x_1, \dots, x_n$ such that $\mathcal{H} \models u(a_1, \dots, a_n)$. One can check that $\mathrm{tp}^{\mathcal{H}}(a)$ is a complete $n$-type. We will also say that $\mathrm{tp}^{\mathcal{H}}(a)$ is the type of the point $(a_1, \dots, a_n) \in H^n$.

Definition 10 A complete $n$-type $T$ is called realizable in $\mathcal{H}$ if there is a point $a = (a_1, \dots, a_n) \in H^n$ such that $T = \mathrm{tp}^{\mathcal{H}}(a)$. In this case we will say that $T$ is a type of the model $\mathcal{H}$.

Remark 6 Later on we will be interested only in complete realizable $n$-types.

We will call an $n$-type $\mathrm{tp}^{\mathcal{H}}(a)$ from Definition 10 an $X$-MT-type (model-theoretical type), where $X = \{x_1, \dots, x_n\}$.

We now define the concept of an LG-type (logically-geometrical type). Recall that the set of all formulas in variables from $X$ valid on $\mathcal{H}$ is called the elementary $X$-theory of $\mathcal{H}$ (see [13, 19]). Denote by $\mathrm{Th}^X(\mathcal{H})$ the elementary $X$-theory of $\mathcal{H}$.

Definition 11 Every ultrafilter $T$ in the algebra $\Phi(X)$ containing $\mathrm{Th}^X(\mathcal{H})$ is called an $X$-LG-type.

Definition 12 An ultrafilter $T$ is called an $X$-LG-type of the model $\mathcal{H}$ if there is a point $\mu : W(X) \to H$ such that $T = \mathrm{LKer}(\mu)$.

In the latter case, we will say that the type $T$ is realizable in $\mathcal{H}$ (cf. Definition 10). More details about the connections between model-theoretical and logically-geometrical types can be found in [20].

We only mention the important result of Zhitomirski [23], which gives a bridge between MT- and LG-types. He showed that the model-theoretical types of two points coincide if and only if their logically-geometrical types coincide. Thus, one can work with MT- or LG-types depending on the needs.

Let $S^X_{\mathcal{H}}$ be the set of all realizable $X$-LG-types of a model $\mathcal{H}$, that is,

$$S^X_{\mathcal{H}} = \{\mathrm{LKer}(\mu) \mid \mu \in \mathrm{Hom}(W(X), H)\}.$$


Definition 13 The models $\mathcal{H}_1 = (H_1, \Psi, f_1)$ and $\mathcal{H}_2 = (H_2, \Psi, f_2)$ are called LG-isotypic if

$$S^X_{\mathcal{H}_1} = S^X_{\mathcal{H}_2}$$

for every $X \in \Gamma$.

The next theorem gives a relationship between LG-equivalent and LG-isotypic models.

Theorem 2 The models $(H_1, \Psi, f_1)$ and $(H_2, \Psi, f_2)$ are LG-equivalent if and only if they are LG-isotypic.

Proof Let the models $(H_1, \Psi, f_1)$ and $(H_2, \Psi, f_2)$ be LG-equivalent. This means that an ultrafilter $\mathrm{LKer}(\mu)$ is $\mathcal{H}_1$-closed if and only if it is $\mathcal{H}_2$-closed, for every $X \in \Gamma$ and $\mu \in \mathrm{Hom}(W(X), H_1)$. Moreover, every $\mathcal{H}_2$-closed ultrafilter is the logical kernel $\mathrm{LKer}(\nu)$ of a certain point $\nu \in \mathrm{Hom}(W(X), H_2)$. Thus, $\mathrm{LKer}(\mu) = \mathrm{LKer}(\nu)$ and $S^X_{\mathcal{H}_1} = S^X_{\mathcal{H}_2}$ for every $X \in \Gamma$.

Conversely, suppose that the models $(H_1, \Psi, f_1)$ and $(H_2, \Psi, f_2)$ are LG-isotypic. This means that for every $X \in \Gamma$ a subset $T \subset \Phi(X)$ is an $\mathcal{H}_1$-closed ultrafilter if and only if it is an $\mathcal{H}_2$-closed ultrafilter. Since every $\mathcal{H}_1$-closed filter is an intersection of $\mathcal{H}_1$-closed ultrafilters, it is an intersection of $\mathcal{H}_2$-closed ultrafilters as well. Recall that the intersection of $\mathcal{H}$-closed filters is an $\mathcal{H}$-closed filter (see Proposition 1). Thus, $T$ is an $\mathcal{H}_1$-closed filter if and only if it is an $\mathcal{H}_2$-closed filter. This means that the models $(H_1, \Psi, f_1)$ and $(H_2, \Psi, f_2)$ are LG-equivalent.

The next section deals with LG-equivalent and LG-isotypic knowledge bases.
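The comparison of sets of realizable types in Definition 13 can be imitated on finite models. The following sketch is an illustration under strong assumptions that are not in the text: the models are finite chains with a single relation "<", and only a hypothetical sample of three formulas in two variables is used instead of the whole algebra of formulas. Agreement on such a sample is a necessary condition only; it does not establish LG-isotypeness (chains of different length are separated by formulas outside the sample).

```python
from itertools import product

# Hypothetical finite sample of formulas in the variables x, y. Each formula is
# evaluated at a point (a, b) of a model whose relation "<" is the set `less`.
FORMULAS = {
    "x<y": lambda a, b, less: (a, b) in less,
    "y<x": lambda a, b, less: (b, a) in less,
    "x=y": lambda a, b, less: a == b,
}

def chain(n):
    """The n-element chain 0 < 1 < ... < n-1 as a pair (carrier, less)."""
    carrier = range(n)
    return carrier, {(a, b) for a in carrier for b in carrier if a < b}

def lker(point, model):
    """Sampled formulas holding at `point`: a coarse stand-in for LKer."""
    _, less = model
    return frozenset(u for u, holds in FORMULAS.items() if holds(*point, less))

def realizable_types(model):
    """The set of sampled kernels realized by points of the model."""
    carrier, _ = model
    return {lker(p, model) for p in product(carrier, repeat=2)}

def isotypic_on_sample(m1, m2):
    """True when both models realize the same kernels over the sample."""
    return realizable_types(m1) == realizable_types(m2)
```

Here `isotypic_on_sample(chain(3), chain(4))` holds because both chains realize exactly the three kernels {x<y}, {y<x}, {x=y}, whereas the one-element chain realizes only {x=y} and is already distinguished by the sample.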

1.4.2.2 LG-equivalence and LG-isotypeness of Knowledge Bases

As we mentioned, LG-equivalence and LG-isotypeness for knowledge bases are defined via the similar notions for models. Let two models $\mathcal{H}_1 = (H_1, \Psi, f_1)$, $\mathcal{H}_2 = (H_2, \Psi, f_2)$ and the corresponding knowledge bases $KB(\mathcal{H}_1)$ and $KB(\mathcal{H}_2)$ be given.

Definition 14 Knowledge bases $KB(\mathcal{H}_1)$ and $KB(\mathcal{H}_2)$ are called LG-equivalent if the corresponding models $\mathcal{H}_1$ and $\mathcal{H}_2$ are LG-equivalent.

Definition 15 Knowledge bases $KB(\mathcal{H}_1)$ and $KB(\mathcal{H}_2)$ are called LG-isotypic if the corresponding models $\mathcal{H}_1$ and $\mathcal{H}_2$ are LG-isotypic.

The next theorem is a straightforward corollary of Theorem 2.

Theorem 3 Knowledge bases $KB(\mathcal{H}_1)$ and $KB(\mathcal{H}_2)$ are LG-equivalent if and only if they are LG-isotypic.


1.4.3 LG-Equivalence and Informational Equivalence of Knowledge Bases

Our main goal now is to find out how informational equivalence is related to the logically-geometrical description of knowledge bases. In this section we present results which demonstrate the relationship between LG-equivalent (LG-isotypic) and informationally equivalent knowledge bases.

We start from auxiliary results concerning subject areas of LG-equivalent knowledge bases. The next theorem gives the correspondence between definable sets over two LG-equivalent models that will be useful in subsequent considerations.

Theorem 4 If the models $\mathcal{H}_1 = (H_1, \Psi, f_1)$ and $\mathcal{H}_2 = (H_2, \Psi, f_2)$ are LG-equivalent, then there is a one-to-one correspondence between $X$-definable sets over $\mathcal{H}_1$ and $\mathcal{H}_2$, for every $X \in \Gamma$.

Proof Fix a set $X \in \Gamma$. Let $A \subset \mathrm{Hom}(W(X), H_1)$ be an $X$-definable set over $\mathcal{H}_1$. Then $(A^L_{\mathcal{H}_1})^L_{\mathcal{H}_2}$ is an $X$-definable set over $\mathcal{H}_2$. We will show that the correspondence

$$\tau_X : A \to (A^L_{\mathcal{H}_1})^L_{\mathcal{H}_2} \tag{1.1}$$

is a bijection. Let us show that the map $\tau_X$ is injective. Let $A$ and $B$ be definable sets from $\mathrm{Hom}(W(X), H_1)$. Suppose that

$$(A^L_{\mathcal{H}_1})^L_{\mathcal{H}_2} = (B^L_{\mathcal{H}_1})^L_{\mathcal{H}_2}.$$

Then $(A^L_{\mathcal{H}_1})^{LL}_{\mathcal{H}_2}$ and $(B^L_{\mathcal{H}_1})^{LL}_{\mathcal{H}_2}$ are $\mathcal{H}_2$-closed filters in $\Phi(X)$, and

$$(A^L_{\mathcal{H}_1})^{LL}_{\mathcal{H}_2} = (B^L_{\mathcal{H}_1})^{LL}_{\mathcal{H}_2}.$$

Since the models $\mathcal{H}_1$ and $\mathcal{H}_2$ are LG-equivalent,

$$(A^L_{\mathcal{H}_1})^{LL}_{\mathcal{H}_2} = (A^L_{\mathcal{H}_1})^{LL}_{\mathcal{H}_1}, \qquad (B^L_{\mathcal{H}_1})^{LL}_{\mathcal{H}_2} = (B^L_{\mathcal{H}_1})^{LL}_{\mathcal{H}_1}.$$

Hence,

$$(A^L_{\mathcal{H}_1})^{LL}_{\mathcal{H}_1} = (B^L_{\mathcal{H}_1})^{LL}_{\mathcal{H}_1}, \qquad A^{LL}_{\mathcal{H}_1} = B^{LL}_{\mathcal{H}_1}.$$

The sets $A$ and $B$ are definable, so $A = A^{LL}_{\mathcal{H}_1} = B^{LL}_{\mathcal{H}_1} = B$ and the map $\tau_X$ is injective.

If $B$ is an $X$-definable set over $\mathcal{H}_2$, then the set $(B^L_{\mathcal{H}_2})^L_{\mathcal{H}_1}$ is $X$-definable over $\mathcal{H}_1$. Thus, the correspondence between $X$-definable sets over $\mathcal{H}_1$ and $\mathcal{H}_2$ is surjective.
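The correspondence (1.1) can be traced on a toy pair of models. The sketch below is an illustration only, under assumptions that are not part of the text: the two models are isomorphic three-element chains (isomorphic models being LG-equivalent), the algebra of formulas is replaced by a hypothetical three-formula sample, and "definable" means definable by a subset of that sample. All function names are hypothetical.

```python
from itertools import product

# Hypothetical finite sample of formulas in the variables x, y.
FORMULAS = {
    "x<y": lambda a, b, less: (a, b) in less,
    "y<x": lambda a, b, less: (b, a) in less,
    "x=y": lambda a, b, less: a == b,
}

def model(carrier):
    """A finite chain; `carrier` lists its elements in increasing order."""
    less = {(carrier[i], carrier[j])
            for i in range(len(carrier)) for j in range(i + 1, len(carrier))}
    return tuple(carrier), less

def points(m):
    return frozenset(product(m[0], repeat=2))

def formulas_of(A, m):
    """Analogue of A^L over the model m, restricted to the sample."""
    return frozenset(u for u, holds in FORMULAS.items()
                     if all(holds(a, b, m[1]) for a, b in A))

def points_of(T, m):
    """Analogue of T^L over the model m."""
    return frozenset(p for p in points(m)
                     if all(FORMULAS[u](p[0], p[1], m[1]) for u in T))

def definable_sets(m):
    """All sets of the form T^L, with T ranging over subsets of the sample."""
    names = list(FORMULAS)
    subsets = (frozenset(n for n, keep in zip(names, mask) if keep)
               for mask in product((False, True), repeat=len(names)))
    return {points_of(T, m) for T in subsets}

def tau(A, m1, m2):
    """Analogue of (1.1): pass to formulas over m1, then to points over m2."""
    return points_of(formulas_of(A, m1), m2)

H1 = model([0, 1, 2])
H2 = model(["a", "b", "c"])
IMAGE = {tau(A, H1, H2) for A in definable_sets(H1)}
```

In this example each model has five sample-definable sets (all points, the three sets cut out by a single formula, and the empty set), and `tau` sends the definable sets over the first chain bijectively onto those over the second.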

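Let $s : W(X) \to W(Y)$ be a homomorphism of free algebras. Denote by $[\tilde{s}]^0_{\mathcal{H}_i}$ a morphism from $D^Y_\Theta(\mathcal{H}_i)$ to $D^X_\Theta(\mathcal{H}_i)$ in the category $D_\Theta(\mathcal{H}_i)$ such that

$$[\tilde{s}]^0_{\mathcal{H}_i} : A \to (\tilde{s} A)^{LL}_{\mathcal{H}_i}$$

for every $A \in D^Y_\Theta(\mathcal{H}_i)$.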

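Remark 7 It is more correct to write $\tilde{s}_{\mathcal{H}_i} A$ instead of $\tilde{s} A$, but we will omit the subscript $\mathcal{H}_i$ in this case.

The next two propositions give the connection between the correspondence $\tau_X$ constructed in Theorem 4 and morphisms in the categories $D_\Theta(\mathcal{H}_1)$ and $D_\Theta(\mathcal{H}_2)$ of lattices of definable sets over $\mathcal{H}_1$ and $\mathcal{H}_2$, respectively.

Proposition 5 If the models $\mathcal{H}_1 = (H_1, \Psi, f_1)$ and $\mathcal{H}_2 = (H_2, \Psi, f_2)$ are LG-equivalent, then the following diagram is commutative:

$$
\begin{array}{ccc}
A & \xrightarrow{\;[\tilde{s}]^0_{\mathcal{H}_1}\;} & (\tilde{s} A)^{LL}_{\mathcal{H}_1} \\
\big\downarrow{\scriptstyle \tau_Y} & & \big\downarrow{\scriptstyle \tau_X} \\
(A^L_{\mathcal{H}_1})^L_{\mathcal{H}_2} & \xrightarrow{\;[\tilde{s}]^0_{\mathcal{H}_2}\;} & ((\tilde{s} A)^L_{\mathcal{H}_1})^L_{\mathcal{H}_2}
\end{array}
\tag{1.2}
$$

where $A$ is a definable set in $\mathrm{Hom}(W(Y), H_1)$, and $[\tilde{s}]^0_{\mathcal{H}_1}$ and $[\tilde{s}]^0_{\mathcal{H}_2}$ are the morphisms in the categories $D_\Theta(\mathcal{H}_1)$ and $D_\Theta(\mathcal{H}_2)$ defined above.

Proof To prove the commutativity of this diagram we need to show that

$$[\tilde{s}]^0_{\mathcal{H}_2}(\tau_Y(A)) = ((\tilde{s} A)^L_{\mathcal{H}_1})^L_{\mathcal{H}_2}$$

and

$$\tau_X([\tilde{s}]^0_{\mathcal{H}_1}(A)) = ((\tilde{s} A)^L_{\mathcal{H}_1})^L_{\mathcal{H}_2}.$$

Indeed, by the definitions of $[\tilde{s}]^0_{\mathcal{H}_2}$ and $\tau_Y$ we have

$$[\tilde{s}]^0_{\mathcal{H}_2}(\tau_Y(A)) = [\tilde{s}]^0_{\mathcal{H}_2}\big((A^L_{\mathcal{H}_1})^L_{\mathcal{H}_2}\big) = \big(\tilde{s}\,(A^L_{\mathcal{H}_1})^L_{\mathcal{H}_2}\big)^{LL}_{\mathcal{H}_2}.$$

According to Proposition 4, $(\tilde{s} A)^L_{\mathcal{H}} = \tilde{s}(A^L_{\mathcal{H}})$, so $\big(\tilde{s}\,(A^L_{\mathcal{H}_1})^L_{\mathcal{H}_2}\big)^L_{\mathcal{H}_2} = \tilde{s}\big((A^L_{\mathcal{H}_1})^{LL}_{\mathcal{H}_2}\big)$. Since the models $\mathcal{H}_1$ and $\mathcal{H}_2$ are LG-equivalent, $A^L_{\mathcal{H}_1}$ is simultaneously an $\mathcal{H}_1$-closed and an $\mathcal{H}_2$-closed filter. This means that $(A^L_{\mathcal{H}_1})^{LL}_{\mathcal{H}_2} = A^L_{\mathcal{H}_1}$ and $\big(\tilde{s}\,(A^L_{\mathcal{H}_1})^L_{\mathcal{H}_2}\big)^{LL}_{\mathcal{H}_2} = \big(\tilde{s}(A^L_{\mathcal{H}_1})\big)^L_{\mathcal{H}_2}$. Using Proposition 4, we have $\big(\tilde{s}(A^L_{\mathcal{H}_1})\big)^L_{\mathcal{H}_2} = ((\tilde{s} A)^L_{\mathcal{H}_1})^L_{\mathcal{H}_2}$. Thus, $[\tilde{s}]^0_{\mathcal{H}_2}(\tau_Y(A)) = ((\tilde{s} A)^L_{\mathcal{H}_1})^L_{\mathcal{H}_2}$.

Let us prove the equality $\tau_X([\tilde{s}]^0_{\mathcal{H}_1}(A)) = ((\tilde{s} A)^L_{\mathcal{H}_1})^L_{\mathcal{H}_2}$. By the definitions of $\tau_X$ and $[\tilde{s}]^0_{\mathcal{H}_1}$ we have

$$\tau_X([\tilde{s}]^0_{\mathcal{H}_1}(A)) = \tau_X\big((\tilde{s} A)^{LL}_{\mathcal{H}_1}\big) = \big(((\tilde{s} A)^{LL}_{\mathcal{H}_1})^L_{\mathcal{H}_1}\big)^L_{\mathcal{H}_2}.$$

Since $(\tilde{s} A)^L_{\mathcal{H}_1}$ is an $\mathcal{H}_1$-closed filter, $((\tilde{s} A)^{LL}_{\mathcal{H}_1})^L_{\mathcal{H}_1} = (\tilde{s} A)^L_{\mathcal{H}_1}$. Thus, $\tau_X([\tilde{s}]^0_{\mathcal{H}_1}(A)) = ((\tilde{s} A)^L_{\mathcal{H}_1})^L_{\mathcal{H}_2}$. The proposition is proved.

Proposition 6 If the models $\mathcal{H}_1 = (H_1, \Psi, f_1)$ and $\mathcal{H}_2 = (H_2, \Psi, f_2)$ are LG-equivalent, then the following diagram is commutative: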

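$$
\begin{array}{ccc}
A_1 & \xrightarrow{\;[\tilde{s}]_{\mathcal{H}_1}\;} & A_2 \\
\big\downarrow{\scriptstyle \tau_Y} & & \big\downarrow{\scriptstyle \tau_X} \\
((A_1)^L_{\mathcal{H}_1})^L_{\mathcal{H}_2} & \xrightarrow{\;[\tilde{s}]_{\mathcal{H}_2}\;} & ((A_2)^L_{\mathcal{H}_1})^L_{\mathcal{H}_2}
\end{array}
\tag{1.3}
$$

where $A_1$ is a definable set in $\mathrm{Hom}(W(Y), H_1)$, $A_2$ is a definable set in $\mathrm{Hom}(W(X), H_1)$, and $[\tilde{s}]_{\mathcal{H}_1}$ and $[\tilde{s}]_{\mathcal{H}_2}$ are morphisms in the categories $D_\Theta(\mathcal{H}_1)$ and $D_\Theta(\mathcal{H}_2)$ corresponding to a given homomorphism $s : W(X) \to W(Y)$ of free algebras.

Proof To prove this proposition we need to show that

$$\tilde{s} A_1 \subseteq A_2 \quad \text{if and only if} \quad \tilde{s}\big(((A_1)^L_{\mathcal{H}_1})^L_{\mathcal{H}_2}\big) \subseteq ((A_2)^L_{\mathcal{H}_1})^L_{\mathcal{H}_2}.$$

This condition means that the morphisms $[\tilde{s}]_{\mathcal{H}_1}$ and $[\tilde{s}]_{\mathcal{H}_2}$ are defined correctly, that is, $[\tilde{s}]_{\mathcal{H}_1}$ is admissible for $A_1$ and $A_2$ if and only if $[\tilde{s}]_{\mathcal{H}_2}$ is admissible for $((A_1)^L_{\mathcal{H}_1})^L_{\mathcal{H}_2}$ and $((A_2)^L_{\mathcal{H}_1})^L_{\mathcal{H}_2}$. Using Proposition 5 we rewrite diagram (1.3) as follows:

$$
\begin{array}{ccccc}
A_1 & \xrightarrow{\;[\tilde{s}]^0_{\mathcal{H}_1}\;} & (\tilde{s} A_1)^{LL}_{\mathcal{H}_1} & \longrightarrow & A_2 \\
\big\downarrow{\scriptstyle \tau_Y} & & \big\downarrow{\scriptstyle \tau_X} & & \big\downarrow{\scriptstyle \tau_X} \\
((A_1)^L_{\mathcal{H}_1})^L_{\mathcal{H}_2} & \xrightarrow{\;[\tilde{s}]^0_{\mathcal{H}_2}\;} & ((\tilde{s} A_1)^L_{\mathcal{H}_1})^L_{\mathcal{H}_2} & \longrightarrow & ((A_2)^L_{\mathcal{H}_1})^L_{\mathcal{H}_2}
\end{array}
$$

Here the outer arrows of the top and bottom rows are the morphisms $[\tilde{s}]_{\mathcal{H}_1}$ and $[\tilde{s}]_{\mathcal{H}_2}$ of diagram (1.3).

Moreover, by Proposition 4, we have

$$((\tilde{s} A_1)^L_{\mathcal{H}_1})^L_{\mathcal{H}_2} = \big(\tilde{s}\,((A_1)^L_{\mathcal{H}_1})\big)^L_{\mathcal{H}_2} \supseteq \tilde{s}\big(((A_1)^L_{\mathcal{H}_1})^L_{\mathcal{H}_2}\big).$$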


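Since $\tau_X$ is a bijection, $\tilde{s} A_1 \subseteq A_2$ if and only if

$$\tilde{s}\big(((A_1)^L_{\mathcal{H}_1})^L_{\mathcal{H}_2}\big) \subseteq ((A_2)^L_{\mathcal{H}_1})^L_{\mathcal{H}_2}.$$

The proposition is proved.

The following theorem plays a crucial role in solving the problem of the relationship between LG-equivalent (LG-isotypic) and informationally equivalent knowledge bases.

Theorem 5 If the models $\mathcal{H}_1 = (H_1, \Psi, f_1)$ and $\mathcal{H}_2 = (H_2, \Psi, f_2)$ are LG-equivalent, then the categories of lattices of definable sets over the given models are isomorphic.

Proof Let $D_\Theta(\mathcal{H}_1)$ and $D_\Theta(\mathcal{H}_2)$ be the categories of lattices of definable sets over the models $\mathcal{H}_1$ and $\mathcal{H}_2$, respectively. To prove the theorem, we will define a functor

$$F : D_\Theta(\mathcal{H}_1) \to D_\Theta(\mathcal{H}_2),$$

which will provide an isomorphism of categories. For an object $D^X_\Theta(\mathcal{H}_1)$ of $D_\Theta(\mathcal{H}_1)$ we put

$$F(D^X_\Theta(\mathcal{H}_1)) = D^X_\Theta(\mathcal{H}_2).$$

Let $s : W(X) \to W(Y)$ be a homomorphism of free algebras. Take the corresponding morphism $[\tilde{s}]_{\mathcal{H}_1}$ in $D_\Theta(\mathcal{H}_1)$, that is,

$$[\tilde{s}]_{\mathcal{H}_1} : D^Y_\Theta(\mathcal{H}_1) \to D^X_\Theta(\mathcal{H}_1),$$

such that $[\tilde{s}]_{\mathcal{H}_1} : A_1 \to A_2$ and $\tilde{s} A_1 \subseteq A_2$. We put

$$F([\tilde{s}]_{\mathcal{H}_1}) = [\tilde{s}]_{\mathcal{H}_2},$$

where $[\tilde{s}]_{\mathcal{H}_2} : D^Y_\Theta(\mathcal{H}_2) \to D^X_\Theta(\mathcal{H}_2)$ is a morphism in $D_\Theta(\mathcal{H}_2)$ such that $[\tilde{s}]_{\mathcal{H}_2} : \tau_Y A_1 \to \tau_X A_2$. By Proposition 6, the morphism $[\tilde{s}]_{\mathcal{H}_2}$ is defined correctly, that is, $\tilde{s}(\tau_Y A_1) \subseteq \tau_X A_2$.

If $\mathrm{id}_{D^Y_\Theta(\mathcal{H}_1)}$ is the identity morphism of $D^Y_\Theta(\mathcal{H}_1)$ and $\mathrm{id}_{D^Y_\Theta(\mathcal{H}_2)}$ is the identity morphism of $D^Y_\Theta(\mathcal{H}_2)$, then we have the following diagram: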

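$$
\begin{array}{ccc}
A_1 & \xrightarrow{\;\mathrm{id}_{D_\Theta^Y(\mathcal{H}_1)}\;} & A_1 \\
\downarrow{\scriptstyle \tau_Y} & & \downarrow{\scriptstyle \tau_Y} \\
\tau_Y A_1 & \xrightarrow{\;\mathrm{id}_{D_\Theta^Y(\mathcal{H}_2)}\;} & \tau_Y A_1
\end{array}
$$

Thus, the property $F(\mathrm{id}_{D_\Theta^X(\mathcal{H}_1)}) = \mathrm{id}_{F(D_\Theta^X(\mathcal{H}_1))}$ holds.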


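Now we will check that

$$
F([\tilde{s}_1]_{\mathcal{H}_1} \circ [\tilde{s}_2]_{\mathcal{H}_1}) = F([\tilde{s}_1]_{\mathcal{H}_1}) \circ F([\tilde{s}_2]_{\mathcal{H}_1}),
$$

where $[\tilde{s}_1]_{\mathcal{H}_1} : D_\Theta^Y(\mathcal{H}_1) \to D_\Theta^X(\mathcal{H}_1)$ and $[\tilde{s}_2]_{\mathcal{H}_1} : D_\Theta^Z(\mathcal{H}_1) \to D_\Theta^Y(\mathcal{H}_1)$.

Let $A_3$ be a definable set in $D_\Theta^Z(\mathcal{H}_1)$. Then $[\tilde{s}_2]_{\mathcal{H}_1} : A_3 \to A_2$, $[\tilde{s}_1]_{\mathcal{H}_1} : A_2 \to A_1$, and, by the definition of a morphism in $D_\Theta(\mathcal{H})$, $\tilde{s}_2 A_3 \subseteq A_2$ and $\tilde{s}_1 A_2 \subseteq A_1$. Further, we have $[\tilde{s}_1]_{\mathcal{H}_1} \circ [\tilde{s}_2]_{\mathcal{H}_1} : A_3 \to A_1$ and

$$
F([\tilde{s}_1]_{\mathcal{H}_1} \circ [\tilde{s}_2]_{\mathcal{H}_1}) : \tau_Z A_3 \to \tau_X A_1. \tag{1.4}
$$

On the other hand, $F([\tilde{s}_2]_{\mathcal{H}_1}) : \tau_Z A_3 \to \tau_Y A_2$ and $F([\tilde{s}_1]_{\mathcal{H}_1}) : \tau_Y A_2 \to \tau_X A_1$. Hence,

$$
F([\tilde{s}_1]_{\mathcal{H}_1}) \circ F([\tilde{s}_2]_{\mathcal{H}_1}) : \tau_Z A_3 \to \tau_X A_1. \tag{1.5}
$$

Comparing Eqs. (1.4) and (1.5) we have

$$
F([\tilde{s}_1]_{\mathcal{H}_1} \circ [\tilde{s}_2]_{\mathcal{H}_1}) = F([\tilde{s}_1]_{\mathcal{H}_1}) \circ F([\tilde{s}_2]_{\mathcal{H}_1}),
$$

and $F$ is a covariant functor, since it preserves the order of composition.

One can define the functor $F' : D_\Theta(\mathcal{H}_2) \to D_\Theta(\mathcal{H}_1)$ as

$$
F'(D_\Theta^Y(\mathcal{H}_2)) = D_\Theta^Y(\mathcal{H}_1), \qquad F'([\tilde{s}]_{\mathcal{H}_2}) = [\tilde{s}]_{\mathcal{H}_1},
$$

where $[\tilde{s}]_{\mathcal{H}_2} : D_\Theta^Y(\mathcal{H}_2) \to D_\Theta^X(\mathcal{H}_2)$ is a morphism in $D_\Theta(\mathcal{H}_2)$ such that $[\tilde{s}]_{\mathcal{H}_2} : B_1 \to B_2$, $B_1 \in \mathrm{Hom}(W(Y), H_2)$, $B_2 \in \mathrm{Hom}(W(X), H_2)$, and $[\tilde{s}]_{\mathcal{H}_1} : D_\Theta^Y(\mathcal{H}_1) \to D_\Theta^X(\mathcal{H}_1)$ is a morphism in $D_\Theta(\mathcal{H}_1)$ such that $[\tilde{s}]_{\mathcal{H}_1} : \tau_Y^{-1} B_1 \to \tau_X^{-1} B_2$.

We omit the verification of the fact that the functor $F'$ satisfies the properties $F F' = \mathrm{id}_{D_\Theta(\mathcal{H}_2)}$ and $F' F = \mathrm{id}_{D_\Theta(\mathcal{H}_1)}$, where $\mathrm{id}_{D_\Theta(\mathcal{H}_2)}$ is the identity functor of $D_\Theta(\mathcal{H}_2)$ and $\mathrm{id}_{D_\Theta(\mathcal{H}_1)}$ is the identity functor of $D_\Theta(\mathcal{H}_1)$. It can be done by straightforward calculations. Thus, we can conclude that the categories $D_\Theta(\mathcal{H}_1)$ and $D_\Theta(\mathcal{H}_2)$ are isomorphic.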

Taking into account Definition 6 and Theorem 3 we formulate the main result of the paper.

Theorem 6 LG-equivalent (LG-isotypic) knowledge bases are informationally equivalent.

Proof Follows from Theorem 5.

We finish the discussion with the following conjecture, which seems quite plausible. We say that a knowledge base $KB(H, \Psi, f)$ is finite if the algebra $H$ is finite.


Conjecture 1 Two finite knowledge bases are informationally equivalent if and only if they are LG-equivalent (LG-isotypic).

Acknowledgements Research of E. Aladova was partially supported by the Israel Science Foundation, grant No. 1623/16.

References

1. Aladova, E. (2018). Syntax versus semantics in knowledge bases I. International Journal of Algebra and Computation, 28(8), 1385–1402.
2. Aladova, E., Plotkin, E., & Plotkin, T. (2013). Isotypeness of models and knowledge bases equivalence. Mathematics in Computer Science, 7(4), 421–438.
3. Aladova, E., & Plotkin, T. (2021). Logically automorphically equivalent knowledge bases models. Journal of Algebra and Its Applications, 22 (paper no. 2150148).
4. Aladova, E., & Plotkin, T. (2019). Syntax versus semantics in knowledge bases II. Contemporary Mathematics AMS, 726, 87–98.
5. Andréka, H., Németi, I., & Sain, I. (2001). Algebraic logic. In Handbook of philosophical logic (2nd ed., Vol. II, pp. 133–247). Kluwer Academic Publishers.
6. Blok, W., & Pigozzi, D. (1989). Algebraizable logics. Memoirs of the AMS, 77(396).
7. Codd, E. F. (1970). A relational model of data for large shared data banks. Communications of the ACM, 13(6), 377–387.
8. Font, J. M., Jansana, R., & Pigozzi, D. (2003). A survey of abstract algebraic logic. Abstract algebraic logic, Part II (Barcelona, 1997). Studia Logica, 74(1–2), 13–97.
9. Halmos, P. R. (1962). Algebraic logic. New York: Chelsea Publishing Co.
10. Henkin, L., Monk, J. D., & Tarski, A. (1985). Cylindric algebras. North-Holland Publ. Co., 1971.
11. Mac Lane, S. (1971). Categories for the working mathematician, graduate texts in mathematics (Vol. 5). New York-Berlin: Springer.
12. Malcev, A. I. (1973). Algebraic systems. Springer.
13. Marker, D. (2002). Model theory: An introduction. Springer.
14. Mashevitzky, G., Plotkin, B., & Plotkin, E. (2004). Automorphisms of the category of free Lie algebras. Journal of Algebra, 282(2), 490–512.
15. Németi, I. (1991). Algebraization of quantifier logics, an introductory overview. Studia Logica, 50(3–4), 485–569.
16. Plotkin, B. (1994). Universal algebra, algebraic logic and databases. Kluwer Academic Publishers.
17. Plotkin, B. (2006). Algebraic geometry in first order logic. Journal of Mathematical Sciences, 137(5), 5049–5097.
18. Plotkin, B. I. (2012). Isotyped algebras. Proceedings of the Steklov Institute of Mathematics, 287, 91–115.
19. Plotkin, B. (2019). Seven lectures on the universal algebraic geometry. Contemporary Mathematics, 726, 143–215.
20. Plotkin, B., Aladova, E., & Plotkin, E. (2013). Algebraic logic and logically-geometric types in varieties of algebras. Journal of Algebra and Its Applications, 12(2), 23 (paper no. 1250146).
21. Plotkin, B., & Plotkin, E. (2015). Multi-sorted logic and logical geometry: Some problems. Demonstratio Mathematica, 48(4), 577–618.
22. Plotkin, B., & Plotkin, T. (2008). Categories of elementary sets over algebras and categories of elementary algebraic knowledge. Lecture Notes in Computer Science, 4800, 555–570.
23. Zhitomirski, G. (2018). On types of points and algebras. International Journal of Algebra and Computation, 28(8), 1717–1730.


Elena Aladova received her Ph.D. in Mathematics from Moscow Pedagogical State University in 2004, under the supervision of Alexei N. Krasilnikov and Gennadiy A. Karasev. She was a postdoc at Bar-Ilan University, Israel, from 2007 to 2011, a visiting researcher at the Einstein Institute of Mathematics, Hebrew University of Jerusalem, Israel, from 2011 to 2012, and a researcher at the Mathematical Department of Bar-Ilan University from 2012 to 2016. She also holds a Ph.D. in Computer Science from Bar-Ilan University, completed in 2018 under the supervision of Tatjana Plotkin. Her main interests include universal algebra, algebraic logic, universal algebraic geometry, and applications of universal algebra and algebraic logic in computer science, in particular in the theories of knowledge bases and databases.

Boris Plotkin is a Russian and Israeli mathematician. His scientific career started in the early 1940s at the Ural University of Sverdlovsk, USSR. He received his Ph.D. in algebra in 1952 and became Doctor of Science in 1956, at Moscow University, for the thesis "Radical groups." His results include the construction of groups with the normalizer condition, the invention of the locally-nilpotent radical, which became known as the Hirsch-Plotkin radical, and many others. In the early 1950s, he posed the famous Engel (nil) problem, which remains open to this day. In 1960 B. Plotkin moved to Riga, Latvia, where he established the Riga Algebraic Seminar and developed a new theory called "Varieties of representations of algebraic systems." He is a Doctor honoris causa of the University of Latvia. Since 1993 B. Plotkin has been a full professor at the Hebrew University of Jerusalem, where he is now Professor Emeritus. Among the main books of B. Plotkin are "Groups of automorphisms of algebraic systems," "Varieties of representations of groups," and "Universal algebra, Algebraic Logic and Databases." He is the creator of the theory now commonly known as "Universal Algebraic Geometry." His main interests include group theory, varieties of representations of groups, algebraic theory of automata, algebraic logic, syntax and semantics in logic, and Universal Algebraic Geometry.

Tatjana Plotkin is a Russian and Israeli computer scientist. She received her Ph.D. in Computer Science in 1986 under the supervision of E. Peterson. In the late 1980s she was an Associate Professor at the Riga Civil Aviation Institute, Latvia. For the last 20 years she has been a Senior Researcher at the Computer Science Department of Bar-Ilan University, Israel. Her main areas of expertise include databases, knowledge bases, artificial intelligence, and applications of algebra and logic in computer science.

Chapter 2

Guarded Ontology-Mediated Queries Pablo Barceló, Gerald Berger, Georg Gottlob, and Andreas Pieris

Abstract We concentrate on ontology-mediated queries (OMQs) expressed using guarded Datalog∃ and conjunctive queries. Guarded Datalog∃ is a rule-based knowledge representation formalism inspired by the guarded fragment of first-order logic, while conjunctive queries represent a prominent database query language that lies at the core of relational calculus (i.e., first-order queries). For such guarded OMQs we discuss three main algorithmic tasks: query evaluation, query containment, and first-order rewritability. The first one is the task of computing the answer to an OMQ over an input database. The second one is the task of checking whether the answer to an OMQ is contained in the answer of some other OMQ on every input database. The third one asks whether an OMQ can be equivalently rewritten as a first-order query. For query evaluation, we explain how classical results on the satisfiability problem for the guarded fragment of first-order logic can be applied. For query containment, we discuss how tree automata techniques can be used. Finally, for first-order rewritability, we explain how techniques based on a more sophisticated automata model, known as cost automata, can be exploited.

P. Barceló
Institute for Mathematical and Computational Engineering, Pontificia Universidad Católica de Chile & IMFD Chile, Santiago, Chile

G. Berger · G. Gottlob
Institute of Logic and Computation, TU Wien, Vienna, Austria

G. Gottlob
Department of Computer Science, University of Oxford, Oxford, UK

A. Pieris (corresponding author)
School of Informatics, University of Edinburgh, Edinburgh, UK

© Springer Nature Switzerland AG 2021
J. Madarász and G. Székely (eds.), Hajnal Andréka and István Németi on Unity of Science, Outstanding Contributions to Logic 19, https://doi.org/10.1007/978-3-030-64187-0_2



Keywords Ontology-mediated queries · Tuple-generating dependencies · Guardedness · Conjunctive queries · Query evaluation · Query containment · First-order rewritability

2.1 Introduction

The novel application of knowledge representation tools for handling incomplete and heterogeneous data is giving rise to a new field, recently coined knowledge-enriched data management [3]. A crucial problem in this field is ontology-based data access (OBDA) [31], which refers to the use of ontologies (i.e., sets of logical sentences) for providing a unified conceptual view of various data sources. Users can then pose their queries solely in the schema provided by the ontology, abstracting away from the specifics of the individual sources. In OBDA, the ontology $O$ and the user query $q$, which is typically a conjunctive query, can be seen as two components of one composite query $Q = (S, O, q)$, dubbed ontology-mediated query (OMQ); $S$ is the data schema, indicating that $Q$ will be posed on databases over $S$ [14]. The main tasks of special interest in this setting are as follows:

Query Evaluation. Given an OMQ $Q = (S, O, q)$, a database $D$ over the schema $S$, and a candidate answer $\bar{a}$, which is a tuple of constants from the domain of $D$, the question is whether $\bar{a}$ belongs to the evaluation of $q$ over every extension of $D$ that satisfies $O$. In other words, is $\bar{a}$ a certain answer for $Q$ over $D$? The set of certain answers for $Q$ over $D$ is denoted $Q(D)$.

Query Containment. Given two OMQs $Q_1$ and $Q_2$, both with data schema $S$, the question is whether $Q_1$ is contained in $Q_2$, i.e., $Q_1(D) \subseteq Q_2(D)$ for every database $D$ over $S$. This is a crucial static analysis task (i.e., no database is involved) with applications in query optimization. Whenever we try to optimize an OMQ $Q$ and get some other OMQ $Q'$ that is easier to evaluate, we have to ensure that $Q$ and $Q'$ are equivalent, i.e., they return the same answer over all databases. This boils down to checking that $Q$ is contained in $Q'$ and vice versa.

First-Order Rewritability. Given an OMQ $Q$ with data schema $S$, the question is whether $Q$ can be rewritten as a first-order query $\varphi_Q$ that returns the same answer on every input database over the schema $S$. This is another important static analysis task, which allows us to check whether an OMQ can be executed via standard database management systems (DBMSs), which are highly optimized for evaluating first-order queries. Notice that standard DBMSs are unaware of ontologies, and thus we cannot blindly pass an OMQ to such systems.


2.1.1 Rule-Based Ontology-Mediated Queries

While in the OMQ setting described above description logics (DLs) are often used for modeling ontologies, it is widely accepted that for handling relations of arbitrary arity in relational databases it is convenient to use Datalog∃ rules, a.k.a. tuple-generating dependencies and existential rules, of the form

$$
\forall \bar{x}\, \forall \bar{y}\, \big(\varphi(\bar{x}, \bar{y}) \to \exists \bar{z}\, \psi(\bar{x}, \bar{z})\big),
$$

where $\varphi$ and $\psi$ are conjunctions of atoms. Notice that such rules extend plain Datalog rules with the feature of existential quantification (hence the name Datalog∃), which is crucial for knowledge representation purposes as it allows us to invent new objects that are not already present in the input database.

Unfortunately, the use of Datalog∃ as an ontology language, without posing any additional syntactic restrictions, leads to the undecidability of all the algorithmic tasks for OMQs described above. The undecidability of query evaluation is implicit in [9], where the implication problem for database dependencies is studied. A stronger result of this kind can be found in [16], where it is shown that query evaluation is undecidable even when the ontology and the conjunctive query are fixed. The undecidability of query containment is immediately inherited from the fact that query containment for plain Datalog queries (without existential quantification) is undecidable [35]. Finally, the undecidability of first-order rewritability is again inherited from the fact that the same problem for plain Datalog queries is undecidable. In fact, we know that a Datalog query is first-order rewritable iff it has bounded recursion [1], while the problem of checking whether a Datalog query has bounded recursion is undecidable [24].

The above negative results led to a flurry of activity during the last decade for identifying syntactic restrictions on Datalog∃ rules that make the algorithmic tasks in question, especially query evaluation, decidable. Several decidable fragments have been proposed in the literature based on different syntactic restrictions; see, e.g., [4, 19, 30] (the list is by no means exhaustive).

2.1.2 Guardedness to the Rescue

A prime example of a well-behaved formalism is guarded Datalog∃ [16], which has been inspired by the guarded fragment of first-order logic (GFO) introduced by Andréka, Németi, and van Benthem in [2]. A Datalog∃ rule is guarded if the left-hand side of the implication has an atom, called a guard, that contains all the universally quantified variables. Guarded Datalog∃ is a member of a broader family of knowledge representation formalisms, known as Datalog± [18]. Datalog± languages extend Datalog with useful features such as existential quantification, equality, negation, etc., and at the same time restrict the syntax (e.g., via guardedness) in order to guarantee decidability of the main tasks. Hence, the symbol '+' refers to the additional features, while the symbol '−' refers to the syntactic restrictions.

Just as the robustness and the nice algorithmic properties of GFO can be attributed to the tree model property [2] (i.e., satisfiable GFO-sentences always have a "tree-like" model, that is, a model of bounded tree-width), the reason why query evaluation for guarded OMQs, i.e., OMQs where the ontology consists of guarded Datalog∃ rules, is decidable is that we can focus on "tree-like" universal models; for more details see [16]. In fact, guarded Datalog∃ (plus guarded denial constraints) is essentially a normal form for the Horn fragment of GFO (cf. [29] for more details on well-behaved fragments of GFO regarding query evaluation). It is worth mentioning that the tree model property of the guarded fragment renders decidable even the problem of query evaluation for OMQs where the ontology is a GFO-sentence [5], while this fails for other decidable fragments of first-order logic, most notably the two-variable fragment [32].

The goal of this chapter is to discuss the algorithmic tasks introduced above for guarded OMQs, and explain how worst-case optimal complexity results can be established.

Roadmap. After giving some basic preliminaries in Sect. 2.2, we focus in Sect. 2.3 on query evaluation, and explain how classical results on the satisfiability problem for GFO can be applied in order to show that the problem in question is in 2ExpTime. This is done by relying on a technique known as treeification introduced in [5]. In Sect. 2.4, we concentrate on query containment, and we discuss how tree automata techniques can be used in order to establish a 2ExpTime upper bound. This section is based on recent results on query containment from [8]. Then, in Sect. 2.5, we consider first-order rewritability, and explain how techniques based on a more sophisticated automata model, known as cost automata, can be exploited in order to obtain a 2ExpTime upper bound. This section is based on recent results on first-order rewritability from [7]. Finally, in Sect. 2.6, we discuss the above three algorithmic problems for guarded OMQs through the lens of finite models, i.e., when the evaluation of an OMQ $Q = (S, O, q)$ over a database $D$ is defined by considering only the finite extensions of $D$ that satisfy $O$, denoted $Q_{\mathrm{fin}}(D)$. We present a deep result, which is implicit in [5], that establishes the following: for every guarded OMQ $Q$ with data schema $S$, and every database $D$ over $S$, $Q(D) = Q_{\mathrm{fin}}(D)$. The latter immediately implies that containment and first-order rewritability for guarded OMQs are invariant with respect to whether we consider all the extensions of the database that satisfy the ontology, or only the finite ones.
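The guardedness condition described at the start of this subsection is purely syntactic, so it can be checked directly on the body of a rule. The following is a minimal sketch, not taken from the chapter: it assumes that a rule body is encoded as a list of `(relation, args)` pairs whose arguments are variable names, and the helper name `is_guarded` is hypothetical.

```python
def body_variables(body):
    """Collect all variables occurring in the body atoms of a rule."""
    return {v for _, args in body for v in args}


def is_guarded(body):
    """Return True if some body atom (a guard) contains every body variable.

    `body` is a list of (relation_name, args) pairs; this encoding is an
    assumption made for the sketch, not notation from the chapter.
    """
    needed = body_variables(body)
    return any(needed <= set(args) for _, args in body)


# P(x) -> exists y (F(y, x) & P(y)): the single body atom is its own guard.
print(is_guarded([("P", ("x",))]))
# R(x, y), S(y, z) -> ...: no body atom mentions x, y and z together.
print(is_guarded([("R", ("x", "y")), ("S", ("y", "z"))]))
```

Only the body matters for the check: existentially quantified head variables play no role in deciding whether a rule is guarded.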

2.2 Preliminaries

Basics. Let C, N, and V be disjoint, countably infinite sets of constants, (labeled) nulls, and variables, respectively. We adopt the unique name assumption, i.e., different constants represent different values. A schema $S$ is a finite set of relation symbols, each having an associated (non-negative) arity. The width of $S$, denoted $\mathrm{wd}(S)$, is the maximum arity among all relation symbols of $S$. We write $R/n$ to indicate that the relation symbol $R$ has arity $n \ge 0$. A term is either a constant, a null, or a variable. An atom over $S$ (or simply $S$-atom) is an expression of the form $R(t_1, \ldots, t_n)$, where $R \in S$ is of arity $n$ and $t_1, \ldots, t_n$ are terms. An $S$-fact is an $S$-atom whose arguments consist of constants only.

Databases. An instance over a schema $S$ (or simply $S$-instance) is a (possibly infinite) set of $S$-atoms that contain constants and nulls as arguments only. A database over $S$ (or simply $S$-database) is a finite set of $S$-facts, i.e., a finite $S$-instance that contains constants only. The active domain of an instance $J$, denoted $\mathrm{adom}(J)$, consists of all terms occurring in $J$.

A tree decomposition for an instance $J$ is a tuple $\delta = (\mathcal{T}, (X_v)_{v \in T})$, where $\mathcal{T}$ is a rooted tree whose set of nodes is $T$, and $(X_v)_{v \in T}$ is a collection of subsets of $\mathrm{adom}(J)$, called bags, such that:

1. If $R(t_1, \ldots, t_n) \in J$, then there exists a $v \in T$ such that $\{t_1, \ldots, t_n\} \subseteq X_v$.
2. For all $a \in \mathrm{adom}(J)$, the set $\{v \in T \mid a \in X_v\}$ induces a connected subtree of $\mathcal{T}$.

The tree decomposition $\delta$ is called guarded if, for every node $v \in T$, there exists an atom $R(t_1, \ldots, t_n) \in J$ such that $X_v \subseteq \{t_1, \ldots, t_n\}$. An instance is acyclic if it admits a guarded tree decomposition.

Conjunctive Queries. A conjunctive query (CQ) over $S$ is a first-order formula

$$
q(\bar{x}) := \exists \bar{y}\, \big(\alpha_1(\bar{v}_1) \wedge \cdots \wedge \alpha_m(\bar{v}_m)\big),
$$

where $\bar{x}$ and $\bar{y}$ are sequences of variables, each $\alpha_i(\bar{v}_i)$ is an $S$-atom that mentions only variables from $\bar{v}_i$, or an equality atom of the form $v_{i1} = v_{i2}$ with $v_{i1}, v_{i2} \in \bar{v}_i$, and $\bar{v}_i \subseteq \bar{x} \cup \bar{y}$, for each $i \in \{1, \ldots, m\}$.¹ The variables $\bar{x}$ are the answer variables of $q(\bar{x})$. If $\bar{x}$ is empty, then $q$ is a Boolean conjunctive query (BCQ). We shall denote by $\mathrm{var}(q)$ the variables that occur in $q(\bar{x})$.

The evaluation of CQs over instances is defined in terms of homomorphisms. A homomorphism from $q$ to an $S$-instance $J$ is a mapping $h : \mathrm{var}(q) \to \mathrm{adom}(J)$ such that, for each $i \in \{1, \ldots, m\}$, (i) $\alpha_i(h(\bar{v}_i)) \in J$ if $\alpha_i(\bar{v}_i)$ is an $S$-atom, and (ii) $h(v_{i1}) = h(v_{i2})$ if $\alpha_i(\bar{v}_i)$ is the equality atom $v_{i1} = v_{i2}$. We write $J \models q(\bar{a})$ to indicate that there is a homomorphism $h$ from $q$ to $J$ such that $h(\bar{x}) = \bar{a}$; in case $q$ is Boolean, we write $J \models q$. We also write $q(J)$ for the set of tuples $\bar{a} \in \mathrm{adom}(J)^{|\bar{x}|}$ such that $J \models q(\bar{a})$. Let CQ (resp., BCQ) be the class of all CQs (resp., BCQs).

Tuple-Generating Dependencies. A tuple-generating dependency (TGD)² is a first-order sentence of the form

$$
\tau : \forall \bar{x}\, \forall \bar{y}\, \big(\varphi(\bar{x}, \bar{y}) \to \exists \bar{z}\, \psi(\bar{x}, \bar{z})\big),
$$

¹ For technical clarity, we assume that CQs do not mention constants from C. However, all the results that we discuss can be extended to CQs with constants.
² For brevity, in the rest of the chapter we adopt the acronym TGD instead of the term Datalog∃.


where $\varphi$ and $\psi$ are conjunctions of atoms that mention only variables, called the body and head of $\tau$, respectively. For brevity, we shall omit the preceding universal quantifiers and use a comma instead of $\wedge$ for joining atoms. We assume that each variable of $\bar{x}$ is mentioned in at least one atom of $\psi$. The TGD $\tau$ is logically equivalent to the sentence

$$
\forall \bar{x}\, \big(q_\varphi(\bar{x}) \to q_\psi(\bar{x})\big),
$$

where $q_\varphi(\bar{x})$ and $q_\psi(\bar{x})$ are the CQs $\exists \bar{y}\, \varphi(\bar{x}, \bar{y})$ and $\exists \bar{z}\, \psi(\bar{x}, \bar{z})$, respectively. An instance $J$ satisfies $\tau$ if $q_\varphi(J) \subseteq q_\psi(J)$, while $J$ satisfies a set of TGDs $O$, denoted $J \models O$, if $J \models \tau'$ for every $\tau' \in O$. Let TGD be the class of finite sets of TGDs.

A prominent class, which is of special interest for this chapter, is the class of guarded TGDs. A TGD $\tau$ is guarded if it has an atom in its body, called a guard, that contains all the body variables. Let G be the class of finite sets of guarded TGDs.

Ontology-Mediated Queries. An ontology-mediated query (OMQ) is a triple of the form $Q = (S, O, q(\bar{x}))$, where $S$ is a schema (the data schema of $Q$), $O$ is a finite set of TGDs (the ontology), and $q(\bar{x})$ is a CQ over $S \cup \mathrm{sig}(O)$, with $\mathrm{sig}(O)$ denoting the set of all relation symbols in $O$. We include the data schema $S$ in the specification of $Q$ in order to emphasize that $Q$ is evaluated over $S$-databases, even though $O$ and $q(\bar{x})$ may use additional relation symbols. In fact, $O$ can introduce relations that are not present in $S$, which in turn allows us to enrich the schema of $q(\bar{x})$.

The semantics of $Q$ is given in terms of certain answers. Let $D$ be an $S$-database. The certain answers to $q(\bar{x})$ w.r.t. $D$ and $O$ are all tuples $\bar{a}$ of constants such that $(D, O) \models q(\bar{a})$, that is, for every instance $J \supseteq D$ that satisfies $O$ it holds that $J \models q(\bar{a})$. We write $D \models Q(\bar{a})$ to indicate that $\bar{a}$ is a certain answer to $q(\bar{x})$ w.r.t. $D$ and $O$; in case $q$ is Boolean, we write $D \models Q$. Moreover, we write $Q(D)$ for the set of all $\bar{a}$ such that $D \models Q(\bar{a})$. Hence, $Q$ can be seen as a function that maps $S$-databases to sets of tuples over C.

We write $(\mathbb{O}, \mathbb{Q})$ for the class of OMQs $(S, O, q(\bar{x}))$, where $O$ falls in the class $\mathbb{O} \subseteq$ TGD, and $q(\bar{x})$ in the class $\mathbb{Q} \subseteq$ CQ. We call the pair $(\mathbb{O}, \mathbb{Q})$ an ontology-mediated query language. In this chapter, we mainly deal with the OMQ language (G, CQ), which collects all the OMQs where the ontology consists of guarded TGDs.

Example 2.1 Let $Q = (S, O, q(x)) \in$ (G, CQ), where $S = \{P/1, F/2\}$, and

$$
\begin{aligned}
O &:= \{P(x) \to \exists y\, (F(y, x) \wedge P(y))\}, \\
q(x) &:= \exists y\, \exists z\, (P(x) \wedge F(y, x) \wedge F(x, z)).
\end{aligned}
$$

Consider the $S$-database $D := \{P(a), F(a, b), P(b)\}$. It is easy to verify that $Q(D) = \{a\}$. Intuitively, the ontology $O$ states that every person has a father who is himself a person. The query $q(x)$ asks for all persons $x$ that have a father, and that are fathers themselves. Notice that the database $D$ does not explicitly store the fact that $a$ has a father. This is implicit information expressed by the ontology $O$.
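Example 2.1 can be checked mechanically. The sketch below is an illustration under stated assumptions rather than the chapter's method: it uses a bounded oblivious chase, a standard technique that this section does not spell out, hard-codes the single rule of $O$, and evaluates $q$ by brute-force search for homomorphisms. The names `chase`, `homomorphisms` and `certain_answers`, the encoding of facts as `(relation, args)` pairs, and the null names `_n0, _n1, ...` are all hypothetical. A finite chase prefix suffices for this example; in general no fixed number of rounds is enough.

```python
from itertools import count, product


def chase(db, rounds):
    """Bounded oblivious chase for the rule P(x) -> exists y (F(y, x) & P(y))."""
    facts, fired, fresh = set(db), set(), count()
    for _ in range(rounds):
        # Every P-term that has not triggered the rule yet gets a fresh null father.
        triggers = sorted(args[0] for rel, args in facts
                          if rel == "P" and args[0] not in fired)
        for x in triggers:
            fired.add(x)
            null = f"_n{next(fresh)}"
            facts |= {("F", (null, x)), ("P", (null,))}
    return facts


def homomorphisms(atoms, facts):
    """Yield each mapping h of query variables to terms with h(atom) in facts."""
    variables = sorted({v for _, args in atoms for v in args})
    domain = sorted({t for _, args in facts for t in args})
    for values in product(domain, repeat=len(variables)):
        h = dict(zip(variables, values))
        if all((rel, tuple(h[v] for v in args)) in facts for rel, args in atoms):
            yield h


def certain_answers(db, rounds):
    """Evaluate q(x) = exists y, z (P(x) & F(y, x) & F(x, z)) on a chase prefix."""
    constants = {t for _, args in db for t in args}
    q = [("P", ("x",)), ("F", ("y", "x")), ("F", ("x", "z"))]
    # Only tuples of constants count as answers; nulls are filtered out.
    return {h["x"] for h in homomorphisms(q, chase(db, rounds))
            if h["x"] in constants}


D = {("P", ("a",)), ("F", ("a", "b")), ("P", ("b",))}
print(certain_answers(D, 3))  # {'a'}
print(certain_answers(D, 0))  # set(): without the ontology, a has no father in D
```

The second call shows why the ontology matters: evaluated over $D$ alone, the query returns nothing, since the father of $a$ only appears as a null introduced by the rule.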


2.3 Query Evaluation

As already discussed in Sect. 2.1, one of the most important tasks for an ontology-mediated query language is query evaluation, which is defined as follows:

PROBLEM: Eval($\mathbb{O}$, $\mathbb{Q}$)
INPUT: An OMQ $Q = (S, O, q(\bar{x})) \in (\mathbb{O}, \mathbb{Q})$, an $S$-database $D$, and a tuple $\bar{a} \in \mathrm{adom}(D)^{|\bar{x}|}$.
QUESTION: Is it the case that $\bar{a} \in Q(D)$?

It is well-known that Eval(TGD, CQ) is undecidable; this is implicit in [9]. However, if we focus on (G, CQ), then the problem becomes decidable. In fact:

Theorem 2.1 Eval(G, CQ) is 2ExpTime-complete.

The above result has been explicitly established in [16]: the upper bound is shown via a sophisticated alternating algorithm that uses exponential space, while the lower bound is shown by simulating the behavior of an alternating exponential space Turing machine. An alternative way to establish the 2ExpTime upper bound of Theorem 2.1 is by a reduction to the satisfiability problem of the guarded fragment of first-order logic, which in turn exploits a technique known as treeification [5]. In what follows, we discuss the latter approach, which relies on the fact that satisfiability for the guarded fragment of first-order logic is feasible in double exponential time, as shown in [27]. The goal is to reduce Eval(G, CQ) to the problem of deciding whether a guarded sentence is unsatisfiable.

Recall that the guarded fragment of first-order logic, introduced by Andréka, Németi and van Benthem in [2], is a collection of first-order formulas with some syntactic restrictions on quantification patterns, which is analogous to the relativized nature of modal logic. We write GF[T], where $T$ is a schema, for the smallest set of formulas that (i) contains all $T$-atoms (without constants and nulls) and equalities among variables; (ii) is closed under $\neg$, $\wedge$, $\vee$, $\to$; and (iii) if $\alpha$ is an atom containing all the variables of $\bar{x} \cup \bar{y}$, and $\varphi \in$ GF[T] with free variables in $\bar{x} \cup \bar{y}$, then $\forall \bar{x}\, (\alpha \to \varphi)$ and $\exists \bar{x}\, (\alpha \wedge \varphi)$ belong to GF[T] as well; $\alpha$ is the guard of the quantifier. It has been shown in [2] that satisfiability for guarded sentences is decidable, while Grädel proved in [27] that it is actually 2ExpTime-complete.

2.3.1 From Eval(G, CQ) to Satisfiability for the Guarded Fragment

As said above, the goal is to reduce Eval(G, CQ) to the problem of deciding whether a guarded sentence is unsatisfiable. Consider an instance of Eval(G, CQ), i.e., an OMQ $Q = (S, O, q(\bar{x})) \in$ (G, CQ), an $S$-database $D$, and a tuple of constants $\bar{a} \in \mathrm{adom}(D)^{|\bar{x}|}$. We define the first-order sentence

$$
\varphi_{Q,D,\bar{a}} := \bigwedge_{\alpha \in D} \alpha \;\wedge\; \bigwedge_{\tau \in O} \tau \;\wedge\; \neg q(\bar{a}),
$$

where $q(\bar{a})$ is the sentence obtained from $q(\bar{x})$ after instantiating the variables $\bar{x}$ with the constants $\bar{a}$. It is easy to verify that

$$
\bar{a} \in Q(D) \iff \varphi_{Q,D,\bar{a}} \text{ is unsatisfiable.}
$$

However, $\varphi_{Q,D,\bar{a}}$ does not directly fall in a well-behaved fragment, and, in particular, in the guarded fragment of first-order logic, mainly due to the unguarded sentence $q(\bar{a})$. Notice that, strictly speaking, also $\bigwedge_{\tau \in O} \tau$ is unguarded despite the fact that $O$ consists of guarded TGDs. Nevertheless, a guarded TGD can be easily converted in polynomial time into an equisatisfiable guarded sentence (see, e.g., [5]) and thus we assume, w.l.o.g., that $\bigwedge_{\tau \in O} \tau$ falls in the guarded fragment. The goal now is to transform $\varphi_{Q,D,\bar{a}}$ into an equisatisfiable guarded sentence $\psi_{Q,D,\bar{a}}$. This is done by exploiting the technique of treeification [5]. Let us first recall this technique, and then explain how we use it in order to construct the desired sentence $\psi_{Q,D,\bar{a}}$.

Treeification

Every BCQ $q$ over a schema $S$ can be naturally associated with an $S$-database, known as its canonical database, which consists of the atoms of $q$ after converting each variable $x \in \mathrm{var}(q)$ into a constant $a_x$. We say that $q$ is acyclic if its canonical database is acyclic, i.e., it admits a guarded tree decomposition. Consider now a BCQ $q$ over a schema $S$, and a schema $T \supseteq S$. The $T$-treeification of $q$ is defined as the set $\Lambda^T_q$ of all acyclic CQs $q'$ over $T$ of size at most three times the number of atoms occurring in $q$, such that $q'$ is contained in $q$, i.e., $D \models q'$ implies $D \models q$ for every $T$-database $D$. By abuse of notation, we may write $\Lambda^T_q$ for the union (or disjunction) of its BCQs, i.e., the sentence $\bigvee_{q' \in \Lambda^T_q} q'$.

The main property of treeification is that it preserves satisfiability over acyclic models.³ Since every satisfiable guarded sentence admits an acyclic model, we get the next useful result from [5]; we write $\varphi \models \bot$ to denote the fact that $\varphi$ is unsatisfiable:

Lemma 2.1 Consider a sentence $\varphi \in$ GF[T], and a BCQ $q$ over $T$. It holds that

$$
\varphi \wedge \neg q \models \bot \iff \varphi \wedge \neg \Lambda^T_q \models \bot.
$$

Before we proceed further, let us clarify that an acyclic BCQ does not directly fall in the guarded fragment. However, it is known that every acyclic BCQ can be equivalently rewritten as a guarded sentence [26]. Therefore, in what follows, we assume, w.l.o.g., that the sentence obtained after the treeification of a BCQ is guarded.
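Membership in a treeification requires testing whether a BCQ is acyclic. The following is a minimal sketch of one way to do this, assuming the standard characterization that a set of atoms admits a guarded tree decomposition exactly when its hypergraph is α-acyclic; the test used is the classical GYO reduction, which the chapter itself does not mention. The function name `is_acyclic` and the encoding of atoms as `(relation, args)` pairs are hypothetical.

```python
def is_acyclic(atoms):
    """GYO reduction: True if the atoms form an alpha-acyclic hypergraph.

    `atoms` is a list of (relation_name, args) pairs; each atom contributes
    the hyperedge consisting of its terms (an encoding assumed for this sketch).
    """
    edges = [set(args) for _, args in atoms]
    changed = True
    while changed:
        changed = False
        # Rule 1: drop every term that occurs in exactly one hyperedge.
        for edge in edges:
            for term in list(edge):
                if sum(term in other for other in edges) == 1:
                    edge.discard(term)
                    changed = True
        # Rule 2: drop one hyperedge that is contained in another hyperedge.
        for i, edge in enumerate(edges):
            if any(i != j and edge <= other for j, other in enumerate(edges)):
                del edges[i]
                changed = True
                break
    # Acyclic iff the reduction leaves at most one (necessarily empty) hyperedge.
    return len(edges) <= 1


path = [("F", ("x", "y")), ("F", ("y", "z"))]
triangle = [("E", ("x", "y")), ("E", ("y", "z")), ("E", ("z", "x"))]
print(is_acyclic(path))                                   # True
print(is_acyclic(triangle))                               # False
print(is_acyclic(triangle + [("T", ("x", "y", "z"))]))    # True
```

The last call illustrates the role of guards: the triangle on its own is cyclic, but adding an atom that covers all three variables makes the query acyclic.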

³ Here, we see models as sets of atoms, i.e., as instances.


The Final Construction Recall that we consider an instance of Eval(G, CQ) consisting of the OMQ Q = (S, O, q(x)) ¯ ∈ (G, CQ), the S-database D, and the tuple of constants a¯ ∈ adom(D)|x|¯ . At this point, one maybe tempted to think that to convert the sentence ϕ Q,D,a¯ into an equisatisfiable guarded sentence ψ Q,D,a¯ , we simply need to treeify the BCQ q(a). ¯ However, before doing this, we first need to properly eliminate the ¯ in order to guarantee that after applying constants that occur in α∈D α and q(a) treeification the result will be an equisatisfiable guarded sentence. To this end, we first convert in polynomial time ϕ Q,D,a¯ into a convenient equisatisfiable sentence ϕ Q,D,a¯ , and then apply the treefication technique. Assume that adom(D) = {b1 , . . . , bk }. Let D+ := D ∪ {Cb (b)}b∈{b1 ,...,bk } , where Cb1 , . . . , Cbk are fresh unary relation symbols not in S ∪ sig(O). Let also D+ var be the set of atoms obtained from D+ by replacing each occurrence of a constant b ∈ ¯ by adom(D) with the variable xb . Finally, let q + be the BCQ obtained from q(a) replacing each occurrence of a constant a with a fresh existentially quantified variable ya , and adding the atom Ca (ya ). We now define ϕ Q,D,a¯ as the sentence ⎛



∃xb1 , · · · ∃xbk ⎝C(xb1 , . . . , xbk ) ∧

1≤i< j≤k







xbi = xb j ∧

⎞ α⎠ ∧

 τ ∈O

α∈D+ var

Ξ

τ ∧ ¬q + ,

where C is a fresh k-ary relation symbol. It should be clear that ϕ_{Q,D,ā} and ϕ′_{Q,D,ā} are equisatisfiable sentences. It is also clear that Ξ is a guarded sentence. Thus, by Lemma 2.1, we immediately get that

Proposition 2.1 ϕ_{Q,D,ā} and Ξ ∧ Λ^T_{q+}, where T = S ∪ {C} ∪ {C_{bi}}_{1≤i≤k} ∪ sig(O), are equisatisfiable sentences.

The above result implies that ā ∈ Q(D) iff the guarded sentence Ξ ∧ Λ^T_{q+} is unsatisfiable. Thus, we get a reduction from Eval(G, CQ) to the unsatisfiability problem for guarded sentences. However, it should not be overlooked that this reduction takes exponential time due to treeification. In fact, we know from [5] that

\[
|\Lambda^{T}_{q^{+}}| \;\le\; |T|^{O(|q^{+}|)} \cdot \big(|q^{+}| \cdot \mathrm{wd}(T)\big)^{O(|q^{+}| \cdot \mathrm{wd}(T))},
\]

where |q+| is the size of q+, while Λ^T_{q+} can be constructed in time

\[
|q^{+}| \cdot |T|^{O(|q^{+}|)} \cdot \big(|q^{+}| \cdot \mathrm{wd}(T)\big)^{O(|q^{+}| \cdot \mathrm{wd}(T))}.
\]

Nevertheless, since the reduction provided by Proposition 2.1 increases the arity of the schema only polynomially, while the algorithm for checking whether a guarded sentence is unsatisfiable given in [27] is double exponential only in the arity of the underlying schema, we conclude that Eval(G, CQ) is in 2ExpTime, as needed.
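
The constant-elimination step of the final construction (building D+, D+_var and q+) can be sketched in a few lines. This is only an illustration under stated assumptions: atoms are modelled as a hypothetical `Atom` record, a term counts as a constant exactly when it belongs to adom(D), query variables are assumed not to clash with constant names, and the naming schemes `C_b`, `x_b`, `y_a` are chosen here for readability. The function name `eliminate_constants` is not taken from the text.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    """A relational atom: relation name plus a tuple of terms (plain strings)."""
    rel: str
    args: tuple


def eliminate_constants(database, query_atoms):
    """Return (D+, D+_var, q+) for a database D and the atoms of the BCQ q(a-bar).

    Assumption: a term is a constant iff it occurs in adom(D).
    """
    adom = sorted({t for atom in database for t in atom.args})
    # D+ adds one fresh unary fact C_b(b) for every constant b of adom(D).
    d_plus = set(database) | {Atom(f"C_{b}", (b,)) for b in adom}
    # D+_var replaces every occurrence of a constant b by the variable x_b.
    d_plus_var = {Atom(a.rel, tuple(f"x_{t}" for t in a.args)) for a in d_plus}
    # q+ replaces each constant a by a fresh variable y_a and adds C_a(y_a).
    q_consts = sorted({t for a in query_atoms for t in a.args if t in adom})
    rename = {a: f"y_{a}" for a in q_consts}
    q_plus = [Atom(a.rel, tuple(rename.get(t, t) for t in a.args))
              for a in query_atoms]
    q_plus += [Atom(f"C_{a}", (rename[a],)) for a in q_consts]
    return d_plus, d_plus_var, q_plus
```

For instance, with D = {E(a, b)} and the instantiated query consisting of the single atom E(a, z), the sketch produces D+ with the extra facts C_a(a) and C_b(b), the variable version of D+ over x_a and x_b, and q+ with atoms E(y_a, z) and C_a(y_a).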


P. Barceló et al.

Interestingly, query answering against the entire guarded fragment can be shown to be decidable in 2ExpTime with the same construction as given above (cf. [5]). Remark The complexity stated in Theorem 2.1 refers to the combined complexity of query evaluation, i.e., when all the components are part of the input. However, in practice, it is realistic to assume that the OMQ is fixed, and only the database and the tuple of constants are part of the input. In this case, we refer to the data complexity of Eval(G, CQ), which is known to be PTime-complete [17]. Another relevant setting is when the arity of the underlying schema is bounded by an integer. In this case, Eval(G, CQ) is ExpTime-complete [16]. The machinery described above, which exploits the guarded fragment of first-order logic, is not well-suited for obtaining optimal results in the above settings, and more refined techniques are needed. We refer the interested reader to the relevant literature for details.

2.4 Query Containment

We now focus on another important algorithmic task for a query language, namely query containment. Consider two OMQs Q1 = (S, O1, q1(x̄)) and Q2 = (S, O2, q2(x̄)). We say that Q1 is contained in Q2, written Q1 ⊆ Q2, if Q1(D) ⊆ Q2(D) for every S-database D. The main problem that we study in this section is defined as follows:

PROBLEM : Cont(O, Q)
INPUT : Two OMQs Q1, Q2 ∈ (O, Q) with the same data schema.
QUESTION : Is it the case that Q1 ⊆ Q2?

It is well-known that Cont(TGD, CQ) is undecidable. This is immediately inherited from the fact that query containment for Datalog queries is undecidable [35], since a Datalog query can be seen as an OMQ that falls in the language (F, CQ), where F denotes the class of full TGDs, i.e., TGDs without existentially quantified variables in the head. However, for (G, CQ) the problem becomes decidable. In fact: Theorem 2.2 Cont(G, CQ) is 2ExpTime-complete. The lower bound holds even if S ∪ sig(O1 ) ∪ sig(O2 ) consists of unary and binary relation symbols only. The lower bound for Cont(G, CQ) is immediately inherited from [12], where it is shown that the containment problem for OMQs where the ontology is formulated using the description logic ELI is 2ExpTime-hard. Since a set of ELI axioms can be equivalently rewritten, for query answering purposes, as a set of guarded TGDs, the 2ExpTime lower bound follows. It remains to establish the upper bound. In the rest of the section, we give some details on how this upper bound can be shown.

2 Guarded Ontology-Mediated Queries


2.4.1 Atomic Queries

We first focus our attention on the simpler OMQ language (G, AQ0), where AQ0 denotes the class of all CQs that consist of a single atom R(), where R is a 0-ary relation symbol, and show that query containment is in 2ExpTime. For brevity, we will simply write R instead of R(). Having this result in place, we are then going to explain how it can be extended to the language (G, CQ) by exploiting the technique of treeification discussed in the previous section. We proceed to show that:

Theorem 2.3 Cont(G, AQ0) is in 2ExpTime.

To establish the above result, we first show the so-called acyclic witness property, which states that non-containment for (G, AQ0) is witnessed via an acyclic database, i.e., a database that admits a guarded tree decomposition. This in turn allows us to devise a decision procedure for Cont(G, AQ0) based on alternating tree automata that runs in 2ExpTime. Summing up, the proof for the 2ExpTime membership of Cont(G, AQ0) proceeds in three main steps:

1. Establish the acyclic witness property.
2. Encode the acyclic witnesses as trees that can be accepted by an alternating tree automaton.
3. Construct an automaton that decides Cont(G, AQ0); in fact, we reduce our problem to emptiness for two-way alternating parity automata on finite trees.

Each of the above three steps is discussed in more detail below.

Acyclic Witness Property We proceed to show that non-containment for (G, AQ0) is witnessed via an acyclic database. We write Q1 ⊈ Q2 to denote the fact that the OMQ Q1 is not contained in the OMQ Q2, or, equivalently, that there exists an S-database D, where S is the data schema of Q1 and Q2, such that Q1(D) ⊈ Q2(D).

Proposition 2.2 Suppose that Q1 and Q2 are OMQs from (G, AQ0) with data schema S. The following are equivalent:

1. Q1 ⊈ Q2.
2. There exists an acyclic S-database D such that Q1(D) ⊈ Q2(D).

Proof The fact that (2) ⇒ (1) is trivial.
For the other direction, we need an auxiliary result, which is shown using the notion of guarded unraveling (see, e.g., [2]) and the compactness theorem. Given two databases D1 and D2, a function h : adom(D1) → adom(D2) is a homomorphism from D1 to D2 if, for each R(ā) ∈ D1, R(h(ā)) ∈ D2. The existence of such a homomorphism is denoted by D1 → D2.

Lemma 2.2 Let D be an S-database, and Q ∈ (G, AQ0) with data schema S. If D |= Q, then there is an acyclic S-database D′ such that D′ |= Q and D′ → D.
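
The notion of homomorphism between databases used here can be made concrete with a brute-force search. The sketch below is an illustration only, assuming that facts are encoded as (relation, argument-tuple) pairs; the function name `find_homomorphism` is hypothetical, and the search is exponential in |adom(D1)|, so it is meant for toy inputs.

```python
from itertools import product


def find_homomorphism(d1, d2):
    """Return a homomorphism from d1 to d2 as a dict, or None if none exists.

    Databases are sets of facts (rel, args), where args is a tuple of constants.
    """
    dom1 = sorted({c for _, args in d1 for c in args})
    dom2 = sorted({c for _, args in d2 for c in args})
    # Try every function h : adom(d1) -> adom(d2).
    for image in product(dom2, repeat=len(dom1)):
        h = dict(zip(dom1, image))
        # h is a homomorphism if every fact R(a-bar) of d1 has R(h(a-bar)) in d2.
        if all((rel, tuple(h[c] for c in args)) in d2 for rel, args in d1):
            return h
    return None
```

For example, a two-edge path over a binary relation E maps homomorphically into a single self-loop, whereas the self-loop does not map into the path.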


Having the above lemma in place, we can now show that (1) ⇒ (2). By hypothesis, there exists an S-database D such that D |= Q1 and D ⊭ Q2. By Lemma 2.2, there exists an acyclic S-database D′ such that D′ |= Q1 and D′ → D. It is known that OMQs from (G, AQ0) are closed under homomorphisms [14], which immediately implies that D′ ⊭ Q2. Thus, Q1(D′) ⊈ Q2(D′), as needed.

Encoding Acyclic Databases The next step is to encode acyclic databases as trees that can be accepted by an alternating tree automaton. The key observation here is that acyclic databases are “tree-like”, i.e., are of bounded tree-width. The tree-width of a database D is the minimum width among all the tree decompositions δ = (T, (X_v)_{v∈T}) for D, while the width of δ is max{|X_v| | v ∈ T} − 1. It is generally known that a database D whose tree-width is bounded by an integer k can be encoded into a tree over a finite alphabet of double exponential size in k that can be accepted by an alternating tree automaton; see, e.g., [10]. Since the tree-width of an acyclic S-database is bounded by wd(S) − 1, such an encoding can be used for acyclic databases.

Let Γ be an alphabet and (N \ {0})∗ be the set of finite sequences of positive integers, including the empty sequence. A Γ-labeled tree is a partial function t : (N \ {0})∗ → Γ, whose domain dom(t) is closed under prefixes, i.e., x · i ∈ dom(t) implies x ∈ dom(t), for all x ∈ (N \ {0})∗ and i ∈ N \ {0}. The elements of dom(t) identify the nodes of t. Given an acyclic S-database D, and a guarded tree decomposition δ for D of width wd(S) − 1, it can be shown that D and δ can be encoded as a ΓS-labeled tree t, where ΓS is an alphabet of double exponential size in wd(S) and exponential size in |S|, such that each node of δ corresponds to exactly one node of t and vice versa. Although every acyclic S-database can be encoded as a ΓS-labeled tree, the other direction does not hold.
In other words, it is not the case that every ΓS-labeled tree encodes an acyclic S-database and its corresponding guarded tree decomposition. In view of this fact, we need the additional notion of consistency. A ΓS-labeled tree is called consistent if it satisfies certain syntactic properties; we do not give these properties here since they are not vital in order to understand the high-level idea of the proof. Now, given a consistent ΓS-labeled tree t, we can show that t can be decoded into an acyclic S-database ⟦t⟧. From the above discussion and Proposition 2.2, we obtain the following lemma:

Lemma 2.3 Suppose that Q1 and Q2 are OMQs from (G, AQ0) with data schema S. The following are equivalent:

1. Q1 ⊈ Q2.
2. There exists a consistent ΓS-labeled tree t such that Q1(⟦t⟧) ⊈ Q2(⟦t⟧).

Constructing Tree Automata We now proceed with our automata-based procedure. We use two-way alternating parity automata (2ATA) that run on finite labeled trees of unbounded degree. Two-way alternating automata process the input tree while branching in an alternating
fashion to successor states, and thereby moving either down or up the input tree. Our goal is to reduce Cont(G, AQ0 ) to the emptiness problem for 2ATA. As usual, given a 2ATA A , we denote by L(A ) the language of A , i.e., the set of labeled trees it accepts. The emptiness problem is defined as follows: given a 2ATA A , does L(A ) = ∅? Thus, given Q 1 , Q 2 ∈ (G, AQ0 ), we need to construct a 2ATA A such that Q 1 ⊆ Q 2 iff L(A ) = ∅. It is well-known that deciding whether L(A ) is empty is feasible in exponential time in the number of states, and in polynomial time in the size of the input alphabet [23]. Therefore, in order to obtain the desired 2ExpTime upper bound, we should construct A in double exponential time, while the number of states must be at most exponential. We first need a way to check consistency of labeled trees. The construction of an automaton for this task is fairly standard in the literature on automata for guarded logics (see, e.g., [10, 11]), and we omit the details. Lemma 2.4 Consider a schema S. There exists a 2ATA CS that accepts a ΓS -labeled tree t iff t is consistent. The number of states of CS is exponential in wd(S) and linear in |S|. Moreover, CS can be constructed in double exponential time in wd(S) and in exponential time in |S|. Now, the crucial task is, given an OMQ Q ∈ (G, AQ0 ), to devise an automaton that accepts labeled trees which correspond to databases that make Q true. Lemma 2.5 Consider an OMQ Q = (S, O, q) ∈ (G, AQ0 ). There is a 2ATA A Q that accepts a consistent ΓS -labeled tree t iff t |= Q. The number of states of A Q is exponential in wd(S) and linear in |S ∪ sig(O)|. Moreover, A Q can be constructed in double exponential time in the size of Q. Let us give some insights on the construction of A Q . Assume that Q = (S, O, G), i.e., the atomic query consists of the 0-ary predicate G. Roughly speaking, given a consistent ΓS -labeled tree t as input, A Q tries to find derivations that witness the fact that t |= Q. 
But let us first formalize the notion of derivation. Let D be an S-database. A derivation tree for D and Q is a finite labeled tree T, with η being the node labeling function that assigns facts R(ā) to nodes, where R ∈ S ∪ sig(O) and ā ⊆ adom(D), such that the following conditions are satisfied:

1. For the root node v of T, we have that η(v) = G.
2. For each leaf node v of T, we have that η(v) ∈ D.
3. For each non-leaf node v of T, with u1, …, uk being its children, we have that {η(u1), …, η(uk)} is guarded, i.e., it has an atom that contains all the terms of adom({η(u1), …, η(uk)}), and ({η(u1), …, η(uk)}, O) |= η(v).

Intuitively, T describes how the atom G can be entailed from D and O. It is easy to show that D |= Q iff there exists a derivation tree for D and Q. Moreover, due to the guardedness condition in point (3) above, it is possible to show that whenever there is a derivation tree for D and Q, there is one whose branching degree is bounded by a function that is exponential in wd(S ∪ sig(O)). The automaton A_Q exploits these facts in order to exhaustively search for derivation trees that witness the fact
that t |= Q on an input tree t. To this end, A_Q maintains states for the possible labels that may occur in a derivation tree for t and Q. It turns out that this state set is exponential, as stated in Lemma 2.5. Starting with the atom G, the automaton guesses labels of children of G that may occur in a candidate derivation tree of t and Q. Thanks to alternation, it can then proceed in the same way for each child label until it succeeds in building a derivation tree for t and Q. A detailed analysis of the construction of A_Q reveals that A_Q can be constructed in double exponential time in the size of Q, where the second exponent depends only on the maximum arity among all relation symbols present in Q.

Having the above automata in place, we can show that Cont(G, AQ0) can be reduced to the emptiness problem for 2ATA. But let us first recall some key results about 2ATA, which are essential for the final construction. It is well-known that languages accepted by 2ATAs are closed under intersection and complement. Given two 2ATAs A1 and A2, we write A1 ∩ A2 for a 2ATA, which can be constructed in polynomial time, that accepts the language L(A1) ∩ L(A2). Moreover, for a 2ATA A, we write Ā for the 2ATA, which is also constructible in polynomial time, that accepts the complement of L(A). We can now show the following result:

Proposition 2.3 Consider Q1, Q2 ∈ (G, AQ0). We can construct in double exponential time a 2ATA A with exponentially many states such that Q1 ⊆ Q2 iff L(A) = ∅.

Proof Assume that both Q1 and Q2 have data schema S. We define A as the automaton CS ∩ A_{Q1} ∩ Ā_{Q2}, i.e., the intersection of CS, A_{Q1}, and the complement of A_{Q2}. By Lemmas 2.3, 2.4 and 2.5, it is easy to verify that indeed A is constructible in double exponential time, while it has exponentially many states, and that Q1 ⊆ Q2 iff L(A) = ∅. The claim follows.

Recall that for a 2ATA A deciding whether L(A) is empty is feasible in exponential time in the number of states, and in polynomial time in the size of the input alphabet [23].
Consequently, Proposition 2.3 immediately implies that Cont(G, AQ0 ) is in 2ExpTime, and Theorem 2.3 follows.
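
The arithmetic behind this step can be made explicit. The following is only a sketch under assumptions that the text does not state in this form: n stands for the combined size of Q1 and Q2, p is a polynomial such that the automaton A of Proposition 2.3 has m ≤ 2^{p(n)} states and the alphabet ΓS has at most 2^{2^{p(n)}} symbols, and c ≥ 1 is a constant such that the emptiness test of [23] runs in time 2^{m^c} · |ΓS|^c.

```latex
\begin{align*}
2^{m^{c}} \cdot |\Gamma_{S}|^{c}
  &\le 2^{(2^{p(n)})^{c}} \cdot \bigl(2^{2^{p(n)}}\bigr)^{c}
   = 2^{\,2^{c\,p(n)} + c \cdot 2^{p(n)}} \\
  &\le 2^{\,(c+1) \cdot 2^{c\,p(n)}}
   \le 2^{\,2^{c\,p(n) + c}} .
\end{align*}
```

The last inequality uses c + 1 ≤ 2^c. The resulting bound is double exponential in n, as is the time needed to construct A, which is what the 2ExpTime upper bound requires.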

2.4.2 From Conjunctive Queries to Atomic Queries

Let us now explain how we get the desired 2ExpTime upper bound for Cont(G, CQ) by exploiting the fact that Cont(G, AQ0) is in 2ExpTime. We first observe that it suffices to focus on the OMQ language (G, BCQ), i.e., queries from (G, CQ) where the CQ is Boolean. This follows from the fact that there is a simple polynomial time reduction from Cont(G, CQ) to Cont(G, BCQ), which is a straightforward adaptation of the one given in [12] for OMQs based on the description logic ELI. Therefore, the goal is to reduce Cont(G, BCQ) to Cont(G, AQ0), and then apply Theorem 2.3. This reduction relies on the treeification technique discussed in Sect. 2.3, and is inspired by a translation of guarded negation fixed-point sentences to guarded fixed-point sentences given in [6].


Consider an OMQ Q = (S, O, q) ∈ (G, BCQ), and let C be a relation symbol not in S ∪ sig(O) that has arity wd(q), where wd(q) denotes the width of q, i.e., the number of variables occurring in q. We define the set of TGDs

\[
\eta^{C}_{Q} := \bigl\{\, q' \rightarrow G_q \;\bigm|\; q' \in \Lambda^{S \cup \{C\} \cup \mathrm{sig}(O)}_{q} \,\bigr\},
\]

where G_q is a new 0-ary relation symbol not in S ∪ {C} ∪ sig(O). Notice that the TGDs in η^C_Q are, in general, not guarded. However, by construction, their bodies are acyclic Boolean CQs, and this allows us to rewrite each TGD τ into linearly many guarded TGDs, which we denote by γ_τ. We then define the set of guarded TGDs

\[
\gamma^{C}_{Q} := \bigcup_{\tau \in \eta^{C}_{Q}} \gamma_{\tau}.
\]

We finally define the OMQ

\[
g_C(Q) := \bigl(S \cup \{C\},\; O \cup \gamma^{C}_{Q},\; G_q\bigr) \in (\mathsf{G}, \mathsf{AQ}_0).
\]

It can be shown that the translation gC(·) preserves containment. More precisely:

Lemma 2.6 Let Qi = (S, Oi, qi) ∈ (G, BCQ), for i ∈ {1, 2}, and consider a predicate C ∉ S ∪ sig(O1) ∪ sig(O2) that has arity max_{i∈{1,2}} {wd(qi)}. It holds that Q1 ⊆ Q2 ⇐⇒ gC(Q1) ⊆ gC(Q2).

The above lemma provides the reduction from Cont(G, BCQ) to Cont(G, AQ0), which allows us to apply the algorithm for Cont(G, AQ0) underlying Theorem 2.3. However, it should not be forgotten that this reduction takes exponential time due to treeification. Nevertheless, since the reduction provided by Lemma 2.6 increases the arity of the schema only polynomially, while the algorithm for Cont(G, AQ0) provided by Theorem 2.3 is double exponential only in the arity of the underlying schema, we conclude that Cont(G, BCQ) is in 2ExpTime, as needed.

2.5 First-Order Rewritability

We now focus on another algorithmic task that is relevant for OMQs, that is, deciding whether an OMQ can be equivalently rewritten as a first-order query. A first-order (FO) query over a schema S is a (function-free) FO-formula ϕ(x̄), with x̄ being its free variables, that uses only relations from S. The evaluation of ϕ over an S-database D, denoted ϕ(D), is the set of tuples {ā ∈ adom(D)^{|x̄|} | D |= ϕ(ā)}, where |= denotes the standard notion of satisfaction for first-order logic. An OMQ Q = (S, O, q(x̄)) is FO-rewritable if there is a (finite) FO-query ϕ_Q(x̄) over S that is equivalent to Q, i.e., Q(D) = ϕ_Q(D) for every S-database D. We call ϕ_Q(x̄) an FO-rewriting of Q. The main problem that we study in this section is the following:

PROBLEM : FORew(O, Q)
INPUT : An OMQ Q ∈ (O, Q).
QUESTION : Is it the case that Q is FO-rewritable?

It is well-known that FORew(TGD, CQ) is undecidable. This follows from the fact that deciding whether a Datalog query, and thus an OMQ from (F, CQ),⁴ is FO-rewritable is undecidable. Actually, we know that a Datalog query is FO-rewritable iff it is bounded [1], while the problem of deciding whether a Datalog query is bounded is undecidable [24]. What about OMQs that fall in the language (G, CQ)? The next two examples show that FORew(G, CQ) is a non-trivial problem in the sense that there are queries from (G, CQ) that are not FO-rewritable, but at the same time there are FO-rewritable queries from (G, CQ).

Example 2.2 Let Q = (S, O, q) ∈ (G, CQ), where S = {T/3, A/1, B/1}, O consists of the two TGDs T(x, y, z), A(z) → R(x, z) and T(x, y, z), R(x, z) → R(x, y), and q := ∃x∃y∃z (T(x, y, z) ∧ R(x, z) ∧ B(y)). Intuitively, an FO-rewriting of Q should check for the existence of a set of atoms {T(c, a_i, a_{i−1})}_{1≤i≤k} for some k ≥ 0. However, since there is no upper bound for k, this cannot be done via a finite first-order query, and thus Q is not FO-rewritable. A proof for this fact is given below.

By slightly adapting the above example, we get a query that is FO-rewritable.

Example 2.3 Let Q′ = (S, O, q′) ∈ (G, CQ) be the query obtained from the OMQ Q given in Example 2.2 by replacing q with q′ := ∃x∃y∃z (T(x, y, z) ∧ R(x, z) ∧ B(y) ∧ A(z)), i.e., by simply adding to the CQ q the atom A(z). The query Q′ is indeed FO-rewritable since the FO-query, which is actually a CQ, ∃x∃y∃z (T(x, y, z) ∧ B(y) ∧ A(z)) is an FO-rewriting of Q′.
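
Both examples can be checked mechanically on small databases. The sketch below relies on an observation that the text does not spell out: the two TGDs of Example 2.2 are full (they have no existentially quantified variables), so saturating a database under them terminates, and a Boolean CQ is entailed exactly when it holds in the saturated database. All names (`matches`, `chase`, `entails`, `chain_db`) and the fact encoding are illustrative assumptions. The database built by `chain_db(k)` consists of A(a0), the atoms T(c, a_i, a_{i−1}) for 1 ≤ i ≤ k, and B(a_k).

```python
def matches(body, facts, binding=None):
    """Yield every variable binding that maps all atoms of body into facts."""
    binding = binding or {}
    if not body:
        yield binding
        return
    (rel, args), rest = body[0], body[1:]
    for frel, fargs in facts:
        if frel != rel or len(fargs) != len(args):
            continue
        new = dict(binding)
        # Bind each variable, rejecting the fact on any conflicting binding.
        if all(new.setdefault(v, c) == c for v, c in zip(args, fargs)):
            yield from matches(rest, facts, new)


def chase(facts, rules):
    """Saturate facts under full TGDs; terminates since no new terms appear."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for body, (hrel, hargs) in rules:
            for b in list(matches(body, facts)):
                new_fact = (hrel, tuple(b[v] for v in hargs))
                if new_fact not in facts:
                    facts.add(new_fact)
                    changed = True
    return facts


# The ontology O and the queries q, q' of Examples 2.2 and 2.3.
RULES = [
    ([("T", ("x", "y", "z")), ("A", ("z",))], ("R", ("x", "z"))),
    ([("T", ("x", "y", "z")), ("R", ("x", "z"))], ("R", ("x", "y"))),
]
Q_BODY = [("T", ("x", "y", "z")), ("R", ("x", "z")), ("B", ("y",))]
Q_PRIME_BODY = Q_BODY + [("A", ("z",))]
REWRITING = [("T", ("x", "y", "z")), ("B", ("y",)), ("A", ("z",))]


def holds(body, facts):
    """Evaluate a Boolean CQ directly over a set of facts."""
    return next(matches(body, facts), None) is not None


def entails(db, body):
    """Check whether db together with O entails the Boolean CQ body."""
    return holds(body, chase(db, RULES))


def chain_db(k):
    """A(a0), T(c, a_i, a_{i-1}) for 1 <= i <= k, and B(a_k)."""
    facts = {("A", ("a0",)), ("B", (f"a{k}",))}
    facts |= {("T", ("c", f"a{i}", f"a{i - 1}")) for i in range(1, k + 1)}
    return facts
```

On these chain databases, `entails(chain_db(k), Q_BODY)` holds for every k ≥ 1, yet dropping any single fact breaks the entailment, which matches the intuition that no fixed number of atoms suffices for Q. For Q′, the entailment check and the direct evaluation of `REWRITING` agree on the chains tried in the test.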

 

Interestingly, we can decide whether a query from (G, CQ) is FO-rewritable:

Theorem 2.4 FORew(G, CQ) is 2ExpTime-complete. The lower bound holds even if S ∪ sig(O) consists of unary and binary relation symbols only.

4 Recall that F denotes the class of full TGDs, i.e., TGDs without existentially quantified variables.

The lower bound for FORew(G, CQ) is immediately inherited from [12], where it is shown that deciding FO-rewritability for OMQs where the ontology is formulated using the description logic ELI is 2ExpTime-hard, while, as discussed in the previous section, a set of ELI axioms can be equivalently rewritten, for query answering purposes, as a set of guarded TGDs. It remains to establish the upper bound. In the rest of the section, we discuss how this can be obtained.

2.5.1 Atomic Queries As in the case of query containment, we first concentrate our attention on the simpler language (G, AQ0 ), and show that deciding FO-rewritability is in 2ExpTime. Then, we are going to explain how we can get the desired upper bound for queries from (G, CQ) by exploiting the decision procedure for FORew(G, AQ0 ) and the treeification technique. We proceed to show that: Theorem 2.5 FORew(G, AQ0 ) is in 2ExpTime. Towards a decision procedure for FORew(G, AQ0 ), we first semantically characterize the FO-rewritable OMQs from (G, AQ0 ), and then explain how this semantic characterization can be exploited in order to devise a decision procedure based on automata techniques. Semantic Characterization We give a characterization of FO-rewritability of OMQs from (G, AQ0 ) in terms of the existence of certain acyclic databases. This characterization is related to, but different from characterizations used for OMQs based on DLs such as ELI and EL [12, 13]. The DL characterizations essentially state that a unary OMQ Q (i.e., one whose query has a single answer variable) is FO-rewritable iff there is a bound k such that, whenever the root of a tree-shaped database D is returned as an answer to Q, then this is already true for the restriction of D up to depth k. The proof of the (contrapositive of the) “only if” direction uses a locality argument: if there is no such bound k, then this is witnessed by an infinite sequence of deeper and deeper tree databases that establish non-locality of Q. For guarded TGDs, we would have to replace tree-shaped databases with acyclic databases. However, increasing depth of guarded tree decompositions does not correspond to increasing distance in the Gaifman graph and thus does not establish non-locality. We therefore depart from imposing a bound on the depth, and instead we impose a bound on the number of facts (see Proposition 2.4). 
It is also interesting to note that, while it is implicit in [12] that an OMQ based on ELI and CQs is FO-rewritable iff it is Gaifman local, there is an OMQ from (G, CQ) that is Gaifman local, but not FO-rewritable. Such an OMQ is the one obtained from the query Q given in Example 2.2, by removing the existential quantification on the variable x in the CQ q, i.e., converting q into a unary CQ. Proposition 2.4 Consider an OMQ Q ∈ (G, AQ0 ) with data schema S. The following are equivalent:


1. Q is FO-rewritable.
2. There exists a k ≥ 0 such that, for every acyclic S-database D, if D |= Q, then there is a D′ ⊆ D with at most k facts such that D′ |= Q.

Proof For (1) ⇒ (2) we exploit the fact that, if Q ∈ (G, AQ0) is FO-rewritable, then it can be expressed as a union of CQs q_Q, i.e., a disjunction of CQs. This follows from the fact that queries from (G, AQ0) are preserved under homomorphisms [14], and Rossman's powerful theorem stating that an FO-query is preserved under homomorphisms over finite instances iff it is equivalent to a union of CQs [34]. It is then easy to show that (2) holds with k being the size of the largest disjunct of q_Q. For (2) ⇒ (1) we explicitly construct an FO-rewriting of Q. Let

\[
\Lambda_{Q,k} := \{\, D \mid D \text{ is an acyclic } S\text{-database},\ |D| \le k,\ D \models Q \,\}.
\]

We consider the union of Boolean CQs ϕ_Q := ⋁_{D∈Λ_{Q,k}} q_D, where q_D is the Boolean CQ obtained from D by replacing each constant c ∈ adom(D) with an existentially quantified variable x_c. It is easy to see that ϕ_Q is finite (modulo variable renaming). By exploiting Lemma 2.2, it is not difficult to show that ϕ_Q is indeed an FO-rewriting of Q, i.e., Q(D) = ϕ_Q(D), for every S-database D, and the claim follows.

The next example illustrates Proposition 2.4.

Example 2.4 Let Q = (S, O, G) ∈ (G, AQ0), where S = {T/3, A/1, B/1}, and O consists of the TGDs given in Example 2.2 plus the guarded TGD T(x, y, z), R(x, z), B(y) → G. It is easy to verify that, for an arbitrary k ≥ 0, the acyclic S-database Dk = {A(a0), T(c, a1, a0), …, T(c, a_{k−1}, a_{k−2}), B(a_{k−1})} is such that Dk |= Q, but for every D′ ⊂ Dk with at most k facts, D′ ⊭ Q. Thus, by Proposition 2.4, Q is not FO-rewritable.

At this point, one might expect that Proposition 2.4 allows us to devise a decision procedure for FORew(G, AQ0) based on 2ATA.
Indeed, although the semantic characterization established in Proposition 2.4 does not immediately provide a way to devise such a procedure, it can be refined in order to arrive at a criterion that permits an implementation via 2ATA [7]. However, the automata-based decision procedure that emerges from this refined characterization runs in triple exponential time. Roughly, the refined semantic characterization relies on a minimality criterion that is defined on the class of all acyclic databases, and ensures that there are only finitely many “minimal” acyclic databases that satisfy the input OMQ Q. One can then devise a 2ATA A Q that checks for this criterion, and thus, Q is FO-rewritable iff the language accepted by A Q is finite. The latter is feasible in exponential time in the number of states. Unfortunately, the construction of A Q relies on the costly operation of projection, and for this reason A Q has double exponentially many states.


Therefore, this leads to a decision procedure that runs in triple exponential time, which is not optimal. We refer the interested reader to [7] for more details. Cost Automata Approach Since it is not apparent how the use of traditional automata techniques may lead to the desired 2ExpTime upper bound for FORew(G, AQ0 ), we instead are going to exploit the more sophisticated model of cost automata. The goal is to provide a refined version of the semantic characterization given in Proposition 2.4 that relies on a minimality criterion. Since in the cost automata model the operation of minimization is a native citizen (details are given below), we can deal with such a minimality criterion in a more efficient way than standard 2ATA. In what follows, we briefly discuss cost automata, we then revisit the semantic characterization in Proposition 2.4, and finally explain how the refined characterization of FO-rewritability leads to an optimal decision procedure for FORew(G, AQ0 ) based on cost automata. Cost Automata Models. Cost automata extend traditional automata (on words, trees, etc.) by providing counters that can be manipulated at each transition. Instead of assigning a Boolean value to each input structure (indicating whether the input is accepted or not), these automata assign a value from N∞ := N ∪ {∞} to each input. We shall only give a high-level idea of cost automata in the following, and refer the reader to the literature for more details [11, 20, 22]. Here, we focus on cost automata that work on finite trees of unbounded degree, and allow for two-way movements; in fact, the automata that we need are those that extend 2ATA over labeled trees with a single counter. The operation of such an automaton A on each input t will be viewed as a two-player cost game G (A , t) between players Eve and Adam. 
Recall that the acceptance of an input tree for a conventional 2ATA can be formalized via a two-player game as well, and, in fact, the standard parity game for 2ATA can be seen as a special case of a cost game [11]. However, instead of the parity acceptance condition for 2ATA, plays in the cost game between Eve and Adam will be assigned costs, and the cost automaton specifies via an objective whether Eve's goal is to minimize or maximize that cost. In case of a minimizing (resp., maximizing) objective, a strategy ξ of Eve in the cost game G(A, t) is n-winning if any play of Adam consistent with ξ has cost at most n (resp., at least n). A defines the following function on the domain of all input trees:

\[
\llbracket A \rrbracket : t \mapsto \mathrm{op}\{\, n \mid \text{Eve has an } n\text{-winning strategy in } \mathcal{G}(A, t) \,\},
\]

where op = inf (resp., op = sup) in case Eve's objective is to minimize (resp., maximize). Therefore, ⟦A⟧ defines a function from the domain of input trees to N∞. We call functions of that type cost functions. A key property of such functions is boundedness. We say that ⟦A⟧ is bounded if there exists an n ∈ N such that ⟦A⟧(t) ≤ n for every input tree t.

We employ cost automata on trees with a single counter, where Eve's objective is to minimize the cost, while satisfying the parity condition. Such an automaton is known in the literature as dist ∧ parity-automaton [11]. To navigate in the tree, it
may use two directions: 0, which indicates that the automaton should stay in the current node, and a second direction, which means that the automaton may move to an arbitrary neighboring node, including the parent. For this type of automaton, we can decide whether its cost function is bounded [11, 21]. As usual, |A| denotes the size of A. Then:⁵

Theorem 2.6 There is a polynomial f such that, for every dist ∧ parity-automaton A using priorities {0, 1} for the parity acceptance condition, boundedness of the cost function ⟦A⟧ is decidable in time |A|^{f(m)}, where m is the number of states of A.

The goal is to reduce FORew(G, AQ0) to the problem of deciding whether a dist ∧ parity-automaton is bounded. To this end, we first need to revise the semantic characterization of FO-rewritability provided in Proposition 2.4.

A Revised Semantic Characterization. Consider an S-database D, and a query Q = (S, O, q) ∈ (G, AQ0). Recall that a derivation tree for D and Q is a finite labeled tree that describes how the atomic query of Q can be entailed from D and O; the formal definition can be found in the previous section. The height of a derivation tree T for D and Q is the maximum number of nodes of a branch in T, i.e., the maximum number of nodes that lie on a path that leads from the root node to a leaf node without repeating nodes. Assuming that D |= Q, we define the cost of D w.r.t. Q as

\[
\mathrm{cost}(D, Q) := \min\{\, n \mid T \text{ is a derivation tree for } D \text{ and } Q \text{ of height } n \,\}.
\]

The cost of Q is defined as

\[
\mathrm{cost}(Q) := \sup\{\, \mathrm{cost}(D, Q) \mid Q(D) \neq \emptyset, \text{ where } D \text{ is an acyclic } S\text{-database} \,\}.
\]

Therefore, the cost of Q is the least upper bound of the height over all derivation trees for all acyclic S-databases D such that D |= Q. If there is no such database, then the cost of Q is zero since sup ∅ = 0. Actually, cost(Q) = 0 indicates that there is no (acyclic) database D that satisfies Q, which means that Q is unsatisfiable, and thus it is trivially FO-rewritable.
Having the notion of the cost of an OMQ from (G, AQ0) in place, it should be clear how the semantic characterization in Proposition 2.4 can be refined:

Proposition 2.5 Consider an OMQ Q ∈ (G, AQ0) with data schema S. The following are equivalent:

1. Condition 2 from Proposition 2.4 is satisfied.
2. cost(Q) is finite.

5 This result from [11] initially relied on an unpublished result, which has now been published in [21].

Constructing Cost Automata. We briefly describe how we can use cost automata in order to devise an algorithm for FORew(G, AQ0) that runs in double exponential time. Consider an OMQ Q = (S, O, G) ∈ (G, AQ0). The goal is to devise a dist ∧ parity-automaton B_Q such that the cost function ⟦B_Q⟧ is bounded iff cost(Q) is finite. Therefore, by Proposition 2.5, to check whether Q is FO-rewritable we simply need to check if ⟦B_Q⟧ is bounded, which, by Theorem 2.6, can be done in exponential time in the size of B_Q. The input trees to our automata will be over the same alphabet ΓS that is used to encode acyclic S-databases in Section 2.4. Recall that, for a dist ∧ parity-automaton A, the cost function ⟦A⟧ is bounded over a certain class C of trees iff there is an n ∈ N such that ⟦A⟧(t) ≤ n for every input tree t ∈ C. Then:

Lemma 2.7 There is a dist ∧ parity-automaton H_Q such that ⟦H_Q⟧ is bounded over consistent ΓS-labeled trees iff cost(Q) is finite. The number of states of H_Q is exponential in wd(S), and polynomial in |S ∪ sig(O)|. Moreover, H_Q can be constructed in double exponential time in the size of Q.

The automaton H_Q is built in such a way that, on an input tree t, Eve has an n-winning strategy in G(H_Q, t) iff there is a derivation tree for t and Q of height at most n. Thus, Eve tries to construct derivation trees of minimal height. The counter is used to count the height of the derivation tree. Having this automaton in place, we can now complete the proof of Theorem 2.5. The desired dist ∧ parity-automaton B_Q is defined as C′S ∩ H_Q, where C′S is similar to the 2ATA CS (in Lemma 2.4) that checks for consistency of ΓS-labeled trees. Notice that C′S is essentially a dist ∧ parity-automaton that assigns zero (resp., ∞) to input trees that are consistent (resp., inconsistent), and thus, C′S ∩ H_Q is well-defined. Since the intersection of dist ∧ parity-automata is feasible in polynomial time [11], Lemmas 2.4 and 2.7 imply that B_Q has exponentially many states, and it can be constructed in double exponential time. Lemma 2.7 also implies that ⟦B_Q⟧ is bounded iff cost(Q) is finite.
It remains to show that the boundedness of B Q  can be checked in double exponential time. By Theorem 2.6, there is a polynomial f such that the latter task can be carried out in time |B Q | f (m) , where m is the number of states of B Q , and the claim follows.
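The last step is a short calculation, sketched here under stated assumptions. The polynomials p, r and s are hypothetical names that do not appear in the original: p bounds the exponent of the number of states (m ≤ 2^{p(|Q|)}), r bounds the exponent of the size of the automaton (|B_Q| ≤ 2^{2^{r(|Q|)}}, which follows from the double exponential construction time), and f is assumed, w.l.o.g., to be nondecreasing.

```latex
\[
  |\mathcal{B}_Q|^{f(m)}
  \;\le\; \Bigl(2^{2^{r(|Q|)}}\Bigr)^{f\left(2^{p(|Q|)}\right)}
  \;=\; 2^{\,2^{r(|Q|)} \cdot f\left(2^{p(|Q|)}\right)}
  \;\le\; 2^{2^{s(|Q|)}}
\]
```

Such a polynomial s exists because a polynomial applied to a single exponential is still bounded by a single exponential, so the product in the exponent is at most 2^{s(|Q|)}.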

2.5.2 From Conjunctive Queries to Atomic Queries

In this final section, we explain how we get the desired 2ExpTime upper bound for FORew(G, CQ) by exploiting the fact that FORew(G, AQ0) is in 2ExpTime. As for containment, it suffices to focus on the OMQ language (G, BCQ); this is implicit in [12]. Therefore, the goal is to reduce FORew(G, BCQ) to FORew(G, AQ0), and then apply Theorem 2.5. To this end, we follow the same approach as for containment. Recall that for an OMQ Q = (S, O, q) ∈ (G, BCQ), g_C(Q), where C is a new relation symbol not in S ∪ sig(O) of arity wd(q), is an OMQ that falls in (G, AQ0), while g_C(·) preserves containment. It turns out that g_C(·) also preserves FO-rewritability.

P. Barceló et al.

Lemma 2.8 Let Q = (S, O, q) ∈ (G, BCQ), and consider a predicate C ∉ S ∪ sig(O) that has arity wd(q). It holds that Q is FO-rewritable ⇐⇒ g_C(Q) is FO-rewritable.

Even though the reduction from FORew(G, BCQ) to FORew(G, AQ0) provided by Lemma 2.8 is exponential, we can still obtain the desired 2ExpTime upper bound for FORew(G, BCQ). This is because it increases the arity of the schema only polynomially, while the algorithm for FORew(G, AQ0) underlying Theorem 2.5 is double exponential only in the arity of the schema.

2.6 Reasoning over Finite Instances

The semantics of query evaluation for OMQs is defined in terms of all instances, including finite and infinite ones. In particular, recall that, given an OMQ Q = (S, O, q(x̄)), a database D over S, and a tuple ā of constants from adom(D), we write D |= Q(ā) whenever, for every (finite or infinite) instance J ⊇ D that satisfies O, it holds that J |= q(ā). However, there are applications in which reasoning over finite instances is more appropriate. A prime example of this is the area of data management, as databases are by definition finite objects. We write D |=fin Q(ā) whenever, for every finite instance J ⊇ D that satisfies O, it holds that J |= q(ā).

It is easy to show that for arbitrary sets of TGDs, entailment under arbitrary instances (|=) and entailment under finite instances (|=fin) are, in general, different. Interestingly, this is not the case when we focus on guarded TGDs, a property known as finite controllability. In particular, a deep result in [5], relying on techniques from [33], established that for OMQs from (G, CQ) the entailment notions of |= and |=fin coincide. This is formalized below.

Theorem 2.7 Consider an OMQ Q ∈ (G, CQ) with data schema S, a database D over S, and a tuple ā of constants over adom(D). It holds that D |= Q(ā) ⇐⇒ D |=fin Q(ā).

In very rough terms, the main idea behind the proof of Theorem 2.7 is to show the following: if there exists a counterexample to the fact that D |= Q(ā), i.e., an instance J ⊇ D that satisfies O but J ⊭ q(ā), then there is also a finite counterexample to it. Following reasoning similar to that in Proposition 2.2, we can assume, w.l.o.g., that J is acyclic. The finite counterexample Jfin is then defined as a “nearly-acyclic covering” of J with respect to D and Q, which is a finite instance such that:
1. Jfin ⊇ D,
2. Jfin satisfies O, and
3. There is a homomorphism from every set of at most |q| atoms in Jfin to J that is the identity over adom(D).

From (3), which is the “near-acyclicity” condition, one obtains that ā ∉ q(Jfin). Towards a contradiction, assume that ā ∈ q(Jfin). Then, there is a homomorphism h from q to Jfin that maps x̄ to ā. By composing h with the homomorphism given by (3) that maps the image of q under h to J, we obtain a homomorphism from q to J that maps x̄ to ā. This contradicts the fact that J ⊭ q(ā). Therefore, due to (1) and (2), we conclude that Jfin is a counterexample to D |=fin Q(ā).

As a direct corollary to Theorem 2.7, we obtain that the notions of containment and first-order rewritability for OMQs based on guarded TGDs are also invariant with respect to whether we consider |= or |=fin. Formally, consider two OMQs Q and Q′ from (G, CQ) over the same data schema S. We write Q ⊆fin Q′ if, for every database D over S and tuple ā of constants over adom(D), it is the case that D |=fin Q(ā) =⇒ D |=fin Q′(ā). Moreover, we say that Q is FO-rewritable in the finite if there exists a first-order query φ_Q such that D |=fin Q(ā) iff ā ∈ φ_Q(D), for every database D over S and tuple ā of constants over adom(D). We then have the following:

Corollary 2.1 Consider Q, Q′ ∈ (G, CQ) with the same data schema. It holds that:
• Q ⊆ Q′ iff Q ⊆fin Q′.
• Q is FO-rewritable iff Q is FO-rewritable in the finite.
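The contradiction argument in the proof sketch of Theorem 2.7 relies on the standard fact that a tuple belongs to q(J) exactly when some homomorphism maps the atoms of q into J and the answer variables to that tuple. The following is a minimal brute-force sketch of that fact for finite instances. The function name `answers` and the encoding of atoms as (relation, tuple) pairs are assumptions made for illustration, and constants inside queries are not handled.

```python
from itertools import product


def answers(query_atoms, answer_vars, instance):
    """Return q(J): images of answer_vars under homomorphisms from q into J.

    query_atoms: list of (relation, tuple_of_variables); hypothetical encoding,
                 constants in queries are not supported in this sketch.
    instance:    set of facts (relation, tuple_of_constants).
    """
    variables = sorted({v for _, args in query_atoms for v in args})
    domain = sorted({c for _, args in instance for c in args})
    result = set()
    # Try every assignment of domain elements to the query variables.
    for values in product(domain, repeat=len(variables)):
        h = dict(zip(variables, values))
        # h is a homomorphism iff every query atom is mapped to a fact.
        if all((rel, tuple(h[v] for v in args)) in instance
               for rel, args in query_atoms):
            result.add(tuple(h[v] for v in answer_vars))
    return result


# q(x) :- E(x, y), E(y, z): "x starts a path of length two".
q = [("E", ("x", "y")), ("E", ("y", "z"))]
J = {("E", ("a", "b")), ("E", ("b", "c"))}
print(answers(q, ("x",), J))
```

The enumeration is exponential in the number of query variables, so this is only meant to make the definition concrete, not to suggest an evaluation strategy.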

2.7 Conclusions

We have discussed in depth the crucial tasks of query evaluation, query containment, and first-order rewritability for guarded ontology-mediated queries. For query evaluation, we explained how classical results on the satisfiability problem for the guarded fragment of first-order logic can be applied. For query containment, we discussed how tree automata techniques can be used, while for first-order rewritability, we explained how techniques based on a more sophisticated automata model, known as cost automata, can be exploited. Finally, we discussed that the above problems are invariant with respect to whether we consider arbitrary or finite models.

There are still several open problems that deserve our attention:
• It is unclear whether the results on query containment and first-order rewritability presented for guarded ontology-mediated queries can be extended to the case where the ontology is an arbitrary set of guarded first-order sentences. Recall that in this generalized setting, query evaluation is decidable in 2ExpTime [5].
• The problem of extending guarded TGDs with additional features has been extensively studied in the literature. For example, guarded TGDs have been extended with default negation (a.k.a. negation as failure) in a series of papers. In [17], negation is interpreted according to the perfect model semantics, in [28] according to the well-founded semantics, and in [25] according to the stable model semantics. The query evaluation problem for OMQs based on the above extensions of guarded TGDs is by now well-understood. However, query containment and first-order rewritability have remained unexplored.

• Another relevant extension is guarded TGDs with disjunction, which has been studied in [15]. Again, query evaluation is well-understood, while query containment and first-order rewritability have remained open problems.

References

1. Ajtai, M., & Gurevich, Y. (1994). Datalog vs. first-order logic. Journal of Computer and System Sciences, 49(3), 562–588.
2. Andréka, H., Németi, I., & van Benthem, J. (1998). Modal languages and bounded fragments of predicate logic. Journal of Philosophical Logic, 27(3), 217–274.
3. Arenas, M., Hull, R., Martens, W., Milo, T., & Schwentick, T. (2016). Foundations of data management (Dagstuhl perspectives workshop 16151). Dagstuhl Reports, 6(4), 39–56.
4. Baget, J.-F., Leclère, M., Mugnier, M.-L., & Salvat, E. (2011). On rules with existential variables: Walking the decidability line. Artificial Intelligence, 175(9–10), 1620–1654.
5. Bárány, V., Gottlob, G., & Otto, M. (2014). Querying the guarded fragment. Logical Methods in Computer Science, 10(2).
6. Bárány, V., ten Cate, B., & Segoufin, L. (2015). Guarded negation. Journal of the ACM, 62(3), 22:1–22:26.
7. Barceló, P., Berger, G., Lutz, C., & Pieris, A. (2018). First-order rewritability of frontier-guarded ontology-mediated queries. In IJCAI (pp. 1707–1713).
8. Barceló, P., Berger, G., & Pieris, A. (2018). Containment for rule-based ontology-mediated queries. In PODS (pp. 267–279).
9. Beeri, C., & Vardi, M. Y. (1981). The implication problem for data dependencies. In ICALP (pp. 73–85).
10. Benedikt, M., Bourhis, P., & Vanden Boom, M. (2016). A step up in expressiveness of decidable fixpoint logics. In LICS (pp. 817–826).
11. Benedikt, M., ten Cate, B., Colcombet, T., & Vanden Boom, M. (2015). The complexity of boundedness for guarded logics. In LICS (pp. 293–304).
12. Bienvenu, M., Hansen, P., Lutz, C., & Wolter, F. (2016). First order-rewritability and containment of conjunctive queries in Horn description logics. In IJCAI (pp. 965–971).
13. Bienvenu, M., Lutz, C., & Wolter, F. (2013). First-order rewritability of atomic queries in Horn description logics. In IJCAI (pp. 754–760).
14. Bienvenu, M., ten Cate, B., Lutz, C., & Wolter, F. (2014). Ontology-based data access: A study through disjunctive Datalog, CSP, and MMSNP. ACM Transactions on Database Systems, 39(4), 33:1–33:44.
15. Bourhis, P., Manna, M., Morak, M., & Pieris, A. (2016). Guarded-based disjunctive tuple-generating dependencies. ACM Transactions on Database Systems, 41(4), 27:1–27:45.
16. Calì, A., Gottlob, G., & Kifer, M. (2013). Taming the infinite chase: Query answering under expressive relational constraints. Journal of Artificial Intelligence Research, 48, 115–174.
17. Calì, A., Gottlob, G., & Lukasiewicz, T. (2009). A general datalog-based framework for tractable query answering over ontologies. In PODS (pp. 77–86).
18. Calì, A., Gottlob, G., Lukasiewicz, T., Marnette, B., & Pieris, A. (2010). Datalog+/-: A family of logical knowledge representation and query languages for new applications. In LICS (pp. 228–242).
19. Calì, A., Gottlob, G., & Pieris, A. (2012). Towards more expressive ontology languages: The query answering problem. Artificial Intelligence, 193, 87–128.
20. Colcombet, T. (2009). The theory of stabilisation monoids and regular cost functions. In ICALP (pp. 139–150).
21. Colcombet, T., & Fijalkow, N. (2016). The bridge between regular cost functions and omega-regular languages. In ICALP (pp. 126:1–126:13).
22. Colcombet, T., & Löding, C. (2010). Regular cost functions over finite trees. In LICS (pp. 70–79).
23. Cosmadakis, S. S., Gaifman, H., Kanellakis, P. C., & Vardi, M. Y. (1988). Decidable optimization problems for database logic programs (preliminary report). In STOC (pp. 477–490).
24. Gaifman, H., Mairson, H. G., Sagiv, Y., & Vardi, M. Y. (1993). Undecidable optimization problems for database logic programs. Journal of the ACM, 40(3), 683–713.
25. Gottlob, G., Hernich, A., Kupke, C., & Lukasiewicz, T. (2014). Stable model semantics for guarded existential rules and description logics. In KR.
26. Gottlob, G., Leone, N., & Scarcello, F. (2003). Robbers, marshals, and guards: Game theoretic and logical characterizations of hypertree width. Journal of Computer and System Sciences, 66(4), 775–808.
27. Grädel, E. (1999). On the restraining power of guards. Journal of Symbolic Logic, 64(4), 1719–1742.
28. Hernich, A., Kupke, C., Lukasiewicz, T., & Gottlob, G. (2013). Well-founded semantics for extended datalog and ontological reasoning. In PODS (pp. 225–236).
29. Hernich, A., Lutz, C., Papacchini, F., & Wolter, F. (2017). Dichotomies in ontology-mediated querying with the guarded fragment. In PODS (pp. 185–199).
30. Leone, N., Manna, M., Terracina, G., & Veltri, P. (2012). Efficiently computable Datalog∃ programs. In KR.
31. Poggi, A., Lembo, D., Calvanese, D., De Giuseppe, G., Lenzerini, M., & Rosati, R. (2008). Linking data to ontologies. Journal of Data Semantics, 10, 133–173.
32. Rosati, R. (2007). The limits of querying ontologies. In ICDT (pp. 164–178).
33. Rosati, R. (2011). On the finite controllability of conjunctive query answering in databases under open-world assumption. Journal of Computer and System Sciences, 77(3), 572–594.
34. Rossman, B. (2008). Homomorphism preservation theorems. Journal of the ACM, 55(3), 15:1–15:53.
35. Shmueli, O. (1993). Equivalence of DATALOG queries is undecidable. Journal of Logic Programming, 15(3), 231–241.
Pablo Barceló is Director of the Institute for Mathematical and Computational Engineering at Pontificia Universidad Católica de Chile, and also Deputy Director of the Millennium Institute for Foundational Research on Data. He received his Ph.D. from the University of Toronto in 2006. His main research interests are in the areas of databases and logic in computer science, where he has contributed over 60 technical papers in major conferences and journals. He was also an invited tutorial speaker at ACM PODS 2013. He is a member of the editorial board of Logical Methods in Computer Science and a former editor of the Database Principles Column of the SIGMOD Record. During 2019 he chaired the International Conference on Database Theory (ICDT).

Gerald Berger received his diploma in computer science from TU Wien in 2015. In 2019, he defended his Ph.D. thesis “Static Analysis for Ontology-Mediated Querying” at TU Wien, whereupon his Ph.D. was awarded sub auspiciis praesidentis. His research interests include logic in databases and knowledge representation and reasoning. He is now working in private industry.

Georg Gottlob is a Professor of Informatics at the University of Oxford and a Fellow of St John’s College. He is also an Adjunct Professor at the Vienna University of Technology (TU Wien), from where he obtained his Ph.D. in 1981. Gottlob’s interests include mathematical and computational logic, in particular, finite model theory, computational and descriptive complexity, knowledge representation (KR), knowledge graphs, automated reasoning, artificial intelligence, database theory, and Web data processing. Gottlob is also very interested in graph and hypergraph decomposition methods and is one of the co-developers of the “hypertree decomposition” method. Moreover, with his co-workers, he has been studying various logical programming and query languages, most recently the Datalog+/- family, with the goal of designing simple tractable languages with high expressive power and low complexity, having highly parallelizable sub-fragments. Gottlob is also very interested in putting logic into practice. For example, with Christoph Koch, he established the logical foundations of Web data extraction, published in the 2004 JACM paper “Monadic Datalog and the Expressive Power of Languages for Web Information Extraction,” and designed with his students and postdocs the Lixto semi-automatic data extraction system, which incorporated monadic Datalog. Lixto also gave rise to a homonymous company, which was later acquired by McKinsey. Gottlob was awarded an ERC Advanced Investigator’s Grant for the project “DIADEM: Domain-centric Intelligent Automated Data Extraction Methodology.” Based on results of this project, he co-founded Wrapidity Ltd., a company that specialized in fully automated Web data extraction, which was acquired in 2016 by Meltwater.

Andreas Pieris has been an Associate Professor in the School of Informatics at the University of Edinburgh since August 2020. Prior to this, he was an Assistant Professor at the University of Edinburgh, a postdoctoral researcher at the Institute of Logic and Computation of the Vienna University of Technology from 2014 until 2016, and a postdoctoral researcher at the Department of Computer Science of the University of Oxford from 2011 until 2014. He received his Ph.D. from the University of Oxford in 2011. His research interests are database theory with emphasis on knowledge-enriched and uncertain data, knowledge representation and reasoning, and computational logic and its applications to computer science. He has published more than eighty papers, most of them in leading international conferences and journals. He has served on the PCs of numerous international conferences, including the top-tier database and artificial intelligence conferences, and he has given several invited talks and tutorials.

Chapter 3

Semiring Provenance for Guarded Logics

Katrin M. Dannert and Erich Grädel

Abstract Provenance analysis aims at understanding how the result of a computational process with a complex input, consisting of multiple items, depends on the various parts of this input. Here we investigate this for the model checking problem of guarded logics on finite relational structures. Semiring provenance was originally developed for positive database query languages, to understand which combinations of the atomic facts in a database can be used for computing the result of a given query. Based on interpretations of the atomic facts not just by true or false, but by values in an appropriate semiring, one can then answer questions such as the minimal cost of a query evaluation, the confidence one can have that the result is true, or the clearance level that is required for obtaining the output. Semiring provenance was recently extended by Grädel and Tannen to logics with negation, notably first-order logic, dealing with negation by transformation into negation normal form and by semirings of polynomials with a duality on the indeterminates. Here we develop this approach further for the guarded fragment (GF), introduced by Andréka, van Benthem and Németi, based on an analysis of the associated model checking games. Guarded quantification makes it possible to control the complexity of the semiring computations, since one has to take sums or products only over those tuples of elements that appear in the guards. Finally, we extend our provenance analysis to the more powerful guarded negation fragment of first-order logic.

Keywords Semiring provenance · Guarded logics · Modal logic · Guarded negation · Model checking games · Complexity

Supported by the DFG RTG 2236 UnRAVeL.
K. M. Dannert · E. Grädel, Mathematische Grundlagen der Informatik, RWTH Aachen University, Aachen, Germany
© Springer Nature Switzerland AG 2021. J. Madarász and G. Székely (eds.), Hajnal Andréka and István Németi on Unity of Science, Outstanding Contributions to Logic 19, https://doi.org/10.1007/978-3-030-64187-0_3

3.1 Introduction

In this paper we bring together two different areas of mathematical logic, both of which are relevant for computer science: guarded logics and provenance analysis on the basis of commutative semirings.

Guarded logics are fragments of standard logical systems such as first-order logic, fixed-point logic, or second-order logic, in which quantification is restricted in such a way that formulae can only talk about tuples whose elements are, in some sense, close together. In the standard guarded fragment of first-order logic (GF), which we shall discuss in more detail in the next section, these elements must co-exist in some atomic fact. In a graph this would mean that formulae can only refer to single nodes and to edges (or inverse edges), but neither to triples, quadruples etc. of elements, nor to pairs of nodes that are not adjacent. But of course, guarded logics are not restricted to graphs, and co-existence in an atomic fact becomes more interesting and more general in structures with relations of larger arity. Further, there are also guarded logics where the notion of “close together” has a more general meaning, for instance that the tuple induces a clique in the Gaifman graph of the structure. A further rather powerful extension is based on the idea of guarding negation rather than quantification. A strong reason for studying guarded logics is that they have very interesting and convenient algorithmic and model-theoretic properties. In this paper, we shall just consider guarded fragments of first-order logic, but in principle, our approach also extends to guarded fixed-point logics.

Provenance analysis, on the other hand, is an approach that was originally developed in database theory. It aims at understanding how the result of a computational process with a complex input, consisting of multiple items, depends on the various parts of this input.
Specifically, provenance analysis based on interpretations in commutative semirings has been developed for positive database query languages, to understand which combinations of the atomic facts in a database can be used for deriving the result of a given query. In this approach, atomic facts are interpreted not just by true or false, but by values in an appropriate semiring. These values are then propagated from the atomic facts to arbitrary queries in the language, which makes it possible to answer questions such as the minimal cost of a query evaluation, the confidence one can have that the result is true, or the clearance level that is required for obtaining the output. Semiring provenance has recently been extended by Grädel and Tannen [12, 13] to logics with negation, notably first-order logic, dealing with negation by transformation into negation normal form and by semirings of polynomials with a duality on the indeterminates. Here we develop this approach further for the guarded fragment (GF), introduced by Andréka, van Benthem and Németi, as well as for the guarded negation fragment (GNF). Guarded quantification makes it possible to control the complexity of the semiring computations, since one has to take sums or products only over those tuples of elements that appear in the guards.

Provenance analysis of logics is intimately connected to provenance analysis of games. In the same way as formula evaluation or model checking can be formulated in game theoretic terms, the propagation of provenance values from atomic facts to arbitrary formulae can also be viewed as a process on the associated games. Moreover, provenance analysis of games is of independent interest, and provenance values of positions in a game provide detailed information about the number and properties of the strategies of the players, far beyond the question whether a player has a winning strategy from a given position. We discuss provenance of games here just in terms of a particularly simple case of games, namely finite acyclic games, which are sufficient for first-order logic and its fragments. Our approach relates the provenance analysis of modal and guarded logics to the provenance analysis of the associated games.

3.2 Modal Logic and the Guarded Fragment

The guarded fragment (GF) of first-order logic was introduced by Andréka, van Benthem and Németi in [1]. It is defined by restricting existential and universal quantification in such a way that formulae can only refer to guarded tuples, i.e., tuples of elements that occur together in some atomic fact. Syntactically, this means we consider first-order formulae over some relational vocabulary τ where quantifiers can be used only in the form ∃ȳ(α ∧ φ) or ∀ȳ(α → φ), where α is an atomic formula that must contain all free variables of φ (and possibly more). The atom α is called the guard of the quantification. If φ contains only a single free variable x, then the guard may be the equality x = x.

An important motivation for introducing the guarded fragment was to explain and generalize the good algorithmic and model-theoretic properties of modal logics (see [6] for background on modal logic). Recall that the basic modal logic ML can be viewed as a fragment of first-order logic, via the standard translation that takes every modal formula ψ ∈ ML to a first-order formula ψ*(x) with one free variable, such that for every Kripke structure K with a distinguished node w we have that K, w |= ψ if, and only if, K |= ψ*(w). This translation takes an atomic proposition P to the atom Px, it commutes with the Boolean connectives, and it translates the modal operators by quantifiers as follows: for ψ = ♦φ, we have ψ*(x) := ∃y(Exy ∧ φ*(y)), and ψ = □φ is translated into ψ*(x) := ∀y(Exy → φ*(y)), with the binary relation symbol E describing the accessibility between different nodes of the Kripke structure. The modal fragment of first-order logic is the image of propositional modal logic under this translation. Notice that formulae in the modal fragment can be written using just two variables x and y. The formula φ*(y), used in the translation of the modal operators, is obtained from φ*(x) by exchanging all occurrences of x by y and all occurrences of y by x. Further, the translation of modal logic into first-order logic uses only guarded quantification, so we see immediately that the modal fragment is contained in GF.

The guarded fragment generalizes the modal fragment by dropping the restrictions to use only two variables and only monadic and binary predicates, and retains only the restriction that quantifiers must be guarded. It has turned out that almost all important algorithmic and model-theoretic properties of modal logic extend to the guarded fragment. In particular, the following properties of GF have been demonstrated in [1, 9]:

1. The satisfiability problem for GF is decidable.
2. GF has the finite model property, i.e., every satisfiable formula in the guarded fragment has a finite model.
3. GF has a generalized variant of the tree model property: every satisfiable formula of the guarded fragment has a model of small tree width.
4. The notion of equivalence under guarded formulae can be characterized by a straightforward generalization of bisimulation, called guarded bisimulation. See [11] for a detailed discussion of guarded bisimulations in various contexts.

One aspect where, at first sight, modal logic and the guarded fragment seem to differ is the complexity of the satisfiability problem. It is well-known that satisfiability for ML is a Pspace-complete problem [15], whereas we have shown in [9] that the satisfiability problem for GF is complete for 2Exptime, the class of problems solvable by a deterministic algorithm in time 2^{2^{p(n)}}, for some polynomial p(n). But the reason for the double exponential time complexity of GF is essentially just the fact that predicates may have unbounded arity (whereas ML only expresses properties of labelled graphs). Given that even a single predicate of arity n over a domain of just two elements leads to 2^{2^n} possible types already on the atomic level, the double exponential lower complexity bound is hardly a surprise. Further, in most of the potential applications of guarded logics the arity of the relation symbols is bounded. But for GF-sentences of bounded arity, the satisfiability problem can be decided in Exptime [9], which is a complexity level that is reached already for rather weak extensions of ML (e.g. by a universal modality) [16]. Thus, the complexity analysis does not really reveal a fundamental difference between modal and guarded logic, beyond the difference caused by the much wider scope of guarded formulae.

There is a further very important point that Moshe Vardi has called the robust algorithmic properties of modal logic [17].
The basic modal logic ML is a rather weak logic and the really interesting modal logics, as far as applications in computer science are concerned, extend ML by features such as path quantification, temporal operators, least and greatest fixed points etc. Many of these extended modal logics are algorithmically still rather well manageable and actually of considerable practical importance. The most interesting of these extensions is the modal μ-calculus L_μ, which extends ML by least and greatest fixed points and subsumes most of the modal logics used for automatic verification including CTL, LTL, CTL*, and PDL. The satisfiability problem for L_μ is known to be decidable and complete for Exptime [8].

It has turned out that the guarded fragment shares this robustness of modal logic. If we extend GF by similar features as modal logic, in particular by least and greatest fixed points, we still get decidable logics, and in fact we do not even pay a price in terms of the complexity classes in which we can place the satisfiability problem. Indeed, as we have shown in [14], the satisfiability problem for the guarded fixed point logic μGF is decidable and 2Exptime-complete. For guarded fixed point sentences of bounded width the satisfiability problem is Exptime-complete. By the width of a formula ψ, we mean the maximal number of free variables in the subformulae of ψ. For sentences that are guarded in the sense of GF, the width is bounded by the maximal arity of the relation symbols, but there are other variants of guarded logics where the width may be larger. Note that for guarded fixed point sentences of bounded width the complexity level is the same as for the μ-calculus and for GF without fixed points.

Based on all these results it is indeed fair to say that it is the guarded nature of quantification that is the main reason for the good model-theoretic and algorithmic properties of modal logics. For more details, see [10], which can be seen as an answer to [17].
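The standard translation of modal logic into the two-variable guarded fragment, as described in this section, can be sketched in a few lines. This is only an illustration under assumptions: the class names (`Prop`, `Not`, `And`, `Diamond`, `Box`) and the function `standard_translation` are hypothetical, and formulae are rendered as strings rather than as syntax trees.

```python
from dataclasses import dataclass


# A tiny syntax for modal formulae (hypothetical names, for illustration only).
@dataclass(frozen=True)
class Prop:
    name: str


@dataclass(frozen=True)
class Not:
    sub: object


@dataclass(frozen=True)
class And:
    left: object
    right: object


@dataclass(frozen=True)
class Diamond:
    sub: object


@dataclass(frozen=True)
class Box:
    sub: object


def standard_translation(psi, x="x", y="y"):
    """Return the first-order formula psi*(x) as a string.

    Only the two variables x and y are used: each modal operator
    introduces a guarded quantifier and swaps the roles of x and y.
    """
    if isinstance(psi, Prop):
        return f"{psi.name}{x}"
    if isinstance(psi, Not):
        return f"¬{standard_translation(psi.sub, x, y)}"
    if isinstance(psi, And):
        left = standard_translation(psi.left, x, y)
        right = standard_translation(psi.right, x, y)
        return f"({left} ∧ {right})"
    if isinstance(psi, Diamond):
        # The atom E x y is the guard of the existential quantifier.
        return f"∃{y}(E{x}{y} ∧ {standard_translation(psi.sub, y, x)})"
    if isinstance(psi, Box):
        # The atom E x y is the guard of the universal quantifier.
        return f"∀{y}(E{x}{y} → {standard_translation(psi.sub, y, x)})"
    raise TypeError(f"not a modal formula: {psi!r}")


print(standard_translation(Diamond(Box(Prop("P")))))
```

Every quantifier produced this way is relativized to an E-atom containing both variables, which is why the image of the translation lies inside GF.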

3.3 Semiring Provenance for First-Order Logic and Acyclic Games

We present a brief survey on the use of commutative semirings for provenance analysis, both for first-order logic and for acyclic finite games.

3.3.1 Commutative Semirings

Definition 1 A semiring is an algebraic structure (K, +, ·, 0, 1), with 0 ≠ 1, such that (K, +, 0) is a commutative monoid, (K, ·, 1) is a monoid, · distributes over +, and 0 · a = a · 0 = 0. The semiring is commutative if · is commutative, and it is idempotent if + is idempotent. All semirings considered in this paper are commutative.

Elements of a commutative semiring will be used as truth values for logical statements and as values for positions in games. The intuition is that + describes the alternative use of information, as in disjunctions or existential quantifications, or for different possible choices of a player in a game, whereas · stands for the joint use of information, as in conjunctions or universal quantifications, or for choices in a game that are controlled by the opponent of the given player. Further, 0 is the value of false statements or losing positions, whereas any element a ≠ 0 of a semiring K stands for a “nuanced” interpretation of true or as a value of a non-losing position.

Examples Every distributive lattice is an idempotent commutative semiring. Here are some commutative semirings of interest to us:
1. The Boolean semiring B = (B, ∨, ∧, ⊥, ⊤) is the standard habitat of logical truth.
2. N = (N, +, ·, 0, 1) is of importance for multiset semantics in logic and databases. We also use it here for counting winning strategies in model-checking games.
3. T = (R_+^∞, min, +, ∞, 0) is called the tropical semiring and is idempotent but not a distributive lattice. It is used for cost-analysis in many areas of computer science. It is important also for analysing the costs of strategies in games.
4. V = ([0, 1], max, ·, 0, 1) is called the Viterbi semiring and is isomorphic to T via x ↦ e^{−x} and y ↦ −ln y. We think of the elements of V as confidence scores, for instance for the truth of a given statement, or the confidence of an agent that she can win a game from a given position.
5. For any set X, the semiring N[X] = (N[X], +, ·, 0, 1) consists of the multivariate polynomials in indeterminates from X and with coefficients from N. This is the commutative semiring that is freely generated by the set X. It is used for a general form of provenance.
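The first four example semirings can be written down directly as records of operations. The sketch below is only an illustration: the `Semiring` record, the constant names, and the helper `check_axioms` are assumptions, not notation from the text, and the check only spot-tests some of the axioms of Definition 1 (it omits associativity) on a finite sample.

```python
import math
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Semiring:
    """A commutative semiring (K, +, ·, 0, 1) given by its operations."""
    name: str
    add: Callable[[Any, Any], Any]
    mul: Callable[[Any, Any], Any]
    zero: Any
    one: Any


BOOLEAN = Semiring("B", lambda a, b: a or b, lambda a, b: a and b, False, True)
NATURAL = Semiring("N", lambda a, b: a + b, lambda a, b: a * b, 0, 1)
TROPICAL = Semiring("T", min, lambda a, b: a + b, math.inf, 0.0)
VITERBI = Semiring("V", max, lambda a, b: a * b, 0.0, 1.0)


def check_axioms(K, samples):
    """Spot-check neutral elements, annihilation, commutativity, distributivity."""
    for a in samples:
        assert K.add(a, K.zero) == a
        assert K.mul(a, K.one) == a and K.mul(K.one, a) == a
        assert K.mul(a, K.zero) == K.zero and K.mul(K.zero, a) == K.zero
        for b in samples:
            assert K.add(a, b) == K.add(b, a)
            assert K.mul(a, b) == K.mul(b, a)
            for c in samples:
                assert K.mul(a, K.add(b, c)) == K.add(K.mul(a, b), K.mul(a, c))
    return True


def to_viterbi(x):
    """The isomorphism x -> e^(-x) from the tropical to the Viterbi semiring."""
    return math.exp(-x)


check_axioms(TROPICAL, [0.0, 1.0, 2.5, math.inf])
print(to_viterbi(TROPICAL.zero), to_viterbi(TROPICAL.one))
```

Representing a semiring as a record of functions keeps later evaluation code independent of the particular choice of K, which matches the way the same formula is valued in different semirings.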

3.3.2 Provenance for First-Order Logic

Given a finite relational vocabulary τ and a finite non-empty universe A, we denote by Atoms_A(τ) the set of all atoms Rā with R ∈ τ and ā ∈ A^k. Further let NegAtoms_A(τ) be the set of all negated atoms ¬Rā of the facts in Atoms_A(τ), and consider the set of all τ-literals on A,

Lit_A(τ) := Atoms_A(τ) ∪ NegAtoms_A(τ) ∪ {a op b : a, b ∈ A},

where op stands for = or ≠.

Definition Given any commutative semiring K, a K-interpretation (for τ and A) is a function π : Lit_A(τ) → K that maps all equality and inequality literals to their truth values 0 or 1.

As defined in [12], a semiring interpretation extends to a full valuation π : FO(τ) → K mapping any fully instantiated formula ψ(ā) (or equivalently, any first-order sentence of vocabulary τ ∪ A) to a value π[[ψ]], by setting

\begin{align*}
\pi[\![\psi \lor \varphi]\!] &:= \pi[\![\psi]\!] + \pi[\![\varphi]\!], &
\pi[\![\psi \land \varphi]\!] &:= \pi[\![\psi]\!] \cdot \pi[\![\varphi]\!], \\
\pi[\![\exists x\, \varphi(x)]\!] &:= \sum_{a \in A} \pi[\![\varphi(a)]\!], &
\pi[\![\forall x\, \varphi(x)]\!] &:= \prod_{a \in A} \pi[\![\varphi(a)]\!].
\end{align*}

Negation is handled via negation normal forms: we set $\pi[\![\neg\varphi]\!] := \pi[\![\mathrm{nnf}(\neg\varphi)]\!]$ where $\mathrm{nnf}(\varphi)$ is the negation normal form of $\varphi$.

Definition A semiring interpretation $\pi : \mathrm{Lit}_A(\tau) \to K$ is model-defining if for every atom $\varphi \in \mathrm{Atoms}_A(\tau)$ one of $\pi(\varphi)$ and $\pi(\neg\varphi)$ is 0, and the other is $\neq 0$. It uniquely defines the $\tau$-structure $\mathfrak{A}_\pi$ that has universe $A$, and in which precisely those literals $\varphi$ are true for which $\pi(\varphi) \neq 0$.

This definition is motivated by the interpretation described above, that a provenance value of 0 describes a false statement, whereas a non-zero value indicates some nuance of truth. Notice that, if $K$ is not the Boolean semiring, then several different $K$-interpretations may define the same structure. Further, $K$-interpretations are interesting, and have a number of applications, also in cases where they do not specify a single model; see [12] and the references given there.

Such valuations of first-order logic in a semiring $K$ do in fact have an equivalent definition in terms of $K$-valuations of the usual model-checking games for first-order formulae. We next discuss the provenance approach to games, focussing for simplicity just on the case of acyclic games.
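The inductive definition of the valuation $\pi[\![\cdot]\!]$ in Sect. 3.3.2 translates almost literally into a recursive evaluator. The sketch below assumes a hypothetical encoding of formulae in negation normal form as nested tuples and of a $K$-interpretation as a dictionary on literals; neither encoding is taken from the text. The example instantiates it with $K = \mathbb{N}$ on a two-element universe and a single unary relation $P$.

```python
from functools import reduce


def evaluate(phi, pi, universe, add, mul, zero, one, env=None):
    """Compute pi[[phi]] for a formula phi in negation normal form.

    Hypothetical encoding (not from the paper):
      ("lit", positive, R, variables), ("or", f, g), ("and", f, g),
      ("exists", x, f), ("forall", x, f).
    """
    env = env or {}
    rec = lambda f, e: evaluate(f, pi, universe, add, mul, zero, one, e)
    op = phi[0]
    if op == "lit":
        _, positive, rel, variables = phi
        return pi[(positive, rel, tuple(env[x] for x in variables))]
    if op == "or":       # alternative use of information: +
        return add(rec(phi[1], env), rec(phi[2], env))
    if op == "and":      # joint use of information: ·
        return mul(rec(phi[1], env), rec(phi[2], env))
    if op not in ("exists", "forall"):
        raise ValueError(f"unknown connective: {op}")
    _, x, body = phi
    values = (rec(body, {**env, x: a}) for a in universe)
    if op == "exists":   # sum over all elements of the universe
        return reduce(add, values, zero)
    return reduce(mul, values, one)  # product over all elements


# A model-defining interpretation in N: P holds at 0 and fails at 1.
universe = [0, 1]
pi = {(True, "P", (0,)): 2, (False, "P", (0,)): 0,
      (True, "P", (1,)): 0, (False, "P", (1,)): 3}
nat = dict(add=lambda a, b: a + b, mul=lambda a, b: a * b, zero=0, one=1)
P, not_P = ("lit", True, "P", ("x",)), ("lit", False, "P", ("x",))

some_P = evaluate(("exists", "x", P), pi, universe, **nat)
all_P_or_not_P = evaluate(("forall", "x", ("or", P, not_P)), pi, universe, **nat)
all_P = evaluate(("forall", "x", P), pi, universe, **nat)
```

In this instance the true sentences receive non-zero values and the false sentence $\forall x\, Px$ receives 0, in line with the intended reading of 0 as falsity.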

3.3.3 Provenance Analysis for Acyclic Games

In this section we briefly describe the provenance approach to games as developed in [13], restricting attention to two-player turn-based games on acyclic directed graphs. Such a game is defined by the game graph on which it is played, and by the objectives of the players.

Definition A game graph is a structure $\mathcal{G} = (V, V_0, V_1, T, E)$, where $V = V_0 \cup V_1 \cup T$ is the set of positions, partitioned into the sets $V_0, V_1$ of the two players and the set $T$ of terminal positions, and where $E \subseteq V \times V$ is the set of moves. In the games considered here, the underlying graph $G = (V, E)$ is always acyclic and finite. We denote the set of immediate successors of a position $v$ by $vE := \{w : (v, w) \in E\}$ and require that $vE = \emptyset$ if, and only if, $v \in T$. A play from an initial position $v_0$ is a path $v_0 v_1 v_2 \dots v_m$ through $\mathcal{G}$ where the successor $v_{i+1} \in v_i E$ is chosen by Player 0 if $v_i \in V_0$ and by Player 1 if $v_i \in V_1$. A play ends when it reaches a terminal node $v_m \in T$.

A strategy for a player in a game is a function that selects moves at points that are controlled by that player. A strategy need not be defined at all positions of a player, but it must be closed in the sense that it defines a move from each position that is reachable by a play that is admitted by the strategy. There are several possibilities to define the notion of a strategy formally. For our purposes it is convenient to identify a strategy with the histories of plays that it admits.

Definition For every game graph $\mathcal{G} = (V, V_0, V_1, T, E)$, and every initial position $v_0 \in V$, the tree unraveling of $\mathcal{G}$ from $v_0$ is the game tree $\mathcal{T}(\mathcal{G}, v_0)$ of all finite paths from $v_0$. More precisely, $\mathcal{T}(\mathcal{G}, v_0) = (V^\#, V_0^\#, V_1^\#, T^\#, E^\#)$, where $V^\#$ is the set of finite paths $\pi = v_0 v_1 \dots v_m$ from $v_0$ through $\mathcal{G}$, with $V_\sigma^\# = \{\pi v \in V^\# : v \in V_\sigma\}$, $T^\# = \{\pi t \in V^\# : t \in T\}$, and $E^\# = \{(\pi v, \pi v v') : (v, v') \in E\}$.
For most game-theoretic considerations, the games played on $\mathcal{G}$ and its unravelings are equivalent, via the canonical projection $\rho : \mathcal{T}(\mathcal{G}, v_0) \to \mathcal{G}$ that maps every path $\pi v$ to its end point $v$.

The elements of $\mathcal{T}(\mathcal{G}, v_0)$ are the finite initial segments or histories of all possible plays of $\mathcal{G}$ that start at $v_0$. A strategy of a player can now be viewed as an appropriate subtree of $\mathcal{T}(\mathcal{G}, v_0)$.


Definition A strategy of Player $\sigma$ from $v_0$ in a game $\mathcal{G}$ is a subtree of $\mathcal{T}(\mathcal{G}, v_0)$, of the form $S = (W, F)$ with $W \subseteq V^\#$ and $F \subseteq (W \times W) \cap E^\#$, satisfying the following conditions:
(1) $W$ is closed under predecessors: if $\pi v \in W$, then also $\pi \in W$.
(2) If $\pi v \in W \cap V_\sigma^\#$, then $|(\pi v)F| = 1$.
(3) If $\pi v \in W \cap V_{1-\sigma}^\#$, then $(\pi v)F = (\pi v)E^\#$.
A strategy can also be viewed as a function $S : W \cap V_\sigma^\# \to V$ such that $S(\pi v) \in vE$ defines the node to which Player $\sigma$ moves from $\pi v$.

Here $W$ is the part of $\mathcal{T}(\mathcal{G}, v_0)$ on which the strategy is defined, and $F$ is the set of moves that are admitted by the strategy. We define $\mathrm{Strat}_\sigma(v_0)$ to denote the set of all strategies of Player $\sigma$ from $v_0$. A strategy $S \in \mathrm{Strat}_\sigma(v_0)$ induces the set $\mathrm{Plays}(S)$ of those plays from $v_0$ whose moves are consistent with $S$. We call $S$ well-founded if it does not admit any infinite plays; this is always the case on finite acyclic game graphs, but need not be the case otherwise. The set of possible outcomes of a strategy $S$ is the set of terminal nodes that are reachable by a play that is consistent with $S$.

Game valuations. Let $(K, +, \cdot, 0, 1)$ be a commutative semiring, and let $\mathcal{G} = (V, V_0, V_1, T, E)$ be a finite acyclic game graph. A $K$-valuation of $\mathcal{G}$ for Player $\sigma$ provides a value $f_\sigma(v) \in K$ for every position $v \in V$. Such a valuation is induced by its values on the terminal positions, i.e. by a function $f_\sigma : T \to K$, and by a valuation of the moves, i.e. by a function $h_\sigma : E \to (K \setminus \{0\})$. The function $f_\sigma : T \to K$ defines the value of every terminal position from the point of view of Player $\sigma$. Intuitively, $f_\sigma(t) = 0$ means that position $t$ is losing for Player $\sigma$. For instance, we can specify reachability objectives $T_\sigma$ by setting $f_\sigma(t) = 1$ for $t \in T_\sigma$ and $f_\sigma(t) = 0$ otherwise. But there are many other choices. The functions $h_\sigma : E \to (K \setminus \{0\})$ provide a value (or cost) for Player $\sigma$ of the moves.
In many cases valuations of moves are not relevant; we then just put $h_\sigma(vw) = 1$ for all edges $(v, w) \in E$. When the functions for the two players are identical, i.e. $h_0 = h_1$, we often omit the subscripts.

The extension of the basic valuations $f_\sigma : T \to K$ and $h_\sigma : E \to K \setminus \{0\}$ to valuations $f_\sigma : V \to K$ for all positions relies on the idea that a move from $v$ to $w$ contributes to $f_\sigma(v)$ the value $h_\sigma(vw) \cdot f_\sigma(w)$. These contributions are summed up in the case that $v$ is a position for Player $\sigma$ (i.e. when she herself chooses the successor), and multiplied in the case that $v$ is a position of the opponent (i.e. when she has to cope with any of the possible successors). Thus

$$f_\sigma(v) := \begin{cases} \sum_{w \in vE} h_\sigma(vw) \cdot f_\sigma(w) & \text{if } v \in V_\sigma \\ \prod_{w \in vE} h_\sigma(vw) \cdot f_\sigma(w) & \text{if } v \in V_{1-\sigma}. \end{cases}$$
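The recursive definition of $f_\sigma$ can be evaluated bottom-up on a finite acyclic game graph. The following is a minimal sketch under stated assumptions: the function name `game_valuation`, the dictionary-based graph encoding, and the small example game are all illustrative and not taken from the text. The same function is instantiated once with the tropical semiring (costs) and once with $\mathbb{N}$ (counting).

```python
from functools import lru_cache


def game_valuation(succ, owner, terminal_value, move_value, add, mul, player):
    """Return the valuation f_sigma for `player` on a finite acyclic game graph.

    succ[v]            list of immediate successors vE (empty for terminals)
    owner[v]           0 or 1 for non-terminal positions
    terminal_value[t]  basic valuation f_sigma(t) on terminal positions
    move_value(v, w)   basic valuation h_sigma(vw) of the move, never the semiring 0
    """
    @lru_cache(maxsize=None)          # each position is evaluated only once
    def f(v):
        if not succ[v]:
            return terminal_value[v]
        contributions = [mul(move_value(v, w), f(w)) for w in succ[v]]
        # Sum at the player's own positions, product at the opponent's.
        combine = add if owner[v] == player else mul
        result = contributions[0]
        for c in contributions[1:]:
            result = combine(result, c)
        return result
    return f


# Hypothetical example: Player 0 moves at a, Player 1 moves at b.
succ = {"a": ["b", "t1"], "b": ["t2", "t3"], "t1": [], "t2": [], "t3": []}
owner = {"a": 0, "b": 1}

# Tropical semiring (min, +): every move costs 1, terminals have their own costs.
cost = game_valuation(succ, owner, {"t1": 7.0, "t2": 1.0, "t3": 2.0},
                      lambda v, w: 1.0, min, lambda x, y: x + y, player=0)

# Natural numbers (+, ·) with all basic values 1: counts Player 0's strategies.
count = game_valuation(succ, owner, {"t1": 1, "t2": 1, "t3": 1},
                       lambda v, w: 1, lambda x, y: x + y, lambda x, y: x * y, player=0)
```

In this example the strategy that moves from a to b admits three moves of cost 1 and the outcomes t2 and t3, for a total cost of 6, whereas moving directly to t1 costs 8; the tropical valuation at a returns the smaller of the two.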

An equivalent characterization of the provenance values f σ (v) can be obtained by defining provenance values for plays and strategies.


Definition For a play $x = v_0 v_1 \dots v_m$ from $v_0$ to a terminal node $v_m$, we define its valuation for Player $\sigma$ as $f_\sigma(x) := h_\sigma(v_0 v_1) \cdots h_\sigma(v_{m-1} v_m) \cdot f_\sigma(v_m)$. Let now $S = (W, F) \subseteq \mathcal{T}(\mathcal{G}, v_0)$ be a strategy for Player $\sigma$ from $v_0$ and $\rho_S : (W, F) \to (V, E)$ be the restriction of the canonical homomorphism $\rho : \mathcal{T}(\mathcal{G}, v_0) \to \mathcal{G}$ to $S$. For any position $v \in V$ and any move $e \in E$, the values $\#_S(v) := |\rho_S^{-1}(v)|$ and $\#_S(e) := |\rho_S^{-1}(e)|$ indicate how often the position $v$ and the move $e$ appear in the strategy $S$. We then define the provenance value of $S \in \mathrm{Strat}_\sigma(v_0)$ as

$$F(S) := \prod_{e \in E} h_\sigma(e)^{\#_S(e)} \cdot \prod_{v \in T} f_\sigma(v)^{\#_S(v)}.$$

Theorem For any commutative semiring $K$ and any finite acyclic game $\mathcal{G}$, let $f_\sigma : V \to K$ be the provenance valuation for Player $\sigma$, induced by the valuation $f_\sigma : T \to K$ of the terminal nodes and $h_\sigma : E \to K \setminus \{0\}$ of the moves. Then, for every position $v$,

$$f_\sigma(v) = \sum_{S \in \mathrm{Strat}_\sigma(v)} F(S).$$

If $h_\sigma(e) = 1$ for all moves $e \in E$, or if the underlying semiring is multiplicatively idempotent (i.e. $a^2 = a$ for all $a$), then we further have that

$$f_\sigma(v) = \sum_{S \in \mathrm{Strat}_\sigma(v)} \prod_{x \in \mathrm{Plays}(S)} f_\sigma(x).$$

Example: Cost of strategies. Given a game $\mathcal{G}$, we associate for Player 0 cost functions $f_0 : T \to \mathbb{R}_+$ and $h : E \to \mathbb{R}_+$ for the terminal positions and the moves. We define the cost of a strategy $S \in \mathrm{Strat}_0(v)$ as the sum of the costs of all moves and outcomes that it admits, weighted by the number of their occurrences.

Proposition The cost of an optimal strategy from $v$ in $\mathcal{G}$ is given by the valuation $f_0(v)$ in the tropical semiring $\mathbb{T} = (\mathbb{R}_+^\infty, \min, +, \infty, 0)$.

Similarly, we can use game valuations in appropriate semirings for computing confidence scores for positions (describing the confidence of a player to win from that position) or minimal clearance levels that a player needs to win from a position, assuming that the possible moves have access restrictions ("confidential", "secret", "top secret"). For details, see [13].

Counting winning strategies. Consider a game graph $\mathcal{G} = (V, V_0, V_1, T, E)$ with a set $T$ of terminal positions and with trivial valuations for the moves, i.e. $h_\sigma(vw) = 1$ for all edges $(v, w)$. A general provenance analysis for acyclic games is based on the semiring $\mathbb{N}[T]$ of polynomials over the indeterminates $t \in T$, the semiring that is freely generated by the set of terminal positions.


We define $f_\sigma : V \to \mathbb{N}[T]$ as the valuations induced by setting $f_\sigma(t) = t$ for $t \in T$. Clearly, we can write $f_\sigma(v)$ as a sum of monomials $m \cdot t_1^{j_1} \cdots t_k^{j_k}$. This provides a detailed description of the number and properties of the strategies that Player $\sigma$ has from position $v$.

Theorem Every monomial $m \cdot t_1^{j_1} \cdots t_k^{j_k}$ in $f_\sigma(v)$ (with $m \in \mathbb{N}$ and $j_i > 0$) indicates that Player $\sigma$ has precisely $m$ strategies $S$ from $v$ with the property that the set of possible outcomes for $S$ is precisely $\{t_1, \dots, t_k\}$, and precisely $j_i$ plays that are consistent with $S$ have the outcome $t_i$.

If we fix any reachability objective $W \subseteq T$ for Player $\sigma$, we can write the polynomial $f_\sigma(v)$ as a sum $f_\sigma(v) = f_\sigma^W(v) + g_\sigma^W(v)$ where $f_\sigma^W(v)$ is the sum of those monomials that only contain indeterminates in $W$ (i.e. for which $j(t) = 0$ whenever $t \in T \setminus W$), and $g_\sigma^W(v)$ contains the rest.

Theorem For every subset $W \subseteq T$ and every $v \in V$, Player $\sigma$ has a strategy to reach $W$ from $v$ if, and only if, $f_\sigma^W(v) \neq 0$. Moreover, if

$$f_\sigma^W(v) = \sum_{j \in J} m_j \prod_{t \in W} t^{j(t)}$$

then $\sum_{j \in J} m_j$ is the number of distinct deterministic strategies from $v$ that Player $\sigma$ has for this objective.
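To illustrate the two theorems on counting strategies, the sketch below computes $f_\sigma(v)$ in $\mathbb{N}[T]$ for a small game and then extracts the part $f_\sigma^W(v)$. All names and the encoding are assumptions made for this illustration: a polynomial is a `Counter` that maps each monomial, stored as a sorted tuple of (indeterminate, exponent) pairs, to its coefficient, and every move has value 1.

```python
from collections import Counter


def poly_var(t):
    """The polynomial consisting of the single indeterminate t."""
    return Counter({((t, 1),): 1})


def poly_add(p, q):
    """Sum in N[T]: coefficients of equal monomials are added."""
    return p + q


def poly_mul(p, q):
    """Product in N[T]: exponents are added, coefficients multiplied."""
    result = Counter()
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            exps = Counter(dict(m1))
            exps.update(dict(m2))
            result[tuple(sorted(exps.items()))] += c1 * c2
    return result


def valuation(succ, owner, player, v):
    """f_sigma(v) in N[T], with f_sigma(t) = t on terminals and h = 1 on moves."""
    if not succ[v]:
        return poly_var(v)
    values = [valuation(succ, owner, player, w) for w in succ[v]]
    result = values[0]
    for p in values[1:]:
        result = poly_add(result, p) if owner[v] == player else poly_mul(result, p)
    return result


def restrict(p, W):
    """The part f^W of p: monomials that only use indeterminates from W."""
    return Counter({m: c for m, c in p.items() if all(t in W for t, _ in m)})


# Hypothetical game: Player 0 owns v and w, Player 1 owns u.
succ = {"v": ["u", "w"], "u": ["t1", "t2"], "w": ["t1", "t3"],
        "t1": [], "t2": [], "t3": []}
owner = {"v": 0, "u": 1, "w": 0}
f_v = valuation(succ, owner, 0, "v")          # t1*t2 + t1 + t3
winning = sum(restrict(f_v, {"t1", "t2"}).values())
```

Here $f_0(v) = t_1 t_2 + t_1 + t_3$: one strategy moves to u and has the outcomes $t_1$ and $t_2$, and two strategies move to w and pick a single terminal. For $W = \{t_1, t_2\}$ the coefficients of $f_0^W(v) = t_1 t_2 + t_1$ sum to 2, the number of strategies of Player 0 that reach $W$.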

3.3.4 Provenance Analysis via Model-Checking Games

Let $\mathfrak{A}$ be a finite relational $\tau$-structure and $\psi$ be a first-order formula in negation normal form. The model checking game $\mathcal{G}(\mathfrak{A}, \psi)$ is defined in the usual way. The positions are expressions $\varphi(\bar a)$, obtained from a subformula $\varphi(\bar x)$ of $\psi$ by instantiating the free variables $\bar x$ by a tuple $\bar a$ of elements of $\mathfrak{A}$. At a disjunction $(\psi \vee \varphi)$, Player 0 (Verifier) moves to either $\psi$ or $\varphi$, and at a conjunction, Player 1 (Falsifier) makes an analogous move. At a position $\exists x\, \varphi(\bar a, x)$, Verifier selects an element $b$ and moves to $\varphi(\bar a, b)$, whereas at positions $\forall x\, \varphi(\bar a, x)$ the move to the next position $\varphi(\bar a, b)$ is done by Falsifier. The terminal positions of $\mathcal{G}(\mathfrak{A}, \psi)$ are the literals in $\mathrm{Lit}_A(\tau)$. Literals $\varphi \in \mathrm{Lit}_A(\tau)$ that are true in the given structure $\mathfrak{A}$ are the winning terminal positions for Verifier in $\mathcal{G}(\mathfrak{A}, \psi)$ (and the losing ones for Falsifier); for the literals that are false in $\mathfrak{A}$ it is the other way round.

The central observation concerning these games is that, for any structure $\mathfrak{A}$ and any position $\varphi$ of a model checking game $\mathcal{G}(\mathfrak{A}, \psi)$, Verifier has a winning strategy from $\varphi$ if, and only if, $\mathfrak{A} \models \varphi$. Moreover, by duality, or by the determinacy of well-founded games, Falsifier has a winning strategy from $\varphi$ if, and only if, $\mathfrak{A} \not\models \varphi$ which, of course, is the case if, and only if, $\mathfrak{A} \models \neg\varphi$.


Provenance analysis provides a broader view on both logic and games. Notice that up to the labelling of the terminal positions as winning or losing for Verifier (Player 0) and Falsifier (Player 1), the model checking game $\mathcal{G}(\mathfrak{A}, \psi)$ only depends on the formula $\psi$ and on the universe $A$ of the structure. Thus, we have a game graph $\mathcal{G}(A, \psi)$, and separately a valuation $\pi : \mathrm{Lit}_A(\tau) \to \{0, 1\}$. From Boolean valuations of literals (and of terminal positions of games) we can now move to $K$-valuations for an arbitrary commutative semiring $K$, and study the connection between logic and games in this broader context.

Let $\pi : \mathrm{Lit}_A(\tau) \to K$ be any $K$-interpretation of the $\tau$-literals on $A$ in a semiring $K$. We can view $\pi$ as a $K$-valuation $f_0 : T \to K$ of the set of terminal positions of any model-checking game $\mathcal{G}(A, \psi)$ (for a $\tau$-structure with universe $A$ and a first-order formula $\psi$) from the point of view of Player 0. The dual valuation $f_1 : T \to K$ for Player 1 is obtained by putting $f_1(\varphi) = \pi[\![\varphi^\neg]\!]$ where $\varphi^\neg \equiv \neg\varphi$ is the complementary literal to $\varphi$. Then both $f_0$ and $f_1$ extend to valuations $f_0 : V \to K$ and $f_1 : V \to K$ of all positions of $\mathcal{G}(A, \psi)$. In particular, we obtain valuations $f_0(\psi)$ and $f_1(\psi)$ for the initial position $\psi$.

Proposition Suppose that $\pi$ is model-defining, and hence completely specifies a structure $\mathfrak{A}_\pi$. For every first-order formula $\psi$ and every position $\varphi(\bar a)$ in $\mathcal{G}(A, \psi)$ we have that $\pi[\![\varphi(\bar a)]\!] = f_0(\varphi(\bar a))$ and $\pi[\![\neg\varphi(\bar a)]\!] = f_1(\varphi(\bar a))$. In particular, $\mathfrak{A}_\pi \models \psi$ if, and only if, $f_0(\psi) \neq 0$.

3.4 Provenance Analysis for Modal Logic and the Guarded Fragment

Recall that modal logic, for a fixed vocabulary $\{P_i : i \in I\}$ of atomic propositions, is given by the grammar

$$\varphi ::= \bot \mid \top \mid P_i \mid \varphi \vee \varphi \mid \varphi \wedge \varphi \mid \neg\varphi \mid \Diamond\varphi \mid \Box\varphi.$$

A transition system (or Kripke structure) for this vocabulary is a labelled directed graph $\mathcal{K} = (V, E, (P_i)_{i \in I})$ with $E \subseteq V \times V$ and $P_i \subseteq V$, and we write $\mathcal{K}, v \models \varphi$ if $\varphi$ holds at state $v$ in the transition system $\mathcal{K}$. The set of modal literals for $V$ and $\tau$, denoted $\mathrm{MLit}_V$, consists of the atoms $P_i v$ and their negations $\neg P_i v$ (for $v \in V$, $i \in I$), and the edge atoms $Evw$ for $v, w \in V$. Note that literals $\neg Evw$ for the absence of edges are not included.

Definition Let $K$ be a semiring. A modal $K$-interpretation for $V$ is a function $\pi : \mathrm{MLit}_V \to K$.

Similar to the case of first-order logic, a modal $K$-interpretation extends to a $K$-valuation $\pi : \mathrm{ML} \times V \to K$ by


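$$\begin{aligned}
\pi[\![\bot, v]\!] &:= 0 & \pi[\![\top, v]\!] &:= 1 \\
\pi[\![P_i, v]\!] &:= \pi(P_i v) & \pi[\![\neg P_i, v]\!] &:= \pi(\neg P_i v) \\
\pi[\![\psi \vee \varphi, v]\!] &:= \pi[\![\psi, v]\!] + \pi[\![\varphi, v]\!] & \pi[\![\psi \wedge \varphi, v]\!] &:= \pi[\![\psi, v]\!] \cdot \pi[\![\varphi, v]\!] \\
\pi[\![\Diamond\varphi, v]\!] &:= \sum_{w \in vE} \pi(Evw) \cdot \pi[\![\varphi, w]\!] & \pi[\![\Box\varphi, v]\!] &:= \prod_{w \in vE} \pi(Evw) \cdot \pi[\![\varphi, w]\!] \\
\pi[\![\neg\varphi, v]\!] &:= \pi[\![\mathrm{nnf}(\neg\varphi), v]\!].
\end{aligned}$$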
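The clauses above can be read as a recursive procedure. The following sketch makes two assumptions that are not part of the text: formulae in negation normal form are encoded as nested tuples, and the successor set $vE$ is taken to be the set of those $w$ for which the dictionary `edges` contains an entry $(v, w)$. The example uses $K = \mathbb{N}$.

```python
def modal_value(phi, v, pi, edges, add, mul, zero, one):
    """Compute pi[[phi, v]] for a modal formula phi in negation normal form.

    Hypothetical encoding: ("bot",), ("top",), ("atom", P), ("natom", P),
    ("or", f, g), ("and", f, g), ("dia", f), ("box", f).
    pi maps (kind, P, v) to the value of the literal P v or its negation;
    edges maps (v, w) to the value pi(Evw) of the edge atom.
    """
    rec = lambda f, w: modal_value(f, w, pi, edges, add, mul, zero, one)
    op = phi[0]
    if op == "bot":
        return zero
    if op == "top":
        return one
    if op in ("atom", "natom"):
        return pi[(op, phi[1], v)]
    if op == "or":
        return add(rec(phi[1], v), rec(phi[2], v))
    if op == "and":
        return mul(rec(phi[1], v), rec(phi[2], v))
    if op not in ("dia", "box"):
        raise ValueError(f"unknown connective: {op}")
    # Diamond sums, box multiplies the contributions pi(Evw) * pi[[phi, w]]
    # over the successors w of v only.
    combine, acc = (add, zero) if op == "dia" else (mul, one)
    for (u, w), edge_value in edges.items():
        if u == v:
            acc = combine(acc, mul(edge_value, rec(phi[1], w)))
    return acc


# Example in N: state 0 has the successors 1 and 2; states 1 and 2 have none.
edges = {(0, 1): 2, (0, 2): 3}
pi = {("atom", "P", 1): 5, ("atom", "P", 2): 1}
nat = dict(add=lambda a, b: a + b, mul=lambda a, b: a * b, zero=0, one=1)

dia_P = modal_value(("dia", ("atom", "P")), 0, pi, edges, **nat)   # 2*5 + 3*1
box_P = modal_value(("box", ("atom", "P")), 0, pi, edges, **nat)   # (2*5) * (3*1)
```

At a state without successors the sketch returns the empty sum 0 for a diamond and the empty product 1 for a box, which matches the usual convention for sums and products over an empty index set.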

This valuation is usually not the same as the first-order valuation for the translation of modal logic formulae into first-order logic. Notice that the standard translation of modal logic into first-order logic maps a formula $\psi = \Box\varphi \in \mathrm{ML}$ to the first-order formula $\psi^*(x) := \forall y(Exy \to \varphi^*(y))$. Rewriting $\psi^*(x)$ as $\forall y(\neg Exy \vee \varphi^*(y))$ we see that a $K$-interpretation of $\psi^*(v) \in \mathrm{FO}$ requires a basic $K$-interpretation of $\mathrm{Lit}_V(\tau)$ for $\tau = \{P_i : i \in I\} \cup \{E\}$, which needs to provide values also for the negative literals $\neg Evw$. Even if we extend the given modal $K$-interpretation $\pi : \mathrm{MLit}_V \to K$ in the simplest possible way, by setting $\pi(\neg Evw) = 1$ if $\pi(Evw) = 0$ and $\pi(\neg Evw) = 0$ otherwise, the resulting valuation will have the property that

$$\pi[\![\psi^*(v)]\!] = \pi[\![\forall y(\neg Evy \vee \varphi^*(y))]\!] = \prod_{w \in V} \bigl(\pi(\neg Evw) + \pi[\![\varphi^*(w)]\!]\bigr) = \prod_{w \in vE} \pi[\![\varphi^*(w)]\!] \cdot \prod_{w \notin vE} \bigl(1 + \pi[\![\varphi^*(w)]\!]\bigr)$$

which is, in general, something quite different from the value $\pi[\![\psi, v]\!]$ of the corresponding modal $K$-interpretation. A sufficient condition for the two valuations to coincide would be that the basic modal valuation maps edge atoms only to 0 and 1, and that in the given semiring $1 \in K$ is an absorbing element, i.e. $1 + a = 1$ for all $a$.

A justification for our proposed valuation, despite the difference to first-order logic, is that it is more in line with the intuitive meaning of the modal logic formula $\Box\varphi$. When we think of the meaning of $\Box\varphi$, we think of it as "$\varphi$ holds at all successors of the current node", which corresponds to the provenance valuation as defined above. We do not usually interpret $\Box\varphi$ as the statement "for all nodes $w$ in the Kripke structure, either there is no edge to $w$ from the current node or $\varphi$ holds at $w$". This interpretation corresponds to the first-order translation of $\Box\varphi$ but it goes against the local nature of modal logic and is therefore far less intuitive.

Additionally, our proposed provenance valuation for modal logic is completely in line with $K$-valuations for the standard model-checking games for modal logic. Indeed, the model-checking game $\mathcal{G}(\mathcal{K}, \psi)$ for a transition system $\mathcal{K}$ with frame $(V, E)$ and a modal formula $\psi$ has positions $(\varphi, v)$ where $\varphi$ is a subformula of $\psi$ and $v$ is a state of $\mathcal{K}$. From positions $(\varphi_1 \vee \varphi_2, v)$, Player 0 can move to $(\varphi_1, v)$ or $(\varphi_2, v)$ and dually Player 1 moves from $(\varphi_1 \wedge \varphi_2, v)$ to either $(\varphi_1, v)$ or $(\varphi_2, v)$. At positions $(\Diamond\varphi, v)$ Player 0 can move to any position $(\varphi, w)$ such that $(v, w) \in E$, and there are analogous moves for Player 1 at positions $(\Box\varphi, v)$.
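A small worked instance may help to see how the modal valuation of $\Box\varphi$ and the valuation of its first-order translation come apart. The concrete numbers are assumptions chosen only for illustration: take $K = \mathbb{N}$, a frame with two states $u, w$ and the single edge $(u, w)$, and put $\pi(Euw) = 2$, $\pi(Pu) = 2$, $\pi(Pw) = 3$. Extending $\pi$ in the simplest possible way gives $\pi(\neg Euw) = 0$ and $\pi(\neg Euu) = 1$.

```latex
\begin{align*}
\pi[\![\Box P, u]\!] &= \pi(Euw) \cdot \pi[\![P, w]\!] = 2 \cdot 3 = 6, \\
\pi[\![\forall y(\neg Euy \vee Py)]\!]
  &= \bigl(\pi(\neg Euu) + \pi(Pu)\bigr) \cdot \bigl(\pi(\neg Euw) + \pi(Pw)\bigr)
   = (1 + 2) \cdot (0 + 3) = 9.
\end{align*}
```

Even with $\pi(Euw) = 1$ the two values would differ (3 versus 9), because 1 is not absorbing in $\mathbb{N}$: the non-successor $u$ contributes the factor $1 + \pi(Pu)$ rather than 1.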


If we define the basic valuations for Player 0 of the terminal positions by $f_0(P_i, v) = \pi(P_i v)$, $f_0(\neg P_i, v) = \pi(\neg P_i v)$, $f_0(\bot, v) = 0$ and $f_0(\top, v) = 1$, and give any move from a position $(\Diamond\varphi, v)$ or $(\Box\varphi, v)$ to $(\varphi, w)$ the value $\pi(Evw)$, then we obtain, for any formula $\psi \in \mathrm{ML}$ and every $v \in V$, a valuation $f_0(\psi, v) \in K$ such that $f_0(\psi, v) = \pi[\![\psi, v]\!]$.

We now move to a provenance analysis for the guarded fragment GF. We start with a more explicit definition of GF.

Definition Given a relational vocabulary $\tau$, the set of guarded formulae in $\mathrm{GF}(\tau)$ is defined inductively by the following rules:
(1) Every atomic $\tau$-formula belongs to $\mathrm{GF}(\tau)$;
(2) $\mathrm{GF}(\tau)$ is closed under $\wedge$, $\vee$, and $\neg$;
(3) $\mathrm{GF}(\tau)$ is closed under guarded quantification: for every formula $\varphi \in \mathrm{GF}(\tau)$, every $\tau$-atom $\alpha$ such that all free variables of $\varphi$ occur in $\alpha$, and every tuple $\bar y$ of variables occurring in $\alpha$, $\mathrm{GF}(\tau)$ contains the formulae $(\exists \bar y \,.\, \alpha)\varphi$ and $(\forall \bar y \,.\, \alpha)\varphi$.

Here (3) is the rule of guarded quantification and the atom $\alpha$ is called the guard of the quantification. The semantics of guarded quantification is defined as follows. Let $\mathrm{var}(\alpha)$ be the set of all variables occurring in the atom $\alpha$, let $\mathfrak{A}$ be a $\tau$-structure and let $s$ be a valuation, mapping the free variables of $(Q\bar y \,.\, \alpha)\varphi$ into $A$. Then we denote by $T_{s,\alpha}$ the set of all valuations $t : \mathrm{var}(\alpha) \to A$ such that $\mathfrak{A} \models_t \alpha$ and $t$ coincides with $s$ on the common variables. Then we put

$$\begin{aligned}
\mathfrak{A} \models_s (\exists \bar y \,.\, \alpha)\varphi &\iff \mathfrak{A} \models_t \varphi \text{ for some } t \in T_{s,\alpha} \\
\mathfrak{A} \models_s (\forall \bar y \,.\, \alpha)\varphi &\iff \mathfrak{A} \models_t \varphi \text{ for all } t \in T_{s,\alpha}
\end{aligned}$$

Notice that in a formula $\psi := (Q\bar y \,.\, \alpha)\varphi$, the only requirement is that $\alpha$ contains all free variables of $\varphi$. But it may contain more variables, and contrary to unguarded quantification in FO, among the free variables of $\psi$ (i.e. the variables occurring in $\alpha$ but not in $\bar y$) there may be some that do not occur in $\varphi$.
To see what happens precisely in an evaluation of a guarded formula, say in a model-checking game or when we define $K$-interpretations in a semiring $K$, it is therefore instructive to rewrite the rule of guarded quantification in a way that makes all free variables explicit. In the following, $\bar x$, $\bar x'$, $\bar y$, and $\bar z$ are disjoint (and possibly empty) sequences of variables, and the free variables of the formulae are precisely as displayed.

(3) For every formula $\varphi(\bar x \bar y) \in \mathrm{GF}(\tau)$ and every atomic $\tau$-formula $\alpha(\bar x \bar x' \bar y \bar z)$, we can build in $\mathrm{GF}(\tau)$ the formulae $\psi(\bar x \bar x') := (\exists \bar y \bar z \,.\, \alpha(\bar x \bar x' \bar y \bar z))\varphi(\bar x \bar y)$ and $\psi(\bar x \bar x') := (\forall \bar y \bar z \,.\, \alpha(\bar x \bar x' \bar y \bar z))\varphi(\bar x \bar y)$.

Thus, the natural model-checking games for GF-formulae are modifications of the first-order model checking games where, for a formula $\psi(\bar x \bar x') := (Q\bar y \bar z \,.\, \alpha)\varphi(\bar x \bar y)$, we have a move from $\psi(\bar a \bar a')$ to $\varphi(\bar a \bar b)$ for every tuple $\bar c$ such that $\alpha(\bar a \bar a' \bar b \bar c)$ holds. Notice that this means that there may actually be more than one move from $\psi(\bar a \bar a')$ to $\varphi(\bar a \bar b)$.

Let $\pi : \mathrm{Lit}_A(\tau) \to K$ be a $K$-interpretation for $A$ and $\tau$. It provides, for every terminal position $\varphi$ of a GF-model-checking game on $A$, the basic valuation $f_0(\varphi) = \pi(\varphi)$. The valuations of the moves are defined as follows. Every move associated with a disjunction or conjunction, going from $(\varphi_1 \vee \varphi_2)$ or $(\varphi_1 \wedge \varphi_2)$ to $\varphi_1$ or $\varphi_2$, has value 1. Every move associated with a guarded quantification, that is from $\psi(\bar a \bar a') := (Q\bar y \bar z \,.\, \alpha)\varphi(\bar a, \bar y)$ to $\varphi(\bar a \bar b)$, which is witnessed by the atom $\alpha(\bar a \bar a' \bar b \bar c)$, has value $\pi(\alpha(\bar a \bar a' \bar b \bar c))$. This induces a valuation $f_0(\varphi) \in K$ for every position $\varphi$ in the game. This coincides with the extension of $\pi : \mathrm{Lit}_A(\tau) \to K$ to $\pi : \mathrm{GF}(\tau) \to K$ by the straightforward induction, setting

$$\begin{aligned}
\pi[\![(\exists \bar y \bar z \,.\, \alpha(\bar a \bar a' \bar y \bar z))\varphi(\bar a, \bar y)]\!] &:= \sum_{\bar b \bar c \,:\, \mathfrak{A} \models \alpha(\bar a \bar a' \bar b \bar c)} \pi(\alpha(\bar a \bar a' \bar b \bar c)) \cdot \pi[\![\varphi(\bar a \bar b)]\!] \\
\pi[\![(\forall \bar y \bar z \,.\, \alpha(\bar a \bar a' \bar y \bar z))\varphi(\bar a, \bar y)]\!] &:= \prod_{\bar b \bar c \,:\, \mathfrak{A} \models \alpha(\bar a \bar a' \bar b \bar c)} \pi(\alpha(\bar a \bar a' \bar b \bar c)) \cdot \pi[\![\varphi(\bar a \bar b)]\!]
\end{aligned}$$

Again negation is handled via negation normal form, where $\mathrm{nnf}(\neg(\exists \bar y \,.\, \alpha)\varphi) = (\forall \bar y \,.\, \alpha)\, \mathrm{nnf}(\neg\varphi)$. As in the case of modal logic, the standard translation of guarded universal quantification into the usual first-order syntax, taking $(\forall \bar y \,.\, \alpha)\varphi$ to $\forall \bar y(\neg\alpha \vee \varphi)$, in general produces formulae that do not have the same $K$-valuations.
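The role of the extra witnesses $\bar c$ in guarded quantification can be seen in a small computation. The sketch below is an illustration under stated assumptions: it fixes $K = \mathbb{N}$, a ternary guard $R(x, y, z)$ and the body $Py$, so that it evaluates $(Q y z \,.\, Rxyz)Py$ at an element $a$. The function names and the dictionary encoding are hypothetical, and a guard fact counts as true exactly when its value is non-zero.

```python
import math


def guarded_exists(a, guard, body):
    """Sum of pi(R a b c) * pi[[P b]] over all witnesses (b, c) of the guard."""
    return sum(val * body[b]
               for (a2, b, c), val in guard.items() if a2 == a and val != 0)


def guarded_forall(a, guard, body):
    """Product of pi(R a b c) * pi[[P b]] over all witnesses (b, c) of the guard."""
    return math.prod(val * body[b]
                     for (a2, b, c), val in guard.items() if a2 == a and val != 0)


# Values of the guard atoms R(a, b, c) and of the body atoms P(b).
guard = {(0, 1, 0): 2, (0, 1, 1): 3, (0, 2, 0): 1, (1, 2, 2): 4}
body = {1: 5, 2: 7}

exists_at_0 = guarded_exists(0, guard, body)   # 2*5 + 3*5 + 1*7
forall_at_0 = guarded_forall(0, guard, body)   # (2*5) * (3*5) * (1*7)
```

The element $b = 1$ is reached through two different guard facts, $R(0, 1, 0)$ and $R(0, 1, 1)$, so it contributes twice; this corresponds to the observation that there may be more than one move from $\psi(\bar a \bar a')$ to $\varphi(\bar a \bar b)$.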

3.5 Algorithmic Analysis

It is well known that the model checking problems for both modal logic and the guarded fragment can be solved in polynomial time, whereas the corresponding problem for full first-order logic is Pspace-complete. One way to prove this, and to understand the differences between modal and guarded logic on one side, and full FO on the other side, is to compute the size of the model-checking games. An arbitrary first-order sentence $\psi$ on a finite structure $\mathfrak{A}$ has a model checking game $\mathcal{G}(\mathfrak{A}, \psi)$ of size $O(|\psi| \cdot |A|^{\mathrm{width}(\psi)})$, where $|A|$ is the number of elements of $\mathfrak{A}$ and $\mathrm{width}(\psi)$ is the maximal number of free variables in subformulae of $\psi$. However, for a formula $\psi \in \mathrm{GF}$ the size of $\mathcal{G}(\mathfrak{A}, \psi)$ is only $O(|\psi| \cdot \|\mathfrak{A}\|)$ where $\|\mathfrak{A}\|$ is the length of a natural representation of $\mathfrak{A}$ (listing all atomic facts). A similar bound applies for the size of model checking games for modal logic.

This poses the natural question whether there is a similar difference in the complexities of computing provenance values $\pi[\![\psi]\!]$, given a formula $\psi$ from ML, GF, or FO, and a $K$-interpretation $\pi : \mathrm{Lit}_A(\tau) \to K$, for some fixed semiring $K$.


Of course, this question may strongly depend on the semiring $K$ that we consider, and how we measure the complexity of addition and multiplication in $K$. In the Boolean semiring we simply count the number of operations needed to compute the truth value of a given formula. We can take an analogous approach in an arbitrary semiring and just count the number of arithmetic operations needed to compute a provenance value. This would mean that we abstract from the computational difficulties that arise from representing semiring elements as words over some fixed alphabet and computing sums and products on such representations. This unit cost model is certainly appropriate for finite semirings, but gives relevant insights also in other cases, specifically for an abstract approach over, say, uncountable semirings.

In the unit cost model, the total cost $|\pi|$ of a semiring interpretation $\pi : \mathrm{Lit}_A(\tau) \to K$ is just the number of literals in $\mathrm{Lit}_A(\tau)$. It is easy to see that provenance values for positions in an acyclic game $\mathcal{G}$ can be computed with $O(|\mathcal{G}|)$ semiring operations (since every edge of the game graph needs to be processed only once).

Proposition Let $K$ be an arbitrary semiring. Given a formula $\psi \in \mathrm{ML}$ or $\psi \in \mathrm{GF}$ and a corresponding $K$-interpretation $\pi : \mathrm{Lit}_A(\tau) \to K$, the provenance value $\pi[\![\psi]\!]$ can be computed with $O(|\psi| \cdot |\pi|)$ semiring operations.

It is very unlikely that this also holds for arbitrary first-order formulae since, by taking the Boolean semiring, and by the Pspace-completeness of first-order model checking, this would imply that P = Pspace.

Of course, the unit cost model is unrealistic for many algorithmic applications, in particular if we are interested in practical complexity considerations for computing provenance values in $\mathbb{N}$, or in the semiring of polynomials $\mathbb{N}[X]$.
We therefore aim at a more general approach, assuming that we have a semiring $K$ together with cost functions $|\cdot| : K \to \mathbb{N}$ for the elements and $|\cdot|_+ : K \times K \to \mathbb{N}$ and $|\cdot|_\cdot : K \times K \to \mathbb{N}$ for the two semiring operations. We always assume that $|a + b| \le |(a, b)|_+$ and $|a \cdot b| \le |(a, b)|_\cdot$ for all $a, b \in K$. In fact, over any semiring, one possibility of cost functions for $+$ and $\cdot$ is to simply take the element costs of the sum and the product, respectively.

In the case of $K = \mathbb{N}$, one approach is to set $|a| = a$, and then $|(a, b)|_+ = a + b$ and $|(a, b)|_\cdot = ab$, which are the cost functions for the unary representation of natural numbers. Another, in most cases more natural, possibility is the logarithmic cost model, with $|a| = \log a$, and the cost of addition and multiplication as $|(a, b)|_+ := \max\{|a|, |b|\} + 1$ and $|(a, b)|_\cdot := |a| + |b|$. Here $|(a, b)|_+$ and $|(a, b)|_\cdot$ are upper bounds for $|a + b|$ and $|a \cdot b|$. For the semiring $\mathbb{N}[X]$ of polynomials over $X$ we can define the cost of a monomial as $\bigl|a \prod_{x \in X} x^{e_x}\bigr| := \log a + \sum_{x \in X} \log e_x$ and the cost of a polynomial as the sum of the costs of its monomials or, alternatively, set the cost of each monomial to one and only consider the number of distinct monomials in a polynomial. In general there are several natural cost functions for a semiring, and the unit cost model (setting all costs to 1) is one of them.
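The logarithmic cost model on $\mathbb{N}$ can be checked mechanically on small inputs. The sketch below is illustrative only: it uses the bit length of a number as an integer stand-in for $\log a$, which is an assumption of this example and not a definition from the text, and the function names are hypothetical.

```python
def log_cost(a: int) -> int:
    """Element cost |a|: the bit length of a, used here as a proxy for log a."""
    return a.bit_length()


def log_add_cost(a: int, b: int) -> int:
    """Operation cost |(a, b)|_+ := max{|a|, |b|} + 1."""
    return max(log_cost(a), log_cost(b)) + 1


def log_mul_cost(a: int, b: int) -> int:
    """Operation cost |(a, b)|_. := |a| + |b|."""
    return log_cost(a) + log_cost(b)


def bounds_hold(limit: int) -> bool:
    """Check |a + b| <= |(a, b)|_+ and |a * b| <= |(a, b)|_. for all a, b < limit."""
    return all(log_cost(a + b) <= log_add_cost(a, b)
               and log_cost(a * b) <= log_mul_cost(a, b)
               for a in range(limit) for b in range(limit))
```

With this proxy, both operation costs are bounded by $|a| + |b| + 1$, so the cost of a sum or a product grows at most additively in the costs of its arguments.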


We call a cost function $|\cdot| : K \to \mathbb{N}$ additively bounded if $|a + b|, |a \cdot b| = O(|a| + |b|)$, and multiplicatively bounded if $|a + b|, |a \cdot b| = O(|a| \cdot |b|)$. Clearly, the logarithmic cost model on $\mathbb{N}$ is additively bounded whereas the costs of unary representations are multiplicatively bounded.

Given a semiring $K$ with associated cost functions, one important complexity parameter for a semiring interpretation $\pi : \mathrm{Lit}_A(\tau) \to K$ is the maximal cost of its values, i.e. $\max \pi := \max\{|\pi(\alpha)| : \alpha \in \mathrm{Lit}_A(\tau)\}$. Consider any first-order formula $\psi \in \mathrm{FO}(\tau)$ and let $d(\psi)$ be the nesting depth of the logical operators in $\psi$ or, equivalently, the maximal length of plays in model checking games for $\psi$. We can calculate upper bounds for the costs of provenance values $\pi[\![\psi]\!]$ by looking, again, at the associated model checking game. At the terminal nodes, the costs are bounded by $\max \pi$. Each non-terminal node has at most $|A|$ immediate successors, so we perform a sum or product of at most $|A|$ values that may appear at lower levels.

Proposition Let $K$ be an arbitrary semiring with an associated additively bounded cost function, and let $\pi : \mathrm{Lit}_A(\tau) \to K$ be a semiring interpretation with $m = \max \pi$ and $n = |A|$. For any first-order formula $\psi$ of depth $d = d(\psi)$, we then have that $|\pi[\![\psi]\!]| \le m \cdot n^d$. In the case that the cost function of $K$ is multiplicatively bounded, we instead have that $|\pi[\![\psi]\!]| \le m^{n^d}$.

We claim that the maximal size bounds of Proposition 3.5 can actually be realized, and in fact even by formulae of modal logic and hence also by guarded formulae in GF. For proper comparison, a modal or guarded quantification should count as an operation of depth two, since provenance values for $(\Diamond\varphi, v)$ or $(\Box\varphi, v)$ take into account values of edges $(v, w)$ given by a modal $K$-interpretation; similarly at a guarded quantification the values of the guard atoms are used in the computation of the provenance values.
Let $\pi$ be a modal $K$-interpretation on a completely connected frame with $n$ nodes giving to all $P$-atoms and all edges the same value $a$, i.e. $\pi(Pi) = \pi(Eij) = a$ for all $i, j < n$. It follows that

$$\pi[\![\Box^k P, i]\!] = a^{n^{2k}}.$$

Taking, for instance, $K = \mathbb{N}$ with an additively bounded cost measure, putting $m = |a| = \max \pi$ and $d = 2k$ we indeed get $|\pi[\![\Box^k P, i]\!]| = m \cdot n^d$, and for a multiplicatively bounded cost measure (such as unary representation), we get $|\pi[\![\Box^k P, i]\!]| = m^{n^d}$.

This exponential cost is of course not unique to $\mathbb{N}$. Consider the polynomial semiring $K = \mathbb{N}[x, y]$ and a very simple modal $K$-interpretation on a universe with just two nodes $u, v$, such that all four possible edges have value 1, and further $\pi(Pu) = \pi(Pv) = x$ and $\pi(Qu) = \pi(Qv) = y$. Then, on both nodes, the formula $\Box^k(P \vee Q)$ has provenance value $(x + y)^{2^k}$ which has cost $2^k + 1$ even if we set monomial costs to 1.

This shows us that there is no essential difference between modal, guarded, and arbitrary first-order formulae $\psi$ with respect to the maximal possible (cost of) provenance values $\pi[\![\psi]\!]$, for arbitrary $K$-interpretations over the same universes and with the same bounds for literals. Nevertheless there can be huge differences for particular $K$-interpretations $\pi$ of the maximal provenance values $\pi[\![\psi]\!]$ of modal, guarded, and first-order formulae. Intuitively this arises in cases where the $K$-interpretations provide very few connections between different elements, which makes the power of modal and guarded quantification very weak, but does not affect arbitrary first-order quantification.

Proposition There are model-defining $K$-interpretations $\pi$ such that the model defined by $\pi$ is a Kripke structure, with the property that the provenance values $\pi[\![\psi]\!]$ of certain guarded first-order formulae are arbitrarily larger than the maximal provenance values of modal formulae $\varphi$ of the same depth. Similarly, there are $K$-interpretations with arbitrarily large differences between the provenance values of certain first-order formulae and maximal provenance values of guarded formulae of the same depth.

Proof Let $K = \mathbb{N}$ and consider modal $K$-interpretations $\pi$ on a (large) universe $V$ that give value 0 to all potential edges $Euv$ and to all positive atoms $Pv$, and a value $m \ge 2$ to all negated atoms $\neg Pv$. The maximal provenance value for modal formulae of depth $d$ is $m^{2^d}$ (independent of $V$); this is achieved by formulae of form $\neg P \wedge \neg P \wedge \dots \wedge \neg P$. However, guarded formulae of form $\psi := (\forall x_1 \,.\, x_1 = x_1) \cdots (\forall x_d \,.\, x_d = x_d)\neg P x_d$ have provenance values $\pi[\![\psi]\!] = m^{n^d}$.

Large differences between provenance values of guarded formulae and unrestricted first-order formulae are, for instance, witnessed by $K$-interpretations $\pi : \mathrm{Lit}_A(\{P, E\}) \to \mathbb{N}$ that give value 0 to all positive literals, small values, say 1 or 2, to literals $\neg Pa$ but a very large value $m$ to literals $\neg Eab$. Maximal provenance values for guarded sentences are achieved by formulae $\psi := (\forall x_1 \,.\, x_1 = x_1) \cdots (\forall x_d \,.\, x_d = x_d)\neg P x_d$ and these values are bounded by $2^{n^d}$ (where $n = |A|$). Indeed, every guarded formula under the scope of a quantifier that has more than one free variable has provenance value 0. Further, if $\varphi(x_1, \dots, x_k)$ is a quantifier-free formula of depth $d$, then provenance values of its instantiations $\varphi(a_1, \dots, a_k)$ are bounded by $m^{2^d}$, independent of $n$. Even any Boolean combination of these two types of guarded formulae cannot achieve the provenance values of first-order formulae of form $\forall x_1 \dots \forall x_k \bigwedge_{i,j \le k} \neg E x_i x_j$, which are $m^{k^2 n^k}$.

We conclude that, as in the case of model checking, there is also in the computation of provenance values an exponential gap between the number of semiring operations required for a modal or guarded formula on one side, and for an arbitrary first-order formula on the other side. However, in cases where the size of (representations of) semiring elements is the main source of complexity, this difference tends to become less important, or even to disappear. The order of magnitude of provenance values realizable by modal or guarded formulae is similar to those for arbitrary first-order formulae, and the size of these representations dominates in many cases the number of semiring operations.
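The two-node example over $\mathbb{N}[x, y]$ from earlier in this section can be checked mechanically. The sketch below is a hypothetical illustration: a polynomial in $x$ and $y$ is stored as a dictionary from exponent pairs $(i, j)$ to coefficients, and the function names are not from the text. Since both nodes carry the same value and all four edges have value 1, each application of $\Box$ multiplies the current value with itself.

```python
def mul(p, q):
    """Product of two polynomials in N[x, y], stored as {(i, j): coefficient}."""
    out = {}
    for (i1, j1), c1 in p.items():
        for (i2, j2), c2 in q.items():
            key = (i1 + i2, j1 + j2)
            out[key] = out.get(key, 0) + c1 * c2
    return out


def box_power_value(k):
    """Provenance value of box^k (P or Q) at either node of the two-node example."""
    value = {(1, 0): 1, (0, 1): 1}   # x + y, the value of P or Q at both nodes
    for _ in range(k):
        # Product over the two successors; every edge atom has value 1.
        value = mul(value, value)
    return value


monomial_counts = [len(box_power_value(k)) for k in range(5)]
```

The number of distinct monomials doubles (up to the constant 1) with every additional $\Box$, in line with the cost $2^k + 1$ stated in the text.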

3.6 A More Abstract View of Guarded Logics

Instead of the syntactic definition given above for the guarded fragment, one may use a different presentation that is based on classical first-order syntax (without relativization of quantifiers), but uses a guarded modification of the semantics which restricts all valuations of the free variables to guarded tuples. This leads to a more semantic view of guardedness which is also more flexible and easily adapts to other variants of guarded logics.

Definition 2 Let $\mathfrak{A}$ be a $\tau$-structure with finite relational vocabulary $\tau$ and universe $A$. A guard system $G$ for $\mathfrak{A}$ is a collection $G \subseteq \mathcal{P}(A)$ of subsets of $A$, which is downwards closed in the sense that $g' \subseteq g \in G$ implies $g' \in G$. A tuple $\bar a = (a_1, \dots, a_k) \in A^k$ is $G$-guarded if the set of its components, $[\bar a] := \{a_1, \dots, a_k\}$, belongs to $G$. The $G$-semantics for first-order logic on $\mathfrak{A}$ is defined by inductive rules for $\mathfrak{A} \models_G \varphi(\bar a)$, for $G$-guarded $\bar a$, which are the usual ones for first-order logic, except that for quantifiers we have

$$\begin{aligned}
\mathfrak{A} \models_G \exists y\, \varphi(\bar a, y) &\;:\Longleftrightarrow\; \mathfrak{A} \models \varphi(\bar a, b) \text{ for some } b \text{ such that } [\bar a, b] \in G \\
\mathfrak{A} \models_G \forall y\, \varphi(\bar a, y) &\;:\Longleftrightarrow\; \mathfrak{A} \models \varphi(\bar a, b) \text{ for all } b \text{ such that } [\bar a, b] \in G
\end{aligned}$$

We now consider the specific guard system $G \subseteq \mathcal{P}(A)$ that consists of those sets $g \subseteq A$ that are guarded in $\mathfrak{A}$, in the sense that there is an atomic fact $\mathfrak{A} \models R\bar a$ such that $g \subseteq [\bar a]$. Thus, the guarded tuples are obtained from tuples that occur in some atomic fact by copying, permuting or deleting components. It is not difficult to see that the guarded fragment is equivalent to the $G$-semantics of first-order logic, for this particular guard system $G$.

Theorem There exists a translation $\varphi \mapsto \varphi^g$ from FO to GF such that, for every formula $\varphi(\bar x) \in \mathrm{FO}(\tau)$, every $\tau$-structure $\mathfrak{A}$, and every tuple $\bar a$ with $[\bar a] \in G$, we have that

$$\mathfrak{A} \models_G \varphi(\bar a) \iff \mathfrak{A} \models \varphi^g(\bar a).$$

Further, for every $\varphi(\bar x) \in \mathrm{GF}$, the $G$-semantics coincides with the usual first-order semantics.

Notice that the natural model checking game for the $G$-semantics of $\psi$ on $\mathfrak{A}$, denoted $\mathcal{G}_G(\mathfrak{A}, \psi)$, is obtained as the restriction of the usual model checking game $\mathcal{G}(\mathfrak{A}, \psi)$ to the positions $\varphi(\bar a)$ for which $[\bar a] \in G$. In particular, moves from $Qy\,\varphi(\bar a, y)$ to $\varphi(\bar a, b)$ where $[\bar a, b] \notin G$ are no longer available.
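The guard system induced by the atomic facts, and the way it restricts quantifiers, can be illustrated with a small evaluator. This is only a sketch: the tuple encoding of formulae and the names `free_vars`, `guard_system` and `holds` are our own illustrative choices, not notation from the chapter.

```python
from itertools import combinations


def free_vars(phi):
    """Free variables of a formula encoded as a nested tuple."""
    tag = phi[0]
    if tag == "atom":                      # ("atom", relation, variables)
        return set(phi[2])
    if tag == "eq":                        # ("eq", x, y)
        return {phi[1], phi[2]}
    if tag == "not":                       # ("not", body)
        return free_vars(phi[1])
    if tag in ("and", "or"):               # binary connectives
        return free_vars(phi[1]) | free_vars(phi[2])
    return free_vars(phi[2]) - {phi[1]}    # ("exists" | "forall", y, body)


def guard_system(relations):
    """All sets g contained in [a] for some atomic fact R(a)."""
    guards = set()
    for facts in relations.values():
        for fact in facts:
            elems = sorted(set(fact))
            for size in range(len(elems) + 1):
                for subset in combinations(elems, size):
                    guards.add(frozenset(subset))
    return guards


def holds(universe, relations, guards, phi, env):
    """G-semantics: quantifiers only range over b with [a, b] in the guard system."""
    tag = phi[0]
    if tag == "atom":
        return tuple(env[v] for v in phi[2]) in relations[phi[1]]
    if tag == "eq":
        return env[phi[1]] == env[phi[2]]
    if tag == "not":
        return not holds(universe, relations, guards, phi[1], env)
    if tag == "and":
        return (holds(universe, relations, guards, phi[1], env)
                and holds(universe, relations, guards, phi[2], env))
    if tag == "or":
        return (holds(universe, relations, guards, phi[1], env)
                or holds(universe, relations, guards, phi[2], env))
    y, body = phi[1], phi[2]
    fixed = {env[v] for v in free_vars(phi)}          # the components of the tuple a
    allowed = [b for b in universe if frozenset(fixed | {b}) in guards]
    results = (holds(universe, relations, guards, body, {**env, y: b}) for b in allowed)
    return any(results) if tag == "exists" else all(results)


# Hypothetical example: a path 1 -> 2 -> 3 and the formula
# "there is y with not E(x, y) and x != y", evaluated at x = 1.
universe = [1, 2, 3]
relations = {"E": {(1, 2), (2, 3)}}
phi = ("exists", "y", ("and", ("not", ("atom", "E", ("x", "y"))),
                              ("not", ("eq", "x", "y"))))
guarded = holds(universe, relations, guard_system(relations), phi, {"x": 1})
```

In this example the only classical witness is $y = 3$, but the set $\{1, 3\}$ is not guarded, so the formula fails under the $G$-semantics although it holds under the usual one.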


This abstract view of guarded logics also leads to a somewhat different view of provenance, based on $K$-interpretations that give values not only to the literals, but separately also to the guard system.

Definition Let $K$ be a commutative semiring, $\tau$ a relational vocabulary, and $G \subseteq \mathcal{P}(A)$ a guard system for the set $A$. A $K$-interpretation for $\tau$, $A$, and $G$ consists of a function $\pi : \mathrm{Lit}_A(\tau) \to K$ (that maps all equality and inequality literals to their truth values $0$ or $1$) and a function $h : G \to K \setminus \{0\}$. The $G$-provenance semantics of first-order logic then extends $\pi$ to a function $\pi_G : \mathrm{FO}(\tau) \to K$ giving to each sentence $\varphi(\bar a)$ with $[\bar a] \in G$ the value $\pi_G[[\varphi(\bar a)]]$, where quantified formulae are now treated according to the rules

$$\pi_G[[\exists y\, \varphi(\bar a, y)]] := \sum_{b \,:\, [\bar a, b] \in G} h([\bar a, b]) \cdot \pi_G[[\varphi(\bar a, b)]]$$

$$\pi_G[[\forall y\, \varphi(\bar a, y)]] := \prod_{b \,:\, [\bar a, b] \in G} h([\bar a, b]) \cdot \pi_G[[\varphi(\bar a, b)]].$$

Notice that a $K$-interpretation for $\tau$, $A$, and $G$ also gives basic valuations for the terminal positions and the moves of the model checking games $\mathcal{G}_G(\mathfrak{A}, \psi)$, for every first-order sentence $\psi$, which by the rules given in Sect. 3.3 extend to valuations $f_0$ and $f_1$ for all positions of that game.

Proposition For every position $\varphi$ of the model checking game $\mathcal{G}_G(\mathfrak{A}, \psi)$ and every $K$-interpretation for $\tau$, $A$, and $G$ we have that $f_0(\varphi) = \pi_G[[\varphi]]$ and $f_1(\varphi) = \pi_G[[\neg\varphi]]$.

Despite Theorem 3.6, the provenance values defined by this "semantic approach" to the guarded fragment may be different from the provenance values for the syntactic presentation of GF. Indeed, we here give provenance values to the guarded tuples themselves, not to their presentations by an atomic statement. Since a guarded tuple may admit several different syntactic guards, the syntactic approach does not provide a unique provenance value for it. Moreover, even in the case where a guarded tuple has a unique guard, the semantic approach allows one to separate the provenance value of its use as a guard for a quantifier from its use as an atomic statement as such. However, Theorem 3.6 implies that a formula has a non-zero provenance value in the semantic approach if, and only if, it has a non-zero value in the traditional syntactic approach.
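The quantifier rules of the $G$-provenance semantics can be sketched for the semiring of natural numbers as follows. The encoding of formulae in negation normal form, the dictionaries `pi` and `h`, and the function names are illustrative assumptions of ours; literals missing from `pi` are taken to have value $0$.

```python
from math import prod


def free_vars(phi):
    """Free variables of a formula in negation normal form."""
    tag = phi[0]
    if tag == "lit":                       # ("lit", is_positive, relation, variables)
        return set(phi[3])
    if tag in ("and", "or"):
        return free_vars(phi[1]) | free_vars(phi[2])
    return free_vars(phi[2]) - {phi[1]}    # ("exists" | "forall", y, body)


def g_provenance(universe, pi, h, phi, env):
    """Value of phi in (N, +, *) when guarded sets carry their own values h(g)."""
    tag = phi[0]
    if tag == "lit":
        _, positive, rel, variables = phi
        return pi.get((positive, rel, tuple(env[v] for v in variables)), 0)
    if tag == "or":
        return (g_provenance(universe, pi, h, phi[1], env)
                + g_provenance(universe, pi, h, phi[2], env))
    if tag == "and":
        return (g_provenance(universe, pi, h, phi[1], env)
                * g_provenance(universe, pi, h, phi[2], env))
    y, body = phi[1], phi[2]
    fixed = frozenset(env[v] for v in free_vars(phi))
    # Only witnesses b with [a, b] in the guard system contribute,
    # each weighted by the value h([a, b]) of the guarded set.
    terms = [h[fixed | {b}] * g_provenance(universe, pi, h, body, {**env, y: b})
             for b in universe if fixed | {b} in h]
    return sum(terms) if tag == "exists" else prod(terms)


# Hypothetical data: two elements, a unary relation P, singleton guards.
universe = [1, 2]
pi = {(True, "P", (1,)): 5, (True, "P", (2,)): 7}
h = {frozenset(): 1, frozenset({1}): 2, frozenset({2}): 3}
p_of_y = ("lit", True, "P", ("y",))
some = g_provenance(universe, pi, h, ("exists", "y", p_of_y), {})   # 2*5 + 3*7
every = g_provenance(universe, pi, h, ("forall", "y", p_of_y), {})  # (2*5) * (3*7)
```

Setting every `h` value to $1$ makes the guard system act as a pure filter on witnesses, which is the Boolean behaviour of the $G$-semantics.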

3.7 Guarded Negation First-Order Logic

Guarded negation first-order logic, denoted GNF, is a fragment of first-order logic introduced by Bárány, ten Cate and Segoufin [3], which applies the concept of guards not to quantifiers but to negation. As we will see, it in some sense generalizes the


guarded logics considered so far. Guarded negation first-order logic can be defined by the grammar

$$\varphi ::= R(\bar x) \mid x = y \mid \exists x \varphi \mid \varphi \lor \varphi \mid \varphi \land \varphi \mid \alpha(\bar x \bar y) \land \neg\varphi(\bar y),$$

where $R$ is a relation symbol and $\alpha(\bar x \bar y)$ is an atomic formula that contains all free variables of $\varphi(\bar y)$. This yields a logic that contains the existential positive fragment of first-order logic, but also allows for some restricted negation. Further, it also generalizes the traditional guarded fragment GF in the sense that every formula $\varphi(\bar x)$ of GF can be translated into a formula $\varphi^*(\bar x)$ of GNF such that for every guarded tuple $\bar a$ of a structure $\mathfrak{A}$ we have that $\mathfrak{A} \models \varphi(\bar a)$ if, and only if, $\mathfrak{A} \models \varphi^*(\bar a)$. In particular, every sentence of GF is equivalent to a sentence of GNF. Additionally, many of the desirable properties of GF survive also for GNF; in particular this holds for the decidability of satisfiability and finite satisfiability, even for the fixed-point extension of GNF. A more detailed model-theoretic analysis of GNF has been presented in [4, 5].

We discuss the embedding of GF into GNF established in [3]. There is one minor point that one has to take care of in such translations. In GF, a formula $\varphi(\bar x)$ can also be used to express a property of tuples that are not necessarily guarded, because the restriction to guarded tuples appears only inside the scope of a quantifier, but not necessarily on the top level of the formula, and as far as unguarded tuples are concerned, translations from GF into other logics may be problematic. We therefore restrict formulae to talk only about guarded tuples, for instance by attaching an explicit guard atom: we say that a formula is answer guarded if it is either a sentence, or of the form $\alpha(\bar x \bar y) \land \varphi(\bar y)$, where $\alpha(\bar x \bar y)$ is an atomic formula containing all free variables of $\varphi(\bar y)$.

Proposition Every answer guarded formula in GF can be translated into an equivalent GNF-formula via a polynomial-time transformation.
Proof Let $\varphi$ be an answer guarded formula in GF. First, transform every subformula of the form $(\forall \bar x \,.\, \alpha)\psi$ into $\neg\exists \bar x(\alpha \land \neg\psi)$ and every subformula of the form $(\exists \bar x \,.\, \alpha)\psi$ into $\exists \bar x(\alpha \land \psi)$. Then consider the subformulae of $\varphi$ that are of the form $\neg\theta$, starting with the literals. If $\neg\theta$ does not have any free variables, we can replace it by $\exists x(x = x \land \neg\theta)$. Now suppose that $\theta$ has the free variables $\bar y$. Then $\theta$ is in the scope of an innermost guard atom $\alpha$, and we can replace $\neg\theta$ by $\alpha \land \neg\theta$ since $\alpha$ must contain all free variables of $\theta$. Implementing these replacements for every negated subformula of $\varphi$ (including possibly $\varphi$ itself) yields a formula $\varphi'$ which is equivalent to $\varphi$ and in GNF by construction.

Bárány, ten Cate and Segoufin [3] have shown that GNF has the finite model property and have determined the complexity of the satisfiability problem, based on a reduction to GF using Rosati covers [2].

Theorem

1. The satisfiability problem for GNF is 2ExpTime-complete.


2. Every satisfiable GNF-sentence $\varphi$ has a finite model of size $2^{2^{|\varphi|^{O(1)}}}$.

In the same paper, they have also shown that the evaluation problem for GNF on finite structures has a higher complexity level than GF (which is in polynomial time) but lower than full first-order logic (which is PSpace-complete).

Theorem The model checking problem for GNF is $\mathrm{P}^{\mathrm{NP}[O(\log^2(n))]}$-complete.

We shall discuss this result below.

3.7.1 Provenance Analysis for GNF

We want to provide appropriate definitions for a provenance analysis for guarded negation first-order logic. To do this along the lines described above for other logics, we need a negation normal form for GNF. This poses a problem, however, since a negation cannot simply be "pushed through" an existential quantifier; this would lead to universal quantification, which is not allowed in the syntax. To solve this problem, one could modify the syntax to allow for the use of universal quantifiers, but only in the cases where the formula can be translated back into a formula from the original syntax. However, this approach leads to an artificial syntax which would still be asymmetric in the treatment of existential and universal quantifiers. For this reason, we introduce a new variant $\mathrm{GNF}^*$ of guarded negation first-order logic, which is equivalent to GNF for sentences and answer guarded formulae, but permits a few more formulae in order to allow symmetric use of the two quantifiers. Therefore, we will be able to retain all the desirable properties of GNF for sentences and for formulae on guarded tuples, but we will have a more general syntax that has the required dualities to be amenable to a game-based approach and to semiring provenance.

Definition 3 We define $\mathrm{GNF}^*$ as the union of two fragments $\mathrm{GNF}^+$ and $\mathrm{GNF}^-$ which are defined by the mutual induction

$$\varphi^+ ::= R(\bar x) \mid x = y \mid \exists x \varphi^+ \mid \varphi^+ \lor \varphi^+ \mid \varphi^+ \land \varphi^+ \mid \alpha(\bar x \bar y) \land \varphi^-(\bar y) \mid \neg\varphi^-$$

$$\varphi^- ::= \neg R(\bar x) \mid x \neq y \mid \forall x \varphi^- \mid \varphi^- \land \varphi^- \mid \varphi^- \lor \varphi^- \mid \alpha(\bar x \bar y) \to \varphi^+(\bar y) \mid \neg\varphi^+,$$

where $R$ is a relation symbol and $\alpha(\bar x \bar y)$ is an atomic formula containing all free variables of the formula it is used with. The formulae $\varphi^+ \in \mathrm{GNF}^+$ are called positive, the formulae $\varphi^- \in \mathrm{GNF}^-$ are negative.

Remark It might also be interesting to consider the fragment of this logic where we disallow formulae of the form $\varphi^+ \land \psi^+$ in $\mathrm{GNF}^+$ and $\varphi^- \lor \psi^-$ in $\mathrm{GNF}^-$.
This would lead to a nice simplification of the model checking game: between guards, only one player would move, until control switches to the other player at the next


guard. But even in the model checking game of the full logic $\mathrm{GNF}^*$ as defined above we have the useful property that only one player is in control of the assignments to the variables, and again this control only switches at guards. We shall return to model checking games later in the section, after introducing provenance for $\mathrm{GNF}^*$. Before doing that, we note some interesting properties of $\mathrm{GNF}^*$.

Proposition
1. Every formula in $\mathrm{GNF}^-$ is equivalent to the negation of a formula from $\mathrm{GNF}^+$ and vice versa. In particular, $\mathrm{GNF}^*$ is closed under transformation to negation normal form.
2. Syntactically, $\mathrm{GNF}^+ \cap \mathrm{GNF}^- = \emptyset$. In particular, every formula can be uniquely identified as positive or negative.
3. $\mathrm{GNF}^+$ is equivalent to GNF.
4. Every answer guarded formula in $\mathrm{GNF}^*$ is equivalent to a formula in GNF.

Proof If $\varphi$ is in $\mathrm{GNF}^+$, rewrite every subformula of $\varphi$ that is in $\mathrm{GNF}^-$ as the negation of a formula in $\mathrm{GNF}^+$, starting with the outermost subformulae in $\mathrm{GNF}^-$. Because formulae of $\mathrm{GNF}^-$ only occur in answer guarded formulae, applying these transformations from the outside in yields a formula in GNF. If on the other hand $\varphi(\bar a)$ is in $\mathrm{GNF}^-$, rewrite $\varphi(\bar a)$ as the negation of a formula $\varphi^*(\bar a)$ in $\mathrm{GNF}^+$. Because $\bar a$ is guarded by $\gamma$, $\neg\varphi^*$ and therefore $\varphi$ are equivalent to $\gamma(\bar a) \land \neg\varphi^*(\bar a)$, and $\varphi^*(\bar a) \in \mathrm{GNF}^+$ is equivalent to a formula from GNF by the argument above. Therefore, $\varphi$ is equivalent to a formula from GNF.

While GNF consists of clearly positive formulae which allow negative subformulae, but only applied to a guarded tuple, $\mathrm{GNF}^*$ also permits negative formulae. However, unlike in first-order logic, every formula is clearly either positive, if it is in $\mathrm{GNF}^+$, or negative, if it is in $\mathrm{GNF}^-$. A positive formula, like a GNF-formula, is a positive or existential statement about the whole structure, possibly with negative or universal statements about guarded tuples.
In contrast to that, a negative formula is a negative or universal statement about the structure, where positive or existential statements only apply if a guarding condition is fulfilled (remember that in negative formulae, the guarded subformulae have the form $\alpha \to \varphi$). We are now ready to provide a notion of provenance for $\mathrm{GNF}^*$.

Definition A $K$-interpretation $\pi : \mathrm{Lit}_A(\tau) \to K$ (for a commutative semiring $K$, a universe $A$ and a relational vocabulary $\tau$) extends to a valuation $\pi : \mathrm{GNF}^*(\tau) \to K$ by the following rules, where $\varphi, \psi$ are arbitrary formulae in $\mathrm{GNF}^*$, whereas $\varphi^+ \in \mathrm{GNF}^+$ and $\varphi^- \in \mathrm{GNF}^-$:

$$\pi[[\psi \lor \varphi]] := \pi[[\psi]] + \pi[[\varphi]] \qquad \pi[[\psi \land \varphi]] := \pi[[\psi]] \cdot \pi[[\varphi]]$$

$$\pi[[\exists x \varphi^+(x)]] := \sum_{a \in A} \pi[[\varphi^+(a)]] \qquad \pi[[\forall x \varphi^-(x)]] := \prod_{a \in A} \pi[[\varphi^-(a)]]$$

$$\pi[[\alpha \land \varphi^-]] := \pi[[\alpha]] \cdot \pi[[\varphi^-]] \qquad \pi[[\alpha \to \varphi^+]] := \begin{cases} 1, & \pi[[\alpha]] = 0 \\ \pi[[\alpha]] \cdot \pi[[\varphi^+]], & \text{otherwise} \end{cases}$$


As before, negation is handled via negation normal form: $\pi[[\neg\varphi]] := \pi[[\mathrm{nnf}(\neg\varphi)]]$. As for GF, a $K$-valuation for $\mathrm{GNF}^*$ may assign a different value to a formula than the corresponding first-order $K$-valuation would assign to the standard rewriting in traditional first-order syntax.
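The valuation rules for $\mathrm{GNF}^*$ translate almost literally into a recursive evaluator. The sketch below works in the semiring of natural numbers and assumes its input is already in negation normal form; the tuple encoding (with tags such as `guard_and` and `guard_imp`) and the example data are hypothetical choices made for illustration.

```python
from math import prod


def gnf_provenance(universe, pi, phi, env):
    """Provenance value in (N, +, *) of a GNF* formula in negation normal form."""
    tag = phi[0]
    if tag == "lit":                       # ("lit", is_positive, relation, variables)
        _, positive, rel, variables = phi
        return pi.get((positive, rel, tuple(env[v] for v in variables)), 0)
    if tag == "or":
        return (gnf_provenance(universe, pi, phi[1], env)
                + gnf_provenance(universe, pi, phi[2], env))
    if tag in ("and", "guard_and"):        # a guard in GNF+ is just multiplied in
        return (gnf_provenance(universe, pi, phi[1], env)
                * gnf_provenance(universe, pi, phi[2], env))
    if tag == "guard_imp":                 # ("guard_imp", guard atom, positive body)
        guard = gnf_provenance(universe, pi, phi[1], env)
        if guard == 0:                     # a false guard makes the implication hold
            return 1
        return guard * gnf_provenance(universe, pi, phi[2], env)
    x, body = phi[1], phi[2]               # ("exists" | "forall", x, body)
    values = [gnf_provenance(universe, pi, body, {**env, x: a}) for a in universe]
    return sum(values) if tag == "exists" else prod(values)


# Hypothetical interpretation over the universe {1, 2}.
universe = [1, 2]
pi = {(True, "E", (1, 2)): 3, (False, "P", (2,)): 5, (True, "Q", (2,)): 2}
edge = ("lit", True, "E", ("x", "y"))

# Positive: exists x exists y (E(x, y) and not P(y))
positive = ("exists", "x", ("exists", "y",
            ("guard_and", edge, ("lit", False, "P", ("y",)))))
# Negative: forall x forall y (E(x, y) -> Q(y))
negative = ("forall", "x", ("forall", "y",
            ("guard_imp", edge, ("lit", True, "Q", ("y",)))))

positive_value = gnf_provenance(universe, pi, positive, {})   # 3 * 5
negative_value = gnf_provenance(universe, pi, negative, {})   # 1 * (3 * 2) * 1 * 1
```

Note how the three unguarded pairs contribute the neutral value $1$ to the product in the negative formula, which is exactly the first case of the rule for $\alpha \to \varphi^+$.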

3.7.2 Model Checking Games for $\mathrm{GNF}^*$ and Their Provenance Analysis

We define model checking games for $\mathrm{GNF}^*$ in a similar way to those of first-order logic. To avoid player switches, which would interfere with defining a notion of provenance, we only consider formulae in negation normal form. The only difference to the model checking games for first-order logic concerns the rules for answer guarded formulae of the form $\alpha \land \varphi^-$ in $\mathrm{GNF}^+$ or $\alpha \to \varphi^+$ in $\mathrm{GNF}^-$, i.e. the positions where a play switches between $\mathrm{GNF}^+$ and $\mathrm{GNF}^-$. At a position $\alpha \land \varphi^-$ the guard $\alpha$ is evaluated. If it is false, then Player 1 has won. If it is true, then the play proceeds to $\varphi^-$. Dually, at a position $\alpha \to \varphi^+$, Player 0 has won if the guard $\alpha$ is false, and if it is true the play proceeds to $\varphi^+$.

Proposition Let $\mathfrak{A}$ be a $\tau$-structure, $\bar a$ a guarded tuple in $\mathfrak{A}$, and $\psi(\bar x)$ a $\mathrm{GNF}^*$-formula. Then $\mathfrak{A} \models \psi(\bar a)$ if, and only if, Player 0 wins the model checking game from $\psi(\bar a)$.

Proof The argument, by induction on the formula, proceeds as for the general first-order model-checking game. It only remains to consider positions of the form $\alpha \land \varphi^-$ in $\mathrm{GNF}^+$ and $\alpha \to \varphi^+$ in $\mathrm{GNF}^-$, for which we have modified the rules. If $\mathfrak{A} \models \alpha \land \varphi^-$, then the guard $\alpha$ evaluates to true and the play proceeds to position $\varphi^-$, from which Player 0 wins by induction. If $\mathfrak{A} \not\models \alpha \land \varphi^-$, then either the guard $\alpha$ evaluates to false, and Player 1 wins by definition, or $\alpha$ evaluates to true and the play continues at $\varphi^-$. Since then $\mathfrak{A} \not\models \varphi^-$, Player 1 wins by induction. If $\mathfrak{A} \models \alpha \to \varphi^+$, then either the guard $\alpha$ evaluates to false, in which case Player 0 wins by definition, or the play proceeds to $\varphi^+$, and then $\mathfrak{A} \models \varphi^+$, so Player 0 wins by induction hypothesis. If $\mathfrak{A} \not\models \alpha \to \varphi^+$, then the guard $\alpha$ evaluates to true, so the play proceeds to $\varphi^+$ with $\mathfrak{A} \not\models \varphi^+$, and Player 1 wins by induction hypothesis.

We next consider natural provenance evaluations for the model checking games for $\mathrm{GNF}^*$ and show that they are compatible with the provenance definitions given above.
In a model checking game $\mathcal{G}(\mathfrak{A}, \psi)$ for a finite $\tau$-structure $\mathfrak{A}$ and $\psi \in \mathrm{GNF}^*$ there are two kinds of terminal positions: either they are literals $\varphi \in \mathrm{Lit}_A(\tau)$ or they correspond to an answer guarded formula where the guard is not satisfied. Given a commutative semiring $K$ and a model-defining $K$-interpretation $\pi : \mathrm{Lit}_A(\tau) \to K$ with $\mathfrak{A}_\pi = \mathfrak{A}$, we obtain valuations $f_0, f_1$ for the terminal positions of $\mathcal{G}(\mathfrak{A}, \psi)$ as follows. For $\varphi \in \mathrm{Lit}_A(\tau)$ we put $f_0(\varphi) = \pi[[\varphi]]$ and $f_1(\varphi) = \pi[[\neg\varphi]]$. For an answer


guarded formula $\alpha \land \varphi^-$ with $\pi[[\alpha]] = 0$ we put $f_0(\alpha \land \varphi^-) = 0$ and $f_1(\alpha \land \varphi^-) = 1$. Dually, for an answer guarded formula $(\alpha \to \varphi^+)$ with $\pi[[\alpha]] = 0$ we put $f_0(\alpha \to \varphi^+) = 1$ and $f_1(\alpha \to \varphi^+) = 0$. Further, we define a valuation $h : E \to K$ of the edges of $\mathcal{G}(\mathfrak{A}, \psi)$ as follows. For every move from a node $v = (\alpha \land \varphi^-)$ to $w = \varphi^-$, or from $v = (\alpha \to \varphi^+)$ to $w = \varphi^+$, we put $h(vw) := \pi[[\alpha]]$. For all other moves $(v, w) \in E$ we put $h(vw) := 1$. The model checking games for $\mathrm{GNF}^*$ do not have cycles. Therefore, we can inductively extend $f_0$ and $f_1$ from the terminal positions to the entire game graph as defined in Sect. 3.3.3, by the rules

$$f_\sigma(v) := \begin{cases} \sum_{w \in vE} h(vw) \cdot f_\sigma(w) & \text{if } v \in V_\sigma \\ \prod_{w \in vE} h(vw) \cdot f_\sigma(w) & \text{if } v \in V_{1-\sigma}. \end{cases}$$

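Proposition Let $\varphi$ be any position in the model checking game $\mathcal{G}(\mathfrak{A}, \psi)$. Then $f_0(\varphi) = \pi[[\varphi]]$ and $f_1(\varphi) = \pi[[\neg\varphi]]$.

Proof The proof is an obvious induction on $\varphi$, and the only difference to the arguments for arbitrary first-order model checking games concerns the cases where $\varphi$ is of the form $\alpha \land \varphi^-$ or $\alpha \to \varphi^+$. For $\varphi = \alpha \land \varphi^-$ we have that $f_0(\varphi) = \pi[[\alpha]] \cdot f_0(\varphi^-) = \pi[[\alpha]] \cdot \pi[[\varphi^-]] = \pi[[\varphi]]$. Further,

$$f_1(\varphi) = \begin{cases} 1 & \text{if } \pi[[\alpha]] = 0 \\ \pi[[\alpha]] \cdot f_1(\varphi^-) & \text{otherwise.} \end{cases}$$

On the other side,

$$\pi[[\neg\varphi]] = \pi[[\alpha \to \neg\varphi^-]] = \begin{cases} 1 & \text{if } \pi[[\alpha]] = 0 \\ \pi[[\alpha]] \cdot \pi[[\neg\varphi^-]] & \text{otherwise.} \end{cases}$$

Since, by induction hypothesis, $f_1(\varphi^-) = \pi[[\neg\varphi^-]]$, we also have $f_1(\varphi) = \pi[[\neg\varphi]]$. The arguments for $\varphi = \alpha \to \varphi^+$ are analogous.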
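The inductive extension of $f_0$ and $f_1$ from terminal positions to the whole acyclic game graph can be sketched as a memoised recursion. The graph below is a hand-built, hypothetical game for the position $\alpha \land (\neg Pa \lor \neg Qa)$ with $\pi[[\alpha]] = 3$, $\pi[[\neg Pa]] = 2$ and $\pi[[\neg Qa]] = 5$; the dictionary-based representation and all names are our own assumptions, and values are taken in the semiring of natural numbers.

```python
from functools import lru_cache
from math import prod


def game_value(moves, owner, edge_value, terminal_value, sigma, start):
    """Compute f_sigma(start) on an acyclic game graph.

    moves:          position -> list of successor positions
    owner:          non-terminal position -> player (0 or 1) who moves there
    edge_value:     (v, w) -> h(vw); edges not listed have value 1
    terminal_value: terminal position -> (f_0 value, f_1 value)
    """
    @lru_cache(maxsize=None)
    def f(v):
        if v in terminal_value:
            return terminal_value[v][sigma]
        terms = [edge_value.get((v, w), 1) * f(w) for w in moves[v]]
        # Sum at the positions of Player sigma, product at the opponent's positions.
        return sum(terms) if owner[v] == sigma else prod(terms)

    return f(start)


moves = {"guard": ["disj"], "disj": ["not_P", "not_Q"]}
owner = {"guard": 0, "disj": 0}                 # the disjunction belongs to Player 0
edge_value = {("guard", "disj"): 3}             # h(vw) = value of the guard alpha
terminal_value = {"not_P": (2, 0), "not_Q": (5, 0)}

f0 = game_value(moves, owner, edge_value, terminal_value, 0, "guard")
f1 = game_value(moves, owner, edge_value, terminal_value, 1, "guard")
```

Here `f0` agrees with $\pi[[\alpha \land (\neg Pa \lor \neg Qa)]] = 3 \cdot (2 + 5)$, and `f1` agrees with $\pi[[\alpha \to (Pa \land Qa)]] = 3 \cdot 0 \cdot 0$, as the proposition predicts for this instance.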

3.7.3 Algorithmic Analysis

To analyze model-checking and provenance for guarded negation first-order logic algorithmically it is useful to consider its stratification via guarded negation: we define fragments $\mathrm{GNF}^+_k$ and $\mathrm{GNF}^-_k$, for $k \ge 1$, where $\mathrm{GNF}^+_1$ is the existential positive fragment of FO, built from atomic formulae $R\bar x$ and $x = y$ by means of disjunction, conjunction, and existential quantification. Similarly, $\mathrm{GNF}^-_1$ is the universal negative fragment, built from negated atoms by conjunction, disjunction, and universal quantification. For $k \ge 2$, the formulae in $\mathrm{GNF}^+_k$ are defined by the grammar

$$\varphi ::= \varphi^+ \mid \exists x \varphi \mid \varphi \lor \varphi \mid \varphi \land \varphi \mid \alpha \land \varphi^-$$

where $\varphi^+ \in \mathrm{GNF}^+_{k-1}$, $\varphi^- \in \mathrm{GNF}^-_{k-1}$ and $\alpha$ is a guard for $\varphi^-$. Analogously, $\mathrm{GNF}^-_k$ is defined by

$$\varphi ::= \varphi^- \mid \forall x \varphi \mid \varphi \lor \varphi \mid \varphi \land \varphi \mid \alpha \to \varphi^+.$$

It is well-known that the model-checking problem for existential positive formulae, and in fact even for conjunctive queries, is NP-complete. In game theoretic terms, we analyze this problem as follows. For a finite structure $\mathfrak{A}$ of size $m$, a tuple $\bar a$ that is guarded in $\mathfrak{A}$, and an existential positive formula $\psi(\bar a)$ of size $n$, the model checking game $\mathcal{G}(\mathfrak{A}, \psi)$ has, in general, exponential size $O(n \cdot m^n)$ and can therefore not be explicitly constructed in an efficient way. However, the game has only a polynomial number of terminal positions and, in contrast to games for general first-order formulae, it has the property that the exponential branching of the game is only caused by Player 0. This means that once Player 0 has committed herself to a strategy $f$, the reduced game graph $\mathcal{G}_f$, admitting only the plays that are consistent with $f$, has only polynomial size. As a consequence, model-checking games for existential positive formulae can be solved in NP, avoiding an explicit construction.

Proposition There is a nondeterministic polynomial-time algorithm that, given a finite structure $\mathfrak{A}$, a guarded tuple $\bar a$, and a formula $\psi(\bar x) \in \mathrm{GNF}^+_1$, guesses, on the fly, a strategy $f$ for Player 0, constructs the reduced game graph $\mathcal{G}_f$ of $\mathcal{G}(\mathfrak{A}, \psi(\bar a))$, and determines whether Player 0 has a winning strategy.

This analysis can be extended to arbitrary formulae of $\mathrm{GNF}^*$.

Theorem Given a finite structure $\mathfrak{A}$, a guarded tuple $\bar a$, a formula $\psi(\bar x) \in \mathrm{GNF}^*$, and $\sigma \in \{0, 1\}$, the problem whether Player $\sigma$ wins the model-checking game $\mathcal{G}(\mathfrak{A}, \psi(\bar a))$ is in $\mathrm{P}^{\mathrm{NP}}$.

Proof For $\psi \in \mathrm{GNF}^+_1$ the problem is in NP and hence also in $\mathrm{P}^{\mathrm{NP}}$. Let now $\psi \in \mathrm{GNF}^+_{k+1}$ and assume that the result has already been established for formulae in $\mathrm{GNF}^+_k$ and hence, by the closure of $\mathrm{P}^{\mathrm{NP}}$ under complements, also for formulae in $\mathrm{GNF}^-_k$.
The game for a formula in $\mathrm{GNF}^+_{k+1}$ can be viewed as a game for the top-level existential positive formula whose terminal positions are either literals, or answer guarded formulae $\alpha \land \varphi^-$ with $\varphi^- \in \mathrm{GNF}^-_k$. There are only a polynomial number of such terminal nodes, and we can construct them efficiently. For nodes of the form $\alpha \land \varphi^-$ we distinguish two cases: if $\alpha$ evaluates to false, we label the node as winning for Player 1. If $\alpha$ evaluates to true, the node is the root of a new game $\mathcal{G}(\mathfrak{A}, \varphi^-)$. In this case we apply the already established $\mathrm{P}^{\mathrm{NP}}$-algorithm to determine the winner, and label the node accordingly. We are left with a game for an existential positive formula that we can solve in $\mathrm{P}^{\mathrm{NP}}$. Altogether this is a polynomial composition of $\mathrm{P}^{\mathrm{NP}}$-algorithms, which is again a $\mathrm{P}^{\mathrm{NP}}$-algorithm.


A more sophisticated implementation of this algorithmic idea results in a $\mathrm{P}^{\mathrm{NP}}$-algorithm that queries the NP-oracle in a restricted way. Indeed, one can make sure that each query to the oracle depends only on the answers to a logarithmic number (with respect to the length of the formula) of previously asked queries, i.e., the problem is in the class $\mathrm{P}^{\mathrm{NP}}_{\| O(\log n)}$ which, by a result due to [7], is the same as $\mathrm{P}^{\mathrm{NP}[O(\log^2 n)]}$, the class of problems solvable by a polynomial-time algorithm with at most $O(\log^2 n)$ calls to an oracle in NP. As shown by Bárány, ten Cate, and Segoufin [3], the model checking problem for GNF is actually complete for this complexity class.

For the provenance analysis of an existential positive formula in a semiring $K$, it does not suffice to guess a strategy and check whether it is winning. Instead, one has to sum up the provenance values of all possible strategies. For each individual strategy it is still possible to compute the value in polynomial time, provided that this is the case for the basic semiring operations, and that we have an additive cost measure. For provenance values in the semiring of natural numbers we hence have a summation over exponentially many values of a polynomial-time computable function into $\mathbb{N}$; this can be done by a #P-algorithm.

Theorem The problem of computing provenance values in $\mathbb{N}$ (with the standard logarithmic cost measure) for existential positive first-order formulae is #P-complete.

It is open to what extent this result can be generalized beyond existential positive formulae, i.e. formulae in $\mathrm{GNF}^+_1$, to higher levels of $\mathrm{GNF}^*$.
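The "sum over all strategies" view can be made concrete for the simplest existential positive formulae, namely conjunctive queries, where a strategy of Player 0 amounts to a choice of values for the quantified variables. The brute-force sketch below (our own illustration, with hypothetical names and data) enumerates all assignments and adds up the products of literal values in $\mathbb{N}$; it takes exponential time and is meant only to show what is being counted, not to suggest an efficient algorithm.

```python
from itertools import product
from math import prod


def cq_provenance(universe, variables, atoms, pi):
    """Provenance in (N, +, *) of the query 'exists variables . conjunction of atoms'.

    atoms: list of (relation, tuple of variables)
    pi:    (relation, tuple of elements) -> natural number; missing facts count as 0
    """
    total = 0
    for values in product(universe, repeat=len(variables)):
        env = dict(zip(variables, values))          # one strategy of Player 0
        total += prod(pi.get((rel, tuple(env[v] for v in vs)), 0) for rel, vs in atoms)
    return total


# Hypothetical annotated graph; the query asks for paths x -> y -> z.
universe = [1, 2, 3]
pi = {("E", (1, 2)): 2, ("E", (2, 3)): 3, ("E", (1, 3)): 1}
two_step = [("E", ("x", "y")), ("E", ("y", "z"))]
value = cq_provenance(universe, ["x", "y", "z"], two_step, pi)
```

With all true facts annotated by $1$, the same function simply counts the satisfying assignments of the query.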

References

1. Andréka, H., van Benthem, J., & Németi, I. (1998). Modal languages and bounded fragments of predicate logic. Journal of Philosophical Logic, 27, 217–274.
2. Bárány, V., Gottlob, G., & Otto, M. (2010). Querying the guarded fragment. In Proceedings of the 2010 25th Annual IEEE Symposium on Logic in Computer Science, LICS ’10 (pp. 1–10).
3. Bárány, V., ten Cate, B., & Segoufin, L. (2015). Guarded negation. Journal of the ACM, 62(3), 22:1–22:26.
4. Benedikt, M., Bourhis, P., & Vanden Boom, M. (2017). Characterizing definability in decidable fixpoint logics. In 44th International Colloquium on Automata, Languages, and Programming, ICALP 2017 (pp. 107:1–107:14).
5. Benedikt, M., ten Cate, B., & Vanden Boom, M. (2016). Effective interpolation and preservation in guarded logics. ACM Transactions on Computational Logic, 17(2), 8.
6. Blackburn, P., de Rijke, M., & Venema, Y. (2001). Modal logic. Cambridge University Press.
7. Castro, J., & Seara, C. (1996). Complexity classes between $\Theta_k^P$ and $\Delta_k^P$. RAIRO - Theoretical Informatics and Applications - Informatique Théorique et Applications, 30(2), 101–121.
8. Emerson, A., & Jutla, C. (1988). The complexity of tree automata and logics of programs. Proceedings of FOCS, 1988, 328–337.
9. Grädel, E. (1999). On the restraining power of guards. Journal of Symbolic Logic, 64, 1719–1742.
10. Grädel, E. (1999). Why are modal logics so robustly decidable? Bulletin of the European Association for Theoretical Computer Science, 68, 90–103.
11. Grädel, E., & Otto, M. (2014). The freedoms of (guarded) bisimulation. In Trends in Logic: Johan van Benthem on Logical and Informational Dynamics (pp. 3–31). Berlin: Springer.
12. Grädel, E., & Tannen, V. (2017). Semiring provenance for first-order model checking. arXiv:1712.01980 [cs.LO].
13. Grädel, E., & Tannen, V. (2019). Provenance analysis for logic and games. arXiv:1907.08470 [cs.LO].
14. Grädel, E., & Walukiewicz, I. (1999). Guarded fixed point logic. Proceedings of LICS, 1999, 45–54.
15. Ladner, R. (1977). The computational complexity of provability in systems of propositional modal logic. SIAM Journal on Computing, 6, 467–480.
16. Spaan, E. (1993). Complexity of modal logics. Ph.D. thesis, University of Amsterdam, Institute for Logic, Language and Computation.
17. Vardi, M. (1997). Why is modal logic so robustly decidable? In N. Immerman & P. Kolaitis (Eds.), Descriptive complexity and finite models, Vol. 31 of DIMACS series in discrete mathematics and theoretical computer science.

Katrin M. Dannert is a German logician and mathematician. She graduated in Mathematics at the RWTH Aachen University in 2017. Since then she has been working as a Ph.D. candidate in the group of Erich Grädel at RWTH Aachen. Her main interests are in provenance and semiring semantics for logics and games.

Erich Grädel is a Swiss logician, mathematician, and computer scientist. He was born in 1958 in Basel (Switzerland). Erich Grädel studied mathematics, physics, and history at the University of Basel where he got his doctoral degree in 1987. He held research and teaching positions in Pisa, Berkeley, Zürich, and Basel, before he became Professor for Mathematical Foundations of Computer Science at RWTH Aachen University, Germany in 1993. Further he held short-term positions as a visiting professor in Vienna, Paris, Bordeaux, and Cachan.

Chapter 4

Implicit Partiality of Signature Morphisms in Institution Theory

Răzvan Diaconescu

Abstract We develop an extension of institution theory that accommodates implicitly the partiality of the signature morphisms and its syntactic and semantic effects. This is driven primarily by applications to conceptual blending, but other application domains are possible (such as software evolution). The particularity of this extension is a reliance on ordered-enriched categorical structures. This work is dedicated to Hajnal Andréka and István Németi whose early work played an important role in shaping the abstract model theory dream of the author.

Keywords Institution theory · $\frac{3}{2}$-institutions · Ordered categories · Partial signature morphisms

R. Diaconescu, Simion Stoilow Institute of Mathematics of the Romanian Academy, Calea Griviței 21, București, Romania

© Springer Nature Switzerland AG 2021. J. Madarász and G. Székely (eds.), Hajnal Andréka and István Németi on Unity of Science, Outstanding Contributions to Logic 19, https://doi.org/10.1007/978-3-030-64187-0_4

4.1 Introduction

4.1.1 Institution Theory

The mathematical context of our work is the theory of institutions [16], which is a four-decades-old category-theoretic abstract model theory that traditionally has been playing a crucial foundational role in formal specification (e.g. [26]). It was introduced in [15] as an answer to the explosion in the population of logical systems there, as a very general mathematical study of formal logical systems, with emphasis on semantics (model theory), that is not committed to any particular logical system. Its role has gradually expanded to other areas of logic-based computer science, most notably to declarative programming and ontologies. In parallel, and often in interdependence with its role in computer science, in the past fifteen or twenty years it has made important contributions to model theory through the new area called institution-independent model theory [4], an abstract approach to model theory that


is liberated from any commitment to particular logical systems. Institutions thus allowed for a smooth, systematic, and uniform development of model theories for unconventional logical systems, as well as of logic-by-translation techniques and of heterogeneous multi-logic frameworks. Mathematically, institution theory is based upon a category-theoretic [23] formalization of the concept of logical system that includes the syntax, the semantics, and the satisfaction relation between them. As a form of abstract model theory, it is the only one that treats all these components of a logical system fully abstractly. In a nutshell, the above-mentioned formalization is a category-theoretic structure $(\mathit{Sign}, \mathit{Sen}, \mathit{Mod}, \models)$, called institution, that consists of (a) a category $\mathit{Sign}$ of so-called signatures, (b) two functors, $\mathit{Sen} : \mathit{Sign} \to \mathbf{SET}$ for the syntax, given by sets of so-called sentences, and $\mathit{Mod} : \mathit{Sign}^{\ominus} \to \mathbf{CAT}$ for the semantics,$^1$ given by categories of so-called models, and (c) for each signature $\Sigma$, a binary satisfaction relation $\models_\Sigma$ between the $\Sigma$-models, i.e. objects of $\mathit{Mod}(\Sigma)$, and the $\Sigma$-sentences, i.e. elements of $\mathit{Sen}(\Sigma)$, such that for each morphism $\varphi : \Sigma \to \Sigma'$ in the category $\mathit{Sign}$, each $\Sigma'$-model $M'$, and each $\Sigma$-sentence $\rho$ the following Satisfaction Condition holds:

$$M' \models_{\Sigma'} \mathit{Sen}(\varphi)(\rho) \quad\text{if and only if}\quad \mathit{Mod}(\varphi)(M') \models_\Sigma \rho.$$

Because of its very high level of abstraction, this definition accommodates not only well established logical systems but also very unconventional ones. Moreover, it has served and it may serve as a template for defining new ones. Institution theory approaches logic and model theory from a relativistic, non-substantialist perspective, quite different from the common reading of formal logic. This does not mean that institution theory is opposed to the established logic tradition, since it rather includes it from a higher abstraction level.
In fact, the real difference may occur at the level of the development methodology: top-down in the case of institution theory, versus bottom-up in the case of traditional logic. Consequently, in institution theory, concepts come naturally as presumed features that a logical system might exhibit or not, and are defined at the most appropriate level of abstraction; in developing results, hypotheses are kept as general as possible and introduced on a by-need basis. Although, among the many forms of abstract model theory, institution theory is the only one that treats everything fully abstractly, it has received important influences from other "less abstract" model theories. A notable example is the view of satisfaction as cone-injectivity that is prominent in Andréka and Németi's category-theoretic work on generalised axiomatizability [1]. This view underlies the axiomatizability chapter of institution-independent model theory [4].
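To make the Satisfaction Condition tangible, here is a toy instance based on propositional logic, in which signatures are lists of propositional variables, a signature morphism is a renaming of variables, $\mathit{Sen}(\varphi)$ renames the variables inside a sentence, and $\mathit{Mod}(\varphi)$ is the reduct of a valuation along the renaming. This is a minimal sketch of a standard example, not taken from the chapter; all function names and the tuple encoding of sentences are our own assumptions.

```python
from itertools import product


def translate(phi_map, sentence):
    """Sen(phi): rename the propositional variables of a sentence."""
    tag = sentence[0]
    if tag == "var":
        return ("var", phi_map[sentence[1]])
    if tag == "not":
        return ("not", translate(phi_map, sentence[1]))
    return (tag, translate(phi_map, sentence[1]), translate(phi_map, sentence[2]))


def reduct(phi_map, model):
    """Mod(phi): turn a model of the target signature into one of the source."""
    return {p: model[q] for p, q in phi_map.items()}


def satisfies(model, sentence):
    """The satisfaction relation between valuations and sentences."""
    tag = sentence[0]
    if tag == "var":
        return model[sentence[1]]
    if tag == "not":
        return not satisfies(model, sentence[1])
    if tag == "and":
        return satisfies(model, sentence[1]) and satisfies(model, sentence[2])
    return satisfies(model, sentence[1]) or satisfies(model, sentence[2])   # "or"


def satisfaction_condition_holds(phi_map, target_signature, sentence):
    """Check M' |= Sen(phi)(rho) iff Mod(phi)(M') |= rho for every target model M'."""
    for bits in product([False, True], repeat=len(target_signature)):
        model = dict(zip(target_signature, bits))
        left = satisfies(model, translate(phi_map, sentence))
        right = satisfies(reduct(phi_map, model), sentence)
        if left != right:
            return False
    return True


# A non-injective morphism from the signature [p, q] to the signature [a, b].
phi_map = {"p": "a", "q": "a"}
rho = ("and", ("var", "p"), ("not", ("var", "q")))
ok = satisfaction_condition_holds(phi_map, ["a", "b"], rho)
```

Note that the model functor goes in the opposite direction to the signature morphism, which is why $\mathit{Mod}$ is defined on the dual of $\mathit{Sign}$.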

$^1$ $\mathit{Sign}^{\ominus}$ denotes the dual of $\mathit{Sign}$.


4.1.2 From Total to Partial Signature Morphisms in Institution Theory

In the conventional approach to model theory, signature morphisms play a minimal role as they are basically reduced to signature inclusions (or extensions, depending on which perspective one takes). The lack of involvement of proper signature morphisms pervades all forms of model theory with the notable exception of institution theory. Due to its roots within algebraic specification theory, which involves signature morphisms in their full generality, institution theory has given signature morphisms a prominent role right from its inception. In time this unique aspect of institution theory has proved to be extremely beneficial in the development of an in-depth model theory at the level of abstract institutions, especially for expressing smoothly at the most abstract level crucial concepts such as quantifiers, interpolation, definability, diagrams, saturation, etc. A more extensive discussion on this topic may be found in [5]. Although in institution theory signature morphisms are considered in their full generality, they are always implicitly assumed to be total. Partiality has never been considered for signature morphisms, neither explicitly nor implicitly. However, there are a few contexts that on the one hand require partial translations between signatures, and on the other hand require an institution theoretic treatment. Two such contexts are conceptual blending [10, 12, 13] and software evolution [11]. The goal of the present work is to develop an extension of the concept of institution, called $\frac{3}{2}$-institution, that accommodates partiality for the signature morphisms. According to the methodology promoted by institution theory this has to be achieved rather implicitly and axiomatically. Very briefly, this goes as follows. First, more structure is added to the category $\mathit{Sign}$ of the signatures and their morphisms.
The implicit partiality of the signature morphisms is axiomatised as a partial order on the arrows of $\mathit{Sign}$. This makes $\mathit{Sign}$ a so-called $\frac{3}{2}$-category (a terminology favoured by Goguen). These are a special instance of the rather notorious 2-categories, and in fact $\frac{3}{2}$-categories are somehow half-way between ordinary (1-)categories and 2-categories. There are also important effects of the implicit partiality of the signature morphisms at the level of the sentence translations and of the model reducts. The sentence translations $\mathit{Sen}(\varphi)$ ought to be allowed to be partial rather than total functions, and the model reducts $\mathit{Mod}(\varphi)$ ought to be allowed to map models to sets of models rather than single models.

4.1.3 Contributions and Structure of the Paper

The paper is structured as follows:
1. In a preliminary section we introduce some basic category theoretic notations and terminology, with emphasis on 3/2-categories.


R. Diaconescu

2. In a section on 3/2-institutions we start by recalling the basic concepts of (ordinary) institution theory, then we refine this to the concept of 3/2-institution, provide a collection of relevant examples, and develop basic 3/2-institution theoretic concepts and results on:
• 3/2-institutional seeds, which constitute a simple abstract scheme that underlies the definition of many 3/2-institutions of interest and that provides a general framework for an easy derivation and understanding of important 3/2-institutional properties.
• Theory morphisms, which parallel the corresponding concept from ordinary institution theory but only to a limited extent, since 3/2-institution theory admits several relevant concepts of theory morphisms.
• Model amalgamation, which extends the corresponding concept from ordinary institution theory to 3/2-institutions.
3. We dedicate a special section to the presentation of a scheme for approaching conceptual blending with 3/2-institutions that essentially replaces the currently prevalent idea of looking for colimits of theories with another idea, that of looking for lax cocones with model amalgamation. Our scheme is supported by the mathematical results of the previous sections, and in addition it has a number of parameters that make it quite flexible in applications.
4. In the final technical section we develop a different concept of theory morphism for 3/2-institutions that models software changes.

4.2 Category-Theoretic and Other Preliminaries

4.2.1 Categories

In general we stick to the established category theoretic terminology and notations, such as in [23]. But unlike there we prefer to use the diagrammatic notation for compositions of arrows in categories, i.e. if f : A → B and g : B → C are arrows then f ; g denotes their composition. The domain of an arrow/morphism f is denoted by □f while its codomain is denoted by f□. SET denotes the category of sets and functions and CAT the “quasi-category” of categories and functors.2 The class of objects of a category C is denoted by |C| and its class of arrows simply by C (so by f ∈ C we mean that f is an arrow in C). The dual of a category C (obtained by formally reversing its arrows) is denoted by C^⊖.

2 This means it is bigger than a category since the hom-sets are classes rather than sets.

4 Implicit Partiality of Signature Morphisms in Institution Theory


The following functor extends the well known power-set construction from sets to categories:

Definition 1 Given a category C the power-set category PC is defined as follows:
• |PC| = {A | A ⊆ |C|} and PC(A, B) = {H ⊆ C | □h ∈ A, h□ ∈ B for each h ∈ H}; and
• composition is defined by H1 ; H2 = {h1 ; h2 | h1 ∈ H1, h2 ∈ H2, h1□ = □h2}; then 1_A = {1_a | a ∈ A} are the identities.

4.2.2 Partial Functions

A partial function f : A ⇀ B is a binary relation f ⊆ A × B such that (a, b), (a, b′) ∈ f implies b = b′. The definition domain of f, denoted dom(f), is the set {a ∈ A | ∃b (a, b) ∈ f}. A partial function f : A ⇀ B is called total when dom(f) = A. We denote by f^0 the restriction of f to dom(f) × B; this is a total function. Partial functions yield a subcategory of the category of binary relations, denoted Pfn. Note that dom(f ; g) = {a ∈ dom(f) | f^0(a) ∈ dom(g)}. If A′ ⊆ A, by f(A′) we denote the set {b | ∃a ∈ A′, (a, b) ∈ f}. Then f(A) is denoted by Im(f). It is easy to check the following (though not as immediate as in the case of the total functions):

Lemma 1 Given partial functions f : A ⇀ B and g : B ⇀ C and A′ ⊆ A we have that (f ; g)(A′) = g(f(A′)).
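The notions of this subsection can be tried out on finite sets. The following is only an illustrative sketch, assuming that a partial function is represented as a Python dict whose keys form its definition domain; the helper names (compose, dom, image, subsets) and the sample data are hypothetical and not part of the paper.

```python
from itertools import chain, combinations

# Assumption: a partial function A ⇀ B is a dict; its keys are dom(f).
def compose(f, g):
    """Diagrammatic composition f ; g (first f, then g)."""
    return {a: g[b] for a, b in f.items() if b in g}

def dom(f):
    """Definition domain of f."""
    return set(f)

def image(f, subset):
    """f(A') = {b | there is a in A' with (a, b) in f}."""
    return {f[a] for a in subset if a in f}

def subsets(xs):
    """All subsets of xs, as tuples."""
    xs = list(xs)
    return chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))

A = {1, 2, 3}
f = {1: "x", 2: "y"}          # f : A ⇀ {"x", "y", "z"}, undefined at 3
g = {"y": True, "z": False}   # g : {"x", "y", "z"} ⇀ {True, False}, undefined at "x"

fg = compose(f, g)
# dom(f ; g) = {a in dom(f) | f0(a) in dom(g)}
dom_ok = dom(fg) == {a for a in dom(f) if f[a] in dom(g)}
# Lemma 1, checked on every subset A' of A
lemma_ok = all(image(fg, s) == image(g, image(f, s)) for s in subsets(A))
```

A finite check of this kind does not replace the proof of Lemma 1; it only makes the definitions concrete.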

4.2.3 3/2-categories

A 3/2-category is just a category such that its hom-sets are partial orders, and the composition preserves these partial orders. In the literature 3/2-categories are also called ordered categories or locally ordered categories. In terms of enriched category theory [20], 3/2-categories are just categories enriched by the monoidal category of partially ordered sets. Given a 3/2-category C, by C^⊕ we denote its ‘vertical’ dual which reverses the partial orders, and by C^⊛ its double dual C^⊕⊖. Given 3/2-categories C and C′, a strict 3/2-functor F : C → C′ is a functor C → C′ that preserves the partial orders on the hom-sets. Lax functors relax the functoriality conditions F(h) ; F(h′) = F(h ; h′) to F(h) ; F(h′) ≤ F(h ; h′) (when h□ = □h′) and F(1_A) = 1_F(A) to 1_F(A) ≤ F(1_A). If these inequalities are reversed then F is an oplax functor. This terminology complies with [2] and with more recent literature, but in earlier literature [19, 21] this is reversed. Note that oplax + lax = strict. In what follows whenever we say “3/2-functor” without the qualification “lax” or “oplax” we mean a functor which is either lax or oplax.


Lax functors can be composed like ordinary functors; we denote by 3/2CAT the category of 3/2-categories and lax functors. The most typical examples of 3/2-categories are Pfn, the category of partial functions in which the ordering between partial functions A ⇀ B is given by the inclusion relation on the binary relations A → B, and PoSET, the category of partially ordered sets (with monotonic mappings as arrows) with orderings between monotonic functions being defined point-wise (f ≤ g if and only if f(p) ≤ g(p) for all p). The following 3/2-category will be used in the next section in the definition of the main concept of this paper.

Definition 2 The category CAT_P has categories as objects and has arrows/morphisms C → C′ as mappings C → PC′. The composition in CAT_P is defined as follows: given F : C → C′ and F′ : C′ → C″ in CAT_P, their composition is the mapping C → PC″ that maps each arrow f ∈ C to the set ⋃_{f′ ∈ F f} F′ f′. By considering the point-wise partial order on the class of the mappings C → PC′ we get a 3/2-category denoted 3/2(CAT_P).

Note that in the above definition we do not require that the mappings C → PC′ are functors of any kind, not even morphisms of graphs; they are just mappings between classes of arrows. In fact the above composition in general does not preserve functoriality properties.
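The composition and the point-wise order of Definition 2 can be illustrated on finite data. The sketch below assumes that arrows are plain labels (strings) and that a CAT_P-morphism is a dict sending each arrow to a set of arrows; the names compose_P and leq_P and the sample mappings are hypothetical.

```python
def compose_P(F, F2):
    """Composition in CAT_P: f |-> union of F2(f') over all f' in F(f)."""
    return {f: frozenset().union(*(F2[f1] for f1 in F[f])) for f in F}

def leq_P(F, G):
    """Point-wise order: F <= G iff F(f) is a subset of G(f) for every arrow f."""
    return all(set(F[f]) <= set(G[f]) for f in F)

# Mappings between classes of arrows; no functoriality is assumed.
F = {"f": {"u", "v"}, "g": set()}
G = {"f": {"u"}, "g": set()}            # G <= F point-wise
F2 = {"u": {"p"}, "v": {"p", "q"}}

FF2 = compose_P(F, F2)
GF2 = compose_P(G, F2)
# In this instance the composition respects the point-wise order.
monotone_here = leq_P(G, F) and leq_P(GF2, FF2)
```

Note that an arrow mapped to the empty set stays mapped to the empty set after composition, which is the set-level counterpart of a reduct that admits no model.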

4.2.4 Various Colimits in 3/2-categories

Unlike in the case of ordinary categories, colimits in 3/2-categories come in several different flavours according to the role played by the order on the arrows. Here we recall some of these for the particular emblematic case of pushouts, the extension to other types of colimits being obvious. Given a span ϕ1, ϕ2 of arrows in a 3/2-category, a lax cocone for the span consists of arrows θ0, θ1, θ2 such that there are inequalities as shown in the following diagram:

[Diagram (4.1): the span ϕ1, ϕ2 out of a common object; the arrows θ1 and θ2 from the codomains of ϕ1 and ϕ2, and the arrow θ0 from their common domain, all into a common apex; the two triangles carry the inequalities ϕ1 ; θ1 ≤ θ0 and ϕ2 ; θ2 ≤ θ0.]

When the two inequalities are both equalities, this is a strict cocone. In this case θ0 is redundant and the data collapses to the equality ϕ1 ; θ1 = ϕ2 ; θ2.


A lax cocone like in diagram 4.1 is:
• 3/2-pushout when it is strict and for any strict cocones (θ1′, θ2′) and (θ1″, θ2″) with θk′ ≤ θk″, k = 1, 2, there are unique mediating arrows μ′ ≤ μ″ such that θk ; μ′ = θk′ and θk ; μ″ = θk″, k = 1, 2;
• lax pushout when for any lax cocones (θ0′, θ1′, θ2′) and (θ0″, θ1″, θ2″) with θk′ ≤ θk″, k = 0, 1, 2, there are unique mediating arrows μ′ ≤ μ″ such that θk ; μ′ = θk′ and θk ; μ″ = θk″, k = 0, 1, 2;
• weak (lax) pushout when the uniqueness condition on the mediating arrows is dropped from the above properties;
• near pushout when for any lax cocone (θ0′, θ1′, θ2′) the set of mediating arrows {μ | θk ; μ ≤ θk′, k = 0, 1, 2} has a maximal element.

Lax pushouts represent the instance of a natural concept of colimit from general enriched category theory [20] to 3/2-categories; however in concrete situations, unlike their cousins from ordinary category theory, they can be very difficult to grasp and sometimes appear quite inadequate. For example in Pfn, which as a(n ordinary) category has all pushouts, there are spans that do not admit a lax pushout. The following counterexample comes from [3]. Here we provide a detailed argument for this based upon ideas that have been sketched only very briefly in [3].

Proposition 1 The span of partial functions ϕ1 : {a1, a2} ⇀ {b1, b2}, ϕ2 : {a1, a2} ⇀ {c1, c2} where ϕ1 = {(a1, b1), (a2, b1)} and ϕ2 = {(a2, c2)} does not admit a lax pushout.

Proof The proof is by reductio ad absurdum. Let us assume that (ϕ1, ϕ2) does admit a lax pushout (θ0, θ1, θ2) with apex a set P. Then

\[ P = \mathrm{Im}\,\theta \quad (\text{where } \mathrm{Im}\,\theta \text{ denotes } \mathrm{Im}\,\theta_0 \cup \mathrm{Im}\,\theta_1 \cup \mathrm{Im}\,\theta_2) \tag{4.2} \]

since otherwise, if there exists x ∈ P \ Imθ, by considering the lax cocone (θ0′, θ1′, θ2′) with the apex Imθ and such that θk′ = θk, k = 0, 1, 2 (as sets of pairs of elements), there are at least two different mediating partial functions between (θ0, θ1, θ2) and (θ0′, θ1′, θ2′). One of them is the partial function μ such that dom μ = Imθ and μ(p) = p for each p ∈ dom μ, and the other may be just a total function that extends the identity on Imθ by mapping all elements in P \ Imθ to some elements in Imθ.

Now let us consider all lax cocones (θ0′, θ1′, θ2′) over (ϕ1, ϕ2) with the apex the singleton set {∗}. There are 32 of them because:
• When b1 ∈ dom θ1′ there are 2 possibilities for θ1′, 1 for θ0′ (it is total since ϕ1 ; θ1′ is total), and 4 for θ2′; hence in this case there are 2 · 1 · 4 = 8 lax cocones.
• When b1 ∉ dom θ1′ there are 2 possibilities for θ1′ and
  – when c2 ∈ dom θ2′ there are 2 possibilities for θ2′ and 2 for θ0′, and
  – when c2 ∉ dom θ2′ there are 2 possibilities for θ2′ and 4 for θ0′.
  Hence in this case there are 2 · (2 · 2 + 2 · 4) = 24 lax cocones.
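The count of 32 lax cocones with singleton apex can be confirmed by brute force. The sketch below rests on two assumptions that should be read as such: the lax cocone condition is taken to be ϕk ; θk ≤ θ0 for k = 1, 2, and ϕ1 is taken to send both a1 and a2 to b1, which is what the case analysis above uses ("ϕ1 ; θ1 is total" as soon as b1 is in the domain of θ1). Since a partial function into a one-point set is determined by its definition domain, each θ is represented by a subset; the function names are hypothetical.

```python
from itertools import chain, combinations

def subsets(xs):
    """All subsets of xs, as Python sets."""
    xs = list(xs)
    return [set(c) for c in chain.from_iterable(
        combinations(xs, r) for r in range(len(xs) + 1))]

def count_lax_cocones(phi1, phi2, A, B, C):
    """Count triples (theta0, theta1, theta2) into a singleton with phi_k ; theta_k <= theta0."""
    count = 0
    for d1 in subsets(B):          # d1 = dom(theta1)
        for d2 in subsets(C):      # d2 = dom(theta2)
            # dom(phi1 ; theta1) together with dom(phi2 ; theta2)
            needed = ({a for a in A if phi1.get(a) in d1}
                      | {a for a in A if phi2.get(a) in d2})
            # theta0 must be defined at least on 'needed'
            count += sum(1 for d0 in subsets(A) if needed <= d0)
    return count

A, B, C = ["a1", "a2"], ["b1", "b2"], ["c1", "c2"]
phi1 = {"a1": "b1", "a2": "b1"}    # assumption stated in the lead-in
phi2 = {"a2": "c2"}
n = count_lax_cocones(phi1, phi2, A, B, C)
```

Under these assumptions the enumeration agrees with the 8 + 24 cases listed in the proof.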

[Diagram: the lax pushout (θ0, θ1, θ2) with apex P over the span ϕ1 : {a1, a2} ⇀ {b1, b2}, ϕ2 : {a1, a2} ⇀ {c1, c2}, a lax cocone (θ0′, θ1′, θ2′) with apex {∗}, and the mediating arrow μ : P ⇀ {∗}.]

Because of the bijection between the set of the lax cocones (θ0′, θ1′, θ2′) over (ϕ1, ϕ2) with apex {∗} and the set of the partial functions μ : P ⇀ {∗}, it follows that P has log2 32 = 5 elements. We have that card Imθk ≤ 2 for k = 0, 1, 2. If card Imθ1 = 2 then θ1(b1) = θ0(a1) = θ0(a2), hence Imθ1 ∩ Imθ0 ≠ ∅. If card Imθ2 = 2 then θ2(c2) = θ0(a2), hence Imθ2 ∩ Imθ0 ≠ ∅. It follows that Imθ has at most 4 elements and by (4.2) it further follows that card P ≤ 4, which contradicts our previous finding that card P = 5.

A remedy for this would be to restrict the cocones to designated subclasses of arrows as follows.

Definition 3 (T-colimits) Given a (1-)subcategory T ⊆ C of a 3/2-category C, a lax T-cocone for a span (ϕ1, ϕ2) is a lax cocone (θ0, θ1, θ2) for the span such that θk ∈ T, k = 0, 1, 2. A lax T-pushout is a minimal lax T-cocone, i.e. for any lax T-cocone (θ0′, θ1′, θ2′) there exists a unique mediating arrow μ′ ∈ T such that θk ; μ′ = θk′, k = 0, 1, 2 and, moreover, if θ″ is another such lax T-cocone such that θk′ ≤ θk″, k = 0, 1, 2, then we have the relationship μ′ ≤ μ″ between the corresponding mediating arrows.

This definition extends in the obvious way to general colimits and to the weak case (by dropping the requirement on the uniqueness of μ′). For example, in Pfn by letting T be the sub-category of total functions, any span of partial functions admits a lax T-pushout. This may be understood easily if we note that in this case a lax cocone for a span (ϕ1, ϕ2) of partial functions is the same thing as a cocone in SET for the W-shaped diagram

\[ \varphi_1\Box \xleftarrow{\ \varphi_1^0\ } \mathrm{dom}\,\varphi_1 \subseteq \Box\varphi_k \supseteq \mathrm{dom}\,\varphi_2 \xrightarrow{\ \varphi_2^0\ } \varphi_2\Box \]
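The remark about the W-shaped diagram suggests a concrete recipe for finite sets: glue the disjoint union of the three sets along ϕ1^0 and ϕ2^0. The following sketch builds the resulting total cocone with a small union-find and checks the lax condition ϕk ; θk ≤ θ0. It is an illustration under stated assumptions (finite sets, partial functions as dicts, hypothetical names and sample data) and it does not verify the universal property.

```python
def lax_total_cocone(phi1, phi2, A, B, C):
    """Quotient of the disjoint union A + B + C by a ~ phi1(a) and a ~ phi2(a)."""
    nodes = [("A", a) for a in A] + [("B", b) for b in B] + [("C", c) for c in C]
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parent links up to the representative of the class of n.
        while parent[n] != n:
            n = parent[n]
        return n

    def union(m, n):
        parent[find(m)] = find(n)

    for a, b in phi1.items():      # glue along phi1^0 : dom(phi1) -> B
        union(("A", a), ("B", b))
    for a, c in phi2.items():      # glue along phi2^0 : dom(phi2) -> C
        union(("A", a), ("C", c))

    # All three maps are total: every element goes to its class representative.
    theta0 = {a: find(("A", a)) for a in A}
    theta1 = {b: find(("B", b)) for b in B}
    theta2 = {c: find(("C", c)) for c in C}
    return theta0, theta1, theta2

A, B, C = [1, 2, 3], ["x", "y"], ["u", "v"]
phi1 = {1: "x"}                    # partial: undefined at 2 and 3
phi2 = {1: "u", 2: "v"}            # partial: undefined at 3
theta0, theta1, theta2 = lax_total_cocone(phi1, phi2, A, B, C)
apex = set(theta0.values()) | set(theta1.values()) | set(theta2.values())
# phi_k ; theta_k <= theta0 means agreement with theta0 on dom(phi_k)
lax_ok = (all(theta1[phi1[a]] == theta0[a] for a in phi1)
          and all(theta2[phi2[a]] == theta0[a] for a in phi2))
```

In this example the element 3, on which neither ϕ1 nor ϕ2 is defined, survives as its own class in the apex, which is the point of keeping θ0 as separate data in a lax cocone.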

Lax T-pushouts are important within the context of model amalgamation properties. Near pushouts (terminology from [19]) are much easier to grasp than lax pushouts (for example in Pfn they are the epimorphic cocones) but nevertheless they have received only little consideration due to their pathology of lacking uniqueness, a property that is considered crucial for any kind of colimit. However in [19] it is argued that they constitute a more proper concept of colimit in an ordered categorical context because they involve only inequalities, and moreover Goguen argues [12] that their lack of the uniqueness property is exactly what makes them useful for modelling conceptual blending; there he calls them 3/2-pushouts.3 Although we hardly use near pushouts in our paper it is worth knowing about them because of their historical role in the categorical approaches to concept blending.

4.3 3/2-institutions

The outline of this section is as follows.
1. We recall the concept of institution and provide a couple of emblematic examples. Some basic institution theoretic concepts are also recalled.
2. We introduce the definition of 3/2-institutions.
3. We provide some relevant examples of 3/2-institutions that constitute extensions of well known corresponding institutions and that accommodate partiality of the signature morphisms.
4. We introduce the concept of 3/2-institutional seed that serves as a very general way to define 3/2-institutions. This is also mathematically convenient, especially within the context of the study of model amalgamation properties.
5. We extend the crucial concept of model amalgamation from common institution theory to 3/2-institution theory, and we give some general and yet pragmatic sufficient conditions for 3/2-institution theoretic model amalgamation.
6. We extend the concept of theory morphism from common institution theory to 3/2-institutions; what happens is an unfolding of the original concept into several concepts of theory morphisms. We establish the relationships between these, and we study their basic compositionality and model amalgamation properties.

4.3.1 Institutions

An institution I = (Sign^I, Sen^I, Mod^I, |=^I) consists of
• a category Sign^I whose objects are called signatures,
• a sentence functor Sen^I : Sign^I → SET defining for each signature a set whose elements are called sentences over that signature and defining for each signature morphism a sentence translation function,

3 Note the terminology clash with the 3/2-pushouts defined here.


• a model functor Mod^I : (Sign^I)^⊖ → CAT defining for each signature Σ the category Mod^I(Σ) of Σ-models and Σ-model homomorphisms, and for each signature morphism ϕ the reduct functor Mod^I(ϕ),
• for every signature Σ, a binary Σ-satisfaction relation |=^I_Σ ⊆ |Mod^I(Σ)| × Sen^I(Σ),

such that for each morphism ϕ : Σ → Σ′, the Satisfaction Condition

\[ M' \models^{I}_{\Sigma'} \mathit{Sen}^{I}(\varphi)\rho \quad\text{if and only if}\quad \mathit{Mod}^{I}(\varphi)M' \models^{I}_{\Sigma} \rho \tag{4.3} \]

holds for each M′ ∈ |Mod^I(ϕ□)| and ρ ∈ Sen^I(□ϕ), where Σ = □ϕ and Σ′ = ϕ□.

[Diagram: for ϕ : □ϕ → ϕ□, the sentence translation Sen^I(ϕ) : Sen^I(□ϕ) → Sen^I(ϕ□) and the reduct Mod^I(ϕ) : |Mod^I(ϕ□)| → |Mod^I(□ϕ)|, related by the satisfaction relations |=^I_□ϕ and |=^I_ϕ□.]

We may omit the superscripts or subscripts from the notations of the components of institutions when there is no risk of ambiguity. For example, if the considered institution and signature are clear, we may denote |=^I_Σ just by |=. For M = Mod(ϕ)M′, we say that M is the ϕ-reduct of M′. The institution is called discrete when the model categories Mod(Σ) are discrete (i.e. do not possess non-identity arrows).

Example 1 (Propositional logic – PL) This is defined as follows. Sign^PL = SET, and for any set P, Sen(P) is generated by the grammar

S ::= P | S ∧ S | ¬S

and Mod^PL(P) = (2^P, ⊆). For any M ∈ |Mod^PL(P)|, depending on convenience, we may consider it either as a subset M ⊆ P or equivalently as a function M : P → 2 = {0, 1}. When M is considered as a function we will sometimes use the notation M_p instead of M(p). For any function ϕ : P → P′, Sen^PL(ϕ) replaces each element p ∈ P that occurs in a sentence ρ by ϕ(p), and Mod^PL(ϕ)(M′) = ϕ ; M′ for each M′ ∈ 2^{P′}. For any P-model M ⊆ P and ρ ∈ Sen^PL(P), M |= ρ is defined by induction on the structure of ρ by (M |= p) = (p ∈ M), (M |= ρ1 ∧ ρ2) = (M |= ρ1) ∧ (M |= ρ2) and (M |= ¬ρ) = ¬(M |= ρ).

Example 2 (Many-sorted algebra – MSA) The MSA-signatures are pairs (S, F) consisting of a set S of sort symbols and of a family F = {F_{w→s} | w ∈ S*, s ∈ S} of sets of function symbols indexed by arities (for the arguments) and sorts (for the results).4 Signature morphisms ϕ : (S, F) → (S′, F′) consist of a function ϕ^st : S → S′ and a family of functions ϕ^op = {ϕ^op_{w→s} : F_{w→s} → F′_{ϕ^st(w)→ϕ^st(s)} | w ∈ S*, s ∈ S}.

The (S, F)-models M, called algebras, interpret each sort symbol s as a set M_s and each function symbol σ ∈ F_{w→s} as a function M_σ from the product M_w of the interpretations of the argument sorts to the interpretation M_s of the result sort.5 An (S, F)-model homomorphism h : M → M′ is an indexed family of functions {h_s : M_s → M′_s | s ∈ S} such that h_s(M_σ(m)) = M′_σ(h_w(m)) for each σ ∈ F_{w→s} and each m ∈ M_w, where h_w : M_w → M′_w is the canonical componentwise extension of h, i.e. h_w(m_1, . . . , m_n) = (h_{s1}(m_1), . . . , h_{sn}(m_n)) for w = s1 . . . sn and m_i ∈ M_{si}.

For each signature morphism ϕ : (S, F) → (S′, F′), the reduct Mod(ϕ)(M′) of an (S′, F′)-model M′ is defined by Mod(ϕ)(M′)_x = M′_{ϕ(x)} for each sort or function symbol x from the domain signature of ϕ.

For each signature (S, F), T_(S,F) = ((T_(S,F))_s)_{s∈S} is the least family of sets such that σ(t) ∈ (T_(S,F))_s for all σ ∈ F_{w→s} and all tuples t ∈ (T_(S,F))_w. The elements of (T_(S,F))_s are called (S, F)-terms of sort s. For each (S, F)-algebra M, the evaluation of an (S, F)-term σ(t) in M, denoted M_{σ(t)}, is defined as M_σ(M_t), where M_t is the componentwise evaluation of the tuple of (S, F)-terms t in M.

Sentences are the usual first order sentences built from equational atoms t = t′, with t and t′ (well-formed) terms of the same sort, by iterative application of Boolean connectives (∧, ⇒, ¬, ∨) and quantifiers (∀X, ∃X, where X is a sorted set of variables). Sentence translations along signature morphisms just rename the sort and function symbols according to the respective signature morphisms. They can be formally defined by recursion on the structure of the sentences. The satisfaction of sentences by models is the usual Tarskian satisfaction defined recursively on the structure of the sentences. (As a special note for the satisfaction of the quantified sentences, defined in this formalisation by means of model reducts, we recall that M |=_(S,F) (∀X)ρ if and only if M′ |=_(S,F+X) ρ for each expansion M′ of M to the signature (S, F + X) that adds the variables X as new constants to F.)

4 By S* we denote the set of strings of sort symbols.
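The propositional institution PL of Example 1 is small enough to be executed. The following sketch assumes an encoding that is not in the paper: sentences are nested tuples ("var", p), ("not", s), ("and", s1, s2), models are dicts P → {0, 1}, and a signature morphism is a dict P → P′. It checks the Satisfaction Condition (4.3) for one sentence over all models of the target signature; all names are hypothetical.

```python
from itertools import product

def translate(phi, s):
    """Sen(phi): rename each propositional variable p to phi(p)."""
    if s[0] == "var":
        return ("var", phi[s[1]])
    return (s[0],) + tuple(translate(phi, t) for t in s[1:])

def reduct(phi, M2):
    """Mod(phi)(M') = phi ; M'."""
    return {p: M2[q] for p, q in phi.items()}

def sat(M, s):
    """Satisfaction by induction on the structure of the sentence."""
    if s[0] == "var":
        return bool(M[s[1]])
    if s[0] == "not":
        return not sat(M, s[1])
    return sat(M, s[1]) and sat(M, s[2])

phi = {"p": "a", "q": "a", "r": "b"}     # total function P -> P'
rho = ("and", ("var", "p"), ("not", ("and", ("var", "q"), ("var", "r"))))
P2 = ["a", "b"]

# M' |= Sen(phi)rho  iff  Mod(phi)M' |= rho, for every P'-model M'
sc_holds = all(
    sat(M2, translate(phi, rho)) == sat(reduct(phi, M2), rho)
    for bits in product([0, 1], repeat=len(P2))
    for M2 in [dict(zip(P2, bits))]
)
```

The check covers a single morphism and sentence only; the general statement follows by induction on sentences, as in the definition of satisfaction.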

In the following we recall some basic concepts from institution theory that will play a role in this work. For any set E of Σ-sentences:
• if M is any Σ-model, then by M |= E we denote that M |= e for each e ∈ E;
• E is consistent when there exists a Σ-model M such that M |= E;
• if ρ is a Σ-sentence then E |= ρ denotes the situation when for each Σ-model M, if M |= E then M |= ρ too;
• by E^• we denote {ρ ∈ Sen(Σ) | E |= ρ}.

5 When w = λ, i.e. the empty string, M_λ is a singleton set, hence giving M_σ is the same as giving an element of M_s; in other words M_σ is a constant.


Theories. In any institution, a theory is a pair (Σ, E) consisting of a signature Σ and a set E of Σ-sentences. A theory morphism ϕ : (Σ, E) → (Σ′, E′) is a signature morphism ϕ : Σ → Σ′ such that E′ |= Sen(ϕ)E. It is easy to check that the theory morphisms are closed under the composition given by the composition of the signature morphisms; this gives the category of the theories of I, denoted Th^I. This fact opens the door for the following general construction, which is quite helpful in several situations, especially in the study of logic encodings. Let I = (Sign, Sen, Mod, |=) be any institution. The institution of the theories of I, denoted by I^t = (Sign^t, Sen^t, Mod^t, |=^t), is defined by
• Sign^t is the category Th of the theories of I,
• Sen^t(Σ, E) = Sen(Σ),
• Mod^t(Σ, E) is the full subcategory of Mod(Σ) determined by those models which satisfy E,
• for any theory morphism ϕ : (Σ, E) → (Σ′, E′) the reduct Mod^t(ϕ) : Mod^t(Σ′, E′) → Mod^t(Σ, E) is the restriction of Mod(ϕ) to Mod^t(Σ′, E′),6 and
• for each (Σ, E)-model M and Σ-sentence e, M |=^t_(Σ,E) e if and only if M |=_Σ e.

Model amalgamation. Model amalgamation properties for institutions formalize the possibility of amalgamating models of different signatures when they are consistent on some kind of generalized ‘intersection’ of signatures. It is one of the most pervasive properties of concrete institutions and it is used in a crucial way in many institution theoretic studies. A few early examples are [8, 24, 25, 28]. For the role played by this property in specification theory and in institutional model theory see [26] and [4], respectively.

A model of a diagram of signature morphisms in an institution consists of a model M_k for each signature Σ_k in the diagram such that for each signature morphism ϕ : Σ_i → Σ_j in the diagram we have that M_i = Mod(ϕ)M_j. A commutative square of signature morphisms

[Diagram: the square with ϕ1 : Σ0 → Σ1, ϕ2 : Σ0 → Σ2, θ1 : Σ1 → Σ and θ2 : Σ2 → Σ.]

is an amalgamation square if and only if each model of the span (ϕ1, ϕ2) admits a unique completion to a model of the square. When we drop the uniqueness requirement we call this a weak model amalgamation square.

6 This works, i.e. Mod^t(ϕ)M′ ∈ Mod^t(Σ, E) for M′ ∈ |Mod^t(Σ′, E′)|, as a consequence of the Satisfaction Condition in I.


In most of the institutions formalizing conventional or non-conventional logics, pushout squares of signature morphisms are model amalgamation squares [4]. In the literature there are several more general concepts of model amalgamation. One of them is model amalgamation for cocones of arbitrary diagrams (rather than just for spans), another one is model amalgamation for model homomorphisms. Both are very easy to define by mimicking the definitions presented above. While the former generalisation is quite relevant for the intended applications of our work, the latter is less so since at this moment model homomorphisms do not seem to play any role in conceptual blending or in merging of software changes. Moreover amalgamation of model homomorphisms is known to play a role only in some developments in institution-independent model theory [4], but even there most uses of model amalgamation refer only to amalgamation of models.

4.3.2 3/2-institutions: Definition

Definition 4 (3/2-institution) A 3/2-institution I = (Sign^I, Sen^I, Mod^I, |=^I) consists of
• a 3/2-category Sign^I, called the category of the signatures,
• a 3/2-functor Sen^I : Sign^I → Pfn, called the sentence functor,
• a lax 3/2-functor Mod^I : (Sign^I)^⊛ → 3/2(CAT_P), called the model functor, such that Mod(ϕ) is a lax functor for each signature morphism ϕ, and
• for each signature Σ ∈ |Sign^I| a satisfaction relation |=^I_Σ ⊆ |Mod^I(Σ)| × Sen^I(Σ)

such that for each morphism ϕ ∈ Sign^I, the Satisfaction Condition

\[ M' \models^{I}_{\varphi\Box} \mathit{Sen}^{I}(\varphi)\rho \quad\text{if and only if}\quad M \models^{I}_{\Box\varphi} \rho \tag{4.4} \]

holds for each M′ ∈ |Mod^I(ϕ□)|, M ∈ |Mod^I(ϕ)M′| and ρ ∈ dom(Sen^I(ϕ)).

The difference between 3/2-institutions and ordinary institutions, from now on called 1-institutions, is determined by the 3/2-categorical structure of the signature morphisms, which propagates to the sentence and to the model functors. Consequently the Satisfaction Condition 4.4 takes an appropriate format. Thus, for each signature morphism ϕ its corresponding sentence translation Sen(ϕ) is a partial function Sen(□ϕ) ⇀ Sen(ϕ□) and moreover whenever ϕ ≤ θ we have that Sen(ϕ) ⊆ Sen(θ). The sentence functor Sen can be either lax or oplax; depending on which is the case we may call the respective 3/2-institution a lax or oplax 3/2-institution. In many concrete situations it happens that Sen is strict, while some general results require it to be either lax or oplax or strict. The model reduct Mod(ϕ) is a lax functor Mod(ϕ□) → PMod(□ϕ), implying that for each Σ′-model M′ we have a class of reducts M rather than a single reduct.


In concrete examples this is a direct consequence of the partiality of ϕ: in the reducts the interpretation of the symbols on which ϕ is not defined is unconstrained, therefore there may be many possibilities for their interpretations. “Many” here includes also the case when there is no interpretation.

Definition 5 The model functor Mod admits emptiness when there exists a signature morphism ϕ and a ϕ□-model M′ such that Mod(ϕ)M′ = ∅; otherwise it is said that Mod does not admit emptiness.

In examples most often the model functors Mod do not admit emptiness; however the general definition does not rule out emptiness and moreover there are significant examples (we will see in Sect. 4.3.6) when emptiness of Mod may happen.
• The fact that Mod is a 3/2-functor implies also that whenever ϕ ≤ θ we have Mod(θ) ≤ Mod(ϕ), i.e. Mod(θ)M′ ⊆ Mod(ϕ)M′, etc.
• The lax aspect of Mod means that for signature morphisms ϕ and ϕ′ such that ϕ□ = □ϕ′ and for any ϕ′□-model M″, we have that Mod(ϕ)(Mod(ϕ′)M″) ⊆ Mod(ϕ ; ϕ′)M″, and for each signature Σ and for each Σ-model M that M ∈ Mod(1_Σ)M.
• The lax aspect of the reduct functors Mod(ϕ) means that for model homomorphisms h1, h2 such that h1□ = □h2 we have that Mod(ϕ)(h1) ; Mod(ϕ)(h2) ⊆ Mod(ϕ)(h1 ; h2), and for each M′ ∈ |Mod(ϕ□)| and each M ∈ Mod(ϕ)M′ that 1_M ∈ Mod(ϕ)1_{M′}.

As already mentioned above, model homomorphisms do not yet play any role in conceptual blending or in other envisaged applications of 3/2-institutions. Hence the lax aspect of model functors is for the moment a purely theoretical feature, which is however supported naturally by all examples. Another technical note: according to Definition 4, although the model reducts are lax functors, their compositions are not necessarily required to have this property.
In [29] there is a 2-categorical generalization of the concept of institution, called 2-institution, that considers Sign to be a 2-category, Sen : Sign → CAT and Mod : Sign → CAT to be pseudo-functors, and that takes a (categorically quite sophisticated) many-valued approach to the satisfaction relation. From these we can see immediately that the 2-institutions of [29] do not cover the concept of 3/2-institution through the perspective of 3/2-categories as special cases of 2-categories, the functors Sen and Mod in 2-institutions diverging from those in 3/2-institutions in two ways: they are pseudo-functors (in 3/2-category theory this means just ordinary functors) and their targets do not match those of 3/2-institutions. This lack of convergence is due to the two extensions aiming at different application domains.

Definition 6 (Total signature morphisms) A signature morphism ϕ in a 3/2-institution is
• Sen-maximal when Sen(ϕ) is total;
• Mod-maximal when for each ϕ□-model M′, Mod(ϕ)M′ is a singleton; and
• total when it is both Sen-maximal and Mod-maximal.

Corollary 1 In each 3/2-institution such that all identity signature morphisms are Mod-maximal, the total signature morphisms determine a 1-institution.

4.3.3 3/2-institutions: Examples

The following expected example shows that the concept of 3/2-institution constitutes a generalisation of the concept of institution.

Example 3 (Institutions) Each 1-institution can be regarded as a 3/2-institution that has all its signature morphisms total.

Next we show how any ordinary institution can be very simply expanded to a proper 3/2-institution, variations of this construction being obtained by playing with the “weights” attached to the components of the institution. Although the following example has a rather artificial flavour, it shows how the definition of 3/2-institution may be interpreted beyond some form of explicit partiality of the signature morphisms.

Example 4 (Adding weights to institutions) Let (Sign, Sen, Mod, |=) be an institution. Then (Sign′, Sen′, Mod′, |=′) is a 3/2-institution in which both the sentence and the model functors are proper lax functors.
• |Sign′| = |Sign| but morphisms in Sign′ are just pairs ϕ • k with ϕ ∈ Sign and k ∈ {0, 1}. The composition in Sign′ is defined by (ϕ • k) ; (ϕ′ • k′) = (ϕ ; ϕ′) • (k ∨ k′). Moreover ϕ • k ≤ θ • j if and only if ϕ = θ and k ≤ j. These yield Sign′ as a 3/2-category.
• For each signature Σ we define Sen′(Σ) = Sen(Σ) × {0, 1} and for any Sign′-morphism ϕ • k and any ρ • x ∈ Sen′(□(ϕ • k)) we define

\[ \mathit{Sen}'(\varphi \bullet k)(\rho \bullet x) = \begin{cases} (\mathit{Sen}(\varphi)\rho) \bullet x, & \text{if } x \le k \\ \text{undefined}, & \text{otherwise.} \end{cases} \]


It is easy to show that Sen′ is a lax functor. Moreover if {k, k′} = {0, 1} then Sen′(ϕ′ • k′)(Sen′(ϕ • k)(ρ • 1)) is undefined while Sen′(ϕ • k ; ϕ′ • k′)(ρ • 1) is defined; this shows that Sen′ is a proper lax functor.
• We define Mod′(Σ) = Mod(Σ) × {0, 1} and for any Sign′-morphism ϕ • k and any M′ • x′ ∈ |Mod′((ϕ • k)□)| we define Mod′(ϕ • k)(M′ • x′) = {(Mod(ϕ)M′) • x | x ≤ k · x′}. It is easy to show that Mod′ is a lax functor. Moreover if {k, k′} = {0, 1} then Mod′(ϕ • k)(Mod′(ϕ′ • k′)(M″ • 1)) has one element while Mod′(ϕ • k ; ϕ′ • k′)(M″ • 1) has two elements; this shows that Mod′ is a proper lax functor.
• Finally, the satisfaction relation is defined by M • x |=′ ρ • y if and only if M |= ρ.

The rest of the examples in this section are based on some explicit form of partiality of the signature morphisms and they are relevant in the applications.

Example 5 (Propositional logic with partial morphisms of signatures – 3/2PL) This example extends the (discrete version of the) ordinary institution PL to a 3/2-institution by considering partial functions rather than total functions as signature morphisms; thus Sign = Pfn.

SENTENCES. While for each set P, Sen(P) is like in PL, for any partial function ϕ : P ⇀ P′ the sentence translation Sen(ϕ) translates like in PL, but only the sentences containing only propositional variables of P that are translated by ϕ, i.e. that belong to dom ϕ; hence the partiality of Sen(ϕ). More precisely we have that dom(Sen ϕ) = Sen^PL(dom ϕ) and for each ρ ∈ dom(Sen ϕ) we have that Sen(ϕ)ρ = Sen^PL(ϕ^0)ρ. The sentence functor is a strict 3/2-functor; the main part of the functoriality argument for Sen goes as follows. Let ϕ, ϕ′ be signature morphisms where ϕ□ = □ϕ′ and let ρ ∈ Sen(□ϕ).
• First we establish the equality of the definition domains:

\[ \begin{aligned} \mathrm{dom}\,\mathit{Sen}(\varphi;\varphi') &= \mathit{Sen}^{PL}(\mathrm{dom}(\varphi;\varphi')) \\ &= \mathit{Sen}^{PL}(\{p \in \mathrm{dom}\,\varphi \mid \varphi^0(p) \in \mathrm{dom}\,\varphi'\}) \\ &= \{\rho \in \mathit{Sen}^{PL}(\mathrm{dom}\,\varphi) \mid \mathit{Sen}^{PL}(\varphi^0)\rho \in \mathit{Sen}^{PL}(\mathrm{dom}\,\varphi')\} \\ &= \{\rho \in \mathrm{dom}(\mathit{Sen}\,\varphi) \mid \mathit{Sen}^{PL}(\varphi^0)\rho \in \mathrm{dom}(\mathit{Sen}\,\varphi')\} \\ &= \mathrm{dom}(\mathit{Sen}\,\varphi ; \mathit{Sen}\,\varphi'). \end{aligned} \]

• The next step is obtained on the basis of the functoriality of Sen^PL. For each ρ ∈ dom Sen(ϕ ; ϕ′) we have:

\[ \mathit{Sen}(\varphi;\varphi')\rho = \mathit{Sen}^{PL}(\varphi^0;\varphi'^0)\rho = \mathit{Sen}^{PL}(\varphi'^0)(\mathit{Sen}^{PL}(\varphi^0)\rho) = \mathit{Sen}(\varphi')(\mathit{Sen}(\varphi)\rho). \]


MODELS. The 3/2PL models and model homomorphisms are those of PL, but their reducts differ from those in PL. Given a partial function ϕ : P ⇀ P′ and a P′-model M′ : P′ → 2,

Mod(ϕ)M′ = {M : P → 2 | M_p = M′_{ϕ^0(p)} for all p ∈ dom ϕ} = {M : P → 2 | (dom ϕ ⊆ P) ; M = ϕ^0 ; M′}.

On the model homomorphisms the reduct is defined by Mod(ϕ)(M′ ⊆ N′) = {M ⊆ N | M ∈ Mod(ϕ)M′, N ∈ Mod(ϕ)N′}.

The main part of the lax functoriality of Mod is proved as follows. Let ϕ, ϕ′ be signature morphisms such that ϕ□ = □ϕ′ and let M″ ∈ |Mod(ϕ′□)|. For any M ∈ Mod(ϕ)(Mod(ϕ′)M″) we show that M ∈ Mod(ϕ ; ϕ′)M″. There exists M′ ∈ Mod(ϕ′)M″ such that M ∈ Mod(ϕ)M′. For any p ∈ dom(ϕ ; ϕ′) = {p ∈ dom ϕ | ϕ^0(p) ∈ dom ϕ′} we have that

\[ \begin{aligned} M_p &= M'_{\varphi^0(p)} && \text{since } p \in \mathrm{dom}\,\varphi \text{ and } M \in \mathit{Mod}(\varphi)M' \\ &= M''_{\varphi'^0(\varphi^0(p))} && \text{since } \varphi^0(p) \in \mathrm{dom}\,\varphi' \text{ and } M' \in \mathit{Mod}(\varphi')M'' \\ &= M''_{(\varphi;\varphi')^0(p)}. \end{aligned} \]

This shows that M ∈ Mod(ϕ ; ϕ′)M″. Note that Mod(1_P)M = {M}, hence the second condition of the lax functoriality of Mod is satisfied in a strict sense.

The following counterexample shows why Mod is a proper lax functor. Let ϕ : {p1, p2, p3} → {p, p3} and ϕ′ : {p, p3} ⇀ {p3} be such that ϕ(p1) = ϕ(p2) = p, ϕ(p3) = p3 and dom ϕ′ = {p3}. Note that dom(ϕ ; ϕ′) = {p3}. Then we consider any ϕ′□-model M″ and M ∈ Mod(ϕ ; ϕ′)M″ such that M_{p1} ≠ M_{p2}. Because of the latter condition there is no M′ such that M ∈ Mod(ϕ)M′.

Also in general the reduct functors Mod(ϕ) are proper lax functors, but this works exactly the other way than in the case of Mod.
• Let M′ ⊆ N′ ⊆ T′ ∈ |Mod(ϕ□)|. Given M ⊆ T ∈ |Mod(□ϕ)| such that M ∈ Mod(ϕ)M′ and T ∈ Mod(ϕ)T′, we may define N ∈ Mod(ϕ)N′ by N_p = N′_{ϕ^0(p)} when p ∈ dom ϕ and N_p = M_p otherwise. Consequently M ⊆ N ⊆ T. This shows that we have an equality Mod(ϕ)(M′ ⊆ N′) ; Mod(ϕ)(N′ ⊆ T′) = Mod(ϕ)(M′ ⊆ T′).
• Given M′ ∈ |Mod(ϕ□)| and M ∈ Mod(ϕ)M′ it is obvious that 1_M ∈ Mod(ϕ)1_{M′}. However Mod(ϕ) fails to be strict on the identities, as shown by the following counterexample. Let ϕ : {p, q} ⇀ {p} be such that dom ϕ = {p}. If we take M′ = {p}, M = M′ and N = {p, q} then we have that (M ⊆ N) ∈ Mod(ϕ)1_{M′}, which means that Mod(ϕ)1_{M′} is strictly larger than 1_{Mod(ϕ)M′} = {1_M | M ∈ Mod(ϕ)M′}.


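SATISFACTION. The satisfaction relation of 3/2PL is inherited from PL. The Satisfaction Condition is proved on the basis of that of PL as follows. Let ϕ : P ⇀ P′, M′ : P′ → 2, M ∈ Mod(ϕ)M′ and ρ ∈ dom Sen(ϕ). Then

\[
\begin{aligned}
M' \models \mathrm{Sen}(\varphi)\rho
 &\iff M' \models \mathrm{Sen}^{PL}(\varphi^0)\rho && \text{by definition of } \mathrm{Sen}(\varphi) \\
 &\iff \varphi^0; M' \models \rho && \text{by the S.C. in PL for } \varphi^0 \\
 &\iff (\mathrm{dom}\,\varphi \subseteq P); M \models \rho && \text{since } (\mathrm{dom}\,\varphi \subseteq P); M = \varphi^0; M' \\
 &\iff M \models \rho && \text{by the S.C. in PL for } \mathrm{dom}\,\varphi \subseteq P.
\end{aligned}
\]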
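The Satisfaction Condition of 3/2PL can also be tested exhaustively on a small instance. This is a minimal sketch, assuming a toy sentence syntax (a symbol, `("not", s)` or `("and", s, t)`) and a non-injective partial ϕ chosen here for illustration; none of these names come from the text.

```python
from itertools import product

def models(props):
    """All PL models over `props`, each a dict symbol -> 0/1."""
    names = sorted(props)
    return [dict(zip(names, bits)) for bits in product((0, 1), repeat=len(names))]

def reducts(phi, source, m_prime):
    """Mod(phi)M' for a partial function phi stored as a dict."""
    return [m for m in models(source)
            if all(m[p] == m_prime[q] for p, q in phi.items())]

def holds(m, s):
    """Satisfaction for sentences: a symbol, ("not", s) or ("and", s, t)."""
    if isinstance(s, str):
        return m[s] == 1
    if s[0] == "not":
        return not holds(m, s[1])
    return holds(m, s[1]) and holds(m, s[2])

def symbols(s):
    """Propositional symbols occurring in a sentence."""
    return {s} if isinstance(s, str) else set().union(*(symbols(t) for t in s[1:]))

def translate(phi, s):
    """Sen(phi)s, or None when s uses a symbol outside dom(phi)."""
    if not symbols(s) <= set(phi):
        return None
    if isinstance(s, str):
        return phi[s]
    return (s[0],) + tuple(translate(phi, t) for t in s[1:])

P, P1 = {"p", "q", "r"}, {"a", "b"}
phi = {"p": "a", "q": "a"}          # r lies outside dom(phi)
sentences = ["p", ("not", "q"), ("and", "p", ("not", "q"))]

sc_holds = all(
    holds(m1, translate(phi, rho)) == holds(m, rho)
    for rho in sentences
    for m1 in models(P1)
    for m in reducts(phi, P, m1)
)
```

Sentences mentioning `r` are not translatable, which is the partiality of Sen(ϕ): `translate(phi, ("and", "p", "r"))` returns `None`.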

Example 6 (Many sorted algebra with partial morphisms of signatures – 3/2MSA) In this example we extend the MSA institution to its 3/2 variant in a way that parallels the extension of PL to 3/2PL. For this reason we give only the definitions and skip the arguments.

Given MSA signatures, a partial MSA-signature morphism ϕ : (S, F) ⇀ (S′, F′) consists of
• a partial function $\varphi^{st} : S \rightharpoonup S'$, and
• for each $w \in (\mathrm{dom}\,\varphi^{st})^*$ and $s \in \mathrm{dom}\,\varphi^{st}$ a partial function $\varphi^{op}_{w \to s} : F_{w \to s} \rightharpoonup F'_{\varphi^{st} w \to \varphi^{st} s}$.

Given ϕ : (S, F) ⇀ (S′, F′) and ϕ′ : (S′, F′) ⇀ (S″, F″) their composition ϕ; ϕ′ is defined by
• $(\varphi;\varphi')^{st} = \varphi^{st}; \varphi'^{st}$, and
• for each $w \in (\mathrm{dom}(\varphi;\varphi')^{st})^*$ and $s \in \mathrm{dom}(\varphi;\varphi')^{st}$: $(\varphi;\varphi')^{op}_{w \to s} = \varphi^{op}_{w \to s}; \varphi'^{op}_{\varphi^{st} w \to \varphi^{st} s}$.

Given ϕ, θ : (S, F) ⇀ (S′, F′), then ϕ ≤ θ if and only if
• $\varphi^{st} \subseteq \theta^{st}$, and
• for each $w \in (\mathrm{dom}\,\varphi^{st})^*$ and $s \in \mathrm{dom}\,\varphi^{st}$: $\varphi^{op}_{w \to s} \subseteq \theta^{op}_{w \to s}$.

Under these definitions the partial MSA-signature morphisms form a 3/2-category, which is the category of the 3/2MSA signatures. Given a partial MSA-signature morphism ϕ we denote by dom ϕ the signature $(\mathrm{dom}\,\varphi^{st}, \mathrm{dom}\,\varphi^{op})$ where $(\mathrm{dom}\,\varphi^{op})_{w \to s} = \mathrm{dom}\,\varphi^{op}_{w \to s}$ and by ϕ⁰ : dom ϕ → ϕ□ the resulting (total) MSA-signature morphism.

For any signature Σ, $\mathrm{Sen}^{3/2\mathrm{MSA}}(\Sigma) = \mathrm{Sen}^{\mathrm{MSA}}(\Sigma)$ and for any partial MSA-signature morphism ϕ, $\mathrm{Sen}^{3/2\mathrm{MSA}}(\varphi)$ is defined by
• $\mathrm{dom}\,\mathrm{Sen}^{3/2\mathrm{MSA}}(\varphi) = \mathrm{Sen}^{\mathrm{MSA}}(\mathrm{dom}\,\varphi)$ and
• for each sentence $\rho \in \mathrm{dom}\,\mathrm{Sen}^{3/2\mathrm{MSA}}(\varphi)$, $\mathrm{Sen}^{3/2\mathrm{MSA}}(\varphi)\rho = \mathrm{Sen}^{\mathrm{MSA}}(\varphi^0)\rho$.

As for 3/2PL, this also yields a strict 3/2-functor.

For any signature Σ, $\mathrm{Mod}^{3/2\mathrm{MSA}}(\Sigma) = \mathrm{Mod}^{\mathrm{MSA}}(\Sigma)$ and for any partial MSA-signature morphism ϕ and each ϕ□-model M′, $M \in \mathrm{Mod}^{3/2\mathrm{MSA}}(\varphi)M'$ is defined by
• for each sort symbol s in dom ϕ, $M_s = M'_{\varphi^{st} s}$, and
• for each operation symbol σ in dom ϕ, $M_\sigma = M'_{\varphi^{op} \sigma}$.


The definition on model homomorphisms is similar; we skip it here. Under these definitions, $\mathrm{Mod}^{3/2\mathrm{MSA}}$ is a lax functor. The satisfaction relation is inherited from MSA, and the argument for the Satisfaction Condition in 3/2MSA is similar to that in 3/2PL.

Example 7 The 3/2MSA example can be twisted by considering less partiality in the signature morphisms. This can be done in several ways; in each case a different 3/2-‘sub-institution’ of 3/2MSA emerges.
1. We constrain $\varphi^{st}$ to be total functions.
2. We let $\varphi^{st}$ be partial functions but we constrain $\varphi^{op}_{w \to s}$ to be total.

Example 8 The pattern of Example 6 can be applied to the extension of MSA that takes the ‘first order views’ of [6] in the role of signature morphisms. Since first order views are more general than the MSA signature morphisms, the resulting 3/2-institution based upon partial first order views can be thought of as an extension of 3/2MSA.

4.3.4 3/2-institutional Seeds

So far the Examples 5, 6, 7 and 8 are based upon a pattern that can be described as follows:
1. Consider a concrete 1-institution (that may be quite common).
2. Consider some form of partiality for its signature morphisms; often this can be done in several different ways (see Example 7).
3. Keep the sentences and the models of the original institution, but based on the partiality of the signature morphisms extend the concepts of sentence translations and of model reducts to 3/2-institutional ones. The partiality of the sentence translations amounts to the fact that a sentence can be translated only if all its symbols belong to the definition domain of the (partial) signature morphism. The relation-like aspect of the model reducts amounts to the fact that symbols that are outside the definition domain of the (partial) signature morphisms can be interpreted in several different ways in the models.
4. The satisfaction relation of the resulting 3/2-institution is inherited from the original 1-institution.

This pattern pervades a lot of useful 3/2-institutions and can be captured as a generic mathematical construction that derives 3/2-institutions from 1-institutions; this will be the topic of another publication [7]. However there are interesting examples of 3/2-institutions that fall short of this pattern; we have seen one already (Example 4) and two more significant ones will appear in Sects. 4.3.6 and 4.5, respectively. In the following we propose a general scheme for defining 3/2-institutions that on the one hand serves a technical purpose as it projects a convenient mathematical perspective on situations of interest, and on the other hand constitutes a framework for generating new 3/2-institutions, some of them not necessarily being partiality-based.

Definition 7 (3/2-institutional seed) A 3/2-institutional seed (Sign, Sen, Ω, T) consists of
• a 3/2-category Sign,
• a lax 3/2-functor Sen : Sign → Pfn (the ‘sentence functor’), and
• a designated ‘signature’ Ω ∈ |Sign| and a ‘truth’ function T : Sen(Ω) → 2.

Proposition 2 Any 3/2-institutional seed S = (Sign, Sen, Ω, T) extends canonically to a lax 3/2-institution I(S) = (Sign, Sen, Mod, ⊨) as follows:
• for each signature Σ ∈ |Sign| we let Mod(Σ) be the partial order defined by |Mod(Σ)| = {M : Σ → Ω | Sen(M) total}, and by the order of Sign(Σ, Ω),
• for each signature morphism ϕ we let
  – for each ϕ□-model M′: Mod(ϕ)M′ = {M ∈ |Mod(□ϕ)| | ϕ; M′ ≤ M},
  – for each ϕ□-model homomorphism M′ ≤ N′: Mod(ϕ)(M′ ≤ N′) = {M ≤ N | ϕ; M′ ≤ M, ϕ; N′ ≤ N},
• for each Σ-model M and each Σ-sentence ρ we let M ⊨ ρ if and only if T(Sen(M)ρ) = 1.

Proof For showing the lax functoriality of Mod we consider signature morphisms ϕ, ϕ′ such that ϕ□ = □ϕ′ and M″ ∈ |Mod(ϕ′□)|. Then

\[
\begin{aligned}
\mathrm{Mod}(\varphi)(\mathrm{Mod}(\varphi')M'')
 &= \{ M \in \mathrm{Mod}(\varphi)M' \mid M' \in \mathrm{Mod}(\varphi')M'' \} && \text{(composition in } \tfrac{3}{2}(CAT_P)\text{)} \\
 &= \{ M \in |\mathrm{Mod}(\square\varphi)| \mid \exists M' \in |\mathrm{Mod}(\varphi\square)| \text{ such that } \varphi; M' \le M,\ \varphi'; M'' \le M' \} && \text{(definitions of } \mathrm{Mod}(\varphi), \mathrm{Mod}(\varphi')\text{)} \\
 &\subseteq \{ M \in |\mathrm{Mod}(\square\varphi)| \mid \varphi; \varphi'; M'' \le M \} && \text{(monotonicity of the composition in Sign)} \\
 &= \mathrm{Mod}(\varphi;\varphi')M'' && \text{(definition of } \mathrm{Mod}(\varphi;\varphi')\text{)}.
\end{aligned}
\]

The lax functoriality of Mod on identities may be checked as follows:

\[ 1_{\mathrm{Mod}(\Sigma)}(M) = \{M\} \subseteq \{ N : \Sigma \to \Omega \mid M \le N,\ \mathrm{Sen}(N) \text{ total} \} = \mathrm{Mod}(1_\Sigma)M. \]

The laxity of the model reduct Mod(ϕ) follows via a straightforward check.


For showing the Satisfaction Condition we consider a signature morphism ϕ, a ϕ□-model M′, M ∈ Mod(ϕ)M′ and ρ ∈ dom Sen(ϕ). Since ϕ; M′ ≤ M, by the monotonicity of Sen we have that Sen(ϕ; M′) ⊆ Sen(M). By the lax property of Sen it follows that Sen(ϕ); Sen(M′) ⊆ Sen(M). Since ρ ∈ dom Sen(ϕ) and since Sen(M′) is total it follows that Sen(M′)(Sen(ϕ)ρ) = Sen(M)ρ. Consequently T(Sen(M′)(Sen(ϕ)ρ)) = T(Sen(M)ρ), which means (M′ ⊨ Sen(ϕ)ρ) = (M ⊨ ρ).

The following two situations show that Proposition 2 is a vehicle for obtaining natural 3/2-institutions.

Example 9 (Seeds for 3/2PL, 3/2MSA)
1. The 3/2PL variant without model homomorphisms arises easily as an I(S) by taking Ω = 2 and by taking T to be the function that evaluates Boolean terms (for example T(¬(0 ∧ 1)) = 1, etc.).
2. Even a local variant of 3/2MSA without model homomorphisms, such that all carrier sets of the models are subsets of a fixed set U, arises as an I(S) by defining $\Omega = (S^\Omega, F^\Omega)$ by
  • $S^\Omega = 2^U$, i.e. the set of the subsets of U, and
  • for any $s_1, \ldots, s_n, s \subseteq U$, $F^\Omega_{s_1 \ldots s_n \to s}$ is the set of all functions $s_1 \times \cdots \times s_n \to s$.
The truth function T is based upon the evaluation of Ω-terms by recursion and functional composition as follows:
  • Any term t of sort s gets evaluated as an element T(t) ∈ s (note here the overloading of T) defined by $T(\sigma(t_1, \ldots, t_n)) = \sigma(T(t_1), \ldots, T(t_n))$.
  • For any equation $t_1 = t_2$ we set $T(t_1 = t_2) = 1$ if and only if $T(t_1) = T(t_2)$.
  • The evaluation function T extends to composed sentences, in an obvious manner in the case of the Boolean connectives, and as follows in the case of quantifications. Given an Ω-sentence (∀x)ρ where x is a variable of sort s, then $T((\forall x)\rho) = \bigwedge \{ T(\rho(a)) \mid a \in s \}$ where ρ(a) denotes the Ω-sentence obtained by replacing each occurrence of x in ρ by a.
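The first item of Example 9 can be illustrated as follows. The sketch assumes Boolean terms are encoded as nested tuples over the constants 0 and 1, and that a model M : P → 2 is a dict defined on every symbol (so that Sen(M) is total); the function names `T`, `sen` and `satisfies` are chosen here for illustration.

```python
def T(term):
    """Truth function: evaluate a Boolean term over the constants 0 and 1."""
    if term in (0, 1):
        return term
    if term[0] == "not":
        return 1 - T(term[1])
    if term[0] == "and":
        return T(term[1]) & T(term[2])
    raise ValueError(f"unknown connective: {term[0]!r}")

def sen(m, s):
    """Sen(M): replace each propositional symbol of s by its value under M."""
    if isinstance(s, str):
        return m[s]
    return (s[0],) + tuple(sen(m, t) for t in s[1:])

def satisfies(m, s):
    """M |= s if and only if T(Sen(M)s) = 1, as in Proposition 2."""
    return T(sen(m, s)) == 1

example = T(("not", ("and", 0, 1)))   # the term from the text, evaluates to 1
```

With `m = {"p": 1, "q": 0}`, the sentence `("and", "p", ("not", "q"))` is first translated by `sen` to the Boolean term `("and", 1, ("not", 0))` and then evaluated by `T`.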
Because the definition of 3/2-institutional seeds involves deceptively poor data, there is a significant space for defining relevant 3/2-institutions that do not fall into the pattern of partiality of signature morphisms. The following example, albeit rather artificial, may give an indication about this potential.

Example 10 (A seed beyond partiality) We let
• |Sign| = ω, the set of the natural numbers,
• arrows m → n are pairs (a, b) of natural numbers such that a ≤ n − m,
• the composition of arrows (a, b) : m → n and (c, d) : n → p is (a + c, b ∨ d) : m → p (by b ∨ d we denote the maximum of b and d); note that the composition is well defined, it is associative and has (0, 0) as identities.
So far this yields a category. Now we make this into a 3/2-category.
• Given (a, b), (a′, b′) : m → n we let (a, b) ≤ (a′, b′) if and only if a = a′ and b′ ≤ b.
It is easy to check that this yields a partial order which is preserved by the compositions. The lax 3/2-functor Sen : Sign → Pfn is defined as follows:
• for each m ∈ ω, Sen(m) = {x ∈ ω | x ≤ m},
• for each arrow (a, b) : m → n in Sign, dom Sen(a, b) = {x ∈ ω | x ≤ m, x + a + b ≤ n} and Sen(a, b)(x) = x + a for each x ∈ dom Sen(a, b).
The interested reader may check the lax functoriality properties of Sen; we skip this here. Now any choice of Ω and T : Sen(Ω) → 2 completes the definition of this 3/2-institutional seed.
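The lax functoriality check left to the reader in Example 10 can be explored numerically. The sketch below is a bounded brute-force test, not a proof: it assumes the order (a, b) ≤ (a′, b′) iff a = a′ and b′ ≤ b, stores partial functions as dicts, and only inspects objects up to `limit` and second components up to `max_b` (both bounds are arbitrary choices made here).

```python
from itertools import product

def arrows(m, n, max_b):
    """Arrows m -> n: pairs (a, b) of naturals with a <= n - m (b bounded for testing)."""
    return [(a, b) for a in range(n - m + 1) for b in range(max_b + 1)]

def compose(f, g):
    """(a, b);(c, d) = (a + c, max(b, d))."""
    return (f[0] + g[0], max(f[1], g[1]))

def sen(f, m, n):
    """Sen(a, b) : Sen(m) -> Sen(n) as a dict; defined on x <= m with x + a + b <= n."""
    a, b = f
    return {x: x + a for x in range(m + 1) if x + a + b <= n}

def leq(f, g):
    """(a, b) <= (a', b') iff a == a' and b' <= b."""
    return f[0] == g[0] and g[1] <= f[1]

def is_sub(pf, pg):
    """Graph inclusion of partial functions."""
    return all(x in pg and pg[x] == y for x, y in pf.items())

def check(limit=4, max_b=3):
    """Test monotonicity and Sen(f);Sen(g) <= Sen(f;g) on a bounded fragment."""
    for m, n, p in product(range(limit + 1), repeat=3):
        for f in arrows(m, n, max_b):
            sf = sen(f, m, n)
            for f2 in arrows(m, n, max_b):
                if leq(f, f2) and not is_sub(sf, sen(f2, m, n)):
                    return False
            for g in arrows(n, p, max_b):
                sg = sen(g, n, p)
                two_step = {x: sg[y] for x, y in sf.items() if y in sg}
                if not is_sub(two_step, sen(compose(f, g), m, p)):
                    return False
    return True
```

The laxity is proper: for f = (0, 1) : 0 → 0 and g = (0, 0) : 0 → 1, `sen(f, 0, 0)` is empty, so Sen(f); Sen(g) is empty, while `sen(compose(f, g), 0, 1)` is defined at 0.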

4.3.5 Model Amalgamation in 3/2-institutions

The following definition extends the crucial concept of model amalgamation from 1-institutions to 3/2-institutions. For the sake of simplicity, it is presented for lax cocones of spans, the general concept for lax cocones over arbitrary diagrams of signature morphisms being an obvious generalisation. Moreover all the results in this section can be presented in that more general framework without real additional effort.

Definition 8 A model for a diagram of signature morphisms in a 3/2-institution consists of a model Mk for each signature Σk in the diagram such that for each signature morphism ϕ : Σi → Σj in the diagram we have that Mi ∈ Mod(ϕ)Mj. The diagram is consistent when it has at least one model.

Definition 9 (Model amalgamation in 3/2-institutions) In any 3/2-institution, a lax cocone for a span in the 3/2-category of the signature morphisms

[Diagram: the span ϕ1 : Σ0 → Σ1, ϕ2 : Σ0 → Σ2 and the lax cocone θk : Σk → Σ, k = 0, 1, 2]

has model amalgamation when each model of the span admits a unique completion to a model (called the amalgamation) of the lax cocone. When dropping the uniqueness condition, the property is called weak model amalgamation.

Note that when the signature morphisms involved in Definition 9 are Mod-maximal we get the ordinary concept of model amalgamation for (1-)institution theory. This also means that θ0 and the Σ0-model become redundant. In the proper 3/2 case their presence is necessary, this being one of the important aspects that distinguishes the 3/2 case from ordinary (1-)institution theoretic model amalgamation.

Example 11 In 3/2PL, for the diagram of Definition 9 we consider the signatures Σ0 = {p, p′, p1, p2}, Σ1 = {p, p1, p1′}, Σ2 = {p, p2, p2′}, Σ = {p, p′, p1, p2} and let ϕ1, ϕ2, θ0, θ1, θ2 be the maximal partial inclusions. We prove that this cocone has model amalgamation as follows. We assume {Mk | k = 0, 1, 2} a model for the span (ϕ1, ϕ2) and define the Σ-model M by M(p) = Mk(p), M(pk) = Mk(pk), k ∈ {1, 2}, and M(p′) = M0(p′). It is easy to see that M thus defined is the unique amalgamation of M0, M1, M2.

In ordinary institution theory the causal dependency between pushout squares and model amalgamation squares is central and well known (cf. [4, 8, 26], etc.). The following result refines this to 3/2-institutions in a way intended to maximize its applicability in concrete situations.

Proposition 3 For any 3/2-institutional seed S and any 1-subcategory T ⊆ Sign such that
• Sen preserves and reflects maximality (ϕ is maximal with respect to the order of Sign if and only if it is Sen-maximal),
• T contains all maximal signature morphisms, and
• if ϕ ∈ T and ϕ ≤ ϕ′ then ϕ′ ∈ T,
in I(S) each lax T-pushout of signature morphisms has weak model amalgamation.
Proof We consider a lax T-pushout (θ0, θ1, θ2) for a span (ϕ1, ϕ2) of signature morphisms as shown in the diagram below, and a model {Mk | k = 0, 1, 2} for the span (ϕ1, ϕ2). By the first and second assumptions this means that we have a lax T-cocone (M0, M1, M2) for the span (ϕ1, ϕ2). By the universal property of (θ0, θ1, θ2) there exists a unique signature morphism M : Σ → Ω in T such that θk; M = Mk for k = 0, 1, 2.

(4.5) [Diagram: the span ϕ1 : Σ0 → Σ1, ϕ2 : Σ0 → Σ2, the lax cocone θk : Σk → Σ, the models Mk : Σk → Ω, k = 0, 1, 2, and the mediating arrow M : Σ → Ω]

In order to establish that M is a model we show that M is maximal; then since Sen preserves maximality it follows that Sen(M) is total. Let M ≤ N. By the third assumption it follows that N ∈ T. For each k = 0, 1, 2, by the monotonicity of the composition, we have that Mk = θk; M ≤ θk; N. Because Mk is maximal (as a consequence of Sen reflecting maximality) it follows that Mk = θk; N for each k = 0, 1, 2. By the uniqueness of M as a mediating arrow between lax T-cocones it follows that M = N. Hence M is maximal.

A quick note on the first condition of Proposition 3: although it holds naturally in many 3/2-institutions of interest (such as those from Examples 5, 6, 7 and 8), it has to be assumed in the abstract setup since there are concrete situations when it does not hold (such as the 3/2-institution of Example 10 where Sen preserves maximality but does not reflect it).

The following result gives the important information that we should in general give up expectations that weak lax cocones may involve ‘non-total’ signature morphisms; this will also be used to strengthen the conclusion of Proposition 3.

Proposition 4 For any 3/2-institutional seed S and any 1-subcategory T ⊆ Sign such that
• Sen is strict, and
• T contains all Sen-maximal signature morphisms,
for any consistent span (ϕ1, ϕ2) of signature morphisms in the 3/2-institution I(S) each of its weak lax T-pushout cocones (θ0, θ1, θ2) consists only of Sen-maximal signature morphisms.

Proof The consistency of the span means that it has a lax cocone (M0, M1, M2) such that each Sen(Mk) is total for k = 0, 1, 2. By the second assumption of the proposition it follows that this is a T-cocone. By the weak lax T-pushout property of (θ0, θ1, θ2) there exists an M : Σ → Ω in T such that θk; M = Mk for k = 0, 1, 2 (as in diagram 4.5). Since Sen is strict it follows that Sen(θk); Sen(M) = Sen(Mk), k = 0, 1, 2. Because Sen(Mk) is total, Sen(θk) must be total too.


The outstanding condition of Proposition 4 is that of consistency of the span. Although at the abstract level the consistency of spans has to be assumed axiomatically, in concrete situations spans of real signature morphisms are very easily consistent. For example in 3/2PL it is enough to consider $(M_k)_p = 1$, k = 0, 1, 2, for all propositional symbols p, and in 3/2MSA to consider Mk, k = 0, 1, 2, having a fixed singleton set {∗} as underlying/carrier sets. However the concept gets real substance in 3/2-institutions where the signature morphisms carry more structure than the common signature morphisms, an important example being given by that of theory morphisms of Sect. 4.3.6 below.

Corollary 2 If in addition to the hypotheses of Proposition 3 we have that Sen is strict, then the conclusion of Proposition 3 strengthens to: in I(S) each lax T-pushout of signature morphisms has model amalgamation.

Proof Let us suppose that a model {Mk | k = 0, 1, 2} of the span (ϕ1, ϕ2) has two amalgamations M and N. In other words θk; M, θk; N ≤ Mk for k = 0, 1, 2. Note that the second assumption of Proposition 4 is a consequence of the assumptions of Proposition 3. By the strictness of Sen we have that Sen(θk; M) = Sen(θk); Sen(M) for k = 0, 1, 2 and likewise for N. Since Sen(θk) (by Proposition 4), Sen(M), Sen(N) (since M, N are models) are total functions, it follows that all Sen(θk; M), Sen(θk; N), k = 0, 1, 2, are total functions too. By the first assumption of Proposition 3 it follows that all θk; M, θk; N, k = 0, 1, 2, are maximal. Hence θk; M = θk; N = Mk, k = 0, 1, 2. By the uniqueness part of the universal property of lax T-pushouts it follows that M = N.

The following corollary indicates that the result of Corollary 2 covers many concrete situations of interest.

Corollary 3 In both 3/2PL and 3/2MSA each lax T-pushout of signature morphisms has model amalgamation in any of the following situations for T (the latter two apply only for 3/2MSA):
1. all signature morphisms,
2. the total signature morphisms,
3. the signature morphisms that are total on the sort symbols, i.e. $\varphi^{st}$ are total functions, and
4. the signature morphisms that are total on the operation symbols, i.e. $\varphi^{op}_{w \to s}$ are total functions.

Proof Recall from Sect. 4.3 how 3/2PL arises as an I(S). In the case of 3/2MSA, although due to cardinality issues it cannot be presented as a whole as an I(S), we may consider ‘localised’ versions that have all carriers of models included in a fixed set U. Thus, given a span of signature morphisms and a model {Mk | k = 0, 1, 2} of it, we may take U to be the union of all the carrier sets in M0, M1, M2. Then the hypotheses of Proposition 3 and Corollary 2 can be checked quite easily in each of the cases for T listed in the statement of the corollary.


So far we have established model amalgamation for classes of lax cocones that enjoy a universal property of a colimit. In the following we develop some results that may be used to extend model amalgamation to other classes of lax cocones. First we need a couple of new concepts.

Definition 10 (Model conservativeness) In a 3/2-institution a signature morphism ϕ is model conservative when for each □ϕ-model M there exists a ϕ□-model M′ such that M ∈ Mod(ϕ)M′.

In many concrete situations of interest, 3/2PL and 3/2MSA included, a signature morphism is model conservative if and only if it is injective (this does not exclude the possibility of partiality).

Definition 11 (Model strictness) In a 3/2-institution a signature morphism ϕ is Mod-strict when for each signature morphism θ such that θ□ = □ϕ we have that Mod(ϕ); Mod(θ) = Mod(θ; ϕ).

In many concrete situations of interest, 3/2PL and 3/2MSA included, a signature morphism is Mod-strict whenever it is total. One way to see this is through the following general result.

Proposition 5 For any 3/2-institutional seed S, any Sen-maximal signature morphism is Mod-strict in the associated 3/2-institution I(S).

Proof Since the other inclusion holds by the lax functoriality of Mod, we need only to prove that for each ϕ□-model M′ we have that Mod(θ; ϕ)M′ ⊆ Mod(θ)(Mod(ϕ)M′). Any M ∈ Mod(θ; ϕ)M′ is characterised by the properties that Sen(M) is total and that

(4.6) θ; ϕ; M′ ≤ M.

Now since Sen(ϕ) and Sen(M′) are total functions it follows that their composition is a total function too, hence by the lax functoriality of Sen it follows that Sen(ϕ; M′) is a total function too. This means that ϕ; M′ is a model in Mod(ϕ)M′. This and (4.6) imply that M ∈ Mod(θ)(Mod(ϕ)M′).

Proposition 6 In any 3/2-institution, consider a lax cocone (θ0, θ1, θ2) of a span of signature morphisms (ϕ1, ϕ2) and a signature morphism μ such that θ□ = □μ. Then
1. if the lax cocone θ has weak model amalgamation and μ is model conservative then the lax cocone θ; μ has it too, and
2. if there exists a lax cocone θ′ that has weak model amalgamation and such that θ; μ ≤ θ′, and μ is Mod-maximal and Mod-strict then the lax cocone θ has weak model amalgamation too.


Proof 1. Consider a model {Mk | k = 0, 1, 2} for the span (ϕ1, ϕ2). There exists a θ□-model M such that Mk ∈ Mod(θk)M, k = 0, 1, 2. Since μ is model conservative there exists a model M′ such that M ∈ Mod(μ)M′. Then for each k ∈ {0, 1, 2}, Mk ∈ Mod(θk)(Mod(μ)M′) ⊆ Mod(θk; μ)M′ (by the lax property of Mod). Hence M′ is an amalgamation of {Mk | k = 0, 1, 2}.
2. Consider a model {Mk | k = 0, 1, 2} for the span (ϕ1, ϕ2). There exists a μ□-model M′ such that Mk ∈ Mod(θ′k)M′, k = 0, 1, 2. Since θk; μ ≤ θ′k, k = 0, 1, 2, and since Mod preserves orders, we have that Mod(θ′k)M′ ⊆ Mod(θk; μ)M′, k = 0, 1, 2. Hence Mk ∈ Mod(θk; μ)M′, k = 0, 1, 2. By the Mod-maximality assumption we have that Mod(μ)M′ = {M}. By the Mod-strictness assumption it follows that for each k = 0, 1, 2, Mk ∈ Mod(θk)(Mod(μ)M′) = Mod(θk)M. Hence M is an amalgamation of {Mk | k = 0, 1, 2}.

We can combine Propositions 3 and 6 to obtain a larger class of lax cocones enjoying weak model amalgamation.

Corollary 4 Under the hypotheses of Proposition 3 we consider a lax cocone (θ0, θ1, θ2) for a span of signature morphisms (ϕ1, ϕ2) and a signature morphism μ such that θ□ = □μ. Then
1. if θ is a lax T-pushout and μ is model conservative then the lax cocone θ; μ has weak model amalgamation, and
2. if there exists a lax T-pushout θ′ such that θ; μ ≤ θ′ and μ is Sen-maximal then the lax cocone θ has weak model amalgamation.

Proof While 1. is a direct consequence of Propositions 3 and 6, the argument for 2. needs a bit of elaboration. By Proposition 5 we get that μ is Mod-strict. Now let M be any μ□-model. Because Sen(μ) and Sen(M) are total functions, by the lax functoriality of Sen it follows that Sen(μ; M) is a total function too. Since Sen reflects maximality (one of the hypotheses of Proposition 3) it follows that μ; M is maximal, hence Mod(μ)M = {μ; M}. This shows that μ is Mod-maximal. Now all conditions of Propositions 3 and 6 are fulfilled, therefore the conclusion 2. follows.

Example 12 The (weakened version of the) model amalgamation situation of Example 11 can be obtained from Corollary 4 (2.) as follows.
• We set T to be the class of the total functions.
• For each k = 0, 1, 2 we let θ′k be the inclusion of Σk into {p, p′, p1, p1′, p2, p2′}. This is a T-pushout.
• We let μ be the inclusion {p, p′, p1, p2} ⊆ {p, p′, p1, p1′, p2, p2′}.

4.3.6 Theory Morphisms in 3/2-institutions

In 1-institution theory, the concept of theory morphism plays an important role in connection to foundational works in computer science. It was one of the central institution theoretic concepts introduced and studied in the seminal publication [16]. The mathematical foundations of conceptual blending are based on theory morphisms since concepts are modelled as logical theories and their translations as theory morphisms [12, 14]. While theories in 3/2-institutions are the same as theories in 1-institutions, the 3/2-institution theoretic concept of theory morphism is much more subtle because of the partiality of the sentence translations. In fact there are at least four ways to extend the 1-institution concept of theory morphism to 3/2-institutions.

Definition 12 In a 3/2-institution a theory (Σ, E) consists of a signature Σ and a set E of Σ-sentences (E ⊆ Sen(Σ)). Given two theories (Σ, E) and (Σ′, E′) in a 3/2-institution, a signature morphism ϕ : Σ → Σ′ is
• a pseudo-morphism of theories when Sen(ϕ)E ⊆ E′•,
• a weak morphism of theories when Sen(ϕ)E• ⊆ E′•,
• a strong morphism of theories when for each Σ′-model M′ such that M′ ⊨ E′ there exists M ∈ Mod(ϕ)M′ such that M ⊨ E, and
• an ultra-strong morphism of theories when for all Σ′-models M′ and Σ-models M such that M′ ⊨ E′ and M ∈ Mod(ϕ)M′ we have that M ⊨ E.

Fact 1 Any weak morphism is a pseudo-morphism and any strong morphism is weak. If Mod does not admit emptiness then any ultra-strong morphism is strong.

In 1-institution theory the four concepts of theory morphisms of Definition 12 collapse to the single established 1-institution concept of theory morphism (cf. [4, 16], etc.). But in the realm of 3/2-institutions they are in general different concepts, as shown by the following very simple counterexamples:
• In 3/2PL consider Σ = {p, q}, Σ′ = {q}, E = {p ∧ q}, E′ = ∅. Then ϕ, the maximal partial inclusion of Σ into Σ′ (dom ϕ = {q}), is a pseudo-morphism (Σ, E) → (Σ′, E′) but it is not a weak one since q ∈ Sen(ϕ)E• \ E′•.
• In the quantifier-free variant of 3/2MSA (which means sentences without quantifiers) consider Σ consisting of one sort symbol s and two constants c, c′, Σ′ consisting only of the sort symbol s and a constant c′, E = {¬(c = c′)}, and E′ = ∅. Then ϕ, the maximal partial inclusion of Σ into Σ′, is a (trivially) weak morphism (Σ, E) → (Σ′, E′) but it is not a strong one since any singleton set does not admit a ϕ-reduct that satisfies E.⁷
• In 3/2PL consider Σ = {p, q}, Σ′ = {q}, E = {p ∧ q}, E′ = {q}. Then ϕ, the maximal partial inclusion of Σ into Σ′ (dom ϕ = {q}), is a strong morphism (Σ, E) → (Σ′, E′) but it is not an ultra-strong one. There exists only one model M′ ⊨ E′, namely M′(q) = 1. Then M′ has a ϕ-reduct M such that M ⊨ E, defined by M(p) = M(q) = 1. However not every ϕ-reduct of M′ enjoys this property, for example N such that N(p) = 0 and N(q) = 1.
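The two 3/2PL counterexamples above are small enough to be verified by enumeration. The following is an illustrative sketch only: sentences are encoded as a symbol, `("not", s)` or `("and", s, t)`, models as dicts, and the helper names (`is_strong`, `is_ultra_strong`, `entails`) are introduced here and do not come from the text.

```python
from itertools import product

def models(props):
    """All PL models over `props`, each a dict symbol -> 0/1."""
    names = sorted(props)
    return [dict(zip(names, bits)) for bits in product((0, 1), repeat=len(names))]

def reducts(phi, source, m_prime):
    """Mod(phi)M' for a partial function phi stored as a dict."""
    return [m for m in models(source)
            if all(m[p] == m_prime[q] for p, q in phi.items())]

def holds(m, s):
    if isinstance(s, str):
        return m[s] == 1
    if s[0] == "not":
        return not holds(m, s[1])
    return holds(m, s[1]) and holds(m, s[2])

def sat(m, theory):
    return all(holds(m, e) for e in theory)

def entails(sig, theory, s):
    """s belongs to the semantic closure of `theory` over the signature `sig`."""
    return all(holds(m, s) for m in models(sig) if sat(m, theory))

def is_strong(phi, sig, sig2, E, E2):
    """Every model of E' has at least one phi-reduct satisfying E."""
    return all(any(sat(m, E) for m in reducts(phi, sig, m2))
               for m2 in models(sig2) if sat(m2, E2))

def is_ultra_strong(phi, sig, sig2, E, E2):
    """Every phi-reduct of every model of E' satisfies E."""
    return all(sat(m, E)
               for m2 in models(sig2) if sat(m2, E2)
               for m in reducts(phi, sig, m2))

SIG, SIG2 = {"p", "q"}, {"q"}
phi = {"q": "q"}                    # maximal partial inclusion, dom(phi) = {q}
E = [("and", "p", "q")]

third = (is_strong(phi, SIG, SIG2, E, ["q"]), is_ultra_strong(phi, SIG, SIG2, E, ["q"]))
first = (entails(SIG, E, "q"), entails(SIG2, [], "q"))
```

`third` confirms that with E′ = {q} the morphism is strong but not ultra-strong; `first` shows that q is a consequence of E but not of E′ = ∅, which is why ϕ fails to be weak in the first counterexample.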

⁷ Counterexample communicated by Daniel Găină.


In general pseudo-morphisms do not compose, and the ultra-strong ones compose only under the condition that Mod is strict rather than (properly) lax. The strictness condition on Mod is a very heavy and unrealistic one in the applications (unlike the strictness condition on Sen, which holds in a lot of 3/2-institutions of interest). This makes both extremes, the pseudo-morphisms and the ultra-strong morphisms, unsuitable as a 3/2-institutional replacement for the 1-institution theory morphisms and leaves us only with the middle options. But it is not only the failure in compositionality that makes them unsuitable; their very nature also feels inadequate, as can be seen by inspecting the very simple examples above. Pseudo-morphisms are too weak and the ultra-strong morphisms seem to require too much. The strong theory morphisms compose unconditionally, while the weak ones compose under a certain condition that holds often in concrete situations.

Proposition 7 In any 3/2-institution I, by inheriting the 3/2-categorical structure of $\mathrm{Sign}^I$
• strong morphisms of theories yield a 3/2-category, denoted $\mathrm{Th}^I_s$, and
• when Sen is oplax, the weak theory morphisms yield a 3/2-category, denoted $\mathrm{Th}^I_w$.

Proof The proof is based on the fact that the composition of theory morphisms yields a theory morphism, the rest being straightforward. Let us consider theory morphisms ϕ : (Σ, E) → (Σ′, E′) and ϕ′ : (Σ′, E′) → (Σ″, E″). For the ‘strong’ case we consider M″ ∈ |Mod(Σ″)| such that M″ ⊨ E″. Then there exists M′ ∈ Mod(ϕ′)M″ such that M′ ⊨ E′. It follows that there exists M ∈ Mod(ϕ)M′ such that M ⊨ E. Then by the lax property of Mod it follows that M ∈ Mod(ϕ; ϕ′)M″. For the ‘weak’ case we have:

\[
\begin{aligned}
\mathrm{Sen}(\varphi;\varphi')E^\bullet
 &\subseteq (\mathrm{Sen}(\varphi); \mathrm{Sen}(\varphi'))E^\bullet && \text{by the oplax functoriality of Sen} \\
 &= \mathrm{Sen}(\varphi')(\mathrm{Sen}(\varphi)E^\bullet) && \text{by Lemma 1} \\
 &\subseteq \mathrm{Sen}(\varphi')E'^\bullet && \text{since } \varphi \text{ is a theory morphism} \\
 &\subseteq E''^\bullet && \text{since } \varphi' \text{ is a theory morphism.}
\end{aligned}
\]

From now on whenever we encounter weak theory morphisms we tacitly assume that Sen is oplax. The constructions in Corollaries 5 and 6 constitute natural examples of 3/2-institutions that are not based on an explicit form of partiality of signature morphisms.

Corollary 5 For any 3/2-institution I = (Sign, Sen, Mod, ⊨) its 3/2-category of weak/strong theory morphisms determines a 3/2-institution $I^w$/$I^s$ as follows (i is w or s):
• the 3/2-category of signatures $\mathrm{Sign}^i$ is $\mathrm{Th}^I_i$,
• $\mathrm{Sen}^i$ is a trivial lifting of Sen to theories, i.e. $\mathrm{Sen}^i(\Sigma, E) = \mathrm{Sen}(\Sigma)$, etc.,
• $\mathrm{Mod}^i(\Sigma, E)$ is the full subcategory of Mod(Σ) of the Σ-models satisfying E, and for each theory morphism ϕ : (Σ, E) → (Σ′, E′) and each (Σ′, E′)-model M′, $\mathrm{Mod}^i(\varphi)M' = \{ M \in \mathrm{Mod}(\varphi)M' \mid M \models E \}$,
• and the satisfaction relation is inherited from I.

Proof The only interesting part of the proof is the lax functoriality of $\mathrm{Mod}^i$, the rest being straightforward. We consider theory morphisms ϕ : (Σ, E) → (Σ′, E′) and ϕ′ : (Σ′, E′) → (Σ″, E″). For any (Σ″, E″)-model M″ we have that

\[
\begin{aligned}
\mathrm{Mod}^i(\varphi)(\mathrm{Mod}^i(\varphi')M'')
 &= \mathrm{Mod}^i(\varphi)\{ M' \in \mathrm{Mod}(\varphi')M'' \mid M' \models E' \} && \text{definition of } \mathrm{Mod}^i \\
 &= \{ M \in \mathrm{Mod}(\varphi)M' \mid M' \in \mathrm{Mod}(\varphi')M'',\ M \models E,\ M' \models E' \} && \text{definition of } \mathrm{Mod}^i \\
 &\subseteq \{ M \in \mathrm{Mod}(\varphi)M' \mid M' \in \mathrm{Mod}(\varphi')M'',\ M \models E \} \\
 &= \{ M \in \mathrm{Mod}(\varphi)(\mathrm{Mod}(\varphi')M'') \mid M \models E \} \\
 &\subseteq \{ M \in \mathrm{Mod}(\varphi;\varphi')M'' \mid M \models E \} && \text{since Mod is lax} \\
 &= \mathrm{Mod}^i(\varphi;\varphi')M'' && \text{definition of } \mathrm{Mod}^i.
\end{aligned}
\]

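$I^w$/$I^s$ generalise the concept of the “institution of theories” from 1-institution theory [4] to 3/2-institutions. Note that both of them constitute examples of 3/2-institutions where the model functor may naturally admit emptiness, and this without being inherited from the base institution.

There is also an alternative way to complete the definition of $\mathrm{Th}^I_w$/$\mathrm{Th}^I_s$ to that of a 3/2-institution by shifting the weight of the construction from the models side to the sentences side. However this construction is conditioned by I being a lax 3/2-institution.

Corollary 6 For any lax 3/2-institution I = (Sign, Sen, Mod, ⊨) its 3/2-category of weak/strong theory morphisms determines a lax 3/2-institution $I^{w'}$/$I^{s'}$ as follows (i′ is w′ or s′):
• the 3/2-category of signatures $\mathrm{Sign}^{i'}$ is $\mathrm{Th}^I_i$,
• $\mathrm{Sen}^{i'}(\Sigma, E) = E^\bullet$ and for each theory morphism ϕ : (Σ, E) → (Σ′, E′) we let
  – $\mathrm{dom}\,\mathrm{Sen}^{i'}(\varphi) = E^\bullet \cap \mathrm{dom}\,\mathrm{Sen}(\varphi)$, and
  – $\mathrm{Sen}^{i'}(\varphi)\rho = \mathrm{Sen}(\varphi)\rho$ for all $\rho \in \mathrm{dom}\,\mathrm{Sen}^{i'}(\varphi)$,
• $\mathrm{Mod}^{i'}$ is the trivial lifting of Mod, i.e. $\mathrm{Mod}^{i'}(\Sigma, E) = \mathrm{Mod}(\Sigma)$, etc.,
• and the satisfaction relation is inherited from I.

Proof The only interesting part of the proof is the lax functoriality of $\mathrm{Sen}^{i'}$, the rest being straightforward. We consider theory morphisms ϕ : (Σ, E) → (Σ′, E′) and ϕ′ : (Σ′, E′) → (Σ″, E″). On the one hand we have that

\[
\begin{aligned}
\mathrm{dom}\,(\mathrm{Sen}^{i'}(\varphi); \mathrm{Sen}^{i'}(\varphi'))
 &= \mathrm{dom}\,\mathrm{Sen}^{i'}(\varphi) \cap \mathrm{Sen}^{i'}(\varphi)^{-1}(\mathrm{dom}\,\mathrm{Sen}^{i'}(\varphi')) \\
 &= E^\bullet \cap \mathrm{dom}\,\mathrm{Sen}(\varphi) \cap \mathrm{Sen}(\varphi)^{-1}(E'^\bullet \cap \mathrm{dom}\,\mathrm{Sen}(\varphi')) && \text{(definition of } \mathrm{Sen}^{i'}\text{)} \\
 &= E^\bullet \cap \mathrm{dom}\,\mathrm{Sen}(\varphi) \cap \mathrm{Sen}(\varphi)^{-1}(E'^\bullet) \cap \mathrm{Sen}(\varphi)^{-1}(\mathrm{dom}\,\mathrm{Sen}(\varphi')) \\
 &= E^\bullet \cap \mathrm{dom}\,(\mathrm{Sen}(\varphi); \mathrm{Sen}(\varphi')) \cap \mathrm{Sen}(\varphi)^{-1}(E'^\bullet) \\
 &\subseteq E^\bullet \cap \mathrm{dom}\,\mathrm{Sen}(\varphi;\varphi') \cap \mathrm{Sen}(\varphi)^{-1}(E'^\bullet) && \text{(Sen is lax)} \\
 &\subseteq E^\bullet \cap \mathrm{dom}\,\mathrm{Sen}(\varphi;\varphi') \\
 &= \mathrm{dom}\,\mathrm{Sen}^{i'}(\varphi;\varphi') && \text{(definition of } \mathrm{Sen}^{i'}\text{)}.
\end{aligned}
\]

On the other hand for each $\rho \in \mathrm{dom}\,(\mathrm{Sen}^{i'}(\varphi); \mathrm{Sen}^{i'}(\varphi'))$,

\[ \mathrm{Sen}^{i'}(\varphi')(\mathrm{Sen}^{i'}(\varphi)\rho) = \mathrm{Sen}(\varphi')(\mathrm{Sen}(\varphi)\rho) = \mathrm{Sen}(\varphi;\varphi')\rho = \mathrm{Sen}^{i'}(\varphi;\varphi')\rho. \]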

4.3.7 Lifting Properties

One of the starting motivations in 1-institution theory was the development of a general logic-independent method for the aggregation of software modules, modelled as institutional theories [16]. The process of “putting together” institutional theories (to use a favourite phrase of Goguen and Burstall) relies on colimits in the category of theory morphisms, an important result being the automatic lifting of colimits from the category of signature morphisms to that of theory morphisms (see [4, 16, 26]). The following results replicate this in the context of $\frac{3}{2}$-institutions in support of conceptual blending theory. Because colimits and theory morphisms are more complicated in $\frac{3}{2}$-institutions, the lifting of colimits from signatures to theories is significantly more complex as well.

Proposition 8 (Lifting lax cocones from signatures to theories) Consider a span of weak/strong theory morphisms $\varphi_k : (\Sigma_0, E_0) \to (\Sigma_k, E_k)$, $k = 1, 2$, and a lax cocone for the underlying span of signature morphisms as shown in the following diagram.

(4.7) [Diagram: a lax cocone with vertex $\Sigma$, consisting of $\theta_k : \Sigma_k \to \Sigma$, $k = 0, 1, 2$, over the span $\varphi_1 : \Sigma_0 \to \Sigma_1$, $\varphi_2 : \Sigma_0 \to \Sigma_2$.]

Then for any $E \subseteq \mathit{Sen}(\Sigma)$ such that $\bigcup_{k=0,1,2} \mathit{Sen}(\gamma_k)E_k^{\bullet} \subseteq E^{\bullet}$ the following diagram displays a lax cocone of theory morphisms

(4.8) [Diagram: a lax cocone with vertex $(\Sigma, E)$, consisting of $\theta_k : (\Sigma_k, E_k) \to (\Sigma, E)$, $k = 0, 1, 2$, over the span $\varphi_1 : (\Sigma_0, E_0) \to (\Sigma_1, E_1)$, $\varphi_2 : (\Sigma_0, E_0) \to (\Sigma_2, E_2)$.]

where
1. in the ‘weak’ case, $\gamma_k = \theta_k$, $k = 0, 1, 2$, and


2. in the ‘strong’ case, $\gamma_k$, $k = 0, 1, 2$, are any signature morphisms such that $\theta_k \le \gamma_k$ and $\mathit{Sen}(\gamma_k)$ are total functions.

Proof We only have to show that $\theta_k : (\Sigma_k, E_k) \to (\Sigma, E)$, $k = 0, 1, 2$, are theory morphisms. The ‘weak’ case is straightforward. For the ‘strong’ case we consider any $M \in |\mathit{Mod}(\Sigma)|$ such that $M \models E$. Because $\mathit{Sen}(\gamma_k)$ are total, by the Satisfaction Condition it follows that for any $M_k \in \mathit{Mod}(\gamma_k)M$, $M_k \models E_k$. Since $\theta_k \le \gamma_k$, by the monotonicity of $\mathit{Mod}$ it follows that $M_k \in \mathit{Mod}(\theta_k)M$. Hence $\theta_k$ are strong theory morphisms.

Corollary 7 (Lifting lax pushouts from signatures to theories) In the context of Proposition 8, given a 1-subcategory $\mathcal{T} \subseteq \mathit{Sign}$ let $\mathcal{T}_w/\mathcal{T}_s$ denote the class of weak/strong theory morphisms $\varphi$ such that $\varphi \in \mathcal{T}$. We further assume that
• $\mathit{Mod}$ does not admit emptiness,
• the lax cocone of signature morphisms is a lax $\mathcal{T}$-pushout,
• $E^{\bullet} = (\bigcup_{k=0,1,2} \mathit{Sen}(\gamma_k)E_k^{\bullet})^{\bullet}$.
Then the lax cocone of theory morphisms obtained by Proposition 8
• is a lax $\mathcal{T}_w$-pushout when $\mathit{Sen}$ is lax (therefore it is strict) and each signature morphism in $\mathcal{T}$ is $\mathit{Sen}$-maximal,
• is a lax $\mathcal{T}_s$-pushout when each signature morphism in $\mathcal{T}$ is $\mathit{Mod}$-maximal.

Proof We consider a lax $\mathcal{T}_w/\mathcal{T}_s$-cocone $\theta'$ for the span of weak/strong theory morphisms. By the lax $\mathcal{T}$-pushout property in $\mathit{Sign}$ (the category of signature morphisms) there exists a unique $\mu \in \mathcal{T}$ such that $\theta_k; \mu = \theta'_k$, $k = 0, 1, 2$. It only remains to show that $\mu$ is a weak/strong theory morphism $(\Sigma, E) \to (\Sigma', E')$, where $(\Sigma', E')$ is the vertex of $\theta'$.

We first solve the weak case. Let us recall that in this case $\gamma_k = \theta_k$. For that we need the following lemma (we skip its proof):

Lemma 2 In any $\frac{3}{2}$-institution such that $\mathit{Mod}$ does not admit emptiness, for any signature morphism $\mu$ that is $\mathit{Sen}$-maximal and for any set $E$ of sentences over the domain of $\mu$, we have that $\mathit{Sen}(\mu)E^{\bullet} \subseteq (\mathit{Sen}(\mu)E)^{\bullet}$.

Then


\[
\begin{aligned}
\mathit{Sen}(\mu)E^{\bullet}
&= \mathit{Sen}(\mu)\Big(\bigcup_{k=0,1,2} \mathit{Sen}(\theta_k)E_k^{\bullet}\Big)^{\bullet} \\
&\subseteq \Big(\mathit{Sen}(\mu)\bigcup_{k=0,1,2} \mathit{Sen}(\theta_k)E_k^{\bullet}\Big)^{\bullet} && \text{by the second and third assumptions and by Lemma 2} \\
&= \Big(\bigcup_{k=0,1,2} \mathit{Sen}(\mu)(\mathit{Sen}(\theta_k)E_k^{\bullet})\Big)^{\bullet} \\
&= \Big(\bigcup_{k=0,1,2} \mathit{Sen}(\theta_k; \mu)E_k^{\bullet}\Big)^{\bullet} && \text{by the strictness assumption on } \mathit{Sen} \\
&= \Big(\bigcup_{k=0,1,2} \mathit{Sen}(\theta'_k)E_k^{\bullet}\Big)^{\bullet} \\
&\subseteq (E'^{\bullet})^{\bullet} && \text{since } \theta'_k \text{ are weak theory morphisms} \\
&= E'^{\bullet}.
\end{aligned}
\]

Now comes the strong case. We consider a $\Sigma'$-model $M'$ such that $M' \models E'$. Since $\mu, \theta_k \in \mathcal{T}$ are $\mathit{Mod}$-maximal, let $M$ be the unique model in $\mathit{Mod}(\mu)M'$ and for each $k = 0, 1, 2$ let $M_k$ be the unique model in $\mathit{Mod}(\theta_k)M$. Since $\theta_k \le \gamma_k$, by the monotonicity of $\mathit{Mod}$ it follows that $\mathit{Mod}(\gamma_k)M \subseteq \mathit{Mod}(\theta_k)M$. Since $\mathit{Mod}$ does not admit emptiness this means that $M_k$ is the unique member of $\mathit{Mod}(\gamma_k)M$ too. By the lax property of $\mathit{Mod}$ and by the equalities $\theta'_k = \theta_k; \mu$ it follows that $\mathit{Mod}(\theta_k)(\mathit{Mod}(\mu)M') \subseteq \mathit{Mod}(\theta'_k)M'$, which means

\[
\mathit{Mod}(\theta_k)M \subseteq \mathit{Mod}(\theta'_k)M'.
\]

By the $\mathit{Mod}$-maximality assumption it follows that $\mathit{Mod}(\theta'_k)M' = \{M_k\}$. Since $\theta'_k$ is a strong theory morphism $(\Sigma_k, E_k) \to (\Sigma', E')$ we have that $M_k \models E_k$. By the Satisfaction Condition for $\gamma_k$ (and by keeping in mind that $\mathit{Sen}(\gamma_k)$ is total) we obtain that $M \models \mathit{Sen}(\gamma_k)E_k^{\bullet}$, $k = 0, 1, 2$. This shows that $M \models E$.

The only apparently restrictive assumption in the applications is the $\mathit{Sen}$/$\mathit{Mod}$-maximality condition on the signature morphisms in $\mathcal{T}$. Very often $\mathit{Sen}$- and $\mathit{Mod}$-maximality say the same thing, namely that the corresponding signature morphisms are total. However, Proposition 4 tells us that in many situations of interest one cannot get beyond that with lax $\mathcal{T}$-pushouts anyway. Although this does not constitute a real restriction in the applications, we may also note that the weak case adds a supplementary technical condition to the strong case, namely that $\mathit{Sen}$ is lax.

Proposition 9 (Lifting model amalgamation from signatures to theories) Under the framework of Proposition 8, if
• the lax cocone of signature morphisms has (weak) model amalgamation, and
• $E^{\bullet} = (\bigcup_{k=0,1,2} \mathit{Sen}(\gamma_k)E_k^{\bullet})^{\bullet}$,
then the lax cocone of theory morphisms has (weak) model amalgamation too.

Proof We treat both the ‘weak’ and the ‘strong’ case in one shot because there is no essential difference between them.


Let $i \in \{w, s\}$. We consider $(M_0, M_1, M_2)$ a model for the span of theory morphisms. According to the definition of $\mathit{Mod}^i$ we have that $M_0 \in \mathit{Mod}(\varphi_k)M_k$ for $k = 1, 2$. We show that if $M$ is an amalgamation of $M_0$, $M_1$, and $M_2$ with respect to the lax cocone of signature morphisms then it is an amalgamation with respect to the lax cocone of theory morphisms too.

Let $k \in \{0, 1, 2\}$. Since $M_k \in \mathit{Mod}(\gamma_k)M$ and since $M_k \models E_k^{\bullet}$, by the Satisfaction Condition it follows that $M \models \mathit{Sen}(\gamma_k)E_k^{\bullet}$. Hence $M \models \bigcup_{k=0,1,2} \mathit{Sen}(\gamma_k)E_k^{\bullet}$. Therefore $M \models (\bigcup_{k=0,1,2} \mathit{Sen}(\gamma_k)E_k^{\bullet})^{\bullet} = E^{\bullet}$. This completes the proof for the weak model amalgamation case.

The conclusion can be extended to the proper (non-weak) model amalgamation case by noting (by a simple reductio ad absurdum argument) that the uniqueness of amalgamation at the level of signature morphisms implies the uniqueness at the level of theory morphisms.

4.4 Theory Blending in $\frac{3}{2}$-institutions

4.4.1 Computational Creativity and Conceptual Blending

Computational creativity is a relatively recent multidisciplinary science, with contributions from/to artificial intelligence, cognitive sciences, philosophy and arts, going back at least to the notion of bisociation presented by Arthur Koestler [22]. Its aims are not only to construct a program that is capable of human-level creativity, but also to achieve a better understanding of creativity and to provide better support for it. Conceptual blending was proposed by Fauconnier and Turner [10] as a fundamental cognitive operation of language and common sense, modelled as a process by which humans subconsciously combine particular elements of two possibly conceptually distinct notions, as well as their relations, into a unified concept in which new elements and relations emerge.

The structural aspects of this cognitive theory have been given rigorous mathematical grounds by Goguen [12, 13], based upon category theory. In this formal model, concepts are represented as logical theories giving their axiomatization. Goguen used the algebraic specification language OBJ [17] to axiomatize the concepts, a language that is based upon a refined version of equational logic; but in fact the approach is independent of the logical formalism used (this is why category theory is involved). This approach is illustrated by the diagram in Fig. 4.1, which has to be read in an order-enriched categorical context: the nodes correspond to logical theories and the arrows to theory morphisms, but the diagram does not commute in a strict sense. There is only a lax form of commutativity, meaning that the compositions in the left- and the right-hand sides of the diagram are both ‘less’ than the arrow at the centre. The ‘less’ comes from the fact that the arrows (to be interpreted as theory morphisms) are subject to an ordering that reflects the fact that they correspond to partial rather than total mappings.


Fig. 4.1 $\frac{3}{2}$-categorical blending

In the above-mentioned work by Goguen there are convincing arguments, supported by examples, for this partiality aspect, which represents very much a departure to a different mathematical realm than that of logical theories (even when considered in a very general sense, as commonly done in modern computer science). In category-theoretic terms, this means that we need to consider categories equipped with partial orders on the hom-sets that are preserved by the compositions of arrows/morphisms. To summarise the main mathematical idea underlying Goguen’s approach to theory blending:

Theory blending is a cocone in a $\frac{3}{2}$-category in which objects represent logical theories and arrows correspond to partial mappings between logical theories.

It is still an open question whether the cocone should actually be a colimit (in other words, a minimal cocone) or not necessarily. One understanding of this issue is that blending should not necessarily be thought of as a colimit, but that colimits are related to a kind of optimality principle. Moreover, since $\frac{3}{2}$-category theory has several different concepts of colimits, it is also still open which of those is most appropriate for modelling the blending operation.

Goguen’s ideas about theory blending benefited from an important boost with the European FP7 project COINVENT [27], which has adopted them as its foundations. Based on this, a creative computational system has been implemented and demonstrated in fields like mathematics [18] and music [9] (although both use the strict rather than the $\frac{3}{2}$-version of category theory). However, the COINVENT approach still lacks crucial theoretical features, especially a proper semantic dimension. Such a dimension is absolutely necessary when talking about concepts because meaning and interpretation are central to the idea of concept. For example, the idea of consistency of a concept depends on the semantics. If one also considers the abstraction level of Goguen’s approach in its general form, of non-commitment to particular logical systems, then the institution-theoretic dimension appears as inevitable. In fact, Goguen argued for the role of institution theory in [14], and so does the COINVENT project. Although institution theory in its conventional form cannot be used as such in a proper way because it cannot capture the partiality of theory morphisms (which boils down to the partiality of signature morphisms), its extension to $\frac{3}{2}$-institutions seems to be the appropriate framework for theory blending.


4.4.2 Theory Blending in $\frac{3}{2}$-institutions

The upgrade of Goguen’s approach to conceptual blending within the context of $\frac{3}{2}$-institutions appears as a stepwise process as follows:

1. The input is a consistent span of weak/strong theory morphisms $\varphi_1, \varphi_2$ in a $\frac{3}{2}$-institution $\mathcal{I}$, which means a consistent span in $\mathcal{I}^w/\mathcal{I}^s$.

2. Then we consider an appropriate lax cocone for the underlying span of signature morphisms that has weak model amalgamation:

[Diagram: a lax cocone with vertex $\Sigma$, consisting of $\theta_k : \Sigma_k \to \Sigma$, $k = 0, 1, 2$, over the span $\varphi_1 : \Sigma_0 \to \Sigma_1$, $\varphi_2 : \Sigma_0 \to \Sigma_2$.]

3. Next we lift it as in Proposition 8 to a lax cocone of theory morphisms:

[Diagram: a lax cocone with vertex $(\Sigma, E)$, consisting of $\theta_k : (\Sigma_k, E_k) \to (\Sigma, E)$, $k = 0, 1, 2$, over the span $\varphi_1 : (\Sigma_0, E_0) \to (\Sigma_1, E_1)$, $\varphi_2 : (\Sigma_0, E_0) \to (\Sigma_2, E_2)$.]

By virtue of Proposition 9 it follows that this lax cocone of theory morphisms enjoys weak model amalgamation too. Since we started from a consistent span of theory morphisms, it follows that the vertex of the blending cocone, the new theory $(\Sigma, E)$, is consistent.

This is a very general scheme that has a number of parameters.
• A choice of an appropriate $\frac{3}{2}$-institution for modelling the respective concepts as theories, and their translations by theory morphisms.
• A choice between weak and strong concepts of theory morphisms. We still need a better understanding of which one of these two concepts of theory morphisms is most appropriate; it may be possible that this depends on the actual application.
• What is an ‘appropriate’ lax cocone for the underlying span of signature morphisms is a challenging issue that seems to be difficult to answer at the general level;


perhaps seeking a precise answer at a general level does not even make sense. Some consider that the near pushout solution proposed by Goguen [12] may be too permissive. What should be indisputable, though, is the weak amalgamation property for the lax cocone.

4.5 Theory Changes

4.5.1 The Problem of Merging Software Changes

The diagram in Fig. 4.1 that depicts the process of theory blending also has an important interpretation in software engineering: in large software-development projects, it often happens that a part of the system is being modified (deletion of code also allowed) by several different programmers concurrently, after which it is necessary to merge the changes to form a single consistent version. Even cooperative distributed writing of papers or documents may fall under this topic; writing scientific papers in LaTeX certainly qualifies, as LaTeX is indeed a programming language. As in the case of theory blending, a $\frac{3}{2}$-categorical approach is necessary (changes being modelled as partial mappings) [11], but this is not enough because it cannot capture the semantic dimension of software. For example, in order to have a notion of consistency for merges we need to enhance the approach with a model theory. This software engineering problem is a second application domain that drives our development of the theory of $\frac{3}{2}$-institutions.

4.5.2 Theory Changes

In this section we develop an alternative concept of mapping between theories in $\frac{3}{2}$-institutions that does not resemble or generalise the theory morphisms from 1-institution theory, but which models software changes. Theory changes formalise the process of modifications in specifications or declarative programs. In this modelling a flat (unstructured) specification or program is modelled by a theory. Modifications or changes operate at two different levels, the signature level and the sentences level. The changes at the signature level are encapsulated in the respective concept of signature morphism, while those at the sentences level are made explicit and modelled by the partial inclusion component of the concept of theory changes. This represents a marking of the part of the translated sentences that is not touched by the change, which may consist both of deleting and of adding sentences. The fact that the partial inclusion is not necessarily maximal accounts for the possibility that sentences may be deleted and later added back, or vice versa. Also we assume that the programmer is not committed to the parts that he leaves unchanged.


First we develop a theory of partial inclusions. A partial function $f : A \rightharpoonup B$ is an inclusion when $f$ consists only of pairs of elements of the form $(a, a)$. It follows that $f \subseteq (A \cap B)^2$ and that $f = \{(a, a) \mid a \in \mathrm{dom}\, f\}$. Note that, unlike in the case of total inclusions, two given sets $A$ and $B$ may admit more than one partial inclusion between them, and in any case at least one (the empty one).

Given $A_1, A_2 \subseteq A$, a partial function $f : A \rightharpoonup B$ and a partial inclusion $i : A_1 \rightharpoonup A_2$ we let

\[
f(i) = \{(f^0(a), f^0(a)) \mid a \in \mathrm{dom}(f),\ (a, a) \in i\}.
\]

Lemma 3 $f(i)$ is a partial inclusion $f(A_1) \rightharpoonup f(A_2)$.

Another fact gives a functorial property for the above notation:

Lemma 4 Given $A_1, A_2, A_3 \subseteq A$, a partial function $f : A \rightharpoonup B$ and partial inclusions $i_1 : A_1 \rightharpoonup A_2$, $i_2 : A_2 \rightharpoonup A_3$, we have that $f(i_1; i_2) = f(i_1); f(i_2)$.

Based on Lemmas 1 and 3 we get another property:

Lemma 5 Given partial functions $f : A \rightharpoonup B$ and $g : B \rightharpoonup C$, sets $A_1, A_2 \subseteq A$ and a partial inclusion $i : A_1 \rightharpoonup A_2$ we have $(f; g)(i) = g(f(i))$.

Definition 13 (Theory changes) In any $\frac{3}{2}$-institution a theory change $(\varphi, i) : (\Sigma, E) \to (\Sigma', E')$ consists of:
• theories $(\Sigma, E)$ and $(\Sigma', E')$;
• a signature morphism $\varphi : \Sigma \to \Sigma'$; and
• a partial inclusion $i : \mathit{Sen}(\varphi)E \rightharpoonup E'$.

Proposition 10 For any $\frac{3}{2}$-institution $\mathcal{I}$ with a strict sentence functor, theory changes form a $\frac{3}{2}$-category as follows:
• the composition of theory changes is as shown by the following diagram:

[Diagram: $(\varphi, i) : (\Sigma, E) \to (\Sigma', E')$ followed by $(\theta, j) : (\Sigma', E') \to (\Sigma'', E'')$, with composite $(\varphi; \theta, \mathit{Sen}(\theta)(i); j) : (\Sigma, E) \to (\Sigma'', E'')$.]

• the partial order on theory changes $(\Sigma, E) \to (\Sigma', E')$ is given by: $(\varphi, i) \le (\varphi', i')$ if and only if $\varphi \le \varphi'$ and $i \subseteq i'$.

Proof The composition of theory changes is correctly defined because
• by Lemma 3, $\mathit{Sen}(\theta)(i)$ is a partial inclusion $\mathit{Sen}(\theta)(\mathit{Sen}(\varphi)E) \rightharpoonup \mathit{Sen}(\theta)E'$,
• the composition of partial inclusions is a partial inclusion, hence $\mathit{Sen}(\theta)(i); j$ is a partial inclusion $\mathit{Sen}(\theta)(\mathit{Sen}(\varphi)E) \rightharpoonup E''$, and
• by Lemma 1 and by the strict functoriality of $\mathit{Sen}$ we have that $\mathit{Sen}(\theta)(\mathit{Sen}(\varphi)E) = \mathit{Sen}(\varphi; \theta)E$.


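The partial order on theory changes is also correctly defined because whenever $\varphi \le \theta$ this implies $\mathit{Sen}(\varphi) \subseteq \mathit{Sen}(\theta)$, which implies $\mathit{Sen}(\varphi)E \subseteq \mathit{Sen}(\theta)E$. Then $i \subseteq j$ parses as a subset relationship between subsets of $\mathit{Sen}(\theta)E \times E'$.

The understanding of the proof of the associativity of the composition of theory changes is helped by inspecting the following diagram:

[Diagram: the theory changes $(\varphi, i) : (\Sigma, E) \to (\Sigma', E')$, $(\varphi', i') : (\Sigma', E') \to (\Sigma'', E'')$ and $(\varphi'', i'') : (\Sigma'', E'') \to (\Sigma''', E''')$, together with the partial composites $(\varphi; \varphi', \mathit{Sen}(\varphi')(i); i')$ and $(\varphi'; \varphi'', \mathit{Sen}(\varphi'')(i'); i'')$, and the two full composites $((\varphi; \varphi'); \varphi'', \mathit{Sen}(\varphi'')(\mathit{Sen}(\varphi')(i); i'); i'')$ and $(\varphi; (\varphi'; \varphi''), \mathit{Sen}(\varphi'; \varphi'')(i); \mathit{Sen}(\varphi'')(i'); i'')$.]

Thus all we have to show is that $\mathit{Sen}(\varphi'')(\mathit{Sen}(\varphi')(i); i'); i'' = \mathit{Sen}(\varphi'; \varphi'')(i); \mathit{Sen}(\varphi'')(i'); i''$, its proof being:

\[
\begin{aligned}
\mathit{Sen}(\varphi'')(\mathit{Sen}(\varphi')(i); i')
&= \mathit{Sen}(\varphi'')(\mathit{Sen}(\varphi')(i)); \mathit{Sen}(\varphi'')(i') && \text{by Lemma 4} \\
&= (\mathit{Sen}(\varphi'); \mathit{Sen}(\varphi''))(i); \mathit{Sen}(\varphi'')(i') && \text{by Lemma 5} \\
&= \mathit{Sen}(\varphi'; \varphi'')(i); \mathit{Sen}(\varphi'')(i') && \text{by the strict functoriality of } \mathit{Sen}.
\end{aligned}
\]

For showing the preservation of partial orders by compositions we consider only the case when $(\varphi, i) \le (\varphi', i')$ are both composed with the same theory change $(\theta, j)$, the other situation getting a similar proof. By the definition of composition we have that
• $(\varphi, i); (\theta, j) = (\varphi; \theta, \mathit{Sen}(\theta)(i); j)$, and
• $(\varphi', i'); (\theta, j) = (\varphi'; \theta, \mathit{Sen}(\theta)(i'); j)$.
From the monotonicity of composition in $\mathit{Sign}$ it follows that $\varphi; \theta \le \varphi'; \theta$. From $i \subseteq i'$ it follows that $\mathit{Sen}(\theta)(i) \subseteq \mathit{Sen}(\theta)(i')$ and further that $\mathit{Sen}(\theta)(i); j \subseteq \mathit{Sen}(\theta)(i'); j$.

The following is another example of a $\frac{3}{2}$-institution that does not fall into the partiality pattern characteristic of $\frac{3}{2}$PL, $\frac{3}{2}$MSA, etc.

Corollary 8 For any $\frac{3}{2}$-institution $\mathcal{I}$ with a strict sentence functor, the $\frac{3}{2}$-category of theory changes determines a $\frac{3}{2}$-institution $\mathcal{I}^c$ as follows:
• the $\frac{3}{2}$-category of signatures $\mathit{Sign}^c$ is the $\frac{3}{2}$-category of theory changes,
• $\mathit{Sen}^c$ is a trivial lifting of $\mathit{Sen}$ to theories, i.e. $\mathit{Sen}^c(\Sigma, E) = \mathit{Sen}(\Sigma)$ and $\mathit{Sen}^c(\varphi, i) = \mathit{Sen}(\varphi)$,
• $\mathit{Mod}^c(\Sigma, E)$ is the full subcategory of $\mathit{Mod}(\Sigma)$ of the $\Sigma$-models satisfying $E$, and for each theory change $(\varphi, i) : (\Sigma, E) \to (\Sigma', E')$ and each $(\Sigma', E')$-model $M'$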


\[
\mathit{Mod}^c(\varphi, i)M' = \{M \in \mathit{Mod}(\varphi)M' \mid M \models E\}
\]

• and the satisfaction relation is inherited from $\mathcal{I}$.

In what follows we investigate the possibility of modelling merges of theory changes by pushout constructions. In principle, this should be based upon lifting pushouts from the category of signatures to that of theory changes.

Proposition 11 In general, lax $\mathcal{T}$-pushouts do not lift from the category of signatures to that of theory changes.

Proof Consider a trivial (lax) $\mathcal{T}$-pushout of signature morphisms consisting only of identities; let the span be $\varphi_1 = \varphi_2 = 1_\Sigma$ and the cocone be $\theta_0 = \theta_1 = \theta_2 = 1_\Sigma$. Let $\rho$ be a $\Sigma$-sentence and let $E_0 = E_1 = E_2 = \{\rho\}$ and $i_1 = i_2 = 1_{E_0}$. Let us suppose that there exists a lax $\mathcal{T}$-pushout $(1_\Sigma, j_k)$, $k = 0, 1, 2$, for the span given by $(1_\Sigma, i_1)$ and $(1_\Sigma, i_2)$.
• By considering the lax cocone given by $(1_\Sigma, 1_{E_0})$ everywhere we infer that all $j_k$, $k = 0, 1, 2$, are total.
• By considering the lax cocone given by $(1_\Sigma, \emptyset)$, $(1_\Sigma, 1_{E_0})$, $(1_\Sigma, \emptyset)$, let $(1_\Sigma, u)$ be the unique mediating theory change. From $(1_\Sigma, j_k); (1_\Sigma, u) = (1_\Sigma, \emptyset)$, $k = 1, 2$, we infer that $\rho \notin \mathrm{dom}\, u$. It follows that $j_0; u \ne 1_{E_0}$, which is a contradiction.

In contrast to lax pushouts, near pushouts lift trivially from signatures to theory changes:

Proposition 12 Given a span of theory changes $(\varphi_k, i_k) : (\Sigma_0, E_0) \to (\Sigma_k, E_k)$, $k = 1, 2$, and a near pushout for the underlying span of signature morphisms as shown in the following diagram

[Diagram: a cocone with vertex $\Sigma$, consisting of $\theta_k : \Sigma_k \to \Sigma$, $k = 0, 1, 2$, over the span $\varphi_1 : \Sigma_0 \to \Sigma_1$, $\varphi_2 : \Sigma_0 \to \Sigma_2$.]

for any $E \subseteq \mathit{Sen}(\Sigma)$, $(\theta_k, \emptyset) : (\Sigma_k, E_k) \to (\Sigma, E)$, $k = 0, 1, 2$, constitutes a near pushout cocone for the given span of theory changes.

Proof First, it is easy to establish that we have a lax cocone, as $(\varphi_k, i_k); (\theta_k, \emptyset) = (\varphi_k; \theta_k, \emptyset) \le (\theta_0, \emptyset)$ for $k = 1, 2$. Let $(\theta'_k, j_k) : (\Sigma_k, E_k) \to (\Sigma', E')$, $k = 0, 1, 2$, be a lax cocone for the given span of theory changes. Then let $\mu$ be the maximal signature morphism such that $\theta_k; \mu \le \theta'_k$, $k = 0, 1, 2$. We define the partial inclusion $u : \mathit{Sen}(\mu)E \rightharpoonup E'$ by $\mathrm{dom}\, u = E' \cap \mathit{Sen}(\mu)E$. Then $(\theta_k, \emptyset); (\mu, u) = (\theta_k; \mu, \emptyset) \le (\theta'_k, j_k)$, $k = 0, 1, 2$.


Now, for any $(\mu', u')$ such that $(\theta_k, \emptyset); (\mu', u') \le (\theta'_k, j_k)$, $k = 0, 1, 2$, we have that $\theta_k; \mu' \le \theta'_k$, $k = 0, 1, 2$. By the maximality assumption on $\mu$ it follows that $\mu' \le \mu$. Since $\mathrm{dom}\, u' \subseteq \mathit{Sen}(\mu')E \cap E'$ and since $\mu' \le \mu$, by the monotonicity of $\mathit{Sen}$ it follows that $\mathrm{dom}\, u' \subseteq E' \cap \mathit{Sen}(\mu)E = \mathrm{dom}\, u$, hence $u' \subseteq u$.

The results of Propositions 11 and 12 tell us that the established concepts of pushouts in $\frac{3}{2}$-categories cannot be used for modelling merges of software changes. Lax pushouts do not lift and near pushouts lift to everything, hence an entirely new concept is needed for that.
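To make the bookkeeping of Definition 13 and Proposition 10 concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than part of the theory: a signature morphism is represented only by its sentence translation (a dictionary, hence a partial map), a partial inclusion is represented by its domain (which determines it), the order on signature morphisms is approximated by inclusion of the translations as graphs, and the toy translations are injective. The names `Change`, `then` and `leq` are hypothetical.

```python
from dataclasses import dataclass

def compose(f, g):
    """Diagrammatic composition f;g of partial maps stored as dicts."""
    return {a: g[b] for a, b in f.items() if b in g}

def image(f, xs):
    """Image of the set xs under the partial map f."""
    return {f[a] for a in xs if a in f}

@dataclass
class Change:
    src: set    # E, sentences of the source theory
    tgt: set    # E', sentences of the target theory
    sen: dict   # sentence translation Sen(phi), a partial map
    kept: set   # domain of the partial inclusion i : Sen(phi)E -> E'

    def is_valid(self):
        # i must live inside Sen(phi)E & E'
        return self.kept <= (image(self.sen, self.src) & self.tgt)

def then(c, d):
    """(phi, i);(theta, j) = (phi;theta, Sen(theta)(i); j)."""
    assert c.tgt == d.src
    return Change(c.src, d.tgt, compose(c.sen, d.sen),
                  image(d.sen, c.kept) & d.kept)

def leq(c, d):
    """(phi, i) <= (phi', i'): graph inclusion of translations and i within i'."""
    graph_le = all(a in d.sen and d.sen[a] == b for a, b in c.sen.items())
    return graph_le and c.kept <= d.kept

# Hypothetical toy data: sentence r is dropped, then s and t are added.
E, E1, E2, E3 = {"p", "q", "r"}, {"p", "q", "s"}, {"p", "s", "t"}, {"p", "t", "u"}
sen_phi = {"p": "p", "q": "q"}               # r has no translation
sen_theta = {"p": "p", "q": "q", "s": "s"}
sen_psi = {"p": "p", "s": "s", "t": "t"}

c1 = Change(E, E1, sen_phi, {"p", "q"})
c2 = Change(E1, E2, sen_theta, {"p", "s"})
c3 = Change(E2, E3, sen_psi, {"p", "t"})
assert all(c.is_valid() for c in (c1, c2, c3))

composite = then(c1, c2)
assert composite.is_valid()
# On this data both bracketings of the triple composite agree.
assert then(then(c1, c2), c3) == then(c1, then(c2, c3))
# Shrinking the untouched part gives a smaller theory change.
assert leq(Change(E, E1, sen_phi, set()), c1)
```

In this example the composite `then(c1, c2)` keeps only the sentence `p`, the one sentence left untouched by both changes.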

4.6 Conclusions

In this paper we have developed an extension of the concept of institution that accommodates the partiality of signature morphisms. We have done this implicitly by resorting to order-enriched categories. For this extension we have studied fundamental concepts and properties such as theory morphisms and model amalgamation. We have seen that both of them are significantly more refined than their ordinary institution-theoretic counterparts and yet include the latter as particular cases. After developing the fundamentals of $\frac{3}{2}$-institutions, in the second part of the paper we have explored applications to conceptual blending and software evolution. With respect to the former application domain, the main proposal of our work is that conceptual blending should be modelled as lax cocones of $\frac{3}{2}$-institutional theories admitting model amalgamation.

Further research includes
• a deeper study of the above-mentioned applications, including a better understanding of the adequacy of weak versus strong theory morphisms for conceptual blending, and what concept of cocones would be adequate for merging software changes;
• the exploration of other applications;
• heterogeneity for $\frac{3}{2}$-institutions; and
• a generic construction of $\frac{3}{2}$-institutions from ordinary institutions that captures the partiality of the signature morphisms explicitly via inclusion systems (this task is already almost completed, see [7]).

Acknowledgements Thanks to Adriana Balan for reading a preliminary draft of this work and for making a series of useful comments, especially related to order-enriched colimits. Those comments have led to the fixing of some conceptual errors. Thanks also to the reviewer who discovered some technical bugs and also suggested a series of improvements in the presentation.


References

1. Andréka, H., & Németi, I. (1981). A general axiomatizability theorem formulated in terms of cone-injective subcategories. In B. Csákány, E. Fried, & E.T. Schmidt (Eds.), Universal algebra (Vol. 29, pp. 13–35). North-Holland: Colloquia Mathematica Societatis János Bolyai.
2. Borceux, F. (1994). Handbook of categorical algebra. Cambridge University Press.
3. Bou, F., Eppe, M., Plaza, E., & Schorlemmer, M. (2014). D2.1: Reasoning with amalgams. Technical report, COINVENT Project. http://www.coinventproject.eu.
4. Diaconescu, R. (2008). Institution-independent model theory. Birkhäuser.
5. Diaconescu, R. From universal logic to computer science, and back. In G. Ciobanu & D. Méry (Eds.), Theoretical aspects of computing – ICTAC 2014, Vol. 8687 of Lecture notes in computer science. Springer, Berlin.
6. Diaconescu, R. (2016). Functorial semantics of first-order views. Theoretical Computer Science, 656, 46–59.
7. Diaconescu, R. (2017). Generic partiality for $\frac{3}{2}$-institutions. arXiv:1711.04666 [math.LO].
8. Diaconescu, R., Goguen, J., & Stefaneas, P. (1991). Logical support for modularisation. In G. Huet & G. Plotkin (Eds.), Logical environments (pp. 83–130). Cambridge, 1993. Proceedings of a Workshop held in Edinburgh, Scotland.
9. Eppe, M., Confalonieri, R., Maclean, E., Kaliakatsos-Papakostas, M.A., Cambouropoulos, E., Schorlemmer, W.M., Codescu, M., & Kühnberger, K.U. (2015). Computational invention of cadences and chord progressions by conceptual chord-blending. In International Joint Conference on Artificial Intelligence (pp. 2445–2451). AAAI Press.
10. Fauconnier, G., & Turner, M. (1998). Conceptual integration networks. Cognitive Science, 22(2), 133–187.
11. Goguen, J.A. (1995). Categorical approaches to merging software changes. Unpublished draft.
12. Goguen, J.A. An introduction to algebraic semiotics, with application to user interface design (pp. 242–291). Springer, Berlin Heidelberg.
13. Goguen, J.A. (2005). What is a concept? In F. Dau, M.L. Mugnier, & G. Stumme (Eds.), Conceptual structures: common semantics for sharing knowledge (Vol. 3596, pp. 52–77). Lecture notes in computer science. Springer, Berlin.
14. Goguen, J.A. (2006). Mathematical models of cognitive space and time. In D. Andler & M. Okada (Eds.), (A preliminary version was published in Reasoning and cognition).
15. Goguen, J.A., & Burstall, R.M. (1983). Introducing institutions. In E.M. Clarke & D. Kozen (Eds.), Logic of programs (Vol. 164, pp. 221–256). Lecture notes in computer science. Springer, Berlin.
16. Goguen, J.A., & Burstall, R.M. (1992). Institutions: abstract model theory for specification and programming. Journal of the ACM, 39(1), 95–146.
17. Goguen, J.A., Winkler, T., Meseguer, J., Futatsugi, K., & Jouannaud, J.P. (2000). Introducing OBJ. In J.A. Goguen & G. Malcolm (Eds.), Software engineering with OBJ: algebraic specification in action. Advances in formal methods. Kluwer Academic.
18. Ramirez, D.G. (2020). Generating fundamental notions of fields and Galois theory through formal conceptual blending. Unpublished paper.
19. Jay, C.B. (1991). Partial functions, ordered categories, limits and cartesian closure. In G. Birtwistle (Ed.), IV Higher Order Workshop, Banff 1990: Proceedings of the IV Higher Order Workshop, Banff, Alberta, Canada 10–14 Sept 1990 (pp. 151–161). Springer London.
20. Kelly, M. (1982). Basic concepts of enriched category theory. Cambridge University Press.
21. Kelly, M., & Street, R. (1974). Review of elements of 2-categories. In Category seminar Sydney 1972/1973 (pp. 75–103). Lecture notes in mathematics. Springer, Berlin.
22. Koestler, A. (1964). The act of creation. London: Hutchinson.
23. Mac Lane, S. (1998). Categories for the working mathematician. Graduate texts in mathematics. Springer.
24. Meseguer, J. (1989). General logics. In H.-D. Ebbinghaus et al. (Eds.), Proceedings, Logic Colloquium, 1987 (pp. 275–329). North-Holland.


25. Sannella, D., & Tarlecki, A. (1988). Specifications in an arbitrary institution. Information and Control, 76, 165–210.
26. Sannella, D., & Tarlecki, A. (2012). Foundations of algebraic specifications and formal software development. Springer, Berlin.
27. Schorlemmer, M., Smaill, A., Kühnberger, K.U., Kutz, O., Colton, S., Cambouropoulos, E., & Pease, A. (2014). COINVENT: towards a computational concept invention theory. In S. Colton, D. Ventura, N. Lavrac, & M. Cook (Eds.), International Conference on Computational Creativity (pp. 288–296). computationalcreativity.net.
28. Tarlecki, A. (1986). On the existence of free models in abstract algebraic institutions. Theoretical Computer Science, 37, 269–304.
29. Vidal, J.C., & Tur, J.S. (2010). A 2-categorial generalization of the concept of institution. Studia Logica, 95(3), 301–344.

Răzvan Diaconescu graduated in 1988 from the University of Bucharest. In 1994 he obtained his Ph.D. in Mathematics and Computation from Oxford. He is Research Professor at the Simion Stoilow Institute of Mathematics of the Romanian Academy, where he currently heads the department of Number Theory and Computational Methods. His main research area is institution theory (a categorical form of abstract model theory) and its applications to computing science, especially (but not only) formal specification and verification. He is the author of Institution-independent Model Theory, published by Birkhäuser in 2008, which is the authoritative monograph of this area.

Chapter 5

The Four Essential Aristotelian Syllogisms, via Substitution and Symmetry

Vaughan R. Pratt

Abstract There being no limit to the number of categories, there is no limit to the number of Aristotelian syllogisms. Aristotle showed that this potential infinity of syllogisms could be obtained as substitution instances of finitely many syllogistic forms, further reduced by exploiting symmetry, in particular the premises’ independence of their order. A consensus subsequently emerged that there were 24 valid assertoric syllogistic forms. A more modern concern, completeness, showed that there could be no more. Using no additional principles beyond substitution and symmetry, we further reduce these 24 to four forms. A third principle, contraposition, allows a further reduction to two forms, namely the unconditional form and the conditional form, conditioned on one of the terms being inhabited. We achieve these reductions via a regularly organized proof system in the form of a graph with 24 vertices and three kinds of edges corresponding to the three principles.

Keywords Logic · Aristotle · Syllogism · Proof net

5.1 Dedication This paper is for István Németi and Hajnal Andréka, to appear in this Festschrift in their honour. Before launching into my topic I should offer a few words of appreciation. I first met István and Hajnal in August 1979 on the steps of the meeting place in Hanover where the 6th Conference on Logic, Methodology, and Philosophy of Science was about to start. I was very keen to meet them as I had recently been reading Henkin, Monk and Tarski’s Cylindric Algebras, to which they had been added for the second volume to bring the authorship up to five.

V. R. Pratt, Computer Science Department, Stanford University, 353 Jane Stanford Way, Stanford, CA 94305, USA
© Springer Nature Switzerland AG 2021
J. Madarász and G. Székely (eds.), Hajnal Andréka and István Németi on Unity of Science, Outstanding Contributions to Logic 19, https://doi.org/10.1007/978-3-030-64187-0_5


Wanting to show my appreciation for cylindric algebras, in place of the customary greeting the first thing I said to them was that when D = ∅ = V (the case of the V empty universe), the concrete cylindric algebra 2 D collapsed in a way that validated all sentences of first order logic. Hence the empty universe did not invalidate any sentence, contrary to the conventional wisdom that some existential sentences valid in all nonempty universes were invalid in the empty universe. (This depends on a quirk of concrete cylindric algebras where interpretations of symbols includes interpretation of variables. Validity in first order logic means truth in all interpretations, so if there are no elements in the domain to assign to variables then every sentence, as well as its negation, is vacuously valid because there cannot be any interpretation to witness falsehood.) With anyone else such a greeting might have seemed a bit abrupt if not borderline nerdy. However they took it very well and even expressed their appreciation of the point. Since then I have met them at many conferences related to algebraic logic, sometimes on their account when they have kindly invited me to speak. A month before I met them I had written an MIT technical report proving that every free separable dynamic algebra was residually finite, from which follows completeness of the Segerberg axioms for propositional dynamic logic. From time to time István encouraged me to publish this report, but as I had subsequently rewritten it and presented it at the 1980 ACM Symposium on Theory of Computation I felt this wasn’t necessary. In connection with that report I have two things to thank István for. The first is a short paper he wrote a year or so later in which he showed that every free dynamic algebra is representable (as a Kripke structure). 
He pointed out first that “separable” in my result was redundant in the case of free algebras, and second that subdirect products of Kripke structures are Kripke structures. The handling editor asked me to referee it. On the one hand I felt this was all immediate; on the other it was clearly a much better way to state the result. Much as I wished I’d thought of that, I immediately gave it my blessing and it duly appeared in print. The second is that during the next decade, from time to time István kept encouraging me to publish my original technical report, on the ground that it contained material that had not made its way into the STOC paper. He persisted, and the upshot was that in 1992 it appeared in volume 50 of Studia Logica on pages 571–605.

In recent years I have traveled less and therefore haven’t seen as much of István and Hajnal. At our most recent encounter, a number of years ago, István had taken an interest in the Bell inequalities for quantum mechanics, which John Bell had worked out in the mid-1960s. István expressed to me his astonishment that classical statistics could be violated in such a counterintuitive way. Although I’d taken four years of physics as an undergraduate in the early 1960s, Bell’s theorem was much too new to have had any impact in Australia back then, so this was an area of physics in which István was much further along than I was. I therefore very much regretted being unable to share his astonishment at the time. Since then, quantum computing has obliged us computer scientists to get caught up with entanglement, of which Bell’s inequalities are a natural consequence. While counterintuitive, entanglement is nevertheless an inevitable consequence of superposition, which in turn is a consequence of the solution space of Schrödinger’s equation being closed under linear combinations.

5.2 The Main Theorem

Whereas in logic the conclusion of an argument traditionally follows its premises, mathematics reverses that order by enunciating the theorem before proving it. The advantage of the latter order is psychological: if the theorem sounds interesting it motivates the reader to follow the argument, while if it does not it frees the reader to move on to other things. With that advantage in mind, we begin by exhibiting an organization of the 24 valid assertoric Aristotelian syllogisms as the four groups shown in Table 5.1. We then further organize those syllogisms into just two groups, namely the two connected components of the graph in Fig. 5.1.

Theorem 1 The edges of the graph of Fig. 5.1 constitute a sound and complete proof system for the 24 assertoric Aristotelian syllogisms.

The 14 thin undirected edges, 9 thick undirected edges, and 16 directed edges correspond to the customary derivation principles of conversion, obversion, and contraposition respectively. Omission of the 16 directed edges disconnects the graph into four connected components corresponding to the grouping in Table 5.1.

Table 5.1 The 24 valid syllogisms, in four classes

General        Particular
1. AAA-1       6. OAO-3       11. AOO-2
2. EAE-1       7. IAI-3       12. EIO-2 (10)
3. EAE-2       8. AII-3       13. EIO-1 (9)
4. AEE-2       9. EIO-3       14. AII-1 (8)
5. AEE-4       10. EIO-4      15. IAI-4 (7)

∃M             ∃S
16. AAI-3      19. EAO-1      22. EAO-2 (19)
17. EAO-3      20. AAI-1      23. AEO-2
18. EAO-4      21. AAI-4      24. AEO-4

Fig. 5.1 The connections between the 24 valid syllogisms

The remainder of this paper provides some historical background, establishes concepts and terminology, and proves the theorem.

5.3 Aristotle

Aristotle is said to have been born in 384 BCE, which would make 2017 CE the year of his 2400th birthday. Archimedes was supposedly born 97 years later, similarly making 2014 CE the year of his 2300th birthday.1 While Archimedes is generally considered the greatest mathematician of antiquity in the western world, Aristotle is with equal justification considered its broadest philosopher. Aristotle made profound contributions to speculative, natural, and practical philosophy that deeply influenced the next 22 centuries of philosophers in Europe and the Middle East. Today we view his natural philosophy as science and his practical philosophy as government, politics, economics, and ethics. This leaves his speculative philosophy as the part of his work most characteristic of what we think of as philosophy today.

During the following two millennia ideas foreign to the ancient Greeks augmented academic philosophy, which while not taking the Greek gods seriously did take monotheism seriously starting in the middle of the first millennium. Aristotle’s closest concept to anything redolent of religion was his addition of the Aether to the traditional elements of earth, air, fire, and water, namely as a substance in the heavens that was neither hot nor cold nor wet nor dry but instead divine; one might call him a non-worshipping pantheist.

The most technical part of Aristotle’s speculative philosophy is what later was organized as his Organon [1]. At its core is his Prior Analytics, which treats the syllogism, the oldest systematic form of logic to have survived to the present day.

5.4 From Aristotle to the 19th Century

There is a long but rather unclear (by mathematical standards) development of our understanding of Aristotle’s syllogisms between 350 BCE and the 18th century. The subject came into sharper focus with writings of Augustus De Morgan [3] at the University of London and Sir William Hamilton at the University of Edinburgh in the 1840s. Both had developed their own accounts of the subject independently. Hamilton accused De Morgan of plagiarizing his account, which led to a public battle that raged in the Athenaeum.

1 In reckoning these birthdays it should be noted that by convention the Western calendar contains no year zero, instead calling the year that preceded 1 CE the year 1 BCE. There is therefore a parity shift at that boundary.

This little public spat inspired George Boole to write about his idea to found Aristotle’s syllogistic on the arithmetic of ordinary polynomials with integer coefficients, reduced to just the multilinear ones with coefficients 0 and 1 via the axiom x^2 = x. Boole broached the idea in a short pamphlet in 1847, and developed it into a book in 1854. Neither Boole nor anyone else in that century was able to accept that x + x = 0, with the result that William Jevons advocated instead for the theory of complemented distributive lattices favored by Charles Peirce as definitive of Boole’s logic. Starting with Zhegalkin in 1927 and independently2 Marshall Stone in 1936 in the US, Boole’s system was recognized as the theory of Boolean rings as one of several possible finite equational axiomatizations of Boolean algebra, besides complemented distributive lattices. Thanks to L.E.J. Brouwer and his student Arend Heyting, Heyting algebras that satisfy the Law of the Excluded Middle are another, and recently I wrote about yet another [6].

Modern classical logic takes Boolean algebra as the algebraic basis for zero-order or propositional logic and extends it to one or more domains and predicates thereon, with universally and existentially quantified variables ranging over those domains. Although almost all logic research today involves some variant of Boole’s logic and its extensions, Boole himself envisaged his system not as overthrowing Aristotle’s syllogistic but as an algebraic way of formalizing and extending it. That and a certain elegant simplicity to Aristotle’s syllogistic has made it an object of continuing study in its own right, with his Square of Opposition as a unifying concept.

5.5 20th Century Treatments of the Aristotelian Syllogisms

During the 19th century De Morgan, Boole, Peirce, Jevons, Peano, and Frege were instrumental in sharpening mathematical logic. By the early 20th century considerable further progress in various directions had been made by Russell, Hilbert, Łukasiewicz, Löwenheim, Skolem, Brouwer, and others.

During the three decades 1900–1930 the Polish logician Jan Łukasiewicz worked on a wide variety of propositional logics [4]. In 1929 he introduced a parenthesis-free notation now referred to as Polish notation, writing KPQ and CPQ for respectively conjunction and implication between propositions P and Q, and negation of P as NP. As an example of its use he expressed the sentences XaY and XiY as respectively Axy and Ixy, allowing him to write XeY as NIxy and XoY as NAxy. He then derived the properties of Aristotle’s Square of Opposition and all 24 of the assertoric syllogisms from the following four axioms.

1. Aaa
2. Iaa
3. C K Amb Aam Aab (that is, (A(m, b) ∧ A(a, m)) → A(a, b))
4. C K Amb Ima Iab

2 Soviet Russia’s isolation from the west at the time resulted in inadequate recognition of priorities in some cases.

These axioms assert respectively that inclusion is reflexive, all categories are nonempty, inclusion is transitive, and if m ⊆ b and m ∩ a ≠ ∅ then a ∩ b ≠ ∅. An obvious concern with this axiomatization is its second axiom, that all categories are nonempty. This disposes of the problem of existential import, which we’ll return to later, by brute force. However, categories like unicorns and black swans that might be empty, or like two-dimensional cubes and prime square integers that are surely empty, don’t fit into Łukasiewicz’s ontology. Furthermore, the syllogisms are just a small fragment of what Łukasiewicz’s language can express, which therefore amounts to a framework for studying many other things besides the syllogisms rather than a self-contained account of syllogistic reasoning.

In §36 of his 1947 book Elements of Symbolic Logic [7] Hans Reichenbach organized the 24 valid syllogisms into the four groups we exhibited in Table 5.1. His grouping was however not based on the proof theory underlying our Fig. 5.1 but on certain distinguishing characteristics. There are 15 unconditional forms, namely five with a general conclusion and ten with a particular conclusion. The remaining nine have “existential import”: three are conditioned on the middle term being inhabited and the remaining six on the subject S being inhabited. He nominated Barbara (AAA-1), Darii (AII-1), Darapti (AAI-3), and Barbari (AAI-1) as suitable representatives of these four classes, as well as acknowledging Hilbert and Ackermann as an earlier source of this grouping.

Timothy Smiley in “What is a syllogism?” [9] and John Corcoran in “Aristotle’s Natural Deduction System” [2] both raised the further objection to Łukasiewicz that a syllogism is not meant to be understood as a sentence but rather as a form of argument involving multiple sentences. Corcoran pointed out that Łukasiewicz’s sentences were fragments of second order logic, to be judged true or false.
Smiley and Corcoran maintained that syllogisms should be judged according to how soundly they reason, for which the prevailing terminology among “syllogisters” is “valid” or “invalid” (my own preference would be for “sound” or “unsound” since “valid” is understood more commonly as a property of sentences meaning “true under all interpretations of the symbols in the language”). My own first exposure to logic was in high school in 1961 when our swim team’s coach lent me a thin booklet on the assertoric syllogisms of Aristotle. As a college freshman in 1962 I took a course in philosophy whose logic section used Copi and neglected syllogisms. In 1968 I wrote a computer program to parse and reduce to disjunctive normal form the syllogisms Lewis Carroll had published as a series in a newspaper, possible because every sentence was general (no particulars) [5]. In 2006 my interest in syllogisms was revived by Wikipedia’s article on the subject. In 2015 it occurred to me to look for a structural reason for the fact that the 24 syllogisms were organized as four figures each with six syllogisms, and I subsequently drew up such a 6 ∗ 4 table for a paper offering a connection between Aristotle, Boole, and categories [6]. More recently (the point of this paper), I realized that a better factoring of 24 was (5 + 3) ∗ (1 + 2), which expands to 5 + 3 + 10 + 6. These are the number of syllogisms equivalent to each of Reichenbach’s essential syllogisms Barbara, Darii,


Darapti, and Barbari. Furthermore 5 of the 10 syllogisms in the class Darii are derivable from the 5 in Barbara by application of contraposition to the major premise, and the remaining 5 by the same operation on the minor premise. The analogous relation holds between the 3 syllogisms of Darapti and the 6 syllogisms of Barbari. The next section presents the concept of Aristotle’s syllogistic from a point of view intended to motivate our contribution.

5.6 The Aristotelian Syllogisms

From here on our goal is not to find any fault with either the 24 valid syllogisms or with any prior treatments, but simply to prove that substitution and symmetry suffice to reduce their number to four, and to expose an interesting structure created by the rule of contraposition, visualized with the graph of Fig. 5.1 in Sect. 5.2.

Even with no background in zoology, hopefully you would accept the following line of reasoning.

If no llamas are fish, and all sharks are fish, then no sharks are llamas.

Likewise with no background in the print world, you might just as happily accept

If no newspapers are books, and all monographs are books, then no monographs are newspapers.

Now “no sharks are llamas” is not logically equivalent to “no monographs are newspapers”, not even slightly since one speaks of animals and the other of printed matter. However these clearly inequivalent conclusions are arrived at from their respective premises by deductively equivalent lines of reasoning. Aristotle is the first on record in the western world to study deductive equivalence of arguments. Had the Greek words for “llama” and “newspaper” been recognizable as categories to Aristotle he would have called each of these two arguments a valid syllogism and identified them as substitution instances of their common syllogistic form “no P are M, all S are M, therefore no S are P”. This form can be succinctly written as PeM, SaM ⊢ SeP. Even more succinctly it can be written in two parts as EAE-2.

5.6.1 Moods

The first part of EAE-2, namely EAE, is called the mood. It picks out the three connectives or copulas, one per sentence, between the two categories or terms of the sentence. Besides A and E there are also I and O.

The two positive connectives are A and I, being respectively the general and the particular positive forms. XaY expresses “all X are Y” or ∀i. X(i) → Y(i), while XiY expresses “some X are Y” or ∃i. X(i) ∧ Y(i). The two negative connectives are E and O. XeY denotes the substitution instance or contrary of XaY obtained by substituting not-Y for Y. Similarly XoY expresses the contrary of XiY resulting from the same substitution, not-Y for Y. There being three connectives in a syllogism, each having four possibilities, it follows that there are 4^3 = 64 possible moods.

Aristotle’s Square of Opposition arranges these four connectives to form the square

A E
I O

Horizontal movement in the square corresponds to taking the contrary, i.e. negating just the right hand term. Diagonal movement contradicts, i.e. negates the whole sentence. Those two directions of movement are invertible. There is also a downward movement, from A to I and from E to O, that Aristotle refers to as subalternation and views as a form of weakening. But besides not being invertible, it also raises the following problem of existential import.

If all black swans are black, and all black swans are swans, * then some swans are black.

(The * indicates a dubious conclusion.) The problem is that both premises are always true, albeit vacuously so in the case when no black swans exist. Yet Aristotle recognized this line of reasoning as what we now call Darapti in the third figure (the middle term on the left in both premises), namely as strengthening the minor premise in Datisi, which in this instance would be that some black swans are swans. However it is not really stronger because this supposed strengthening drops any mention of existence. Thus whereas Datisi is sound, Darapti is only sound if its conclusion is true to begin with. But if we assume the conclusion as a premise, one might reasonably ask what is the point of judging such a circular argument as sound? Aristotle’s justification of syllogisms is that they serve to add their conclusions to our knowledge. Since circular arguments don’t do that, should Aristotle have rejected Darapti? The general position today seems to be not to question the soundness of a circular syllogism but rather the need to properly augment our knowledge. Useless though the number zero might have seemed in the past, mathematicians eventually embraced it for the sake of completeness. Circular arguments have a similar status, which Reichenbach expressed for syllogisms like Darapti as a tacit third premise MiM, some black swans are black swans. (And for syllogisms like Barbari, SiS.) Were it not for the problem of existential import, there would be only two kinds of syllogisms, general and particular. Addressing this problem doubles that number to the four treated by Reichenbach.
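The black-swan example can be replayed with finite sets. The following Python sketch is an illustration of ours rather than anything in the sources cited; it assumes the usual set reading of the four connectives, and the function names and sample individuals are hypothetical.

```python
# The four sentence forms read as relations between finite sets (an assumed encoding).
def all_are(x, y):        # XaY: vacuously true when X is empty
    return x <= y

def no_are(x, y):         # XeY
    return x.isdisjoint(y)

def some_are(x, y):       # XiY
    return not x.isdisjoint(y)

def some_are_not(x, y):   # XoY
    return bool(x - y)

def darapti(m, s, p):
    """MaP, MaS therefore SiP: returns (premises hold, conclusion holds)."""
    return all_are(m, p) and all_are(m, s), some_are(s, p)

# A world with no black swans: the middle term is empty.
swans = {"white swan 1", "white swan 2"}
black_things = {"crow", "panther"}
premises, conclusion = darapti(swans & black_things, swans, black_things)

# The same world after one black swan turns up, so that MiM holds.
swans2 = swans | {"black swan"}
black2 = black_things | {"black swan"}
premises2, conclusion2 = darapti(swans2 & black2, swans2, black2)

print(premises, conclusion)    # premises hold vacuously, conclusion fails
print(premises2, conclusion2)  # with an inhabited middle term both hold
```

In the first world both premises are true only vacuously and the conclusion is false; in the second, where the tacit premise MiM is satisfied, the conclusion follows.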

5.6.2 Figures

A syllogistic form contains two copies of each of S, P, and M. Ignoring the mood, their distribution determines the figure of the form. The conclusion is always of the form SP. Each premise is one of the forms M* or *M where * is a placeholder for one of S or P. There being two premises, there are four possibilities, each leaving two places denoted by the two *’s, one for S and one for P. On the face of it, it would seem that these can be placed in either order, making a total of 8 possibilities. There being 64 moods, there are therefore 64 × 8 = 512 possible forms.

But order of premises is immaterial from the standpoint of deductive equivalence. Aristotle exploited this by normalizing the order to put the premise containing P first, called the major premise. This reduces the number of premise configurations from 8 to 4, namely those shown in Table 5.2. The first three of these are Aristotle’s three figures, numbered accordingly from 1 to 3; the fourth was added by his student Theophrastus. The major premise is MP or PM according to whether the figure number is odd or even respectively. The minor premise is SM or MS according to whether the figure is one of the first two or the last two respectively.3

The upshot is therefore 64 × 4 = 256 well-formed syllogistic forms, each determined by its mood and figure. Our example is an instance of the form EAE-2. All substitution instances of a given form are deemed equally valid.

Table 5.2 The four figures

1     2     3     4
MP    PM    MP    PM
SM    SM    MS    MS
SP    SP    SP    SP
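The count of 256 forms, and the way the valid ones split into 15 unconditional and 24 traditional forms, can be checked mechanically. The sketch below is our own brute-force illustration, not a method taken from the literature discussed here. It assumes the set reading of the connectives, and it relies on the observation that only the inhabited regions of a three-set Venn diagram matter, so one witness per region suffices. All names in it are hypothetical.

```python
from itertools import product

# Sentence forms as relations between sets (assumed encoding).
SENT = {
    "A": lambda x, y: x <= y,
    "E": lambda x, y: not (x & y),
    "I": lambda x, y: bool(x & y),
    "O": lambda x, y: bool(x - y),
}

# Term order of the (major, minor) premises per figure, as in Table 5.2.
FIGURES = {1: ("MP", "SM"), 2: ("PM", "SM"), 3: ("MP", "MS"), 4: ("PM", "MS")}

# The 7 regions of a Venn diagram for S, M, P that lie inside at least one term.
REGIONS = [frozenset(t for t, bit in zip("SMP", bits) if bit)
           for bits in product([0, 1], repeat=3) if any(bits)]

def models(nonempty_terms=False):
    """Yield every way of inhabiting the regions, as a dict term -> set of witnesses."""
    for inhabited in product([False, True], repeat=len(REGIONS)):
        term = {t: {k for k, r in enumerate(REGIONS) if inhabited[k] and t in r}
                for t in "SMP"}
        if nonempty_terms and not all(term.values()):
            continue
        yield term

def valid(mood, figure, nonempty_terms=False):
    """True if no model makes both premises true and the conclusion false."""
    major, minor = FIGURES[figure]
    for t in models(nonempty_terms):
        if (SENT[mood[0]](t[major[0]], t[major[1]])
                and SENT[mood[1]](t[minor[0]], t[minor[1]])
                and not SENT[mood[2]](t["S"], t["P"])):
            return False
    return True

forms = [("".join(m), f) for m in product("AEIO", repeat=3) for f in FIGURES]
unconditional = [x for x in forms if valid(*x)]
traditional = [x for x in forms if valid(*x, nonempty_terms=True)]
print(len(forms), len(unconditional), len(traditional))
```

Requiring all three terms to be inhabited is the blanket assumption made by Łukasiewicz’s second axiom; the finer conditions ∃M and ∃S of Table 5.1 are not distinguished by this check.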

3 This pattern will be familiar to those acquainted with counting in binary.

5.7 Obversion and Conversion

A syllogism with an odd-numbered figure has P on the right of both the major premise and the conclusion, while those in the Second Figure have M on the right of both premises. The obverse of such a syllogism is the result of taking the contrary of the two sentences with a common right side, thereby changing the mood but not the figure.

It is customary to regard obversion as an inference rule in its own right. In this paper we view it as a substitution instance in which not-P is substituted for P, effected by taking the contrary. Since substitution is the means by which Aristotle reduced a potential infinity of possible syllogistic forms to a small finite number, it is surely fair to continue to use substitution to further reduce the number.

The converse of a syllogism with a mood containing E or I is the result of exchanging the terms of the sentence with that copula. (If the sentence is the conclusion then S and P need to be renamed as each other, and hence the premises need to be exchanged.)

Like obversion, conversion is customarily viewed as an inference rule, but since symmetry is tacitly used to reduce the 512 forms to 256 with the convention that the major premise comes first, again it would seem fair to treat a further possible reduction via commutativity as equally justifiable without having to call it a separate inference rule but merely another application of symmetry. That is, we are proposing to use only the two principles sufficient to reduce the number to 256, to further reduce that number to four.

There are traditionally 24 valid syllogisms when including the conditionally valid ones. Five of them have a general conclusion, all of them unconditionally valid; call these the general syllogisms.

Theorem 2 When obversion and conversion are used to identify syllogisms, there remains only one syllogism with a general conclusion.

Proof Aristotle’s two axioms are AAA-1 and EAE-1, Barbara and Celarent. From the point of view adopted here, these are substitution instances of each other and hence should not be counted as two distinct forms. That is, EAE-1 follows from AAA-1 by obversion. Applying converse, then obverse, then converse yields respectively EAE-2, AEE-2, and AEE-4. In this way we have identified all five syllogisms having a general conclusion, by alternating obversion with conversion. This alternation is shown in Fig. 5.1, connecting the syllogisms numbered 1–5. □
Having accounted for one class of syllogisms in Table 5.1, we have three more to account for. Now there are 10 unconditionally valid syllogisms with a particular conclusion; call these the particular syllogisms, the second class. There are 9 conditionally valid syllogisms, 3 of which require the middle term to be inhabited; call these the ∃M syllogisms, the third class. The remaining 6 require the subject of the conclusion to be inhabited; call these the ∃S syllogisms, the fourth class.

Theorem 3 Alternating obversion and conversion suffices to reduce the 24 valid syllogisms to just four: a general syllogism, a particular syllogism, a ∃M syllogism, and a ∃S syllogism.

Proof Figure 5.1 without the directed (vertical) edges shows how this works for each of the four groups. The diagonal edges and the edge between 19 and 22 complete the connections. In the case of syllogisms 8, 15 and 21, conversion is applied to the conclusion, entailing switching the premises, which leaves the Second and Third Figures unchanged (their premises have M in the same column) but interchanges the First and Fourth Figures.

Reichenbach’s four essential syllogisms Barbara (AAA-1), Darii (AII-1), Darapti (AAI-3), and Barbari (AAI-1) [8] are suitable representatives of each of these four equivalence classes.
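The reduction to four classes can also be checked by machine. The sketch below is our own encoding of obversion and conversion as operations on mood-figure names such as "EAE-2". It assumes the reading of the two rules given in this section, and every identifier in it is hypothetical.

```python
CONTRARY = {"A": "E", "E": "A", "I": "O", "O": "I"}

# The 24 valid forms, in the numbering of Table 5.1.
VALID = ("AAA-1 EAE-1 EAE-2 AEE-2 AEE-4 "
         "OAO-3 IAI-3 AII-3 EIO-3 EIO-4 "
         "AOO-2 EIO-2 EIO-1 AII-1 IAI-4 "
         "AAI-3 EAO-3 EAO-4 EAO-1 AAI-1 AAI-4 "
         "EAO-2 AEO-2 AEO-4").split()

# Effect on the figure of converting each sentence.
MAJOR_FLIP = {1: 2, 2: 1, 3: 4, 4: 3}   # MP <-> PM
MINOR_FLIP = {1: 3, 3: 1, 2: 4, 4: 2}   # SM <-> MS
CONCL_FLIP = {1: 4, 4: 1, 2: 2, 3: 3}   # S and P trade names, premises trade places

def obverse(form):
    """Contrary of the two sentences sharing a right-hand term, or None."""
    (major, minor, concl), fig = form[:3], int(form[-1])
    if fig in (1, 3):   # P is on the right of the major premise and the conclusion
        return f"{CONTRARY[major]}{minor}{CONTRARY[concl]}-{fig}"
    if fig == 2:        # M is on the right of both premises
        return f"{CONTRARY[major]}{CONTRARY[minor]}{concl}-{fig}"
    return None         # Fourth Figure: no term occurs twice on the right

def converses(form):
    """All forms obtained by converting one E or I sentence."""
    (major, minor, concl), fig = form[:3], int(form[-1])
    out = []
    if major in "EI":
        out.append(f"{major}{minor}{concl}-{MAJOR_FLIP[fig]}")
    if minor in "EI":
        out.append(f"{major}{minor}{concl}-{MINOR_FLIP[fig]}")
    if concl in "EI":   # the premises are exchanged along with S and P
        out.append(f"{minor}{major}{concl}-{CONCL_FLIP[fig]}")
    return out

def neighbours(form):
    ob = obverse(form)
    return converses(form) + ([ob] if ob else [])

def classes(forms):
    """Connected components under obversion and conversion."""
    remaining, result = set(forms), []
    while remaining:
        stack = [remaining.pop()]
        component = set(stack)
        while stack:
            for nxt in neighbours(stack.pop()):
                if nxt not in component:
                    component.add(nxt)
                    remaining.discard(nxt)
                    stack.append(nxt)
        result.append(component)
    return result

sizes = sorted(len(c) for c in classes(VALID))
obversions = {frozenset((f, obverse(f))) for f in VALID if obverse(f)}
print(sizes, len(obversions))
```

Under this encoding the 24 forms are closed under both operations, and the number of distinct obversion pairs agrees with the nine thick edges of Fig. 5.1. The sketch makes no attempt to reproduce the particular choice of 14 conversion edges drawn in that figure.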

5.8 Contraposition

There are exactly twice as many syllogisms on the right of Table 5.1 as on the left. This raises the question of whether there might be some underlying structural reason for this. We answer this with the observation that contraposition creates a 2-to-1 surjection from the right side onto the left. Equivalently, there is a bijection between the first and second columns, and a second bijection between the first and third columns. These two bijections are created from the two premises in each row of the first column: the major premise yields the second column while the minor premise yields the third.

The basic operation of contraposition is to negate (take the contradictory of) both the selected premise and the conclusion, and exchange them. This inevitably breaks the naming rules for the three pairs of terms. The terms in the resulting conclusion are renamed to S and P, and the remaining term (which appears in both premises) is renamed to M. The major premise of the result is whichever premise contains P, and the two premises are arranged accordingly to put the major premise first.

To illustrate, for the third line in our table we have EAE-2 in the first column, namely PeM, SaM ⊢ SeP. Negating the major premise and the conclusion turns this into PiM, SaM ⊢ SiP. Exchanging them yields SiP, SaM ⊢ PiM. Renaming P to S and M to P (whence S becomes M) turns this into MiS, MaP ⊢ SiP. Lastly we move the new major premise to the first position giving MaP, MiS ⊢ SiP, or AII-3. Repeating this process for the minor premise, we obtain PeM, SoM ⊢ SiP. The exchange then yields PeM, SiP ⊢ SoM. The requisite renaming just exchanges M and P to give MeP, SiM ⊢ SoP, namely EIO-1.

Lemma 1 Independently of mood, for syllogisms in the first three Figures contraposition changes the Figure according to the following transition diagrams, one for each premise.

Major: 2 ⇒ 3 ↔ 1
Minor: 3 ⇒ 2 ↔ 1

The double right arrow between 2 and 3 signifies the need to switch the premises at the end. Syllogisms in the Fourth Figure remain there, and always need to switch the premises at the end.


Proof The contradictory of a sentence leaves the terms unchanged, while exchanging the two contradicted sentences, and exchanging the premises if needed, permutes the three terms on the left independently of those on the right. It follows that the requisite change to the figure is independent of the mood of the syllogism. The actual transitions can then be verified by inspection. Note that the Fourth Figure has the unique property that all three of S, M, P each appear both on the left and the right (there are no duplicates on either side) and contraposition cannot change that property. □

Theorem 4 In each row of Table 5.1, the syllogisms in the second and third column are obtained from the one in the first column by applying contraposition to respectively its major and minor premise.

Proof The transition tables in the lemma make it easy to verify the Figures in the second and third columns. To verify the Moods, first check that the result conclusion is the contradictory of the selected premise. The contradictory of the initial conclusion is moved to the selected premise or the unselected one according to whether the transition is a single or double arrow respectively (always the unselected one in the case of the Fourth Figure). The unselected premise then moves unchanged to the remaining position of the result. □

Figure 5.1 summarizes Theorems 2 and 4 with a visualization of the connections between the 24 syllogisms, using the numbering from Table 5.1. The thick horizontal lines denote the nine obversions. The thin lines denote the 14 conversions: nine horizontal, four diagonal any one of which suffices to connect the second and third columns of Table 5.1, and one connecting 19 and 22 (the only one available), for a total of 23 unoriented lines. The oriented vertical lines (those with arrowheads) denote the 16 contrapositions, pointing up or down according to whether they act on the major or minor premise respectively of the syllogisms in column 1.

All lines are invertible, with the proviso that the oriented lines include the orientation information (up or down) when being inverted (it is necessary to know which premise led to that syllogism).
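The contraposition procedure described in this section is mechanical enough to be sketched in a few lines of Python. The following is our own illustration, assuming the four steps as stated (contradict, exchange, rename, put the major premise first). The function names and the representation of a sentence as a (copula, terms) pair are hypothetical.

```python
CONTRADICTORY = {"A": "O", "O": "A", "E": "I", "I": "E"}
FIGURES = {1: ("MP", "SM"), 2: ("PM", "SM"), 3: ("MP", "MS"), 4: ("PM", "MS")}

# Rows of Table 5.1: first column, then the second and third columns.
ROWS = [("AAA-1", "OAO-3", "AOO-2"), ("EAE-1", "IAI-3", "EIO-2"),
        ("EAE-2", "AII-3", "EIO-1"), ("AEE-2", "EIO-3", "AII-1"),
        ("AEE-4", "EIO-4", "IAI-4"), ("AAI-3", "EAO-1", "EAO-2"),
        ("EAO-3", "AAI-1", "AEO-2"), ("EAO-4", "AAI-4", "AEO-4")]

def sentences(form):
    """Expand e.g. 'EAE-2' into [(E, 'PM'), (A, 'SM'), (E, 'SP')]."""
    mood, fig = form.split("-")
    major, minor = FIGURES[int(fig)]
    return [(mood[0], major), (mood[1], minor), (mood[2], "SP")]

def contrapose(form, premise):
    """Contrapose on the major (premise=0) or minor (premise=1) premise."""
    sents = sentences(form)
    cop, terms = sents[premise]
    concl_cop, concl_terms = sents[2]
    # Contradict the selected premise and the conclusion, then exchange them.
    new_concl = (CONTRADICTORY[cop], terms)
    new_prem = (CONTRADICTORY[concl_cop], concl_terms)
    kept = sents[1 - premise]
    # Rename so that the new conclusion reads S-P and the third term is M.
    s, p = new_concl[1]
    m = ({"S", "M", "P"} - {s, p}).pop()
    rename = {s: "S", p: "P", m: "M"}
    premises = [(c, "".join(rename[t] for t in ts)) for c, ts in (new_prem, kept)]
    # The major premise is whichever one contains P; it goes first.
    premises.sort(key=lambda sent: "P" not in sent[1])
    (c1, t1), (c2, t2) = premises
    fig = next(f for f, pair in FIGURES.items() if pair == (t1, t2))
    return f"{c1}{c2}{new_concl[0]}-{fig}"

for first, via_major, via_minor in ROWS:
    print(first, contrapose(first, 0), contrapose(first, 1))
```

Running the loop reproduces each row of Table 5.1 from its first column, including the worked example in which EAE-2 yields AII-3 and EIO-1.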

5.9 Proof of Theorem 1

At this point we have everything needed to prove Theorem 1 save what it means for Fig. 5.1 to constitute a proof system, and moreover one that is sound and complete.

Proof systems customarily have axioms as the initial theorems from which the remaining theorems are derived. Since all the edges of Fig. 5.1 constitute equivalences, we could arbitrarily take any one syllogism in a connected component as the axiom from which the rest of that component is derived. However since this choice is completely arbitrary, making such a choice adds nothing of significance. Hence we may as well stop with just Fig. 5.1 itself as the proof system and dispense with choosing a representative of each component as an axiom.

There are actually two proof systems in Fig. 5.1. Omitting the directed edges gives the system that reduces the 24 assertoric syllogisms to four, while including them gives the system that reduces the 24 to just two syllogisms. This is all that needs to be said to justify calling Fig. 5.1 a proof system.

It remains to say what we mean by sound and complete in this context. As usual soundness is not in question. One meaning of completeness is that no other syllogisms besides the classical 24 are valid. This was first shown by Łukasiewicz, and confirmed since then by others, and therefore does not need our attention. What we have in mind here is simply that there are sufficient edges in Fig. 5.1 to reduce the number of connected components to four in the absence of contraposition (the directed edges), and further to two with them. This notion of completeness is easily verified by inspection of Fig. 5.1.
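For readers without Fig. 5.1 to hand, the component count can be reproduced from Table 5.1 alone. The sketch below is an illustration under two stated assumptions: the four classes of Theorem 3 are taken as given data, and the directed edges are taken to be the row relations of Theorem 4. The union-find helper and all names are ours.

```python
# The four classes of Theorem 3, in the numbering of Table 5.1 (taken as data).
CLASSES = {
    "general": ["AAA-1", "EAE-1", "EAE-2", "AEE-2", "AEE-4"],
    "particular": ["OAO-3", "IAI-3", "AII-3", "EIO-3", "EIO-4",
                   "AOO-2", "EIO-2", "EIO-1", "AII-1", "IAI-4"],
    "exists M": ["AAI-3", "EAO-3", "EAO-4"],
    "exists S": ["EAO-1", "AAI-1", "AAI-4", "EAO-2", "AEO-2", "AEO-4"],
}

# Rows of Table 5.1: a first-column form and its two contrapositives (Theorem 4).
ROWS = [("AAA-1", "OAO-3", "AOO-2"), ("EAE-1", "IAI-3", "EIO-2"),
        ("EAE-2", "AII-3", "EIO-1"), ("AEE-2", "EIO-3", "AII-1"),
        ("AEE-4", "EIO-4", "IAI-4"), ("AAI-3", "EAO-1", "EAO-2"),
        ("EAO-3", "AAI-1", "AEO-2"), ("EAO-4", "AAI-4", "AEO-4")]

def component_sizes(use_contraposition):
    """Sizes of the connected components, with or without the directed edges."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    def union(x, y):
        parent[find(x)] = find(y)

    for members in CLASSES.values():
        for member in members:
            union(members[0], member)
    if use_contraposition:
        for first, via_major, via_minor in ROWS:
            union(first, via_major)
            union(first, via_minor)
    roots = [find(x) for x in list(parent)]
    return sorted(roots.count(r) for r in set(roots))

print(component_sizes(False))  # four components
print(component_sizes(True))   # two components
```

Without the contraposition edges the components are the four classes themselves; with them, the general and particular classes merge, as do the ∃M and ∃S classes.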

5.10 Conclusion

We took the position that the principle of substitution used to reduce the potential infinity of valid syllogistic forms to a finite number was of the same nature as the rule of obversion, and inferred that obversion should therefore be allowed to further reduce that number with the same justification. We also argued that the principle justifying the convention of putting the major premise first was of the same nature as the principle justifying the rule of conversion, so that too should be allowed to further reduce the number.

We showed that this allowed the 24 assertoric syllogisms, including the conditionally valid ones, to be further reduced to just four: one with a general conclusion, one with a particular conclusion, one contingent on its middle term being inhabited, and one contingent on its subject being inhabited. This matches up exactly to Reichenbach’s four essential syllogisms [8]. This classification accounts for respectively 5, 10, 3, and 6 syllogisms identifiable using obversion and conversion.

We then pointed out a one-to-two correspondence between the first column of Table 5.1 and the other two columns created by the rule of contraposition, thereby further reducing the four essential syllogisms to two, namely the unconditional syllogism and the conditional one.

References

1. Aristotle. (1962). Analytica priora. In Minio-Paluello, L. (ed.), Aristoteles Latinus III.14. Bruges-Paris: Desclée de Brouwer.
2. Corcoran, J. (1974). Aristotle’s natural deduction system. In Corcoran, J. (ed.), Ancient logic and its modern interpretations, Synthese Historical Library (pp. 85–131). Springer.
3. Heath, P. (ed.) (1966). On the syllogism, and other logical writings. Routledge and Kegan Paul.
4. Łukasiewicz, J. (1957). Aristotle’s syllogistic (2nd ed.). Oxford: Clarendon Press.
5. Pratt, V. R. (1969). Translation of Lewis Carroll’s syllogisms into logic. Masters Thesis, University of Sydney.
6. Pratt, V. R. (2017). Aristotle, Boole, and categories. In Moss, L. (ed.), Rohit Parikh on logic, language and society, LNCS. Springer.
7. Reichenbach, H. (1947). Elements of symbolic logic. Macmillan.
8. Reichenbach, H. (1952). The syllogism revised. Philosophy of Science, 19(1), 1–16.
9. Smiley, T. J. (1973). What is a syllogism? Journal of Philosophical Logic, 2(1), 136–154.

Vaughan Pratt is Professor Emeritus of Computer Science at Stanford University. He obtained a double honours degree in pure mathematics and physics from the University of Sydney, and his Ph.D. from Stanford under Donald Knuth. He taught at MIT from 1972 to 1980, working in natural language, algorithms, complexity theory, and logics of programs. While there he helped found and was president of Research and Consulting Inc., a consortium of MIT faculty. In 1980 while on sabbatical from MIT he joined Stanford’s Sun workstation project, subsequently managing it until and the Pixrect graphics system. He taught at Stanford from 1981 until his retirement in 2000. In March 2000 he founded Tiqit Computers, which operated until 2010. His current interests include concurrency theory, algebraic logic, global environmental change, and automobile technology. He is a Fellow of the Association for Computing Machinery and serves on the editorial board of several professional journals.

Chapter 6
Adding Guarded Constructions to the Syllogistic

Ian Pratt-Hartmann

Abstract The relational syllogistic extends the classical syllogistic by allowing predicate phrases of the forms “rs every q”, “rs some q” and their negations, where q is a common (count) noun and r a transitive verb. It is known that both the classical and relational syllogistic admit a finite set of syllogism-like rules whose associated derivation relation is sound and complete (the latter only when reductio ad absurdum is allowed). In this article, we extend the classical and relational syllogistic by allowing ‘guarded’ predicate phrases of the form “rs only qs”, and their negations. We show that, in both cases, the resulting logic is pspace-complete. It follows, on the assumption that nptime ≠ pspace, that neither extension admits a finite set of syllogism-like rules whose associated derivation relation is sound and complete, even when reductio ad absurdum is allowed. We also show that further extending these systems with noun-complementation in sentence-subjects results in logics which are exptime-complete.

Keywords Syllogism · Guarded fragment · Computational complexity · Proof theory

6.1 Introduction

By the classical syllogistic, here denoted S, we understand the set of English sentences of the forms

    Every p is a q          Some p is a q
    No p is a q             Some p is not a q,          (6.1)

I. Pratt-Hartmann
Department of Computer Science, University of Manchester, Manchester M13 9PL, UK
Instytut Informatyki, Uniwersytet Opolski, 45-040 Opole, Poland

© Springer Nature Switzerland AG 2021
J. Madarász and G. Székely (eds.), Hajnal Andréka and István Németi on Unity of Science, Outstanding Contributions to Logic 19, https://doi.org/10.1007/978-3-030-64187-0_6


where p and q are common (count) nouns. Mutatis mutandis, this is of course the language studied in Aristotle’s Prior Analytics, Book A. The catalogue of syllogisms familiar to generations of logicians since can be shown to constitute, with the addition of some technical rules to handle corner cases, a sound and complete proof system for the classical syllogistic, under standard model-theoretic semantics [2, 8].

By the relational syllogistic, here denoted R, we understand the classical syllogistic together with the set of English sentences of the forms

    Every p rs every q          Some p rs every q
    Every p rs some q           Some p rs some q
    No p rs any q               Some p rs no q
    No p rs every q             Some p does not r every q,          (6.2)

where p and q are common (count) nouns and r is a transitive verb. Here we take quantifiers in subjects to outscope both verb negation (if present) and quantifiers in objects; and we take verb negation (if present) to outscope quantifiers in objects. In particular, “Some p does not r every q” is read as $\exists x(p(x) \wedge \neg\forall y(q(y) \rightarrow r(x, y)))$, or equivalently, $\exists x(p(x) \wedge \exists y(q(y) \wedge \neg r(x, y)))$. Thus understood, the sentences in both (6.1) and (6.2) come in mutually contradictory pairs. That is: both S and R are effectively closed under negation.

The relational syllogistic, unlike its classical forebear, allows us to capture valid arguments having an essentially relational character, for example:

    Some artist admires every beekeeper
    No beekeeper admires every artist
    -----------------------------------
    Some artist is not a beekeeper.          (6.3)
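As a sanity check on this reading of the sentence forms, the following sketch searches exhaustively for a counter-model to argument (6.3) among structures with at most three elements. The function names and the encoding of structures are illustrative assumptions, not part of the text; an empty search over small domains is evidence for validity, not a proof of it.

```python
from itertools import product

def structures(n):
    """Yield every interpretation of artist, beekeeper and admires on {0, ..., n-1}."""
    dom = range(n)
    pairs = [(a, b) for a in dom for b in dom]
    for art in product([False, True], repeat=n):
        for bee in product([False, True], repeat=n):
            for bits in product([False, True], repeat=n * n):
                yield dom, art, bee, dict(zip(pairs, bits))

def some_artist_admires_every_beekeeper(dom, art, bee, adm):
    return any(art[x] and all(adm[x, y] for y in dom if bee[y]) for x in dom)

def no_beekeeper_admires_every_artist(dom, art, bee, adm):
    return not any(bee[x] and all(adm[x, y] for y in dom if art[y]) for x in dom)

def some_artist_is_not_a_beekeeper(dom, art, bee, adm):
    return any(art[x] and not bee[x] for x in dom)

def count_counter_models(premises, conclusion, max_size=3):
    """Count small structures satisfying all premises but falsifying the conclusion."""
    return sum(
        1
        for n in range(1, max_size + 1)
        for s in structures(n)
        if all(p(*s) for p in premises) and not conclusion(*s)
    )

both = [some_artist_admires_every_beekeeper, no_beekeeper_admires_every_artist]
print(count_counter_models(both, some_artist_is_not_a_beekeeper))      # full argument
print(count_counter_models(both[:1], some_artist_is_not_a_beekeeper))  # second premise dropped
```

The first count is zero, while dropping the second premise admits counter-models (for instance, a single artist-beekeeper who admires himself), which illustrates that both premises are needed.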

A sound and complete proof system, in the form of a finite set of syllogism-like proof-rules, was given for the relational syllogistic by Pratt-Hartmann and Moss [6, Theorem 4.10]. In contradistinction to the classical case, the rule of reductio ad absurdum is indispensable here (op. cit., Theorems 3.7 and 4.1).

Evidently, both the classical and relational syllogistic are fragments of first-order logic, and it is easy to see using standard results that their respective satisfiability problems are decidable. In fact, the satisfiability problem for the classical syllogistic is easily shown to be nlogspace-hard by reduction from directed graph-reachability; and the satisfiability problem for the relational syllogistic can be shown (though not so easily) to be in nlogspace by a reduction in the opposite direction (Pratt-Hartmann and Moss, op. cit., Theorem 4.11). Thus, the satisfiability problems for both the classical and the relational syllogistic are nlogspace-complete.

The classical syllogistic and its latter-day extensions are by no means unique in respect of decidability, of course. Numerous other decidable fragments of first-order logic have been studied in recent decades, particularly under the banners of modal logic or description logics. Such logics secure low computational complexity by allowing only guarded quantification, a notion first introduced by Andréka et al. [1].


Here, a universal quantifier can appear only in a context of the form $\forall \bar{x}(\alpha \rightarrow \varphi)$, where $\alpha$ is an atomic formula containing all the variables appearing free in $\varphi$, and an existential quantifier only in a context of the form $\exists \bar{x}(\alpha \wedge \varphi)$. Using guarded quantification, we are able to express the property of, for example, admiring only beekeepers, since the formula $\forall y(\mathrm{admire}(x, y) \rightarrow \mathrm{beekeeper}(y))$ is guarded; but we cannot express the property of admiring every beekeeper, as occurs in the argument (6.3), since the formula $\forall y(\mathrm{beekeeper}(y) \rightarrow \mathrm{admire}(x, y))$ is not guarded.

The guarded fragment is significant on account of its model-theoretic and computational properties. In regard to the latter, [3] shows that the satisfiability problem for the guarded fragment is 2-exptime-complete, falling to exptime-complete if there is any fixed bound (of at least 2) on the number of variables. This prompts us to ask: what happens if we replace the patterns of quantification found in the relational syllogistic with their guarded counterparts? What happens, indeed, if we extend the relational syllogistic with guarded quantification? More concretely, consider the sentence forms

    Every p rs only qs              Some p rs only qs
    Every p rs some non-q           Some p rs some non-q.          (6.4)

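These sentences are expressed by the guarded first-order formulas

\[
\begin{array}{ll}
\forall x(p(x) \rightarrow \forall y(r(x, y) \rightarrow q(y))) & \exists x(p(x) \wedge \forall y(r(x, y) \rightarrow q(y))) \\
\forall x(p(x) \rightarrow \exists y(r(x, y) \wedge \neg q(y))) & \exists x(p(x) \wedge \exists y(r(x, y) \wedge \neg q(y))).
\end{array}
\tag{6.5}
\]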

Let the system P be the result of adding the forms in (6.4) to S, and let the system Q be the result of adding the forms in (6.4) to R. Here is an example of a valid argument in P:

    Every artist admires only beekeepers
    Some artist admires some non-carpenter
    --------------------------------------
    Some beekeeper is not a carpenter.

From a linguistic point of view, the sentence forms in (6.4) are rather stilted, and seem somehow less natural than those in (6.2). However, they are arguably fully grammatical, and certainly express sensible logical conditions. It is natural therefore to ask how the systems P and Q compare, in complexity-theoretic terms, to the systems S and R, and whether we can give sound and complete syllogism-like proof procedures for them. It is obvious that the satisfiability problem for P is at least as hard as that for S, since the former language subsumes the latter; similarly the satisfiability problem for Q is at least as hard as that for R. However, little else is obvious.

All of the languages considered above (S, P, Q and R) admit a further extension, which will turn out to be significant in the sequel, namely, the inclusion of noun-level negation. Consider the forms


    Every non-p is a q          Some non-p is not a q,          (6.6)

which correspond to the first-order formulas

\[
\forall x(\neg p(x) \rightarrow q(x)) \qquad \exists x(\neg p(x) \wedge \neg q(x)). \tag{6.7}
\]

Let the system S⁺ be the result of adding the forms in (6.6) to S. Equivalently, we may take S⁺ to be the extension of S by allowing p and q in (6.1) to range over common nouns such as artist, beekeeper etc. and their noun-level negations, such as non-artist, non-beekeeper. (Aristotle himself, of course, briefly considered just such an extension of the syllogistic in De Interpretatione; see also, for example, [7].) Let the system R⁺ likewise be the result of allowing p and q in the forms (6.1) and (6.2) to range over common (count) nouns or their noun-level negations. Thus, R⁺ includes sentences such as “No non-artist admires every non-beekeeper.”

It is easy to show (for example by reduction to 2-SAT) that the satisfiability problem for S⁺ remains in nlogspace. Moreover, there exists a finite set of syllogism-like proof-rules for S⁺ constituting a sound and complete proof system [6, Theorem 3.6]. With R⁺, however, we encounter a surprise. The satisfiability problem for this logic is exptime-complete, and there exists no finite set of syllogism-like proof-rules for R⁺ constituting a sound and complete proof system, even when reductio ad absurdum is allowed [6, Theorem 6.3, Theorem 6.12]. Thus, for R (but not for S), adding noun-level negation has a dramatic impact on the proof-theoretic and complexity-theoretic properties of the logic.

These results suggest investigating corresponding extensions of the systems P and Q: we take P⁺ to be the logic obtained by extending P with noun-level negation, and Q⁺ to be the logic obtained by extending Q with noun-level negation. Thus, for example, P⁺ and Q⁺ include sentences such as “No non-artist admires only non-beekeepers.”

Fig. 6.1 Inclusion relations between S, P, Q, R, S⁺, P⁺, Q⁺ and R⁺


The lattice of inclusion relations among the logics S, P, Q, R, S⁺, P⁺, Q⁺ and R⁺ is shown in Fig. 6.1. Again, we ask: what is the complexity of the satisfiability problems for the systems P⁺ and Q⁺, and do they admit sound and complete syllogism-like proof procedures? It is obvious on the basis of inclusion relations that the satisfiability problem for P⁺ is at least as hard as that for S⁺, and the satisfiability problem for Q⁺ at least as hard as that for R⁺. Again, however, little else is obvious.

In this article, we show that the satisfiability problems for the logics P and Q are both pspace-complete, and that the satisfiability problems for the logics P⁺ and Q⁺ are both exptime-complete. It follows from widely accepted complexity-theoretic separation assumptions that none of these logics admits a finite set of syllogism-like rules whose associated derivation relation is sound and complete, even when reductio ad absurdum is allowed.

6.2 Technical Preliminaries

Fix some countably infinite set, whose elements we refer to as unary atoms. We use the (possibly decorated) variables $o$, $p$, $q$ to range over the unary atoms. A unary literal is an expression given by the syntax

\[ \ell := p \mid \bar{p}. \]

We use the (possibly decorated) variables $\ell$, $m$ to range over unary literals. An S-formula is an expression given by the syntax

\[ \varphi := \forall(p, \ell) \mid \exists(p, \ell), \]

and an S⁺-formula is an expression given by the syntax

\[ \varphi := \forall(\ell, m) \mid \exists(\ell, m). \]

The language S is the set of S-formulas, and similarly for S⁺. Obviously, S ⊆ S⁺. We use the (possibly decorated) variables $\varphi$, $\psi$ to range over formulas, regardless of language. The intention is that S formalizes the classical syllogistic, with sentence forms given by (6.1), while S⁺ formalizes the classical syllogistic together with the sentence-forms

    No non-p is a q             Some non-p is a q
    Every non-p is a q          Some non-p is not a q.          (6.8)

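The semantics are then as expected: an interpretation $\mathfrak{A}$ consists of a non-empty domain $A$ equipped with a function $p \mapsto p^{\mathfrak{A}}$ defined on the unary atoms, where $p^{\mathfrak{A}} \subseteq A$ for every unary atom $p$. This function is extended to all unary literals by setting $\bar{p}^{\mathfrak{A}} = A \setminus p^{\mathfrak{A}}$. Truth of S⁺-formulas in a structure $\mathfrak{A}$ is then given by

\[
\mathfrak{A} \models \forall(\ell, m) \Leftrightarrow \ell^{\mathfrak{A}} \subseteq m^{\mathfrak{A}}
\qquad
\mathfrak{A} \models \exists(\ell, m) \Leftrightarrow \ell^{\mathfrak{A}} \cap m^{\mathfrak{A}} \neq \emptyset.
\]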
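The truth conditions just given translate directly into set operations. The sketch below is an illustration only: the tuple encoding of literals and formulas and the names `extension` and `holds` are assumptions made for this example.

```python
def extension(structure, literal):
    """Return the set denoted by a unary literal (atom, positive) in the structure."""
    domain, interp = structure
    atom, positive = literal
    return interp[atom] if positive else domain - interp[atom]

def holds(structure, formula):
    """Truth of an S+ formula (quantifier, l, m), with quantifier 'all' or 'some'."""
    quantifier, l, m = formula
    left, right = extension(structure, l), extension(structure, m)
    if quantifier == "all":
        return left <= right      # subset test
    return bool(left & right)     # non-empty intersection

# A two-element structure in which nothing is a p.
A = ({1, 2}, {"p": set(), "q": {1}})
p, q, non_p, non_q = ("p", True), ("q", True), ("p", False), ("q", False)

print(holds(A, ("all", p, q)))           # True: the empty set is a subset of every set
print(holds(A, ("some", p, q)))          # False: there is no p at all
print(holds(A, ("all", non_p, q)))       # False: 2 is a non-p but not a q
print(holds(A, ("some", non_p, non_q)))  # True: 2 is a witness
```

In this structure the universal formula over p holds while the corresponding existential formula fails, which is possible precisely because the extension of p is empty.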

We are here keeping with present-day fashion, and departing from older treatments of the syllogism, in taking $\forall(p, \ell)$ not to entail its sub-alternate form, $\exists(p, \ell)$. However, the matter is inessential, as existential presupposition can, if desired, always be restored by adding premises of the form $\exists(p, p)$.

Now fix a second countably infinite set, whose elements we refer to as binary atoms. We use the (possibly decorated) variable $r$ to range over the binary atoms. A binary literal is an expression given by the syntax

\[ t := r \mid \bar{r}, \]

an R-molecule is an expression given by the syntax

\[ c := \ell \mid \forall(p, t) \mid \exists(p, t), \]

and an R-formula is an expression given by the syntax

\[ \varphi := \forall(p, c) \mid \exists(p, c), \]

where $c$ ranges over R-molecules. Similarly, an R⁺-molecule is an expression given by the syntax

\[ c := \ell \mid \forall(\ell, t) \mid \exists(\ell, t), \]

and an R⁺-formula is an expression given by the syntax

\[ \varphi := \forall(\ell, c) \mid \exists(\ell, c), \]

where $c$ ranges over R⁺-molecules. Again, we refer to the sets of formulas thus defined as the languages R and R⁺. We use the (possibly decorated) variables $c$, $d$ to range over molecules (both in R and R⁺) and $\varphi$, $\psi$ to range over formulas. Observe that the above syntax overloads the symbols $\forall$ and $\exists$, allowing them to combine either a unary literal and a binary literal to produce a molecule, or a unary literal and a molecule to produce a formula. The intention is that R formalizes the relational syllogistic, with sentence forms given by (6.2), while R⁺ formalizes its extension with noun-level negation.

The semantics of R and R⁺ is then as expected. We take structures additionally to be equipped with a function $r \mapsto r^{\mathfrak{A}}$ defined on the binary atoms, where $r^{\mathfrak{A}} \subseteq A \times A$ for every binary atom $r$. The function $\cdot^{\mathfrak{A}}$ is extended to binary literals and to R⁺-molecules by setting

\[
\begin{aligned}
\bar{p}^{\mathfrak{A}} &= A \setminus p^{\mathfrak{A}} \\
\bar{r}^{\mathfrak{A}} &= (A \times A) \setminus r^{\mathfrak{A}} \\
\forall(\ell, t)^{\mathfrak{A}} &= \{a \in A \mid \text{for all } b \in \ell^{\mathfrak{A}},\ \langle a, b \rangle \in t^{\mathfrak{A}}\} \\
\exists(\ell, t)^{\mathfrak{A}} &= \{a \in A \mid \text{for some } b \in \ell^{\mathfrak{A}},\ \langle a, b \rangle \in t^{\mathfrak{A}}\},
\end{aligned}
\]

and truth of R⁺-formulas in a structure $\mathfrak{A}$ is given by interpreting the sentence-forming operators $\forall$ and $\exists$ as above, namely,

\[
\mathfrak{A} \models \forall(\ell, c) \Leftrightarrow \ell^{\mathfrak{A}} \subseteq c^{\mathfrak{A}}
\qquad
\mathfrak{A} \models \exists(\ell, c) \Leftrightarrow \ell^{\mathfrak{A}} \cap c^{\mathfrak{A}} \neq \emptyset.
\]

This concludes our preliminary remarks concerning the languages S, S⁺ and their relational extensions R and R⁺. We turn now to the guarded counterparts to R and R⁺ that form the subject of this paper. A P-molecule is an expression given by the syntax

\[ c := \ell \mid O(p, r) \mid N(p, r), \]

and a P-formula an expression given by the syntax

\[ \varphi := \forall(p, c) \mid \exists(p, c), \]

where $c$ is a P-molecule. Similarly, a P⁺-molecule is an expression given by the syntax

\[ c := \ell \mid O(\ell, r) \mid N(\ell, r), \]

and a P⁺-formula an expression given by the syntax

\[ \varphi := \forall(\ell, c) \mid \exists(\ell, c), \]

where $c$ is a P⁺-molecule. Again, we refer to the sets of formulas thus defined as the languages P and P⁺. Thus, P ⊆ P⁺. The intention is that P formalizes the fragment of English given by the sentence forms (6.4), while P⁺ formalizes its extension with noun-level negation. Accordingly, the semantics of P⁺ is given by setting

\[
\begin{aligned}
O(\ell, t)^{\mathfrak{A}} &= \{a \in A \mid \text{for all } b \text{ such that } \langle a, b \rangle \in t^{\mathfrak{A}},\ b \in \ell^{\mathfrak{A}}\} \\
N(\ell, t)^{\mathfrak{A}} &= \{a \in A \mid \text{for some } b \text{ such that } \langle a, b \rangle \in t^{\mathfrak{A}},\ b \notin \ell^{\mathfrak{A}}\},
\end{aligned}
\]

and interpreting all other operators as above. In terms of the more familiar syntax of first-order logic, the P-molecules $O(p, r)$ and $N(p, r)$ correspond to the respective 1-place formulas

\[
\forall y(r(x, y) \rightarrow p(y)) \qquad \exists y(r(x, y) \wedge \neg p(y)).
\]
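The two guarded molecules can be computed directly from their semantic clauses. The following sketch does so and then searches small structures for a counter-model to the example argument in P given in the introduction (artists, beekeepers, carpenters). The helper names `subsets` and `find_counter_model` and the size bound are assumptions for illustration; the search is a sanity check, not a proof of validity.

```python
from itertools import product

def O(ext, rel, domain):
    """O(l, t): elements all of whose rel-successors belong to ext."""
    return {a for a in domain if all(b in ext for b in domain if (a, b) in rel)}

def N(ext, rel, domain):
    """N(l, t): elements having some rel-successor outside ext."""
    return {a for a in domain if any(b not in ext for b in domain if (a, b) in rel)}

def subsets(items):
    """Yield every subset of the given items."""
    items = list(items)
    for bits in product([False, True], repeat=len(items)):
        yield {x for x, keep in zip(items, bits) if keep}

def find_counter_model(max_size=2):
    """Search small structures for one satisfying both premises but not the conclusion."""
    for n in range(1, max_size + 1):
        domain = set(range(n))
        pairs = [(a, b) for a in domain for b in domain]
        unary = list(subsets(domain))
        for admires in subsets(pairs):
            for artist, beekeeper, carpenter in product(unary, repeat=3):
                # Every artist admires only beekeepers
                premise1 = artist <= O(beekeeper, admires, domain)
                # Some artist admires some non-carpenter
                premise2 = bool(artist & N(carpenter, admires, domain))
                # Some beekeeper is not a carpenter
                conclusion = bool(beekeeper & (domain - carpenter))
                if premise1 and premise2 and not conclusion:
                    return domain, artist, beekeeper, carpenter, admires
    return None

print(find_counter_model())  # None
```

Raising `max_size` widens the search at exponential cost; the default keeps the run short.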


We remark that the quantification here is guarded. Finally, we define the language Q to consist of the languages R and P taken together; similarly, the language Q⁺ consists of the languages R⁺ and P⁺ taken together. Thus, we may informally think of Q⁺ as the extension of Q with noun-level negation. The inclusion relationships between the languages S, P, Q, R and their respective extensions with noun-level negation are depicted in Fig. 6.1.

We remark that the Q⁺-molecules come in opposite pairs, in the sense that, for any structure $\mathfrak{A}$, any unary literal $\ell$ and any binary atom $r$:

\[
\begin{aligned}
\forall(\ell, r)^{\mathfrak{A}} &= A \setminus \exists(\ell, \bar{r})^{\mathfrak{A}} \\
\forall(\ell, \bar{r})^{\mathfrak{A}} &= A \setminus \exists(\ell, r)^{\mathfrak{A}} \\
O(\ell, r)^{\mathfrak{A}} &= A \setminus N(\ell, r)^{\mathfrak{A}}.
\end{aligned}
\]

We write $\bar{c}$ for the opposite of $c$. (Thus, $\bar{\bar{c}} = c$.) It is easy to verify that, if L is any of the six languages R, R⁺, P, P⁺, Q or Q⁺, and $c$ is an L-molecule, then $\bar{c}$ is also an L-molecule. Likewise, Q⁺-formulas also come in opposite pairs, in the sense that, for any structure $\mathfrak{A}$, any unary literal $\ell$ and any molecule $c$: $\mathfrak{A} \models \forall(\ell, c)$ if and only if $\mathfrak{A} \not\models \exists(\ell, \bar{c})$. We write $\bar{\varphi}$ for the opposite of $\varphi$. (Thus, $\bar{\bar{\varphi}} = \varphi$.) It is easy to verify that, if $\varphi$ is an L-formula, where L is any of the eight languages mentioned above, then so is $\bar{\varphi}$. Thus, all these languages are effectively closed under negation.

As usual, if $\Phi$ is a set of L-formulas and $\mathfrak{A}$ a structure, we write $\mathfrak{A} \models \Phi$ to mean that $\mathfrak{A} \models \varphi$ for all $\varphi \in \Phi$. A set of formulas $\Phi$ is satisfiable if there exists a structure $\mathfrak{A}$ such that $\mathfrak{A} \models \Phi$. If L is any of the eight languages defined above, the satisfiability problem for L is the following problem: given a finite set $\Phi$ of L-formulas, return Y if $\Phi$ is satisfiable; otherwise, return N. It was shown in [6] that the satisfiability problems for S, S⁺ and R are all nlogspace-complete, but that the satisfiability problem for R⁺ is exptime-complete. The least straightforward of these results, that concerning R, depends on analysis of a sound and refutation-complete collection of syllogistic proof rules for that language (see Sect. 6.5, below). In this article, we obtain tight complexity bounds for the satisfiability problems for the four remaining languages of Fig. 6.1, using these results to draw various (tentative) conclusions regarding the existence of sound and complete syllogistic proof systems for these languages.

When considering the satisfiability of a set $\Phi$ of formulas in any of the above languages, we may safely ignore unary and binary atoms not occurring in $\Phi$. By a signature, we mean a subset of the unary and binary atoms; the signature of $\Phi$, denoted $\sigma_\Phi$, is just the set of the unary and binary atoms occurring in $\Phi$. In the sequel, we allow a structure $\mathfrak{A}$ to define only those sets $p^{\mathfrak{A}}$ and $r^{\mathfrak{A}}$ where $p$ and $r$ belong to a particular $\sigma_\Phi$, which will always be clear from context.
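The closure of these languages under opposites can be made concrete. The sketch below gives a small evaluator for Q⁺-formulas and a purely syntactic `opposite` operation; the tuple encoding and all function names are assumptions made for this illustration.

```python
def lit_ext(A, lit):
    """Extension of a unary literal (atom, positive)."""
    domain, unary, _ = A
    atom, positive = lit
    return unary[atom] if positive else domain - unary[atom]

def rel_ext(A, t):
    """Extension of a binary literal (atom, positive)."""
    domain, _, binary = A
    atom, positive = t
    full = {(a, b) for a in domain for b in domain}
    return binary[atom] if positive else full - binary[atom]

def mol_ext(A, c):
    """Extension of a molecule: ('lit', l), ('forall', l, t), ('exists', l, t), ('O', l, r), ('N', l, r)."""
    domain, kind = A[0], c[0]
    if kind == "lit":
        return lit_ext(A, c[1])
    ext, rel = lit_ext(A, c[1]), rel_ext(A, c[2])
    if kind == "forall":
        return {a for a in domain if all((a, b) in rel for b in ext)}
    if kind == "exists":
        return {a for a in domain if any((a, b) in rel for b in ext)}
    if kind == "O":
        return {a for a in domain if all(b in ext for b in domain if (a, b) in rel)}
    if kind == "N":
        return {a for a in domain if any(b not in ext for b in domain if (a, b) in rel)}
    raise ValueError(kind)

def holds(A, phi):
    """Truth of a formula (quantifier, l, c), with quantifier 'all' or 'some'."""
    quantifier, lit, c = phi
    left, right = lit_ext(A, lit), mol_ext(A, c)
    return left <= right if quantifier == "all" else bool(left & right)

def flip(x):
    """Toggle the polarity of a unary or binary literal."""
    return (x[0], not x[1])

def opposite_molecule(c):
    kind = c[0]
    if kind == "lit":
        return ("lit", flip(c[1]))
    if kind in ("forall", "exists"):
        other = "exists" if kind == "forall" else "forall"
        return (other, c[1], flip(c[2]))
    return ("N" if kind == "O" else "O", c[1], c[2])

def opposite(phi):
    quantifier, lit, c = phi
    return ("some" if quantifier == "all" else "all", lit, opposite_molecule(c))

p, q, r = ("p", True), ("q", True), ("r", True)
phi = ("all", p, ("O", q, r))   # "Every p r-s only q-s"
print(opposite(phi))            # ('some', ('p', True), ('N', ('q', True), ('r', True)))
```

In any structure, exactly one of a formula and its opposite holds under this evaluator, so unsatisfiability of a set of formulas together with an opposite corresponds to entailment.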


6.3 Lower Complexity Bounds

Lower complexity bounds for the systems P and P⁺ are straightforward to obtain. We proceed by directly encoding runs of (space-bounded) Turing machines using formulas of these languages. For P, we consider deterministic Turing machines, and for P⁺, alternating Turing machines. In order to define the advertised encodings, we must first outline the presentation of Turing machines used.

Fix some alphabet $\Sigma$ with $|\Sigma| \geq 2$, and write $\Sigma' = \Sigma \cup \{\triangleright, \sqcup\}$, where $\triangleright$ and $\sqcup$ are symbols not in $\Sigma$ which we read as left-margin and blank, respectively. We take a deterministic Turing machine $M$ to comprise a single tape with squares numbered 0, 1, 2, …, over which a read-write head can move, a set of states $S$, initial and final states $s_0$ and $s^*$, and a transition table comprising a finite set of tuples $\langle s, d, t, e, \delta \rangle$, where $s, t \in S$, $d, e \in \Sigma'$ and $\delta \in \{\text{left}, \text{stay}, \text{right}\}$. In the initial configuration, $M$ is in state $s_0$, the head is over tape-square 0 containing the symbol $\triangleright$, tape-squares 1 to $|x|$ contain the input string $x$ (assumed to be a word over $\Sigma$), and all other squares contain the symbol $\sqcup$. We imagine $M$ to transition from one configuration to another according to the tuples in the transition table: the tuple $\langle s, d, t, e, \delta \rangle$ is interpreted as stating that if $M$ is currently in state $s$ and the head is over a tape-square containing the symbol $d$, then in the next configuration, the symbol under the head is replaced by $e$, $M$ transitions to state $t$, and the head moves by up to one square as specified by $\delta$. We assume that the transition table contains no tuple $\langle s^*, \ldots \rangle$, and, for all $s \in S \setminus \{s^*\}$ and all $d \in \Sigma'$, contains exactly one tuple $\langle s, d, \ldots \rangle$. We further assume that $M$ never overwrites the symbol $\triangleright$ with any other symbol, and never writes that symbol over any other.

The run of $M$ with input $x$ is the (finite or infinite) sequence of configurations generated by the transition table from the initial configuration in the obvious way. The run is accepting if a configuration is reached in which the state is $s^*$ and the head is positioned over square 0, otherwise rejecting. Note that the run may be finite (i.e. have final configuration with state $s^*$) and still be rejecting, namely, when the head is not over the leftmost square. For any function $f : \mathbb{N} \rightarrow \mathbb{N}$, $M$ is $f$-time-bounded if, for all strings $x$ over $\Sigma$, the run of $M$ on input $x$ has length at most $f(|x|)$, where $|x|$ denotes the length of $x$. Similarly, and for the purposes of the present article, we may say that $M$ is $f$-space-bounded if, for all strings $x$ over $\Sigma$, the index of any tape square visited in any configuration in the run of $M$ is at most $f(|x|)$. The latter definition is actually slightly non-standard: space-bounds normally do not count the space required for input and output; however, for the functions $f$ we are concerned with here, this detail may be ignored. The class pspace is thus the class of languages accepted by some $f$-space-bounded deterministic Turing machine, where $f$ is a polynomial; and the class exptime is the set of languages accepted by some $2^f$-time-bounded deterministic Turing machine, where $f$ is a polynomial.

Theorem 6.1 The satisfiability problem for P is pspace-hard.

Proof We show that, for any polynomial $f$, and any $f$-space-bounded deterministic Turing machine $M$ over some alphabet $\Sigma$, we may map any input string $x$ over $\Sigma$ to a


set of P-formulas $\Phi_x$ such that $x$ is accepted by $M$ if and only if $\Phi_x$ is unsatisfiable; moreover, this mapping is computable using only logarithmic space (as a function of $|x|$). Since the complement of any language in pspace is (obviously) also in pspace, this proves the theorem.

Fix some $f$-space-bounded deterministic Turing machine $M$ over $\Sigma$, then. We show how to compute, for a given string $x$, the set of P-formulas $\Phi_x$. Let $n = |x|$. We may assume without loss of generality that the symbols $o$, $p_{i,d}$ and $q_{i,d,s}$ are unary atoms, and the symbols $r_{i,d,s}$, $s_{i,d,s}$ and $\ell_{i,d,s}$ are binary atoms, for all $i$ ($0 \leq i \leq f(n)$), all $d \in \Sigma'$ and all $s \in S$. It helps to think of these unary and binary atoms as being interpreted over a domain whose elements are the various configurations encountered during the run of $M$ on input $x$ as follows:

$o$: the property of being the initial configuration;
$p_{i,d}$: the property of being a configuration in which the $i$th tape square contains $d$, but the head does not lie over that square;
$q_{i,d,s}$: the property of being a configuration in which the $i$th tape square contains $d$, the head lies over that square and the program state is $s$;
$r_{i,d,s}$: the relation holding between two configurations such that, in the first, the head lies over the $i$th tape square, in the second, the program state is $s$, and $M$ transitions from the first configuration to the second, writing symbol $d$ on the $i$th square and moving the head right;
$s_{i,d,s}$: like $r_{i,d,s}$ but the head remains stationary;
$\ell_{i,d,s}$: like $r_{i,d,s}$ but the head moves left.

We are now ready to define $\Phi_x$ so as to encode the run of $M$ on $x$; this we do by adding formulas in stages. The initial configuration with input string $x$ is encoded by adding to $\Phi_x$ the set of formulas

\[
\{\exists(o, q_{0,\triangleright,s_0})\} \cup \{\forall(o, p_{i,x[i]}) : 1 \leq i \leq |x|\} \cup \{\forall(o, p_{i,\sqcup}) : |x| + 1 \leq i \leq f(n)\},
\]

where $x[i]$ indicates the $i$th symbol of $x$ (counting from 1). Recall that $s_0$ is the initial state, $\triangleright$ the symbol on the leftmost (index 0) tape square and $\sqcup$ the blank symbol.

Whenever the transition-table of $M$ contains a tuple $\langle s, d, t, e, \text{right} \rangle$, we add to $\Phi_x$ the set of formulas

\[
\{\forall(q_{i,d,s}, N(o, r_{i,e,t})) \mid 0 \leq i < f(n)\},
\]

stating that $M$ performs this transition when triggered to do so. Tuples of the forms $\langle s, d, t, e, \text{stay} \rangle$ or $\langle s, d, t, e, \text{left} \rangle$ are handled similarly, but with $r_{i,e,t}$ replaced by $s_{i,e,t}$ or $\ell_{i,e,t}$, respectively. (We remark in passing that the unary atom $o$ here functions essentially as a dummy: when a transition is made, this is necessarily to a non-initial configuration, not satisfying $o$; however, none of the formulas in $\Phi_x$ makes any use of this fact.)

We also add to $\Phi_x$ sets of formulas encoding the effects of the transitions indicated by $r_{i,e,t}$, $s_{i,e,t}$ and $\ell_{i,e,t}$. For $r_{i,e,t}$, these effects are that tape square $i$ will subsequently contain $e$, that the head will then be over square $i + 1$, that the contents of tape


squares other than $i$ will be unaffected:

\[
\begin{aligned}
&\{\forall(q_{i,d,s}, O(p_{i,e}, r_{i,e,t})) \mid 0 \leq i \leq f(n),\ d, e \in \Sigma',\ s, t \in S\} \\
&\{\forall(p_{i+1,d}, O(q_{i+1,d,t}, r_{i,e,t})) \mid 0 \leq i < f(n),\ d, e \in \Sigma',\ s, t \in S\} \\
&\{\forall(p_{j,d}, O(p_{j,d}, r_{i,e,t})) \mid 0 \leq i < f(n),\ 0 \leq j \leq f(n),\ j \neq i,\ j \neq i + 1,\ d, e \in \Sigma',\ s, t \in S\}.
\end{aligned}
\]

The first of these sets of formulas states that, if the head of $M$ is over square $i$, and $M$ makes a transition involving writing the symbol $e$, moving the head right and transitioning to state $t$, then, in the resulting configuration, square $i$ will contain $e$, and the head will not lie over it; the second states that, under those circumstances, the head will be over the $(i + 1)$th square, reading whichever symbol $d$ was there before, and with the machine in state $t$; the final set of formulas states that all other squares will contain whatever they contained before, and the head will not be over them. Similar formulas, with minor adjustments, apply to $s_{i,e,t}$ and $\ell_{i,e,t}$.

Finally, we add to $\Phi_x$ the formula

\[
\forall(q_{0,\triangleright,s^*}, \bar{q}_{0,\triangleright,s^*})
\]

stating that there are no configurations with the property $q_{0,\triangleright,s^*}$: in other words, we never reach a configuration in which $M$ terminates in an accepting configuration.

If the run of $M$ on input $x$ is non-accepting, then, by interpreting the unary and binary atoms in $\Phi_x$ as indicated above over the configurations in that run, we immediately obtain a model of $\Phi_x$. Conversely, if $\Phi_x$ has a model, then that model must contain a sequence of elements corresponding (in the obvious sense) to the run of $M$ on $x$; and that run cannot be accepting. Thus, the run of $M$ on input $x$ is accepting if and only if $\Phi_x$ is unsatisfiable. It is straightforward to check (remembering $M$ is fixed) that the computation of $\Phi_x$ uses $O(\log n)$ memory.

Thus, the satisfiability problem for P (pspace-hard) is essentially harder than that for R (in nlogspace). We next turn our attention to the language P⁺.
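Returning briefly to the construction in the proof of Theorem 6.1, the following sketch generates part of the formula set for a toy machine, to make the stages concrete and to show that the output has size polynomial in the space bound. It is an illustration under stated assumptions: formulas are rendered as plain strings, `>` and `_` are ASCII stand-ins for the left-margin and blank symbols, the parameter `bound` plays the role of f(n), index ranges follow the text, and only right-moving tuples are handled (the stay and left cases are analogous and omitted). The function name `phi_x` and the toy transition table are hypothetical.

```python
LEFT_MARGIN, BLANK = ">", "_"  # ASCII stand-ins for the left-margin and blank symbols

def p(i, d):
    return f"p[{i},{d}]"

def q(i, d, s):
    return f"q[{i},{d},{s}]"

def r(i, e, t):
    return f"r[{i},{e},{t}]"

def phi_x(x, sigma, states, s0, s_star, table, bound):
    """Initial configuration, right-moving tuples with their effects, and the final formula."""
    syms = list(sigma) + [LEFT_MARGIN, BLANK]

    # Stage 1: the initial configuration.
    phi = {f"∃(o, {q(0, LEFT_MARGIN, s0)})"}
    phi |= {f"∀(o, {p(i, x[i - 1])})" for i in range(1, len(x) + 1)}
    phi |= {f"∀(o, {p(i, BLANK)})" for i in range(len(x) + 1, bound + 1)}

    # Stage 2: each right-moving tuple forces a successor configuration.
    for (s, d, t, e, move) in table:
        if move == "right":
            phi |= {f"∀({q(i, d, s)}, N(o, {r(i, e, t)}))" for i in range(bound)}

    # Stage 3: effects of the relations r[i,e,t] on the successor configuration.
    for e in syms:
        for t in states:
            for d in syms:
                for i in range(bound + 1):   # square i now holds e, head elsewhere
                    phi |= {f"∀({q(i, d, s)}, O({p(i, e)}, {r(i, e, t)}))" for s in states}
                for i in range(bound):
                    # head arrives on square i + 1 in state t
                    phi.add(f"∀({p(i + 1, d)}, O({q(i + 1, d, t)}, {r(i, e, t)}))")
                    # all other squares are unchanged
                    phi |= {f"∀({p(j, d)}, O({p(j, d)}, {r(i, e, t)}))"
                            for j in range(bound + 1) if j not in (i, i + 1)}

    # Stage 4: no configuration is in the final state with the head on square 0.
    phi.add(f"∀({q(0, LEFT_MARGIN, s_star)}, non-{q(0, LEFT_MARGIN, s_star)})")
    return phi

# A toy machine that scans right over its input and halts on the first blank.
table = [("s0", ">", "s0", ">", "right"), ("s0", "a", "s0", "a", "right"),
         ("s0", "b", "s0", "b", "right"), ("s0", "_", "halt", "_", "stay")]
phi = phi_x("ab", "ab", ["s0", "halt"], "s0", "halt", table, bound=3)
print(len(phi))
```

The number of generated formulas grows quadratically in `bound` for a fixed machine, the dominant term coming from the formulas that keep the untouched squares unchanged.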
To make the proof more perspicuous, we begin with a simple observation on the power of this language to express disjunction. Fix some signature $\sigma$ of unary and binary atoms, and let $o$, $p$, $q$ be unary atoms of $\sigma$. Now let $x$, $y$ be unary atoms not in $\sigma$, and $r$ a binary atom not in $\sigma$. Denote by $\Phi_{o \rightarrow p \vee q}$ the set of three P⁺-formulas

\[
\{\forall(o, N(x, r)),\ \forall(\bar{p}, O(\bar{y}, r)),\ \forall(\bar{q}, O(y, r))\}.
\]

Notice that the latter two formulas here are not in the language P. Evidently, if $\mathfrak{A} \models \Phi_{o \rightarrow p \vee q}$, then, for all $a \in o^{\mathfrak{A}}$, either $a \in p^{\mathfrak{A}}$ or $a \in q^{\mathfrak{A}}$. Indeed, if $a$ is in the set $o^{\mathfrak{A}}$, but in neither of the sets $p^{\mathfrak{A}}$ or $q^{\mathfrak{A}}$, then the trio of formulas $\Phi_{o \rightarrow p \vee q}$ requires $a$ to be related by $r$ to (i) something, (ii) non-$y$’s only and (iii) $y$’s only, which is clearly an impossibility. Conversely, any structure $\mathfrak{A}$ interpreting $\sigma$ and satisfying the property that $o^{\mathfrak{A}} \subseteq p^{\mathfrak{A}} \cup q^{\mathfrak{A}}$ can be expanded to a structure $\mathfrak{A}^+$ such that $\mathfrak{A}^+ \models \Phi_{o \rightarrow p \vee q}$. Indeed, let $x^{\mathfrak{A}^+} = A \setminus o^{\mathfrak{A}}$, $y^{\mathfrak{A}^+} = o^{\mathfrak{A}} \cap p^{\mathfrak{A}}$ and $r^{\mathfrak{A}^+} = \{\langle a, a \rangle \mid a \in o^{\mathfrak{A}}\}$. Thus, from the point of view of satisfiability, we may regard $\Phi_{o \rightarrow p \vee q}$ as simply stating the condition that $o$ entails the disjunction of $p$ and $q$. For ease of readability, we henceforth write $\forall(o, p \vee q)$ as an abbreviation for $\Phi_{o \rightarrow p \vee q}$, proceeding as if this were a P⁺-formula with the obvious interpretation. Of course, the atoms $x$, $y$ and $r$ must be chosen afresh for each triple $o$, $p$, $q$ of unary atoms for which this abbreviation is used.

With this observation behind us, we can return to the lower complexity bound for P⁺, again beginning with the presentation of Turing machines used in the proof. We take an alternating Turing machine $M$ to be just like a deterministic Turing machine, as described above, except that the set of non-halting states $S \setminus \{s^*\}$ of $M$ is partitioned into universal states, $S_\forall$, and existential states, $S_\exists$. Further, we assume that, for any $s \in S \setminus \{s^*\}$ and any $d \in \Sigma'$, the transition table contains either one or two transitions $\langle s, d, t, e, \delta \rangle$, and that, in addition, if $s \in S_\forall$, then $t \in S_\exists \cup \{s^*\}$, and if $s \in S_\exists$, then $t \in S_\forall \cup \{s^*\}$. Thus, in runs of an alternating Turing machine, the configurations alternate between existential and universal states, with binary branching possible at each non-halting state. The run of $M$ on input $x$ takes the form of a tree of configurations, where the root is the initial configuration (as with deterministic Turing machines) and the daughter(s) of any configuration in the tree are obtained by applying the transitions of $M$ in the expected way. The set of accepting configurations in this tree is the smallest set satisfying the following conditions: (i) a halting configuration (i.e. a configuration at a leaf of the tree, in which the program state is $s^*$) is accepting just in case the head is over the tape square 0 (as with deterministic Turing machines); (ii) an existential configuration is accepting just in case some available transition leads to an accepting configuration; and (iii) a universal configuration is accepting just in case all of its available transitions lead to accepting configurations. A string is accepted by an alternating Turing machine just in case the initial configuration, at the root of the tree, is accepting. The notion $f$-space-bounded is defined exactly as for deterministic Turing machines. The class apspace thus is the set of languages accepted by some $f$-space-bounded alternating Turing machine, where $f$ is a polynomial. It is a standard result that apspace = exptime (see, e.g. [4] p. 48 ff.).

Theorem 6.2 The satisfiability problem for P⁺ is exptime-hard.

Proof We show that, for any polynomial $f$, and any $f$-space-bounded alternating Turing machine $M$ over some alphabet $\Sigma$, we may map any input string $x$ over $\Sigma$ to a set of P⁺-formulas $\Psi_x$ such that $x$ is accepted by $M$ if and only if $\Psi_x$ is unsatisfiable; moreover, this mapping is computable using only logarithmic space (as a function of $|x|$). Since apspace = exptime, and since the complement of any language in exptime is (obviously) also in exptime, this proves the theorem.

Fix some $f$-space-bounded alternating Turing machine $M$ over $\Sigma$, then. We show how to compute, for a given string $x$, the set of P⁺-formulas $\Psi_x$. The proof proceeds as for Theorem 6.1: for any alternating Turing machine $M$ with space bound $f$, we


show how to encode runs with a given input x using the formulas Ψx . If s, d, t, e, δ and s, d, t  , e , δ   are distinct tuples in the transition table of M available for any configuration with program state s and read-symbol d, we arbitrarily designate one of these as the negative-transition, and the other as the positive-transition. If M provides only one transition for such configurations, we make an identical copy of that transition and proceed as before. Since M may without loss of generality be assumed to halt on all inputs, the run of M on any input X is a finite tree in which all non-leaf vertices (which represent nonhalting configurations) have exactly two daughters. If M does not accept x, then the initial configuration—i.e., the root of the tree—is non-accepting. Indeed, for every vertex representing a non-accepting existential configuration, both its daughters represent non-accepting (universal) configurations, while, for every vertex representing a non-accepting universal configuration, at least one of its two daughters represents a non-accepting (existential) configuration. In this case then, we have a natural subtree of non-accepting configurations. This sub-tree must contain the root (since M by assumption does not accept x); each existential configuration in this sub-tree has two non-accepting daughters, and each universal configuration, at least one (reached by either the positive or the negative transition available at that point); finally, each leaf of this sub-tree must be non-accepting. It is obvious that the existence of such a tree of non-accepting configurations is necessary and sufficient for M not to accept x. The formulas Ψx are intended to be interpreted over a sub-tree of non-accepting configurations in the run of M on x. We employ unary atoms o, pi,d , qi,d,s (0 ≤ i ≤ f (|x|), d ∈ Σ  , s ∈ S), and binary atoms ri,d,s , si,d,s , i,d,s (0 ≤ i ≤ f (|x|), d, e ∈ Σ  , s, t ∈ S), interpreted as in the proof of Theorem 6.1. 
In addition, for all $i$, $d$ ($0 \le i \le f(|x|)$, $d \in \Sigma'$), and all $s \in S_\forall$, we employ a pair of unary atoms $q^-_{i,d,s}$ and $q^+_{i,d,s}$, interpreted as follows:

$q^-_{i,d,s}$: the property of being a configuration in which the $i$th tape square contains $d$, the head lies over that square and the program state is $s$, and the negative-transition of $M$ leads to a non-accepting state;

$q^+_{i,d,s}$: the property of being a configuration in which the $i$th tape square contains $d$, the head lies over that square and the program state is $s$, and the positive-transition of $M$ leads to a non-accepting state.

We are now ready to define $\Psi_x$ so as to encode the non-accepting sub-tree in the run of $M$ on $x$; this we do by adding formulas in stages. The initial configuration with input string $x$ is encoded by adding to $\Psi_x$ the set of formulas

$\{\exists(o, q_{0,\sqcup,s_0})\} \cup \{\forall(o, p_{i,x[i]}) : 1 \le i \le |x|\} \cup \{\forall(o, p_{i,\sqcup}) : |x| + 1 \le i \le f(n)\}.$

The existence of (non-accepting) successor configurations is encoded differently for existential and universal states. If $s \in S_\exists$, any vertex in the tree of non-accepting configurations corresponding to program state $s$ has both daughters present, since both the negative and positive transitions available in this configuration lead to non-accepting configurations. For any such $s$, then, whenever the transition-table of $M$ contains any tuple $\langle s, d, t, e, \mathit{right}\rangle$, we add to $\Psi_x$ the set of formulas

$\{\forall(q_{i,d,s}, N(o, r_{i,e,t})) \mid 0 \le i \le f(n)\},$

stating that $M$ performs this transition when triggered to do so. A similar set of formulas is added for tuples of the form $\langle s, d, t, e, \mathit{left}\rangle$ and $\langle s, d, t, e, \mathit{stay}\rangle$, with the obvious adjustments. If $s \in S_\forall$, any vertex in the tree of non-accepting configurations corresponding to program state $s$ has at least one daughter present, since either the negative or positive transition available in this configuration leads to non-accepting configurations. For any such $s$, then, we add to $\Psi_x$ the set of formulas

$\{\forall(q_{i,d,s}, q^-_{i,d,s} \lor q^+_{i,d,s}) \mid 0 \le i \le f(|x|),\ d \in \Sigma',\ s \in S_\forall\}.$

Note that these formulas employ the notation $\forall(p, p \lor q)$ explained above. In addition, whenever the transition-table of $M$ contains a tuple $\langle s, d, t, e, \mathit{right}\rangle$ designated as a negative transition, we add to $\Psi_x$ the set of formulas

$\{\forall(q^-_{i,d,s}, N(o, r_{i,e,t})) \mid 0 \le i < f(n)\},$

and whenever the transition-table of $M$ contains a tuple $\langle s, d, t, e, \mathit{right}\rangle$ designated as a positive transition, we add to $\Psi_x$ the set of formulas

$\{\forall(q^+_{i,d,s}, N(o, r_{i,e,t})) \mid 0 \le i < f(n)\}.$

These formulas guarantee the existence of the required transitions in the sub-tree of non-accepting configurations. Again, similar sets of formulas are added for tuples of the form $\langle s, d, t, e, \mathit{left}\rangle$ and $\langle s, d, t, e, \mathit{stay}\rangle$, with the obvious adjustments. The formulas encoding the effects of transitions are as before. That is, for the binary atom $r_{i,e,t}$ (representing a transition in which the head moves right), we add to $\Psi_x$ the sets of formulas

$\{\forall(q_{i,d,s}, O(p_{i,e}, r_{i,e,t})) \mid 0 \le i \le f(n),\ d, e \in \Sigma',\ s, t \in S\}$

$\{\forall(p_{i+1,d}, O(q_{i+1,d,t}, r_{i,e,t})) \mid 0 \le i < f(n),\ d, e \in \Sigma',\ s, t \in S\}$

$\{\forall(p_{j,d}, O(p_{j,d}, r_{i,e,t})) \mid 0 \le i < f(n),\ 0 \le j \le f(n),\ j \ne i,\ j \ne i + 1,\ d, e \in \Sigma',\ s, t \in S\},$

and similarly, with minor adjustments, for $s_{i,e,t}$ and $\ell_{i,e,t}$. Finally, we again add to $\Psi_x$ the formula $\forall(q_{0,d,s^*}, \bar q_{0,d,s^*})$, stating that there are no configurations in the sub-tree of non-accepting configurations with the property $q_{0,d,s^*}$, since these are by definition accepting configurations.


It is again immediate from the suggested interpretations of the unary and binary atoms in $\Psi_x$ that $M$ accepts input $x$ if and only if $\Psi_x$ is unsatisfiable. And it is again straightforward to check that, for a fixed $M$, the set of $\mathcal{P}^+$-formulas $\Psi_x$ can be computed from any string $x \in \Sigma^*$ using memory $O(\log |x|)$.

6.4 Upper Complexity Bounds

In Sect. 6.3, we obtained lower complexity bounds for the systems $\mathcal{P}$ and $\mathcal{P}^+$. In this section, we obtain matching upper bounds for the respective systems $\mathcal{Q}$ and $\mathcal{Q}^+$. The former requires rather more work than the latter. We describe algorithms using pseudo-code, calculating their time- and space-requirements in the standard way, and taking the possibility of implementation on Turing machines as obvious. We allow algorithms to make non-deterministic choices, it being understood that a set of strings $L$ is accepted by a non-deterministic algorithm $\mathcal{A}$ if, for every string $x$, $x \in L$ just in case $\mathcal{A}$ has some run on input $x$ reporting acceptance. Savitch's theorem states that any language recognized by a non-deterministic algorithm running with space bound $f(n) \ge \log n$ is also recognized by a deterministic algorithm with space bound $f(n)^2$ (see, e.g. [4] p. 15 ff.).

The following notions will prove useful in the sequel. Let $\mathcal{L}$ be one of the languages $\mathcal{S}$, $\mathcal{P}$, $\mathcal{Q}$, $\mathcal{R}$, $\mathcal{S}^+$, $\mathcal{P}^+$, $\mathcal{Q}^+$ or $\mathcal{R}^+$. Call a set $\Phi$ of $\mathcal{L}$-formulas complete over a signature $\sigma$ just in case, for any $\mathcal{L}$-formula $\varphi$ featuring only predicates in $\sigma$, either $\varphi$ or $\bar\varphi$ is in $\Phi$. Call $\Phi$ complete if it is complete over $\sigma_\Phi$. A completion of $\Phi$ is a complete set of formulas $\Psi \supseteq \Phi$. It is obvious that a set of formulas $\Phi$ is satisfiable if and only if it has a satisfiable completion.

Let $\Phi$ be a set of $\mathcal{Q}$-formulas. We define the binary relation $\Rightarrow_\Phi$ over the set of $\mathcal{Q}$-molecules to be the smallest reflexive, transitive relation including

$\{\langle p, c\rangle \mid \forall(p, c) \in \Phi\} \cup \{\langle \bar c, \bar p\rangle \mid \forall(p, c) \in \Phi\}.$

If $s$ is a set of $\mathcal{Q}$-molecules, we define the closure of $s$ under $\Phi$, denoted $s^\Phi$, to be the set $\{d \mid c \Rightarrow_\Phi d \text{ for some } c \in s\}$. Obviously, $c \Rightarrow_\Phi d$ implies $\bar d \Rightarrow_\Phi \bar c$, and $\cdot^\Phi$ is a closure operator in the usual sense. We say that $s$ is closed under $\Phi$ if $s = s^\Phi$. Observe also that, if $\mathfrak{A} \models \Phi$ with $a \in A$, then $a \in c^{\mathfrak{A}}$ and $c \Rightarrow_\Phi d$ imply $a \in d^{\mathfrak{A}}$.

Theorem 6.3 The satisfiability problem for $\mathcal{Q}$ is in pspace.
Proof We present a polynomial space algorithm $\mathcal{A}$ which, given a set of $\mathcal{Q}$-formulas $\Phi$, will report success if and only if $\Phi$ is satisfiable. We may assume that $\Phi$ contains at least one existential formula, since, otherwise, it is satisfied by any structure in which the extension of every unary atom is empty. We may also assume without loss of generality that $\Phi$ is complete. For if not, we may simply generate the completions of $\Phi$ one by one, applying our algorithm to each completion in turn, and either recovering the memory used if the generated completion is unsatisfiable or terminating successfully if it is satisfiable. Moreover, by Savitch's theorem, it suffices to describe a non-deterministic algorithm $\mathcal{A}'$, running in polynomial space, such that, given any complete set of $\mathcal{Q}$-formulas $\Phi$, $\mathcal{A}'$ has a successful run just in case $\Phi$ is unsatisfiable. This we now proceed to do.

All $\mathcal{Q}$-formulas and -molecules in the remainder of this proof will be assumed to be over the signature of $\Phi$. We construct a set $A$ whose elements are pairs $\langle s, \iota\rangle$, where $s$ is a set of unary $\mathcal{Q}$-molecules and $\iota \in \{-1, 0, 1\}$. We define a notion of coherence for such sets, and show that $A$ is coherent if and only if $\Phi$ is satisfiable. Finally, we present a non-deterministic algorithm running in polynomial space which, given $\Phi$ as input, has a successful run just in case $A$ is not coherent. This proves the theorem.

Our first task is to define the set $A$. The definition employs three functions allowing us to construct sets of $\mathcal{Q}$-molecules from others. Let $p$ be a unary atom, $r$ a binary atom, and $s$ any set of $\mathcal{Q}$-molecules. We define

$w(p, r, s) = (\{p\} \cup \{q \mid O(q, r) \in s\} \cup \{\bar q \mid \forall(q, \bar r) \in s\})^\Phi$

$w(p, \bar r, s) = (\{p\} \cup \{\bar q \mid \forall(q, r) \in s\})^\Phi$

$w(\bar p, r, s) = (\{\bar p\} \cup \{q \mid O(q, r) \in s\} \cup \{\bar q \mid \forall(q, \bar r) \in s\})^\Phi.$

We refer to sets of $\mathcal{Q}$-molecules of the forms $w(p, r, s)$, $w(p, \bar r, s)$ and $w(\bar p, r, s)$ as minimal witness sets. To understand the motivation for minimal witness sets, suppose $\mathfrak{B} \models \Phi$ and $b \in B$, and let $s$ be the set of $\mathcal{Q}$-molecules satisfied by $b$ in $\mathfrak{B}$. If $s$ contains a $\mathcal{Q}$-molecule of the form $c = \exists(p, r)$, then there exists some $b' \in B$ such that $b'$ satisfies all the $\mathcal{Q}$-molecules of $w(p, r, s)$ in $\mathfrak{B}$. If $c$ is instead the $\mathcal{Q}$-molecule $\exists(p, \bar r)$ or $N(p, r)$, then the same holds, but with $w(p, r, s)$ replaced by $w(p, \bar r, s)$ or $w(\bar p, r, s)$, respectively. With these auxiliary definitions behind us, we construct the set $A$, proceeding in levels.
Letting

$A_0 = \{\langle \{p, c\}^\Phi, 0\rangle \mid \exists(p, c) \in \Phi\}$

$A_{i+1} = \{\langle w(p, r, s), 1\rangle \mid \langle s, \iota\rangle \in A_i,\ \iota \in \{-1, 0, 1\},\ \exists(p, r) \in s\}$
$\qquad \cup\ \{\langle w(p, \bar r, s), -1\rangle \mid \langle s, \iota\rangle \in A_i,\ \iota \in \{-1, 0, 1\},\ \exists(p, \bar r) \in s\}$
$\qquad \cup\ \{\langle w(\bar p, r, s), 1\rangle \mid \langle s, \iota\rangle \in A_i,\ \iota \in \{-1, 0, 1\},\ N(p, r) \in s\}$

for all $i \ge 0$, we set $A = \bigcup_{i \ge 0} A_i$. Since $\Phi$ by assumption contains at least one existential formula, $A$ is non-empty. To motivate this construction, it helps to think of the elements of $A_0$ as witnessing the existential $\mathcal{Q}$-formulas in $\Phi$, and the elements $\langle s, \iota\rangle$ of $A_{i+1}$, where $\iota \in \{-1, 1\}$, as providing witnesses for the existential $\mathcal{Q}$-molecules satisfied by elements of $A_i$. The integer $\iota$ indicates whether $\langle s, \iota\rangle$ stands in the relation $r$ or $\bar r$ to the element for which it provides a witness. Observe that, for all $\langle s, \iota\rangle \in A$, $s$ is closed under $\Phi$.

We call $A$ coherent if, for all elements $a = \langle s, \iota\rangle$ and $b = \langle t, \kappa\rangle$ of $A$, all $\mathcal{Q}$-molecules $c$, all unary atoms $p$, $p'$ and all binary atoms $r$, the following hold:

(i) if $c \in s$ then $\bar c \notin s$;
(ii) if $\forall(p', r) \in s$, $\forall(p, \bar r) \in s$ and $p' \in t$, then $p \notin t$; and
(iii) if $\forall(p', r) \in s$, $O(p, r) \in s$ and $p' \in t$, then $p \in t$.
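The closure operator, the minimal witness sets, the level-wise construction of $A$ and the three coherence conditions can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the polynomial-space procedure of the proof: it builds all of $A$ explicitly (which may be exponentially large) and drops the component $\iota$, which coherence ignores. The tuple encoding of molecules, the function names, and the complementation table in `bar` (pairing $\exists(p, r)$ with $\forall(p, \bar r)$, $\exists(p, \bar r)$ with $\forall(p, r)$, and $N(p, r)$ with $O(p, r)$) are assumptions introduced here for illustration.

```python
# Molecules (hypothetical encoding): ("p", a) is the atom a, ("np", a) its negation;
# ("E", p, r) = ∃(p, r), ("Eb", p, r) = ∃(p, r̄), ("N", p, r) = N(p, r),
# ("A", p, r) = ∀(p, r), ("Ab", p, r) = ∀(p, r̄), ("O", p, r) = O(p, r).
# Formulas of Φ: ("all", p, c) for ∀(p, c) and ("some", p, c) for ∃(p, c).

FLIP = {"p": "np", "np": "p", "E": "Ab", "Ab": "E",
        "Eb": "A", "A": "Eb", "N": "O", "O": "N"}

def bar(c):
    """Complement of a molecule (assumed pairing, see lead-in)."""
    return (FLIP[c[0]],) + c[1:]

def closure(s, phi):
    """s^Φ: everything reachable from s along p ⇒ c and c̄ ⇒ p̄ for ∀(p, c) in Φ."""
    edges = {}
    for kind, p, c in phi:
        if kind == "all":
            edges.setdefault(("p", p), set()).add(c)
            edges.setdefault(bar(c), set()).add(("np", p))
    seen, todo = set(s), list(s)
    while todo:
        for d in edges.get(todo.pop(), ()):
            if d not in seen:
                seen.add(d)
                todo.append(d)
    return frozenset(seen)

def witnesses(s, phi):
    """Yield the minimal witness sets demanded by the existential molecules in s."""
    for c in s:
        if c[0] == "Eb":                      # w(p, r̄, s)
            _, p, r = c
            base = {("p", p)} | {("np", d[1]) for d in s if d[0] == "A" and d[2] == r}
        elif c[0] in ("E", "N"):              # w(p, r, s) or w(p̄, r, s)
            _, p, r = c
            head = ("np", p) if c[0] == "N" else ("p", p)
            base = ({head}
                    | {("p", d[1]) for d in s if d[0] == "O" and d[2] == r}
                    | {("np", d[1]) for d in s if d[0] == "Ab" and d[2] == r})
        else:
            continue
        yield closure(base, phi)

def build_A(phi):
    """Union of the levels A_0, A_1, ... (without the ι component)."""
    A = {closure({("p", p), c}, phi) for kind, p, c in phi if kind == "some"}
    todo = list(A)
    while todo:
        for t in witnesses(todo.pop(), phi):
            if t not in A:
                A.add(t)
                todo.append(t)
    return A

def coherent(A):
    """Check conditions (i)-(iii) of coherence for every pair s, t in A."""
    for s in A:
        if any(bar(c) in s for c in s):                       # condition (i)
            return False
        for t in A:
            for c in s:
                if c[0] != "A" or ("p", c[1]) not in t:
                    continue
                r = c[2]
                for d in s:
                    if d[0] == "Ab" and d[2] == r and ("p", d[1]) in t:
                        return False                          # condition (ii)
                    if d[0] == "O" and d[2] == r and ("p", d[1]) not in t:
                        return False                          # condition (iii)
    return True
```

For instance, with the hypothetical atoms `a`, `b` and relation `r`, the single formula $\exists(a, \exists(b, r))$ yields a coherent set, while adding $\forall(a, \forall(b, \bar r))$ violates condition (i).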




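Note that the notion of coherence ignores the component $\iota$ in the elements $\langle s, \iota\rangle$ of $A$.

Lemma 6.1 If $A$ is coherent, then $\Phi$ is satisfiable.

Proof We define a structure $\mathfrak{A}$ with domain $A$ by setting, for every unary atom $p$ and binary atom $r$ in the signature of $\Phi$:

$p^{\mathfrak{A}} = \{\langle s, \iota\rangle \mid p \in s\}$

$r^{\mathfrak{A}} = \{\langle\langle s, \iota\rangle, \langle w(p, r, s), 1\rangle\rangle \mid \exists(p, r) \in s\}$
$\qquad \cup\ \{\langle\langle s, \iota\rangle, \langle w(\bar p, r, s), 1\rangle\rangle \mid N(p, r) \in s\}$
$\qquad \cup\ \{\langle\langle s, \iota\rangle, \langle t, \kappa\rangle\rangle \mid \forall(p, r) \in s,\ p \in t\}.$

To show that $\mathfrak{A} \models \Phi$, it suffices to prove that, for every $\mathcal{Q}$-molecule $c$ in the signature of $\Phi$, and every $a = \langle s, \iota\rangle \in A$, $c \in s$ implies $a \in c^{\mathfrak{A}}$. For suppose $\exists(p, c) \in \Phi$. Then $a = \langle \{p, c\}^\Phi, 0\rangle \in A_0 \subseteq A$, whence $a \in p^{\mathfrak{A}} \cap c^{\mathfrak{A}}$, and $\mathfrak{A} \models \exists(p, c)$. On the other hand, suppose $\forall(p, c) \in \Phi$ and $a = \langle s, \iota\rangle \in p^{\mathfrak{A}}$. By construction of $\mathfrak{A}$, $p \in s$, and, since $s$ is closed under $\Phi$, $c \in s$, whence $a \in c^{\mathfrak{A}}$, thus establishing $\mathfrak{A} \models \forall(p, c)$. We proceed by cases.

If $c = p$ is a unary atom, then, by construction of $\mathfrak{A}$, $c \in s$ implies $a \in c^{\mathfrak{A}}$.

If $c = \bar p$ is a negated unary atom, then, by coherence of $A$ (Clause (i)), $c \in s$ implies $p \notin s$, whence $a \in c^{\mathfrak{A}}$ by construction of $\mathfrak{A}$.

If $c = \exists(p, r)$, then, by definition of $A$, $b = \langle w(p, r, s), 1\rangle \in A$ with $p \in w(p, r, s)$, and, by construction of $\mathfrak{A}$, $b \in p^{\mathfrak{A}}$ and $\langle a, b\rangle \in r^{\mathfrak{A}}$, whence $a \in c^{\mathfrak{A}}$.

If $c = N(p, r)$, then, by definition of $A$, $b = \langle w(\bar p, r, s), 1\rangle \in A$ with $\bar p \in w(\bar p, r, s)$. By the second case above ($c = \bar p$), we have $b \in \bar p^{\mathfrak{A}}$, and by the construction of $\mathfrak{A}$, we have $\langle a, b\rangle \in r^{\mathfrak{A}}$, whence $a \in c^{\mathfrak{A}}$.

If $c = \exists(p, \bar r)$, then, by definition of $A$, $b = \langle w(p, \bar r, s), -1\rangle \in A$, with $p \in w(p, \bar r, s)$. By construction of $\mathfrak{A}$, therefore, $b \in p^{\mathfrak{A}}$. We must show that $\langle a, b\rangle \notin r^{\mathfrak{A}}$. Firstly, since $b$ is not of the form $\langle w(p', r, s), 1\rangle$ or $\langle w(\bar p', r, s), 1\rangle$, it follows from the construction of $\mathfrak{A}$ that $\langle a, b\rangle$ can be in $r^{\mathfrak{A}}$ only if $\forall(p', r) \in s$ for some unary atom $p' \in w(p, \bar r, s)$. But in that case, the definition of $w(p, \bar r, s)$ ensures that we also have $\bar p' \in w(p, \bar r, s)$, contradicting the coherence of $A$ (Clause (i)). Therefore, $\langle a, b\rangle \notin r^{\mathfrak{A}}$, whence $a \in c^{\mathfrak{A}}$.

If $c = \forall(p, r)$, and $b = \langle t, \kappa\rangle \in p^{\mathfrak{A}}$, then, by construction of $\mathfrak{A}$, $p \in t$ and hence $\langle a, b\rangle \in r^{\mathfrak{A}}$, whence $a \in c^{\mathfrak{A}}$.

If $c = \forall(p, \bar r)$, we suppose that $b = \langle t, \kappa\rangle \in A$ is such that $\langle a, b\rangle \in r^{\mathfrak{A}}$, and show that $p \notin t$. There are three cases to consider, depending on why $\langle a, b\rangle \in r^{\mathfrak{A}}$.

1. $\forall(p', r) \in s$ and $p' \in t$ for some unary atom $p'$. By coherence of $A$ (Clause (ii)), $p \notin t$.
2. $\exists(p', r) \in s$ and $t = w(p', r, s)$ for some unary atom $p'$. But $\bar p \in t$ by definition of $w(p', r, s)$, and so, by the coherence of $A$ (Clause (i)), $p \notin t$.
3. $N(p', r) \in s$ and $t = w(\bar p', r, s)$ for some unary atom $p'$. Again, $\bar p \in t$ by definition of $w(\bar p', r, s)$, and so, by the coherence of $A$ (Clause (i)), $p \notin t$.

Thus, $\langle a, b\rangle \in r^{\mathfrak{A}}$ implies $b \notin p^{\mathfrak{A}}$, whence $a \in c^{\mathfrak{A}}$.

If $c = O(p, r)$, we suppose that $b = \langle t, \kappa\rangle \in A$ is such that $\langle a, b\rangle \in r^{\mathfrak{A}}$, and show that $p \in t$. There are again three cases to consider, depending on why $\langle a, b\rangle \in r^{\mathfrak{A}}$.

1. $\forall(p', r) \in s$ and $p' \in t$ for some unary atom $p'$. By coherence of $A$ (Clause (iii)), $p \in t$.
2. $\exists(p', r) \in s$ and $t = w(p', r, s)$ for some unary atom $p'$. By definition of $w(p', r, s)$, $p \in t$.
3. $N(p', r) \in s$ and $t = w(\bar p', r, s)$ for some unary atom $p'$. By definition of $w(\bar p', r, s)$, $p \in t$.

Thus, $\langle a, b\rangle \in r^{\mathfrak{A}}$ implies $b \in p^{\mathfrak{A}}$, whence $a \in c^{\mathfrak{A}}$. This completes the proof of Lemma 6.1.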



Lemma 6.2 If $\Phi$ is satisfiable, then $A$ is coherent.

Proof Suppose $\mathfrak{B} \models \Phi$. We first define a function $\mu : A \to B$ and show that, for all $a = \langle s, \iota\rangle \in A$, and all $\mathcal{Q}$-molecules $c$ in the signature of $\Phi$,

$c \in s \ \rightarrow\ \mu(a) \in c^{\mathfrak{B}}. \qquad (6.9)$

The definition of $\mu$ proceeds in stages, as follows.

1. Suppose $a = \langle s, \iota\rangle \in A_0$. Then $a = \langle \{p, d\}^\Phi, 0\rangle$ for some $p$, $d$ such that $\exists(p, d) \in \Phi$. Since $\mathfrak{B} \models \Phi$, pick $b \in p^{\mathfrak{B}} \cap d^{\mathfrak{B}}$ and set $\mu(a) = b$. If $c \in s$ then either $p \Rightarrow_\Phi c$ or $d \Rightarrow_\Phi c$. This guarantees (6.9).
2. Suppose $a = \langle s, \iota\rangle \in A_{i+1} \setminus A_i$, where $i \ge 0$, and that $\mu$ has been defined on $A_i$, satisfying (6.9). Then there exist $a' = \langle s', \kappa\rangle \in A_i$, a unary atom $p$ and a binary atom $r$, such that one of the following cases holds: (i) $\exists(p, r) \in s'$, $s = w(p, r, s')$ and $\iota = 1$; (ii) $\exists(p, \bar r) \in s'$, $s = w(p, \bar r, s')$ and $\iota = -1$; (iii) $N(p, r) \in s'$, $s = w(\bar p, r, s')$ and $\iota = 1$. Consider the case (i). Let $b = \mu(a')$, so that $b \in \exists(p, r)^{\mathfrak{B}}$. Thus, we may select $b' \in B$ such that $b' \in p^{\mathfrak{B}}$ and $\langle b, b'\rangle \in r^{\mathfrak{B}}$. Furthermore, if, for any unary atom $q$, $O(q, r) \in s'$, then, again by (6.9) (applied to $a'$), $b \in O(q, r)^{\mathfrak{B}}$, whence $b' \in q^{\mathfrak{B}}$. Likewise, for any $\forall(q, \bar r) \in s'$, $b' \in \bar q^{\mathfrak{B}}$. We set $\mu(a) = b'$, and show that (6.9) holds. From the definition of $w(p, r, s')$, if $c \in s$, then either $p \Rightarrow_\Phi c$, or $q \Rightarrow_\Phi c$ for some $q$ such that $O(q, r) \in s'$, or $\bar q \Rightarrow_\Phi c$ for some $q$ such that $\forall(q, \bar r) \in s'$. In each case, therefore, $b' \in c^{\mathfrak{B}}$, as required. Cases (ii) and (iii) are handled similarly.

Having defined $\mu$, we claim that $A$ must be coherent. We consider the three conditions in the definition of coherence in turn, for elements $a = \langle s, \iota\rangle$ and $b = \langle t, \kappa\rangle$ of $A$.

(i) This follows immediately from (6.9).
(ii) Suppose for contradiction that $\forall(p, r), \forall(p', \bar r) \in s$ and $p, p' \in t$. By four applications of (6.9), we have $\mu(a) \in \forall(p, r)^{\mathfrak{B}} \cap \forall(p', \bar r)^{\mathfrak{B}}$ and $\mu(b) \in p^{\mathfrak{B}} \cap (p')^{\mathfrak{B}}$, which is impossible.
(iii) Suppose that $\forall(p, r), O(p', r) \in s$ and $p \in t$. By two applications of (6.9), we have $\mu(a) \in \forall(p, r)^{\mathfrak{B}} \cap O(p', r)^{\mathfrak{B}}$, whence $\mathfrak{B} \models \forall(p, p')$, and hence, by the assumed completeness of $\Phi$, $\forall(p, p') \in \Phi$. But then $p' \in t$, from the fact that $t$ is closed under $\Phi$.

This completes the proof of Lemma 6.2.



Lemmas 6.1 and 6.2 show that to test the unsatisfiability of $\Phi$, it suffices to test the coherence of $A$. We present a non-deterministic algorithm that has a successful run (returns Y) if and only if $A$ is not coherent. The algorithm makes use of two auxiliary procedures, seek(c,d) and seek(c/d), also non-deterministic, whose arguments $c$ and $d$ are $\mathcal{Q}$-molecules. The former has a successful run if and only if there exists $a = \langle s, \iota\rangle \in A$ such that $c \in s$ and $d \in s$; the latter has a successful run if and only if there exists $a = \langle s, \iota\rangle \in A$ such that $c \in s$ but $d \notin s$. As one might expect, these algorithms ignore the component $\iota$ in the elements $\langle s, \iota\rangle$ of $A$.

The procedure seek(c,d) works by selecting, non-deterministically, some existential formula $\exists(p, e) \in \Phi$ (by assumption, there is at least one), and computing the set $s = \{p, e\}^\Phi$. If $s$ contains both $c$ and $d$, the procedure returns with success. Otherwise, if $s$ contains any existential $\mathcal{Q}$-molecule $e$, i.e. of any of the forms $\exists(p, r)$, $\exists(p, \bar r)$ or $N(p, r)$, then one such $e$ is selected, and $s$ is replaced by either $w(p, r, s)$, $w(p, \bar r, s)$ or $w(\bar p, r, s)$, respectively, according to the form of $e$. Once the new value of $s$ has been computed, the old one can be discarded, and the space re-used. The process continues, either terminating with success (if $s$ ever contains both $c$ and $d$), or with failure, if the number of iterations exceeds the number of $\mathcal{Q}$-molecules in the signature of $\Phi$. It is obvious that seek(c,d) requires only polynomial space (as a function of $|\Phi|$), and has the advertised properties. The procedure seek(c/d) is defined similarly.
Using these auxiliary procedures, we can now present our non-deterministic procedure for testing the coherence of $A$:

    begin inconsis
      pick i ∈ {1, 2, 3}
      if i = 1
        pick a Q-molecule c in vocabulary of Φ
        if seek(c, c̄) return Y
      if i = 2
        pick unary atoms p, p′ and binary atom r from vocabulary of Φ
        if seek(∀(p, r), ∀(p′, r̄)) and seek(p, p′) return Y
      if i = 3
        pick unary atoms p, p′ and binary atom r from vocabulary of Φ
        if seek(∀(p, r), O(p′, r)) and seek(p/p′) return Y
      return N
    end inconsis


The three non-deterministic choices for $i$ correspond to the three conditions in the definition of coherence of $A$. Indeed, it is immediate that the procedure inconsis has a successful run if and only if $A$ is not coherent. Moreover, since seek(c,d) and seek(c/d) run in polynomial space, so does inconsis. This completes the proof of the theorem.

We now turn to the satisfiability problem for the logic $\mathcal{Q}^+$, where the upper complexity bound can be obtained quite easily using resolution theorem-proving. Let $\Phi$ be any set of $\mathcal{Q}^+$-formulas, regarded as formulas of first-order logic in the normal way. Let these be converted into a set of clauses $\Gamma$, again in the normal way. Thus, for example, the formula $\exists(p, \forall(q, \bar r))$, which corresponds to the first-order formula $\exists x(p(x) \land \forall y(q(y) \to \neg r(x, y)))$, translates to the pair of clauses

$p(a) \qquad \neg q(y) \lor \neg r(a, y),$

where $a$ is a Skolem constant. Or again, the formula $\forall(p, N(q, r))$, which corresponds to the first-order formula $\forall x(p(x) \to \exists y(r(x, y) \land \neg q(y)))$, translates to the pair of clauses

$\neg p(x) \lor r(x, f(x)) \qquad \neg p(x) \lor \neg q(f(x)),$

where $f$ is a Skolem function. We observe that, if $\Gamma$ arises as just described, then no clause of $\Gamma$ contains more than one literal featuring a binary predicate. This fact will be crucial in the ensuing argument.

Recall that, according to the completeness theorem for resolution theorem-proving, $\Phi$ is unsatisfiable if and only if there exists a resolution proof of the empty clause $\bot$ from $\Gamma$. Let $\prec$ be any partial ordering on the set of atoms of the signature of $\Gamma$. We call $\prec$ an A-ordering if it is well-founded and preserved under substitutions (of terms for variables): i.e. $\alpha \prec \beta \Rightarrow \theta(\alpha) \prec \theta(\beta)$, where $\alpha$ and $\beta$ are atoms, and $\theta$ is any function replacing each variable with some term in the relevant vocabulary. We extend $\prec$ to literals by simply ignoring the negation sign. The rule of $\prec$-ordered resolution is the same as the standard resolution rule, but subject to the restriction that the resolved-on literals must be maximal in their respective clauses. Similarly, the rule of $\prec$-ordered factoring is the same as the standard factoring rule, but subject to the restriction that the unified literals must be maximal in their clause. It is well-known that, as long as $\prec$ is an A-ordering, the completeness of resolution theorem-proving is preserved even when we limit ourselves to $\prec$-ordered resolution and $\prec$-ordered factoring. (For details, see Leitsch [5, p. 218 ff.])

We now apply the technique of ordered resolution to eliminate any binary predicates from $\Gamma$. Recall that, crucially, if $\Gamma$ is the result of converting a set $\Phi$ of $\mathcal{Q}^+$-formulas into clause form, no clause in $\Gamma$ contains more than one literal featuring a binary predicate. Let $C'$ and $C''$ be two clauses which resolve to form a clause $C$. We call $C$ a non-unary resolvent of $C'$ and $C''$ if the eliminated literal of $C'$ (and hence also of $C''$) in this resolution is non-unary. If $\Gamma$ is a set of clauses, the


non-unary derived set of $\Gamma$ is the set of all clauses $C$ which are non-unary resolvents of some pair of clauses in $\Gamma$.

Lemma 6.3 Let $\Gamma$ be a set of clauses each of which has at most one non-unary literal. Let $\Gamma_1$ be the set of clauses in $\Gamma$ having only unary literals; and let $\Gamma_2$ be the set of clauses in $\Gamma$ having exactly one non-unary literal. Now let $\Gamma_2'$ be the non-unary derived set of $\Gamma_2$; and let $\Gamma' = \Gamma_1 \cup \Gamma_2'$. Then $\Gamma$ has a model if and only if $\Gamma'$ has.

Proof The only-if-direction is immediate, since $\Gamma$ entails $\Gamma'$. For the if-direction, suppose $\Gamma$ has no model. Define the partial order $\prec^*$ on the set of atoms by: $A \prec^* A'$ if $A$ is unary and $A'$ is non-unary. Trivially, $\prec^*$ is well-founded and preserved under substitutions, and thus is an A-ordering. By the completeness theorem for A-ordered resolution, there is a derivation $D$ of $\bot$ from $\Gamma$ using $\prec^*$-ordered resolution and factoring. (Think of $D$ as a tree of inference steps with leaves in $\Gamma$ and root $\bot$.) Since $\prec^*$ ranks non-unary literals above unary literals, any resolutions in $D$ which eliminate non-unary literals lie at the leaves of $D$. Removing these leaves will leave us with a derivation of $\bot$ from clauses in $\Gamma'$, whence $\Gamma'$ has no model.

There are, of course, A-orderings other than $\prec^*$. One very useful specimen is the depth-ordering $\prec_d$, which we now proceed to define. If $x$ is a variable and $\alpha$ is either a term or an atomic formula, define $d(x, \alpha)$ to be 0 if either $x$ does not occur in $\alpha$ or $x = \alpha$, and to be $1 + \max_i d(x, \beta_i)$ if $\alpha = a(\beta_1, \ldots, \beta_m)$. Likewise, define $d(\alpha)$ to be 0 if $\alpha$ is a variable or individual constant, and to be $1 + \max_i d(\beta_i)$ if $\alpha = a(\beta_1, \ldots, \beta_m)$. Now define the ordering $\prec_d$ on atoms by $A \prec_d A'$ if

$d(A) < d(A')$, $\mathrm{Vars}(A) \subseteq \mathrm{Vars}(A')$ and $d(x, A) < d(x, A')$ for all $x \in \mathrm{Vars}(A)$,

where $\mathrm{Vars}(E)$ denotes the set of variables occurring in expression $E$. It is well-known that $\prec_d$ is an A-ordering [5, pp. 218 ff.]. We can now prove the desired theorem.

Theorem 6.4 The satisfiability problem for $\mathcal{Q}^+$ is in exptime.

Proof Let a set $\Phi$ of $\mathcal{Q}^+$-formulas be given. We may transform $\Phi$ in polynomial time into a set of clauses $\Gamma$ in the standard way. Applying Lemma 6.3, we construct, in polynomial time, a clause set $\Gamma'$ involving only unary literals, such that $\Gamma'$ has a model if and only if $\Gamma$ has. Clearly, $|\Gamma'| \le |\Gamma|^2$. Since all function-symbols in $\Gamma$ are Skolem-functions, the depth of every clause in $\Gamma$ is at most 2 (i.e. no function symbols appear inside any others); moreover, it is simple to check that this is also true of every clause in $\Gamma'$. Since the signature of $\Gamma$ involves no function-symbols of arity greater than 1, any clause $C \in \Gamma'$ containing distinct variables $x$ and $y$ may be written as a disjunction $C_0 \lor C_1 \lor C_2$, where $C_0$ is ground, $C_1$ has no ground literals and involves only the


variable $x$, and $C_2$ has no ground literals and involves only the variable $y$. Thus, for any structure $\mathfrak{A}$, $\mathfrak{A} \models C$ if and only if $\mathfrak{A} \models C_0$ or $\mathfrak{A} \models C_1$ or $\mathfrak{A} \models C_2$. Now let $\Delta$ be the result of replacing any such $C \in \Gamma'$ by one of the corresponding clauses $C_0$, $C_1$ or $C_2$. Thus, no clause in $\Delta$ involves more than one variable, and hence we may without loss of generality suppose this variable to be $x$. Since $|\Gamma'| \le |\Gamma|^2$, there are at most exponentially many possibilities for $\Delta$, and it is obvious that $\Gamma'$ has a model if and only if some such set $\Delta$ has. That is, we can construct, in time bounded by an exponential function of $|\Gamma'|$, a set of sets of clauses $K$ such that $\Gamma'$ has a model if and only if some clause set in $K$ has, and such that, for every $\Delta \in K$ and every $C \in \Delta$, $d(C) \le 2$, and one of the following conditions holds:

N1 $C$ is ground;
N2 for every literal $L$ of $C$, $\mathrm{Vars}(L) = \mathrm{Vars}(C) = \{x\}$ for some variable $x$.

Let us call a clause satisfying either N1 or N2 normal. It is routine to check that $\prec_d$-ordered resolution and factoring preserve the property of normality as defined above, and do not increase the depth of normal clauses. Hence, repeated application of $\prec_d$-ordered resolution and factoring reaches saturation after at most exponentially many steps. We can then simply check whether the empty clause has been generated in this process.
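The two depth measures and the ordering $\prec_d$ used in this proof are simple recursive definitions, and a small Python sketch may make them concrete. The term encoding is an assumption made here for illustration: a variable is a Python string, and a compound term or atom $a(\beta_1, \ldots, \beta_m)$ is a tuple whose first entry is the symbol, so that an individual constant is a one-element tuple. The function names are likewise hypothetical.

```python
def is_var(t):
    """Variables are encoded as strings; everything else is a tuple."""
    return isinstance(t, str)

def variables(t):
    """Vars(E): the set of variables occurring in a term or atom."""
    if is_var(t):
        return {t}
    return set().union(*(variables(b) for b in t[1:]))

def depth(t):
    """d(α): 0 for variables and constants, else 1 + max depth of the arguments."""
    if is_var(t) or len(t) == 1:
        return 0
    return 1 + max(depth(b) for b in t[1:])

def var_depth(x, t):
    """d(x, α): 0 if x does not occur in α or x = α, else 1 + max over the arguments."""
    if is_var(t) or x not in variables(t):
        return 0
    return 1 + max(var_depth(x, b) for b in t[1:])

def precedes(a, b):
    """A ≺_d A′ as defined in the text."""
    return (depth(a) < depth(b)
            and variables(a) <= variables(b)
            and all(var_depth(x, a) < var_depth(x, b) for x in variables(a)))
```

With this encoding, `precedes(("p", "x"), ("p", ("f", "x")))` holds, since $p(x)$ has depth 1 and $p(f(x))$ has depth 2, whereas $q(y)$ and $p(f(x))$ are incomparable because their variables differ.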

6.5 Proof-Theoretic Consequences

By a substitution language, we mean any set of formulas characterized as the set of substitution instances of some finite set of formula-schemas. Thus, all of the languages considered above, namely $\mathcal{S}$, $\mathcal{P}$, $\mathcal{Q}$, $\mathcal{R}$, $\mathcal{S}^+$, $\mathcal{P}^+$, $\mathcal{Q}^+$ and $\mathcal{R}^+$, are substitution languages. If $\mathcal{L}$ is a substitution language, we take a syllogism-like proof rule for $\mathcal{L}$ to be a sequence $\varphi_1, \ldots, \varphi_n, \psi$ of $\mathcal{L}$-formulas. We call $\varphi_1, \ldots, \varphi_n$ the antecedents and $\psi$ the consequent of the rule. (We allow the possibility $n = 0$.) For example, the familiar syllogisms Barbara and Darii are syllogism-like proof rules for $\mathcal{S}$:

    Every o is a p    Every p is a q
    --------------------------------
             Every o is a q

    Every p is a q    Some o is a p
    -------------------------------
             Some o is a q.

Given any set of syllogism-like proof rules, proofs may be constructed as trees in the expected way: premises lie at the leaves, the conclusion is the root, and each non-leaf vertex together with its daughters forms a substitution-instance of one of the proof rules. In addition, we may optionally allow proofs to feature the rule of reductio ad absurdum, according to which any derivation of an absurdity $\exists(p, \bar p)$ from a set of premises $\Phi \cup \{\psi\}$ yields a derivation of $\bar\psi$ from $\Phi$. A proof-system is said to be refutation-complete if, from any unsatisfiable set of formulas $\Phi$, it is possible to derive an absurdity.
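To make the notion of derivation by syllogism-like rules concrete, the following Python sketch saturates a set of premises under just the two rules Barbara and Darii, without reductio ad absurdum. It is an illustration only: the tuple encoding of formulas and the function name `saturate` are assumptions introduced here, and these two rules on their own are not claimed to form a complete proof system for $\mathcal{S}$.

```python
def saturate(premises):
    """Close a set of formulas under Barbara and Darii.

    Formulas are encoded (hypothetically) as ("every", o, p) for "Every o is a p"
    and ("some", o, p) for "Some o is a p".
    """
    derived = set(premises)
    while True:
        new = set()
        for kind, o, p in derived:
            for kind2, p2, q in derived:
                if kind2 == "every" and p2 == p:
                    # kind == "every": Barbara (Every o is a p, Every p is a q).
                    # kind == "some":  Darii   (Every p is a q, Some o is a p).
                    new.add((kind, o, q))
        if new <= derived:          # nothing new: saturation reached
            return derived
        derived |= new
```

Because only atoms from the premises ever appear, the loop terminates after polynomially many rounds; for example, from "Every Greek is a man", "Every man is mortal" and "Some philosopher is a Greek" it derives "Some philosopher is mortal".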


Lemma 6.4 Let $\mathcal{L}$ be a substitution language and $X$ a fixed, finite set of syllogism-like proof rules in $\mathcal{L}$. The problem of determining whether there is a derivation of a conclusion $\theta$ from premises $\Theta$ using the rules $X$ (but without reductio ad absurdum) is in ptime. Hence, if the rules $X$ (without reductio ad absurdum) yield a sound and refutation-complete proof system, then the satisfiability problem for $\mathcal{L}$ is in ptime.

Proof Let $\sigma = \sigma_{\Theta \cup \{\theta\}} \cup \{r\}$, where $r$ is a fresh binary atom. We first observe that, if there is a derivation of $\theta$ from $\Theta$ using the rules $X$, then there is such a derivation involving only the atoms occurring in $\sigma$. For, given any derivation of $\theta$ from $\Theta$, uniformly replace any unary atom that does not occur in $\Theta \cup \{\theta\}$ with one that does. Similarly, uniformly replace any binary atom which does not occur in $\Theta \cup \{\theta\}$ with one which does (or with $r$ in case $\Theta \cup \{\theta\}$ contains no binary atoms). This process obviously leaves us with a derivation of $\theta$ from $\Theta$, using the rules $X$. To prove the lemma, let the total number of symbols occurring in $\Theta \cup \{\theta\}$ be $n$. Certainly, $|\sigma| \le n$. Let $X$ comprise $k_1$ proof-rules, each of which contains at most $k_2$ atoms (unary or binary). The number of rule instances involving only atoms in $\sigma$ is bounded by $p(n) = k_1 n^{k_2}$. Hence, we need never consider derivations with 'depth' greater than $p(n)$. Let $\Theta_i$ be the set of formulas involving only the atoms in $\sigma$, and derivable from $\Theta$ using a derivation of depth $i$ or less ($0 \le i \le p(n)$). Evidently, $|\Theta_i| \le |\Theta| + p(n)$. It is then straightforward to compute the successive $\Theta_i$ in total time bounded by a polynomial function of $n$.

Lemma 6.4 concerns only proof-systems without reductio ad absurdum. We observe, however, that reductio ad absurdum has no effect when applied to complete sets of formulas.

Lemma 6.5 Let $\mathcal{L}$ be a substitution language and $X$ a set of syllogism-like proof rules in $\mathcal{L}$. Let $\Phi$ be a complete set of $\mathcal{L}$-formulas over some signature $\sigma$. If there is a derivation of an absurdity from $\Phi$ using the rules $X$ together with reductio ad absurdum, then there is a derivation of an absurdity from $\Phi$ using the rules $X$, but without reductio ad absurdum.

Proof Suppose that there is a derivation of some absurdity $\bot$ from $\Phi$, using the rules $X$ together with reductio ad absurdum (RAA). Let the number of applications of (RAA) employed in this derivation be $k$; and assume without loss of generality that $\bot$ is chosen so that this number $k$ is minimal. If $k > 0$, consider the last application of (RAA) in this derivation, which derives a formula, say, $\bar\psi$, discharging a premise $\psi$. Then there is an (indirect) derivation of some absurdity $\bot'$ from $\Phi \cup \{\psi\}$, employing fewer than $k$ applications of (RAA). By minimality of $k$, $\psi \notin \Phi$, and so, by the completeness of $\Phi$, $\bar\psi \in \Phi$. But then we can replace our original derivation of $\bar\psi$ with the trivial derivation, so obtaining a derivation of $\bot$ from $\Phi$ with fewer than $k$ applications of (RAA), a contradiction. Therefore, $k = 0$, or, in other words, there is a derivation of an absurdity from $\Phi$ without using reductio ad absurdum.

Lemma 6.6 Let $\mathcal{L}$ be a substitution language and $X$ a finite set of syllogism-like proof rules in $\mathcal{L}$ that, together with reductio ad absurdum, is sound and complete. Then the satisfiability problem for $\mathcal{L}$ is in nptime.


Proof Let a finite set of $\mathcal{L}$-formulas $\Phi$ be given. Clearly, $\Phi$ is satisfiable if and only if it is included in a complete set of satisfiable formulas, so guess such a set $\Theta$ (over the signature of $\Phi$). By the supposed soundness and completeness of $X$, $\Theta$ is satisfiable if and only if there is no derivation of an absurdity from $\Theta$ using the rules of $X$ (with reductio ad absurdum). By Lemma 6.5, this is equivalent to the condition that there is no derivation of an absurdity without reductio ad absurdum from $\Theta$. And by Lemma 6.4, this can be checked in polynomial time.

Theorem 6.5 If ptime ≠ pspace, there is no finite set of syllogism-like rules that (without reductio ad absurdum) is sound and refutation-complete for $\mathcal{P}$ or any of its extensions. If nptime ≠ pspace, there is no finite set of syllogism-like rules that is sound and complete for $\mathcal{P}$ or any of its extensions (even with reductio ad absurdum). There is no finite set of syllogism-like rules that (without reductio ad absurdum) is sound and refutation-complete for $\mathcal{P}^+$ or any of its extensions. If nptime ≠ exptime, there is no finite set of syllogism-like rules that is sound and complete for $\mathcal{P}^+$ or any of its extensions (even with reductio ad absurdum).

Proof Suppose $\mathcal{L}$ is a substitution language for which there is a finite set of syllogistic rules whose associated proof system, without reductio ad absurdum, is sound and complete. By Lemma 6.4, the satisfiability problem for $\mathcal{L}$ is in ptime. Now, if $\mathcal{L} \supseteq \mathcal{P}$, then the satisfiability problem for $\mathcal{L}$ is pspace-hard by Theorem 6.1, contradicting the assumption that ptime ≠ pspace. Further, if $\mathcal{L} \supseteq \mathcal{P}^+$, then the satisfiability problem for $\mathcal{L}$ is exptime-hard by Theorem 6.2, contradicting the fact that ptime ≠ exptime. Suppose, on the other hand, that there is a finite set of syllogistic rules that, together with the rule of reductio ad absurdum, is sound and complete for $\mathcal{L}$. By Lemma 6.6, the satisfiability problem for $\mathcal{L}$ is in nptime. If $\mathcal{L} \supseteq \mathcal{P}$, then the satisfiability problem for $\mathcal{L}$ is pspace-hard by Theorem 6.1, contradicting the assumption that nptime ≠ pspace. If $\mathcal{L} \supseteq \mathcal{P}^+$, then the satisfiability problem for $\mathcal{L}$ is exptime-hard by Theorem 6.2, contradicting the assumption that nptime ≠ exptime.

It was shown in [6], without recourse to any complexity-class separation assumptions, that there is no finite set of syllogism-like rules without reductio ad absurdum yielding a sound and complete proof system for $\mathcal{R}$. However, there is such a set of rules that is sound and complete in the presence of reductio ad absurdum. Thus: for the language $\mathcal{R}$, reductio ad absurdum is indispensable. It is further shown there, again without recourse to any complexity-class separation assumptions, that there is no finite set of syllogism-like rules that is sound and complete for $\mathcal{R}^+$, even in the presence of reductio ad absurdum. It is not known whether the complexity-theoretic assumptions in Theorem 6.5 are similarly dispensable, though this seems plausible.

Acknowledgements The author would like to thank Lawrence S. Moss for his comments on an early draft of this paper.


References

1. Andréka, H., Németi, I., & van Benthem, J. (1998). Modal languages and bounded fragments of predicate logic. Journal of Philosophical Logic, 27, 217–274.
2. Corcoran, J. (1972). Completeness of an ancient logic. Journal of Symbolic Logic, 37(4), 696–702.
3. Grädel, E. (1999). On the restraining power of guards. Journal of Symbolic Logic, 64(4), 1719–1742.
4. Kozen, D. (2006). Theory of computation. Texts in computer science. London: Springer.
5. Leitsch, A. (1997). The resolution calculus. Texts in theoretical computer science. Berlin: Springer.
6. Pratt-Hartmann, I., & Moss, L. (2009). Logics for the relational syllogistic. Review of Symbolic Logic, 2(4), 647–683.
7. Reichenbach, H. (1952). The syllogism revised. Philosophy of Science, 19(1), 1–16.
8. Smiley, T. (1973). What is a syllogism? Journal of Philosophical Logic, 2, 135–154.

Ian Pratt-Hartmann studied Mathematics and Philosophy at Brasenose College, Oxford, and Philosophy at Princeton University, from where he gained his Ph.D. in 1987. He is currently Senior Lecturer in the Department of Computer Science at the University of Manchester, and since 2014 has held a joint appointment as Professor of Computer Science at the University of Opole. Dr. Pratt-Hartmann’s research interests range widely over the field of AI and cognitive science, including computational logic, spatial logic and natural language semantics.

Chapter 7

The Significance of Relativistic Computation for the Philosophy of Mathematics

Krzysztof Wójtowicz

Abstract In the paper I discuss the importance of relativistic hypercomputation for the philosophy of mathematics, in particular for our understanding of mathematical knowledge. I also discuss the problem of the explanatory role of mathematics in physics and argue that relativistic computation fits very well into the so-called programming account. Relativistic computation reveals an interesting interplay between the empirical realm and the realm of very abstract mathematical principles that even exceed standard mathematics, and suggests that such principles might play an explanatory role. I also argue that relativistic computation does not have some of the weaknesses of other hypercomputational models and is thus particularly attractive for the philosophy of mathematics.

Keywords Relativistic computation · Hypercomputation · Mathematical explanation · Mathematical realism · Indispensability argument · Set theory · Explanatory proof · Abstract explanations · Program explanations

K. Wójtowicz (B)
Faculty of Philosophy, Warsaw University, Warsaw, Poland
e-mail: [email protected]

© Springer Nature Switzerland AG 2021 J. Madarász and G. Székely (eds.), Hajnal Andréka and István Németi on Unity of Science, Outstanding Contributions to Logic 19, https://doi.org/10.1007/978-3-030-64187-0_7

Relativistic computation is one of many contributions of Hajnal Andréka and István Németi to investigations into the relationships between physics, mathematics and logic. The model of a relativistic computer was presented by Németi in [72] and first appeared in print (to my knowledge) in Pitowsky [76] and Hogarth [46].¹ Apart from the intrinsic interest of the model, it allows traditional philosophical questions concerning the nature of mathematical knowledge and the mutual inspiration between mathematics and physics to be viewed in a new light.

Relativistic computation can be seen as an extension of standard computation: it has some of its features but differs greatly in many respects. A computation performed by a computer is a physical operation based upon our knowledge of physical theories (and advanced technology). It can be used to obtain new mathematical knowledge, as is manifested by computer-assisted proofs.² We might also think of an even more “entangled” relationship between mathematical and physical knowledge in relation to hypothetical quantum computers. If efficient quantum computers existed, they could solve at least some mathematical tasks in a more efficient way than standard computers, but we would have no access to the quantum proof whatsoever, as any measurement performed during quantum computation would destroy it.³

Both ordinary computers and hypothetical quantum computers operate within the classical model of computation; they remain within the “Turing realm”. Their functioning is fully compatible with the physical Church–Turing thesis, according to which all physical processes in nature have an algorithmic character. However, this thesis is disputed: there is heated discussion concerning the possibility of nonalgorithmic processes in nature, and many non-standard models of computation have been defined.⁴ Nevertheless, the importance of hypercomputation has not yet been done full justice, as there are only a few papers discussing its possible impact on philosophy and the foundations of mathematics. Notable exceptions are the papers of Andréka and Németi (and other members of the Hungarian group), for instance Etesi and Németi [32], Németi and Dávid [71], Andréka et al. [2].

Here I examine the model of the Relativistic Turing Machine (henceforth, I will use the acronym ‘RTM’), which I consider to be one of the most interesting and philosophically fruitful hypercomputational models. In the paper I will focus on its importance for the philosophy of mathematics, leaving the physical aspect aside. The structure of the paper is as follows:

1. A short refresher on the RTM model
2. RTM and the status of mathematical knowledge
3. ‘RTM proofs’ and the problem of mathematical explanation
4. The theoretical virtues of the RTM model
5. Concluding remarks.

¹ These ideas were also presented by D. Malament. For a brief history, see Andréka et al. [2]. For a general discussion, see for instance also Earman and Norton [29], Etesi and Németi [32], Hogarth [47, 48], Shagrir and Pitowsky [87].

² The best-known and most discussed example is the four-color theorem, proved in 1976 [7, 8]. A more contemporary example is Kepler’s conjecture concerning optimal packing of spheres in space, which was proved by Hales [39, 40]. There is an intense discussion concerning the status of computer proofs, in particular concerning the status of “empirical input”, which is present because we have to rely on strong and sophisticated physical theories (like quantum mechanics) in order to believe that our computer does what it is supposed to do, i.e. that it is reliable. This shows that the problem of the “empirical ingredient” in mathematics cannot be simply dismissed.

³ In some cases, quantum computing provides a speed-up effect, the most spectacular example being Shor’s quantum factoring algorithm [88, 89]. So, if quantum computers existed, they could perhaps speed up at least some of the proofs. The importance of quantum computation for the philosophy of mathematics is discussed in Wójtowicz [106, 108]. For an introduction to quantum computing, see for example Nielsen and Chuang [70]; a popular survey on quantum algorithms is Montanaro [68]. An important paper also discussing the philosophical issues is Deutsch et al. [28].

⁴ Cf. e.g. the special issues of Minds and Machines (12, 2002; 13, 2003), Applied Mathematics and Computation (178, 2006), Theoretical Computer Science (317, 2004), Parallel Processing Letters (3, 2012). Syropoulos [97] is a survey of the hypercomputational models; a recent survey of analogue models of computation is given in Bournez and Pouly [14]; a general survey of physical models is Piccinini [75]; for a general discussion on the physical Church–Turing thesis see e.g. Cotogno [21] or Piccinini [74].


7.1 A Short Refresher on the RTM Model

An RTM is defined as a physical system (containing in particular a standard Turing machine) that operates in a region of spacetime (a Malament–Hogarth spacetime). It is defined by the following conditions:⁵

(1) A curve γ exists such that the journey along γ takes an infinite time;
(2) γ lies in the past of a certain point p (the Malament–Hogarth event);
(3) γ starts at a certain point q;
(4) It is possible to travel from q to p along another curve β within a finite time;
(5) Signals can be sent from the curve γ to the point p.
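Conditions (1)–(5) can be stated compactly in standard notation. The following is only a sketch under stated assumptions, not the formulation used in the papers cited here: it assumes a spacetime (M, g), writes τ for proper time and I⁻(p) for the chronological past of p, and parametrizes β on [0, 1]. None of these symbols appears in the text above. Reading condition (5) off the inclusion in I⁻(p) is an idealization: it only says that a causal curve runs from each point of γ to p.

```latex
% Sketch under stated assumptions: (M, g) is a spacetime, \tau is proper time,
% I^{-}(p) is the chronological past of p, and \beta is parametrized on [0, 1].
\begin{align*}
&\text{(1)} && \int_{\gamma} d\tau = \infty
   && \text{($\gamma$ takes infinite proper time)}\\
&\text{(2), (5)} && \gamma \subset I^{-}(p)
   && \text{(every point of $\gamma$ lies in the past of $p$)}\\
&\text{(3), (4)} && \gamma(0) = \beta(0) = q,\quad \beta(1) = p,\quad
   \int_{\beta} d\tau < \infty
   && \text{($\beta$ reaches $p$ in finite proper time)}
\end{align*}
```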

The intuitive picture is as follows: imagine two travelers, T1 and T2, who start at point q and travel towards p along two different routes. The clocks on their spaceships are measuring time. The journey of the first traveler T1 along γ takes an infinite time (according to T1’s clocks). Take T1 to be a Turing machine/computer that performs a computation during the journey and can send signals. The second traveler T2 gets from q to p within a finite time along the curve β. Importantly, the second traveler T2 can receive signals from the first traveler T1 (i.e. the computer).

A concrete physical implementation is presented in Németi and Dávid [71]: consider two computers H and L (H for high, L for low; i.e. H is T1 and L is T2). L is travelling towards a slowly rotating Kerr black hole; H travels along a different route, performs the computation and sends a signal when it terminates. We can think of computer H as travelling along the “infinite” curve γ and sending signals to point p, whereas computer L travels towards the black hole along the “finite” curve β. Both H and L have their clocks ticking, and as L approaches the inner event horizon of the black hole the ratio between the speed of their clocks increases. This means that the signal sent by H after 10¹⁰⁰ years (of H’s time!) will be received by L after (say) 4 minutes and 50 s; the signal sent by H after 10¹⁰⁰⁰ years will be received by L after 4 minutes and 55 s, etc. This means that if H ever sends a signal, L will receive this signal within a finite, fixed time, for instance 5 of L’s minutes, which is the moment when L crosses the event horizon. The crucial fact is that if no signal is received by L within this fixed time, it will mean that H has never sent a signal; however, as H was supposed to send the signal after the computation terminates, the absence of the signal means that H’s computation did not terminate. In this way, L will be able to know within 5 minutes that a certain computation will loop forever.
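The decision protocol just described can be mimicked in a few lines. This is a toy sketch, not a model of the physics: the map `arrival_on_L_clock`, the step `budget` standing in for H's infinite proper time, and all function names are hypothetical and chosen only for illustration. The one feature it is meant to show is that every finite halting time of H is compressed below a fixed deadline on L's clock, so that silence at the deadline carries information.

```python
from fractions import Fraction
from itertools import count

DEADLINE = Fraction(5)  # minutes on L's clock; the illustrative figure from the text


def arrival_on_L_clock(h_steps):
    # Hypothetical compression map (an assumption, not derived from Kerr geometry):
    # any finite step count of H lands strictly before DEADLINE on L's clock.
    return DEADLINE * (1 - Fraction(1, h_steps + 2))


def run_H(program, budget):
    # Stand-in for H: advance the program one step at a time and report the
    # step at which it halts. `budget` replaces the infinite proper time
    # available along gamma; a real RTM would have no such cut-off.
    steps = program()
    for n in range(1, budget + 1):
        try:
            next(steps)
        except StopIteration:
            return n
    return None


def verdict_of_L(program, budget=10_000):
    # L's reasoning: a signal before the deadline means "halts";
    # silence at the deadline means "does not halt".
    halted_at = run_H(program, budget)
    if halted_at is None:
        return ("no signal by the deadline: computation does not halt", DEADLINE)
    return ("signal received: computation halts", arrival_on_L_clock(halted_at))


def collatz_from_27():
    # A computation that halts after finitely many steps.
    n = 27
    while n != 1:
        n = 3 * n + 1 if n % 2 else n // 2
        yield


def loop_forever():
    # A computation that never halts.
    for _ in count():
        yield


print(verdict_of_L(collatz_from_27))
print(verdict_of_L(loop_forever, budget=1_000))
```

In the sketch, the verdict "does not halt" only means "did not halt within the budget"; the point of the RTM is precisely that the relativistic setting removes this limitation.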
This scenario might be considered sheer speculation; after all, we usually do not decide to enter a black hole and spend the rest of our lives there… The problem of the physical feasibility of such processes is examined for instance in Németi and Dávid [71], Andréka et al. [2]; it will not be discussed here.⁶ From the point of view of the technology available today, all these considerations have the status of thought experiments, but in Sect. 7.4 I will argue that they are much more reasonable and realistic than many thought experiments discussed in philosophical literature.

7.2 RTM and Mathematical Knowledge

The seminal example of exploiting relativistic computation (i.e. an RTM) to improve our mathematical knowledge was given in Etesi and Németi [32] (and many other papers): it performs a direct, physical check of the consistency of ZFC (i.e. Zermelo–Fraenkel set theory with the Axiom of Choice). The computer travelling along the “infinite” curve generates all formal proofs within ZFC, and it sends a signal if and only if one of them turns out to be a proof of “0 = 1”. We are approaching the black hole, so we have to wait only 5 min to learn the answer due to the special properties of spacetime in the neighborhood of a black hole. If the signal arrives, we learn that ZFC is inconsistent; otherwise, we are assured that ZFC is consistent. So, the RTM is able to provide an answer to one of the most fundamental questions in philosophy and the foundations of mathematics. This is possible because the RTM provides an increase in computational power that outperforms the Turing machine. Queries of the form “n ∈ A?” can be answered not only for recursive sets, but also for more complicated ones, as was shown in Etesi and Németi [32]. An elaboration and further results are given in Welch [103]: the class of “relativistic decidable sets” is proved to be Δ₂.⁷

Consider using the RTM to settle the truth of an open problem, like the twin prime conjecture (TPC). The computer checks all possible proofs, just as in the case of checking the consistency of ZFC, and after finding the proof (either for TPC or its negation) it sends a signal. It is not very probable that TPC turns out to be independent of ZFC, but in this case we can take another open problem as an example. For the sake of discussion I assume that TPC is provable within ZFC and we got the message ‘YES’ from our RTM.⁸ A natural question is whether the twin prime conjecture has now turned into genuine mathematical knowledge. The situation seems strange: if we trust General Relativity (and our technology), we also should trust our RTM and consider the twin prime conjecture to be no less plausible than any theorem proved with the use of a standard computer (like the four-color theorem or Kepler’s conjecture), in which advanced technology based on quantum mechanics is also used and in which empirical factors (in particular, trust in our physical theories) also play a role. So, if we believe the four-color theorem, why not just accept the “RTM-verified TPC” as a legitimate part of our system of beliefs? Perhaps it should become a new axiom? This possibility is discussed later.

7.3 “RTM Proofs” and the Problem of Mathematical Explanation

The primary role of a mathematical proof is to demonstrate that a proposition is indeed true, which is all that matters from the orthodox point of view. However, this is a vast simplification: a proof has more purposes than this, in particular an explanatory one. Mathematicians seek understanding: notions such as “grasping the idea”, “revealing the deep reasons” or even “explaining why the proof really works” are used by mathematicians when describing their activities, even if they are notoriously difficult to characterize.⁹ Semantic elements are ubiquitous in mathematics: mathematicians are interested in the “interplay of ideas” rather than in formal operations. In fact, no mathematician presented with a proof fully formalized within ZFC would ever be able to understand it.¹⁰ In the case of “RTM-demonstrated theorems”, we would have no hint as to why they are true and how their proofs work: all we would learn is that the proof exists, so the RTM would serve as a kind of “Yes/No” oracle.¹¹ No semantic aspects whatsoever would be revealed; we would not be able to grasp the idea of the proof, we would have no insight into its conceptual structure, and we would not be able to survey even a tiny fraction of it.¹² The only reason to believe the truth of the theorem would be the outcome of a physical experiment involving an RTM, but this would result in a feeling of insufficiency, or perhaps even frustration. However, it is clear that our epistemic situation has changed: after the RTM experiment, we are inclined to have a much stronger belief in the truth of TPC than before.

The situation of a hypothetical proposition α which is independent of ZFC but whose truth has been directly checked by the RTM is even more intricate than the case of the twin prime conjecture. Let us focus on number theory: there is a 9-variable polynomial p(x₁, …, x₉) such that the statement “there are integers n₁, …, n₉ such that p(n₁, …, n₉) = 0” is independent of ZFC, see Jones [50]. Obviously, we can find out whether it has an integer solution just by checking all 9-tuples (there are countably many of them); however, then exactly one of the two sentences “p(x₁, …, x₉) has an integer solution” and “p(x₁, …, x₉) does not have an integer solution” turns out to be true. This means that we have found a relatively simple number-theoretic sentence α with the following properties:

1. α is independent of ZFC.
2. The truth of α (within N) can be established by a direct “α RTM check”.

Now, after the RTM experiment, are we in possession of genuine new mathematical knowledge? ZFC is widely considered to be the standard formal idealization of mathematics, and I will follow this tradition. This means that no additional assumptions exceeding ZFC (like large cardinal axioms or forcing axioms) are made in ordinary mathematical contexts. In particular, our sentence α, which is independent of ZFC, does not follow from commonly accepted mathematical principles. So why should we accept it as a mathematical fact? Perhaps it is more appropriate to consider α to be an abstract formulation of an empirical claim about the behavior of some exotic physical systems in a mathematical disguise? I will discuss this issue later, but one thing seems certain: given the outcome of the experiment, it would be very strange to accept and believe ¬α, even if ZFC + ¬α is formally consistent.

The problem of accepting new principles is not new in mathematics and is particularly important in foundational theories (like set theory). The axiom of choice is an obvious example and today it is commonly accepted, but let us consider the example of the continuum hypothesis (CH). Its metamathematical status is clear: according to Gödel’s and Cohen’s results, CH is independent of ZFC, but there is a discussion going on about whether CH is a well-settled mathematical problem and whether a mathematical solution of the continuum problem is possible.¹³ Of course, such a solution would consist in finding plausible axioms in order to “fix” the value of the continuum. Attempts to find such axioms have been made also by Gödel, albeit without much success [37, 38]; see also Ellentuck [31], Solovay [94].¹⁴ Woodin’s program [104, 105] is a modern example of such an enterprise that has formidable technical “machinery” on the one hand and methodological (and philosophical) argumentation concerning the plausibility of the axiom candidates on the other. In many cases there are some intuitions concerning the truth (reliability) of a hypothetical new axiom, and if strong conceptual reasons were prevailing, it could be accepted.¹⁵ However, the outcome of the RTM experiment might be a surprise as we might have had no previous expectations or intuitions. Having no clear (meta)mathematical insights into the relevant “conceptual space”, we would be forced to accept a new mathematical claim due to external reasons. This would be strange, but it would be even more strange just to dismiss the outcome of the RTM experiment. If we were forced to decide between α and ¬α, we would incorporate α into our system of beliefs rather than make the decision by tossing a coin.

However, would we consider it to be genuine mathematical knowledge? Using the classic terminology (which, after Gettier, sounds old-fashioned, but the Gettier paradoxes do not seem to be easily applicable to mathematics), knowledge concerning natural numbers is a justified and true belief concerning natural numbers. Standard theorems fulfill the three following conditions. Take Fermat’s theorem: (i) it has been proved (so there is a justification); (ii) if the notion of truth makes sense in mathematics, then obviously theorems proved in a rigorous way are true; (iii) we believe that there are no positive natural numbers x, y, z, n, with n > 2, such that xⁿ + yⁿ = zⁿ. But the case of α seems to be quite different as there is no standard mathematical justification in sight. The reasons for its acceptance are to some extent empirical.

⁵ A truly clear description is given e.g. in Andréka et al. [2], Németi and Dávid [71], Etesi and Németi [32]. I follow these papers, therefore the presentation of the RTM model given here can be viewed as an extended (and modified) quotation.

⁶ A computer performing 10¹⁰⁰⁰⁰ computational steps needs energy, it might break down, the black hole might evaporate etc. Another difficulty is the communication problem addressed in Németi and Dávid [71]. Wüthrich [109] discusses it in the context of quantum mechanics, considering viable solutions exploiting quantum entanglement. As getting into a black hole is not particularly attractive, in Andréka et al. [6] a scenario is discussed that is more optimistic for the observer and is based on the wormhole hypothesis. It is also interesting to observe that the possibility of performing a hypercomputational task (like checking the consistency of ZFC) within Special Relativity is equivalent to the existence of superluminal signals [73].

⁷ In general, this is a subset of Δ₂: equality holds under the assumption that there is no fixed upper bound on the number of signals sent on a finite path. Theoretical results of a different type (concerning ordinal computation times) are given in Hamkins and Lewis [44] (Hamkins [42] is a more popular presentation). It is known (Stannett [95]) that exactly the countable ordinal times (i.e. times < ω₁) can be embedded into real numbers. This opens space for investigations concerning the relationships between general notions of analogue computability and ordinal computability.

⁸ We might also think of two RTMs, one of which tries to prove TPC, while the other tries to prove its negation. Then we will either obtain (i) no signal (TPC is independent of ZFC), (ii) one signal (we will learn whether it is TPC or its negation which is provable in ZFC), or (iii) two signals, which means that ZFC is inconsistent. So, this scenario involves an implicit consistency check.

⁹ A clear expression of the thesis (in the context of computer-assisted proofs) is given by Rota: “Verification is proof, but verification may not give the reason. What, then, are we to mean by a reason” [84, 187]. An eminent mathematician, Mordell, claims that “Even when a proof has been mastered, there may be a feeling of dissatisfaction with it, though it may be strictly logical and convincing: such as, for example, the proof of a proposition in Euclid. The reader may feel that something is missing. The argument may have been presented in such a way as to throw no light on the why and wherefore of the procedure or on the origin of the proof or why it succeeds” [69, 11] (quotation from Mancosu [66, 142]). For an introduction to the problem of explanation within mathematics, see Mancosu [65–67].

¹⁰ This claim might be considered controversial (I am indebted to the anonymous referee for the comments which inspired this footnote). For sufficiently short proofs understanding would not pose a problem, but I think that our understanding of even a short, one-page long formal proof in ZFC would consist in appealing to semantics. We would not treat the string “∀x,y∃z∀t (t∈z⇔(t∈x ∨ t∈y))” as meaningless, but rather would recognize it as the Union Axiom. Similarly we would try to identify the meaning of the formulas in the proof, and we would grasp the essence of the formal manipulations because we would interpret them in some way. Even understanding of hypothetical 1000-page long formal proofs would consist in “translating” them into the “everyday mathematical concepts” (i.e. for instance we would think of lines 3235–5298 of the proof as the application of the Central Limit Theorem, and this would constitute our understanding). However, the relationship between the expertise in manipulating symbols and understanding is intricate, as the never-ending discussions concerning the Chinese room and its numerous variants show.

¹¹ An oracle, “Pythiagora”, is described in Rav [82]; the author claims that it would have a destructive impact on mathematics as proofs, not theorems, lie at the heart of mathematics. A comparable situation is also considered in Tymoczko [101] (the first philosophical paper on the proof of the four-color theorem): Simon is a mathematical genius who solves the most difficult open problems in mathematics. But after some time, Simon refuses to present any proofs as he considers them to be too complex for us to understand. Should we trust him, and replace the standard proof (being a warrant for the truth of mathematical claims) by the phrase “Simon says”? The difference between Pythiagora, Simon, and the RTM is substantial: we know how the RTM works, so perhaps the oracle metaphor is misguided? However, from the purely mathematical point of view, the difference seems to be not so big: we only get the output, without any information concerning the proof.

¹² The relationship between real and formal proofs in mathematics has been receiving much attention in recent years, see for instance the recent Hamami [41] and the extensive bibliography within. For an illuminating discussion concerning the role of proofs see Rav [82], Dawson [27].

¹³ See for instance Hauser [45] for a philosophical discussion and arguments in favor of the thesis that CH can be considered a legitimate, meaningful mathematical question.

¹⁴ Large cardinals are not a remedy: Levy and Solovay showed that assuming the existence of measurable, compact, Ramsey cardinals is consistent with both CH and ¬CH [59]. Moreover, Easton [30] proves that the value of the continuum can be almost arbitrary as the κ → 2^κ function can behave in hugely different ways.

¹⁵ Such a discussion is important in modern set theory; cf. the discussion in Feferman [33], Friedman [34], Maddy [62], Steel [96] (or the more recent Maddy [63]). The recent discussion on the multiverse and hyperuniverse approach in set theory or the inner model program has a similar flavor: the aim is to find some axioms that give a true description of the set-theoretic world. See Hamkins [43], Arrigoni and Friedman [9].
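The direct check of α described in Sect. 7.3, running through all integer tuples and testing whether the polynomial vanishes, is an ordinary unbounded search. A minimal sketch follows, under loudly stated assumptions: the function names are hypothetical, the polynomials are toy examples (not the 9-variable polynomial of Jones [50]), and the `max_radius` cut-off exists only so that the code terminates on an ordinary computer. The enumeration proceeds shell by shell in the max-norm, so every tuple is reached after finitely many steps.

```python
from itertools import product


def tuples_by_shell(k, max_radius):
    """Yield every integer k-tuple with max-norm <= max_radius, shell by shell."""
    for r in range(max_radius + 1):
        for t in product(range(-r, r + 1), repeat=k):
            # Keep only the tuples on the shell of radius r, so nothing repeats.
            if max(abs(c) for c in t) == r:
                yield t


def search_zero(poly, k, max_radius):
    """Return the first enumerated tuple on which `poly` vanishes, else None.

    On an ordinary machine, None only means "no zero up to max_radius".
    The RTM scenario is the idealization in which the bound is dropped and
    the absence of a signal is itself the negative answer.
    """
    for t in tuples_by_shell(k, max_radius):
        if poly(*t) == 0:
            return t
    return None


# Toy polynomial with an integer zero: x^2 + y^2 - 13.
print(search_zero(lambda x, y: x * x + y * y - 13, 2, 10))

# Toy polynomial without an integer zero: x^2 - 2.
print(search_zero(lambda x: x * x - 2, 1, 50))
```

For a sentence of the form “p has an integer solution”, a positive answer is found by this search after finitely many steps; it is the negative answer that requires the whole infinite run, which is exactly what the RTM is supposed to make available.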


We might claim that this is nothing new, as empirical factors are already present in ordinary computer-assisted proofs. The reluctance of mathematicians to accept them is diminishing, so perhaps we should argue along similar lines for the acceptance of the new knowledge? However, the situation of α is quite different, and the argument concerning standard computer proofs does not apply! In the latter case it is clear what the computer’s role is: it performs a tedious (but well defined) fragment of the computations and the mathematical content is well understood. So the computer gives empirical support to the claim that a standard proof exists; from the purely logical point of view nothing strange happens with the proof. An RTM check, by contrast, leads to a vastly different situation: we would discover that no proof exists but that α is nevertheless true.

What is discovered then? Two answers seem natural: (1) α is a new truth about numbers; (2) α is an empirical claim concerning the results of an empirical procedure. The question of whether the RTM delivers mathematical or rather physical knowledge is subtle. Both (1) and (2) are unsatisfactory. As for (1), the sentence α is not a standard mathematical theorem as it cannot be discovered using standard mathematics. It also is not a discovery of a new reliable principle based on fundamental analysis of mathematical concepts as it would be in the case of set theory. But, turning to (2), it is also unnatural to consider it to be a purely empirical claim;¹⁶ therefore I propose considering it as a new axiom candidate with robust evidence from science. Elsewhere I have argued that it might fit well into a Quinean account, where empirical, mathematical and even logical knowledge form a coherent web of beliefs [107]. The reasons for accepting a mathematical principle/axiom/theory are clearly not only conceptual but are deeply intermingled with the analysis of the role of mathematics in science.
Quine’s famous description of our knowledge as a seamless “web of beliefs” seems to fit well here: all the claims hang together and they have innumerable links to other beliefs, in particular to empirical data which impinge on the “outer borders” of the web. New data and discoveries cause the web to change, until as a whole it achieves a kind of equilibrium: “our statements about the external world face the tribunal of sense experience not individually but only as a corporate body” [81]. This applies also to mathematics and even logic. So both seemingly contradictory views concerning the status of RTM knowledge (i.e., genuine mathematical knowledge versus an abstract formulation of an empirical claim) can be reconciled. Including α in our scientific theory would make a better fit with the data than ¬α, thus turning α into a reliable mathematical principle that at the same time has clear physical motivation.17 Quine’s quasi-empiricist account has been very influential, and much of the modern realism–antirealism debate is centered around his indispensability argument 16 In Szabó [98] a radical view is defended according to which formal systems can be treated as (embodied in) physical objects. This goes with a formalistic and physicalistic philosophy of mathematics, therefore my claim concerning the unnaturalness of (2) could not be accepted from his point of view. 17 Results concerning the necessary properties of numbers needed in physical theories fit this picture exactly. Such results are presented in Andréka et al. [5]; a detailed study of the properties needed to model accelerating observers is given in Székely [100].

7 The Significance of Relativistic Computation …

173

for mathematical realism. There has been extensive discussion in recent years on the Enhanced Indispensability Argument, which is a kind of a modified “inference to the best explanation” version of Quine’s original argument.18 The problem of mathematical explanations in science arises here in a very natural way, as mathematical concepts are present in scientific theorizing, and in scientific explanations in particular. So the decision as to whether mathematics should be interpreted along realistic or instrumentalist lines has an impact on this issue. Mathematical explanations are obviously non-causal: explanations of phenomena are given in terms of abstract mathematical relationships, not in terms of particular stories.19 The explanatory role of mathematics is a subject of debate, i.e. whether there are genuinely mathematical explanations, or rather explanations that use mathematics but are not inherently mathematical (see Lange [57] for a discussion). There are several accounts of mathematical explanations in science, and a general discussion would take us too far afield.20 Here I focus on an idea which I like very much and consider to be particularly interesting in this context: the programming account. The general idea of program explanations has been proposed in Jackson and Pettit [49]; they distinguish between a process explanation which gives a detailed account of the actual causes that led to a particular event and a program explanation, which appeals to some abstract property of a system, namely one that is not causally efficacious but ensures the instantiation of a causally efficacious property or entity that is an actual cause of the explanandum. One of the discussed examples is the cracking of a fragile glass filled with hot water. 
We can explain it either by describing how particular particles bumped into the glass and damaged it, or by giving general laws which predict and explain this event but do not provide all the details (for instance, where exactly the glass breaks and when). This general idea was picked up by Lyon [61] in the context of mathematical explanations. According to the programming account, mathematical facts are considered to be “programming properties”, i.e. constraints of a modal character on the world. For instance, if a computer loops forever when trying to solve the halting problem (or another undecidable problem), we explain it by invoking mathematical results, not by any laws of physics or engineering. This picture is particularly convincing in the case of “no-go theorems”, which explain why a certain physical phenomenon cannot occur.

18 The literature is already so vast that it would not make much sense to cite it here; I only mention [10, 11], which are important early contributions.

19 There are many examples discussed in the literature; one of the simplest is Euler’s theorem, which explains why it is not possible to walk through the famous system of bridges in Königsberg crossing every bridge exactly once. No physical details of this system are relevant for the explanation. There are of course much more advanced examples; to mention a few: (i) the Borsuk–Ulam theorem explains why there must be a pair of antipodal points on the surface of the Earth at which two physical parameters (say, temperature and pressure) take the same values (cf. Baker [10, 11], Baker and Colyvan [12]); (ii) Lyon and Colyvan [60] discuss the explanatory role of phase spaces in physics, in which theorems on these spaces explain some phenomena; (iii) Baron [13] discusses a theorem on stochastic processes which can be considered to provide an explanation of the behavior of predators.

20 A recent survey of non-causal (in particular, mathematical) explanations is, for instance, [83].
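The Königsberg example of footnote 19 can be made concrete with a minimal sketch. It encodes the seven bridges as a multigraph and applies Euler's criterion: a connected multigraph can be walked through using every edge exactly once only if it has zero or two vertices of odd degree. The vertex labels and function names are assumptions introduced here for illustration; nothing about the physical bridges enters the computation, which is the point of the example.

```python
from collections import Counter, defaultdict

# The seven bridges as a multigraph. Labels are illustrative (an assumption):
# "I" is the central island, "N" and "S" the river banks, "E" the eastern land mass.
KOENIGSBERG_BRIDGES = [
    ("I", "N"), ("I", "N"),
    ("I", "S"), ("I", "S"),
    ("I", "E"), ("N", "E"), ("S", "E"),
]


def odd_degree_vertices(edges):
    """Return the vertices touched by an odd number of edges (bridges)."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sorted(vertex for vertex, d in degree.items() if d % 2 == 1)


def is_connected(edges):
    """Check that every vertex can be reached from every other one."""
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    if not neighbours:
        return True
    start = next(iter(neighbours))
    seen, stack = {start}, [start]
    while stack:
        for nxt in neighbours[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == len(neighbours)


def has_euler_walk(edges):
    """Euler's criterion: connected, with zero or two odd-degree vertices."""
    return is_connected(edges) and len(odd_degree_vertices(edges)) in (0, 2)
```

Calling `has_euler_walk(KOENIGSBERG_BRIDGES)` evaluates to `False`, because all four land masses have odd degree. The explanation appeals only to this parity fact, not to any detail of a particular walk.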


Consider an RTM checking Peano Arithmetic (PA) for consistency, i.e. going through all possible proofs within PA and checking whether any of them is a proof of “0 = 1”. Importantly, this is a different kind of checking the truth of a sentence than the RTM check of α that is mentioned in the context of the 9-variable polynomial. In that case we performed a direct check of whether the equation p(n₁, …, n₉) = 0 holds for any of the 9-tuples n₁, …, n₉, i.e. we checked the truth of the sentence α directly. However, we do not perform the check of Con(PA) in this direct sense; instead, we try out all proofs within PA in search of an inconsistency. This kind of consistency checking should therefore perhaps be called an “indirect Con(PA) RTM check” in order to stress the difference.²¹ We assume the optimistic (and realistic) scenario, i.e. that the process of looking for a proof of “0 = 1” within PA has not terminated. But why is this so? The most natural answer is simply “Because PA is consistent”. The physical details of the computational process (the order of moves, the speed of the device, the geographic location etc.) are not essential. Because Con(PA) is a theorem of ZFC, we would rather say that the explanation of the result of the “indirect Con(PA) RTM check” is provided by ZFC, which imposes mathematical modal constraints on the world.²² Similarly, we could perform an “indirect Con(ZFC) RTM check”.²³ It would be attractive to identify similar modalities explaining the “indirect Con(ZFC) RTM check” by citing a theorem which shows that this outcome is the expected or even necessary one. As before, a natural explanation of the fact that the RTM did not find a proof of a contradiction in ZFC is “Because ZFC is consistent”.²⁴ By Gödel’s second incompleteness theorem, Con(ZFC) is not provable in ZFC (if ZFC is consistent), so ZFC cannot provide such an explanation. However, there are stronger theories that prove the required result: for instance, for a suitable large cardinal axiom A, ZFC + A proves Con(ZFC).
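The shape of the search that underlies both kinds of check can be sketched in a few lines. The sketch below illustrates the direct check only, and under loud assumptions: the actual 9-variable polynomial is not reproduced, `pell` and `rootless` are hypothetical two-variable stand-ins, and `max_steps` is an artificial cut-off that an ordinary computer needs. The RTM observer of the text is assumed to learn the outcome of the unbounded search, which no ordinary program can report. The indirect check has the same structure, with PA-proofs in place of tuples and a proof checker in place of polynomial evaluation.

```python
from itertools import count, product


def tuples_in_order(arity):
    """Yield every arity-tuple of natural numbers exactly once,
    ordered by its largest entry, so that no tuple is postponed forever."""
    for bound in count(0):
        for candidate in product(range(bound + 1), repeat=arity):
            if max(candidate) == bound:
                yield candidate


def search_for_root(poly, arity, max_steps):
    """Test up to max_steps tuples; return the first root found, else None.

    None only means 'no root among the tuples tested so far'.
    """
    for step, candidate in enumerate(tuples_in_order(arity)):
        if step >= max_steps:
            return None
        if poly(*candidate) == 0:
            return candidate


# Toy stand-ins for the 9-variable polynomial (assumptions, chosen for illustration).
def pell(x, y):
    return x * x - 2 * y * y - 1   # has roots, e.g. (1, 0) and (3, 2)


def rootless(x, y):
    return x * x - 2 * y * y - 3   # no roots in the naturals (reduce modulo 3)
```

For `pell` the search halts with a witness, so the halting itself settles the question. For `rootless` every bounded run returns `None`, and the reason is not to be found in the details of the run but in the arithmetical fact that the equation has no solution. This is the pattern that the programming account exploits.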
So, just as we use standard mathematics to explain the behavior of standard physical systems or the result of the “indirect Con(PA) RTM check”, we use a strengthening of ZFC to explain the “indirect Con(ZFC) RTM check”. But why should we believe ZFC + A? Some of the possible arguments might be conceptual, and some might have their source in scientific practice. If ZFC + A is a reliable theory (which is admittedly a very vague term, but I think that it allows us to give at least a preliminary classification²⁵), we can say that the explanation for the result of the hypercomputational process would be given within ZFC + A. This is an interesting situation: we need to transcend standard mathematics in order to give the explanation of an empirical fact.

Also, the situation replicates: if ZFC + A explains the result of the “indirect Con(ZFC) RTM check”, then we can perform an “indirect Con(ZFC + A) RTM check”. If it turns out that ZFC + A is consistent, we can use a stronger theory, say ZFC + A₁, to explain this fact. A natural progression of increasingly stronger theories (ZFC + A, ZFC + A₁, ZFC + A₂, etc.) arises, each one providing the required explanation of the outcome of the RTM experiment concerning “Con(The_Weaker_Theories)”.²⁶ In this context, we might think of large cardinal axioms, which provide a natural “calibration” of consistency strength, or of reflection principles.²⁷ So, on the one hand there are very abstract and purely conceptual considerations concerning possible strengthening(s) of ZFC; on the other hand, there are possible RTM experiments that provide important data that need to be explained. A mathematical realist will interpret these results as evidence of a correspondence between physical and mathematical reality: “advanced metaphysics” seems to hide behind the explanation of the (hypothetical) physical process. The instrumentalist will argue in a more restrictive way that using ZFC + A as a tool better suits the purposes of physics: for instance, it is better at predicting the outcome of an “indirect Con(T) RTM check” (T being ZFC or any other theory in question). However, regardless of whether we view them as arguments for the mathematical character of the Universe or for the empirical character of mathematics, it is interesting to observe that a theory that transcends standard mathematics can appear naturally in physics. The “fine tuning” between very abstract mathematical considerations and the possibility of explaining the behavior of physical systems is remarkable. Galileo famously claimed that the book of Nature is written by God in the language of mathematics; now we might speculate that this language is really the language of abstract set theory.

The focus of the article is relativistic computation, and the problem of mathematical explanation is discussed in this particular context. However, I will allow myself to briefly mention the importance of results concerning the logical foundations of physical theories such as special or general relativity for the discussion concerning the explanatory role of mathematics. Here I think of the results presented in Andréka et al. [1] (and earlier/later papers with similar content). In this case, mathematics (in particular, logic) makes it possible to reveal the structure of physical theories, for instance the role which is played by particular assumptions (axioms). An important example in foundational studies is reverse mathematics, in which the axioms needed to prove particular theorems are identified.²⁸ “Reverse relativity”, an approach that is similar in spirit, is investigated in Andréka et al. [1].²⁹ In this way, mathematics (and in particular logic) can provide important insights into the conceptual structure of the scientific theory in question. We might view these results as the identification of fundamental, logical constraints within special and general relativity, so the idea of mathematical modalities might be applied here as well. Friend [35] and Friend and Molinini [36] observe that these results are particularly important as they provide a mathematical explanation for a whole physical theory, not just isolated phenomena, and I fully agree. This topic cannot (regretfully…) be discussed here, but it deserves thorough study.³⁰

21 The anonymous referee pointed out that this is really a case-by-case check and the result of this procedure is that it is not the case that we can prove 0 = 1 in PA. In order to justify Con(PA) we need one more step, i.e. double-negation elimination. It is therefore non-constructive (and indirect).

22 This is true under the assumption of the consistency of ZFC (if ZFC is inconsistent, we can prove everything, in particular Con(PA)). If we already know that ZFC is consistent (for instance by proving it in a stronger theory), we can use this fact as an explanation for the fact that our procedure ends in the expected way.

23 In fact, our two RTMs from footnote 8 that check the truth/falsehood of the twin prime conjecture are performing (as a nice “byproduct” of the primary task) an “indirect Con(ZFC) RTM check”.

24 In fact, this is also the best explanation of the fact that no (human) mathematician has ever found an inconsistency. You cannot find something which does not exist, and the detailed story of your search does not contribute to the explanation.

25 See Feferman [33], Friedman [34], Maddy [62], Maddy [63], Steel [96] on the problem of justifying new axioms for set theory. I use the term “reliable” in the sense that ZFC + A is reliable if the (new) axiom A is justified. Unfortunately, this problem vastly exceeds the scope of this study.

26 Trivially, ZFC + Con(ZFC) proves Con(ZFC) (and T + Con(T) proves Con(T)…), but this choice of theory explaining the outcome is ad hoc.

27 It is natural to think of such progressions in foundational studies. Gödel considered the reasonableness of formulating increasingly stronger axioms of infinity (which is often called “Gödel’s program”), and the research on large cardinals is one of the central parts of contemporary set theory. We can also think of reflection principles that express beliefs about the similarity between the universe and its parts. All these new axioms exceed ZFC.
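The progression of theories described in this section can be written schematically as follows. The indexing T₀, T₁, … is notation introduced here for illustration and is not the author's; the only ingredients are Gödel's second incompleteness theorem and the assumption, made in the text, that each new axiom is strong enough to prove the consistency of the preceding theory. The block assumes the `amsmath` and `amssymb` packages.

```latex
\begin{align*}
T_0 &= \mathrm{ZFC}, \quad T_1 = \mathrm{ZFC} + A, \quad
       T_2 = \mathrm{ZFC} + A_1, \quad T_3 = \mathrm{ZFC} + A_2, \ \dots \\
T_{n+1} &\vdash \mathrm{Con}(T_n)
       && \text{(assumed strength of each new axiom)} \\
T_n &\nvdash \mathrm{Con}(T_n)
       && \text{(second incompleteness theorem, if $T_n$ is consistent)}
\end{align*}
```

On this reading, the outcome of an “indirect Con(Tₙ) RTM check” is explained only at level Tₙ₊₁ and never at level Tₙ itself, which is why the sequence of explanations does not close.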

7.4 The Theoretical Virtues of the RTM Model

The analyses concerning hypercomputational models can be viewed as thought experiments (which does not exclude the possibility that they will one day turn into genuine experiments). Thought experiments are a standard way of discussing philosophical problems and presenting arguments, but in some cases the examples are highly artificial and their importance for the theoretical debate is limited. It is important to observe that the RTM is not a purely theoretical fiction in this sense and is superior to many other hypercomputational models in this respect.³¹

28 In reverse mathematics, mathematical notions are encoded in the language of second-order arithmetic Z₂, and the main focus of study is identifying the strength of the axioms (often set existence axioms) necessary to prove ordinary mathematical theorems. Simpson [92] is the main reference.

29 “Further, we intend to analyze the logical structure of the theory: which assumptions are responsible for which predictions; what happens if we weaken/fine-tune the assumptions, what we could have done differently. We seek insights, a deeper understanding. We could call this approach ‘reverse relativity’ in analogy with ‘reverse mathematics’” (Andréka et al. [1, 608]). An elaboration of these ideas in the context of why-questions within axiomatic relativity is given in Székely [99]. An important and interesting feature of this approach is that theories, not sentences, are considered to be answers to particular why-questions. So, loosely speaking, the content necessary to present an explanation is presented as a theory (which seems to suit scientific needs better than just presenting a single principle/axiom).

30 A topic I find particularly interesting is the choice of first-order logic as the framework (motivations were given for instance in Andréka et al. [4]) and the impact of this choice on the overall philosophical picture.

31 Performing an RTM check is not performing a supertask (i.e. a computation with, informally speaking, unbounded acceleration). The computation proceeds in an ordinary way; the hypercomputational effect appears due to the relationship between the observer’s and the computer’s time. We might view this effect as due to an embedding of the computer’s time into the observer’s time without preserving cofinality, which allows the outcome of the computational process to be ‘observed’ within a bounded time (cf. Stannett [95, 14]). This is not an inherent feature of the computing subject (computer/computor); in particular, we do not have to resolve puzzles like “What happens just after the infinite accelerating computation terminates?”, as in Thomson-lamp-like examples. (For supertasks see Manchak and Roberts [64] or the former entry Laraudogoitia [58].)


In some cases we get out of the hypercomputational system exactly what we put in: a non-algorithmic result is available only when we introduce (in some way) non-algorithmic initial data.³² This criticism should be taken seriously, but there are also counter-arguments. The results presented in Pour-El and Richards [77, 78, 79] and Pour-El and Zhong [80] are potentially immensely powerful ones. One of them is the existence of a differential equation of the form y′(x) = F(x, y), defined on a rectangle R, such that the function F(x, y) is computable but the equation has no computable solution. Also, the existence of a three-dimensional wave equation is proved which has computable coefficients and computable initial data, but whose solution is not computable [80]. Kreisel [54, 901] remarks that such results can be understood as a description of an analogue system (device), defined (specified) in an algorithmic way, but not simulable (even in principle) by a Turing machine. Such considerations are not new: in Kreisel [53] the problem of the possible existence of non-computable physical constants is discussed. In Kreisel [55, 44] the author claims that classical physical systems are algorithmic, but this need not be the case for quantum mechanical systems. A similar problem was already investigated in Scarpellini [86]: the author considers the possibility of building an analog device which produces a certain function f(x) such that the claim “∫ f(x) cos(nx) dx > 0” is not decidable, but the device could ‘decide’ it by a direct measurement, i.e. by a physical operation. (See also Scarpellini [85], which contains some speculations concerning possible undecidability in classical physics.)³³ These discussions are a fragment of the general debate on the physical Church–Turing thesis, i.e. whether Nature processes information in a non-algorithmic way and could provide us with some help in solving undecidable problems.
Results in the spirit of Pour-El and Richards suggest a positive answer: there might be a computably specifiable system that produces non-algorithmic output. The results concerning the wave equation might perhaps be interpreted even as arguments for a stronger thesis, i.e. that there are non-algorithmic processes in classical mechanics, contrary to Kreisel’s aforementioned claim. However, the results are highly theoretical and the construction of a “wave computer” (being a physical implementation of these results) is not in sight. This problem is discussed in Weihrauch and Zhong [102], in which the propagation of waves on different function spaces is analyzed. The authors claim that an implementation of a “wave computer” is probably not possible within any reasonable computational model. Finally, the results of Pour-El/Richards/Weihrauch/Zhong do not provide compelling reasons for accepting the thesis that some form of usable hypercomputational systems of this kind can be found.³⁴

One of the obstacles to accepting the “Nature provides usable hypercomputation” thesis is the issue of idealizations. Even if there are non-algorithmic processes in nature (maybe even in a jar containing yeast in our kitchen…), the problem of extracting this non-algorithmic information arises. Moreover, if the non-algorithmicity depends on preparing some input data with absolute precision, how can we ever be sure that this can be done? The degree of idealization is enormous.

In contrast, we do not encounter problems of this kind in the case of the RTM. An important feature of an RTM is that it operates on discrete structures. No idealizations like “let us take a non-recursive distance” or “let us take a process which produces non-computable outputs and let us take one of these outputs” are necessary. The degree of idealization (concerning preparing the initial data) is not greater than in the case of a standard computer: we only have to represent 0 and 1 by two distinguishable objects (here I abstract from time and space limitations, which are a different issue). The RTM model is not only consistent with physics, but also operates on ‘down-to-earth’ discrete structures, which makes it very appealing for the philosopher. Of course, we have to trust General Relativity, in particular the theory of black holes, but we also have to trust quantum mechanics even in our GPS systems, since they use atomic clocks. So, even if we consider the whole “RTM story” to be only a thought experiment, I think that this gives much more insight than a thought experiment in which we live infinitely long lives, perform infinitely precise measurements, have a direct grasp of ℵ₂₀₁₈, or discuss mathematical matters with extra-terrestrial beings.

32 Such doubts concern, for example, the ARNN model [90, 91], which (in cases in which the coefficients are computable) is equivalent to the standard Turing model. Davis [26] therefore claims that the model does not really go beyond the Turing limit in any reasonable sense. See also Davis [25] for a forceful critique of the very idea of hypercomputation, and Copeland [16–19] for a general discussion of the notion of computation.

33 In a series of papers, Da Costa and Doria [20, 23, 24] provide interesting examples of sentences that are undecidable within ZFC and that seem to have a natural physical interpretation. (In Chaitin et al. [15] the results are put together and given a popular presentation.) Independence results in the language of relativity theory (involving models based on Archimedean fields) are presented in Andréka et al. [3].

7.5 Concluding Remarks

The RTM model is interesting per se as a possible physical implementation of an abstract model of computation, but it also has remarkably interesting implications for the philosophy of mathematics, in particular for discussions concerning the relationships between mathematical and physical knowledge. Metaphorically speaking, the RTM allows us to touch a bit of infinity. Cantor’s dictum that no exact science can be given foundations without a modicum of metaphysics seems very appropriate in this context.³⁵ Even if we are not willing to draw metaphysical conclusions, relativistic computation shows in an interesting way how seemingly very abstract mathematical principles might enter the scene as a kind of mathematical modality that “programs” the universe.

34 It is also not clear whether there might be hypercomputation on the quantum level. Kieu [51, 52] defined such a quantum system; a retort is given in Smith [93]. The status of quantum hypercomputation is not settled. In Cubitt et al. [22] a quantum system is constructed for which the spectral gap problem is undecidable; so, if it were possible to verify it by empirical methods, it would yield an “empirical solution” to an undecidable problem.

35 “Ohne ein Quentchen Metaphysik läßt sich, meiner Überzeugung nach, keine exacte Wissenschaft begründen.” (from Cantor’s Nachlass).


Acknowledgements I would like to express my gratitude to the Editors (especially to Gergely Székely) for many helpful comments and bibliographic hints during the preparation of the paper. I am also indebted to István Németi and Hajnal Andréka for their friendly support and feedback. Finally, I would like to thank the anonymous referee for insightful and stimulating comments. The preparation of this paper was supported by a National Science Center (NCN) grant: 2016/21/B/HS1/01955.

References

1. Andréka, H., Madarász, J., & Németi, I. (2007). Logic of space-time and relativity theory. In M. Aiello, I. Pratt-Hartmann, & J. Van Benthem (Eds.), Handbook of spatial logics. Dordrecht: Springer.
2. Andréka, H., Németi, I., & Németi, P. (2009). General relativistic hypercomputing and foundation of mathematics. Natural Computing, 8(3), 499–516.
3. Andréka, H., Madarász, J., & Németi, I. (2012). Decidability, undecidability, and Gödel’s incompleteness in relativity theories. Parallel Processing Letters, 22(3).
4. Andréka, H., Madarász, J. X., Németi, I., Németi, P., & Székely, G. (2011). Vienna Circle and logical analysis of relativity theory. In A. Máté, M. Rédei, & F. Stadler (Eds.), Der Wiener Kreis in Ungarn/The Vienna Circle in Hungary (Vol. 16). Veröffentlichungen des Instituts Wiener Kreis. Vienna: Springer.
5. Andréka, H., Madarász, J., Németi, I., & Székely, G. (2012). What are the numbers in which spacetime? arXiv:1204.1350v1 [gr-qc].
6. Andréka, H., Németi, I., & Székely, G. (2012). Closed timelike curves in relativistic computation. Parallel Processing Letters (3).
7. Appel, K., & Haken, W. (1977). Every planar map is four colorable, part I: Discharging. Illinois Journal of Mathematics, 21, 429–490.
8. Appel, K., Haken, W., & Koch, J. (1977). Every planar map is four colorable, part II: Reducibility. Illinois Journal of Mathematics, 21, 491–567.
9. Arrigoni, T., & Friedman, S.-D. (2013). The hyperuniverse program. Bulletin of Symbolic Logic, 19(1), 77–96.
10. Baker, A. (2005). Are there genuine mathematical explanations of physical phenomena? Mind, 114(454), 223–238.
11. Baker, A. (2009). Mathematical explanations in science. British Journal for the Philosophy of Science, 60(3), 611–633.
12. Baker, A., & Colyvan, M. (2011). Indexing and mathematical explanation. Philosophia Mathematica, 19, 224–232.
13. Baron, S. (2014). Optimization and mathematical explanation: Doing the Levy Walk. Synthese, 191(2014), 459–479.
14. Bournez, O., & Pouly, A. (2018, 14 May). A survey of analogue models of computation. arXiv:1805.05729v1 [cs.CC].
15. Chaitin, G., Da Costa, N. C. A., & Doria, F. A. (2012). Gödel’s way. Exploits into an undecidable world. Boca Raton: CRC Press, Taylor & Francis Group.
16. Copeland, J. (2002). Hypercomputation. Minds and Machines, 12, 461–502.
17. Copeland, J. (2002). Accelerating Turing machines. Minds and Machines, 12, 281–301.
18. Copeland, J. (2004). Hypercomputation: Philosophical issues. Theoretical Computer Science, 317, 251–267.
19. Copeland, J. (2008). The modern history of computing. In E. N. Zalta (Ed.), The Stanford encyclopedia of philosophy (Fall 2008 Edition). http://plato.stanford.edu/archives/fall2008/entries/computing-history.
20. Da Costa, N. C. A., & Doria, F. A. (1991). Undecidability and incompleteness in classical mechanics. International Journal of Theoretical Physics, 30, 1041–1073.


21. Cotogno, P. (2003). Hypercomputation and the physical Church-Turing thesis. British Journal for the Philosophy of Science, 54, 181–223.
22. Cubitt, T. S., Perez-Garcia, D., & Wolf, M. M. (2015). Undecidability of the spectral gap. Nature, 528, 207–211 (full version: arXiv:1502.04573v3 [quant-ph]).
23. Da Costa, N. C. A., & Doria, F. A. (1994). Suppes predicates and the construction of unsolvable problems in the axiomatized sciences. In P. Humphreys (Ed.), Patrick Suppes: Scientific philosopher (pp. 151–193). Kluwer Academic Publishers.
24. Da Costa, N. C. A., & Doria, F. A. (1996). Structures, Suppes predicates, and Boolean-valued models in physics. In P. I. Bystrov & V. N. Sadovsky (Eds.), Philosophical logic and logical philosophy (pp. 91–118). Kluwer Academic Publishers.
25. Davis, M. (2006). Why there is no such discipline as hypercomputation. Applied Mathematics and Computation, 178, 4–7.
26. Davis, M. (2004). The myth of hypercomputation. In C. Teuscher (Ed.), Alan Turing: Life and legacy of a great thinker (pp. 195–212). Berlin: Springer.
27. Dawson, J. W., Jr. (2006). Why do mathematicians re-prove theorems? Philosophia Mathematica, III, 14, 269–286.
28. Deutsch, D., Ekert, A., & Lupacchini, R. (2000). Machines, logic and quantum physics. The Bulletin of Symbolic Logic, 6(3), 265–283.
29. Earman, J., & Norton, J. D. (1993). Forever is a day: Supertasks in Pitowsky and Malament-Hogarth spacetimes. Philosophy of Science, 60, 22–42.
30. Easton, W. B. (1970). Powers of regular cardinals. Annals of Mathematical Logic, 1, 139–178.
31. Ellentuck, E. (1975). Gödel’s square axioms for the continuum. Mathematische Annalen, 216, 29–33.
32. Etesi, G., & Németi, I. (2002). Turing computability and Malament-Hogarth spacetimes. International Journal of Theoretical Physics, 41(2), 342–370.
33. Feferman, S. (2000). Why the programs for new axioms need to be questioned. The Bulletin of Symbolic Logic, 6, 401–413.
34. Friedman, H. (2000). Normal mathematics will need new axioms. The Bulletin of Symbolic Logic, 6, 434–446.
35. Friend, M. (2015). On the epistemological significance of the Hungarian project. Synthese, 192, 2035–2051.
36. Friend, M., & Molinini, D. (2015). Using mathematics to explain a scientific theory. Philosophia Mathematica, 24(2), 185–213.
37. Gödel, K. (1970). Some considerations leading to the probable conclusion, that the true power of the continuum is ℵ₂. In S. Feferman (Ed.), Kurt Gödel. Collected works (Vol. 3, pp. 420–421). Oxford: Oxford University Press.
38. Gödel, K. (1970). A proof of Cantor’s continuum hypothesis from a highly plausible axiom about orders of growth. In S. Feferman (Ed.), Kurt Gödel. Collected works (Vol. 3, pp. 422–423). Oxford: Oxford University Press.
39. Hales, T. C. (2000). Cannonballs and honeycombs. Notices of the American Mathematical Society, 47(4), 440–449.
40. Hales, T. C. (2005). A proof of the Kepler conjecture. Annals of Mathematics. Second Series, 162(3), 1065–1185.
41. Hamami, Y. (2018). Mathematical inference and logical inference. The Review of Symbolic Logic, 11(4), 665–704.
42. Hamkins, J. D. (2002). Infinite time Turing machines. Minds and Machines, 12, 521–539.
43. Hamkins, J. D. (2012). The set-theoretic multiverse. Review of Symbolic Logic, 5(3), 416–449.
44. Hamkins, J. D., & Lewis, A. (2000). Infinite time Turing machines. Journal of Symbolic Logic, 65, 567–604.
45. Hauser, K. (2002). Is Cantor’s continuum problem inherently vague? Philosophia Mathematica, 10, 257–292.
46. Hogarth, M. L. (1992). Does General Relativity allow an observer to view an eternity in a finite time? Foundations of Physics Letters, 5, 173–181.


47. Hogarth, M. L. (1993). Predicting the future in relativistic spacetimes. Studies in History and Philosophy of Science. Studies in History and Philosophy of Modern Physics, 24, 721–739.
48. Hogarth, M. L. (1994). Non-Turing computers and non-Turing computability. PSA, 1, 126–138.
49. Jackson, F., & Pettit, P. (1990). Program explanations: A general perspective. Analysis, 50(2), 107–117.
50. Jones, J. P. (1980). Undecidable diophantine equations. Bulletin of the American Mathematical Society, 3(2), 859–862.
51. Kieu, T. (2002). Quantum hypercomputation. Minds and Machines, 12, 541–561.
52. Kieu, T. (2003). Quantum algorithm for Hilbert’s tenth problem. International Journal of Theoretical Physics, 42(7), 1461–1478.
53. Kreisel, G. (1974). A notion of mechanistic theory. Synthese, 29, 11–26.
54. Kreisel, G. (1982). Review of Pour-El and Richards. Journal of Symbolic Logic, 47, 900–902.
55. Kreisel, G. (1965). Mathematical logic. In T. L. Saaty (Ed.), Lectures on modern mathematics (Vol. 3). New York: Wiley.
56. Kreisel, G. (1967). Mathematical logic: What has it done for the philosophy of mathematics? In R. Schoenman (Ed.), Bertrand Russell: Philosopher of the century. London: George Allen and Unwin.
57. Lange, M. (2013). What makes a scientific explanation distinctively mathematical? British Journal for the Philosophy of Science, 64(3), 485–511.
58. Laraudogoitia, J. P. (2013). Supertasks. In E. N. Zalta (Ed.), The Stanford Encyclopedia of Philosophy (Fall 2013 Edition). https://plato.stanford.edu/archives/fall2013/entries/spacetime-supertasks/.
59. Levy, A., & Solovay, R. M. (1967). Measurable cardinals and the continuum hypothesis. Israel Journal of Mathematics, 5, 234–248.
60. Lyon, A., & Colyvan, M. (2008). The explanatory power of phase spaces. Philosophia Mathematica, 16(2), 227–243.
61. Lyon, A. (2012). Mathematical explanations of empirical facts, and mathematical realism. Australasian Journal of Philosophy, 90(3), 559–578.
62. Maddy, P. (2000). Does mathematics need new axioms? The Bulletin of Symbolic Logic, 6, 413–422.
63. Maddy, P. (2011). Defending the axioms. Oxford: Oxford University Press.
64. Manchak, J., & Roberts, B. W. (2016). Supertasks. In E. N. Zalta (Ed.), The Stanford encyclopedia of philosophy (Winter 2016 Edition). https://plato.stanford.edu/archives/win2016/entries/spacetime-supertasks.
65. Mancosu, P. (2001). Mathematical explanation: Problems and prospects. Topoi, 20, 97–117.
66. Mancosu, P. (2008). Mathematical explanation: Why it matters. In P. Mancosu (Ed.), Philosophy of mathematical practice (pp. 134–150). Oxford: Oxford University Press.
67. Mancosu, P. (2018). Explanation in mathematics. Stanford encyclopedia of philosophy. https://plato.stanford.edu/archives/sum2018/entries/mathematics-explanation/.
68. Montanaro, A. (2015). Quantum algorithms: An overview. https://www.nature.com/articles/npjqi201523 (also: arXiv:1511.04206v2).
69. Mordell, L. (1959). Reflections of a mathematician. Montreal: Canadian Mathematical Congress.
70. Nielsen, M. A., & Chuang, I. L. (2000). Quantum computation and quantum information. Cambridge University Press.
71. Németi, I., & Dávid, G. (2006). Relativistic computers and the Turing barrier. Journal of Applied Mathematics and Computation, 178(1), 118–142.
72. Németi, I. (1987). On logic, relativity, and the limitations of human knowledge. Iowa State University, Department of Mathematics, graduate course during the academic year 1987/88.
73. Németi, P., & Székely, G. (2012). Existence of faster than light signals implies hypercomputation already in special relativity. In S. B. Cooper, A. Dawar, & B. Löwe (Eds.), How the World Computes. CiE 2012 (Vol. 7318). Lecture Notes in Computer Science. Berlin, Heidelberg: Springer.


74. Piccinini, G. (2011). The physical Church-Turing thesis: Modest or bold? British Journal for the Philosophy of Science, 62, 733–769.
75. Piccinini, G. (2017). Computation in physical systems. https://plato.stanford.edu/archives/sum2017/entries/computation-physicalsystems/.
76. Pitowsky, I. (1990). The physical Church thesis and physical computational complexity. Iyyun, 39, 81–99.
77. Pour-El, M. B., & Richards, J. I. (1979). A computable ordinary differential equation which possesses no computable solution. Annals of Mathematical Logic, 17, 61–90.
78. Pour-El, M. B., & Richards, J. I. (1981). The wave equation with computable initial data such that its unique solution is not computable. Advances in Mathematics, 39, 215–239.
79. Pour-El, M. B., & Richards, J. I. (1989). Computability in analysis and physics. Berlin: Springer.
80. Pour-El, M., & Zhong, N. (1997). The wave equation with computable initial data whose unique solution is nowhere computable. Mathematical Logic Quarterly, 43(4), 499–509.
81. Quine, W. v. O. (1953). Two dogmas of empiricism. In From a logical point of view (pp. 20–46). Cambridge: Harvard University Press.
82. Rav, Y. (1999). Why do we prove theorems? Philosophia Mathematica, 7(1999), 5–41.
83. Reutlinger, A., & Saatsi, J. (2018). Explanation beyond causation. Philosophical perspectives on non-causal explanation. Oxford: Oxford University Press.
84. Rota, G.-C. (1997). The phenomenology of mathematical proof. Synthese, 111, 183–196.
85. Scarpellini, B. (2003). Comments on: ‘Two undecidable problems of analysis’. Minds and Machines, 13, 79–85.
86. Scarpellini, B. (1963). Zwei unentscheidbare Probleme der Analysis. Zeitschrift für mathematische Logik und Grundlagen der Mathematik, 9, 265–289 (English, revised version: Minds and Machines, 2003, 13, 49–77).
87. Shagrir, O., & Pitowsky, I. (2003). Physical hypercomputation and the Church-Turing thesis. Minds and Machines, 13, 87–101.
88. Shor, P. (1997). Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer. SIAM Journal on Computing, 26, 1484–1509.
89. Shor, P. (1994). Algorithms for quantum computation: Discrete logarithms and factoring. In Proceedings 35th Annual Symposium on Foundations of Computer Science (pp. 124–134).
90. Siegelmann, H. T. (1998). Neural networks and analog computation: Beyond the Turing limit. Boston, MA: Birkhäuser.
91. Siegelmann, H. T. (2003). Neural and super-Turing computing. Minds and Machines, 13, 103–114.
92. Simpson, S. (2009). Subsystems of second order arithmetic. Cambridge: Cambridge University Press.
93. Smith, W. D. (2006). Three counterexamples refuting Kieu’s plan for “quantum adiabatic hypercomputation”; and some uncomputable quantum mechanical tasks. Applied Mathematics and Computation, 178, 184–193.
94. Solovay, R. M. (1995). Introductory note to *1970a, *1970b, *1970c. In S. Feferman (Ed.), Kurt Gödel. Collected works (Vol. 3, pp. 405–420). Oxford: Oxford University Press.
95. Stannett, M. (2006). The case for hypercomputation. Applied Mathematics and Computation, 178, 8–24.
96. Steel, J. R. (2000). Mathematics needs new axioms. The Bulletin of Symbolic Logic, 6, 422–433.
97. Syropoulos, A. (2008). Hypercomputation. Computing beyond the Church-Turing barrier. New York: Springer.
98. Szabó, L. E. (2017). Meaning, truth and physics. In G. Hofer-Szabó & L. Wroński (Eds.), Making it formally explicit (pp. 165–177). European Studies in Philosophy of Science 6. Springer International Publishing.
99. Székely, G. (2011). On why-questions in physics. In A. Máté, M. Rédei, & F. Stadler (Eds.), Der Wiener Kreis in Ungarn/The Vienna Circle in Hungary (Vol. 16, pp. 181–189). Veröffentlichungen des Instituts Wiener Kreis. Vienna: Springer.

7 The Significance of Relativistic Computation …

183

100. Székely, G. (2012). What properties of numbers are needed to model accelerated observers in relativity? In J.-Y. Béziau, D. Krause, & J. R. Becker Arenhart (Eds.), Conceptual clarifications tributes to Patrick Suppes (1922–2014) (pp. 161–174). College Publications (also arXiv:1210.0101v1 [math.LO]). 101. Tymoczko, T. (1979). The four-color problem and its philosophical significance. The Journal of Philosophy, 76(2), 57–83. 102. Weihrauch, K., & Zhong, N. (2002). Is wave propagation computable or can wave computers beat the Turing machine? Proceedings of the London Mathematical Society, 85(2), 312–332. 103. Welch, P. (2008). The Extent of Computation in Malament-Hogarth Spacetimes. British Journal for the Philosophy of Science, 59, 659–674 (arXiv:gr-qc/0609035v1). 104. Woodin. (1999). The axiom of determinacy, forcing axioms and the nonstationary ideal. Berlin, New York, de Gruyter. 105. Woodin. (2001). The continuum hypothesis. Parts I and II. Notices of the AMS, 48(6–7), 567–576, 681–690. 106. Wójtowicz, K. (2009). Theory of quantum computation and philosophy of mathematics (I). Logic and Logical Philosophy, 18(3–4), 313–332. 107. Wójtowicz, K. (2014). The physical version of Church’s thesis and mathematical knowledge. In A. Olszewski, B. Bro˙zek, & P. Urba´nczyk (Eds.), Church’s thesis: Logic, mind nature (pp. 417–431). Kraków: Copernicus Center Press. 108. Wójtowicz, K. (2019). Theory of quantum computation and philosophy of mathematics (II). Logic and logical philosophy, 29(1), 173–193. 109. Wüthrich, C. (2015). A quantum-information-theoretic complement to a general-relativistic implementation of a beyond-Turing computer. Synthese, 192, 1989–2008.

Krzysztof Wójtowicz is a Polish philosopher of mathematics. He graduated with an MSc in Mathematics from the University of Warsaw; in 1998 he received a Ph.D. in Philosophy at the University of Warsaw and has been affiliated with this university throughout his academic career (becoming a full professor in 2013). His scientific interests focus on philosophy of mathematics, in particular on the realism-antirealism debate, the problem of mathematical explanations in science, and non-standard models of computation. He is also interested in the probabilistic description of conditionals in natural language.

Part II

Algebraic Logic

Chapter 8

Generalized Quantifiers Meet Modal Neighborhood Semantics

Johan van Benthem and Dag Westerståhl

Abstract From a mathematical perspective, neighborhood models for modal logic are generalized quantifiers, parametrized to points in the domain of objects/worlds. We explore this analogy further, connecting generalized quantifier theory and modal neighborhood logic. In particular, we find interesting analogies between conservativity for linguistic quantifiers and the locality of modal logic, and between the role of invariances in both fields. Moreover, we present some new completeness results for modal neighborhood logics of linguistically motivated classes of generalized quantifiers, and raise new types of open problems. With the bridges established here, many further analogies might be explored between the two fields to mutual benefit.

Keywords Modal logic · Generalized quantifier theory · Neighborhood semantics · Permutation invariance · Conservativity · Monotonicity

We dedicate this paper to Hajnal and István, in friendship and admiration.

J. van Benthem: Institute for Logic, Language and Computation, University of Amsterdam, Science Park 904, 1090 GE Amsterdam, The Netherlands; Department of Philosophy, Stanford University, Stanford, CA, USA; School of Humanities, Tsinghua University, Beijing, China
D. Westerståhl: Department of Philosophy, Stockholm University, 106 91 Stockholm, Sweden; School of Humanities, Tsinghua University, Beijing, China

© Springer Nature Switzerland AG 2021. J. Madarász and G. Székely (eds.), Hajnal Andréka and István Németi on Unity of Science, Outstanding Contributions to Logic 19, https://doi.org/10.1007/978-3-030-64187-0_8

8.1 Introduction: Quantifiers and Neighborhoods

Hajnal Andréka and István Németi have long been leaders in the algebraic study of the foundations of logic. In particular, the high abstraction levels provided by their expertise in algebra, with a judicious influx of model-theoretic ideas from modal logic, have led to new perspectives on well-established core systems of the field such as first-order predicate logic. A case in point are the decidable guarded fragment and the generalized first-order semantics found in Andréka et al. [2]. In a recent paper (Andréka et al. [3]), triggered by work of Aldo Antonelli on weak first-order logics, this abstraction was taken one step further, viewing existential and universal quantifiers as arbitrary generalized quantifiers that can be parametrized to objects or tuples of objects in a model.

The authors of the latter paper suggest that this generalized quantifier perspective can be fruitfully compared to neighborhood semantics in modal logic, a generalization of the standard modal semantics with accessibility relations Rst between points, that is finding ever more uses. A neighborhood model is a structure M = (W, N, V) with W a non-empty set of worlds, V a valuation mapping proposition letters to subsets of W, while the neighborhood relation N s, X is a binary point-to-set relation connecting each point s to its neighborhoods X. Thus, a neighborhood relation is exactly the same as a point-parametrized unary generalized quantifier, in a precise sense to be explained below.

In this small piece, we explore this suggested analogy a little bit further by comparing three topics at the interface of generalized quantifier theory in linguistics and modal neighborhood logic. The attraction in pursuing this interface is not just technical, but also involves a meeting of different cultures. Intuitions about modal logic are often epistemic, temporal, or computational [31], whereas many of the intuitions underlying generalized quantifier theory involve connections with appealing empirical observations about natural language [19].

Incidentally, the above structural connection can be viewed in two ways.
As described so far, a neighborhood frame (W, N) has the type ⟨e, ⟨⟨e, t⟩, t⟩⟩, where e is the type of entities, t that of truth values, and ⟨a, b⟩ that of all functions from a-type entities to b-type ones. That is, N is a relation between entities and sets of entities, or, equivalently, a function from entities to sets of sets of entities (its family of ‘neighborhoods’). Next, a unary generalized quantifier on a domain W is simply a set of subsets of W. Thus, it is exactly what we will call a uniform neighborhood frame, meaning that N is a constant function. Ordinary neighborhood frames are then point-parameterized unary generalized quantifiers.¹ Now, this type is isomorphic to another, viz.²:

⟨⟨e, t⟩, ⟨e, t⟩⟩

¹ Admittedly, point-parameterized generalized quantifiers have not been studied much in the GQ literature, and in this paper we mostly restrict attention to standard generalized quantifiers, i.e. to uniform neighborhood frames.
² This equivalence can be derived formally in categorical logics such as the Lambek Calculus.

This shift to a new, though equivalent, type makes our quantifiers (point-parameterized or not) into functions on the power set of the domain of primitive entities: shifting (W, N) to (W, F), where

F(X) = {s ∈ W : N s, X}

Taking this functional setting would bring our study closer to algebraic semantics for modal logics, involving a Boolean algebra with additional operators about which less or more can be assumed (monotonicity, intersection closure, and other special properties are well-known extras that will return later on in this article); cf. Blackburn et al. [7], Venema [36] for details and references on modal algebra. We will shift to the functional perspective occasionally in what follows, but the relational one will be our main vehicle.

The three topics that we will highlight in this paper concern basic features of the two realms to be connected. The first topic is the well-known locality of modal logics, where evaluation takes place in a local environment of the current point, viewed in tandem with the equally well-known conservativity of quantifiers in natural language, that come with restrictions to relevant subdomains of a whole model. The second topic is the role of fundamental invariance relations in both fields that constrain permissible denotations: notions of bisimulation in modal logic, and of permutation or isomorphism invariance for quantifiers. Our third topic concerns the core business of much of modal logic: finding new complete axiomatic systems for specific generalized quantifiers, now taking inspiration from generalized quantifier theory.

In each case, we show how to make the comparison, point out which analogies are fruitful and which ones less so, we state a number of new observations, referring to current literature for deeper developments, and we end with an assessment and further ways to go. In all of this, our main emphasis is on raising new kinds of questions, though we back them up by some new observations throughout.
We assume that the reader knows the basics of both fields to be connected. For a survey of generalized quantifier theory in logic and linguistics, we refer to Peters and Westerståhl [19], and for an up-to-date introduction to modal neighborhood semantics, to the recent textbook Pacuit [17].
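As a small illustration of the type shift from (W, N) to (W, F) described in this section, the following Python sketch converts between the two presentations on a finite frame. The frame itself and the helper names (`powerset`, `to_functional`, `to_relational`) are assumptions made for this example only; they do not come from the paper.

```python
from itertools import combinations


def powerset(domain):
    """All subsets of a finite domain, as frozensets."""
    items = sorted(domain)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def to_functional(W, N):
    """Type shift (W, N) -> (W, F), with F(X) = {s in W : N s, X}."""
    return {X: frozenset(s for s in W if X in N[s]) for X in powerset(W)}


def to_relational(W, F):
    """Inverse shift: N s, X holds iff s belongs to F(X)."""
    return {s: {X for X, value in F.items() if s in value} for s in W}


# Hypothetical three-point frame; N maps each point to its family of neighborhoods.
W = {0, 1, 2}
N = {
    0: {frozenset({0, 1}), frozenset({0, 1, 2})},
    1: {frozenset({1})},
    2: set(),
}

F = to_functional(W, N)
N_back = to_relational(W, F)
```

On this frame the two shifts are mutually inverse, which is the finite shadow of the type isomorphism mentioned above.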

8.2 Locality and Conservativity

We start by recalling a few basic notions. The language of modal propositional logic has formulas constructed using proposition letters, Boolean connectives, and modal operators □, ♦. In relational models M = (W, R, V), a formula □ϕ is true at point s (written M, s |= □ϕ) if ϕ is true at all t with Rst, and ♦ϕ is the existential dual. The same modal language can also be interpreted over neighborhood models M = (W, N, V), where □ϕ is now true at point s if N s, {t ∈ W | M, t |= ϕ}. In this case, we speak of modal neighborhood logic, whose semantics also has an alternative ‘monotone’ version, explained below.
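The neighborhood truth condition for □ϕ just stated can be sketched as a tiny evaluator. This is only a minimal illustration: the tuple encoding of formulas, the function names, and the three-world model are assumptions for the example, not notation from the paper.

```python
def truth_set(model, formula):
    """Return the set of worlds where `formula` holds (neighborhood semantics)."""
    W, N, V = model
    kind = formula[0]
    if kind == "atom":
        return frozenset(s for s in W if s in V.get(formula[1], set()))
    if kind == "not":
        return frozenset(W) - truth_set(model, formula[1])
    if kind == "and":
        return truth_set(model, formula[1]) & truth_set(model, formula[2])
    if kind == "box":
        # Box phi holds at s iff the truth set of phi is itself a neighborhood of s.
        target = truth_set(model, formula[1])
        return frozenset(s for s in W if target in N[s])
    raise ValueError(f"unknown connective: {kind!r}")


def diamond(phi):
    """Diamond as the dual of box."""
    return ("not", ("box", ("not", phi)))


# Hypothetical model: worlds 0, 1, 2; p holds at 1 and 2.
W = {0, 1, 2}
N = {0: {frozenset({1, 2})}, 1: {frozenset({1})}, 2: set()}
V = {"p": {1, 2}}
model = (W, N, V)

p = ("atom", "p")
box_p = truth_set(model, ("box", p))
dia_p = truth_set(model, diamond(p))
```

Here □p holds only at world 0: world 1 has the neighborhood {1}, which is included in the truth set of p but not equal to it.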


8.2.1 Locality in Modal Semantics

Standard modal logic satisfies locality, which is often taken to be a characteristic property of modal languages in general, cf. Blackburn et al. [7]. Technically, evaluating a modal formula ϕ at a point s in a model M = (W, R, V) yields the same truth value as evaluating ϕ at s in any submodel of M that contains s and is closed under taking R-successors. This locality property extends to generalized topological semantics [30]: a modal formula ϕ is true at s in M iff ϕ is true at s in any restriction M|O of the model M to an open set O containing s. Here a topological frame is a neighborhood frame (W, N) where, for some topology on W, we have N s, X iff for some open set Y, s ∈ Y ⊆ X.³

Now both facts are instances of a more general observation in Bonnay and Westerståhl [9]. First, we need some definitions. Let M = (W, N, V) be a neighborhood model, and let A be a subset of W. The submodel M|A = (A, N_A, V_A) arises by restricting the domain of M to A, letting the new neighborhoods for a point s be {X ∩ A : N s, X}, cf. van Benthem and Pacuit [33]. Modal formulas need not preserve their truth values in passing to arbitrary submodels, but there exist special submodels where they do. A generated submodel of M is a submodel M|A satisfying the following equivalence:

for all s ∈ A and all X, N s, X iff N_A s, X ∩ A

Fact 8.1 If M|A is a generated submodel of M, then for all s ∈ A and all formulas ϕ, M, s |= ϕ iff M|A, s |= ϕ.

Proof By a straightforward induction on modal formulas. To show how the above definition works, here is the inductive case for the modal operator □ψ. We let [[ϕ]]_M = {s ∈ W : M, s |= ϕ}. For s ∈ A, we have the following equivalences:

M|A, s |= □ψ iff N_A s, [[ψ]]_{M|A}
             iff N_A s, [[ψ]]_M ∩ A    (induction hypothesis)
             iff N s, [[ψ]]_M          (generated submodel)
             iff M, s |= □ψ. ∎

Now, it is easy to see that in the case of the relational semantics, the above notion of generated submodel is exactly the usual one. Likewise, in the topological semantics, the subsets which yield generated submodels are precisely the open sets. Thus, these two are special instances of our Fact.

³ In the above-mentioned functional version of neighborhood models, a topological frame would be of the form (W, F), where F is the interior function of the topology.
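The generated-submodel condition can be tested by brute force on finite frames. The sketch below builds the neighborhood frame induced by a relation (N s, X iff every R-successor of s lies in X) and checks the condition for several subsets A. The relation R and all function names are hypothetical choices for illustration.

```python
from itertools import combinations


def powerset(domain):
    """All subsets of a finite domain, as frozensets."""
    items = sorted(domain)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def relational_frame(W, R):
    """Neighborhood frame induced by R: N s, X iff all R-successors of s are in X."""
    successors = {s: frozenset(t for (u, t) in R if u == s) for s in W}
    return {s: {X for X in powerset(W) if successors[s] <= X} for s in W}


def restrict(N, A):
    """Neighborhoods of the submodel on A: N_A s = {X & A : N s, X}."""
    A = frozenset(A)
    return {s: {X & A for X in N[s]} for s in A}


def is_generated(W, N, A):
    """Check: for all s in A and all X, N s, X iff N_A s, X & A."""
    A = frozenset(A)
    N_A = restrict(N, A)
    return all((X in N[s]) == ((X & A) in N_A[s]) for s in A for X in powerset(W))


# Hypothetical accessibility relation: 0 -> 1, 1 -> 1, 2 -> 0.
W = {0, 1, 2}
R = {(0, 1), (1, 1), (2, 0)}
N = relational_frame(W, R)
```

In this example the sets {0, 1} and {1} are closed under R-successors and pass the test, while {1, 2} and {0} are not closed and fail it, in line with the remark that the relational case yields the usual notion of generated submodel.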


It is time to highlight a distinction between two versions of the semantics for modal neighborhood logic. The proof above uses the precise semantics, where

M, s |= □ϕ iff N s, [[ϕ]]_M

However, there is also a monotone semantics, with the truth condition

M, s |=_m □ϕ iff there is X such that N s, X and X ⊆ [[ϕ]]^m_M

In the topological semantics for modal logic, and in the standard relational semantics, the two versions are equivalent. But in general, when the families {X ⊆ W | N s, X} need not be closed under supersets, |= and |=_m may differ. Interestingly, though, the distinction does not matter for the current topic:

Fact 8.2 Invariance for generated submodels holds under the monotone semantics as well.

Proof An analogous inductive argument goes through for |=_m too. ∎
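To see the two truth conditions come apart, the following sketch applies both versions of the box to a set X of worlds (standing in for a truth set) on a small frame whose neighborhood families are not closed under supersets. The frame and the function names are assumptions for illustration.

```python
from itertools import combinations


def powerset(domain):
    """All subsets of a finite domain, as frozensets."""
    items = sorted(domain)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def box_precise(W, N, X):
    """Precise reading: s qualifies iff X itself is a neighborhood of s."""
    X = frozenset(X)
    return {s for s in W if X in N[s]}


def box_monotone(W, N, X):
    """Monotone reading: s qualifies iff some neighborhood of s is included in X."""
    X = frozenset(X)
    return {s for s in W if any(Y <= X for Y in N[s])}


def upward_closure(W, N):
    """Close every neighborhood family under supersets within W."""
    return {s: {X for X in powerset(W) if any(Y <= X for Y in N[s])} for s in W}


# Hypothetical frame whose families are not superset-closed.
W = {0, 1, 2}
N = {0: {frozenset({1})}, 1: {frozenset({1}), frozenset({1, 2})}, 2: set()}

precise = box_precise(W, N, {1, 2})
monotone = box_monotone(W, N, {1, 2})
N_up = upward_closure(W, N)
```

On the superset-closed frame `N_up` the two readings coincide for every X, which matches the observation that the difference only arises when the families are not closed under supersets.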



While the precise semantics is the absolute minimum needed for algebraic semantics, there are some technical complications in its theory (cf. [17]), and the monotone semantics is the one that occurs in most applications. Hence we will use the latter in the sequel, unless explicitly indicated otherwise.

In standard relational semantics, there is a smallest local environment for which the above invariance holds, namely, the smallest generated submodel, whose domain consists of the point s and all points reachable from it in a finite number of R-steps. This is no longer true in the topological semantics, where open sets can decrease without limit, and a fortiori, this uniqueness does not hold in neighborhood semantics. More precisely, given (W, N) and s ∈ W, let

A_s = ⋂ {A ⊆ W : s ∈ A and (A, N_A) is a generated subframe of (W, N)}

If (A_s, N_{A_s}) is itself a generated subframe of (W, N), then it is the smallest such subframe containing s. But it need not be a generated subframe of (W, N):

Counter-example. Consider the neighborhood frame (ℕ, N), where for n > 0, N n, X holds iff n ∈ X, but N 0, X holds iff 0 ∈ X and X is co-finite. Since for all k, the set of X ⊆ ℕ such that N k, X holds is a filter, even the minimal normal modal logic K is sound for (ℕ, N), but it is easily checked that (A_0, N_{A_0}) is not a generated subframe according to the above definition.

Indeed, imposing the existence of smallest neighborhoods in topological semantics takes us back to relational semantics. This is the well-known sense in which relational models correspond to special topological ‘Alexandrov spaces’ (cf. [30]). But here is a slightly different take in our current terms. In relational semantics, let R_s denote the set of R-successors of s. The following fact can be found in Bonnay and Westerståhl [9]:


Fact 8.3 If (W, N) is a topological neighborhood frame such that for all s ∈ W, (A_s, N_{A_s}) is a generated subframe of (W, N), then (W, N) is in fact a relational frame, in the sense that there is a unique relation R ⊆ W² such that N s, X iff R_s ⊆ X.

Thus, neighborhood modalities may have a variety of local environments in a model, and under special circumstances, even a smallest one.

8.2.2 Conservativity and Domain Restriction for Quantifiers

A binary quantifier on a domain W is a binary relation between subsets of W. These appear in English as the interpretations of determiner phrases, such as “every”, “no”, “most”, “exactly four”, “all but three”, “more than two-thirds of the”, and of adverbs like “always”, “usually”, “never”. Other languages may use different linguistic constructions for these purposes, but it is a well-attested empirical fact that quantifiers across human languages all satisfy the following equivalence (cf. [19]):

Conservativity    Q AB iff Q A(A ∩ B)

The appeal of conservativity, and perhaps even an explanation of its ubiquity, stems from its role in facilitating efficient ‘local’ processing: in understanding an expression, we can restrict attention to various smaller relevant subdomains. This processing restriction can also occur with unary generalized quantifiers Q B (think of expressions like “everyone”, “someone”), where Q merely denotes some family of subsets of the whole domain of individuals.

Now, this resembles what we saw for modal logic. Barwise and Cooper [6] say that a unary quantifier Q lives on a set A (which can be taken to be a subset of the domain) if for all subsets B of the domain, Q B iff Q (B ∩ A). In these terms, a binary quantifier Q′ is conservative iff for every subset A of the domain, the restriction of Q′ to A (the unary quantifier obtained by fixing the first argument to A) lives on A. This is exactly the clause we used in defining generated submodels. The domains of such models are live-on sets.

But there is more to the comparison. In the modal case, the restriction to the local environment A is drastic: we ignore W − A and evaluate all formulas inside the submodel M|A. Thus, we perform what in logic is called a relativization of the model. But this is not the intent of Conservativity. When we say “All fans wore a cap”, we are merely saying that all fans were fans who were cap wearers, not that all fans wore a cap that was itself a fan: the embedded quantifier “a cap” can look outside of the domain of fans.⁴

⁴ The difference between relativization and conditionalization is ubiquitous. For a discussion in the setting of dynamic-epistemic logics of model update, cf. van Benthem [26].
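Conservativity and the live-on property can be checked exhaustively on a small finite domain. In the sketch below, the readings of “every”, “no”, and “most” (taken here as “more than half”) and the non-conservative cardinality comparison `equinumerous` are illustrative assumptions, as are all function names.

```python
from itertools import combinations


def powerset(domain):
    """All subsets of a finite domain, as frozensets."""
    items = sorted(domain)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def every(A, B):
    return A <= B


def no(A, B):
    return not (A & B)


def most(A, B):
    # One common reading: more As are Bs than are not.
    return len(A & B) > len(A - B)


def equinumerous(A, B):
    # Compares cardinalities only; used here as a non-conservative contrast.
    return len(A) == len(B)


def is_conservative(Q, W):
    """Check Q AB iff Q A(A & B) for all A, B included in W."""
    subsets = powerset(W)
    return all(Q(A, B) == Q(A, A & B) for A in subsets for B in subsets)


def lives_on(unary_Q, A, W):
    """Check that the unary quantifier lives on A: Q B iff Q (B & A)."""
    return all(unary_Q(B) == unary_Q(B & A) for B in powerset(W))


def restrictions_live_on(Q, W):
    """For every A, the restriction of Q to A should live on A."""
    return all(lives_on(lambda B, A=A: Q(A, B), A, W) for A in powerset(W))


W = {0, 1, 2, 3}
results = {Q.__name__: is_conservative(Q, W) for Q in (every, no, most, equinumerous)}
```

The helper `restrictions_live_on` spells out the reformulation in terms of live-on sets; on each of the four sample quantifiers it returns the same verdict as `is_conservative`.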


Even so, “a cap” will impose its own form of conservativity: both quantifiers impose restrictions on arguments of the transitive verb “x wears y”, where x gets restricted to fans, and y to caps. The analogue for modal logic might be iterated modalities referring to different points with different neighborhoods, where live-on sets can be different at different points. But of course, the basic modal language has no explicit syntactic counterpart for binary relational atoms.⁵

One way of highlighting the difference between the two frameworks focuses on the binary nature of linguistic quantifiers Q AB. Its immediate reflection in a modal neighborhood language would not be a unary modality □ϕ but rather a binary modality □ϕψ referring to a binary neighborhood relation N s, X Y in a straightforward manner. Such binary modalities have been little used so far, though they occur with weak implicational logics, cf. de Jongh and Shirmohammadzadeh [10]. In an implicational setting, we often think of an antecedent ϕ as restricting attention to the submodel (not necessarily generated) consisting of all points in the model satisfying ϕ, while the consequent ψ is evaluated in that set by reference to the whole model. Indeed, several types of conditional satisfy Conservativity, such as classical entailment saying that ψ is true in all ϕ-worlds, or counterfactual entailment saying that ψ is true in the closest ϕ-worlds to the current one.⁶ But Conservativity can fail for more complex assertions. A temporal conditional ϕ ⇒ ψ saying that after every instance of ϕ, there will be a later instance of ψ, supports no valid implication to ϕ ⇒ (ϕ ∧ ψ).

Thus, it seems of interest to determine what the conservativity constraint means in a modal language, or indeed, any logical language. Can we tell from the syntactic logical form of a statement when this behavior occurs? We give two versions of such a result, which may be viewed as model-theoretic preservation theorems of a somewhat unusual kind.
⁵ For non-iterated modalities, however, the relativizations of unary quantifiers are exactly the binary quantifiers satisfying Conservativity and the Extension property to be defined later.
⁶ A generalized quantifier perspective on conditionals was proposed in van Benthem [22].

First consider the language of propositional logic. Say that a formula ϕ(p, q), with perhaps also other proposition letters beyond p, q, is p-conservative in q if it satisfies the following validity:

|= ϕ(p, q) ↔ ϕ(p, p ∧ q)

Fact 8.4 A formula θ is p-conservative in q iff it can be defined by a formula of the form ϕ ∨ (p ∧ ψ(q)), where ϕ contains no occurrences of q, and p does not occur in ψ.

Proof Obviously, formulas of this form are p-conservative in q. Conversely, if θ is p-conservative in q, consider any disjunctive normal form θ′ of θ. Applying the valid equivalence θ′(p, q) ↔ θ′(p, p ∧ q), we replace q by p ∧ q throughout. In this way, disjuncts where the literal ¬p occurs together with q become contradictions and can be dropped up to logical equivalence. Next, in disjuncts where the literal ¬p occurs together with ¬q, the new conjunct ¬(p ∧ q) can be dropped, since it already follows from ¬p, leaving no occurrence of q. Finally, all disjuncts in which p appeared remain the same up to logical equivalence after the substitution. The resulting, still equivalent, formula is easily seen to be logically equivalent to a disjunction of a q-free formula plus a formula where we can put p in front, followed by a p-free disjunction. ∎

We can extend this style of analysis to richer logical languages. A very simple illustration, still allowing for the use of normal forms, is this. Say that a formula in monadic first-order logic is P-conservative in Q if replacing occurrences of a subformula Qx by Px ∧ Qx, for any variable x, results in an equivalent formula.

Fact 8.5 A formula θ in monadic first-order logic is P-conservative in Q iff it is definable by a Boolean combination of (a) Q-free formulas, (b) formulas ∃x (Px ∧ ψ) where ψ is P-free and has no other free variables than x, and (c) for each variable u, a formula as in Fact 8.4 (with p replaced by Pu, q by Qu, etc.).

Proof Again, the direction from the special forms to Conservativity is obvious. In the other direction let, for simplicity, the relevant unary predicates be just P, Q, R. A ‘state description of x’ is a formula of the form (¬)Px ∧ (¬)Qx ∧ (¬)Rx. θ is equivalent to a Boolean combination of atomic formulas Px, Qy, etc. and formulas of the form ∃x sd(x), where sd(x) is a state description of x; we can assume that the bound variable is the same in each. Treating all these as atoms, write the formula in disjunctive normal form; we obtain a formula θ′ equivalent to θ which is still P-conservative in Q. Now replace all occurrences of Qu, for all variables u, by Pu ∧ Qu. In each disjunct of the normal form, we can apply the same argument as for Fact 8.4, both inside the quantified conjuncts and among conjuncts which are state descriptions sd(u). In particular, inspecting the four relevant types of object: (a) ∃x(Px ∧ Qx ∧ ...) and (b) ∃x(Px ∧ ¬Qx ∧ ...)
are unchanged up to logical equivalence after the substitution, (c) ∃x(¬Px ∧ Qx ∧ ...) becomes the contradictory ∃x(¬Px ∧ Px ∧ Qx ∧ ...) and can be dropped, and the final substitution (d) ∃x(¬Px ∧ ¬(Px ∧ Qx) ∧ ...) is equivalent to ∃x(¬Px ∧ ψ) with ψ Q-free, yielding a Q-free formula overall. ∎

More challenging cases for determining the precise syntactic impact of conservativity would be the language of basic modal logic over relational models, or all of first-order logic. For the latter, we conjecture that it suffices to mark each occurrence of a quantifier by a unique bound variable, and then to make sure that each occurrence of an atom Qx has its variable governed by a relativized quantifier ∃x(Px ∧ ...) or ∀x(Px → ...).⁷ This syntactic class allows any Q-free formulas, and it is closed under all Booleans and quantifiers. However this may be, our simple observations may already have shown how notions from generalized quantifier theory can generate new types of model-theoretic question in the modal and first-order realm.

⁷ In the modal case, without explicit variable binding, this description requires more care. The syntactic governing has to be done at the right level of modal operator depth, and there are further combinatorial complexities. For instance, the modal formula □(p ∨ r) ∧ □(q ∨ r) is p-conservative in q since it is equivalent (in the normal modal logic K) to □(r ∨ (p ∧ q)).
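For the propositional case, p-conservativity in q as defined for Fact 8.4 can be tested by truth tables. The sketch below is a brute-force check, not a rendering of the proof; the sample formulas and all names are assumptions chosen for illustration, with formulas represented as Python functions of p, q and one further letter r.

```python
from itertools import product


def is_p_conservative_in_q(formula, other_letters=()):
    """Check the validity formula(p, q) <-> formula(p, p and q) by truth tables."""
    for p, q, *rest in product([False, True], repeat=2 + len(other_letters)):
        env = dict(zip(other_letters, rest))
        if formula(p, q, **env) != formula(p, p and q, **env):
            return False
    return True


def in_normal_form(p, q, r):
    # Shape phi or (p and psi(q)): phi = (r or not p) is q-free, psi = (q xor r) is p-free.
    return (r or not p) or (p and (q != r))


def boolean_core_of_footnote(p, q, r):
    # (p or r) and (q or r), the Boolean pattern under the boxes in footnote 7.
    return (p or r) and (q or r)


def not_conservative(p, q, r):
    # q or r: replacing q by (p and q) changes the value when p is false and q is true.
    return q or r


checks = {
    f.__name__: is_p_conservative_in_q(f, ("r",))
    for f in (in_normal_form, boolean_core_of_footnote, not_conservative)
}
```

The second sample is equivalent to r ∨ (p ∧ q), which has the shape required by Fact 8.4 with ϕ = r and ψ(q) = q.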


Coda and Caveat. Generalized Quantifier Theory also has other intuitions concerning locality. Let us view a quantifier Q as a functor assigning to each model M a set Q_M of subsets of the domain. Then the following well-known constraint on how the functor Q assigns its values says intuitively (in the binary case) that a quantifier Q AB only cares about the two sets it is comparing:

Extension    For all models M, N with A ∪ B included in both domains, Q_M AB iff Q_N AB

Here we hit a difficulty in comparing the two sides of this paper, namely, generalized quantifier theory and neighborhood semantics for modal logic. The Extension constraint is harder to phrase in modal logic, or in logical systems in general. It requires a regimented class of models where the structure assigned to models is constrained. This is not the thinking in modal neighborhood semantics, where models can have arbitrary neighborhood relations. Instead, Extension says that, when models are related in some way (by the submodel relation, or perhaps another important cross-model relation), the quantifier structures in these models have to be similar.⁸ The intuition underlying this condition seems to be a coherence constraint: all models in the class are compatible, and are fragments of one big supermodel. This is not normally assumed in modal logic, where validity is a case-by-case notion.⁹

It should be noted, however, that on the generalized quantifier side, Extension does have a clear syntactic counterpart. Say that a sentence ϕ = ϕ(P, ...) is P-restricted if for all M, M |= ϕ ⇔ M|P^M |= ϕ. This is stronger than P-conservativity: a P-restricted sentence is P-conservative in all other unary predicate symbols Q occurring in it, not just a specific one. Now it is immediate that ϕ is P-restricted iff it is equivalent to ψ^(P) (the relativization of ψ to P) for some P-free sentence ψ (just replace Px by x = x in ϕ and relativize).
We leave this topic open-ended here, though the mismatch will return with our next topic. A deeper connection may well have to be category-theoretic.

⁸ To find analogues for this in the modal realm, one might require, say, that accessibility relations should be the same in the overlap of any two models in one’s model class.
⁹ It might be of interest, however, to see what restrictions like this mean in the setting of modal logics for model-changing operations, cf. Aucher et al. [4].

8.3 Invariance and Simulation

8.3.1 Modal Logic and Invariance

Like many logical systems, modal logic allows for a semantic invariance analysis of its expressive power, cf. Blackburn et al. [7]. The basic notion of structural invariance that fits the expressive power of the modal language on neighborhood models comes in several varieties. We will couch the following discussion in terms of the monotone truth condition stated above for □ϕ.


Let M = (W, N, V) and M′ = (W′, N′, V′). A binary relation Z ⊆ W × W′ is a bisimulation between M and M′ if

(a) if s Z t, then s ∈ V(p) ⇔ t ∈ V′(p) for all atoms p,
(b) if s Z t and N s, X, then there is a set Y with N′ t, Y such that ∀y ∈ Y ∃x ∈ X x Z y; and vice versa starting from s Z t and N′ t, Y.

Fact 8.6 Modal formulas are invariant for bisimulation.
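Clauses (a) and (b) can be checked mechanically on finite models. In the sketch below, the ‘vice versa’ half of (b) is read as: for every Y with N′ t, Y there is an X with N s, X such that every x ∈ X is Z-related to some y ∈ Y. The two models, the candidate relations, and the function name are hypothetical.

```python
def is_bisimulation(Z, M1, M2):
    """Check clauses (a) and (b) for a relation Z between the worlds of M1 and M2."""
    (W1, N1, V1), (W2, N2, V2) = M1, M2
    atoms = set(V1) | set(V2)
    for s, t in Z:
        # (a) related points agree on all atoms.
        if any((s in V1.get(p, set())) != (t in V2.get(p, set())) for p in atoms):
            return False
        # (b) forth: each neighborhood X of s is matched by some neighborhood Y of t
        # such that every y in Y has a Z-predecessor in X.
        for X in N1[s]:
            if not any(all(any((x, y) in Z for x in X) for y in Y) for Y in N2[t]):
                return False
        # (b) back: the same requirement with the roles of the two models swapped.
        for Y in N2[t]:
            if not any(all(any((x, y) in Z for y in Y) for x in X) for X in N1[s]):
                return False
    return True


# Hypothetical models: world 1 of M1 is "split" into worlds b and c in M2.
M1 = ({0, 1}, {0: {frozenset({1})}, 1: set()}, {"p": {1}})
M2 = (
    {"a", "b", "c"},
    {"a": {frozenset({"b"}), frozenset({"b", "c"})}, "b": set(), "c": set()},
    {"p": {"b", "c"}},
)

Z_good = {(0, "a"), (1, "b"), (1, "c")}
Z_bad = {(0, "b")}  # violates clause (a): p fails at 0 but holds at b
```

With these inputs, `Z_good` passes both clauses while `Z_bad` already fails the atomic clause.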



The proof is by induction on modal formulas, where the inductive case for □ϕ mirrors exactly the back-and-forth clause of bisimulation. The fact explains our earlier observation about generated submodels. The identity is a bisimulation between the full model M and the generated submodel M|A.

In logic, invariance is connected with definability and the genesis of language [25]. Once we have invariant structure, the issue arises of languages expressively complete for defining it.¹⁰ A typical result in this vein is the theorem saying that the first-order formulas in a signature with binary R and unary P, Q, ... having one free variable x that are invariant for bisimulation on relational models are precisely those definable in the basic modal language. For an exposition of this result and the literature around it, see Blackburn et al. [7]. The theorem was lifted to the neighborhood setting in Pauly [18], using two-sorted first-order models with point and set objects, with an abstract neighborhood relation N and membership relation E.¹¹

Fact 8.7 The formulas in a two-sorted first-order language for points and sets that are invariant for bisimulation are precisely those that are definable in the modal propositional language with the monotone neighborhood semantics.

Remark. The above result is not the only way of phrasing the issue. Given that bisimulation is also about preserving relational structure, one may ask for a richer notion of invariance with respect to relations, not just properties of points. For standard relational models, one such additional requirement is ‘safety for bisimulation’ [24], whose definition we forego here. The point is that, in such an extended language, we can now determine which definable operations on relations preserve bisimulation, and the answer for the first-order language is: the operations of regular algebra without iteration. Bisimulation-safe operations on neighborhood relations have been studied in Pauly [18].

Remark. The above analysis extends to the precise neighborhood semantics for the modal language. In particular, the above notion of bisimulation can be modified to deal with the precise semantics for □ϕ defined earlier. But the solution involves several non-trivial subtleties: cf. Hansen et al. [13], and the detailed explanations and references in Pacuit [17].¹²,¹³

¹⁰ Bisimulation is not the only structural relation between models for which this makes sense. One can also use isomorphism, potential isomorphism, or many other equivalence relations.
¹¹ Here we work in Henkin-style, without assuming that the set objects are a full power set.
¹² The solution is not just a symmetric version of the above back-and-forth clause between the two matching neighborhoods. The latter rather fits the language of ‘instantial neighborhood logic’ [34].
¹³ For an analysis of a related challenge to bisimulation, cf. Baltag and Cinà [5].

8 Generalized Quantifiers Meet Modal Neighborhood Semantics


8.3.2 Invariance and Generalized Quantifiers

Prima facie, the same thinking applies to the realm of generalized quantifiers. There is a standard constraint on quantifiers which seems exactly in the same spirit, namely invariance for permutations or even isomorphism between models:

Isomorphism Invariance If F is a bijection between the domains of two models M and N, then for all A, B, Q_M AB iff Q_N F[A]F[B].

In logic, isomorphism invariance is basic, but also permissive: many notions pass this test. So, what if we make the test stricter, replacing isomorphism by the rougher relation of bisimulation (cf. [32] for a systematic study)? Do we get a much narrower class of generalized quantifiers? Here is one way of formulating the constraint, where we lift the relation of bisimulation to a function between sets:

Bisimulation Invariance If Z is a bisimulation between the domains of models M, N, then for all sets A, B ⊆ M and C, D ⊆ N, (a) Q_M AB iff Q_N Z[A]Z[B], (b) Q_N CD iff Q_M Z⁻¹[C]Z⁻¹[D].14

This condition seems plausible, and it suggests an elegant functional reformulation of the notion of bisimulation between neighborhood models. What it says is that the bisimulation relation Z, viewed as a function, and the neighborhood relation corresponding to a unary quantifier Q, viewed as a function from sets to sets by the type shift of Sect. 8.1, commute. The match is not precise, as will be clear by spelling out some details, but here is an intriguing point where we find a connection with co-algebra, cf. Hansen et al. [13], Jacobs [14]. We do not pursue this line here, however, because there is also a problem. Our aim was to classify generalized quantifiers as linguistic constructions in terms of invariance. But in the preceding scenario, the neighborhood relation corresponding to the quantifier is not a logical construction, but a relation among objects and individuals. Its invariance is the starting point, the same way in which ‘structure-preserving’ transformations of models take the invariance of the relevant atomic structure for granted. What one studies then is which further constructions over this base vocabulary define invariant notions. The difference that gets in the way of a direct comparison is as with our discussion of Extension. The Isomorphism constraint assumes a functor assigning quantifiers to domains, a setting different from that of modal logic and model theory.15 Even with this tension, several interesting things can be asked about generalized quantifiers on the analogy with modal logic.
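The set-lifting of a relation Z used in the Bisimulation Invariance condition can be made concrete with a minimal Python sketch. The relation and the sets below are hypothetical illustrations, not taken from the text; the point is only that Z⁻¹[Z[A]] need not return A, which is why the condition is stated with two clauses.

```python
def image(Z, A):
    """Z[A] = {c : there is a in A with a Z c}."""
    return {c for (a, c) in Z if a in A}


def preimage(Z, C):
    """Z^{-1}[C] = {a : there is c in C with a Z c}."""
    return {a for (a, c) in Z if c in C}


# Hypothetical relation between the domains {1, 2} and {"x", "y"}.
Z = {(1, "x"), (1, "y"), (2, "y")}
A = {2}

# Going forth and back along Z can enlarge the set we started from.
round_trip = preimage(Z, image(Z, A))
```

Here `round_trip` contains both 1 and 2, although A contained only 2.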

14 We use the double clause since the maps Z and Z⁻¹ induced by the relation Z (i.e. Z[A] = {c : ∃a ∈ A a Z c} and Z⁻¹[C] = {a : ∃c ∈ C a Z c}) are not in general inverses.
15 Of course, this is a natural setting, witness the extensive literature on permutation invariance and logicality (e.g. [8, 12, 29]).


J. van Benthem and D. Westerståhl

One issue is the earlier notion of safety. Given generalized quantifiers that satisfy some invariance across models, say, with respect to some notion of bisimulation, which operations on these quantifiers give rise to new defined quantifiers supporting invariance for that bisimulation? In this perspective, attention would shift to algebras of generalized quantifiers, on the analogy of algebras of neighborhood relations (cf. [27]). A different, but also quite natural, way of connecting the modal and generalized quantifier perspectives runs via the extensive existing model theory of generalized quantifier languages such as E L(Q): first-order logic with a generalized quantifier Q added [16, 19, 21]. Modal perspectives make sense here, although there does not seem to be a full-fledged study of modal fragments of generalized quantifier logics. Coming at the same fragments from a different direction, modal logicians have added specific generalized quantifiers to the basic modal repertoire, such as counting quantifiers for numbers of successors in ‘graded modal logic’. In the next section, we will encounter some formalisms that lie more in the latter vein of introducing quantifiers into modal logic. It might have seemed that invariance and definability are the most obvious meeting point between modal neighborhood logic and generalized quantifier theory, but what we have mainly done is discussing obstacles. This may just be a sign of our not having found the key to unlocking the correct analogy—but at this stage, we are happy to hand over this challenge to the reader. Coda. There are also possible comparisons of expressive power that do not run into the above considerations. For instance, generalized quantifier theory also has characterizations of special families of quantifiers through different combinations of properties. 
An example is the result in van Benthem [23] that the classical quantifiers in the Square of Opposition are exactly those that have the three basic properties of Conservativity, Double Monotonicity (the quantifier allows for either upward or downward monotonicity inference in both of its arguments), and Variety (the quantifier is not constant in truth value on non-empty domains). This type of characterization does not seem to have been considered so far for capturing repertoires of modal operators.

8.4 Modal Logics of Quantifiers Modal neighborhood semantics is used for very general purposes, where neighborhood structures can be topologies or more abstract algebraic structures validating only very weak logics close to the minimal logic of one or more algebraic operators added to Boolean algebra. But with neighborhood relations viewed as quantifiers, new concrete questions arise, that we will sample here. In what follows, for convenience, we restrict ourselves mostly to uniform neighborhood relations without any dependence on points. Also, we reverse our earlier bias, and concentrate mostly on the precise semantics for modal neighborhood logic.


8.4.1 Modal Logic of Permutation-Invariant Quantifiers Much of modal logic is about axiomatizing reasoning with various modalities. Starting with the basic property of permutation invariance that makes quantifiers ‘count’, what sort of modal propositional logics arise when neighborhood relations are to be permutation invariant? This task is related to completeness for standard logics with permutation invariant generalized quantifiers added. With uniform neighborhood relations, the modal setting is the special case of logics PL(Q), that is, classical monadic predicate logic with just one unary generalized quantifier variable added (no identity, and no ∀ or ∃), whose interpretation Q on a domain W is such that if A ∈ Q, |A| = |B| and |W − A| = |W − B|, then B ∈ Q. Richer logics with16 predicates of any arity have been investigated in Anapolitanos and Väänänen [1], but in what follows, we look at our simpler case. The minimal modal logic E (in the basic modal language) of arbitrary neighborhood frames just adds one inference rule to classical propositional logic17 : (RE)

If ⊢ ϕ ↔ ψ, then ⊢ □ϕ ↔ □ψ

On uniform models, formulas of the form □ϕ or ♦ϕ are either true at all states or at none. Reduction principles like the following now become valid18:

(R1) □(p ∧ □q) ↔ (□q ∧ □p) ∨ (¬□q ∧ □⊥)
(R2) □(p ∨ □q) ↔ (□q ∧ □⊤) ∨ (¬□q ∧ □p)
(R3) □(ϕ ∧ ♦ψ) ↔ (¬♦ψ ∧ □⊥) ∨ (♦ψ ∧ □ϕ)
(R4) □(ϕ ∨ ♦ψ) ↔ (¬♦ψ ∧ □ϕ) ∨ (♦ψ ∧ □⊤)

Let EU be E with (R1) – (R4) added. Using these, in EU all formulas are equivalent to formulas of modal depth ≤ 1. The following result is folklore.

Fact 8.8 EU is the logic of all (finite) uniform neighborhood frames.
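The validity of the reduction principles on uniform frames can be checked by brute force on a small domain. The sketch below is only an illustration under stated assumptions: it uses the precise semantics (□ϕ is true iff the truth set of ϕ belongs to Q), reads ♦ as ¬□¬, instantiates ϕ and ψ in (R3) and (R4) with the letters p and q, and enumerates every Q ⊆ P(W) on a two-point W. The formula encoding and all function names are hypothetical.

```python
from itertools import combinations, product


def subsets(xs):
    xs = list(xs)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]


def ext(f, W, Q, V):
    """Truth set of formula f in the uniform model (W, Q, V), precise semantics."""
    op = f[0]
    if op == "top":
        return W
    if op == "bot":
        return frozenset()
    if op == "var":
        return V[f[1]]
    if op == "not":
        return W - ext(f[1], W, Q, V)
    if op == "and":
        return ext(f[1], W, Q, V) & ext(f[2], W, Q, V)
    if op == "or":
        return ext(f[1], W, Q, V) | ext(f[2], W, Q, V)
    if op == "box":
        # Q does not depend on the point, so a boxed formula holds everywhere or nowhere.
        return W if ext(f[1], W, Q, V) in Q else frozenset()
    raise ValueError(op)


def neg(f): return ("not", f)
def conj(f, g): return ("and", f, g)
def disj(f, g): return ("or", f, g)
def box(f): return ("box", f)
def dia(f): return neg(box(neg(f)))


TOP, BOT = ("top",), ("bot",)
p, q = ("var", "p"), ("var", "q")

# (left-hand side, right-hand side) of R1 to R4, with p, q in place of phi, psi.
PRINCIPLES = [
    (box(conj(p, box(q))), disj(conj(box(q), box(p)), conj(neg(box(q)), box(BOT)))),
    (box(disj(p, box(q))), disj(conj(box(q), box(TOP)), conj(neg(box(q)), box(p)))),
    (box(conj(p, dia(q))), disj(conj(neg(dia(q)), box(BOT)), conj(dia(q), box(p)))),
    (box(disj(p, dia(q))), disj(conj(neg(dia(q)), box(p)), conj(dia(q), box(TOP)))),
]


def all_valid(points):
    """Check every principle in every uniform model over the given points."""
    W = frozenset(points)
    sets = subsets(W)
    for Q in subsets(sets):
        for vp, vq in product(sets, repeat=2):
            V = {"p": vp, "q": vq}
            for lhs, rhs in PRINCIPLES:
                if ext(lhs, W, Q, V) != ext(rhs, W, Q, V):
                    return False
    return True
```

Calling `all_valid(range(2))` runs through all 16 quantifiers and 16 valuations on a two-point domain; a finite check of this kind illustrates the principles but does not replace the general argument.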



Now, what about validity when we require that the uniform neighborhood frames be permutation invariant? With the precise semantics, nothing happens.

Fact 8.9 Under the precise semantics, EU is the logic of all uniform and permutation invariant neighborhood frames.

16 With parametrized N s, X, the language becomes slightly richer, with point-dependent quantifiers. But as said above, we will ignore this extension here.
17 That is, E is the set of formulas containing all classical tautologies, and closed under the rules of Uniform Substitution, Modus Ponens, and (RE), and this is exactly the set of formulas valid in all neighborhood frames; cf. Pacuit [17].
18 To see this, one can use the fact that the following schemes are valid on uniform frames: ϕ[□ψ] ↔ (□ψ ∧ ϕ[⊤]) ∨ (¬□ψ ∧ ϕ[⊥]) and ϕ[♦ψ] ↔ (♦ψ ∧ ϕ[⊤]) ∨ (¬♦ψ ∧ ϕ[⊥]).


Proof Suppose EU ⊬ ϕ. By the above, there is a finite uniform model M = (W, Q, V), with W = {s_1, . . . , s_n} and M, s_j ⊭ ϕ for some j. Here Q can be any subset of P(W); we have no further information about its structure. What we need is a model where the quantifier is given by a set of numbers, where two sets of the same cardinality are either both in or both out. But there is a trick to achieve this. We assign numbers of objects to each state, standing for the number of points satisfying the corresponding state description, noting that the resulting numbers for disjoint unions are just the sums of those for the atoms inside. Here is the precise formulation:

Lemma 1 There is a function f : P(W) → N such that (a) f(∅) = 0, (b) f is injective, and (c) f is additive (i.e. if X ∩ Y = ∅, then f(X ∪ Y) = f(X) + f(Y)).19

This can be proved by induction on n. The base case is trivial. If such an f is given for W = {s_1, . . . , s_n}, with f(W) = N, and s is a new state, then for any new set Y = {s} ∪ X, let f′(Y) = N + 1 + f(X), and f′(X) = f(X) for all old X. In particular f′({s}) = N + 1. Then f′ is as desired.

Getting back to the proof of Fact 8.9, let W′ be the union of pairwise disjoint sets S_1, . . . , S_n, where |S_i| = f({s_i}). Let M′ = (W′, Q′, V′), where Y ∈ Q′ iff for some X ∈ Q, |Y| = f(X), and V′(p) = ⋃ {S_i : s_i ∈ V(p)}. Finally, let ρ : W′ → W map each s ∈ S_i to s_i. Then we have, for s ∈ W′,

(a) s ∈ V′(p) iff ρ(s) ∈ V(p)

and, using the additivity and injectivity of f, for X ⊆ W,

(b) ρ⁻¹[X] ∈ Q′ iff X ∈ Q

Together, (a) and (b) tell us that ρ is a bounded morphism (in the neighborhood-semantic sense, see Pacuit [17]) from M′ onto M, from which it follows that if s is such that ρ(s) = s_j, we have M′, s ⊭ ϕ.20
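The inductive construction in the proof of Lemma 1 can be sketched in a few lines of Python. This is an illustrative reading, not the authors' code: each new state receives the weight N + 1, where N is the value of the whole previous domain, and f(X) is the sum of the weights of the states in X, so additivity holds by construction. The function names are hypothetical.

```python
from itertools import combinations


def additive_injection(n):
    """Weights for states s_1..s_n following the inductive step f'({s}) = f(W) + 1."""
    weights = []
    for _ in range(n):
        weights.append(sum(weights) + 1)  # N + 1, with N = f(W) for the states so far
    return weights


def f(X, weights):
    """f(X) as the sum of the weights of the states (given by index) in X."""
    return sum(weights[i] for i in X)


weights = additive_injection(4)
values = [f(X, weights) for r in range(5) for X in combinations(range(4), r)]
```

With this choice the weights come out as successive powers of two, so injectivity of f is just uniqueness of binary representations; `len(set(values))` equals the number of subsets.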

8.4.2 Imposing More Conditions

Next, what happens if we impose further conditions on neighborhood relations, such as the upward monotonicity and intersection closure leading to the filter structure that is typical for standard modal logic over relational models? Indeed, the logic K is sound for an arbitrary neighborhood frame (W, N) if and only if (W, N) is a filter frame: that is, for each s ∈ W, the set {X | N s, X} is a filter. In particular, when W is finite, all filters are principal, and then the condition is exactly that (W, N) is a relational frame. In this case, adding permutation invariance to uniformity changes the picture significantly.

19 This is really a fact about finite Boolean algebras: for any such algebra A, there is an injective additive function f from its domain to natural numbers, with f(0) = 0.
20 The style of proof in here is like that of the more general result in Anapolitanos and Väänänen [1], and injective additive functions on Boolean algebras also occur in the analysis of the logic of comparative sizes in Ding et al. [11].


Fact 8.10 The logic of finite uniform filter frames is K45. Adding permutation invariance changes the logic to being either S5 or the trivial logic K⊥.

Proof On any uniform neighborhood frame, in the presence of K the axioms (R1) – (R4) are equivalent to .4 + .5. For the converse we may, by the above, restrict attention to ordinary relational frames. If K45 ⊬ ϕ, there is, by the usual completeness theorem, a relational model M = (W, R, V) and s ∈ W such that R is transitive and euclidean and M, s ⊭ ϕ. Then M[s], s ⊭ ϕ, with M[s] the rooted submodel of M generated by s. Since R is transitive and euclidean, it follows that M[s], seen as a neighborhood model, is in fact uniform. For the second claim, note that there are only two permutation invariant principal filters on P(W): either P(W) itself or {W}. The logic of the former is K⊥, that of the latter (since accessibility is the universal relation) is S5.

But perhaps the most interesting question concerns the intermediate case of uniform frames (W, Q) where Q is only required to be closed under supersets.21 We can think of this as the logic of “enough”, read as the existence of a set of witnesses of at least the threshold value associated with enough. This appears to be a harder case, and we only have some observations to offer. First, the minimal modal logic is now the system EM, which is like E except that the rule (RE) has been replaced by

(RM) If ⊢ ϕ → ψ, then ⊢ □ϕ → □ψ

However, if permutation invariance is also required, we do get new validities:

Fact 8.11 If the propositional formula α is incompatible with β, and ϕ with ψ, then □(α ∨ β) ∧ □(ϕ ∨ ψ) → □(α ∨ ϕ) ∨ □(β ∨ ψ) is valid for uniform monotone permutation invariant frames.



Remark: This principle is not valid without the monotonicity requirement.22

Proof If the atomic symbols are p_1, . . . , p_k, each formula can be seen as a disjunction of conjunctions of the form p_1* ∧ . . . ∧ p_k*, where p_i* is p_i or ¬p_i. List these conjunctions in some order, and name them 1, . . . , 2^k. Then each propositional formula can be identified with a set A ⊆ {1, . . . , 2^k}. Now the above result will follow by repeated applications of the following

Claim Let A_1, A_2 be disjoint subsets of {1, . . . , 2^k}. Take n_1 ∈ A_1, n_2 ∈ A_2, and let π swap n_1 and n_2 but nothing else. Then:

⊨ □A_1 ∧ □A_2 → □π(A_1) ∨ □π(A_2)

Suppose that M = (W, Q, V) is permutation invariant and uniform, that Q is monotone, and that M ⊨ □A_1 ∧ □A_2 (or rather M, s ⊨ □A_1 ∧ □A_2, but, by uniformity, we can forget about s for boxed formulas). So there are X, Y ∈ Q with X = [[A_1]]^M and Y = [[A_2]]^M. By assumption, X ∩ Y = ∅. Let X_j = X ∩ [[{j}]]^M and Y_j = Y ∩ [[{j}]]^M, and let x_j = |X_j| and y_j = |Y_j|, for j = 1, . . . , 2^k (x_j and y_j may be finite or infinite). Thus, X = ⋃ {X_j : j ∈ A_1} and Y = ⋃ {Y_j : j ∈ A_2}. We need only distinguish two cases.

Case 1: x_{n_1} ≤ y_{n_2}. Let Z be like X, except that X_{n_1} is replaced by a set U of x_{n_1} members of Y_{n_2}. It follows that |Z| = |X|, and Z ⊆ [[π(A_1)]]^M. Moreover, we have |W − Z| = |W − X|. (If h is a permutation of W which swaps X_{n_1} and U (but nothing else), then Z = h(X).) By permutation invariance, Z ∈ Q, and so by monotonicity, M ⊨ □π(A_1).

Case 2: x_{n_1} > y_{n_2}. Then we swap Y_{n_2} with y_{n_2} members of X_{n_1}, and similarly obtain M ⊨ □π(A_2). This proves the Claim and thereby our Fact.

However, so far, we have not been able to determine the complete modal logic for the monotone quantitative case.

Open problem: What is the complete modal logic of “enough”?

Remark. A natural addition to all of the above logics is a universal modality U, and hence an existential modality E. The resulting system corresponds to monadic predicate logics PL+(Q), where in addition to Q, we have the usual quantifiers ∀ and ∃ with their standard interpretation (though still no identity). In this setting, uniformity is expressible by a single formula: a frame (W, N) is uniform iff □ϕ ↔ U□ϕ is true in that frame. It is straightforward to give extensions of our earlier completeness results to this extended language, but we ignore details here.

21 Rather than restricting to special neighborhood frames that are ‘monotone’ in this sense, we could also use the monotone semantics for the modal language over arbitrary frames.
22 Consider the case α = p ∧ q, β = p ∧ ¬q, ϕ = ¬p ∧ q, ψ = ¬p ∧ ¬q. Then the antecedent of our formula is equivalent to □p ∧ □¬p and the consequent to □q ∨ □¬q, but this implication is clearly not valid if the quantifier Q is not monotone: for example, let |V(p)| = 3, |V(q)| = |W| = 5, and let Q = {X ⊆ W : |X| = 2 or |X| = 3}.
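On finite domains, the principle of Fact 8.11 and the counterexample given for the non-monotone case can be checked numerically. The sketch below rests on assumptions that should be read as such: it takes the four formulas to be pairwise incompatible (as with the disjoint sets A_1, A_2 in the Claim), so that only the sizes a, b, c, d of their truth sets matter, and it uses the observation that a monotone permutation invariant quantifier on a finite domain amounts to a threshold “at least k”. All names are hypothetical.

```python
from itertools import product


def fact_8_11_holds(n):
    """Check the implication for all thresholds k on domains with at most n points.

    a, b, c, d are the sizes of the (pairwise disjoint) truth sets of
    alpha, beta, phi, psi; box X holds iff |X| >= k.
    """
    for a, b, c, d in product(range(n + 1), repeat=4):
        if a + b + c + d > n:
            continue
        for k in range(n + 2):
            antecedent = (a + b >= k) and (c + d >= k)
            consequent = (a + c >= k) or (b + d >= k)
            if antecedent and not consequent:
                return False
    return True


def non_monotone_counterexample():
    """The non-monotone example: Q = sets of size 2 or 3, |V(p)| = 3, |V(q)| = |W| = 5."""
    in_q = lambda size: size in (2, 3)
    a, b, c, d = 3, 0, 2, 0  # sizes of p&q, p&~q, ~p&q, ~p&~q
    antecedent = in_q(a + b) and in_q(c + d)   # box p and box not-p
    consequent = in_q(a + c) or in_q(b + d)    # box q or box not-q
    return antecedent and not consequent
```

`fact_8_11_holds(6)` covers all thresholds on domains of up to six points, and `non_monotone_counterexample()` confirms that the implication fails for the size-2-or-3 quantifier.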

8.4.3 Modal Logics of Specific Quantifiers Finally, instead of axiomatizing reasoning about classes of quantifiers, one can also look at the modal logic of specific cases. For a concrete example, consider the quantifier P1 “precisely one”. In the extended modal neighborhood language with an additional universal modality we have: Fact 8.12 The complete modal logic of P1 is axiomatized by the system E plus, for disjoint ϕ, ψ, (a) P1(ϕ ∨ ψ) ↔ (P1ϕ ∧ ¬E(ψ ∧ ¬ϕ)) ∨ (P1ψ ∧ ¬E(ϕ ∧ ¬ψ)) (b) P1ϕ → Eϕ
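The soundness of axioms (a) and (b) for “precisely one” can be confirmed by exhaustive search over small domains. The following is a minimal sketch, assuming that formulas are represented directly by their truth sets and that E is the existential modality “nonempty”; the function names are hypothetical, and the check says nothing about completeness.

```python
from itertools import combinations


def subsets(n):
    return [frozenset(c) for r in range(n + 1) for c in combinations(range(n), r)]


def P1(X):
    """'Precisely one' point satisfies the formula with truth set X."""
    return len(X) == 1


def E(X):
    """Existential modality: the truth set X is nonempty."""
    return len(X) > 0


def axioms_hold(n):
    """Check axioms (a) and (b) for all truth sets over an n-point domain."""
    for A in subsets(n):
        if P1(A) and not E(A):  # axiom (b)
            return False
        for B in subsets(n):
            if A & B:           # axiom (a) is only claimed for disjoint formulas
                continue
            lhs = P1(A | B)
            rhs = (P1(A) and not E(B - A)) or (P1(B) and not E(A - B))
            if lhs != rhs:
                return False
    return True
```

For instance, `axioms_hold(4)` inspects every pair of disjoint truth sets on a four-point domain.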




Proof It suffices to consider a consistent normal form of depth 1, since these still exist for the present language with uniform quantifiers. Quantifiers P1 will only occur then in front of purely propositional formulas, that can be brought in disjunctive normal form. Next, by repeated applications of the first axiom, we end up with a Boolean combination of cases (a) P1 attached to exactly one state description, (b) existential modalities over state descriptions, and (c) proposition letters. Bringing this Boolean combination in disjunctive normal form again, by the consistency, there must be at least one conjunction remaining, and such conjunctions consist of a state description, followed by a list of quantified statements of the form (¬)P1 sd, (¬)E sd for each state description sd. Here the second axiom makes sure there are no contradictions of the form P1ϕ ∧ ¬Eϕ. Finally, it is easy to see by direct inspection that the conjunction can be satisfied semantically, reading off, for each state description, whether it needs to hold nowhere, in just one point, or in at least two points.  “Precisely one" is of course just one case,23 similar logics can be written for any finite cardinality. For a logic of all finite cardinalities in a similar propositional language, cf. [20] on the ‘numerical syllogistic’. Coda: Numerical transposition. Despite the modal propositional guise of the above logics, they can also be viewed numerically, as fragments of additive arithmetic. We briefly explain how this perspective works in our setting. As an illustration, consider the earlier completeness of the neighborhood logic of an arbitrary permutation-invariant quantifier under the precise semantics. Now perform the following transposition on the satisfiability problem for a given modal formula in its normal form of modal depth 1. 
Assign unique variables over natural numbers to each atomic state description, record the sums of these values that are to be in Q according to the normal form as a finite list of additive terms t_1, . . . , t_k, and those for the ones outside of Q as a complement list s_1, . . . , s_m. Next, to find a satisfying model for all of this, it suffices to solve this system of inequalities in the natural numbers: all numerical terms for the s_i must have values different from all those for the t_j. The reason for the adequacy of the equational procedure is this. If such values exist, then we can use them for the right multiplicities of the atoms, just as we did in the final part of the earlier modal completeness proof.24 Whether these systems of inequalities have a solution is decidable, via standard equation solving mechanisms. Indeed, looking at what is needed, this is only a small universal fragment of Presburger Arithmetic, which is decidable even in its full first-order version. Thus, another take on the topics in this section is that we are really axiomatizing small fragments of arithmetic in logical style. In particular, we conclude that all logics discussed here are decidable.25

23 A similar analysis would work for the binary quantifier “Precisely one A is B”.
24 If special conditions have to hold for the quantifier Q, such as monotonicity, then this can be worked into the method by adding suitable further inequalities closing off the t-list upward.
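The numerical transposition can be illustrated by a small search procedure. This is a sketch under explicit assumptions, not a decision procedure for Presburger Arithmetic: it tries all assignments of values up to a hypothetical `bound` to the atomic state descriptions and returns the first one for which no term on the in-Q list has the same value as a term on the out-of-Q list. Terms are given as tuples of atom indices.

```python
from itertools import product


def solve(num_atoms, in_terms, out_terms, bound=4):
    """Search for atom multiplicities separating the in-Q sums from the out-of-Q sums.

    Returns a tuple of natural numbers, or None if no assignment with all
    values <= bound works (which does not by itself prove unsatisfiability).
    """
    for values in product(range(bound + 1), repeat=num_atoms):
        ins = {sum(values[i] for i in term) for term in in_terms}
        outs = {sum(values[i] for i in term) for term in out_terms}
        if ins.isdisjoint(outs):
            return values
    return None


# Hypothetical example with atoms 0 (p) and 1 (not-p): the truth set of p is to
# be in Q, while those of not-p and of the tautology (both atoms) are not.
solution = solve(2, in_terms=[(0,)], out_terms=[(1,), (0, 1)])

# Requiring the same sum to be both inside and outside Q has no solution.
no_solution = solve(1, in_terms=[(0,)], out_terms=[(0,)])
```

In the first example the search stops at the assignment with no p-points and one point outside p, so that Q can be taken to contain exactly the empty set.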

8.5 Conclusion In this exploratory paper, we have explored a suggested analogy between generalized quantifiers and modal neighborhood logics a bit further. We found interesting similarities in semantic notions of locality, new questions concerning modal logics for specific classes of quantifiers, and delicate but intriguing problems concerning the role of invariances in both realms. Even so, we only scratched the surface, ignoring more general linguistic constructions such as polyadic or branching quantifiers and quantifier-like modal constructions, possible junctions between the model theory of generalized quantifier logics and modal logics, or possible counterparts in generalized quantifier theory for the rich theory of translations between modal logics.26 One suggestive analogy ignored in this article concerns ‘fine structure’. Much of modal logic is about semantic fine-structure, and the balance of expressive power and complexity in logic design. But equally well, generalized quantifier theory has sought to parametrize into easier and harder quantifiers, using definability in formal languages, or more computationally, devices such as ‘semantic automata’ [23]. Whether this congeniality can be made into a useful semantic interface remains to be seen. In addition to expressive fine-structure, however, there is also deductive fine-structure. Modal logics are often much simpler deductive subsystems of larger proof engines for first-order or higher-order logic. Generalized quantifier theory, too, has looked at deductive fine-structure of natural language, for instance, in the form of simple fast ‘natural logics’ inside more complex reasoning with quantifiers cf. [15]. The two views of deductive fine-structure are not the same, and it would be of interest to export, for instance, natural logic thinking to the modal realm. This paper is largely a set of pointers. 
Much remains to be done, but the glimpses we have shown may entice the reader to go further on the road ahead. Acknowledgements We thank Nick Bezhanishvili, Larry Moss, Eric Pacuit, and an anonymous referee for helpful comments and useful information. Johan van Benthem is supported by the Major Program of the National Social Science Foundation of China (No. 17ZDA026). Dag Westerståhl is supported by grant no. 2016-02458 of the Swedish Research Council.

25 A similar reduction to additive arithmetic applies to propositional logics with modalities for quantitative probabilities, cf. van Benthem [28].
26 In making these connections, this paper can be seen as a continuation of van der Hoek and de Rijke [35], which explored connections between generalized quantifier theory and modal logic over its standard relational models.


References 1. Anapolitanos, D. A., & Väänänen, J. (1981). Decidability of some logics with free quantifier variables. Zeitschrift f. math. Logic und Grundlagen d. Math., 27, 11–22. 2. Andréka, H., van Benthem, J., & Németi, I. (1998). Modal languages and bounded fragments of predicate logic. Journal of Philosophical Logic, 27(3), 217–274. 3. Andréka, H., van Benthem, J., & Németi, I. (2017). On a new semantics for first-order predicate logic. Journal of Philosophical Logic, 46(3), 259–267. 4. Aucher, G., van Benthem, J., & Grossi, D. (2018). Modal logics of sabotage revisited. Journal of Logic and Computation, 28(2), 269–303. 5. Baltag, A., & Cinà, G. (2018). Bisimulation for conditional modalities. Studia Logica, 106(1), 1–33. 6. Barwise, J., & Cooper, R. (1981). Generalized quantifiers in natural language. Linguistics and Philosophy, 4, 159–219. 7. Blackburn, P., de Rijke, M., & Venema, Y. (2001). Modal logic. Cambridge: Cambridge University Press. 8. Bonnay, D. (2008). Logicality and invariance. Bulletin of Symbolic Logic, 14(1), 29–68. 9. Bonnay, D., & Westerståhl, D. (2021) Carnap’s Problem for modal logic. The Review of Symbolic Logic, online, 1–29. 10. de Jongh, D., & Shirmohammadzadeh, F. (2018). Two neighborhood semantics for subintuitionistic logics. Technical report PP–2018–08, ILLC, University of Amsterdam. 11. Ding, Y., Harrison-Trainor, M., & Holliday, W. (2018). The logic of comparative cardinality. UC Berkeley: Logic Group. Manuscript. 12. Feferman, S. (1999). Logic, logics and logicism. Notre Dame Journal of Formal Logic, 40, 31–54. 13. Hansen, H. H., Kupke, C., & Pacuit, E. (2009). Neighbourhood structures: Bisimilarity and basic model theory. Logical Methods in Computer Science, 5(2:2), 1–38. 14. Jacobs, B. (2016). Introduction to coalgebra. Cambridge: Cambridge University Press. 15. Moss, L. (2015). Natural logic. In S. Lappin & C. Fox (Eds.), Handbook of contemporary semantic theory, 2nd edn (pp. 646–681). Wiley-Blackwell. 16. Mundici, D. 
(1985). Other quantifiers: An overview. In J. Barwise & S. Feferman (Eds.), Model-theoretic logics (pp. 211–233). Berlin: Springer. 17. Pacuit, E. (2017). Neighborhood semantics for modal logic. Springer. 18. Pauly, M. (2001). Logic for social software. Ph.D. thesis, University of Amsterdam. 19. Peters, S., & Westerståhl, D. (2006). Quantifiers in language and logic. Oxford: Oxford University Press. 20. Pratt-Hartmann, I. (2009). No syllogisms for the numerical syllogistic. In O. Grumberg, M. Kaminski, S. Katz, & S. Wintner (Eds.), Languages: From formal to natural (pp. 192–203). Springer. 21. Väänänen, J. (1999). Generalized quantifiers. In Generalized quantifiers and computation. Lecture Notes in Computer Science (Vol. 1754). Springer. 22. van Benthem, J. (1984). Foundations of conditional logic. Journal of Philosophical Logic, 13, 303–349. 23. van Benthem, J. (1986). Essays in logical semantics. North-Holland. 24. van Benthem, J. (1998). Program constructions that are safe for bisimulation. Studia Logica, 60(2), 311–330. 25. van Benthem, J. (2002). Invariance and definability: Two faces of logical constants. In W. Sieg, R. Sommer, & C. Talcott (Eds.), Reflections on the foundations of mathematics. ASL Lecture Notes in Logic (Vol. 15, pp. 426–446). 26. van Benthem, J. (2011). Logical dynamics of information and interaction. Cambridge: Cambridge University Press. 27. van Benthem, J. (2014). Logic in games. The MIT Press.


28. van Benthem, J. (2017). Against all odds: When logic meets probability theory. In J. Katoen, R. Langerak, & A. Rensink (Eds.), Festschrift for Ed Brinksma. Springer Lecture Notes in Computer Science 10500 (pp. 239–253). 29. van Benthem, J. (2021). Semantic perspectives in logic. In G. Sagi & J. Wood (Eds.), The semantic conception of logic. Cambridge University Press. 30. van Benthem, J., & Bezhanishvili, G. (2007). Modal logics of space. In M. Aiello, et al. (Eds.), Handbook of spatial logics (pp. 217–298). Dordrecht: Springer. 31. van Benthem, J., & Blackburn, P. (2007). Modal logic: A semantic perspective. In J. van Benthem, P. Blackburn, & F. Wollter (Eds.), Handbook of modal logic. Studies in Logic and Practical Reasoning (pp. 1–84). Elsevier. 32. van Benthem, J., & Bonnay, D. (2008). Modal logic and invariance. Journal of Applied NonClassical Logics, 18(2–3), 153–173. 33. van Benthem, J., & Pacuit, E. (2011). Dynamic logic of evidence-based beliefs. Studia Logica, 99(1), 61–92. 34. van Benthem, J., Bezhanishvili, N., & Enqvist, S. (2017). Instantial neighbourhood logic. The Review of Symbolic Logic, 10(1), 116–144. 35. van der Hoek, W., & de Rijke, M. (1993). Generalized quantifiers and modal logic. Journal of Logic, Language and Information, 2, 19–58. 36. Venema, Y. (2006). Algebras and coalgebras. In P. Blackburn, J. van Benthem, & F. Wolter (Eds.), Handbook of modal logic (pp. 331–426). Amsterdam: Elsevier Science. Johan van Benthem is a Dutch logician. He graduated at the University of Amsterdam in 1977, and has worked since on modal logic, logic and natural language, and many other areas in pure and applied logic. He was a University Professor of Logic in Amsterdam until 2015, and is now Henry Waldgrave Stuart Professor of Philosophy at Stanford University and Jin Yuelin Professor of Logic at Tsinghua University Beijing. Dag Westerståhl is a Swedish philosopher and logician. He graduated in 1977 from the University of Gothenburg. 
His area of research includes logic, generalized quantifiers, formal semantics, and philosophy of language. He is Professor Emeritus of Theoretical Philosophy and Logic at Stockholm University, and Jin Yuelin Professor of Logic at Tsinghua University, Beijing.

Chapter 9

On the Semilattice of Modal Operators and Decompositions of the Discriminator

Ivo Düntsch, Wojciech Dzik, and Ewa Orłowska

Dedicated to our friends Hajnal and Istvan with respect and gratitude for long lasting inspiration, cooperation, and friendship.

Abstract We investigate the join semilattice of modal operators on a Boolean algebra B. Furthermore, we consider pairs ⟨f, g⟩ of modal operators whose supremum is the unary discriminator on B, and study the associated bi-modal algebras.

Keywords Algebraic logic · Modal operators · Unary discriminator · Join semilattice · Dual pseudocomplementation

I. Düntsch, School of Mathematics and Computer Science, Fujian Normal University, Fuzhou, Fujian, China; Department of Computer Science, Brock University, St Catharines, ON, Canada
W. Dzik, Institute of Mathematics, University of Silesia, Katowice, Poland
E. Orłowska, National Institute of Telecommunications, Szachowa 1, 04-894 Warszawa, Poland
© Springer Nature Switzerland AG 2021. J. Madarász and G. Székely (eds.), Hajnal Andréka and István Németi on Unity of Science, Outstanding Contributions to Logic 19, https://doi.org/10.1007/978-3-030-64187-0_9

9.1 Introduction

Boolean algebras with operators were introduced by Jónsson and Tarski [11] in connection with their investigations into relation algebras. It was observed much later that the simplest case of modal algebras, that is, expansions of Boolean algebras with a single unary normal operation which preserves finite joins, could serve as algebraic semantics for the minimal modal logic K and its extensions. It is straightforward to see that the set M(B) of modal operators on a Boolean algebra B can be made into a bounded join semilattice; here the smallest element f_0 is the constant mapping f ≡ 0, and the largest element f_1 is the unary discriminator. In this chapter, we study the lattice theoretic properties of M(B), in particular, the existence and form of dual pseudocomplements. If B is a complete Boolean algebra, the problems concerning M(B) are solved. In particular, M(B) is dually pseudocomplemented if and only if B is complete.

Generalizing the notion of pseudocomplement, we consider pairs ⟨f, g⟩ of modal operators (companions) whose join in M(B) is the discriminator f_1. Such pairs lead to bimodal algebras ⟨B, f, g⟩ which we call discriminator decomposition algebras. These algebras give rise to a study of bimodal logics in which the modalities f, g are connected by the discriminator decomposition condition. Such logics are not included in the mainstream of research on multimodal logics. We observe that the equational class generated by such algebras is equipollent to the class of algebraic models of the logic K∼ which generalizes both K and its complementary counterpart K∗ [8].

In the final part of the chapter we address the question when f has a proper companion, i.e. a companion g with g ≠ f_1, and give answers for several classes of modal algebras. It turns out that the existence of a proper companion can be expressed in a first-order language, and is related to both the structure of B and the modal operator f. In particular, we conclude that this property is not a global property of the logic, but depends on the model, for instance, whether B has an atom or not. Connections to some classes of modal algebras are given, and examples are provided throughout.

9.2 Notation and First Definitions

We regard an ordinal as the set of its predecessors, and cardinals as initial ordinals; $\omega$ is the first infinite ordinal [13]. As in our context no generality is lost, we shall tacitly assume that a class $A$ of algebras is closed under isomorphic copies. If no confusion can arise, we will refer to an algebra simply by its base set. The equational class generated by $A$ is denoted by $\mathrm{Eq}(A)$.

The ternary discriminator function on an algebra $A$ is a function $t : A^3 \to A$ defined by
\[
t(a, b, c) = \begin{cases} a, & \text{if } a \neq b, \\ c, & \text{if } a = b. \end{cases}
\]
A ternary term $t(x, y, z)$ which represents the discriminator function is called a ternary discriminator for $A$. If $A$ is a class of algebras with a common discriminator term, then $\mathrm{Eq}(A)$ is called a discriminator variety. Discriminator varieties have very strong universal algebraic properties, see e.g. [16] or [1].

9 On the Semilattice of Modal Operators and Decompositions of the Discriminator


If $B$ is a Boolean algebra, this can be simplified: A mapping $d : B \to B$ is a unary discriminator function if
\[
d(a) = \begin{cases} 0, & \text{if } a = 0, \\ 1, & \text{otherwise.} \end{cases} \tag{9.2.1}
\]
A unary term $d(x)$ which represents the unary discriminator function on $B$ is called a unary discriminator for $B$. The next observation is well known.

Lemma 9.1 A Boolean algebra $B$ has a unary discriminator if and only if it has a ternary discriminator.

A frame is a pair $\langle X, R\rangle$ where $X$ is a set and $R$ a binary relation on $X$. The identity relation on $X$ is denoted by $1_X$, or just by $1$ if $X$ is understood; $V_X$ (or just $V$) denotes the universal relation. If $x \in X$, then $R(x) := \{y : x R y\}$ is the range of $x$ (with respect to $R$).

Suppose that $\langle P, \leq\rangle$ is a partially ordered set and $Q \subseteq P$. Then $\downarrow Q := \{x : (\exists y)[y \in Q \text{ and } x \leq y]\}$ is the downset generated by $Q$. If $Q = \{x\}$, we just write $\downarrow x$ if no confusion can arise. If $P$ has a smallest element $0$, then $Q^+ := Q \setminus \{0\}$; otherwise, $Q^+ := Q$. $Q$ is called dense (in $P$) if for every $y \in P^+$ there is some $x \in Q^+$ such that $x \leq y$. $\mathrm{ub}(Q)$ is the set of all upper bounds of $Q$.

In the sequel, a semilattice is assumed to be a join semilattice. Suppose that $\langle S, \vee, 1\rangle$ is an upwardly bounded semilattice. A dual annihilator of $x \in S$ is some $y \in S$ such that $x \vee y = 1$. Such $y$ is proper if $y \neq 1$. $x$ is called dually dense if its only dual annihilator is $1$. If $x \in S$ has a smallest dual annihilator, this element is called the dual pseudocomplement of $x$, denoted by $x^{\perp}$. An element $x \in S$ is called open if $x = y^{\perp}$ for some $y \in S$. If each $x \in S$ has a dual pseudocomplement, the structure $\langle S, \vee, {}^{\perp}, 1\rangle$ is called a dually pseudocomplemented semilattice. It is well known that the class of dually pseudocomplemented semilattices is equational, see e.g. [3, p. 104].

Lemma 9.2 [7, Theorem 1] If $\langle S, \vee, {}^{\perp}, 1\rangle$ is a dually pseudocomplemented semilattice, then the set $O(S)$ of open elements of $S$ is a 1-subsemilattice of $S$ and a Boolean algebra with $x \wedge_{O(S)} y := (x^{\perp} \vee y^{\perp})^{\perp}$.
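Lemma 9.1 can be checked mechanically on a small example. The following Python sketch is an illustration only, not part of the original text: it encodes the eight-element powerset algebra as bitmasks and uses one possible pair of translations, $t(a,b,c) = a \cdot d(a \,\mathrm{xor}\, b) + c \cdot -d(a \,\mathrm{xor}\, b)$ and $d(a) = -t(0, a, 1)$. The function names and the choice of three atoms are assumptions made for the illustration.

```python
from itertools import product

N_ATOMS = 3                       # assumption: a powerset algebra with 3 atoms
TOP = (1 << N_ATOMS) - 1          # the element 1, as a bitmask
ELEMENTS = range(TOP + 1)         # all subsets of {0, 1, 2}


def complement(a: int) -> int:
    return TOP & ~a


def unary_discriminator(a: int) -> int:
    """d(a) = 0 if a = 0, and 1 otherwise, as in (9.2.1)."""
    return 0 if a == 0 else TOP


def ternary_discriminator(a: int, b: int, c: int) -> int:
    """t(a, b, c) = a if a != b, and c if a = b."""
    return a if a != b else c


def ternary_from_unary(a: int, b: int, c: int) -> int:
    """Express t through d: a * d(a xor b) + c * -d(a xor b)."""
    switch = unary_discriminator(a ^ b)
    return (a & switch) | (c & complement(switch))


def unary_from_ternary(a: int) -> int:
    """Express d through t: d(a) = -t(0, a, 1)."""
    return complement(ternary_discriminator(0, a, TOP))


def lemma_9_1_holds() -> bool:
    unary_ok = all(unary_from_ternary(a) == unary_discriminator(a) for a in ELEMENTS)
    ternary_ok = all(
        ternary_from_unary(a, b, c) == ternary_discriminator(a, b, c)
        for a, b, c in product(ELEMENTS, repeat=3)
    )
    return unary_ok and ternary_ok
```

Calling `lemma_9_1_holds()` compares both translations against the defining case distinctions on every argument tuple; the check is exhaustive only for this one finite algebra.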

9.2.1 Boolean Algebras

Throughout, $\langle B, +, \cdot, -, 0, 1\rangle$ is a nontrivial Boolean algebra (BA), usually only referred to by its universe $B$. $2$ is the BA with universe $\{0, 1\}$, and $\overline{B}$ is the MacNeille completion of $B$. The set of atoms of $B$ is denoted by $\mathrm{At}(B)$, and the set of ultrafilters of $B$ is denoted by $\mathrm{Ult}(B)$. The mapping $h : B \to 2^{\mathrm{Ult}(B)}$ with $h(x) = \{F \in \mathrm{Ult}(B) : x \in F\}$ is the Stone embedding. We write $a = a_0 \oplus \cdots \oplus a_n$ if $a = a_0 + \cdots + a_n$ and the $a_i$ are nonzero and pairwise disjoint. The symmetric difference $x \cdot -y + -x \cdot y$ of $x, y \in B$ is denoted by $x \triangle y$.


$B$ is called a finite-cofinite algebra (FC-algebra) if every element of $B \setminus 2$ is a finite sum of atoms or the complement of such an element. If $B$ is an FC-algebra, $\kappa$ a cardinal, and $|B| = \kappa$, then $B$ is isomorphic to the BA $\mathrm{FC}(\kappa)$ which is generated by the one-element subsets of $\kappa$. If $\gamma \in \kappa$, we let $F_\gamma$ be the ultrafilter of $\mathrm{FC}(\kappa)$ generated by $\{\gamma\}$, and $F_\kappa$ be the ultrafilter of cofinite sets.

If $M$ is dense in $B$ and $x \in B^+$, then $x = \sum\{y \in M : y \leq x\}$. Moreover, there is a pairwise disjoint family $M' \subseteq M$ such that $x = \sum M'$ [12, Lemma 4.9].

Recall some facts about Boolean interval algebras: Let $L$ be a linear order with smallest element $0_m$. Suppose that $\infty$ is a symbol not in $L$, and set $L^\infty := L \cup \{\infty\}$ with $x \lneq \infty$ for all $x \in L$. An interval of $L$ is a set of the form $[s, t) = \{u \in L : s \leq u \lneq t\}$, where $s, t \in L^\infty$. $\mathrm{IntAlg}(L)$ is the collection of all finite unions of intervals
\[
[x_0^0, x_0^1) \cup [x_1^0, x_1^1) \cup \ldots \cup [x_{t(x)}^0, x_{t(x)}^1), \tag{9.2.2}
\]
together with the empty set. It is well known that $\mathrm{IntAlg}(L)$ is a Boolean algebra [12, p. 10], called the interval algebra of $L$. Each nonzero $x \in \mathrm{IntAlg}(L)$ can be written in the form (9.2.2) in such a way that $x_j^i \in L^{+}$ and $x_j^0 < x_j^1 < x_{j+1}^0$; note that the intervals $[x_j^0, x_j^1)$ are pairwise disjoint. The representation of $x$ in this form is unique [12, p. 242], and we call it the standard representation. For each $x \in \mathrm{IntAlg}(L)^+$, we let $\mathrm{Rel}(x) := \{x_j^0 : j \leq t(x)\} \cup \{x_j^1 : j \leq t(x)\}$ be the set of relevant points of $x$, and $\mathrm{Int}(x) := \{[x_j^0, x_j^1) : j \leq t(x)\}$ be the set of relevant intervals of $x$.

For unexplained notation and concepts in the area of universal algebra the reader is invited to consult [4], and for Boolean algebras we refer the reader to [12].

9.3 Modal Algebras

An operator on $B$ is a mapping $B \to B$; note that this is more general than the terminology of [11]. If $f$ is an operator on $B$, then its dual (operator) $f^{\partial}$ is defined by $f^{\partial}(a) := -f(-a)$. We also set $f^{*}(a) := -f(a)$ and $f_{*}(a) := f(-a)$; clearly, $f^{*} = (f^{\partial})_{*}$. A mapping $f : B \to B$ is called normal if $f(0) = 0$, and additive if $f(x + y) = f(x) + f(y)$ for all $x, y \in B$. A modal operator $f$ on $B$ is a normal and additive mapping. In this case, $\langle B, f\rangle$ is called a modal algebra. The class of modal algebras is denoted by MOA. It may be remarked that MOA is not locally finite; indeed, a modal algebra need not have a finite subalgebra; an example can be found in [10, p. 251].


The following operators will be the extremal elements in the semilattice of modal operators, defined in Sect. 9.4: We set $f^0(x) := 0$ for all $x \in B$, and
\[
f^1(x) := \begin{cases} 0, & \text{if } x = 0, \\ 1, & \text{otherwise.} \end{cases}
\]
Note that $f^1$ is the unary discriminator on $B$. In modal logics $f^1$ is known as the universal modality [8, 9].

A modal operator $f$ on $B$ is called completely additive, or simply complete, if for every $M \subseteq B$ such that $\sum M$ exists, $\sum\{f(x) : x \in M\}$ also exists and is equal to $f(\sum M)$. An ideal $I$ of $B$ is called trivial if $I = \{0\}$, proper if $I \neq B$, and closed if $a \in I$ implies $f(a) \in I$. It is well known that there is a one-one correspondence between closed ideals and congruences on $\langle B, f\rangle$, see e.g. [2, §3], and therefore we will also call closed ideals congruence ideals.

A modal operator $f$ is called a closure operator or S4 operator if it satisfies

Cl1  $x \leq f(x)$,
Cl2  $f(f(x)) = f(x)$.

In this case, $\langle B, f\rangle$ is called a closure algebra or an S4 algebra.

If $\mathfrak{X} := \langle X, R\rangle$ is a frame, we define a mapping $\langle R\rangle : 2^X \to 2^X$ by $\langle R\rangle(Y) = \{x : R(x) \cap Y \neq \emptyset\}$. In modal logic, $\langle R\rangle$ is (the interpretation of) the diamond operator ♦ in the frame $\mathfrak{X}$. The structure $\langle 2^X, \langle R\rangle\rangle$ is called the complex algebra of $\mathfrak{X}$, denoted by $\mathrm{Cm}(\mathfrak{X})$.

Theorem 9.1 [11, Theorem 3.9]
1. $\mathrm{Cm}(\mathfrak{X})$ is complete and atomic, and $\langle R\rangle$ is a completely additive normal operator.
2. If $\mathfrak{B} = \langle B, f\rangle$ is a complete atomic Boolean algebra and $f$ is completely additive, then $\mathfrak{B}$ is isomorphic to some $\mathrm{Cm}(X, R)$.

The canonical structure of a modal algebra $\mathfrak{B} := \langle B, f\rangle$ is the frame $\mathrm{Cst}(\mathfrak{B}) := \langle \mathrm{Ult}(B), R_f\rangle$, where
\[
F \mathrel{R_f} G \iff f[G] \subseteq F, \text{ i.e. } \{f(a) : a \in G\} \subseteq F. \tag{9.3.1}
\]
$R_f$ is called the canonical relation of $f$. If $a, b \in \mathrm{At}(B)$, and $F_a, F_b$ are the principal ultrafilters generated by $a$, respectively, $b$, then
\[
F_a \mathrel{R_f} F_b \iff a \leq f(b). \tag{9.3.2}
\]
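The complex algebra and the equivalence (9.3.2) can be illustrated on a small frame. The Python sketch below is an illustration under stated assumptions: the three-point frame and its relation `R` are hypothetical, and the function names are chosen for this example. Since every ultrafilter of a finite powerset algebra is principal, the canonical relation can be read off from the values of the operator on singletons.

```python
from itertools import combinations

X = (0, 1, 2)                          # hypothetical three-point frame
R = {(0, 1), (1, 2), (2, 2)}           # hypothetical accessibility relation


def subsets(points):
    """All subsets of `points`, i.e. the universe of the complex algebra."""
    return [frozenset(c) for k in range(len(points) + 1) for c in combinations(points, k)]


def diamond(Y):
    """<R>(Y) = {x : R(x) meets Y}."""
    return frozenset(x for x in X if any((x, y) in R for y in Y))


def is_modal(f, universe):
    """Check normality f(0) = 0 and additivity f(a + b) = f(a) + f(b)."""
    normal = f(frozenset()) == frozenset()
    additive = all(f(a | b) == f(a) | f(b) for a in universe for b in universe)
    return normal and additive


def canonical_relation_on_atoms(f):
    """Pairs (a, b) with {a} <= f({b}), the right-hand side of (9.3.2)."""
    return {(a, b) for a in X for b in X if a in f(frozenset({b}))}
```

For this frame, `is_modal(diamond, subsets(X))` holds and `canonical_relation_on_atoms(diamond)` returns the relation `R` itself, which is the finite instance of Theorem 9.1(2).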

Observe that $R_f$ is the universal relation on $\mathrm{Ult}(B)$ if and only if $f = f^1$, and $R_f$ is the empty relation if and only if $f = f^0$. The following representation theorem is seminal for algebraic semantics of modal logics.


Theorem 9.2 [11, Theorem 3.10] Let $\mathfrak{B} := \langle B, f\rangle \in$ MOA, and let $h : B \to 2^{\mathrm{Ult}(B)}$ be defined by $h(a) = \{F \in \mathrm{Ult}(B) : a \in F\}$. Then $h$ is a MOA embedding into $\langle 2^{\mathrm{Ult}(B)}, \langle R_f\rangle\rangle$ $(= \mathrm{Cm}\,\mathrm{Cst}(\mathfrak{B}))$; in particular, $h(f(a)) = \langle R_f\rangle h(a)$.

The algebra $\mathrm{Cm}\,\mathrm{Cst}(\mathfrak{B})$ is called the canonical extension of $\mathfrak{B}$, denoted by $\mathfrak{B}^{\sigma}$. It is well known that $\mathfrak{B} \cong \mathfrak{B}^{\sigma}$ if and only if $B$ is finite. We will sometimes assume w.l.o.g. that $\langle B, f\rangle$ is a subalgebra of $\mathfrak{B}^{\sigma}$. In this case, $\langle R_f\rangle$ is denoted by $f^{+}$. We shall usually just write $B^{\sigma}$ instead of $\mathfrak{B}^{\sigma}$. For a history of and an introduction to Boolean algebras with operators, the reader is invited to consult [10].

We shall use several axioms of modal logics in their algebraic form and the corresponding property of their canonical relation:

K.   $f(0) = 0$ and $f(x + y) = f(x) + f(y)$.    $R_f$ is a binary relation.
T.   $x \leq f(x)$.                               $R_f$ is reflexive.
4.   $f(f(x)) \leq f(x)$.                         $R_f$ is transitive.
B.   $f(f^{\partial}(x)) \leq x$.                 $R_f$ is symmetric.

Conjunctions of axioms are usually written in juxtaposition of the axioms; for example, KT means a modal logic based on the axioms K and T. Some abbreviations are common: KT4 → S4, KT4B → S5. The next result provides a convenient criterion for a modal algebra to be subdirectly irreducible:

Theorem 9.3 (Rautenberg's criterion [14, p. 155]) Let $\mathfrak{B} = \langle B, f\rangle$ be a modal algebra. Then $\mathfrak{B}$ is subdirectly irreducible if and only if
\[
(\exists a \neq 1)(\forall b \neq 1)(\exists n \in \omega)\; b \cdot f^{\partial}(b) \cdot (f^{\partial})^2(b) \cdot \ldots \cdot (f^{\partial})^n(b) \leq a. \tag{9.3.3}
\]
If $\mathfrak{B}$ is a K4 algebra, then (9.3.3) is equivalent to the statement
\[
(\exists a \neq 1)(\forall b \neq 1)\; b \cdot f^{\partial}(b) \leq a, \tag{9.3.4}
\]
and if $\mathfrak{B}$ is an S4 algebra, i.e. if $f$ is a closure operator, then (9.3.3) is equivalent to
\[
(\exists a \neq 1)(\forall b \neq 1)\; f^{\partial}(b) \leq a. \tag{9.3.5}
\]

9.4 The Semilattice of Modal Operators

Let $\mathcal{M}(B)$ be the set of all modal operators on a fixed Boolean algebra $B$. For $f, g \in \mathcal{M}(B)$ set $(f \vee g)(x) := f(x) + g(x)$. The following observation, which is straightforward to prove, is the basis for the considerations in this section:


Theorem 9.4 $\langle \mathcal{M}(B), \vee, f^0, f^1\rangle$ is a bounded (join) semilattice.

We remark in passing that the structure $\langle \mathcal{M}(B), \vee, f^0, f^1, \circ, 1'\rangle$ is a bounded idempotent semiring, where $\circ$ is composition of functions and $1'$ is the identity.

For each $x \in B$ define the relativization of $f^1$ to $\downarrow x$ by
\[
f_x(y) := \begin{cases} 0, & \text{if } y = 0, \\ x, & \text{otherwise.} \end{cases}
\]
Clearly, each $f_x$ is a modal operator on $B$.

Theorem 9.5 $\mathcal{M}(B)$ is a complete semilattice if and only if $B$ is complete.

Proof "⇒": Suppose that $\emptyset \neq M \subseteq B$. Let $f := \bigvee\{f_x : x \in M\}$, and choose some $y \in B^+$. Our aim is to show that $f(y)$ is the least upper bound of $M$. Since $f_x(y) = x$ and $f_x \leq f$, it is clear that $f(y)$ is an upper bound of $M$. Next, let $z$ be an upper bound of $M$. Then $x \leq z$ for all $x \in M$, and therefore $f_x \leq f_z$. It follows that $f \leq f_z$, hence $f(y) \leq f_z(y) = z$, the latter by definition of $f_z$.

"⇐": For $\emptyset \neq Q \subseteq \mathcal{M}(B)$ set $(\bigvee Q)(a) := \sum\{f(a) : f \in Q\}$ for each $a \in B$. In this case it is in fact a complete lattice: If $\emptyset \neq Q \subseteq \mathcal{M}(B)$, then $f^0$ is a lower bound of $Q$, and thus $\bigvee\{f \in \mathcal{M}(B) : f \text{ is a lower bound of } Q\}$ is well defined, and clearly, it is the greatest lower bound of $Q$. □

Our next topic in this section is the existence of dual pseudocomplements in $\mathcal{M}(B)$. We start with a characterization of dual pseudocomplements in $\mathcal{M}(B)$:

Lemma 9.3 Let $f \in \mathcal{M}(B)$.
1. If $f$ has a dual pseudocomplement $f^{\perp}$, then $f^{\perp}(x) = \sum\{-f(y) : 0 \lneq y \leq x\}$ for each $x \in B^+$.
2. If for each $x \in B^+$, $\sum\{-f(y) : 0 \lneq y \leq x\}$ exists, then $f^{\perp}$ exists, and
\[
f^{\perp}(x) = \begin{cases} 0, & \text{if } x = 0, \\ \sum\{-f(y) : 0 \lneq y \leq x\}, & \text{if } x \neq 0. \end{cases}
\]

Proof Let $\overline{B}$ be the completion of $B$. If $M \subseteq B$, then $\sum^{\overline{B}} M$ denotes the supremum of $M$ in $\overline{B}$. Since $B$ is a dense subalgebra of $\overline{B}$, all existing sums in $B$ coincide with the sums in $\overline{B}$; in particular, if $\sum^{\overline{B}} M = b \in B$, then $\sum^{B} M = \sum^{\overline{B}} M$.

1. Suppose that $f^{\perp}$ is a dual pseudocomplement of $f$, and let $x \in B^+$, $0 \lneq y \leq x$. Then $y \leq x$ implies $f^{\perp}(y) \leq f^{\perp}(x)$. Since $f(y) + f^{\perp}(y) = 1$, we have $-f(y) \leq f^{\perp}(y)$, and thus $-f(y) \leq f^{\perp}(x)$. It follows that $f^{\perp}(x)$ is an upper bound of $\{-f(y) : 0 \lneq y \leq x\}$. Assume that $\sum^{\overline{B}}\{-f(y) : 0 \lneq y \leq x\} \lneq f^{\perp}(x)$ for some $x \in B^+$. Choose some $a_x \in B$ such that
\[
\sum\nolimits^{\overline{B}}\{-f(y) : 0 \lneq y \leq x\} \leq a_x \lneq f^{\perp}(x),
\]
and define $f' : B \to B$ by $f'(0) := 0$, and
\[
f'(z) := \begin{cases} 1, & \text{if } z \nleq x, \\ a_x, & \text{if } 0 \neq z \leq x. \end{cases}
\]
Then $f' \in \mathcal{M}(B)$, and $f(z) + f'(z) = 1$ if $z \nleq x$. If $0 \neq z \leq x$, then
\[
\begin{aligned}
f(z) + f'(z) &= f(z) + a_x \\
&\geq f(z) + \sum\nolimits^{\overline{B}}\{-f(y) : 0 \lneq y \leq x\} \\
&\geq f(z) + -f(z) && \text{since } 0 \neq z \leq x, \\
&= 1.
\end{aligned}
\]
Hence $f \vee f' = f^1$. Since $f^{\perp}$ is the dual pseudocomplement of $f$, we have $f^{\perp} \leq f'$; in particular, $f^{\perp}(x) \leq f'(x) = a_x$. This contradicts $a_x \lneq f^{\perp}(x)$. Therefore, $\sum^{\overline{B}}\{-f(y) : 0 \lneq y \leq x\}$ exists and is equal to $f^{\perp}(x)$ for each $x \in B^+$.

2. Set $f'(0) := 0$, and $f'(x) := \sum\{-f(y) : 0 \lneq y \leq x\}$ for $x \neq 0$. We first show that $f'$ is additive. By definition of $f'$, we have
\[
f'(x + z) = \sum\{-f(y) : 0 \lneq y \leq x + z\}.
\]
Enumerate $\{u : 0 \leq u \leq x\}$ by $\{u_i : i \in I\}$, and $\{v : 0 \leq v \leq z\}$ by $\{v_j : j \in J\}$. Then
\[
\{y : 0 \lneq y \leq x + z\} = \{y \cdot x + y \cdot z \in B : 0 \lneq y \cdot x + y \cdot z\} = \{u_i + v_j : u_i \neq 0 \text{ or } v_j \neq 0\},
\]
and we have
\[
\begin{aligned}
f'(x + z) &= \sum\{-(f(u_i) + f(v_j)) : u_i \neq 0 \text{ or } v_j \neq 0\} \\
&= \sum\{-f(u_i) \cdot -f(v_j) : u_i \neq 0 \text{ or } v_j \neq 0\} \\
&= \sum\{-f(u_i) \cdot \underbrace{-f(v_j)}_{=1} : u_i \neq 0,\ v_j = 0\} + \sum\{\underbrace{-f(u_i)}_{=1} \cdot -f(v_j) : u_i = 0,\ v_j \neq 0\} \\
&\quad + \sum\{-f(u_i) \cdot -f(v_j) : u_i \neq 0,\ v_j \neq 0\} \\
&= \sum\{-f(u_i) : u_i \neq 0\} + \sum\{-f(v_j) : v_j \neq 0\} + \sum\{-f(u_i) \cdot -f(v_j) : u_i \neq 0,\ v_j \neq 0\}.
\end{aligned}
\]
Fix $u_i$. Since $-f(u_i) \cdot -f(v_j) \leq -f(u_i)$ for all $j \in J$, we obtain $\sum\{-f(u_i) \cdot -f(v_j) : j \in J\} \leq -f(u_i)$, and thus
\[
f'(x + z) = \sum\{-f(u_i) : u_i \neq 0\} + \sum\{-f(v_j) : v_j \neq 0\} = f'(x) + f'(z).
\]
Thus $f' \in \mathcal{M}(B)$. Furthermore,
\[
f(x) + f'(x) = f(x) + \sum\{-f(y) : 0 \lneq y \leq x\} \geq f(x) + -f(x) = 1,
\]
and therefore $f \vee f' = f^1$. Next, suppose that $f \vee g = f^1$ for some $g \in \mathcal{M}(B)$. Let $x \neq 0$ and assume that $f'(x) \nleq g(x)$, i.e. $f'(x) \cdot -g(x) \neq 0$. Then
\[
0 \neq \underbrace{\sum\{-f(y) : 0 \lneq y \leq x\}}_{f'(x)} \cdot -g(x) = \sum\{-f(y) \cdot -g(x) : 0 \lneq y \leq x\}.
\]
Thus there is some $0 \lneq y \leq x$ such that $-f(y) \cdot -g(x) \neq 0$, i.e. $-f(y) \nleq g(x)$. Since $y \leq x$, we have $g(y) \leq g(x)$, and thus $-f(y) \nleq g(y)$, which implies $f(y) + g(y) \neq 1$. This contradicts our hypothesis $f \vee g = f^1$, and it follows that $f'$ is the dual pseudocomplement of $f$. □

Theorem 9.6 If $B$ is complete, then $\mathcal{M}(B)$ is dually pseudocomplemented.

Proof If $B$ is complete, then $\sum\{-f(y) : 0 \lneq y \leq x\}$ exists for each $x \in B^+$, and the mapping defined by

\[
f^{\perp}(x) := \begin{cases} 0, & \text{if } x = 0, \\ \sum\{-f(y) : 0 \lneq y \leq x\}, & \text{if } x \neq 0 \end{cases}
\]
is the dual pseudocomplement of $f$ by Lemma 9.3. □
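For a finite Boolean algebra the formula of Lemma 9.3 can be evaluated directly, and Theorem 9.6 can be confirmed by brute force. The Python sketch below is an illustration only: it assumes the powerset algebra on three atoms, represents each modal operator by its table of values (determined by the images of the atoms), and uses hypothetical function names.

```python
from itertools import product

N_ATOMS = 3                                  # assumption: 3 atoms, 8 elements
TOP = (1 << N_ATOMS) - 1
ATOMS = [1 << i for i in range(N_ATOMS)]
NONZERO = range(1, TOP + 1)


def table_from_atoms(atom_images):
    """Value table of the modal operator with the given images of the atoms."""
    table = []
    for x in range(TOP + 1):
        out = 0
        for atom, image in zip(ATOMS, atom_images):
            if x & atom:
                out |= image
        table.append(out)
    return tuple(table)


def dual_pseudocomplement(f):
    """f_perp(0) = 0 and f_perp(x) = sum of -f(y) over 0 < y <= x."""
    perp = [0]
    for x in NONZERO:
        out = 0
        for y in NONZERO:
            if (y & x) == y:                 # y <= x
                out |= TOP & ~f[y]
        perp.append(out)
    return tuple(perp)


def is_modal(f):
    return f[0] == 0 and all(
        f[a | b] == (f[a] | f[b]) for a in range(TOP + 1) for b in range(TOP + 1)
    )


def is_companion(f, g):
    """f v g is the unary discriminator."""
    return all((f[x] | g[x]) == TOP for x in NONZERO)


def below(g, h):
    return all((g[x] & h[x]) == g[x] for x in range(TOP + 1))


def check_theorem_9_6():
    operators = [table_from_atoms(img) for img in product(range(TOP + 1), repeat=N_ATOMS)]
    for f in operators:
        f_perp = dual_pseudocomplement(f)
        if not (is_modal(f_perp) and is_companion(f, f_perp)):
            return False
        if not all(below(f_perp, g) for g in operators if is_companion(f, g)):
            return False
    return True
```

`check_theorem_9_6()` runs over all 512 modal operators on this algebra and tests that the computed `f_perp` is a modal operator, is a companion of `f`, and lies below every other companion. It says nothing about infinite or incomplete algebras, where the sums in the formula need not exist.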



In particular, $\mathcal{M}(B)$ is dually pseudocomplemented if $B$ is finite. Let $\mathcal{M}_c(B)$ be the set of completely additive normal operators on $B$. It is obvious that $\mathcal{M}_c(B)$ is a sub-semilattice of $\mathcal{M}(B)$. If $B$ is complete, we can say more:

Theorem 9.7 Suppose that $B$ is complete.
1. $\mathcal{M}_c(B)$ is complete.
2. $g = f^{\perp}$ for some $f \in \mathcal{M}(B)$ if and only if $g$ is completely additive.
3. $\mathcal{M}_c(B)$ is a Boolean algebra, and $f^{\perp\perp}$ is the largest element of $\mathcal{M}_c(B)$ below $f$ for every $f \in \mathcal{M}(B)$.

Proof 1. Let $f, g \in \mathcal{M}_c(B)$ and $\emptyset \neq M \subseteq B$; since $B$ is complete, $\sum M$ exists. We first show that $f \vee g$ is complete, i.e. that $(f \vee g)(\sum M) = \sum\{(f \vee g)(a) : a \in M\}$. Consider
\[
\begin{aligned}
(f \vee g)\Bigl(\sum M\Bigr) &= f\Bigl(\sum M\Bigr) + g\Bigl(\sum M\Bigr) && \text{by definition of } \vee, \\
&= \sum\{f(a) : a \in M\} + \sum\{g(a) : a \in M\} && f \text{ and } g \text{ are complete}, \\
&= \sum\{f(a) + g(a) : a \in M\} && \text{infinite distrib. of } +, \\
&= \sum\{(f \vee g)(a) : a \in M\} && \text{by definition of } \vee.
\end{aligned}
\]
It is straightforward to extend this over infinite joins of complete modal operators.

2. "⇒": $f^{\perp}$ is completely additive: Suppose that $\emptyset \neq M \subseteq B^+$, and $\sum M$ exists. The aim is to show that $\sum\{f^{\perp}(m) : m \in M\}$ also exists and is equal to $f^{\perp}(\sum M)$. In the proof we shall use the de Morgan rules for infinite sums, see e.g. [12, Lemma 1.33]. Let $M = \{m_i : i \in I\}$. Now,
\[
\begin{aligned}
& z \in \mathrm{ub}(\{f^{\perp}(m_i) : i \in I\}) \\
\iff{} & z \in \mathrm{ub}(\{\textstyle\sum\{-f(y) : 0 \lneq y \leq m_i\} : i \in I\}) && \text{by Lemma 9.3}, \\
\iff{} & z \in \mathrm{ub}(\{-f(y) : 0 \lneq y \leq m_i,\ i \in I\}) && \text{by transitivity of } \geq, \\
\iff{} & z \in \mathrm{ub}(\{-f(y) : 0 \lneq y,\ y = y \cdot m_i,\ i \in I\}), \\
\iff{} & z \in \mathrm{ub}(\{-f(t) : 0 \lneq t \leq m_i,\ i \in I\}) && \text{substituting } t \text{ for } y \cdot m_i, \\
\iff{} & z \in \mathrm{ub}(\{-f(t) : 0 \lneq t \leq \textstyle\sum M\}) && t \leq m_i \text{ implies } -f(m_i) \leq -f(t), \\
\iff{} & z \in \mathrm{ub}\, f^{\perp}(\textstyle\sum M).
\end{aligned}
\]
Since $f^{\perp}(\sum M)$ exists, this implies that $\sum\{f^{\perp}(m_i) : i \in I\}$ exists and is equal to $f^{\perp}(\sum M)$, and therefore $f^{\perp}$ is complete.

"⇐": Conversely, let $g \in \mathcal{M}_c(B)$; we need to find some $f \in \mathcal{M}(B)$ such that $f^{\perp} = g$. The obvious candidate for $f$ is $g^{\perp}$, which exists since $B$ is complete. Since $g \vee g^{\perp} = f^1$, we have $f^{\perp} = g^{\perp\perp} \leq g$, since $g^{\perp\perp}$ is the dual pseudocomplement of $g^{\perp}$. For $\geq$, first observe that
\[
\begin{aligned}
f^{\perp}(x) = g^{\perp\perp}(x) &= g^{\perp}(g^{\perp}(x)) \\
&= g^{\perp}\bigl(\textstyle\sum\{-g(y) : 0 \lneq y \leq x\}\bigr) && \text{by definition of } g^{\perp}, \\
&= \textstyle\sum\{g^{\perp}(-g(y)) : 0 \lneq y \leq x\} && \text{since } g^{\perp} \text{ is compl. additive}, \\
&= \textstyle\sum\{\sum\{-g(z) : 0 \lneq z \leq -g(y)\} : 0 \lneq y \leq x\} && \text{by definition of } g^{\perp}, \\
&= \textstyle\sum\{-g(z) : 0 \lneq z \leq -g(y) \text{ and } 0 \lneq y \leq x\} && \text{by transitivity of } \geq.
\end{aligned}
\]
Thus,
\[
g(x) \leq f^{\perp}(x) \iff g(x) \leq \sum\{-g(z) : 0 \lneq z \leq -g(y) \text{ and } 0 \lneq y \leq x\}. \tag{9.4.1}
\]
Assume that $g(x) \nleq f^{\perp}(x)$. Then $g(x) \cdot -f^{\perp}(x) \neq 0$, i.e.
\[
g(x) \cdot \prod\{g(z) : 0 \lneq z \leq -g(y) \text{ and } 0 \lneq y \leq x\} \neq 0. \tag{9.4.2}
\]
If $g(x) = 1$, then $f^{\perp}(x) = g^{\perp\perp}(x) = 1$, and $g(x) \leq f^{\perp}(x)$, contradicting our assumption. If $g(x) \neq 1$, then $-g(x) \neq 0$, and therefore (9.4.2) implies that $g(x) \cdot -g(x) \neq 0$, a contradiction.

3. Since the open elements of $\mathcal{M}(B)$ are exactly the completely additive ones by 2, $\mathcal{M}_c(B)$ is a Boolean algebra by Lemma 9.2. The join $\vee_c$ in $\mathcal{M}_c(B)$ coincides with $\vee$, and the meet $f \wedge_c g$ is given by $(f^{\perp} \vee g^{\perp})^{\perp}$. Furthermore, the assignment $f \mapsto f^{\perp\perp}$ is an interior operator by Theorem 2 of [7]. This implies the second claim. □

The following observation comes as no surprise:

Theorem 9.8 Let $\mathfrak{X} = \langle X, R\rangle$ be a frame; then $\langle R\rangle^{\perp} = \langle -R\rangle$.

Proof Since both $\langle R\rangle^{\perp}$ and $\langle -R\rangle$ are completely additive, and thus determined by their action on the singletons, it suffices to show that $\langle R\rangle^{\perp}(\{y\}) = \langle -R\rangle(\{y\})$ for every $y \in X$. Let $y \in X$. Then, by Lemma 9.3 and a simple computation, $\langle R\rangle^{\perp}(\{y\}) = -\langle R\rangle(\{y\}) = \langle -R\rangle(\{y\})$. This proves the claim. □
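Theorem 9.8 lends itself to an exhaustive check on a small frame. The Python sketch below is an illustration under stated assumptions: it fixes a three-element set, encodes subsets as bitmasks, computes the dual pseudocomplement with the formula of Lemma 9.3, and compares it with the diamond operator of the complementary relation for every binary relation on the set. All names are hypothetical.

```python
from itertools import product

X = range(3)                                  # assumption: a 3-element frame
TOP = (1 << len(X)) - 1
PAIRS = [(x, y) for x in X for y in X]


def diamond_table(R):
    """Value table of <R> on bitmask-encoded subsets Y: {x : R(x) meets Y}."""
    table = []
    for Y in range(TOP + 1):
        out = 0
        for (x, y) in R:
            if (Y >> y) & 1:                  # y belongs to Y
                out |= 1 << x
        table.append(out)
    return tuple(table)


def dual_pseudocomplement(f):
    """f_perp(0) = 0 and f_perp(x) = sum of -f(y) over 0 < y <= x."""
    perp = [0]
    for x in range(1, TOP + 1):
        out = 0
        for y in range(1, TOP + 1):
            if (y & x) == y:
                out |= TOP & ~f[y]
        perp.append(out)
    return tuple(perp)


def check_theorem_9_8():
    for bits in product([False, True], repeat=len(PAIRS)):
        R = {p for p, keep in zip(PAIRS, bits) if keep}
        co_R = set(PAIRS) - R                 # the complementary relation -R
        if dual_pseudocomplement(diamond_table(R)) != diamond_table(co_R):
            return False
    return True
```

`check_theorem_9_8()` inspects all 512 relations on the three-element set. The finite case only illustrates the statement; the theorem itself covers arbitrary frames.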



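Let $B$ be a subalgebra of its canonical extension $B^{\sigma}$, defined after Theorem 9.2, let $f \vee g = f^1$, and suppose that $b \in B$. Even though $f^{+\perp}(b)$ need not be equal to $g^{+}(b)$ (and need not even be in $B$), it seems reasonable to ask whether $f^{+\perp}(b) \leq g^{+}(b)$ in $B^{\sigma}$.

Lemma 9.4 Let $f, g \in \mathcal{M}(B)$ such that $f \vee g = f^1$, and suppose that $F, G \in \mathrm{Ult}(B)$. Then $f[G] \nsubseteq F$ implies $g[G] \subseteq F$.

Proof Let $b \in G$ such that $f(b) \notin F$ and assume that there is some $c \in G$ such that $g(c) \notin F$. Since $f(b) \notin F$ we have $f(b \cdot c) \notin F$, and since $g(c) \notin F$ we have $g(b \cdot c) \notin F$. On the other hand, $b \cdot c \in G$ is nonzero, so $f \vee g = f^1$ gives $f(b \cdot c) + g(b \cdot c) = 1 \in F$; since $F$ is an ultrafilter, $f(b \cdot c) \in F$ or $g(b \cdot c) \in F$, a contradiction. □

Corollary 9.1 Suppose that $B \leq B^{\sigma}$ and $f \vee g = f^1$. Then $f^{+\perp}(b) \leq g^{+}(b)$ for all $b \in B$.

Proof Let $Y = h(a)$, $a \notin 2$. We need to show that $\langle -R_f\rangle(Y) \subseteq \langle R_g\rangle(Y)$. Consider
\[
\begin{aligned}
\langle -R_f\rangle(Y) \subseteq \langle R_g\rangle(Y) &\iff (\forall F)[(-R_f)(F) \cap Y \neq \emptyset \Rightarrow R_g(F) \cap Y \neq \emptyset], \\
&\iff (\forall F)[(\exists G)(a \in G \text{ and } f[G] \nsubseteq F) \Rightarrow R_g(F) \cap Y \neq \emptyset], \\
&\iff (\forall F, G)[a \in G \text{ and } f[G] \nsubseteq F \Rightarrow (\exists G')(a \in G' \text{ and } g[G'] \subseteq F)].
\end{aligned}
\]
The claim now follows from Lemma 9.4 setting $G' := G$. □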



Example 9.1 The first example (from [10, p. 251]) exhibits a modal operator on the non-complete $\mathrm{FC}(\omega)$ which has a dual pseudocomplement. Let $B := \mathrm{FC}(\omega)$ and let $f : B \to B$ be defined by $f(M) = \{n + 1 : n \in M\}$. Observe that $f^n(\omega) = \omega \setminus \{0, \ldots, n - 1\}$, and therefore $\langle B, f\rangle$ does not have a finite subalgebra. Let $g : B \to B$ be defined by
\[
g(M) = \begin{cases} \emptyset, & \text{if } M = \emptyset, \\ \omega \setminus \{n + 1\}, & \text{if } M = \{n\}, \\ \omega, & \text{otherwise.} \end{cases}
\]
Clearly, $g$ is a modal operator. Suppose that $\emptyset \neq M \subseteq \omega$. If $M$ is not an atom, then $g(M) = \omega$ by definition. If $M = \{n\}$, then $f(M) \cup g(M) = \{n + 1\} \cup (\omega \setminus \{n + 1\}) = \omega$. Thus $f \vee g = f^1$. Let $f \vee h = f^1$; our aim is to show that $g \leq h$. Since $f(\{n\}) \cup h(\{n\}) = \{n + 1\} \cup h(\{n\}) = \omega$, it follows that $h(\{n\}) = \omega \setminus \{n + 1\}$ or $h(\{n\}) = \omega$. Thus $g(\{n\}) \subseteq h(\{n\})$. Let $|M| \geq 2$ and $n_0, n_1 \in M$. Then $h(\{n_0, n_1\}) = h(\{n_0\}) \cup h(\{n_1\}) \supseteq (\omega \setminus \{n_0 + 1\}) \cup (\omega \setminus \{n_1 + 1\}) = \omega$. Hence $g$ is the dual pseudocomplement of $f$. □

Since $\mathrm{FC}(\omega)$ is not complete, $\mathcal{M}(\mathrm{FC}(\omega))$ is not dually pseudocomplemented. It is therefore instructive to give a concrete example of a modal operator on $\mathrm{FC}(\omega)$ without dual pseudocomplement.

Example 9.2 Let $B = \mathrm{FC}(\omega)$, and define $f : B \to B$ by $f(\emptyset) = \emptyset$, and
\[
f(\{n\}) = \begin{cases} \{0\}, & \text{if } n = 0, \\ \omega \setminus \{0\}, & \text{if } n \neq 0 \text{ and } n \text{ is even}, \\ \omega \setminus \{n\}, & \text{if } n \text{ is odd}, \end{cases}
\]
and extend $f$ over $\mathrm{FC}(\omega)$ by $f(M) = \bigcup\{f(\{n\}) : n \in M\}$. Since every cofinite $M \subseteq \omega$ contains a positive even number $n$ and an odd number $m$, we note that
\[
f(M) \supseteq f(\{n\}) \cup f(\{m\}) = (\omega \setminus \{0\}) \cup (\omega \setminus \{m\}) = \omega. \tag{9.4.3}
\]
Furthermore, if $g \in \mathcal{M}(B)$ and $n$ is odd, then
\[
f(\{n\}) \cup g(\{n\}) = \omega \text{ implies } n \in g(\{n\}). \tag{9.4.4}
\]
Let $F^0$ be the set of positive even numbers, and $F^1$ be the set of odd numbers. For $0 \lneq i$ let $n_i$ be the $i$-th nonzero even number, $g_i(\emptyset) := \emptyset$, and
\[
\begin{aligned}
g_i(M) &:= \omega, && \text{if } 0 \in M, && (9.4.5) \\
g_i(\{n\}) &:= \{0\}, && \text{if } n \in F^0, && (9.4.6) \\
g_i(\{n\}) &:= \{n\}, && \text{if } n \in F^1, && (9.4.7) \\
g_i(M) &:= \omega \setminus \{n_i\}, && \text{if } M \text{ is cofinite and } 0 \notin M. && (9.4.8)
\end{aligned}
\]
We extend the $g_i$ additively over finite sets; then $g_i(M)$ is finite if $M$ is finite.

Note that the $g_i$ only differ in how they handle a cofinite $M$ with $0 \notin M$. Let $L, M \in B^+$. If, say, $0 \in M$, then $g_i(M) = \omega$, and $g_i(L \cup M) = \omega$, since $0 \in L \cup M$. Now, $g_i(L \cup M) = \omega = g_i(M) \subseteq g_i(L) \cup g_i(M)$. Thus, suppose that $0 \notin L \cup M$. If $L$ is finite, then $g_i(L) \subseteq \{n : n \in L \cap F^1\} \cup \{0\}$ by (9.4.6) and (9.4.7). If both $L$ and $M$ are finite, then $g_i(L \cup M) = g_i(L) \cup g_i(M)$ by the definition of $g_i$. If $M$ is cofinite, then $L \cup M$ is cofinite, and $g_i(L) \subseteq \{n : n \in L \cap F^1\} \cup \{0\} \subseteq \omega \setminus \{n_i\} = g_i(M)$. Thus $g_i(L \cup M) = \omega \setminus \{n_i\} = g_i(M) = g_i(M) \cup g_i(L)$. If both $L$ and $M$ are cofinite, then so is $L \cup M$, and $g_i(L) = g_i(M) = g_i(L \cup M) = \omega \setminus \{n_i\}$. Altogether, we have shown that $g_i \in \mathcal{M}(B)$.

Next, let $M \in B^+$. If $M$ is cofinite, then $f(M) = \omega$ by (9.4.3), and if $0 \in M$, then $g_i(M) = \omega$. Let $M$ be finite and $0 \notin M$; it is enough to show that $f(\{n\}) \cup g_i(\{n\}) = \omega$ for $n \neq 0$. By definition,
\[
f(\{n\}) \cup g_i(\{n\}) = \begin{cases} (\omega \setminus \{0\}) \cup \{0\} = \omega, & \text{if } n \text{ is even}, \\ (\omega \setminus \{n\}) \cup \{n\} = \omega, & \text{if } n \text{ is odd.} \end{cases}
\]
Hence $f \vee g_i = f^1$. Assume that $g$ is a dual pseudocomplement of $f$. Then $g(\omega \setminus \{0\}) \leq g_i(\omega \setminus \{0\}) = \omega \setminus \{n_i\}$ for all $i \in \omega^+$, and thus $g(\omega \setminus \{0\})$ contains no positive even numbers. However, $\omega \setminus \{0\}$ contains all odd numbers, and thus $F^1 \subseteq g(\omega \setminus \{0\})$ by (9.4.4). It follows that $g(\omega \setminus \{0\}) \notin B$, a contradiction. Thus $f$ does not have a dual pseudocomplement.

It is instructive to consider the canonical extension $B^{\sigma}$ of $B$ with $f^{+} := \langle R_f\rangle$. Then $f^{+\perp}$ exists since $B^{\sigma}$ is complete, and it is equal to $\langle -R_f\rangle$ by Theorem 9.8. Let $F_n$ be the principal ultrafilter of $\mathrm{FC}(\omega)$ generated by $\{n\}$, and $U$ be the non-principal ultrafilter of cofinite sets; furthermore, let $h : B \to 2^{\mathrm{Ult}(B)}$ be the Stone embedding. Then, for $M \in B$,
\[
h(M) = \begin{cases} \{F_n : n \in M\}, & \text{if } M \text{ is finite}, \\ \{F_n : n \in M\} \cup \{U\}, & \text{if } M \text{ is cofinite.} \end{cases}
\]
By (9.3.2) and the definition of $f$,
\[
F_n \mathrel{R_f} F_m \iff \begin{cases} n = 0 \text{ and } (m = 0 \text{ or } m \text{ odd}), \text{ or} \\ n \text{ even and } m \neq 0, \text{ or} \\ n \text{ odd and } n \neq m. \end{cases}
\]
Furthermore,
\[
U \mathrel{R_f} F_n \iff n \neq 0, \qquad F_n \mathrel{R_f} U \iff n \in \omega, \qquad U \mathrel{R_f} U.
\]
Therefore, keeping in mind that $\langle R\rangle(\{F_m\}) = R\breve{\ }(\{F_m\})$, we obtain
\[
\langle R\rangle(\{F_m\}) = \begin{cases} \{F_0\} \cup \{F_n : n \text{ odd}\}, & \text{if } m = 0, \\ \mathrm{Ult}(B), & \text{if } m \in F^0, \\ \mathrm{Ult}(B) \setminus \{F_m\}, & \text{if } m \in F^1. \end{cases}
\]
This gives us
\[
-\langle R\rangle(\{F_m\}) = \begin{cases} \{F_n : n \in F^0\}, & \text{if } m = 0, \\ \emptyset, & \text{if } m \in F^0, \\ \{F_m\}, & \text{if } m \in F^1. \end{cases}
\]
This shows that $\langle -R_f\rangle \leq g_i$ for all $i \in \omega$. On the other hand, if we define
\[
p(\{n\}) = \begin{cases} \{0\}, & \text{if } n \text{ is even}, \\ \{n\}, & \text{if } n \text{ is odd}, \end{cases}
\]
then $\langle f, p\rangle$ is a minimal pair, and $p \lneq g_i$ for all $i \in \omega^+$. □



Our final example in this section exhibits a modal operator on the countable free Boolean algebra without a dual pseudocomplement.

Example 9.3 Let $B$ be the interval algebra of the rational unit interval $[0, 1)_{\mathbb{Q}}$; we regard $B$ as a subalgebra of the interval algebra of the real unit interval $[0, 1)_{\mathbb{R}}$. It is well known that $B$ is the free Boolean algebra on countably many generators. In particular, $B$ is homogeneous, and every nonempty infinite open interval is order isomorphic to every other nonempty infinite open interval. Let $0 \lneq p \lneq 1$ be irrational, and let $h : (0, 1)_{\mathbb{Q}} \to (p, 1)_{\mathbb{Q}}$ be an order isomorphism. Suppose that $x \in B^+$ and that $[x_0^0, x_0^1) \cup [x_1^0, x_1^1) \cup \ldots \cup [x_{t(x)}^0, x_{t(x)}^1)$ is its standard representation. Now, set $f(0) := 0$, and, for $x \in B^+$,
\[
f(x) := [0, h(x_{t(x)}^1)).
\]
Let $y = [y_0^0, y_0^1) \cup [y_1^0, y_1^1) \cup \ldots \cup [y_{t(y)}^0, y_{t(y)}^1)$; then
\[
\begin{aligned}
f(x) + f(y) &= [0, h(x_{t(x)}^1)) \cup [0, h(y_{t(y)}^1)) \\
&= [0, \max\{h(x_{t(x)}^1), h(y_{t(y)}^1)\}) \\
&= [0, h(\max\{x_{t(x)}^1, y_{t(y)}^1\})) && \text{since } h \text{ is an order isomorphism}, \\
&= [0, h((x + y)_{t(x+y)}^1)) && \text{since } (x + y)_{t(x+y)}^1 = \max\{x_{t(x)}^1, y_{t(y)}^1\}, \\
&= f(x + y).
\end{aligned}
\]
Thus $f \in \mathcal{M}(B)$. For each $x \in B^+$ let $M_x := \{-f(y) : 0 \lneq y \leq x\}$ and $M_x^* := \{f(y) : 0 \lneq y \leq x\}$; then $M_x^* = \{[0, h(y_{t(y)}^1)) : 0 \lneq y \leq x\}$. To abbreviate notation, let $C := \mathrm{IntAlg}([0, 1)_{\mathbb{R}})$. Considering that $h$ is an order isomorphism, we obtain
\[
\prod\nolimits^{C} M_x^* = \prod\nolimits^{C} \{[0, h(y_{t(y)}^1)) : 0 \lneq y \leq x\} = [0, p), \tag{9.4.9}
\]
and therefore
\[
\sum\nolimits^{C} M_x = -\prod\nolimits^{C} \{f(y) : 0 \lneq y \leq x\} = -\prod\nolimits^{C} M_x^* = [p, 1). \tag{9.4.10}
\]
Next, let $f \vee f' = f^1$, and $0 \lneq x \lneq 1$. Since $f'(x)$ is an upper bound of $M_x$ by Lemma 9.3, it follows from (9.4.10) that
\[
[p, 1) \subseteq f'(x). \tag{9.4.11}
\]
For each $s \in (0, p) \cap \mathbb{Q}$, let $f_s(0) := 0$, and $f_s(x) := [s, 1)$ for all $x \in B^+$. Then $f_s \in \mathcal{M}(B)$, and $f(x) \cup f_s(x) = [0, h(x_{t(x)}^1)) \cup [s, 1) = [0, 1)$, the latter since $s \lneq p$. If $0 \lneq s \leq s' \lneq p$, then $f_{s'}(x) \leq f_s(x)$, and therefore
\[
\prod\nolimits^{C} \{f_s(x) : s \in (0, p) \cap \mathbb{Q}\} = \prod\nolimits^{C} \{[s, 1) : s \in (0, p) \cap \mathbb{Q}\} = [p, 1). \tag{9.4.12}
\]
Assume that $f^{\perp}$ is a dual pseudocomplement of $f$. Then $[p, 1) \subseteq f^{\perp}(x)$ for each $x \in B^+$ by (9.4.11); furthermore, $f^{\perp}(x) \leq f_s(x)$ for each $s \in (0, p) \cap \mathbb{Q}$, and thus $f^{\perp}(x) \subseteq \prod^{C} \{f_s(x) : s \in (0, p) \cap \mathbb{Q}\}$, and (9.4.12) implies that $f^{\perp}(x) \subseteq [p, 1)$. Altogether, we obtain that $f^{\perp}(x) = [p, 1)$; however, $[p, 1) \notin B$, since $p$ is irrational. □

As in Example 9.2, we see concretely that the pseudocomplement does not exist because certain infinite products (or sums) do not exist in $B$, this time for an atomless BA. Lemma 9.3 tells us that this is to be expected, since $B$ is not complete. Furthermore, we observe that $f$ is a closure operator on the free countable Boolean algebra.


9.5 Decomposing Discriminators

It is often the case that pairs of operators are considered which, taken together, have desirable structural properties. Examples of such pairs are Galois connections or residuated mappings. In [6] pairs of operators $\langle f, g\rangle$ were considered where $f$ is a modal operator, and $g$ is a sufficiency operator, i.e. $g(0) = 1$ and $g(x + y) = g(x) \cdot g(y)$ for all $x, y \in B$. A weak mixed algebra (wMIA) is a structure $\langle B, f, g\rangle$ such that $f$ is a modal operator, $g$ is a sufficiency operator, and
\[
(\forall x)[x \neq 0 \text{ implies } g(x) \leq f(x)]. \tag{9.5.1}
\]
The class of wMIAs is denoted by wMIA. These algebras are intimately connected to algebraic models of the logic K∼, which was introduced in [8]. It turns out that the discriminator decomposition algebras defined below are another way of describing weak MIAs.

Suppose that $f, g$ are modal operators on $B$, and consider the condition
\[
(\forall x)[x \neq 0 \Rightarrow f(x) + g(x) = 1]. \tag{9.5.2}
\]
Clearly, (9.5.2) says that $f \vee g$ is the unary discriminator. If a pair $\langle f, g\rangle$ of modal operators satisfies (9.5.2) we call it a decomposing pair, and $g$ a companion of $f$. If $\langle f, g\rangle$ is a decomposing pair, then so is $\langle g, f\rangle$, owing to the commutativity of $+$. The set of all decomposing pairs is denoted by $\mathrm{Dp}(B)$. For each $f \in \mathcal{M}(B)$ the pair $\langle f, f^1\rangle$ is decomposing. If $f^{\perp}$ is the dual pseudocomplement of $f$, then $\langle f, f^{\perp}\rangle$ is a decomposing pair, and $f^{\perp}$ is the smallest $g \in \mathcal{M}(B)$ such that $\langle f, g\rangle$ is decomposing.

A discriminator decomposition algebra (DDA) is a bimodal algebra $\langle B, f, g\rangle$ such that $\langle f, g\rangle$ is a decomposing pair. If both $f$ and $g$ are proper, i.e. not equal to $f^1$, $\langle B, f, g\rangle$ is called a proper DDA. The class of DDAs is denoted by DDA; by (9.5.2), DDA is a discriminator class. The relational counterpart of this situation are frames $\langle X, R, S\rangle$, where $R \cup S = X^2$; these are called generalized models of K∼ in [8].

Theorem 9.9 There is a bijective correspondence between the set of decomposing pairs and the set of pairs $\langle f, g\rangle$ such that $\langle B, f, g\rangle$ is a weak MIA.

Proof Let $\langle B, f, g\rangle$ be a weak MIA, and set $f' := g^{*}$. Then $f'$ is a modal operator, and, for all $a \neq 0$,
\[
f(a) + f'(a) = f(a) + g^{*}(a) = f(a) + -g(a) = 1,
\]
the latter since $g(a) \leq f(a)$. Clearly, the assignment $\langle f, g\rangle \mapsto \langle f, g^{*}\rangle$ is an injective mapping, and all that is left to show is that it is surjective. Thus, let $f, f'$ be modal operators such that $f(a) + f'(a) = 1$ for all $a \neq 0$, and set $g := f'^{*}$. Clearly, $g$ is a sufficiency operator, and $f(a) + f'(a) = 1$ implies $g(a) = -f'(a) \leq f(a)$ for all $a \neq 0$. □

Thus, the classes wMIA and DDA are equipollent in the sense of [15]. Since in DDA we are dealing with just one kind of operator instead of the two kinds in wMIA, it is less complicated to work in DDA. It is also easier to apply results from the theory of modal algebras.

Lemma 9.5
1. DDA is closed under taking subalgebras, homomorphic images and ultraproducts.
2. DDA is not closed under taking direct products, and thus, it is neither a variety nor a quasivariety.

Proof 1. DDA is a universal class with a set of positive axioms; thus, it is closed under taking subalgebras, ultraproducts, and homomorphic images, see e.g. [4, Chap. 5].
2. Since DDA is a discriminator class, each member is simple [16, Theorem 2.2], and thus DDA is not closed under direct products. For the rest, just note that each quasivariety is closed under direct products [4, Theorem 2.25]. □

The equational class generated by wMIA was described in [6, Sect. 7], and the results can be translated for DDA in a straightforward way. Given a bimodal algebra $\mathfrak{B} = \langle B, f, g\rangle$ let $u : B \to B$ be defined by
\[
u(x) := f^{\partial}(x) \cdot g^{\partial}(x). \tag{9.5.3}
\]
$\mathfrak{B}$ is called a K∼-DDA if $u$ is an S5 possibility operator, i.e. if $u$ has the following properties:
\[
\begin{aligned}
x &\leq u(x), && (9.5.4) \\
u(u(x)) &\leq u(x), && (9.5.5) \\
u(u^{\partial}(x)) &\leq x. && (9.5.6)
\end{aligned}
\]
The class of K∼-DDAs is denoted by KDDA; clearly, KDDA is an equational class.

Theorem 9.10 Eq(DDA) = KDDA.

Proof Taking into account Theorem 9.9, the proof is a straightforward translation of [6, Theorem 7.3]. □

The following result relating a decomposing pair to its canonical relations comes as no surprise:

Theorem 9.11 Suppose that $\langle B, f, g\rangle$ is a bimodal algebra. Then $\langle B, f, g\rangle \in$ DDA if and only if $R_f \cup R_g = \mathrm{Ult}(B)^2$.


Proof "⇒": Let $\langle F, G\rangle \in \mathrm{Ult}(B)^2$, and assume $\langle F, G\rangle \notin R_f \cup R_g$. Then $f[G] \nsubseteq F$ and $g[G] \nsubseteq F$. Thus, there are $a, b \in G$ such that $f(a) \notin F$ and $g(b) \notin F$. Now, $a \cdot b \in G$, and $f(a \cdot b) \in F$ implies $f(a) \in F$, since $f$ is isotone and $F$ is a filter. Thus $f(a \cdot b) \notin F$, and, similarly, $g(a \cdot b) \notin F$. Since $\langle f, g\rangle$ is a decomposing pair and $a \cdot b \neq 0$, we have $f(a \cdot b) + g(a \cdot b) = 1 \in F$. Since $F$ is an ultrafilter, $f(a \cdot b) \in F$ or $g(a \cdot b) \in F$, a contradiction.

"⇐": Let $a \in B^+$ and assume that $f(a) + g(a) \neq 1$. Then there is an ultrafilter $F$ such that $f(a) + g(a) \notin F$, i.e. $f(a) \notin F$ and $g(a) \notin F$. It follows that $f[G] \nsubseteq F$ and $g[G] \nsubseteq F$ for any ultrafilter $G$ containing $a$, which contradicts $R_f \cup R_g = \mathrm{Ult}(B)^2$. □

Theorem 9.12 If $\mathfrak{B} = \langle B, f, g\rangle \in$ DDA, then $\mathrm{Cm}\,\mathrm{Cst}(\mathfrak{B}) \in$ DDA.

Proof This follows from the syntactic form of (9.5.2), see e.g. [10, Theorem 4.2.1]. A direct proof is as follows: Since $\mathfrak{B} \in$ DDA, $R_f \cup R_g = \mathrm{Ult}(B)^2$. We show that $\langle R_f\rangle(\{U\}) \cup \langle R_g\rangle(\{U\}) = \mathrm{Ult}(B)$ for $U \in \mathrm{Ult}(B)$; then this can be extended to all non-empty subsets of $\mathrm{Ult}(B)$, since the operators are isotone. Assume that $\langle R_f\rangle(\{U\}) \cup \langle R_g\rangle(\{U\}) \neq \mathrm{Ult}(B)$. Then there is some ultrafilter $F$ of $B$ such that $F \notin \langle R_f\rangle(\{U\})$ and $F \notin \langle R_g\rangle(\{U\})$; in other words, $\langle F, U\rangle \notin R_f$ and $\langle F, U\rangle \notin R_g$. This contradicts $R_f \cup R_g = \mathrm{Ult}(B)^2$. □
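In a finite algebra every ultrafilter is principal, so by (9.3.2) the canonical relation can be computed on the atoms, and Theorem 9.11 becomes a finite check. The Python sketch below is an illustration only: it assumes the powerset algebra on three atoms, enumerates all pairs of modal operators, and uses hypothetical function names.

```python
from itertools import product

N_ATOMS = 3                                  # assumption: 3 atoms, 8 elements
TOP = (1 << N_ATOMS) - 1
ATOMS = [1 << i for i in range(N_ATOMS)]
ALL_PAIRS = frozenset((a, b) for a in range(N_ATOMS) for b in range(N_ATOMS))


def table_from_atoms(atom_images):
    """Value table of the modal operator with the given images of the atoms."""
    table = []
    for x in range(TOP + 1):
        out = 0
        for atom, image in zip(ATOMS, atom_images):
            if x & atom:
                out |= image
        table.append(out)
    return tuple(table)


def is_decomposing(f, g):
    """Condition (9.5.2): f(x) + g(x) = 1 for every x != 0."""
    return all((f[x] | g[x]) == TOP for x in range(1, TOP + 1))


def canonical_relation(f):
    """Pairs (a, b) of atom indices with atom a <= f(atom b), cf. (9.3.2)."""
    return frozenset(
        (a, b)
        for a in range(N_ATOMS)
        for b in range(N_ATOMS)
        if (f[ATOMS[b]] >> a) & 1
    )


def check_theorem_9_11():
    operators = [table_from_atoms(img) for img in product(range(TOP + 1), repeat=N_ATOMS)]
    relations = {f: canonical_relation(f) for f in operators}
    return all(
        is_decomposing(f, g) == ((relations[f] | relations[g]) == ALL_PAIRS)
        for f in operators
        for g in operators
    )
```

`check_theorem_9_11()` compares the algebraic condition (9.5.2) with the relational condition $R_f \cup R_g = \mathrm{Ult}(B)^2$ for every pair of modal operators on this one algebra; the general theorem needs the ultrafilter argument given in the proof.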

9.6 Proper Companions

Let us order Dp(B) by setting ⟨f, g⟩ ≤ ⟨f′, g′⟩ if f ≤ f′ and g ≤ g′. The following observation is obvious:

Lemma 9.6 If g is a companion of f and g ≤ g′, then g′ is a companion of f.

Theorem 9.13 Let ⟨f, g⟩ ∈ Dp(B). If f and g are dual pseudocomplements of each other, then ⟨f, g⟩ is a minimal pair. If B is complete, then the converse also holds.

Proof Since f and g are dual pseudocomplements of each other, we may suppose that f = f^⊥⊥ and g = f^⊥. Let ⟨f′, g′⟩ ∈ Dp(B), and ⟨f′, g′⟩ ≤ ⟨f^⊥⊥, f^⊥⟩. Then, f^⊥⊥ ∨ g′ = f^1, since f′ ≤ f^⊥⊥. It follows that f^⊥ ≤ g′, since f^⊥ is the dual pseudocomplement of f^⊥⊥; hence, f^⊥ = g′. Similarly we can show that f^⊥⊥ = f′. If B is complete, then every f ∈ M(B) has a dual pseudocomplement, and the claim follows from the fact that the set {⟨f^⊥⊥, f^⊥⟩ : f ∈ M(B)} is dense in Dp(B).

In Example 9.2, ⟨f, p⟩ is a minimal pair, and p ≰ g_i for all i ∈ ω+. This shows that the assumption of completeness in the ⇐ direction cannot be removed.

A companion g of f is called proper if g ≠ f^1. Note that this notion is not symmetric: If f = f^1, then every g ≠ f^1 is a proper companion of f, but f is not a proper companion of any g. A discriminating pair ⟨f, g⟩ is called proper if both f, g ≠ f^1. It turns out that the property of f having a proper companion is a Σ_1 first-order property, as the following result shows:

Theorem 9.14 f has a proper companion if and only if

(∃x)(∃z)[x ≠ 0 and z ≠ 0 and (∀y)(0 < y ≤ x ⇒ z ≤ f(y))].   (9.6.1)

Proof “⇒”: Suppose that (9.6.1) is not true, i.e.

(∀x)(∀z)[x ≠ 0 and (∀y)(0 < y ≤ x ⇒ z ≤ f(y)) ⇒ z = 0].   (9.6.2)

For x ≠ 0, set S(x) := {−f(y) : 0 < y ≤ x}. If s is an upper bound of S(x), then −f(y) ≤ s for all 0 < y ≤ x, i.e. −s ≤ f(y). It now follows from (9.6.2) that −s = 0, i.e. s = 1; hence, ∑S(x) = 1 for all x ∈ B+. By Lemma 9.3, f has a dual pseudocomplement f^⊥, and f^⊥(x) = ∑S(x) = 1. Therefore, f^⊥ = f^1, which is the smallest companion of f, so f has no proper companion.

“⇐”: Let x, z witness (9.6.1); in particular, x, z ≠ 0. We shall consider two cases:

1. z = 1: By the hypothesis, (∀y ≠ 0)[y ≤ x ⇒ f(y) = 1]. Set

g(y) = 0,  if y = 0,
       x,  if 0 < y ≤ x,
       1,  otherwise.   (9.6.3)

Clearly, g ∈ M(B) \ {f^1}. Let y ∈ B+. If y ≤ x, then z ≤ f(y) = 1, and if y ≰ x, then g(y) = 1. It follows that g is a proper companion of f.

2. z ≠ 1: Define g : B → B by

g(y) = 0,   if y = 0,
       −z,  if 0 < y ≤ x,
       1,   otherwise.   (9.6.4)

Clearly, g ∈ M(B), and g ≠ f^1 since −z ≠ 1. Let y ∈ B+. If y ≤ x, then z ≤ f(y) by the hypothesis, and g(y) = −z, which implies f(y) + g(y) = 1. If y ≰ x, then g(y) = 1. Altogether, g is a proper companion of f. This completes the proof.
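The criterion (9.6.1) and the construction in the proof can be tried out on a small finite Boolean algebra. The following Python sketch is only an illustration under stated assumptions: B is taken to be the powerset of a three-element set encoded as bitmasks, a modal operator is represented by its values on the atoms (which determine it, since it is normal and additive), and all function names are hypothetical and not taken from the text.

```python
from itertools import product

N = 3                        # number of atoms (an arbitrary illustrative choice)
TOP = (1 << N) - 1           # the element 1 of B, as a bitmask
ELEMS = range(TOP + 1)       # every element of B; 0 is the empty set
NONZERO = range(1, TOP + 1)  # B+


def leq(a, b):
    """a <= b in B, i.e. a is a subset of b."""
    return a & ~b == 0


def operator_from_atoms(images):
    """Normal additive operator on B determined by its values on the atoms."""
    def f(x):
        out = 0
        for i, img in enumerate(images):
            if (x >> i) & 1:
                out |= img
        return out
    return f


def witnesses(f):
    """All pairs (x, z) of nonzero elements satisfying criterion (9.6.1)."""
    return [(x, z) for x in NONZERO for z in NONZERO
            if all(leq(z, f(y)) for y in NONZERO if leq(y, x))]


def companion_from(x, z):
    """The operator g of (9.6.3) if z = 1, and of (9.6.4) otherwise."""
    value = x if z == TOP else TOP & ~z
    def g(y):
        if y == 0:
            return 0
        return value if leq(y, x) else TOP
    return g


def is_proper_companion(f, g):
    """g differs from the discriminator and f(y) + g(y) = 1 for all y != 0.
    Assumes that g is already known to be a modal operator."""
    return (any(g(y) != TOP for y in NONZERO)
            and all(f(y) | g(y) == TOP for y in NONZERO))


def has_proper_companion(f):
    """Brute-force search over all modal operators g on B."""
    return any(is_proper_companion(f, operator_from_atoms(images))
               for images in product(ELEMS, repeat=N))


f = operator_from_atoms([0b001, 0b011, 0b111])
x, z = witnesses(f)[0]
g = companion_from(x, z)
print(bin(x), bin(z), is_proper_companion(f, g))
```

Since every finite Boolean algebra is atomic, the sketch can only exhibit the situation of Corollary 9.3; the atomless examples later in the section are outside its reach.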



Since (9.6.1) holds if and only if {f(y) : 0 < y ≤ x} has a nonzero lower bound for some x ∈ B+, we obtain

Corollary 9.2 The following statements are equivalent:

f has no proper companion.   (9.6.5)
(∀x ∈ B+) ∏{f(y) : 0 < y ≤ x} = 0.   (9.6.6)
(∀u ∈ B \ {1}) ∑{f^∂(y) : u ≤ y < 1} = 1.   (9.6.7)

Since f(0) = 0, the non-existence of a proper companion is in some sense an expression of continuity of f at 0. One may also interpret this as completeness of f at 0.

Corollary 9.3 If a is an atom of B with f(a) ≠ 0, then f has a proper companion.

Proof Set x := a, and z := f(a); then, x and z are witnesses for (9.6.1).



Theorem 9.15 If B = ⟨B, f⟩ is subdirectly irreducible, and f is a closure operator, then f has a proper companion.

Proof Considering −x and −z we see that (9.6.1) is equivalent to

(∃u ≠ 1)(∃v ≠ 1)(∀t)[u ≤ t < 1 ⇒ f^∂(t) ≤ v].   (9.6.8)

Since B is subdirectly irreducible and f is a closure operator, we can use Rautenberg’s criterion in the form (9.3.5) to obtain some a ∈ B such that f^∂(b) ≤ a for all b ≠ 1. Setting u := 0 and v := a in (9.6.8) gives the desired result.

The following examples shed light on the connection between f having a proper companion and f[B] being dense.

Example 9.4 Suppose that B is atomless and I is a dense ideal of B. Define

f(x) := 1,  if x ∉ I,
        x,  otherwise.

Clearly, f is a closure operator. Let g be a companion of f. If x ∈ I+, then, since B is atomless, there are y, z ∈ I+ such that y + z = x and y · z = 0. By (9.5.2), y + g(y) = z + g(z) = 1, and therefore, −y ≤ g(y) and −z ≤ g(z). Since y · z = 0, we have −y + −z = 1, and it follows that 1 = −y + −z ≤ g(y) + g(z) = g(y + z) = g(x). If x ∉ I, there is some y ∈ I+ such that y ≤ x, since I is dense. Since g(y) = 1 and g(y) ≤ g(x), we have g(x) = 1. Hence, g is not proper.

The next example destroys density of f[B] while keeping part of the previous example. Recall that B is called homogeneous if B ≅ ↓a for every a ∈ B+ [12, Definition 9.12]. In particular, every infinite free algebra is homogeneous.

Example 9.5 Suppose that B is homogeneous, 0 < a < 1, and that π : ↓a → ↓−a is an isomorphism. Define f : B → B by f(x) := x + π(x · a).

Then, ↓a ∩ f[B] = {0}, and thus, f[B] is not dense in B. Since f(0) = π(0) + 0 = 0, we see that f is normal. Let x, y ∈ B+. Then,

f(x) + f(y) = x + π(x · a) + y + π(y · a)
            = x + y + π(x · a) + π(y · a)
            = x + y + π(x · a + y · a)
            = x + y + π((x + y) · a)
            = f(x + y).

Clearly, x ≤ f(x). Furthermore,

π(f(x) · a) = π((x + π(x · a)) · a)
            = π(x · a + π(x · a) · a)
            = π(x · a) + π(π(x · a) · a)
            = π(x · a) + 0
            = π(x · a),

and therefore,

f(f(x)) = f(x + π(x · a))
        = f(x) + f(π(x · a)),         since f is additive
        = x + π(x · a) + π(x · a),    since π(x · a) · a = 0
        = x + π(x · a)
        = f(x).

Thus, f is a closure operator. Let 0 < x; by Corollary 9.2 it is sufficient to show that ∏{f(y) : 0 < y ≤ x} = 0. In what follows, we suppose that an infinite product ∏M is taken in the completion B̄ of B. If ∏^B̄ M = 0, then ∏M = 0 in B.

∏^B̄ {f(y) : 0 < y ≤ x} = ∏^B̄ {y + π(y · a) : 0 < y ≤ x}
                        = ∏^B̄ {y : 0 < y ≤ x} + ∏^B̄ {π(y · a) : 0 < y ≤ x}
                        = 0 + 0,

the latter since B is atomless.



Our final example shows that density of f[B] is in general not sufficient to show that f has no proper companion.

Example 9.6 Let B be homogeneous, and 0 < a < 1. Furthermore, let b, c ≠ 0 and b ⊕ c = −a. Since ↓a ≅ ↓b and ↓−a ≅ ↓c, there are isomorphisms π : ↓b → ↓a, and ψ : ↓c → ↓−a. Furthermore, a ⊕ b ⊕ c = 1 implies that x = x · a ⊕ x · b ⊕ x · c. Define f : B → B as follows:

f(x) := 1,                    if x · a ≠ 0,
        π(x · b) + ψ(x · c),  if x · a = 0.

Then, f is well defined since x = x · a ⊕ x · b ⊕ x · c. Furthermore, f(x) = π(x) if x ≤ b, and f(x) = ψ(x) if x ≤ c. Since f(0) = π(0) = ψ(0) = 0, f is normal. Let x, y ∈ B+. If, say, x · a ≠ 0, then f(x) = 1, and thus, f(x) + f(y) = 1. Since (x + y) · a ≠ 0, we also have f(x + y) = 1. Let x, y ≤ −a. Then,

f(x + y) = π((x + y) · b) + ψ((x + y) · c)
         = π(x · b + y · b) + ψ(x · c + y · c)
         = π(x · b) + π(y · b) + ψ(x · c) + ψ(y · c)
         = π(x · b) + ψ(x · c) + π(y · b) + ψ(y · c)
         = f(x) + f(y).

Thus, f ∈ M(B). Let z ∈ B+. If z · a ≠ 0, then f(x) = π(x) = z · a ≤ z for some 0 < x ≤ b. If z · −a ≠ 0, then there is some 0 < x ≤ c such that f(x) = ψ(x) = z · −a ≤ z. Hence, f[B] is dense in B. Let g : B → B be defined by

g(x) := 0,  if x = 0,
        a,  if 0 < x ≤ a,
        1,  otherwise.

Clearly, g ∈ M(B) \ {f^1}. Let x ≠ 0. If x ≤ a, then f(x) = 1, and if x ≰ a, then g(x) = 1. Hence, g is a proper companion of f.

Thus, we observe that

1. There is some ⟨B, f⟩ such that f[B] is dense and f does not have a proper companion (Example 9.4).
2. There is some ⟨B, f⟩ such that f is a closure operator, f[B] is not dense and f has no proper companion (Example 9.5).

3. There is some ⟨B, f⟩ such that f[B] is dense and f has a proper companion (Example 9.6).

Therefore, the properties of f[B] being dense and f having a proper companion are independent. However, if f has additional properties, the situation is different, as we shall show below.

For a modal operator f on B, its n-th iteration is defined as usual: For n ≥ 1 and x ∈ B,

f^1(x) := f(x),   f^{n+1}(x) := f(f^n(x)).

In analogy to the corresponding property of frame relations, we say that a modal operator f is n-transitive if, for all x ∈ B,

4_n. f^{n+1}(x) ≤ f^n(x).

Theorem 9.16 Suppose that B is atomless, and f is n-transitive for some n ≥ 1. If f^n[B] is dense in B, then f does not have a proper companion.

Proof Assume that f has a proper companion. By Theorem 9.14, there are x, z ∈ B+ such that 0 < y ≤ x implies z ≤ f(y). Let t ∈ B such that 0 < f^n(t) ≤ x; such t exists, since f^n[B] is dense by the hypothesis. By (9.6.1), z ≤ f(f^n(t)), and thus, z ≤ f(f^n(t)) ≤ f^n(t) ≤ x, since f is n-transitive. Note that this is not possible in Example 9.6. Again by density there is some s ∈ B+ such that 0 < f^n(s) ≤ z; since B is atomless, we may suppose that f^n(s) < z. But then, 0 < f^n(s) ≤ x implies z ≤ f(f^n(s)) ≤ f^n(s), a contradiction.

Example 9.5 shows that the converse is not true; thus, density of f[B] in B is too strong a property to follow from not having a proper companion, even if f is a closure operator.

The modal axiom B : ♦□A → A can be n-generalized in two ways, see e.g. [5, Ch. 4.3, pp. 136f]:

B_n. ♦^n □^n A → A.
B()_n. (♦□)^n A → A.

Corollary 9.4 The possibility operator of the Lindenbaum–Tarski algebra ⟨B_ω, f⟩ of the logics K4_n B_n and K4_n B()_n does not have a proper companion for any n ≥ 1.

Proof Since B_ω is the countable free Boolean algebra, it is atomless. Furthermore, 4_n says that f is n-transitive, and B_n and B()_n imply that f^n[B_ω] is dense in B_ω.

Therefore, the modal operator of the countable Lindenbaum–Tarski algebra of an axiomatic extension of K4B (in particular, that of S5) does not have a proper companion. On the other hand, Corollary 9.3 shows that any nontrivial modal operator on an algebra with at least one atom has a proper companion, in particular, that of a finite algebra with at least four elements. Therefore, we conclude that this property is not solely a property of the logic, but depends on the model.

Acknowledgements Ivo Düntsch gratefully acknowledges support by the National Natural Science Foundation of China, Grant No. 61976053, and contract DN02/15/19.12.2016 of the Bulgarian NFS. Ewa Orłowska gratefully acknowledges partial support from the National Science Centre project DEC-2011/02/A/HS1/00395. We also want to express our gratitude to the anonymous reviewer for careful reading and helpful suggestions, which greatly helped to increase the readability of the paper.

References

1. Andréka, H., Jónsson, B., & Németi, I. (1991). Free algebras in discriminator varieties. Algebra Universalis, 28(3), 401–447. ISSN 14208911.
2. Blok, W. (1980). The lattice of modal logics: An algebraic investigation. The Journal of Symbolic Logic, 45, 221–238.
3. Blyth, T. S. (2005). Lattices and ordered algebraic structures. Springer.
4. Burris, S., & Sankappanavar, H. P. (1981). A course in universal algebra. New York: Springer.
5. Chellas, B. F. (1980). Modal logic: An introduction. Cambridge University Press.
6. Düntsch, I., Orłowska, E., & Tinchev, T. (2018). Mixed algebras and their logics. Journal of Applied Non-Classical Logics, 27(3–4), 304–320.
7. Frink, O. (1962). Pseudo-complements in semi-lattices. Duke Mathematical Journal, 29, 505–514.
8. Gargov, G., Passy, S., & Tinchev, T. (1987). Modal environment for Boolean speculations. In D. Skordev (Ed.), Mathematical logic and applications (pp. 253–263). New York: Plenum Press.
9. Goranko, V., & Passy, S. (1992). Using the universal modality: Gains and questions. Journal of Logic and Computation, 2(1), 5–30.
10. Jónsson, B. (1993). A survey of Boolean algebras with operators. In P. Burmeister, I. G. Rosenberg, & G. Sabidussi (Eds.), Algebras and orders (Vol. 389, pp. 239–286). NATO Advanced Science Institutes Series C: Mathematical and Physical Sciences. Dordrecht: Kluwer.
11. Jónsson, B., & Tarski, A. (1951). Boolean algebras with operators I. American Journal of Mathematics, 73, 891–939.
12. Koppelberg, S. (1989). General theory of Boolean algebras. In Handbook on Boolean algebras (Vol. 1). North-Holland.
13. Moschovakis, Y. (2006). Notes on set theory. In Undergraduate texts in mathematics (2nd ed.). Springer.
14. Rautenberg, W. (1980). Splitting lattices of logics. Archive for Mathematical Logic, 20, 155–159.
15. Tarski, A., & Givant, S. (1987). A formalization of set theory without variables. In Volume 41 of Colloquium Publications. Providence: Amer. Math. Soc.
16. Werner, H. (1978). Discriminator-Algebras. Berlin: Akademie-Verlag.

Ivo Düntsch obtained his diploma (1975), doctorate (1981) and habilitation (1989) in mathematics from the Freie Universität Berlin, Germany. He has worked in various academic and administrative positions on five continents. Currently, he is Emeritus Professor at Brock University, St. Catharines, Canada, and Visiting Professor at Fujian Normal University, Fuzhou, China. His research interests are widely spread; they include algebraic and modal logic and their application to data analysis and machine learning, algebraic and statistical foundations of rough set theory, lattice theory and universal algebra, qualitative spatial reasoning and region based topology.

Wojciech Dzik is a Polish mathematician and logician. He graduated in Mathematics in 1971 and received his Ph.D. in Mathematics in 1979 at the University of Silesia, Katowice. He visited the Mathematical Institute of the Oxford University (1979/1980) supported by a scholarship. He received his habilitation at Warsaw University in 2008. His main scientific interests include the interplay between algebra and logic in general and structural completeness, duality, admissible rules and unification in logic in particular. He is the author of more than 50 papers and the monograph “Unification Types in Logic,” 2007.

Ewa Orłowska graduated in mathematics and received her Ph.D. and habilitation at the Department of Mathematics of the University of Warsaw, Poland. She was affiliated with the University of Warsaw and the Polish Academy of Sciences, and at present she is a full professor at the National Institute of Telecommunications in Warsaw. She held visiting and research positions in universities in Europe, Canada, and South Africa. She served as a president of the Polish Association of Logic and Philosophy of Science and chair of the Council of this Association, and as an assessor of Council of the Division of Logic, Methodology and Philosophy of Science of the International Union of History and Philosophy of Science. She served as an invited speaker and a member of advisory boards, program committees or steering committees of many international conferences. Her scientific interests focus on mathematical logic and are mainly concerned with nonclassical logics and their applications in computer science and relational methods in logic. She is the author or coauthor of over 100 publications, and three books. She served as the editor of several special issues of international journals and books.

Chapter 10

Modal Logics that Bound the Circumference of Transitive Frames

Robert Goldblatt

Abstract For each natural number n we study the modal logic determined by the class of transitive Kripke frames in which there are no cycles of length greater than n and no strictly ascending chains. The case n = 0 is the Gödel-Löb provability logic. Each logic is axiomatised by adding a single axiom to K4, and is shown to have the finite model property and be decidable. We then consider a number of extensions of these logics, including restricting to reflexive frames to obtain a corresponding sequence of extensions of S4. When n = 1, this gives the famous logic of Grzegorczyk, known as S4Grz, which is the strongest modal companion to intuitionistic propositional logic. A topological semantic analysis shows that the n-th member of the sequence of extensions of S4 is the logic of hereditarily n + 1-irresolvable spaces when the modality ♦ is interpreted as the topological closure operation. We also study the definability of this class of spaces under the interpretation of ♦ as the derived set (of limit points) operation. The variety of modal algebras validating the n-th logic is shown to be generated by the powerset algebras of the finite frames with cycle length bounded by n. Moreover each algebra in the variety is a model of the universal theory of the finite ones, and so is embeddable into an ultraproduct of them.

Keywords Transitive frame · Cluster · Circumference · Modal logic · Filtration · Finite model property · Hereditarily irresolvable space · Grzegorczyk · Gödel-Löb logic

R. Goldblatt, School of Mathematics and Statistics, Victoria University of Wellington, P.O. Box 600, Wellington 6140, New Zealand. © Springer Nature Switzerland AG 2021. J. Madarász and G. Székely (eds.), Hajnal Andréka and István Németi on Unity of Science, Outstanding Contributions to Logic 19, https://doi.org/10.1007/978-3-030-64187-0_10

10.1 Algebraic Logic and Logical Algebra

The field of algebraic logic has been described as having two main aspects (see the introductions to Andréka et al. (2001) and Daigneault (1974)). One is the study of
algebras arising from logical ideas. The other is the study of logical questions by algebraic methods. Both aspects are well exemplified in the profound research of Hajnal Andréka and István Németi. Together, and in collaboration with many colleagues, they have created a prodigious body of literature about Boolean algebras, cylindric algebras, polyadic algebras, relation algebras, fork algebras, modal algebras, dynamic algebras, Kleene algebras and others; with applications to questions of definability, axiomatizability, interpolation, omitting types, decidability etc. for a range of logics. Concerning the first aspect, there is no restriction on the methods that may be used to study abstract algebras. Often the work is algebraic, but it may also involve, say, topology or set theory. Or logic itself. The study of algebraic questions by logical methods, a kind of converse to the second aspect, might be called logical algebra. One of the aims of the present paper is to provide an illustration of logical algebra at work. In the final section we show that some varieties of modal algebras, built from certain finite graphs with bounded circumference, have the property that each member of the variety is embeddable into an ultraproduct of finite members. The logical proof of this structural result involves an adaptation of a construction developed to show that certain modal logics have the finite model property under their Kripke semantics, as well as an analysis of the behaviour of the universal sentences satisfied by the algebras involved. The initial impetus for this study came from reflection on a property of the well known modal logic of Grzegorczyk, which is characterised by the class of finite partially ordered Kripke frames. A partial order can be described as a quasi-order (reflexive transitive relation) that has circumference equal to 1, where the circumference is the longest length of any cycle. 
This suggests a natural question: what modal logics are characterised by frames with circumference at most n for an arbitrary natural number n? Dropping reflexivity and considering transitive frames, the answer is already known for two cases. For n = 0 it is the Gödel-Löb modal logic of provability, and for n = 1 it is a version of Grzegorczyk’s logic without reflexivity. Here we will provide a systematic answer for all n, giving in each case an axiomatisation of the logic concerned and showing it has the finite model property. We then discuss topological semantics for these logics, and finally turn to algebra and take up the matters mentioned in the previous paragraph. The next section provides more background on these ideas, as preparation for the technical work to follow.

10.2 Grzegorczyk and Löb

Grzegorczyk (1967) defined a modal logic, which he called G, by adding to S4 the axiom

(((p ⥽ □q) ⥽ □q) ∧ ((¬p ⥽ □q) ⥽ □q)) ⥽ □q,   (10.1)

where ⥽ denotes strict implication, i.e. ϕ ⥽ ψ is □(ϕ → ψ). He showed that G is a modal companion to intuitionistic propositional logic, meaning that the latter is embedded conservatively into G by the Gödel-McKinsey-Tarski translation. A few years earlier, Sobociński (1964) had defined a logic K1.1 by adding to S4 the axiom

((p ⥽ □p) ⥽ p) ⥽ p,

which he called J1. This was an adaptation of

((p ⥽ □p) ⥽ p) → (♦□p → p),

which was in turn a simplification by Geach of

((p ⥽ □p) ⥽ □p) → (♦□p → □p),

which had been discovered in 1958 by Dummett as an example of a formula that is not a theorem of S4.3 but is valid under Prior’s Diodorean temporal interpretation of necessity in discrete linear time (see Prior 1962, p. 139 and Prior 1967, p. 29).

Sobociński (1970) showed that K1.1 is weaker than G, by deriving J1 from Grzegorczyk’s G-axiom (10.1). He raised the issue of whether K1.1 was strictly weaker, suggesting that this was ‘very probable’. However, Segerberg (1971, Sect. II.3) proved that K1.1 and G are the same logic, by showing that K1.1 is determined by the class of finite partially ordered Kripke frames and observing that (10.1) is valid in all such frames, hence derivable in K1.1. Segerberg axiomatised K1.1 as S4 plus

□(□(p → □p) → p) → p,   (10.2)

which is equivalent to J1 over S4. He gave the name ‘Grz’ to axiom (10.2) in honour of Grzegorczyk, and K1.1/G has been known ever since as S4Grz.

The difference between S4 and S4Grz can be understood in terms of the distinction between quasi-ordered and partially ordered frames. A quasi-order (W, R) is a reflexive transitive relation R on a set W. It is a partial order if in addition it is antisymmetric: x R y R x implies x = y. The condition ‘x R y R x’ defines an equivalence relation on W whose equivalence classes are known as clusters. R is universal on a given cluster and maximally so.
It lifts to a partial order of the set of clusters by specifying that C R C′ iff x R y, where x is any member of cluster C and y is any member of cluster C′. The original relation R is a partial order iff each cluster contains just a single element.

In these terms, S4 has a number of characterisations. It is the logic of all quasi-orders, of all partial orders, and of all finite quasi-orders. But it is not the logic of all finite partial orders since, as mentioned above, that logic is S4Grz. The Grz-axiom (10.2) is valid in any finite partial order, but invalid in some infinite partial orders. The precise situation is that Grz is valid in a frame (W, R) iff it is a partial order that has no strictly ascending chains, i.e. no sequences x_0 R ⋯ R x_n R x_{n+1} R ⋯ such that x_{n+1} R x_n fails for all n. A finite quasi-order has no such chains, so validates Grz iff it is antisymmetric.

For n ≥ 1, an n-cycle is given by a sequence x_0, . . . , x_{n−1} of n distinct points that have x_0 R ⋯ R x_{n−1} R x_0. The points of a cycle all belong to the same cluster, and in a finite frame the length of a longest cycle is equal to the size of a largest cluster. This maximum length/size is the circumference of the frame.

Our interest in this paper is in relaxing the antisymmetry property of partial orders that constrains clusters to be singletons and cycles to be of length 1. What logic results if we allow cycles of length up to two, or three etc.? Also we wish to broaden the context to consider transitive frames that may have irreflexive elements, and clusters that may consist of a single such element. So instead of S4, we work over the weaker K4, which is the logic of all transitive frames. That allows us to admit a circumference of 0, since in a finite transitive irreflexive frame there are no cycles at all. The logic characterised by such frames has been well studied: it is the smallest normal logic to contain the Löb axiom

□(□p → p) → □p.   (10.3)

The proof of this fact is also due to Segerberg (1971, Sect. II.2), who called the axiom W and the logic KW. Later Solovay (1976) showed that it is precisely the modal logic that results when □ is interpreted as expressing provability in first-order Peano arithmetic. It was Löb (1955) who showed that (10.3) is valid under this interpretation. The logic is now often called the Gödel-Löb logic, or GL.

For each natural number n we will define an axiom Cn which is valid in precisely those transitive frames that have no strictly ascending chains and no cycles (or clusters) with more than n elements. We prove that the logic of this class of frames is axiomatisable as the system K4Cn.
The proof uses the familiar technology of filtration of canonical models and then modification of the filtration to obtain a finite model with the desired properties. Here the modification involves ‘breaking up’ clusters that contain too many elements, hence destroying cycles that are too long. It establishes that K4Cn has the finite model property, being the logic of the class of finite transitive frames that have circumference at most n, and is a decidable logic. From this we conclude that the logics {K4Cn : n ≥ 0} form a strictly decreasing sequence of extensions of K4 whose intersection is K4 itself. The analysis is then adapted to some extensions of K4Cn obtained by adding the axioms corresponding to seriality, reflexivity and connectedness of the relation R.

To indicate the nature of the axioms Cn, we remark first that C0 is equivalent over all frames to the formula

♦p → ♦(p ∧ ¬♦p),   (10.4)

which is itself equivalent to the Löb axiom (10.3). To explain C1, observe that Grz is equivalent over S4 to

Grz□ :  □(□(p → □p) → p) → □p,   (10.5)

which can be equivalently expressed in terms of ♦ as

♦p → ♦(p ∧ ¬♦(¬p ∧ ♦p)).   (10.6)

Replacing ¬p here by a variable which is hypothesised to be incompatible with p, we are led to define C1 to be the two-variable formula

□∗¬(p ∧ q) → (♦p → ♦(p ∧ ¬♦(q ∧ ♦p)))

(where in general □∗ϕ is ϕ ∧ □ϕ). This will be shown to be equivalent to (10.6), hence to Grz□, over K4. Frames validating the logic K4C1 have only singleton clusters, some of which may contain irreflexive elements. The finite frames of this kind determine the logic K4Grz□, as was shown by Amerbauer (1996) using tableaux techniques, and then by Gabelaia (2004) using a filtration method. This logic is equal to K4C1, while Grzegorczyk’s logic disallows irreflexivity and is equal to S4C1. The logic K4C1 was studied under the name K4.Grz by Gabelaia (2004) and Esakia (2006), and has been called the weak Grzegorczyk logic (wGrz) by Litak (2007; 2014). Amerbauer (1996) calls it G0, while Goré and Ramanayake (2014) call it Go.

Lifting the pattern of C1 to three variables p_0, p_1, p_2, we define C2 to be

□∗(⋀_{i<j≤2} ¬(p_i ∧ p_j)) → (♦p_0 → ♦(p_0 ∧ ¬♦(p_1 ∧ ♦(p_2 ∧ ♦p_0)))).

Cn extends the pattern to n + 1 variables, and will be formally defined in Sect. 10.4. After completing our model-theoretic analysis we turn to topological semantics and show that for all n ≥ 1, the logic S4Cn is characterised by validity in the class of all (finite) hereditarily n + 1-irresolvable topological spaces when ♦ is interpreted as the operation of topological closure. Hitherto this characterisation was known only for n = 1, i.e. for S4Grz. We then discuss the interpretation of ♦ as the derived set operation, assigning to each set its set of limit points, and show that a space also validates Cn under this interpretation iff it is hereditarily n + 1-irresolvable. In the final section we study the variety Vn of all modal algebras validating the logic K4Cn . This is generated by its finite members, and indeed by the class Cn+ of all powerset algebras of the class Cn of all finite transitive frames of circumference at most n. Thus Vn is the class of all models of the set of equations satisfied by Cn+ . But we show something stronger: Vn is the class of all models of the set of universal sentences satisfied by Cn+ . It follows that every member of Vn can be embedded into an ultraproduct of members of Cn+ .
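The frame conditions attached to C0 and C1 above can be checked by brute force on very small frames. The following Python sketch is an illustration only, not the method of the paper: it assumes the usual Kripke reading of ♦ (true at x iff the argument is true at some R-successor of x), encodes formulas as nested tuples, and tests validity by enumerating every valuation. All function and variable names are hypothetical.

```python
from itertools import combinations, product


def subsets(W):
    """All subsets of the finite set W, as frozensets."""
    items = sorted(W)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


# Formulas are nested tuples built from variables, negation, conjunction, diamond.
def Var(name): return ("var", name)
def Not(a): return ("not", a)
def And(a, b): return ("and", a, b)
def Dia(a): return ("dia", a)
def Imp(a, b): return Not(And(a, Not(b)))   # a -> b
def Box(a): return Not(Dia(Not(a)))         # box a
def BoxStar(a): return And(a, Box(a))       # box* a  =  a and box a


def truth_set(phi, W, R, V):
    """Set of points of W at which phi is true under the valuation V."""
    tag = phi[0]
    if tag == "var":
        return V[phi[1]]
    if tag == "not":
        return frozenset(W) - truth_set(phi[1], W, R, V)
    if tag == "and":
        return truth_set(phi[1], W, R, V) & truth_set(phi[2], W, R, V)
    if tag == "dia":
        X = truth_set(phi[1], W, R, V)
        return frozenset(x for x in W if any((x, y) in R for y in X))
    raise ValueError(f"unknown connective: {tag}")


def is_valid(phi, W, R, names):
    """phi is true at every point under every valuation of the listed variables."""
    return all(truth_set(phi, W, R, dict(zip(names, vals))) == frozenset(W)
               for vals in product(subsets(W), repeat=len(names)))


p, q = Var("p"), Var("q")
LOEB_DIA = Imp(Dia(p), Dia(And(p, Not(Dia(p)))))            # formula (10.4)
C1 = Imp(BoxStar(Not(And(p, q))),
         Imp(Dia(p), Dia(And(p, Not(Dia(And(q, Dia(p))))))))

frames = {
    "strict":      ({0, 1}, {(0, 1)}),                          # irreflexive chain
    "loop":        ({0}, {(0, 0)}),                             # one reflexive point
    "two_cluster": ({0, 1}, {(0, 0), (0, 1), (1, 0), (1, 1)}),  # cluster of size 2
}
for name, (W, R) in frames.items():
    print(name, is_valid(LOEB_DIA, W, R, ["p"]), is_valid(C1, W, R, ["p", "q"]))
```

On these three frames the sketch reports that (10.4) holds only on the irreflexive chain, while C1 holds on the chain and on the single reflexive point but fails on the two-element cluster, in line with the circumference bounds 0 and 1 described above.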

10.3 Clusters and Cycles

A frame F = (W, R) is a directed graph, consisting of a binary relation R on a set W. A point x ∈ W is reflexive if x R x, and irreflexive otherwise. If every member of W is (ir)reflexive, we say that R and F are (ir)reflexive. F is transitive when R is a transitive relation.

For the most part we work with transitive frames and informally give R a temporal interpretation, so that if x R y we may say that y is an R-successor of x, that y comes R-after x, or is R-later than x, etc. If x R y but not y R x, then y is strictly R-later than x.

In a transitive frame, a cluster is a subset C of W that is an equivalence class under the equivalence relation {(x, y) : x = y or x R y R x}. A singleton {x} with x irreflexive is a degenerate cluster. All other clusters are non-degenerate: if C is non-degenerate then it contains no irreflexive points and the relation R is universal on C and maximally so. A simple cluster is non-degenerate with one element, i.e. a singleton {x} with x R x.

Let C_x be the R-cluster containing x. Thus C_x = {x} ∪ {y : x R y R x}. The relation R lifts to a well-defined relation on the set of clusters by putting C_x R C_y iff x R y. This relation is transitive and antisymmetric, for if C_x R C_y R C_x, then x R y R x and so C_x = C_y. A cluster C_x is final if it is maximal in this ordering, i.e. there is no cluster C ≠ C_x with C_x R C. This is equivalent to requiring that x R y implies y R x.

We distinguish between finite and infinite sequences of R-related points. An R-path in a frame is a finite sequence x_0, . . . , x_n of (not necessarily distinct) points from W with x_m R x_{m+1} for all m < n. An ascending R-chain in a frame is an infinite sequence {x_m : m < ω} of (not necessarily distinct) points from W with x_m R x_{m+1} for all m < ω. If R is transitive, this implies x_m R x_k whenever m < k. The chain is strictly ascending if not x_{m+1} R x_m for all m, hence for transitive R, not x_k R x_m whenever m < k. Observe that if x is a reflexive point then the constant infinite sequence x, . . . , x, . . . is an ascending R-chain that is not strict. In a transitive frame, a strictly ascending chain has all its terms x_m being pairwise distinct, so there are infinitely many of them.

Lemma 1 The following are equivalent for any transitive frame F = (W, R):
(1) There are no strictly ascending chains of points in F.
(2) Any ascending chain C_0 R C_1 R ⋯ of R-clusters is ultimately constant in the sense that there exists an m such that C_m = C_k for all k > m.

Proof Suppose (1) fails, and there is a strictly ascending R-chain {x_m : m < ω}. Then C_{x_m} R C_{x_{m+1}} and not C_{x_{m+1}} R C_{x_m} for all m, so the cluster chain {C_{x_m} : m < ω} is strictly ascending and hence not ultimately constant, showing that (2) fails.

Conversely if (2) fails, there is an ascending cluster chain {C_m : m < ω} that is not ultimately constant. So for all m there exists a k > m such that C_m ≠ C_k, and hence not C_k R C_m as C_m R C_k and R is antisymmetric on clusters. Using this we can pick out a subsequence {C_{f(m)} : m < ω} that is strictly ascending. Then choosing x_m ∈ C_{f(m)} for all m gives a chain of points {x_m : m < ω} that is strictly ascending, showing that (1) fails.

A cycle of length n  1, or n-cycle, is a sequence x0 , . . . , xn−1 of n distinct points such that x0 , . . . , xn−1 , x0 is an R-path. There are no 0-cycles. A 1-cycle is given by a single point x0 having x0 Rx0 . Adopting terminology from graph theory, we define the circumference of frame F to be the supremum of the set of all lengths of cycles in F . In particular F has


circumference 0 iff it has no cycles, a property implying that F has no reflexive points. In a finite frame with non-zero circumference, since there are finitely many cycles the circumference is the length of a longest one. In a transitive frame, the points of any cycle are R-related to each other and are reflexive, and all belong to the same non-degenerate cluster. It follows that the circumference is 0 iff the frame is irreflexive. Moreover, any finite non-empty subset of a non-degenerate cluster can be arranged (arbitrarily) into a cycle. Thus for n ≥ 1, a frame has a cycle of length n iff it has a non-degenerate cluster of size at least n. So a non-zero circumference of a finite transitive frame is equal to the size of a largest non-degenerate cluster.
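The cluster description of circumference can be made concrete for finite frames. The following is a minimal sketch, assuming a finite transitive frame given as a Python list `W` of points and a set `R` of ordered pairs; the function names are illustrative and do not come from the text.

```python
def clusters(W, R):
    """Partition W into R-clusters, assuming R is transitive."""
    seen, result = set(), []
    for x in W:
        if x in seen:
            continue
        # C_x = {x} ∪ {y : x R y R x}
        c = frozenset({x} | {y for y in W if (x, y) in R and (y, x) in R})
        seen |= c
        result.append(c)
    return result


def is_degenerate(c, R):
    """A cluster is degenerate iff it is a singleton {x} with x irreflexive."""
    x, *rest = c
    return not rest and (x, x) not in R


def circumference(W, R):
    """Size of a largest non-degenerate cluster, or 0 if there is none."""
    sizes = [len(c) for c in clusters(W, R) if not is_degenerate(c, R)]
    return max(sizes, default=0)


# Example: an irreflexive point 0, then a two-element cluster {1, 2},
# then an irreflexive final point 3.
W = [0, 1, 2, 3]
R = {(0, 1), (0, 2), (0, 3),
     (1, 1), (1, 2), (2, 1), (2, 2),
     (1, 3), (2, 3)}
print(circumference(W, R))
```

The example frame has clusters {0}, {1, 2} and {3}, of which only {1, 2} is non-degenerate, so the sketch reports the size of that cluster.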

10.4 Models and Valid Schemes

In the standard language of propositional modal logic, formulas ϕ, ψ, … are constructed from some denumerable set Var of propositional variables by the Boolean connectives ⊤, ¬, ∧ and the unary modality □. The other Boolean connectives ⊥, ∨, →, ↔ are introduced as the usual abbreviations, and the dual modality ♦ is defined to be ¬□¬. We write □∗ϕ as an abbreviation of the formula ϕ ∧ □ϕ, and ♦∗ϕ for ϕ ∨ ♦ϕ.

We now describe the Kripke semantics, or relational semantics, for this language. A model M = (W, R, V) on a frame (W, R) is given by a valuation function V assigning to each variable p ∈ Var a subset V(p) of W, thought of as the set of points of W at which p is true. The truth relation M, x ⊨ ϕ of a formula ϕ being true at x in M is defined by induction on the formation of ϕ as follows:

• M, x ⊨ p iff x ∈ V(p), for p ∈ Var.
• M, x ⊨ ⊤.
• M, x ⊨ ¬ϕ iff M, x ⊭ ϕ (i.e. not M, x ⊨ ϕ).
• M, x ⊨ ϕ ∧ ψ iff M, x ⊨ ϕ and M, x ⊨ ψ.
• M, x ⊨ □ϕ iff M, y ⊨ ϕ for every y ∈ W such that xRy.

Consequently, where R∗ = R ∪ {(x, x) : x ∈ W} is the reflexive closure of R, we have

• M, x ⊨ ♦ϕ iff M, y ⊨ ϕ for some y ∈ W such that xRy.
• M, x ⊨ □∗ϕ iff M, y ⊨ ϕ for every y ∈ W such that xR∗y.
• M, x ⊨ ♦∗ϕ iff M, y ⊨ ϕ for some y ∈ W such that xR∗y.

A model M assigns to each formula ϕ the truth set ‖ϕ‖^M = {x ∈ W : M, x ⊨ ϕ}. (The semantics could have been given by defining truth sets inductively, starting with ‖p‖^M = V(p).) We say that ϕ is true in model M, written M ⊨ ϕ, if it is true at all points in M, i.e. ‖ϕ‖^M = W. We call ϕ valid in frame F, written F ⊨ ϕ, if it is true in all models on F. We may also write M ⊨ Σ, or F ⊨ Σ, to indicate that every member of a set Σ of formulas is true in M, or valid in F.

Given formulas ϕ_0, …, ϕ_n, define the formula P_n(ϕ_0, …, ϕ_n) to be


♦(ϕ_1 ∧ ♦(ϕ_2 ∧ ··· ∧ ♦(ϕ_n ∧ ♦ϕ_0) ···))

provided that n ≥ 1. For the case n = 0, put P_0(ϕ_0) = ♦ϕ_0. This definition can be made more formal by inductively defining a sequence {P_n : n < ω} of operations on formulas, with P_n being (n + 1)-ary. P_0 is as just given, and for n > 0 we inductively put

P_n(ϕ_0, ϕ_1, …, ϕ_n) = ♦(ϕ_1 ∧ P_{n−1}(ϕ_0, ϕ_2, …, ϕ_n)).

Then the next result follows readily from the properties of the truth relation.

Lemma 2 In any model M on any frame, M, x_0 ⊨ P_n(ϕ_0, …, ϕ_n) iff there is an R-path x_0 R ··· R x_{n+1} such that M, x_i ⊨ ϕ_i for 1 ≤ i ≤ n and M, x_{n+1} ⊨ ϕ_0.
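The truth relation and the inductive definition of P_n can be illustrated on finite models. The following is a minimal sketch, assuming formulas are encoded as nested tuples (`('var', name)`, `('top',)`, `('not', f)`, `('and', f, g)`, `('box', f)`); this encoding and all function names are assumptions made for illustration, not notation from the text.

```python
def dia(f):
    """The dual modality: ♦f is defined as ¬□¬f."""
    return ('not', ('box', ('not', f)))


def holds(W, R, V, x, f):
    """Truth of formula f at point x of the finite model (W, R, V)."""
    op = f[0]
    if op == 'var':
        return x in V.get(f[1], set())
    if op == 'top':
        return True
    if op == 'not':
        return not holds(W, R, V, x, f[1])
    if op == 'and':
        return holds(W, R, V, x, f[1]) and holds(W, R, V, x, f[2])
    if op == 'box':
        return all(holds(W, R, V, y, f[1]) for y in W if (x, y) in R)
    raise ValueError(f"unknown connective: {op}")


def P(phis):
    """P_n(phi_0, ..., phi_n), built by the inductive definition."""
    phi0, rest = phis[0], list(phis[1:])
    if not rest:
        return dia(phi0)                      # P_0(phi_0) = ♦phi_0
    return dia(('and', rest[0], P([phi0] + rest[1:])))


def has_path(W, R, V, x0, phis):
    """Right-hand side of Lemma 2, found by stepping along R."""
    targets = list(phis[1:]) + [phis[0]]      # phi_1, ..., phi_n, then phi_0
    frontier = {x0}
    for f in targets:
        frontier = {y for x in frontier for y in W
                    if (x, y) in R and holds(W, R, V, y, f)}
    return bool(frontier)


# Example: a three-point loop 0 -> 1 -> 2 -> 0 (not transitive; Lemma 2
# does not need transitivity), with p_i true exactly at point i.
W = [0, 1, 2]
R = {(0, 1), (1, 2), (2, 0)}
V = {'p0': {0}, 'p1': {1}, 'p2': {2}}
phis = [('var', 'p0'), ('var', 'p1'), ('var', 'p2')]
for x in W:
    print(x, holds(W, R, V, x, P(phis)), has_path(W, R, V, x, phis))
```

In this example the two columns agree at every point, as Lemma 2 predicts: P_2(p0, p1, p2) is true only at 0, where the path 0 R 1 R 2 R 0 witnesses it.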

Let D_n(ϕ_0, …, ϕ_n) be ⋀_{i<j≤n} ¬(ϕ_i ∧ ϕ_j). For n = 0 this is the empty conjunction, which we take to be the constant tautology ⊤. Define C_n to be the scheme

□∗D_n(ϕ_0, …, ϕ_n) → (♦ϕ_0 → ♦(ϕ_0 ∧ ¬P_n(ϕ_0, …, ϕ_n))).

In other words, C_n is the set of all uniform substitution instances of

□∗D_n(p_0, …, p_n) → (♦p_0 → ♦(p_0 ∧ ¬P_n(p_0, …, p_n))),

where p_0, …, p_n are variables.

Theorem 1 Let F be any transitive frame, and n ≥ 0.
1. F ⊨ C_n iff F has circumference at most n and has no strictly ascending chains.
2. If F is finite, then F ⊨ C_n iff F has circumference at most n.

Proof Fix a list p_0, …, p_n of variables and abbreviate D_n(p_0, …, p_n) to D_n and P_n(p_0, …, p_n) to P_n. Then the scheme C_n is valid in F iff its instance

□∗D_n → (♦p_0 → ♦(p_0 ∧ ¬P_n))    (10.7)

is valid, since validity in a frame is preserved by uniform substitution of formulas for variables.

(1) Assume F ⊨ C_n. Then we show that F has circumference at most n and no strictly ascending chains. For, if that were to fail there would be two possible cases, the first being that F has circumference greater than n, so has a cycle with at least n + 1 elements, say x_0, …, x_n. Take a model on F having V(p_i) = {x_i} for all i ≤ n. If n = 0 then D_n = ⊤. If n ≥ 1, the x_i's are distinct, so each formula ¬(p_i ∧ p_j) with i < j ≤ n is true at every point in the model, hence so is D_n. So whatever the value of n, D_n is true everywhere, and therefore so is □∗D_n. By transitivity all points of the cycle are R-related to each other, hence to x_0, including x_0 itself. Therefore ♦p_0 is true at x_0. By Lemma 2, the R-path x_0 R x_1 R ··· x_n R x_0 ensures that P_n is true at x_0. Hence as p_0 is true only at x_0, p_0 ∧ ¬P_n is false everywhere, therefore so is


♦(p_0 ∧ ¬P_n). Altogether these facts imply that the instance (10.7) of C_n is false at x_0 (in fact at every point of the cycle), contradicting the assumption that F ⊨ C_n.

The second case is that F has a strictly ascending R-chain {x_m : m < ω}. Then take a model on F having V(p_i) = {x_m : m ≡ i mod (n + 1)} for all i ≤ n. Since the points x_m of the chain are all distinct, the sets V(p_i) are all pairwise disjoint, so each formula ¬(p_i ∧ p_j) is true everywhere, hence so is □∗D_n. Since each congruence class mod n + 1 is cofinal in ω, each set V(p_i) is cofinal in the chain, i.e. for all k < ω there is an m > k with x_k R x_m ∈ V(p_i). In particular this implies that ♦p_0 is true at every point of the chain. Now if p_0 is true at point x_m, then the R-path x_m R x_{m+1} R ··· x_{m+n} R x_{m+(n+1)} has p_i true at x_{m+i} for 1 ≤ i ≤ n and p_0 true at x_{m+(n+1)}, so P_n is true at x_m by Lemma 2. Since p_0 is true only at points of the chain, it follows that ♦(p_0 ∧ ¬P_n) is false everywhere. Altogether then, (10.7) is false at all points of the chain, again contradicting F ⊨ C_n.

The contradictions in both cases ensure that if F ⊨ C_n then F has circumference at most n and no strictly ascending chains.

Finally, assume that F has circumference at most n and no strictly ascending chains. Suppose then, for the sake of contradiction, that C_n is not valid in F. Hence (10.7) is not valid and so is false at some x in some model on F. Working in that model, ♦p_0 and □∗D_n are true at x, so p_0 is true at some x_0 with xRx_0, and the formulas ¬(p_i ∧ p_j) are all true throughout {y ∈ W : xR∗y}; while ♦(p_0 ∧ ¬P_n) is false at x. As xRx_0, p_0 ∧ ¬P_n is false at x_0. Since p_0 is true at x_0, this implies that P_n is true at x_0. Hence by Lemma 2, there is an R-path x_0 R x_1 R ··· R x_{n+1} such that p_i is true at x_i for 1 ≤ i ≤ n and p_0 is true at x_{n+1}. The argument then repeats: by transitivity xRx_{n+1}, so since ♦(p_0 ∧ ¬P_n) is false at x, p_0 ∧ ¬P_n is false at x_{n+1}, and hence P_n is true at x_{n+1}.
So by Lemma 2 again, there is an R-path x_{n+1} R x_{n+2} R ··· R x_{2(n+1)} such that p_i is true at x_{n+1+i} for 1 ≤ i ≤ n and p_0 is true at x_{2(n+1)}. Iterating this construction ad infinitum, we generate an ascending R-chain {x_m : m < ω} of points of W such that for each i ≤ n, p_i is true at x_m iff m ≡ i mod (n + 1). Hence p_i is true cofinally along the chain. By assumption there are no strictly ascending chains, so by Lemma 1 the ascending cluster chain {C_{x_m} : m < ω} is ultimately constant. It follows that the point chain cannot continue moving forward into a 'later' cluster forever, so some cluster C in the cluster chain must contain some tail {x_m : m ≥ k} of the point chain. Then x_k, x_{k+1} ∈ C and x_k R x_{k+1}, so C is a non-degenerate cluster. The tail {x_m : m ≥ k} contains points at which each of p_0, …, p_n is true, by the cofinality of the truth of these p_i's. But by the truth of □∗D_n at x, no two of these variables are ever true at the same point of the point chain. Hence the cluster C contains at least n + 1 distinct points, which form an (n + 1)-cycle. This contradicts the assumption that F has circumference at most n. The contradiction forces us to conclude that F ⊨ C_n.

(2) This follows immediately from (1), as a finite transitive frame cannot have any strictly ascending chains.
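Part (2) of Theorem 1 can be checked by brute force on very small frames, since validity of the scheme C_n reduces to validity of its instance (10.7) in the variables p_0, …, p_n. The following is a minimal sketch under that reduction, using truth sets and an illustrative tuple encoding of formulas; the encoding and function names are assumptions, and the exhaustive search over valuations is only feasible for a handful of points.

```python
from itertools import combinations, product


def truth(W, R, V, f):
    """Truth set of formula f in the finite model (W, R, V)."""
    op = f[0]
    if op == 'var':
        return set(V[f[1]])
    if op == 'top':
        return set(W)
    if op == 'not':
        return set(W) - truth(W, R, V, f[1])
    if op == 'and':
        return truth(W, R, V, f[1]) & truth(W, R, V, f[2])
    if op == 'box':
        inner = truth(W, R, V, f[1])
        return {x for x in W if all(y in inner for y in W if (x, y) in R)}
    raise ValueError(f"unknown connective: {op}")


def neg(f): return ('not', f)
def dia(f): return neg(('box', neg(f)))
def imp(f, g): return neg(('and', f, neg(g)))
def box_star(f): return ('and', f, ('box', f))


def C(n):
    """The instance (10.7) of C_n in variables p_0, ..., p_n."""
    p = [('var', i) for i in range(n + 1)]
    d = ('top',)                                  # empty conjunction is ⊤
    for i, j in combinations(range(n + 1), 2):
        d = ('and', d, neg(('and', p[i], p[j])))
    pn = dia(p[0])                                # innermost ♦p_0
    for i in range(n, 0, -1):                     # wrap with ♦(p_i ∧ ...)
        pn = dia(('and', p[i], pn))
    return imp(box_star(d), imp(dia(p[0]), dia(('and', p[0], neg(pn)))))


def subsets(W):
    W = list(W)
    return [set(c) for r in range(len(W) + 1) for c in combinations(W, r)]


def validates_C(W, R, n):
    """True iff (10.7) holds at every point under every valuation."""
    f = C(n)
    for choice in product(subsets(W), repeat=n + 1):
        if truth(W, R, dict(enumerate(choice)), f) != set(W):
            return False
    return True


def circumference(W, R):
    """For transitive R: size of a largest non-degenerate cluster, else 0."""
    best = 0
    for x in W:
        if (x, x) in R:
            best = max(best, len({y for y in W if (x, y) in R and (y, x) in R}))
    return best


# Example: a single two-element cluster has circumference 2.
W2, R2 = [0, 1], {(0, 0), (0, 1), (1, 0), (1, 1)}
print(circumference(W2, R2), validates_C(W2, R2, 1), validates_C(W2, R2, 2))
```

For the two-element cluster the sketch finds that C_1 fails while C_2 is valid, in line with the theorem.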

The cases n = 0, 1 of this theorem for Cn are essentially known. C0 is


□∗⊤ → (♦ϕ_0 → ♦(ϕ_0 ∧ ¬♦ϕ_0)).

That is valid in the same frames as ♦ϕ_0 → ♦(ϕ_0 ∧ ¬♦ϕ_0), an equivalent form of the Löb axiom □(□ϕ_0 → ϕ_0) → □ϕ_0. But it is well known that the Löb axiom is valid in a frame F iff F is transitive and has no ascending chains (see Boolos 1993, p. 75 or Blackburn et al. 2001, Example 3.9). Now a transitive frame has circumference 0 iff it is irreflexive, and in a transitive irreflexive frame every ascending chain is strictly ascending. From these facts it can be seen that a transitive frame has no ascending chains iff it has circumference 0 and no strictly ascending chains.

For n = 1, C_1 is valid in the same transitive frames as the Grz-variant Grz_□ of (10.5) (see Theorem 2 below). But a transitive frame validates Grz_□ iff it has no ascending chains x_0 R x_1 R ··· with x_n ≠ x_{n+1} for all n (Amerbauer 1996, Lemma 1.1). The latter condition prevents there being any clusters with more than one element, ensuring that the circumference is at most 1. Thus it can be seen that a transitive frame validates Grz_□ iff it has circumference at most 1 and no strictly ascending chains.

10.5 Logics and Canonical Models

A normal logic is any set L of formulas that includes all tautologies and all instances of the scheme K:

□(ϕ → ψ) → (□ϕ → □ψ),

and whose rules include modus ponens and □-generalisation (from ϕ infer □ϕ). The set of all formulas valid in some given class of frames is a normal logic. The smallest normal logic, known as K, consists of the formulas that are valid in all frames. We mostly use the standard naming convention for logics that if Σ_1, …, Σ_n is a sequence of sets of formulas then KΣ_1···Σ_n denotes the smallest normal logic that includes Σ_1 ∪ ··· ∪ Σ_n.

The members of a logic L may be referred to as the L-theorems. A formula ϕ is L-consistent if ¬ϕ is not an L-theorem, and a set of formulas is L-consistent iff the conjunction of any of its finite subsets is L-consistent. A formula is an L-theorem iff it belongs to every maximally L-consistent set of formulas. A normal logic L has the canonical frame F_L = (W_L, R_L), where W_L is the set of maximally L-consistent sets of formulas, and x R_L y iff {ϕ : □ϕ ∈ x} ⊆ y iff {♦ϕ : ϕ ∈ y} ⊆ x. By standard canonical frame theory (e.g. Blackburn et al. 2001, Chap. 4 or Goldblatt 1992, Chap. 3), we have that for all formulas ϕ and all x ∈ W_L:

□ϕ ∈ x iff for all y ∈ W_L, x R_L y implies ϕ ∈ y.    (10.8)


The canonical model M_L on F_L has V(p) = {x ∈ W_L : p ∈ x} for all p ∈ Var. With the help of (10.8) it can be shown that it satisfies

M_L, x ⊨ ϕ iff ϕ ∈ x,    (10.9)

a result known as the Truth Lemma for M_L. It implies that the formulas that are true in M_L are precisely the L-theorems, since these are precisely the formulas that belong to every member of W_L. Thus M_L ⊨ ϕ iff ϕ ∈ L.

A logic L is transitive if it includes all instances of the scheme 4:

□ϕ → □□ϕ.

The set of formulas valid in some class of transitive frames is a transitive normal logic. The smallest transitive normal logic is known as K4. Its theorems are precisely the formulas that are valid in all transitive frames. If a logic L extends K4, equivalently if M_L ⊨ 4, then the relation R_L of its canonical frame is transitive.

Scheme 4 has the equivalent dual form ♦♦ϕ → ♦ϕ. This has the weaker variant w4: ♦♦ϕ → ♦∗ϕ, i.e. ♦♦ϕ → ϕ ∨ ♦ϕ. It plays a significant role in the topological semantics of Sect. 10.8. The theorems of Kw4, the smallest normal logic to include w4, are precisely the formulas that are valid in all frames that are weakly transitive in the sense that xRyRz implies xR∗z. If M_L ⊨ w4, then R_L is weakly transitive.

Since □∗⊤ is a theorem of any normal logic, the scheme C_0 is deductively equivalent over K to the dual form (10.4) of Löb's axiom. But scheme 4 is derivable from Löb's axiom over K (see Boolos 1993, p. 11), so K4C_0 = KC_0 = GL. For n = 1 we have:

Theorem 2 The scheme C_1 is deductively equivalent over Kw4 to the scheme

♦ϕ_0 → ♦(ϕ_0 ∧ ¬♦(¬ϕ_0 ∧ ♦ϕ_0))    (10.10)

that is itself deductively equivalent over K to the Grz-variant Grz_□ of (10.5).

Proof For any ϕ_0, the formula C_1(ϕ_0, ¬ϕ_0) is

□∗¬(ϕ_0 ∧ ¬ϕ_0) → (♦ϕ_0 → ♦(ϕ_0 ∧ ¬♦(¬ϕ_0 ∧ ♦ϕ_0))).

But ¬(ϕ_0 ∧ ¬ϕ_0) is a tautology, so □∗¬(ϕ_0 ∧ ¬ϕ_0) is derivable in K, hence can be detached from C_1(ϕ_0, ¬ϕ_0) to derive (10.10). In the converse direction, for any ϕ_0 and ϕ_1 the formula

□∗¬(ϕ_0 ∧ ϕ_1) → ((10.10) → (♦ϕ_0 → ♦(ϕ_0 ∧ ¬♦(ϕ_1 ∧ ♦ϕ_0))))

can be shown to be valid in all weakly transitive frames, hence is a theorem of Kw4. Using it and tautological reasoning, from (10.10) we can derive


□∗¬(ϕ_0 ∧ ϕ_1) → (♦ϕ_0 → ♦(ϕ_0 ∧ ¬♦(ϕ_1 ∧ ♦ϕ_0))),

which is C_1(ϕ_0, ϕ_1).



We now introduce an apparent weakening of C_n, defining C∗_n to be the scheme

□∗D_n(ϕ_0, …, ϕ_n) → (♦ϕ_0 → ♦∗(ϕ_0 ∧ ¬P_n(ϕ_0, …, ϕ_n))).

Since any formula of the form ♦ϕ → ♦∗ϕ is a tautology, C∗_n is a tautological consequence of C_n, so is included in any logic that includes C_n, and is true in any model in which C_n is true. In the converse direction we have

Theorem 3
1. Let M be a model with weakly transitive relation R. Then M ⊨ C∗_n implies M ⊨ C_n.
2. C_n is included in any normal logic that includes w4 and C∗_n.

Proof (1) Suppose that M ⊭ C_n. Then there is an instance

□∗D_n(ϕ_0, …, ϕ_n) → (♦ϕ_0 → ♦(ϕ_0 ∧ ¬P_n(ϕ_0, …, ϕ_n)))    (10.11)

of C_n that is not true at some point x in M. Working in M, we have x ⊨ □∗D_n and x ⊨ ♦ϕ_0, but x ⊭ ♦(ϕ_0 ∧ ¬P_n). If n = 0, then immediately x ⊭ (ϕ_0 ∧ ¬P_n), as P_n = ♦ϕ_0 and x ⊨ ♦ϕ_0. If n ≥ 1 we proceed as follows. Since x ⊨ ♦ϕ_0, there is some x_0 with xRx_0 and x_0 ⊨ ϕ_0. Since x ⊭ ♦(ϕ_0 ∧ ¬P_n), this gives x_0 ⊭ ϕ_0 ∧ ¬P_n, hence x_0 ⊨ P_n. Thus by Lemma 2, there is an R-path x_0 R ··· R x_{n+1} such that x_i ⊨ ϕ_i for 1 ≤ i ≤ n and x_{n+1} ⊨ ϕ_0. Now suppose x ⊨ ϕ_0. Then x ⊭ ϕ_1, as x ⊨ □∗D_n. But x_1 ⊨ ϕ_1, so we have x R x_0 R x_1 and x ≠ x_1, hence xRx_1 by weak transitivity. Thus x can replace x_0 to give the R-path x R x_1 R ··· R x_{n+1} demonstrating that x ⊨ P_n. This proves that x ⊨ ϕ_0 implies x ⊨ P_n, and therefore that x ⊭ (ϕ_0 ∧ ¬P_n).

So in any case we have x ⊭ (ϕ_0 ∧ ¬P_n). Since we already have x ⊭ ♦(ϕ_0 ∧ ¬P_n), we conclude that x ⊭ ♦∗(ϕ_0 ∧ ¬P_n). Together with x ⊨ □∗D_n and x ⊨ ♦ϕ_0, this shows that M ⊭ C∗_n.

(2) Let L be any normal logic that includes w4 and C∗_n. We apply (1) with M as the canonical model M_L. Since w4 ⊆ L, from the Truth Lemma (10.9) we get M_L ⊨ w4 and so R_L is weakly transitive. Since C∗_n ⊆ L we get M_L ⊨ C∗_n. Hence by (1), M_L ⊨ C_n, so C_n ⊆ L.

This theorem implies that Kw4C∗_n = Kw4C_n and K4C∗_n = K4C_n. But for n = 0 or 1 these are the same logic. Since K4C_0 = KC_0 = GL (above), we have Kw4C_0 = K4C_0. Also Theorem 2 gives Kw4C_1 = Kw4Grz_□, and Gabelaia (2004, Lemma 4.5) has shown that scheme 4 is derivable from w4 and Grz_□ over K, so Kw4C_1 = K4Grz_□ = K4C_1.


For n ≥ 2, Kw4C_n is strictly weaker than K4C_n. To see this, let F be an irreflexive frame consisting of two points that are R-related to each other but not to themselves. F is weakly transitive but not transitive, and validates C_n for n ≥ 2, hence F ⊨ Kw4C_n but F ⊭ K4C_n.

It is straightforward to give a proof-theoretic derivation of C_{n+1} in K4C_n, showing that K4C_{n+1} ⊆ K4C_n. We will prove in Theorem 4 that K4C_n is characterised by validity in all finite transitive frames of circumference at most n. This is already known for n = 0, 1, as mentioned earlier. K4C_0, the Gödel-Löb logic, was first shown by Segerberg (1971) to be characterised by the class of finite transitive irreflexive frames, which are the finite transitive frames of circumference 0. Also K4C_1, as the logic K4Grz_□, was shown by Amerbauer (1996) to be characterised by the class of finite transitive antisymmetric frames, i.e. those having only singleton clusters, hence circumference at most 1. We will however include the cases n = 0, 1 in our completeness proof to follow.

10.6 Finite Model Property for K4C_n

Let M = (W, R, V) be any model that has transitive R and M ⊨ C_n, i.e. every instance of C_n is true in M. For example, the canonical model of any normal logic extending K4C_n has these properties. When working within M we may sometimes leave out its name and just write x ⊨ ϕ when M, x ⊨ ϕ.

We now set up a filtration of M. Let Φ be a finite set of formulas that is closed under subformulas. For each x ∈ W let x^Φ be the set {ϕ ∈ Φ : x ⊨ ϕ} of all members of Φ that are true at x in M. An equivalence relation ∼ on W is given by putting x ∼ y iff x^Φ = y^Φ. We write |x| for the equivalence class {y ∈ W : x ∼ y}, and put W_Φ = {|x| : x ∈ W}. The set W_Φ is finite, because the map |x| ↦ x^Φ is a well-defined injection of W_Φ into the finite powerset of Φ. Thus W_Φ has size at most 2^{size Φ}.

Let M_Φ = (W_Φ, R_Φ, V_Φ) be the standard transitive filtration of M through Φ. Thus |x| R_Φ |y| iff {□ϕ, ϕ : □ϕ ∈ x^Φ} ⊆ y^Φ, and V_Φ(p) = {|x| : x ⊨ p} for p ∈ Φ, while V_Φ(p) = ∅ otherwise. The relation R_Φ is transitive and has the important property that

xRy implies |x| R_Φ |y|, for all x, y ∈ W.    (10.12)

The Filtration Lemma gives that for all ϕ ∈ Φ and all x ∈ W,

M_Φ, |x| ⊨ ϕ iff M, x ⊨ ϕ.    (10.13)
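The standard transitive filtration can be computed directly when the model being filtrated is itself finite. The following is a minimal sketch under that assumption (the models of interest in the text, such as canonical models, are infinite); formulas use an illustrative tuple encoding, each class |x| is represented by the set x^Φ, and all names are hypothetical.

```python
def truth(W, R, V, f):
    """Truth set of formula f in the finite model (W, R, V)."""
    op = f[0]
    if op == 'var':
        return set(V.get(f[1], set()))
    if op == 'top':
        return set(W)
    if op == 'not':
        return set(W) - truth(W, R, V, f[1])
    if op == 'and':
        return truth(W, R, V, f[1]) & truth(W, R, V, f[2])
    if op == 'box':
        inner = truth(W, R, V, f[1])
        return {x for x in W if all(y in inner for y in W if (x, y) in R)}
    raise ValueError(f"unknown connective: {op}")


def subformulas(f):
    """The set of subformulas of f (closed under subformulas by construction)."""
    out = {f}
    for g in f[1:]:
        if isinstance(g, tuple):
            out |= subformulas(g)
    return out


def filtrate(W, R, V, Phi):
    """Standard transitive filtration of a transitive model through Phi."""
    sets = {f: truth(W, R, V, f) for f in Phi}
    # x ↦ x^Φ; since |x| ↦ x^Φ is injective, x^Φ stands in for |x|.
    cls = {x: frozenset(f for f in Phi if x in sets[f]) for x in W}
    W_phi = set(cls.values())
    boxes = [f for f in Phi if f[0] == 'box']
    # |x| R_Φ |y| iff {□ϕ, ϕ : □ϕ ∈ x^Φ} ⊆ y^Φ
    R_phi = {(a, b) for a in W_phi for b in W_phi
             if all(f in b and f[1] in b for f in boxes if f in a)}
    V_phi = {f[1]: {a for a in W_phi if f in a} for f in Phi if f[0] == 'var'}
    return cls, W_phi, R_phi, V_phi


# Example: a strict order 0 < 1 < 2 < 3 with 3 reflexive, p true at 1 and 3.
W = [0, 1, 2, 3]
R = {(i, j) for i in W for j in W if i < j} | {(3, 3)}
V = {'p': {1, 3}}
Phi = subformulas(('box', ('not', ('box', ('var', 'p')))))
cls, W_phi, R_phi, V_phi = filtrate(W, R, V, Phi)
for f in Phi:
    agree = all((cls[x] in truth(W_phi, R_phi, V_phi, f)) ==
                (x in truth(W, R, V, f)) for x in W)
    print(f, agree)
```

In this example every formula of Φ has the same truth value at x and at its class, as (10.13) requires, even though R_Φ relates some classes whose representatives are not R-related.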

We will use the fact that for any R_Φ-cluster C, and any formula ♦ϕ ∈ Φ, if |x|, |y| ∈ C, then

M, x ⊨ ♦ϕ iff M, y ⊨ ♦ϕ.    (10.14)


This follows from the Filtration Lemma, since if |x| and |y| are in the same R_Φ-cluster, then exactly the same formulas of the form ♦ϕ are true at both of them in M_Φ.

We will replace R_Φ by a relation R′ ⊆ R_Φ in such a way that each R_Φ-cluster C in M_Φ is decomposed into an R′-cluster with at most n elements and (possibly) some singleton (i.e. one-element) R′-clusters. We use letters α, β for members of W_Φ. Each such member is a subset of W. For each x ∈ W we write x ▷ α to mean that there is some y ∈ α such that xRy. This could be read 'x can see into α'. We write x ⋫ α if there is no such y.

The next result uses the axiom C_n to establish a property of R_Φ that will allow us to refine it into a transitive relation whose clusters have at most n elements.

Lemma 3 For any R_Φ-cluster C, there is an element x∗ ∈ W such that |x∗| ∈ C and a subset C∗ ⊆ C such that x∗ ▷ α for all α ∈ C∗, and

for all y ∈ W, if x∗Ry and |y| ∈ C, then |y| ∈ C∗ and y ▷ α for all α ∈ C∗.    (10.15)

Moreover C∗ has at most n elements. If C is R_Φ-degenerate then C∗ is empty.

Proof Take any R_Φ-cluster C. For each x ∈ W, define C(x) = {α ∈ C : x ▷ α}. Choose x∗ ∈ W such that |x∗| ∈ C and C(x∗) has least possible size, subject to this. Then define C∗ = C(x∗), which immediately ensures that x∗ ▷ α for all α ∈ C∗. To prove (10.15), suppose x∗Ry and |y| ∈ C. Then |y| ∈ C∗ by definition of C∗, since x∗ ▷ |y| ∈ C. Also, C(y) is a subset of C(x∗) by transitivity of R, so by minimality of C(x∗) we get C(y) = C(x∗). Hence if α ∈ C∗, then α ∈ C(x∗) = C(y), so y ▷ α as required.

Note that if |x∗| ∈ C∗, then x∗ ▷ |x∗|, therefore |x∗| R_Φ |x∗| by (10.12), making C non-degenerate. Thus if C is degenerate, then |x∗| ∉ C∗, but also C = {|x∗|} as C is a singleton containing |x∗|, hence C∗ is empty as it is a subset of C.

It remains to show that C∗ has at most n elements. This is where the core role of axiom C_n is played. Suppose, for the sake of contradiction, that C∗ has n + 1 distinct members α_0, …, α_n. By standard filtration theory, these members are definable as subsets of M, i.e. for each i ≤ n there is a formula ϕ_i such that for all y ∈ W,

M, y ⊨ ϕ_i iff y ∈ α_i (iff |y| = α_i).    (10.16)

If n ≥ 1, since the α_i's are distinct equivalence classes under ∼ they are pairwise disjoint, and hence for all i < j ≤ n, the formula ¬(ϕ_i ∧ ϕ_j) is true in M at every y ∈ W, therefore so is D_n(ϕ_0, …, ϕ_n). Thus by the semantics of □∗,

M, x∗ ⊨ □∗D_n(ϕ_0, …, ϕ_n).    (10.17)

But also as □∗D_0(ϕ_0) = □∗⊤, (10.17) holds as well when n = 0. Now since α_0 ∈ C∗ we have x∗ ▷ α_0, hence using (10.16) we get M, x∗ ⊨ ♦ϕ_0. Combining this with (10.17) and the fact that every instance of C_n is true in M, we get that


M, x∗ ⊨ ♦(ϕ_0 ∧ ¬P_n(ϕ_0, …, ϕ_n)). Hence there is some x_0 with x∗Rx_0 and

M, x_0 ⊨ ϕ_0 ∧ ¬P_n(ϕ_0, …, ϕ_n).    (10.18)

Therefore x_0 ⊨ ϕ_0, so by (10.16) |x_0| = α_0 ∈ C∗. We will now construct an R-path x_0, …, x_n with x_i ∈ α_i, hence by (10.16) x_i ⊨ ϕ_i, for all i ≤ n. If n = 0 we already have x_0 ∈ α_0 and there is nothing further to do. If n > 0, then assume inductively that for some k < n we have defined an R-path x_0, …, x_k with x_i ∈ α_i for all i ≤ k. Since x∗Rx_0, by transitivity we get x∗Rx_k. As |x_k| = α_k ∈ C, we get x_k ▷ α_{k+1} ∈ C∗ by (10.15), so there is some x_{k+1} ∈ α_{k+1} such that x_k R x_{k+1}. That completes the inductive construction of the R-path x_0, …, x_n with x_i ∈ α_i for all i ≤ n.

With one more repetition we observe that x∗Rx_n, so x_n ▷ α_0 by (10.15), hence x_n R x_{n+1} for some x_{n+1} ∈ α_0, thus x_{n+1} ⊨ ϕ_0. But now applying Lemma 2 to the R-path x_0, …, x_n, x_{n+1} we conclude that x_0 ⊨ P_n(ϕ_0, …, ϕ_n). Since x_0 ⊨ ¬P_n(ϕ_0, …, ϕ_n) by (10.18), this is a contradiction, forcing us to conclude that C∗ cannot have more than n elements, and completing the proof of Lemma 3.

Observe that in this lemma, if R is reflexive then x∗ ▷ |x∗| ∈ C and so |x∗| ∈ C∗. If also n = 1 it follows that C∗ = {|x∗|} and

if x∗Ry and |y| ∈ C, then |y| = |x∗|.    (10.19)

When (10.19) holds, |x∗| was called virtually last in C by Segerberg (1971, II.3), who showed that if M is a model of S4Grz, then every cluster of M_Φ contains a virtually last element. He then transformed M_Φ into a semantically equivalent partially ordered model by replacing each cluster by an arbitrary linear ordering of its elements ending with a virtually last element.

If n = 0, then C∗ is empty, and hence by (10.15), x∗Ry implies |y| ∉ C. Segerberg (1971, II.2) showed that if M is a model of GL, then every cluster of M_Φ contains an element |x∗| with x∗ having this property. He then transformed M_Φ into a semantically equivalent strictly partially ordered (i.e. transitive and irreflexive) model by replacing each cluster by an arbitrary strict linear ordering of its elements ending with |x∗|. Lemma 3 thus encompasses Segerberg's analysis for S4Grz and GL.

We now proceed to use the lemma to transform M_Φ into an equivalent model of circumference at most n. For each R_Φ-cluster C, choose and fix a point x∗ and associated set C∗ ⊆ C as given by the lemma. We will call x∗ the critical point for C. Then we define a subrelation R′ of R_Φ to refine the structure of each R_Φ-cluster C by decomposing it into the subset C∗ as an R′-cluster together with a degenerate R′-cluster {α} for each α ∈ C − C∗. These singleton clusters all have C∗ as an R′-successor but are R′-incomparable with each other. So the structure replacing C looks like




[Figure: a row of bullets, each a degenerate R′-cluster {α}, all R′-preceding a large circle labelled C∗]

with the bullets being the degenerate R′-clusters determined by the points of C − C∗, and the large circle representing C∗. All elements of W_Φ that R_Φ-precede C continue to R′-precede all members of C, while elements of W_Φ that come R_Φ-after C continue to come R′-after all members of C. Doing this to each cluster of (W_Φ, R_Φ) produces a new transitive frame (W_Φ, R′) with R′ ⊆ R_Φ.

R′ can be more formally defined on W_Φ by specifying, for all α, β ∈ W_Φ, that α R′ β iff either

• α and β belong to different R_Φ-clusters and α R_Φ β; or
• α and β belong to the same R_Φ-cluster C and β ∈ C∗.

Thus every element of C is R′-related to every element of C∗, and the restriction of R′ to C is equal to the relation C × C∗. So we could also define R′ as the union of these relations C × C∗ for all R_Φ-clusters C, plus all inter-cluster instances of R_Φ. If C is R_Φ-degenerate, then C∗ = ∅ by Lemma 3, and so C × C∗ = ∅. If C is non-R_Φ-degenerate, then the restriction of R_Φ to C is C × C, extending C × C∗. This implies that R′ is a subrelation of R_Φ on W_Φ.

Note that if C∗ is empty, then C − C∗ = C ≠ ∅, and all members of C are R′-irreflexive. In that case C is replaced in the new frame (W_Φ, R′) by a non-empty set of degenerate R′-clusters.

In the case n = 0, by Lemma 3 every R_Φ-cluster C has empty C∗, and so (W_Φ, R′) consists entirely of R′-irreflexive points and therefore has circumference 0. In the alternative case n ≥ 1, any non-degenerate R′-cluster will have the form C∗ for some R_Φ-cluster C, and so have at most n elements. Since any R′-cycle is included in a non-degenerate R′-cluster, it follows that all R′-cycles have length at most n and (W_Φ, R′) has circumference at most n. So in any case the finite transitive frame (W_Φ, R′) validates C_n by Theorem 1.

Now put M′ = (W_Φ, R′, V_Φ), a model differing from M_Φ only in that R′ replaces R_Φ.
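The passage from the filtration relation to R′ is a purely combinatorial step once the sets C∗ are known. The following is a minimal sketch, assuming a finite transitive frame (points `W`, relation `R` as a set of pairs) and a dictionary `star` that supplies the chosen subset C∗ for each cluster; the sketch does not compute critical points or C∗ itself (that is the content of Lemma 3), and all names are illustrative.

```python
def clusters(W, R):
    """Partition W into R-clusters, assuming R is transitive."""
    seen, result = set(), []
    for x in W:
        if x not in seen:
            c = frozenset({x} | {y for y in W if (x, y) in R and (y, x) in R})
            seen |= c
            result.append(c)
    return result


def refine(W, R, star):
    """Build R': inter-cluster instances of R plus C × C* for each cluster C."""
    cl = {x: c for c in clusters(W, R) for x in c}
    R_prime = {(a, b) for (a, b) in R if cl[a] != cl[b]}
    for c in set(cl.values()):
        R_prime |= {(a, b) for a in c for b in star[c]}
    return R_prime


def is_transitive(R):
    return all((x, z) in R for (x, y) in R for (y2, z) in R if y == y2)


# Example: an irreflexive point 0 below a three-element cluster {1, 2, 3}.
# We take C* = {3} for the big cluster, as one would for n = 1.
W = [0, 1, 2, 3]
big = {1, 2, 3}
R = {(0, y) for y in big} | {(x, y) for x in big for y in big}
star = {frozenset({0}): set(), frozenset(big): {3}}
R_prime = refine(W, R, star)
print(sorted(R_prime))
```

In the example the cluster {1, 2, 3} is replaced by two degenerate clusters {1} and {2} that both precede the simple cluster {3}, so the refined frame is still transitive and has circumference 1.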
We show that replacing M_Φ by M′ leaves the truth relation unchanged for any formula ϕ ∈ Φ: for all x ∈ W,

M′, |x| ⊨ ϕ iff M_Φ, |x| ⊨ ϕ.    (10.20)

The proof of this proceeds by induction on the formation of ϕ. If ϕ is a variable, then (10.20) holds because M′ and M_Φ have the same valuation V_Φ. The induction cases of the Boolean connectives are standard. Now make the induction hypothesis on ϕ that (10.20) holds for all x ∈ W, and suppose ♦ϕ ∈ Φ. If M′, |x| ⊨ ♦ϕ, then |x| R′ |y| and M′, |y| ⊨ ϕ for some y. Then |x| R_Φ |y| as R′ ⊆ R_Φ, and M_Φ, |y| ⊨ ϕ by induction hypothesis. Hence M_Φ, |x| ⊨ ♦ϕ.


Conversely, assume M_Φ, |x| ⊨ ♦ϕ. Then M, x ⊨ ♦ϕ by the Filtration Lemma (10.13). Let C be the R_Φ-cluster of |x|, and x∗ be the chosen critical point for C fulfilling Lemma 3. Then |x| and |x∗| both belong to C, so M, x∗ ⊨ ♦ϕ by (10.14). Hence there is some y ∈ W with x∗Ry and M, y ⊨ ϕ. Then M_Φ, |y| ⊨ ϕ by the Filtration Lemma (10.13), so M′, |y| ⊨ ϕ by induction hypothesis. If |y| ∈ C, then |y| ∈ C∗ by (10.15), so then |x| R′ |y| by definition of R′ since |x| ∈ C. But if |y| ∉ C, then as |x∗| R_Φ |y| by (10.12), and so |x| R_Φ |y|, the R_Φ-cluster of |y| is strictly R_Φ-later than C, and again |x| R′ |y| by definition of R′. So in any case we have |x| R′ |y| and M′, |y| ⊨ ϕ, which gives M′, |x| ⊨ ♦ϕ. That completes the inductive case for ♦ϕ, and hence proves that (10.20) holds for all ϕ ∈ Φ.

Theorem 4 For all n ≥ 0 and any formula ϕ the following are equivalent.
1. ϕ is a theorem of K4C_n.
2. ϕ is valid in all transitive frames that have circumference at most n and no strictly ascending chains.
3. ϕ is valid in all finite transitive frames that have circumference at most n.

Proof 1 implies 2: Let L be the set of formulas that are valid in all transitive frames that have circumference at most n and no strictly ascending chains. Then L is a transitive normal logic that contains C_n by Theorem 1. Hence L includes K4C_n.

2 implies 3: This follows immediately from the fact that a finite frame has no strictly ascending chains.

3 implies 1: Put L = K4C_n. Suppose 1 fails for ϕ, i.e. ϕ is not a theorem of L. Then there exists an x ∈ W_L with ϕ ∉ x, hence M_L, x ⊭ ϕ by the Truth Lemma (10.9). In the above construction of a finite model M′, let M be M_L, and Φ be the set of all subformulas of ϕ. Then by (10.13) and (10.20), M′, |x| ⊭ ϕ. But the frame of M′ is finite, transitive and has circumference at most n. This shows that ϕ fails to be valid on such a frame, so 3 does not hold for ϕ.

This theorem yields an alternative proof that K4C_{n+1} ⊆ K4C_n, since any formula valid in all finite transitive frames that have circumference at most n + 1 is valid in all finite transitive frames that have circumference at most n. A transitive frame consisting of a single cycle of length n + 1 will validate K4C_{n+1} but not C_n, showing that the logics {K4C_n : n ≥ 0} form a strictly decreasing sequence of extensions of K4.

Corollary 1 K4 = ⋂_{n≥0} K4C_n.

Proof K4 is a sublogic of K4C_n for all n. For the converse inclusion, if ϕ is not a K4-theorem, then it is invalid in some finite transitive frame F. If n is the size of the largest cycle in F, or 0 if there are no cycles, then F has circumference at most n, so by Theorem 4, ϕ is not a K4C_n-theorem.

The proof of Theorem 4 establishes something more. It gives a computable upper bound on the size of the falsifying model M′, showing that ϕ is a K4C_n-theorem iff it is valid in all finite transitive frames that have circumference at most n and have size


at most 2^k, where k is the number of subformulas of ϕ. But it is decidable whether a given finite frame is transitive and has circumference at most n, so by well-known arguments (Blackburn et al. 2001, Sect. 6.2), it follows that it is decidable whether or not a given formula is a K4C_n-theorem.

A potential strengthening of C_n is to replace □∗ in its antecedent by □. But the resulting formula is valid in all finite frames validating K4C_n, so is a theorem of that logic.

10.7 Extensions of K4C_n

We will now show how to apply and adapt the construction of M′ to obtain finite-frame characterisations of various logics that extend K4C_n.

10.7.1 Seriality

The D-axiom ♦⊤ is valid in a frame iff its relation is serial, meaning that every point has an R-successor: ∀x∃y(xRy). The inclusion of this axiom in a logic ensures that its canonical model is serial, as each point satisfies ♦⊤.

We assume now that the transitive model M having M ⊨ C_n also has serial R. We use this to show that the subrelation R′ of M′ is also serial. Suppose that a point α ∈ W_Φ has an R_Φ-cluster C that is not R_Φ-final, i.e. there is some cluster C′ with C R_Φ C′ but not C′ R_Φ C. Then any β ∈ C′ has α R′ β so is an R′-successor of α. Alternatively, if C is final, let x∗ be the critical point for C. There is a y with x∗Ry, as R is serial. But then |x∗| R_Φ |y| by (10.12), and so |y| ∈ C as |x∗| ∈ C and C is final. But then |y| ∈ C∗ by (10.15). Since every member of C is R′-related to every member of C∗, we get that α R′ |y|, completing the proof that R′ is serial and hence the frame of M′ validates ♦⊤.

Since every cluster in a finite transitive frame has a successor cluster that is final, we see that such a frame is serial iff every final cluster is non-degenerate. From all this we can infer that KD4C_n, the smallest normal extension of K4C_n to contain ♦⊤, is sound and complete for validity in all finite transitive frames that have circumference at most n and every final cluster non-degenerate.
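The characterisation of seriality in terms of final clusters is easy to test on small finite frames. The following is a minimal sketch, assuming a finite transitive frame given as a list `W` and a set `R` of pairs; the function names are illustrative only.

```python
def is_serial(W, R):
    """Every point has at least one R-successor."""
    return all(any((x, y) in R for y in W) for x in W)


def final_clusters_nondegenerate(W, R):
    """In a transitive frame: no point has a cluster that is final and degenerate."""
    for x in W:
        # The cluster of x is final iff x R y implies y R x.
        final = all((y, x) in R for y in W if (x, y) in R)
        # The cluster of x is degenerate iff x is irreflexive.
        if final and (x, x) not in R:
            return False
    return True


# Example: 0 sees 1, and 1 is reflexive, so the only final cluster is {1}.
W = [0, 1]
R = {(0, 1), (1, 1)}
print(is_serial(W, R), final_clusters_nondegenerate(W, R))

# Dropping the loop at 1 makes the final cluster {1} degenerate.
R_bad = {(0, 1)}
print(is_serial(W, R_bad), final_clusters_nondegenerate(W, R_bad))
```

Both checks agree on the two example frames; the equivalence is only claimed for finite transitive frames, which is what the sketch assumes.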

10.7.2 S4C_n

S4 is the logic K4T, where T is the scheme □ϕ → ϕ. A frame validates T iff its relation is reflexive, and the inclusion of T in a logic ensures that the canonical frame is reflexive. Assume that n ≥ 1 and the model M ⊨ C_n as above has reflexive R, hence R_Φ is reflexive by (10.12). Thus no R_Φ-cluster is degenerate. We modify the


definition of R′ to make it reflexive as well, so that the frame of M′ validates T. The change occurs in the case of an R_Φ-cluster C having C ≠ C∗. Then instead of making the singletons {α} for α ∈ C − C∗ be degenerate, we make them all into simple R′-clusters by requiring that α R′ α. Formally this is done by adding to the definition of α R′ β the third possibility that

• α and β belong to the same R_Φ-cluster C, and α = β ∈ C − C∗.

Equivalently, the restriction of R′ to C is equal to (C × C∗) ∪ {(α, α) : α ∈ C − C∗}. Since R_Φ is reflexive, this modified definition of R′ still has R′ ⊆ R_Φ, and that is enough to preserve the proof of the truth invariance result (10.20) for the modified model M′.

From this it follows that S4C_n, the smallest normal extension of K4C_n to contain the scheme T, is sound and complete for validity in all finite reflexive transitive frames that have circumference at most n.

We left out the case n = 0 here because the addition of C_0 to S4 results in the inconsistent logic that has all formulas as theorems. The logics {S4C_n : n ≥ 1} form a strictly decreasing sequence whose intersection is S4.

10.7.3 Linearity

K4.3 is the smallest normal extension of K4 that includes the scheme

\[ \Box(\varphi \wedge \Box\varphi \to \psi) \vee \Box(\psi \wedge \Box\psi \to \varphi). \tag{10.21} \]

A frame validates this scheme iff it is weakly connected, i.e. satisfies ∀x∀y∀z(xRy ∧ xRz → yRz ∨ y = z ∨ zRy). The canonical frame of any normal extension of K4.3 is weakly connected. If a transitive weakly connected frame is point-generated, i.e. W = {x} ∪ {y ∈ W : xRy} for some point x ∈ W, then the frame is connected: it satisfies ∀y∀z(yRz ∨ y = z ∨ zRy). Such a connected frame can be viewed as a linearly ordered set of clusters.

Now let L be a normal extension of K4.3C_n and ϕ a formula that is not a theorem of L. Then there is some x ∈ W_L with ϕ ∉ x. Put W = {x} ∪ {y ∈ W_L : x R_L y} and let ℳ = (W, R, V) be the submodel of ℳ_L based on W. Then R is transitive, and is connected since R_L is weakly connected and (W, R) is point-generated. Also the fact that W is R_L-closed and ℳ_L ⊨ C_n ensures that ℳ ⊨ C_n. Take Φ to be the set of subformulas of ϕ, and ℳ_Φ to be the standard transitive filtration of ℳ through Φ. Then ℳ_Φ, |x| ⊭ ϕ. Moreover, since R is connected, it follows from (10.12) that R_Φ is connected.
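The step from weak connectedness to connectedness on a point-generated subframe can be confirmed by brute force on small frames. The sketch below is a hedged illustration: the helper names are hypothetical, relations are assumed to be sets of ordered pairs, and only frames with a handful of points are examined.

```python
from itertools import product


def is_transitive(rel):
    return all((x, z) in rel for (x, y) in rel for (w, z) in rel if y == w)


def is_weakly_connected(rel):
    """For all x, y, z: x R y and x R z imply y R z, y = z, or z R y."""
    return all((y, z) in rel or y == z or (z, y) in rel
               for (x, y) in rel for (w, z) in rel if x == w)


def is_connected(points, rel):
    """For all y, z: y R z, y = z, or z R y."""
    return all((y, z) in rel or y == z or (z, y) in rel
               for y in points for z in points)


def generated_subframe(x, rel):
    """Subframe on {x} together with all R-successors of x."""
    sub_points = {x} | {y for (w, y) in rel if w == x}
    sub_rel = {(y, z) for (y, z) in rel if y in sub_points and z in sub_points}
    return sub_points, sub_rel


def generated_subframes_connected(size):
    """Check every transitive, weakly connected frame with `size` points."""
    points = list(range(size))
    pairs = list(product(points, repeat=2))
    for bits in product([False, True], repeat=len(pairs)):
        rel = {p for p, b in zip(pairs, bits) if b}
        if not (is_transitive(rel) and is_weakly_connected(rel)):
            continue
        for x in points:
            if not is_connected(*generated_subframe(x, rel)):
                return False
    return True
```

The point-generation hypothesis matters: two unrelated points form a weakly connected frame (vacuously) that is not connected, which is why the check is run on generated subframes rather than on whole frames.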


We now modify each R_Φ-cluster C to obtain a suitable model ℳ′ with circumference at most n. The way we did this for K4C_n allows some leeway in how we define the relation R′ on C − C*. The minimal requirement to make the construction work is that each member of C − C* forms a singleton cluster that R′-precedes C*. Instead of making these members of C − C* incomparable with each other, we could form them arbitrarily into a linear sequence under R′ that precedes C*, getting a structure that looks like





[Figure: the members of C − C* arranged in a linear sequence, followed by C*.]

The model ℳ′ will still satisfy (10.20), so will falsify ϕ at |x|. Since the original R_Φ-clusters are linearly ordered by R_Φ, this construction will make the R′-clusters be linearly ordered by R′, with the non-degenerate ones being of size at most n. Hence the frame underlying ℳ′ will be connected and validate K4.3C_n. This leads to the conclusion that the logic K4.3C_n is sound and complete for validity in all finite transitive and connected frames that have circumference at most n.

The analysis of the scheme T can also be applied here to show further that for n ≥ 1, the logic S4.3C_n is sound and complete for validity in all finite reflexive transitive and connected frames that have circumference at most n.

10.7.4 Simple Final Clusters

The McKinsey axiom is the scheme M:

\[ \Box\Diamond\varphi \to \Diamond\Box\varphi. \tag{10.22} \]

The logic S4M is sometimes called S4.1. Since S4M ⊆ S4Grz (Sobociński 1964), S4MC_1 is just S4C_1. However K4MC_1 is stronger than K4C_1 (see below). M is equivalent over K to ♦(□ϕ ∨ □¬ϕ), which we will make use of, and from which the seriality axiom ♦⊤ is derivable. A necessary and sufficient condition for a transitive frame to validate M is ∀x∃y(xRy & ∀z(yRz implies y = z)). For a finite frame, this is equivalent to requiring that every final cluster is simple. Segerberg (1968) gave the first proof of the finite model property for S4M over frames satisfying this requirement. Here we use an adaptation of a proof method of Chagrov and Zakharyaschev (1997, Theorem 5.34) that works also for K4M.
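For finite transitive frames, the equivalence between the first-order condition ∀x∃y(xRy & ∀z(yRz implies y = z)) and the requirement that every final cluster be simple can be tested directly. The following sketch is only an illustration under stated assumptions: the function names are hypothetical, a simple cluster is taken to be a single reflexive point, and the check covers small frames only.

```python
from itertools import product


def is_transitive(rel):
    return all((x, z) in rel for (x, y) in rel for (w, z) in rel if y == w)


def mckinsey_condition(points, rel):
    """Every x has a successor y whose only possible successor is y itself."""
    return all(
        any((x, y) in rel and all(z == y for z in points if (y, z) in rel)
            for y in points)
        for x in points
    )


def final_clusters_simple(points, rel):
    """Every final cluster consists of exactly one reflexive point."""
    for x in points:
        c = {x} | {y for y in points if (x, y) in rel and (y, x) in rel}
        final = all(y in c for u in c for y in points if (u, y) in rel)
        if final and not (len(c) == 1 and (x, x) in rel):
            return False
    return True


def conditions_agree(size):
    """Compare both conditions on every transitive frame with `size` points."""
    points = list(range(size))
    pairs = list(product(points, repeat=2))
    for bits in product([False, True], repeat=len(pairs)):
        rel = {p for p, b in zip(pairs, bits) if b}
        if is_transitive(rel) and (
            mckinsey_condition(points, rel) != final_clusters_simple(points, rel)
        ):
            return False
    return True
```

A dead end (an irreflexive point with no successors) fails both conditions, while a reflexive endpoint satisfies both, which matches the reading of "simple" used here.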


Let ℳ be a transitive model with ℳ ⊨ M and ℳ ⊨ C_n for some n ≥ 1. Let Ψ be a finite set of formulas that is closed under subformulas (e.g. the set of subformulas of some non-theorem of K4MC_n). Let Φ be the closure under subformulas of {□ϕ, □¬ϕ : ϕ ∈ Ψ}. Then Φ is still finite and closed under subformulas, so the finite filtration ℳ_Φ can be constructed as previously.

We show that any R_Φ-final cluster C is simple. To show that C has one element, it is enough to show that all members of C satisfy the same formulas from Φ in ℳ_Φ. Since all members of the same cluster satisfy the same formulas of the form □ψ in any model, the problem reduces to the case of a formula ϕ ∈ Ψ. Take any x with |x| ∈ C. Then ℳ, x ⊨ ♦(□ϕ ∨ □¬ϕ) as ℳ ⊨ M. Hence there exists y with xRy and ℳ, y ⊨ □ϕ ∨ □¬ϕ. Then |x| R_Φ |y|, so |y| ∈ C as C is final. As □ϕ, □¬ϕ ∈ Φ, and one of them is true at y in ℳ, then one of them is true at |y| in ℳ_Φ. Hence either ϕ is true at every member of C, or is false at every member of C. That proves that C is a singleton, so |x| = |y| where y is as above, hence |x| R_Φ |x| and C = {|x|} is a simple R_Φ-cluster as claimed. Moreover, x fulfils the requirements to be the critical point x* of Lemma 3, so C* = C, and C becomes a simple R′-cluster in the model ℳ′. We see that the final clusters of ℳ′ are just the final clusters of ℳ_Φ, which are all simple in ℳ′.

This leads to the conclusion that for n ≥ 1, K4MC_n is sound and complete for validity in all finite transitive frames that have circumference at most n and all final clusters simple, while S4MC_n has the finite model property for the subclass of these frames that are reflexive. We also see that K4MC_1 = K4DC_1, since the final clusters in a finite K4DC_1-frame are singletons by C_1-validity, and are non-degenerate by seriality, so are simple.

10.7.5 Degenerate Final Clusters

Substituting ⊥ for the variable in the Löb axiom produces a formula that is equivalent over K to

E: □⊥ ∨ ♦□⊥.

A dead end in a frame is an element x that has no successor, i.e. {x} is a degenerate final cluster. The condition for a model to satisfy E, or equivalently for its underlying frame to validate E, is that every point is either a dead end or has a successor that is a dead end. In a finite transitive frame, this condition is equivalent to requiring that every final cluster is degenerate.

Assume now that a transitive model ℳ having ℳ ⊨ C_n also has ℳ ⊨ E. The simplest way to ensure that our modified filtration ℳ′ also satisfies E is to include ♦□⊥ and its subformulas in the finite set Φ. Then using (10.13) and (10.20) we can infer that ℳ′ ⊨ E. This leads to the conclusion that for n ≥ 1, K4EC_n is sound and complete for validity in all finite transitive frames that have circumference at most n and all final clusters degenerate.


10.8 Models on Irresolvable Spaces

There is a topological semantics that interprets modal formulas as subsets of a topological space X, interpreting □ by the interior operator Int_X of X and ♦ by its closure operator Cl_X. This approach is known as C-semantics (Bezhanishvili et al. 2005) after the interpretation of ♦ as Closure. It contrasts with d-semantics, which we will describe later.

A topological model ℳ = (X, V) on X is given by a valuation V assigning a subset V(p) of X to each variable p. A truth set ℳ_C(ϕ) is then defined by induction on the formation of an arbitrary formula ϕ by letting ℳ_C(p) = V(p), interpreting the Boolean connectives by the corresponding Boolean set operations, and putting ℳ_C(□ϕ) = Int_X(ℳ_C(ϕ)). Then ℳ_C(♦ϕ) = Cl_X(ℳ_C(ϕ)). A truth relation ℳ, x ⊨_C ϕ is defined to mean that x ∈ ℳ_C(ϕ). Defining an open neighbourhood of x to be any open subset of X that contains x, we have

• ℳ, x ⊨_C □ϕ iff there is an open neighbourhood of x included in ℳ_C(ϕ),
• ℳ, x ⊨_C ♦ϕ iff every open neighbourhood of x intersects ℳ_C(ϕ).

ϕ is C-true in ℳ, written ℳ ⊨_C ϕ, if ℳ, x ⊨_C ϕ for all x ∈ X; and is C-valid in space X, written X ⊨_C ϕ, if it is C-true in all models on X. For any space X, the set {ϕ : X ⊨_C ϕ} of all formulas C-valid in X is a normal logic including S4, called the C-logic of X.

A frame (W, R) has the Alexandroff topology on W in which the open sets O ⊆ W are those that are up-sets under R, i.e. if x ∈ O and xRy then y ∈ O. Call the resulting topological space W_R. If R is a quasi-order, the interior operator Int_R and closure operator Cl_R of W_R turn out to be given by

\[ \begin{aligned} \mathrm{Int}_R\,Y &= \{x \in W : \forall y\,(xRy \text{ implies } y \in Y)\}, \\ \mathrm{Cl}_R\,Y &= R^{-1}Y = \{x \in W : \exists y\,(xRy \in Y)\}, \end{aligned} \tag{10.23} \]

which are the same operations that interpret □ and ♦ in a model ℳ = (W, R, V) in the Kripkean sense of Sect. 10.4. Such a quasi-ordered model gives rise to the topological model ℳ_R = (W_R, V). These two models are semantically equivalent in the sense that ℳ, x ⊨ ϕ iff ℳ_R, x ⊨_C ϕ, for all x ∈ W and all formulas ϕ. It follows, in terms of validity, that

\[ (W, R) \models \varphi \quad\text{iff}\quad W_R \models_C \varphi. \tag{10.24} \]

This can be used to show that the set of formulas that are C-valid in all topological spaces is exactly S4, a result due to McKinsey and Tarski (1948), the originators of this kind of topological semantics.

For n ≥ 2, a space is called n-resolvable if it has n pairwise disjoint dense subsets (Eckertson 1997). Here Y is dense in X when Cl_X Y = X. Since any superset of a dense set is dense, n-resolvability is equivalent to X having a partition into n dense subsets. X is n-irresolvable if it is not n-resolvable, and is hereditarily n-irresolvable if every non-empty subspace of X is n-irresolvable.

It is known that S4Grz is the C-logic determined by the class of hereditarily 2-irresolvable spaces (the prefix n- is usually omitted when n = 2). This follows from results of Esakia (1981) and Bezhanishvili (2003), as explained in van Benthem and Bezhanishvili (2007, pp. 253–4). We will show that in general S4C_n is characterised by C-validity in all hereditarily n + 1-irresolvable spaces.

Note that by general topology, the closure operator Cl_S of a subspace S of X satisfies Cl_S Y = S ∩ Cl_X Y. Hence a set Y ⊆ S is dense in S iff S ⊆ Cl_X Y.

Theorem 5 For n ≥ 1, a topological space X is hereditarily n + 1-irresolvable iff X ⊨_C C_n.

Proof Write Int and Cl for the interior and closure operators of X. Suppose first that X is not hereditarily n + 1-irresolvable. Then there is a non-empty subspace S of X that has n + 1 pairwise disjoint dense subsets, say S_0, ..., S_n. Take variables p_0, ..., p_n and let ℳ be any model on X for which ℳ_C(p_i) = S_i for i ≤ n. The disjointness of the S_i's ensures that the formula D_n(p_0, ..., p_n) is C-true in ℳ at every point of X, hence so is □*D_n(p_0, ..., p_n). The truth set ℳ_C(P_n(p_0, ..., p_n)) is

Cl(S_1 ∩ Cl(S_2 ∩ ⋯ ∩ Cl(S_n ∩ Cl S_0) ⋯ )).

But by the given density we have that S_i ⊆ S ⊆ Cl S_j for all i, j ≤ n. Thus Cl(S_n ∩ Cl S_0) = Cl S_n, hence Cl(S_{n−1} ∩ Cl(S_n ∩ Cl S_0)) = Cl(S_{n−1} ∩ Cl S_n) = Cl S_{n−1}, etc. Iterating this calculation we conclude that ℳ_C(P_n(p_0, ..., p_n)) = Cl S_1. Since S_0 ⊆ Cl S_1, it follows that the formula p_0 ∧ ¬P_n(p_0, ..., p_n) is C-false at every point of X, hence so is ♦(p_0 ∧ ¬P_n(p_0, ..., p_n)). But ℳ_C(♦p_0) = Cl S_0, so

□*D_n(p_0, ..., p_n) → (♦p_0 → ♦(p_0 ∧ ¬P_n(p_0, ..., p_n)))

is C-false at every point of Cl S_0. Since ∅ ≠ S ⊆ Cl S_0, we conclude that C_n is not C-valid in X. That proves one direction of the theorem.

For the other direction, suppose that some instance

\[ \Box^{*} D_n(\varphi_0, \dots, \varphi_n) \to \bigl(\Diamond\varphi_0 \to \Diamond(\varphi_0 \wedge \neg P_n(\varphi_0, \dots, \varphi_n))\bigr) \tag{10.25} \]

of C_n is not C-valid in X, so is false at some point x in some model ℳ on X. Put A_i = ℳ_C(ϕ_i) for all i ≤ n and 𝐃_n = ⋂_{i<j≤n} −(A_i ∩ A_j). Define

𝐏_n = Cl(A_1 ∩ Cl(A_2 ∩ ⋯ ∩ Cl(A_n ∩ Cl A_0) ⋯ )).


Then 𝐏_n = ℳ_C(P_n), and the C-falsity of (10.25) at x implies that x ∉ Cl(A_0 − 𝐏_n) while x ∈ Cl A_0 and x ∈ Int 𝐃_n. The latter implies that there is an open neighbourhood B of x with B ⊆ 𝐃_n. Hence the sets {B ∩ A_j : j ≤ n} are pairwise disjoint. Since x ∉ Cl(A_0 − 𝐏_n) there is an open neighbourhood B′ of x with

\[ B' \cap (A_0 - \mathbf{P}_n) = \emptyset. \tag{10.26} \]

Let O = B ∩ B′, another open set containing x. Put S_0 = O ∩ A_0 and for 1 ≤ i < n,

S_i = O ∩ A_i ∩ Cl(A_{i+1} ∩ ⋯ ∩ Cl(A_n ∩ Cl A_0) ⋯ ),

while S_n = O ∩ A_n ∩ Cl A_0. Define S = S_0 ∪ ⋯ ∪ S_n. We will show that S_i ⊆ Cl S_j for all i, j ≤ n. This implies that for each j ≤ n we have S ⊆ Cl S_j, so S_j is dense in the subspace S. But S_j ⊆ O ∩ A_j ⊆ B ∩ A_j, so the S_j's are also pairwise disjoint. Since x ∈ Cl A_0 we get O ∩ A_0 ≠ ∅, i.e. S_0 ≠ ∅. Hence as S_0 ⊆ Cl S_j, we get S_j ≠ ∅ for all j ≤ n. So the S_j's form n + 1 distinct pairwise disjoint dense subsets of S, showing that S is n + 1-resolvable and therefore X is not hereditarily n + 1-irresolvable.

It remains to prove that S_i ⊆ Cl S_j in general. Define a binary relation ρ on subsets of X by putting Y ρ Z iff Y ⊆ Cl Z. By closure algebra ρ is transitive, since Y ρ Z ρ Z′ implies Y ⊆ Cl Z ⊆ Cl Cl Z′ = Cl Z′. It is enough to prove that the S_j's form a ρ-cycle, i.e. S_0 ρ S_1 ρ ⋯ ρ S_n ρ S_0, for then the transitivity of ρ ensures that S_i ρ S_j for all i, j ≤ n, as required.

We use the general fact that, since O is open, O ∩ Cl Z ⊆ Cl(O ∩ Z) for any Z. For 1 ≤ i < n, let Z_i = A_i ∩ Cl(A_{i+1} ∩ ⋯ ∩ Cl(A_n ∩ Cl A_0) ⋯ ), and put Z_n = A_n ∩ Cl A_0. Then if 1 ≤ i ≤ n we have S_i = O ∩ Z_i, and if i < n then Z_i = A_i ∩ Cl Z_{i+1}. Also 𝐏_n = Cl Z_1.

To show that S_0 ρ S_1, note that S_0 ⊆ B′ ∩ A_0, so from (10.26), S_0 ⊆ 𝐏_n. Since also S_0 ⊆ O, we get S_0 ⊆ O ∩ 𝐏_n = O ∩ Cl Z_1 ⊆ Cl(O ∩ Z_1) = Cl S_1. For S_i ρ S_{i+1} when 1 ≤ i < n,

S_i = O ∩ A_i ∩ Cl Z_{i+1} ⊆ O ∩ Cl Z_{i+1} ⊆ Cl(O ∩ Z_{i+1}) = Cl S_{i+1}.

Finally, for S_n ρ S_0 we have S_n = O ∩ A_n ∩ Cl A_0 ⊆ O ∩ Cl A_0 ⊆ Cl(O ∩ A_0) = Cl S_0.

We can now clarify the relationship between circumference and irresolvability:

Theorem 6 Let (W, R) be a quasi-ordered frame, and n ≥ 1.


1. The topological space W_R is hereditarily n + 1-irresolvable iff (W, R) has circumference at most n and has no strictly ascending chains.
2. If W is finite, then W_R is hereditarily n + 1-irresolvable iff (W, R) has circumference at most n.

Proof This is a restatement of Theorem 1 with 'W_R is hereditarily n + 1-irresolvable' in place of '(W, R) ⊨ C_n'. But by Theorem 5, W_R is hereditarily n + 1-irresolvable iff W_R ⊨_C C_n, which by (10.24) holds iff (W, R) ⊨ C_n.

Given our results on relational semantics for S4C_n, and the equivalence with C-semantics given by (10.24), it follows from the second part of Theorem 6 that any non-theorem of S4C_n is falsifiable in a C-model on a finite hereditarily n + 1-irresolvable space. Hence S4C_n is complete for C-validity in (finite) hereditarily n + 1-irresolvable spaces. Soundness follows from Theorem 5, which ensures that the C-logic of any hereditarily n + 1-irresolvable space includes S4C_n.

Theorem 7 For all n ≥ 1 and any formula ϕ the following are equivalent.
1. ϕ is a theorem of S4C_n.
2. ϕ is C-valid in all hereditarily n + 1-irresolvable topological spaces.
3. ϕ is C-valid in all finite hereditarily n + 1-irresolvable topological spaces of size at most 2^k, where k is the number of subformulas of ϕ.

Next we discuss the topological role of the McKinsey axiom M of (10.22). It is C-valid in precisely those spaces for which every non-empty open subspace is irresolvable (Bezhanishvili et al. 2003, Prop. 2.1). A point x is isolated in a space X if {x} is open in X. Such a point will belong to one cell of any partition and prevent any other cell from being dense, hence preventing resolvability. So a sufficient condition for C-validity of M is that the space is weakly scattered, meaning that the set of isolated points is dense, since that ensures that any non-empty open subspace has an isolated point and therefore is irresolvable.

Now in a quasi-ordered frame (W, R), if a final cluster is simple, then since it is open in the Alexandroff space W_R, its one element is isolated in W_R. If the frame is finite and validates M, then every point is R-succeeded by a point whose cluster is final and therefore simple, implying that W_R is weakly scattered. Combining this with the analysis behind Theorem 7 and the relational characterisation of S4MC_n at the end of Sect. 10.7, we conclude that for n ≥ 2, S4MC_n is sound and complete for C-validity in all (finite) weakly scattered spaces that are hereditarily n + 1-irresolvable.

We turn now to d-semantics, which interprets ♦ by the derived set operator De_X of a space X. For a topological model ℳ = (X, V) it generates truth sets ℳ_d(ϕ) that have ℳ_d(♦ϕ) = De_X(ℳ_d(ϕ)), the set of limit points of ℳ_d(ϕ). The truth relation ℳ, x ⊨_d ϕ now means that x ∈ ℳ_d(ϕ). Defining a punctured neighbourhood of x to be any set of the form O − {x} where O is an open neighbourhood of x, we have

• ℳ, x ⊨_d ♦ϕ iff every punctured neighbourhood of x intersects ℳ_d(ϕ),


• ℳ, x ⊨_d □ϕ iff there is a punctured neighbourhood of x included in ℳ_d(ϕ).
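The derived set operator behind these two clauses is easy to compute in a finite space. The sketch below is a hedged illustration, not part of the original development: the function names are hypothetical, a topology is assumed to be given as an explicit list of open sets, and the two-point indiscrete space is used because it shows that De De Y need not be included in De Y, while the general facts Cl Y = Y ∪ De Y and De De Y ⊆ Y ∪ De Y still hold.

```python
from itertools import combinations


def subsets(points):
    """All subsets of a finite set, as frozensets."""
    pts = sorted(points)
    return [frozenset(c) for r in range(len(pts) + 1) for c in combinations(pts, r)]


def closure(opens, points, subset):
    """x is in Cl Y iff every open neighbourhood of x meets Y."""
    return frozenset(x for x in points
                     if all(o & subset for o in opens if x in o))


def derived_set(opens, points, subset):
    """x is in De Y iff every punctured neighbourhood O - {x} of x meets Y."""
    return frozenset(x for x in points
                     if all((o - {x}) & subset for o in opens if x in o))


# The indiscrete topology on two points: only the empty set and the whole space are open.
points = frozenset({0, 1})
opens = [frozenset(), points]

y = frozenset({0})
de_y = derived_set(opens, points, y)        # the limit points of {0}
de_de_y = derived_set(opens, points, de_y)  # the limit points of De {0}
```

Here `de_y` is {1} and `de_de_y` is {0}, so De De {0} is not a subset of De {0}, although it is a subset of {0} ∪ De {0}.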

A formula ϕ is d-valid in X, written X ⊨_d ϕ, if it is d-true in every model on X. The d-logic {ϕ : X ⊨_d ϕ} of X is a normal logic which always includes the weak transitivity scheme w4, because the operator De_X always has De_X De_X Y ⊆ Y ∪ De_X Y for all Y ⊆ X. The condition for X to d-validate the ♦-version ♦♦ϕ → ♦ϕ of scheme 4 is that in general De_X De_X Y ⊆ De_X Y. This is equivalent to requiring that the derived set De_X{x} of any singleton is closed. A space with this property is called T_D.

In terms of strength, the T_D-property lies strictly between the separation properties T_0 and T_1 (Aull and Thron 1962). A T_1 space is one in which the derived set De_X{x} of any singleton is empty. A T_0 space is one in which for any two distinct points x and y there is an open set containing one of them and not the other, i.e. either x ∉ Cl{y} or y ∉ Cl{x}. Any hereditarily irresolvable space is T_D, as in general Cl_X{x} = {x} ∪ De_X{x} with x ∉ De_X{x} and {x} dense in Cl_X{x}. So if Cl_X{x} is irresolvable, then De_X{x} cannot be dense in Cl_X{x}, hence Cl_X De_X{x} can only be De_X{x}, i.e. De_X{x} is closed.

In C-semantics there is no distinction in interpretation between □ and □*, or between ♦ and ♦*. Since Int_X Y ⊆ Y ⊆ Cl_X Y we have ℳ_C(□*ϕ) = ℳ_C(□ϕ) and ℳ_C(♦*ϕ) = ℳ_C(♦ϕ). On the other hand in d-semantics, ♦* and □* serve to define the closure and interior of ℳ_d(ϕ). Since in general Cl_X Y = Y ∪ De_X Y, we get ℳ_d(♦*ϕ) = Cl_X(ℳ_d(ϕ)), and thus ℳ_d(□*ϕ) = Int_X(ℳ_d(ϕ)).

Since C_n defines the class of hereditarily n + 1-irresolvable spaces under the C-semantics (Theorem 5), by replacing every occurrence of □ and ♦ in C_n by □* and ♦* we get a formula that defines the class of hereditarily n + 1-irresolvable spaces under the d-semantics. However, we can more simply use C_n itself.

Theorem 8 For n ≥ 1, a topological space X has X ⊨_d C_n iff X ⊨_d C*_n iff X ⊨_C C_n iff X is hereditarily n + 1-irresolvable.
Proof X ⊨_d C_n implies X ⊨_d C*_n because C*_n is a tautological consequence of C_n. Conversely X ⊨_d C*_n implies X ⊨_d C_n, because if X ⊨_d C*_n then the d-logic of X includes C*_n and w4 (since w4 is d-valid in all spaces), so by Theorem 3(2) this d-logic includes C_n, hence X ⊨_d C_n.

We already saw in Theorem 5 that X is hereditarily n + 1-irresolvable iff X ⊨_C C_n.

Next we show that X ⊨_C C_n implies X ⊨_d C*_n. For, if X ⊭_d C*_n, then some instance of C*_n of the form

\[ \Box^{*} D_n(p_0, \dots, p_n) \to \bigl(\Diamond p_0 \to \Diamond^{*}(p_0 \wedge \neg P_n(p_0, \dots, p_n))\bigr) \tag{10.27} \]

is not d-valid in X, so is d-false at some point x in some model ℳ = (X, V). Put A_i = V(p_i) for all i ≤ n and 𝐃_n = ⋂_{i<j≤n} −(A_i ∩ A_j). Then

ℳ_d(P_n) = De(A_1 ∩ De(A_2 ∩ ⋯ ∩ De(A_n ∩ De A_0) ⋯ )), and
ℳ_C(P_n) = Cl(A_1 ∩ Cl(A_2 ∩ ⋯ ∩ Cl(A_n ∩ Cl A_0) ⋯ )).


The d-falsity of (10.27) at x implies that x ∈ Int 𝐃_n and x ∈ De A_0 but x ∉ Cl(A_0 − ℳ_d(P_n)). As De Y ⊆ Cl Y in general, ℳ_d(P_n) ⊆ ℳ_C(P_n). Hence as Cl preserves set inclusion, Cl(A_0 − ℳ_C(P_n)) ⊆ Cl(A_0 − ℳ_d(P_n)). Therefore x ∉ Cl(A_0 − ℳ_C(P_n)). But x ∈ Int 𝐃_n and x ∈ De A_0 ⊆ Cl A_0, and these facts together ensure that

ℳ, x ⊭_C □*D_n(p_0, ..., p_n) → (♦p_0 → ♦(p_0 ∧ ¬P_n(p_0, ..., p_n))).

Therefore X ⊭_C C_n as required.

Finally, to complete the cycle of implications we show that X ⊨_d C*_n implies that X is hereditarily n + 1-irresolvable. For if X is not hereditarily n + 1-irresolvable, then there is a non-empty subspace S of X that has n + 1 pairwise disjoint dense subsets, say S_0, ..., S_n. Take variables p_0, ..., p_n and let ℳ be any model on X for which ℳ_d(p_i) = S_i for i ≤ n. The disjointness of the S_i's ensures that the formula D_n(p_0, ..., p_n) is d-true in ℳ at every point of X, hence so is □*D_n(p_0, ..., p_n). The truth set ℳ_d(P_n(p_0, ..., p_n)) is

De(S_1 ∩ De(S_2 ∩ ⋯ ∩ De(S_n ∩ De S_0) ⋯ )).

By the given density we have that S_i ⊆ S ⊆ Cl S_j for all i, j ≤ n. But now we observe that as Cl S_j = S_j ∪ De S_j, and the S_i's are pairwise disjoint, this implies that S_i ⊆ De S_j for all i ≠ j ≤ n. Thus De(S_n ∩ De S_0) = De S_n, hence De(S_{n−1} ∩ De(S_n ∩ De S_0)) = De(S_{n−1} ∩ De S_n) = De S_{n−1}, etc. Iterating this calculation we conclude that ℳ_d(P_n) = De S_1. Since S_0 ⊆ De S_1, it follows that the formula p_0 ∧ ¬P_n(p_0, ..., p_n) is d-false at every point of X, hence so is ♦*(p_0 ∧ ¬P_n(p_0, ..., p_n)). But ℳ_d(♦p_0) = De S_0, so

□*D_n(p_0, ..., p_n) → (♦p_0 → ♦*(p_0 ∧ ¬P_n(p_0, ..., p_n)))

is d-false at every point of De S_0. Since De S_0 ≠ ∅ (e.g. from ∅ ≠ S ⊆ Cl S_1 we get S_1 ≠ ∅, but S_1 ⊆ De S_0), we conclude that X ⊭_d C*_n.

We saw earlier that every hereditarily irresolvable space is T_D and hence T_0. The converse is true for finite spaces. The essential reason is that a finite T_0 space is scattered, meaning that any non-empty subspace has an isolated point, which ensures that it is irresolvable. The properties of being hereditarily irresolvable, scattered, or T_0 are equivalent for finite spaces (Bezhanishvili et al. 2003, Corollary 4.8). Also, a finite space is T_0 iff it is T_D (Aull and Thron 1962, Corollary 5.1). So if X is finite,


in the case n = 1 we can add 'X is scattered', 'X is T_D' and 'X is T_0' to the list of equivalent conditions in Theorem 8.

The logic K4C_1, in the form K4Grz_□, was shown by Gabelaia (2004) to be complete for d-validity in hereditarily irresolvable spaces (see also Bezhanishvili et al. 2010). This was done by showing that for any finite K4Grz_□-frame ℱ there exists an hereditarily irresolvable space X having a mapping from X onto ℱ that ensures that X ⊨_d ϕ implies ℱ ⊨ ϕ. The construction makes X infinite, so does not provide the finite model property for K4C_1 under d-semantics. In fact none of the logics K4C_n with n ≥ 1 has this finite model property. More generally, if a normal extension of K4 does have the finite model property under d-semantics, then it must be an extension of the Gödel-Löb logic GL (which K4C_n is not when n ≥ 1). This is because a finite space that d-validates K4 is a finite T_D space, so is scattered, as explained in the previous paragraph. But Esakia (1981) showed that the Löb axiom is d-valid in precisely the scattered spaces. Thus any finite space d-validating K4 must d-validate GL.

For n ≥ 2, the logic K4C_n is characterised by d-validity in all hereditarily n + 1-irresolvable T_D spaces. This will be shown in another article.
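The finite case of Theorem 6 lends itself to a brute-force check: for a finite quasi-order, compute the Alexandroff closure R⁻¹Y, search every non-empty subspace for n + 1 pairwise disjoint dense subsets, and compare the outcome with the size of the largest cluster. The Python sketch below is an illustration under stated assumptions: all function names are hypothetical, the circumference of a quasi-order is taken to be its largest cluster size (every cluster being non-degenerate), and the exhaustive search is practical only for very small frames.

```python
from itertools import combinations, product


def subsets(points):
    """All subsets of a finite set, as frozensets."""
    pts = sorted(points)
    return [frozenset(c) for r in range(len(pts) + 1) for c in combinations(pts, r)]


def closure(points, rel, subset):
    """Closure in the Alexandroff space of a quasi-order: all x with x R y for some y in subset."""
    return frozenset(x for x in points if any((x, y) in rel for y in subset))


def has_disjoint_family(candidates, k, used=frozenset()):
    """True if k pairwise disjoint sets can be chosen from candidates."""
    if k == 0:
        return True
    return any(has_disjoint_family(candidates, k - 1, used | c)
               for c in candidates if not (c & used))


def is_resolvable(points, rel, space, k):
    """Does the non-empty subspace `space` have k pairwise disjoint dense subsets?"""
    dense = [y for y in subsets(space) if space <= closure(points, rel, y)]
    return has_disjoint_family(dense, k)


def hereditarily_irresolvable(points, rel, k):
    """No non-empty subspace is k-resolvable."""
    return not any(is_resolvable(points, rel, s, k) for s in subsets(points) if s)


def circumference(points, rel):
    """Largest cluster size of a quasi-order."""
    return max(len({y for y in points if (x, y) in rel and (y, x) in rel})
               for x in points)


def quasi_orders(size):
    """Yield every reflexive transitive relation on range(size)."""
    points = list(range(size))
    off_diagonal = [(x, y) for x in points for y in points if x != y]
    for bits in product([False, True], repeat=len(off_diagonal)):
        rel = {(x, x) for x in points} | {p for p, b in zip(off_diagonal, bits) if b}
        if all((x, z) in rel for (x, y) in rel for (w, z) in rel if y == w):
            yield points, rel
```

Density of Y in a subspace S is tested as S ⊆ Cl Y, following the remark preceding Theorem 5, so subspace closures never need to be computed separately.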

10.9 Generating Varieties of Algebras

Modal formulas have algebraic models, and the algebraic models of a logic form a variety, i.e. an equationally definable class. Our Theorem 4 can be converted into a demonstration that the variety V_n of algebraic models of K4C_n is generated by its finite members, and indeed generated by certain finite algebras constructed out of finite transitive frames of circumference at most n. This implies that every member of V_n is a homomorphic image of a subalgebra of a direct product of such finite algebras. But something stronger can be shown: every member of V_n is isomorphic to a subalgebra of an ultraproduct of such finite algebras. A proof of this algebraic fact will now be given that makes explicit use of the filtration construction underlying Theorem 4 along with further logical analysis involving the universal sentences that are satisfied by members of V_n.

We briefly review the modal algebraic semantics (a convenient reference for more information is Blackburn et al. 2001, Chap. 5). A normal modal algebra has the form A = (B, f^A) with B a Boolean algebra and f^A a unary operation on B that preserves all finite meets (including the empty meet as the greatest element 1). A modal formula can be viewed as a term of the language of A, treating its variables as ranging over the individuals of B, with ¬ and ∧ denoting the complement and meet operation of B, ⊤ denoting 1 and □ denoting f^A. If ϕ has its variables among p_0, ..., p_{k−1}, then it induces a k-ary term function ϕ^A on A. We say that ϕ is valid in A when A ⊨ ϕ ≈ ⊤, meaning that the equation ϕ ≈ ⊤ is satisfied in A in the usual sense from equational logic that A ⊨ (ϕ ≈ ⊤)[a⃗], i.e. ϕ^A(a⃗) = 1, for all k-tuples a⃗ of elements of A. The satisfaction of any equation ϕ ≈ ψ in A is expressible as the validity of a modal formula, since A ⊨ ϕ ≈ ψ iff the formula ϕ ↔ ψ is valid in A, i.e. iff A ⊨ (ϕ ↔ ψ) ≈ ⊤. Identifying ϕ with the equation ϕ ≈ ⊤, we can now legitimately write A ⊨ ϕ[a⃗] to mean that ϕ^A(a⃗) = 1.

An algebra A will be called transitive if it validates the scheme 4, i.e. the formula □ϕ → □□ϕ is valid in A for all ϕ. Now the set L_A of all modal formulas that are valid in A is a normal logic that is closed under uniform substitution of formulas for variables, so for A to validate scheme 4 it is enough that it validates □p → □□p for a variable p, which amounts to requiring that f^A a ≤ f^A f^A a for every element a of A.

Each frame ℱ = (W, R) has an associated algebra ℱ⁺ = (𝒫W, [R]), where 𝒫W is the Boolean set algebra of all subsets of W and the unary operation [R] on 𝒫W is defined by [R]X = {x : ∀y(xRy implies y ∈ X)}. ℱ⁺ is called the complex algebra of ℱ. A complex algebra more generally is defined as one that is a subalgebra of some algebra of the form ℱ⁺. Given such a subalgebra A of ℱ⁺, consider a model ℳ = (ℱ, V) on ℱ and a formula ϕ(p_0, ..., p_{k−1}) such that V(p_i) ∈ A for all i < k. Then it can be shown that

ϕ^A(p_0^ℳ, ..., p_{k−1}^ℳ) = ϕ^ℳ,

where ψ^ℳ is the truth set {x : ℳ, x ⊨ ψ}. From this it follows that

\[ A \models \varphi[p_0^{\mathcal{M}}, \dots, p_{k-1}^{\mathcal{M}}] \quad\text{iff}\quad \mathcal{M} \models \varphi. \tag{10.28} \]

Using this in the case that A = ℱ⁺ leads to a proof that a formula ϕ is valid in ℱ, i.e. ℳ ⊨ ϕ for all models ℳ on ℱ, iff it is valid in the normal modal algebra ℱ⁺ in the sense defined here that ℱ⁺ ⊨ ϕ ≈ ⊤ (see Blackburn et al. 2001, Prop. 5.24 for details of this analysis).

The famous representation theorem of Jónsson and Tarski (1951) showed that any normal modal algebra A is isomorphic to a complex algebra, i.e. there is a monomorphism A ↣ ℱ⁺ for some frame ℱ. Moreover they showed that certain equational properties are preserved in passing from A to ℱ⁺. In particular they proved that if A is a transitive algebra then so is ℱ⁺, and furthermore that this implies that the binary relation R of ℱ is transitive.

We use the standard symbols H, S, P, P_U for the class operations of closure under homomorphic images, isomorphic copies of subalgebras, direct products and ultraproducts respectively. A class of algebras is a variety iff it is closed under H, S and P. The smallest variety containing a given class of algebras K is HSP K, which is called the variety generated by K. It is the class of all models of the equational theory of K, which is the set of all equations satisfied by K.

Let V_n be the variety of all algebras that validate all theorems of the logic K4C_n. A sufficient condition for membership of an algebra A in V_n is that A is a normal modal algebra that validates the schemes 4 and C_n. This is because if the logic L_A comprising all modal formulas that are valid in A includes 4 and C_n, then it includes K4C_n since the latter is the smallest logic to include these schemes. As explained above, for L_A to include a scheme it suffices for it to contain a variable instance of it. Thus V_n is defined by finitely many equations.

Let 𝒞_n be the class of all finite transitive frames of circumference at most n, and 𝒞_n⁺ = {ℱ⁺ : ℱ ∈ 𝒞_n} the class of all complex algebras of members of 𝒞_n. Each ℱ⁺ ∈ 𝒞_n⁺ validates 4 and C_n, since ℱ does, so 𝒞_n⁺ ⊆ V_n. We then have

\[ SP_U\,\mathcal{C}_n^{+} \subseteq HSP\,\mathcal{C}_n^{+} \subseteq V_n. \tag{10.29} \]

The first of these inclusions holds because the variety HSP 𝒞_n⁺ includes 𝒞_n⁺ and is closed under subalgebras and ultraproducts. The second holds because V_n is closed under H, S, and P. We will show that both inclusions are equalities, so the three classes displayed in (10.29) are identical. To prove this we need some background theory about the universal sentences that are satisfied in V_n.

A universal sentence in the language of modal algebras has the form ∀p⃗ σ, where the formula σ is quantifier-free, so is a Boolean combination of equations, and ∀p⃗ is a sequence of universal quantifiers including those for all the variables of σ. The following result is a standard fact in the model theory of universal sentences.

Lemma 4 If every universal sentence satisfied by a class K of algebras is satisfied by algebra A, then A is embeddable into an ultraproduct of members of K, i.e. A ∈ SP_U K.

Proof See Burris and Sankappanavar (1981, Sect. V.2), especially the proof of Theorem 2.20.

This result implies that SP_U K is the class of all models of the universal theory of K, which is the set of all universal sentences satisfied by K.

Theorem 9 For any n ≥ 0, V_n = HSP 𝒞_n⁺ = SP_U 𝒞_n⁺.

Proof By (10.29) it suffices to show that any member of V_n belongs to SP_U 𝒞_n⁺. So take any A ∈ V_n. To show that A ∈ SP_U 𝒞_n⁺, it is enough by Lemma 4 to show that every universal sentence satisfied by 𝒞_n⁺ is satisfied by A. We prove the contrapositive of this. Let ∀p⃗ σ be a universal sentence that is not satisfied by A. We will show it is not satisfied by (some member of) 𝒞_n⁺. We can suppose that σ is in conjunctive normal form, so that the sentence has the shape ∀ p( i

0. Null vectors create the cone structure; timelike vectors fall inside the cone while spacelike vectors fall outside. A time orientable model is one that has a continuous timelike vector field on M. In what follows, we assume that models are time orientable and that an orientation has been chosen.

For some connected interval I ⊆ ℝ, a smooth curve γ : I → M is timelike if its tangent vector ξ^a at each point in γ[I] is timelike. Similarly, a curve is null if its tangent vector at each point is null. A curve is causal if its tangent vector at each point is either null or timelike. A causal curve is future-directed if its tangent vector at each point falls in or on the future lobe of the light cone. A causal curve γ : I → M is closed if the tangent vector is nowhere vanishing and there are distinct s, s′ ∈ I such that γ(s) = γ(s′). (M, g_ab) satisfies chronology if it does not contain a closed timelike curve; it satisfies causality if it does not contain a closed causal curve.

We write p ≪ q (respectively, p < q) if there exists a future-directed timelike (respectively, causal) curve from p to q. For any point p ∈ M, we define the timelike future of p as the set I⁺(p) = {q : p ≪ q}. Similarly, the causal future of p is the set J⁺(p) = {q : p < q}.
The timelike and causal pasts of p, denoted I − ( p) and J − ( p), are defined analogously. (M, gab ) satisfies distinguishability if there do not exist distinct points p, q ∈ M such that I − ( p) = I − (q) or I + ( p) = I + (q). It satisfies global hyperbolicity if it is causal and for any points p, q ∈ M, the set J + ( p) ∩ J − (q) is compact.3 A curve γ : I → M is maximal if there is no curve γ  : I  → M such that I is a proper subset of I  and γ (s) = γ  (s) for all s ∈ I . The curve γ : I → M is a 2 The

reader is encouraged to consult [20, 26, 37] for details. Less technical surveys of the global structure of spacetime are given in [17, 29]. 3 If one replaces ‘causal’ by ‘strongly causal’ one obtains the standard formulation of global hyperbolicity. One can show that the two formulations are equivalent [3].


JB Manchak

geodesic if $\xi^a \nabla_a \xi^b = 0$ where $\xi^a$ is its tangent vector and $\nabla_a$ is the unique derivative operator compatible with $g_{ab}$. A maximal geodesic γ : I → M is incomplete if I ≠ R. A model is geodesically incomplete if it harbors an incomplete geodesic and geodesically complete otherwise. A future-directed causal geodesic γ : I → M is past-incomplete if it is maximal and there is an r ∈ R such that r < s for all s ∈ I.

One can define the energy-momentum tensor $T_{ab}$ for the model $(M, g_{ab})$ via Einstein’s equation: $R_{ab} - \frac{1}{2} R g_{ab} = 8\pi T_{ab}$ where $R_{ab}$ is the Ricci tensor and $R$ the scalar curvature associated with $g_{ab}$. We say that $(M, g_{ab})$ is a vacuum solution if $T_{ab} = 0$. The null energy condition is satisfied if, for any null vector $\chi^a$, we have $T_{ab}\chi^a\chi^b \geq 0$. The weak energy condition is satisfied if, for each timelike vector $\xi^a$, we have $T_{ab}\xi^a\xi^b \geq 0$. The strong energy condition is satisfied if, for any unit timelike vector $\xi^a$, we have $(T_{ab} - \frac{1}{2} T g_{ab})\xi^a\xi^b \geq 0$. Finally, the dominant energy condition is satisfied if, for any future-directed unit timelike $\xi^a$, the vector $T^a{}_b \xi^b$ is causal and future-directed.

Let S be a set. A relation ≤ on S is a partial order if, for all a, b, c ∈ S: (i) a ≤ a, (ii) if a ≤ b and b ≤ c, then a ≤ c, and (iii) if a ≤ b and b ≤ a, then a = b. If ≤ is a partial ordering on a set S, we say a subset T ⊆ S is totally ordered if, for all a, b ∈ T, either a ≤ b or b ≤ a. Let ≤ be a partial ordering on S and let T ⊆ S. An upper bound for T is an element u ∈ S such that for all a ∈ T, a ≤ u. A maximal element of S is an element m ∈ S such that for all c ∈ S, if m ≤ c, then c = m. Zorn’s lemma (equivalent to the axiom of choice) is the following: Let ≤ be a partial order on S. If each totally ordered subset T ⊆ S has an upper bound, there is a maximal element of S.
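The order-theoretic vocabulary above can be made concrete on a small finite example. The sketch below is illustrative only: the set `S`, the divisibility relation, and all function names are assumptions introduced here and do not come from the text. On a finite set maximal elements always exist, so nothing here exercises the axiom of choice; the point is only to show what "totally ordered subset", "upper bound", and "maximal element" pick out.

```python
def is_partial_order(elements, leq):
    """Check reflexivity, transitivity, and antisymmetry on a finite set."""
    reflexive = all(leq(a, a) for a in elements)
    transitive = all(
        leq(a, c)
        for a in elements for b in elements for c in elements
        if leq(a, b) and leq(b, c)
    )
    antisymmetric = all(
        a == b
        for a in elements for b in elements
        if leq(a, b) and leq(b, a)
    )
    return reflexive and transitive and antisymmetric


def is_totally_ordered(subset, leq):
    """Every pair of elements of the subset is comparable."""
    return all(leq(a, b) or leq(b, a) for a in subset for b in subset)


def upper_bounds(elements, leq, subset):
    """All u in `elements` with a <= u for every a in `subset`."""
    return {u for u in elements if all(leq(a, u) for a in subset)}


def maximal_elements(elements, leq):
    """All m such that m <= c implies c == m."""
    return {m for m in elements if all(c == m for c in elements if leq(m, c))}


# Hypothetical example: divisibility on a small set of integers.
S = {1, 2, 3, 4, 6, 9}


def divides(a, b):
    return b % a == 0


chain = {1, 2, 4}
print(is_partial_order(S, divides))
print(is_totally_ordered(chain, divides))
print(upper_bounds(S, divides, chain))
print(maximal_elements(S, divides))
```

Note that a maximal element need not be unique: several incomparable elements can each have nothing above them, which mirrors the fact that an extendible model may have more than one inextendible extension.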

17.3 Possibility

One can think of general relativity as a collection of possible models of spacetime. But what counts as ‘possible’ in this context? As mentioned above, the collection U seems to be ‘too big’ in the sense that some of the geometric possibilities it permits do not appear to be genuine physical possibilities. So one works to pare down the space by examining the “pathological situations with a view toward deciding which situations are to be ruled out by physical considerations” ([15], p. 72). Let us consider two influential suggestions along these lines. As mentioned above, it has been proposed that (†) “any [physically] reasonable space-time should be inextendible” ([6], p. 8). The recommendation is that we remove from the background possibility space U all of the models which are extendible. Indeed, not doing so leads to a situation in which extendible models count as “counter-examples to seemingly plausible [physical] conjectures” ([5], p. 17). An analogous position arises with many other

17 General Relativity as a Collection of Collections of Models


properties of interest. For example, consider one version of the (strong) cosmic censorship conjecture [35] which states that (††) all “physically reasonable spacetimes are globally hyperbolic” ([37], p. 304). Now suppose we accept both (†) and (††); suppose we decide to rule out all models which are either extendible or non-globally hyperbolic (or both) from our background possibility space U. It is not clear how this is to be done. Let me explain. On the one hand, it seems natural to begin with the collection (I) ⊂ U of inextendible models and then pare down this space by removing from it all non-globally hyperbolic models. The resulting space is (I) ∩ (GH) where (GH) ⊂ U is the collection of globally hyperbolic models. On the other hand, one could begin with the collection of globally hyperbolic models (GH) and work to pare down this space by removing all of the extendible models from it. But here is where some tension arises: we cannot simply identify the resulting sub-collection with (I) ∩ (GH) as before and, at the same time, respect the intuitive motivation for the definition of inextendibility. To see the point, consider the ‘bottom half’ of Misner spacetime.4 One can show that the model is globally hyperbolic and extendible, and yet it cannot be extended and remain globally hyperbolic [4]. So we have a model in (GH) which is ‘as large as it can be’ in the sense that it cannot be reasonably extended; and yet it does not make it into the collection (I) ∩ (GH). The tension at issue here can be stated like so: (i) inextendibility is a modal property which has been defined relative to the background possibility space U and yet (ii) the collection U does not properly capture the notions of physical possibility we are after. It seems clear to me that if we are to rule out ‘physically unreasonable’ models of spacetime, then we ought to rule out ‘physically unreasonable’ definitions of inextendibility as well.
Following Geroch [14] let us say that for any collection P ⊆ U, a model is a P-model if it is in the collection P. A P-model is P-extendible if it has an extension in P (such an extension is a P-extension) and P-inextendible otherwise. We are now in a position to see clearly how it is that inextendibility ‘works differently’ in some variant theories of general relativity; the truncated Misner example from above is (GH)-inextendible but U-extendible. Given this sensitivity to theory choice, it seems appropriate to study the property of inextendibility from within each ‘physically reasonable’ variant theory of general relativity. That is, it would seem appropriate to study P-inextendibility for some collection of ‘physically reasonable’ collections P ⊆ U. More on this below. At this stage, we stress that there are a number of other spacetime properties which depend, like inextendibility, on a background ‘physical possibility space’ in the form of some collection of models. Some of these require spacetime to be ‘as large as it can be’ in various senses; the properties of ‘hole-freeness’ and ‘local inextendibility’ are two examples along these lines in addition to inextendibility.5

4 See [20] or the proof of Proposition 1 below for a precise definition.
5 See [16, 20] for introductions to such concepts and [30, 33] for discussions and updated definitions.

Properties concerning ‘stability’ are often modal as well; the condition of ‘stable causality’ is one such


example.6 For various collections P ⊆ U of ‘physically reasonable’ models, one could make a study of the properties of ‘P-hole-freeness’, ‘P-local inextendibility’, ‘P-stable causality’, and so on. The associated theorems concerning such ‘P properties’ amount to what I will call the ‘modal structure’ of GR(P). Because of (ii) above, we see that a focused study of the modal structure of GR(U) will be insufficient to properly understand the role of physical possibility within general relativity. And it does not seem appropriate to restrict attention to just one variant theory of general relativity either, given that we have yet to pin down (and arguably cannot pin down) a privileged collection of ‘physically reasonable’ models [28]. The situation suggests a natural pluralistic position: to better understand the role of physical possibility within general relativity, we ought to study a collection of collections of ‘physically reasonable’ models.7
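The relativized notions of P-extendibility and P-inextendibility introduced in this section can be sketched on a toy extension table. Everything in the block below is an assumption made for illustration: the labels stand in for models, the table stands in for the extension relation, and the two extensions of the truncated Misner model are simply stipulated to lie outside the globally hyperbolic collection, in line with the fact cited above that the model cannot be extended and remain globally hyperbolic.

```python
# Toy extension table (hypothetical labels, not actual spacetimes).
# PROPER_EXTENSIONS[m] lists every model of the toy universe that properly extends m.
PROPER_EXTENSIONS = {
    "misner_bottom_half": {"extension_a", "extension_b"},
    "extension_a": set(),  # treated as inextendible inside this toy universe
    "extension_b": set(),
}

U = set(PROPER_EXTENSIONS)   # background collection of all toy models
GH = {"misner_bottom_half"}  # assumption: neither toy extension is globally hyperbolic


def p_extensions(model, P):
    """Extensions of `model` that lie in the collection P."""
    return PROPER_EXTENSIONS[model] & P


def is_p_inextendible(model, P):
    """A P-model is P-inextendible if it has no extension in P."""
    if model not in P:
        raise ValueError(f"{model!r} is not a P-model")
    return not p_extensions(model, P)


print(is_p_inextendible("misner_bottom_half", GH))  # relative to (GH)
print(is_p_inextendible("misner_bottom_half", U))   # relative to U
```

The same model receives different verdicts depending on the background collection, which is the sense in which inextendibility is a modal property.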

17.4 Inextendibility

One response to the discussion above goes something like this: Perhaps for some suitably chosen collection P ⊆ U of ‘physically reasonable’ models, the modal structures of GR(U) and GR(P) will be ‘close enough’ for all practical purposes. In what follows, I will push back on this line of reasoning. We begin with a question posed by Geroch [14] concerning inextendibility: For which collections P ⊆ U of ‘physically reasonable’ models does P-inextendibility ‘work the same’ as inextendibility? In other words, for which such collections P ⊆ U is the following true? (*) Every P-inextendible P-model is inextendible. We see that if (*) is true for some collection P ⊆ U, then it makes no difference whether one uses P-inextendibility or the standard definition; any P-model will be P-inextendible if and only if it is inextendible. It is of some interest that we have yet to identify a (non-trivial) collection P of ‘physically reasonable’ models which renders (*) true.8 In fact there is a growing collection of collections P ⊆ U for which (*) is known to be false.

6 Some care is required to see this fact. The original definition of stable causality given in [19] is clearly modal. But in practice, a non-modal ‘stand in’ definition is often used instead ([37], p. 198). The two definitions, along with a third (non-modal) definition concerning the existence of a ‘global time function’, are all equivalent in GR(U). But in some variant theories of general relativity, the modal and non-modal properties can come apart.
7 One would also like to identify theorems which hold across all variant theories of general relativity. One can think of the logical study of general relativity carried out in [1] as one research program along these lines. Explicit reference to an arbitrary background collection of models cannot be made in a FOL-sentence, but working with background classes of models is very much in the spirit of the FOL approach.
8 It is trivial that if one takes P to be the collection of inextendible models or any sub-collection of that collection (e.g. the geodesically complete models), then (*) is true for P.

Here we present a synopsis of the situation so far and


draw attention to open questions of significant interest. All proofs for new results can be found in the appendix. Let (Dist), (Caus), (Chr ) ⊂ U be the collections of distinguishing, causal, and chronological models respectively and recall that (G H ) is the collection of globally hyperbolic models. Basic results concerning the ‘causal hierarchy’ of spacetime require that (G H ) ⊂ (Dist) ⊂ (Caus) ⊂ (Chr ) [20]. We know from the truncated Misner example from above that (*) is false when P = (G H ). But even more can be said. Proposition 1 (*) is false for all P such that (G H ) ⊆ P ⊆ (Dist). Proposition 2 (*) is false if P = (Caus). As a corollary to Proposition 1, we see that (*) is false for collections P ⊆ U defined relative to a number of other frequently used causal conditions including ‘strong causality’ and ‘stable causality’ [37]. Another remark may be helpful here; there is no guarantee that, just because (*) is false for (Dist) and (Caus) and (Dist) ⊂ (Caus), that (*) must also be false for all P such that (Dist) ⊂ P ⊂ (Caus). Each such collection P must be either checked independently or some new argument must be introduced for why this is not needed. As far as I know, the question of whether (*) is false if P = (Chr ) is still open [14]. We now turn to the collection (G I ) ⊂ U of geodesically incomplete models. Presumably the singularity theorems of Hawking and Penrose [21] show a sense in which (at least some) ‘physically reasonable’ models can be geodesically incomplete.9 But it is not difficult to show the following. Proposition 3 (*) is false if P = (G I ). Finally, let us consider some of the ‘local’ properties of spacetime. Let (V ) ⊂ U be the collection of vacuum solutions and let (D EC), (S EC), (W EC), (N EC) ⊂ U be the collections of models satisfying the dominant, strong, weak, and null energy conditions respectively. Standard results show that (V ) ⊂ (D EC) ⊂ (W EC) ⊂ (N EC) and (V ) ⊂ (S EC) ⊂ (N EC) [7]. 
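The inclusions among the energy conditions can be spot-checked in a special case. The sketch below assumes a perfect fluid with energy density `rho` and isotropic pressure `p`, for which the four conditions reduce to the standard pointwise inequalities used in the code; this restriction, the integer sample grid, and the function names are assumptions introduced here for illustration. A finite grid check is of course not a proof of the general inclusions.

```python
from itertools import product


# Pointwise forms of the energy conditions for a perfect fluid
# (assumption: T_ab = (rho + p) u_a u_b + p g_ab with unit timelike u^a).
def nec(rho, p):
    return rho + p >= 0


def wec(rho, p):
    return rho >= 0 and rho + p >= 0


def sec(rho, p):
    return rho + p >= 0 and rho + 3 * p >= 0


def dec(rho, p):
    return rho >= abs(p)


def vacuum(rho, p):
    return rho == 0 and p == 0


def implies(a, b, samples):
    """True if every sample satisfying condition a also satisfies condition b."""
    return all(b(rho, p) for rho, p in samples if a(rho, p))


samples = list(product(range(-5, 6), repeat=2))

# (V) within (DEC) within (WEC) within (NEC), and (V) within (SEC) within (NEC).
chain_one = [(vacuum, dec), (dec, wec), (wec, nec)]
chain_two = [(vacuum, sec), (sec, nec)]
print(all(implies(a, b, samples) for a, b in chain_one + chain_two))

# The strong and weak conditions are independent of one another on this grid.
print(implies(sec, wec, samples), implies(wec, sec, samples))
```

For instance, `rho = -1, p = 2` satisfies the strong but not the weak condition, while `rho = 1, p = -1` satisfies the weak but not the strong condition, which is why the two chains of inclusions are stated separately.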
The question of whether any of these collections satisfy (*) was asked by Geroch [14]. Concerning (V), things are still open as far as I know. If P = (WEC), we know that (*) is false [32]. It turns out that the result can be generalized to include all of the standard energy conditions. Proposition 4 (*) is false for all P such that (DEC) ⊆ P ⊆ (NEC). Proposition 5 (*) is false for all P such that (SEC) ⊆ P ⊆ (NEC).

9 Of course, not all members of (GI) count as ‘physically reasonable’ models. But in this regard (GI) is no different than the other sub-collections considered above such as (Chr).

The emerging picture here seems to be that the property of inextendibility ‘works differently’ in a now substantial collection of collections P ⊆ U of ‘physically


reasonable’ models. So the modal structure of spacetime does not seem to be well understood outside of the standard theory GR(U).10 The state of affairs suggests that we explore foundational claims other than (*) in the manner outlined above. Along these lines, consider that in the standard theory every extendible model has an inextendible extension [14].11 Upon this theorem rests a hefty philosophical weight. For one thing, it underpins the metaphysical justification for the (†) position in Sect. 17.3 [8, 32]. For another, an influential definition of a ‘singular’ spacetime seems to depend on it [14]. We will return to the topic of singularities soon. For now, let us ask: for which collections P ⊆ U of ‘physically reasonable’ models is the following statement true? (**) Every P-extendible P-model has a P-inextendible P-extension. It turns out (**) is true for all ‘local’ properties of a certain kind. Following [14, 22] let us say that a property P ⊆ U is local if for each model $(M, g_{ab})$ ∈ U and any open cover $\{V_\alpha\}$ of M, we have: $(M, g_{ab})$ ∈ P ⇔ $(V_\alpha, g_{ab})$ ∈ P for all $V_\alpha$.12 We have the following. Proposition 6 (**) is true for all local P ⊆ U. It should be clear that, as a corollary to this proposition, (**) is true if P is any one of the following collections: (V), (DEC), (SEC), (WEC), or (NEC). A few non-local properties also render (**) true.13 In particular, we have the following (see [32] for a proof of the first of these results). Proposition 7 (**) is true if P = (Chr). Proposition 8 (**) is true if P = (GI). One might be tempted to conclude from all of this that (**) will be true for any collection of ‘physically reasonable’ models. But consider the collection (B) ⊂ (GI) of models with the property that every maximal timelike geodesic is past-incomplete. We have the following [31]. Proposition 9 (**) is false if P = (B).

10 Things are beginning to change on this front, however. See [11, 36] for work concerning P-inextendibility in cases where U ⊆ P rather than P ⊆ U. See also some recent work on ‘time machines’ which involves a study of P-inextendibility [9, 23].
11 It is of some interest that this result seems to be one of the few in general relativity to depend crucially on the axiom of choice (in the form of Zorn’s lemma) for its proof. See [5] for a nice discussion concerning the relationship between set theory and general relativity.
12 The definition here differs from the one in [27]. One can show that any local property as defined here will be a local property by the lights of [27]. But the two definitions are not equivalent (unpublished work with Andréka, Németi, Madarász, and Székely).
13 See [24] for a discussion concerning some of the obstacles preventing even more results of this kind from being established.


Presumably, the ‘big bang’ in our own universe permits us to view this property as a ‘physically reasonable’ one.14 In any case, the point here is simply “to demonstrate by some example that a certain assertion is false, or that a certain line of argument cannot work” ([17], p. 221). In the present case, I only wish to stress that it is not yet clear that (**) will be true for all collections P ⊆ U of ‘physically reasonable’ models. One would like to see a further investigation of (**) with respect to other sub-collections of (GI). In particular, let (S) ⊂ (GI) be a collection of models satisfying the assumptions of any one of the standard singularity theorems [21]. One wonders whether (**) is false if P = (S). More on singularities and their connection to both (*) and (**) below.

17.5 Singularities

What is a ‘singularity’ in general relativity? This is a subtle question and a great deal of ink has been spilled in the attempt to find a suitable definition.15 But the community seems to have settled on the (non-modal) property of geodesic incompleteness (or some closely related variant) as an imperfect but suitable working ‘stand in’ for this definition [37]. Here I would like to draw attention to one sense in which the plausibility of such a position seems to rest on the standard definition of inextendibility. Here is the argument from Geroch ([14], pp. 261–262).

There are two distinct notions to be combined into the definition of a singular spacetime. Let us consider as an example the spacetime M consisting of a small open neighborhood of one of the homogeneous spacelike sections in a Friedmann model. Our M is not ‘singular’ in the usual intuitive sense (e.g. no scalar invariants become infinite). We would, nonetheless, like to rule out M as a model of the universe on the grounds that M represents only ‘part of the universe’. It is because M is extendible that the singularities, which are certainly present in the Friedmann models, do not show up in M. The first step, then, is to recognize the extendible spacetimes. Suppose now that we have a spacetime M′ which is inextendible (e.g. the full Friedmann model). We then require a second criterion to recognize that M′ is ‘singular’. The important point is that we do not expect at this stage to be able to recognize singularities in an extendible spacetime because singularities always appear ‘at the edge’ of the spacetime manifold, and it is precisely this ‘edge’ which may be missing from an extendible space. That is, we need only formulate a definition of ‘singular’ which is applicable to inextendible spacetimes. Finally, we may try to generalize our definition to extendible spaces.
Thus, we envisage three steps in the definition: (1) define ‘extendible’, (2) define ‘singular’ for inextendible spacetimes, and (3) define ‘singular’ for all spacetimes. (Only the second of these steps is difficult.)

Geroch goes on to construct a definition of singular spacetime. For step (1), the standard definition of inextendibility is used.

14 On the other hand, one might argue that quantum considerations drastically change our understanding of the ‘big bang’ singularity. See [2] for a discussion.
15 See [10, 12, 14] for exemplary discussions on the topic. See [38] for recent work on the analogous question concerning geometrized Newtonian gravitation.

In the ‘difficult’ step (2), we seem to be led to the following imperfect but still suitable definition: an inextendible model


is singular if it is geodesically incomplete. One reason the definition seems to work is that it allows us to be sure that the geodesic incompleteness present in a singular model is not due to its extendibility; one can show that any extendible spacetime will have incomplete causal geodesics ([6], p. 8). In the final step (3), the definition is generalized: an arbitrary model is singular if (i) it is inextendible and singular or (ii) it is extendible and all of its inextendible extensions are singular. Clearly this definition of a singular spacetime depends on the standard definition of inextendibility. Given that we have reason to doubt the physical significance of the standard definition of inextendibility, where does this leave us? It would seem that, for each ‘physically reasonable’ collection P ⊆ U, one could straightforwardly define a type of ‘P-singular’ property. Here, we draw attention to a pair of obstacles in doing so; it is of some interest that the two obstacles correspond to (*) and (**) from Sect. 17.4. Fix some collection P ⊆ U of ‘physically reasonable’ models. Carrying out the analogous step (1) is not difficult; one can use the definition of P-inextendibility. One might be tempted to carry out an analogous step (2) in the following way: a P-inextendible P-model is P-singular if it is geodesically incomplete. If (*) is true for P, there is no problem. But suppose that (*) is false for P. In this case, we cannot rule out the possibility that a P-singular model is extendible. There seem to be two ways of looking at the situation.16 Consider first an extendible P-inextendible model which is such that no scalar invariants become infinite (as is the case for a number of examples given in the appendix). Such a model seems to be “not ‘singular’ in the usual intuitive sense” ([14], p. 261). On the other hand, we know that any extendible spacetime will have incomplete causal geodesics.
And it would seem there is “a serious physical pathology” in any such spacetime since “it is possible for at least one freely falling particle or photon to end its existence within a finite ‘time’...or to have begun its existence a finite time ago” ([37], p. 216). If such a pathology is present, the moniker ‘singular’ does not seem inappropriate—especially in a model which is ‘as large as it can be’ in the sense of P-inextendibility. Suppose the second option is favored and one adopts the definition of P-singular . Complications also arise in carrying out the analogous step (3). If (**) is true for P, we can define an arbitrary P-model as P-singular if (i) it is P-inextendible and P-singular or (ii) it is P-extendible and all of its P-inextendible P-extensions are P-singular (at least one such extension must exist since (**) is true for P). But if (**) is false for P, then there do not exist P-inextendible P-extensions to some P-models and it is not clear how to work around such situations. The upshot seems to be that the question ‘what is a singularity in general relativity?’ is even more subtle than we might have thought.
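The three-step recipe for ‘P-singular’ discussed in this section can be written out as a short procedure. The sketch below is a toy under loud assumptions: the labels, the extension table, and the incompleteness flags are all hypothetical stand-ins chosen to echo the Friedmann example in the quotation from Geroch, and each extendible toy model is given exactly one extension. The function raises an error in the case where step (3) has nothing to quantify over, which is the situation that arises when (**) fails for P.

```python
# Toy data (hypothetical labels, not actual spacetimes).
PROPER_EXTENSIONS = {
    "friedmann_patch": {"friedmann_full"},
    "friedmann_full": set(),
    "minkowski_patch": {"minkowski"},
    "minkowski": set(),
}
GEODESICALLY_INCOMPLETE = {
    "friedmann_patch": True,   # any extendible model has incomplete geodesics
    "friedmann_full": True,
    "minkowski_patch": True,   # incomplete only because it is extendible
    "minkowski": False,
}


def p_extensions(model, P):
    return PROPER_EXTENSIONS[model] & P


def is_p_inextendible(model, P):
    return not p_extensions(model, P)


def is_p_singular(model, P):
    """Steps (2) and (3), relativized to the collection P."""
    if model not in P:
        raise ValueError(f"{model!r} is not a P-model")
    if is_p_inextendible(model, P):
        # Step (2): a P-inextendible P-model is P-singular if geodesically incomplete.
        return GEODESICALLY_INCOMPLETE[model]
    # Step (3): look at all P-inextendible P-extensions.
    maximal = [e for e in p_extensions(model, P) if is_p_inextendible(e, P)]
    if not maximal:
        raise ValueError("no P-inextendible P-extension: (**) fails for this model")
    return all(GEODESICALLY_INCOMPLETE[e] for e in maximal)


everything = set(PROPER_EXTENSIONS)
print(is_p_singular("friedmann_patch", everything))
print(is_p_singular("minkowski_patch", everything))

# With a collection for which (*) fails, an extendible model can come out P-singular.
print(is_p_singular("minkowski_patch", {"minkowski_patch"}))
```

The last call shows the first obstacle in miniature: relative to a collection that excludes its only extension, the truncated model counts as P-inextendible and geodesically incomplete, hence P-singular, even though it is extendible in the full toy universe.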

16 Thanks to Jim Weatherall for the point.


17.6 Conclusion

Stepping back, we see that the approach to general relativity outlined above can also be implemented with respect to other physical theories. As long as one has a collection of models representing physical possibilities, and as long as there are some models in the collection which are thought to be ‘physically unreasonable’ in some sense, then an exploration of some collection of collections of ‘physically reasonable’ models would seem to be fitting.17

Acknowledgements Thanks to Hajnal Andréka, Judit Madarász, David Malament, István Németi, Gergely Székely, and an anonymous referee for helping to improve a previous draft. My gratitude to Thomas Barrett, Erik Curiel, Sam Fletcher, Bob Geroch, Martin Lesourd, Jan Sbierski, Chris Smeenk, Jim Weatherall, and Chris Wüthrich for helpful discussions on this topic.

Appendix

Proposition 1 (*) is false for all P such that (GH) ⊆ P ⊆ (Dist).

Proof Let P be such that (GH) ⊆ P ⊆ (Dist). Consider the Misner model $(N, g_{ab})$. Here N = R × S and $g_{ab} = 2\nabla_{(a} t \nabla_{b)} \varphi - t \nabla_a \varphi \nabla_b \varphi$ where the points $(t, \varphi)$ are identified with the points $(t, \varphi + 2\pi n)$ for all integers n. Now, let $M = \{(t, \varphi) \in N : t < 0\}$ and consider the extendible model $(M, g_{ab})$. We know this model is globally hyperbolic [4] and therefore in P. We need only show that all of its extensions fail to be in P. Let $(M', g'_{ab})$ be any extension of $(M, g_{ab})$ and let p be a point in ∂M ∩ M′. In any neighborhood of p, there will be a point q ∈ ∂M ∩ M′ such that q ≠ p. One can verify that $I^-(p) = M = I^-(q)$. We see that $(M', g'_{ab})$ is not distinguishing and therefore not in P. □

Proposition 2 (*) is false if P = (Caus).

Proof Let $(M, g_{ab})$ be such that M = R × S and $g_{ab} = \nabla_{(a} t \nabla_{b)} \varphi - \sinh t^2 \nabla_a \varphi \nabla_b \varphi$ where the points $(t, \varphi)$ are identified with the points $(t, \varphi + 2\pi n)$ for all integers n ([26], p. 135). One can verify that $(M, g_{ab})$ is inextendible and that the closed curve at t = 0 is causal since $(\partial/\partial\varphi)^a$ is null there. Moreover, the model was chosen so that the closed causal curve at t = 0 is the only closed causal curve present. Now consider the model $(M', g_{ab})$ where M′ = M − {(0, 0)}. It is causal: the ‘missing point’ prohibits the causal curve at t = 0 from being closed. But the model has only one extension: the acausal $(M, g_{ab})$.18 □

17 Indeed [16] explicitly considers such an approach with respect to a version of Newtonian spacetime. See also the related discussion in [25, 34].
18 This last step relies on the following statement which is also used in the proofs of Propositions 3 and 8 below (Jan Sbierski, private communication): If $(M, g_{ab})$ is an inextendible model and p ∈ M, then every extension of $(M - \{p\}, g_{ab})$ is isometric to $(M, g_{ab})$.


Proposition 3 (*) is false if P = (GI).

Proof Let $(M, g_{ab})$ be any geodesically complete model. Now consider the model $(M - \{p\}, g_{ab})$ for any point p ∈ M. It is geodesically incomplete since it is extendible. But it has only one extension: the geodesically complete $(M, g_{ab})$. □

Lemma 1 Let $(\mathbb{R}^4, \eta_{ab})$ be Minkowski spacetime where $\eta_{ab} = -\nabla_a t \nabla_b t + \nabla_a x \nabla_b x + \nabla_a y \nabla_b y + \nabla_a z \nabla_b z$. Let f : R → R be any smooth function and let $\Omega : \mathbb{R}^4 \to \mathbb{R}$ be defined by $\Omega(t, x, y, z) = \exp[f(t)]$. The energy-momentum tensor associated with $g_{ab} = \Omega^2 \eta_{ab}$ is given by $T_{ab} = (\rho + p)\nabla_a t \nabla_b t + p\,\eta_{ab}$ where $\rho = 3f'(t)^2/8\pi$ and $p = -(f'(t)^2 + 2f''(t))/8\pi$.

Proof Let $(\mathbb{R}^4, \eta_{ab})$ be Minkowski spacetime where $\eta_{ab} = -\nabla_a t \nabla_b t + \nabla_a x \nabla_b x + \nabla_a y \nabla_b y + \nabla_a z \nabla_b z$. Let f : R → R be any smooth function and let $\Omega : \mathbb{R}^4 \to \mathbb{R}$ be defined by $\Omega(t, x, y, z) = \exp(f(t))$. Associated with $g_{ab} = \Omega^2 \eta_{ab}$ we have the following ([37], p. 446).

$$R_{ab} = -2\nabla_a \nabla_b f(t) - \eta_{ab}\eta^{cd}\nabla_c \nabla_d f(t) + 2(\nabla_a f(t))(\nabla_b f(t)) - 2\eta_{ab}\eta^{cd}(\nabla_c f(t))(\nabla_d f(t))$$

$$R = \Omega^{-2}[-6\eta^{ab}\nabla_a \nabla_b f(t) - 6\eta^{ab}(\nabla_a f(t))(\nabla_b f(t))]$$

Because $\nabla_a f(t) = f'(t)\nabla_a t$ and $\nabla_a \nabla_b f(t) = f''(t)(\nabla_a t)(\nabla_b t)$, we can use the fact that $\eta^{ab}(\nabla_a t)(\nabla_b t) = -1$ to simplify.

$$R_{ab} = (2f'(t)^2 - 2f''(t))(\nabla_a t)(\nabla_b t) + (2f'(t)^2 + f''(t))\eta_{ab}$$

$$R = \Omega^{-2}[6f''(t) + 6f'(t)^2]$$

Using Einstein’s equation $R_{ab} - \frac{1}{2} R g_{ab} = 8\pi T_{ab}$ we have the following.

$$T_{ab} = \frac{1}{8\pi}[(2f'(t)^2 - 2f''(t))(\nabla_a t)(\nabla_b t) - (f'(t)^2 + 2f''(t))\eta_{ab}]$$

Letting $\rho = 3f'(t)^2/8\pi$ and $p = -(f'(t)^2 + 2f''(t))/8\pi$, we see that $T_{ab} = (\rho + p)\nabla_a t \nabla_b t + p\,\eta_{ab}$ as claimed. □

Proposition 4 (*) is false for all P such that (DEC) ⊆ P ⊆ (NEC).

Proof Let P be such that (DEC) ⊆ P ⊆ (NEC). We are done if we can construct an extendible model in the collection (DEC) which has no extension in the collection (NEC). Consider Minkowski spacetime $(\mathbb{R}^4, \eta_{ab})$ where $\eta_{ab} = -\nabla_a t \nabla_b t + \nabla_a x \nabla_b x + \nabla_a y \nabla_b y + \nabla_a z \nabla_b z$. Let $\Omega : \mathbb{R}^4 \to \mathbb{R}$ be defined by $\Omega(t, x, y, z) = \exp[f(t)]$ where $f(t) = t^2/2 + t$. By Lemma 1, the energy-momentum tensor associated with $g_{ab} = \Omega^2 \eta_{ab}$ is given by $T_{ab} = (\rho + p)\nabla_a t \nabla_b t + p\,\eta_{ab}$


where $\rho = 3f'(t)^2/8\pi = 3(t + 1)^2/8\pi$ and $p = -(f'(t)^2 + 2f''(t))/8\pi = -((t + 1)^2 + 2)/8\pi$. So we have the following.

$$\rho - |p| = \rho + p = (2(t + 1)^2 - 2)/8\pi$$

We see that if t > 0, then ρ − |p| > 0. Consider the extendible model $(M, g_{ab})$ where $M = \{(t, x, y, z) \in \mathbb{R}^4 : t > 0\}$. Since ρ − |p| > 0 on M, it follows that the model is in the collection (DEC) ([37], p. 220). We need only show that any extension to $(M, g_{ab})$ is not in the collection (NEC).

Let $(M', g'_{ab})$ be any extension to $(M, g_{ab})$ and let q ∈ M′ be a point in the boundary of M. Let (O, ϕ) be a chart with q ∈ O such that we can extend the coordinates (t, x, y, z) on M to M ∪ O. Let $q_1, q_2, q_3 \in \mathbb{R}$ be such that $q = (0, q_1, q_2, q_3)$. Consider any smooth functions $\alpha_t, \alpha_x, \alpha_y, \alpha_z : O \to \mathbb{R}$ such that $\chi^a = \alpha_t(\partial/\partial t)^a + \alpha_x(\partial/\partial x)^a + \alpha_y(\partial/\partial y)^a + \alpha_z(\partial/\partial z)^a$ is a null vector field on O and we have $\alpha_t = \alpha_x = 1$ and $\alpha_y = \alpha_z = 0$ on O ∩ M. Now find some δ > 0 such that $(-\delta, q_1, q_2, q_3) \in O$. For t ∈ (−δ, δ), let $q(t) = (t, q_1, q_2, q_3) \in M'$. Consider the smooth function h : (−δ, δ) → R given by $h(t) = T'_{ab}\chi^a\chi^b|_{q(t)}$ where $T'_{ab}$ is defined on M′ in the natural way using the metric $g'_{ab}$. Of course, for all t > 0, we have

$$h(t) = T'_{ab}\chi^a\chi^b|_{q(t)} = T_{ab}\chi^a\chi^b|_{q(t)} = (\rho + p)|_{q(t)} = (2(t + 1)^2 - 2)/8\pi = (2t^2 + 4t)/8\pi$$

Smoothness requires h(0) = 0 and h′(0) = 1/2π. This allows us to find an ε ∈ (−δ, 0) such that h(t) < 0 for t ∈ (ε, 0). So the null energy condition is violated at q(t) for all t ∈ (ε, 0). □

Proposition 5 (*) is false for all P such that (SEC) ⊆ P ⊆ (NEC).

Proof The argument follows the one above very closely. Let P be such that (SEC) ⊆ P ⊆ (NEC). We are done if we can construct an extendible model in the collection (SEC) which has no extension in the collection (NEC). Consider Minkowski spacetime $(\mathbb{R}^4, \eta_{ab})$ where $\eta_{ab} = -\nabla_a t \nabla_b t + \nabla_a x \nabla_b x + \nabla_a y \nabla_b y + \nabla_a z \nabla_b z$. Let $\Omega : \mathbb{R}^4 \to \mathbb{R}$ be defined by $\Omega(t, x, y, z) = \exp[f(t)]$ where $f(t) = -t^3/3$. By Lemma 1, the energy-momentum tensor associated with $g_{ab} = \Omega^2 \eta_{ab}$ is given by $T_{ab} = (\rho + p)\nabla_a t \nabla_b t + p\,\eta_{ab}$ where $\rho = 3f'(t)^2/8\pi = 3t^4/8\pi$ and $p = -(f'(t)^2 + 2f''(t))/8\pi = (4t - t^4)/8\pi$. So we have the following.


$$\rho + p = (2t^4 + 4t)/8\pi$$

$$\rho + 3p = 12t/8\pi$$

We see that if t > 0, then both ρ + p > 0 and ρ + 3p > 0. Consider the extendible model $(M, g_{ab})$ where $M = \{(t, x, y, z) \in \mathbb{R}^4 : t > 0\}$. Since ρ + p > 0 and ρ + 3p > 0 on M, it follows that the model is in the collection (SEC) ([37], p. 220). We need only show that any extension to $(M, g_{ab})$ is not in the collection (NEC).

Let $(M', g'_{ab})$ be any extension to $(M, g_{ab})$ and let q ∈ M′ be a point in the boundary of M. Let (O, ϕ) be a chart with q ∈ O such that we can extend the coordinates (t, x, y, z) on M to M ∪ O. Let $q_1, q_2, q_3 \in \mathbb{R}$ be such that $q = (0, q_1, q_2, q_3)$. Consider any smooth functions $\alpha_t, \alpha_x, \alpha_y, \alpha_z : O \to \mathbb{R}$ such that $\chi^a = \alpha_t(\partial/\partial t)^a + \alpha_x(\partial/\partial x)^a + \alpha_y(\partial/\partial y)^a + \alpha_z(\partial/\partial z)^a$ is a null vector field on O and we have $\alpha_t = \alpha_x = 1$ and $\alpha_y = \alpha_z = 0$ on O ∩ M. Now find some δ > 0 such that $(-\delta, q_1, q_2, q_3) \in O$. For t ∈ (−δ, δ), let $q(t) = (t, q_1, q_2, q_3) \in M'$. Consider the smooth function h : (−δ, δ) → R given by $h(t) = T'_{ab}\chi^a\chi^b|_{q(t)}$ where $T'_{ab}$ is defined on M′ in the natural way using the metric $g'_{ab}$. Of course, for all t > 0, we have

$$h(t) = T'_{ab}\chi^a\chi^b|_{q(t)} = T_{ab}\chi^a\chi^b|_{q(t)} = (\rho + p)|_{q(t)} = (2t^4 + 4t)/8\pi$$

Smoothness requires h(0) = 0 and h′(0) = 1/2π. This allows us to find an ε ∈ (−δ, 0) such that h(t) < 0 for t ∈ (ε, 0). So the null energy condition is violated at q(t) for all t ∈ (ε, 0). □

Lemma 2 Let F(U) be the collection of framed models. Let ≤ denote the relation on F(U) such that $(M, g_{ab}, F) \leq (M', g'_{ab}, F')$ iff $(M', g'_{ab}, F')$ is a framed extension of $(M, g_{ab}, F)$. The relation ≤ is a partial ordering on F(U).

Proof See ([13], pp. 188–189).
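The algebra behind the proofs of Propositions 4 and 5 can be checked with exact rational arithmetic. The sketch below is an illustration, not part of the original argument: it takes the formulas for ρ and p from Lemma 1 as given, drops the common factor 1/(8π), and uses hypothetical helper names. It checks the stated expressions only on the region t > 0 and at t = 0; it says nothing about the extension itself, where the proofs rely on smoothness of h.

```python
from fractions import Fraction
import math


def rho_p_scaled(f1, f2):
    """Lemma 1 with the factor 1/(8*pi) dropped.

    Given the values f'(t) and f''(t), return (8*pi*rho, 8*pi*p).
    """
    return 3 * f1 ** 2, -(f1 ** 2 + 2 * f2)


def prop4(t):
    # f(t) = t^2/2 + t, so f'(t) = t + 1 and f''(t) = 1
    return rho_p_scaled(t + 1, Fraction(1))


def prop5(t):
    # f(t) = -t^3/3, so f'(t) = -t^2 and f''(t) = -2t
    return rho_p_scaled(-t ** 2, -2 * t)


def rho_plus_p(case, t):
    rho, p = case(t)
    return rho + p


def central_slope_at_zero(g, step=Fraction(1, 10)):
    """Symmetric difference quotient at 0 (exact here: even powers cancel)."""
    return (g(step) - g(-step)) / (2 * step)


for t in [Fraction(n, 4) for n in range(1, 13)]:  # sample points with t > 0
    rho, p = prop4(t)
    assert rho + p == 2 * t ** 2 + 4 * t
    assert rho - abs(p) > 0            # dominant energy condition on M
    rho, p = prop5(t)
    assert rho + p == 2 * t ** 4 + 4 * t
    assert rho + 3 * p == 12 * t
    assert rho + p > 0 and rho + 3 * p > 0  # strong energy condition on M

slope4 = central_slope_at_zero(lambda t: rho_plus_p(prop4, t))
slope5 = central_slope_at_zero(lambda t: rho_plus_p(prop5, t))

# Restoring the factor 1/(8*pi) gives the derivative h'(0) used in both proofs.
print(slope4, slope5, math.isclose(slope4 / (8 * math.pi), 1 / (2 * math.pi)))
```

Both polynomials vanish at t = 0 with positive slope, which is the pair of facts the proofs use to force h(t) < 0 just below t = 0 in any extension.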



Lemma 3 Let F(L) be the collection of framed models satisfying any local property L ⊆ U. F(L) is partially ordered by ≤ and every sub-collection of F(L) which is totally ordered by ≤ has an upper bound.

Proof Let F(L) be the collection of framed models satisfying any local property L ⊆ U. Since F(L) ⊆ F(U), it follows from Lemma 2 that F(L) is partially ordered by ≤. Let $\{(M_i, g_i, F_i)\}$ be a sub-collection of F(L) which is totally ordered by ≤. Following Hawking and Ellis ([20], p. 249) let M be the union of all the $M_i$ where, for $(M_i, g_i, F_i) \leq (M_j, g_j, F_j)$, each $p_i \in M_i$ is identified with $\varphi_{ij}(p_i)$ where $\varphi_{ij} : M_i \to M_j$ is the unique isometric embedding which takes $F_i$ into $F_j$. The manifold M will have an induced metric $g = \varphi_{i*} g_i$ on each $\varphi_i[M_i]$ where $\varphi_i : M_i \to M$ is the natural isometric embedding. Finally, take F to be the result of carrying along a chosen $F_i$ using $\varphi_i : M_i \to M$. Consider the framed model (M, g, F). Clearly, for all i, we have $(M_i, g_i, F_i) \leq (M, g, F)$. Consider the open cover $\{M_i\}$ on M. Since


each (Mi , gi , Fi ) ∈ F(L ), it follows from the fact that L is local that (M, g, F) ∈  F(L ). We see that (M, g, F) is an upper bound for {(Mi , gi , Fi )}. Proposition 6 (**) is true for all local P ⊆ U . Proof Let L ⊆ U be a local property and let (M, gab ) be an L -model which is L -extendible. Let F be frame at some point p ∈ M. So, (M, gab , F) ∈ F(L ). By  , F  ) ∈ F(L ) such Lemma 3 and Zorn’s lemma there is a maximal element (M  , gab     19  that (M, gab , F) ≤ (M , gab , F ). It follows that (M , gab ) is a L -inextendible  L -extension of (M, gab ). Proposition 8 (**) is true if P = (G I ). Proof Let (M, gab ) be a geodesically incomplete model which has a geodesically incomplete extension—call this extension (M  , g  ). If (M  , g  ) is inextendible, we are done. Suppose (M  , g  ) is extendible and let (M  , g  ) be any inextendible exten  ). If (M  , gab ) is geodesically incomplete, we are done. Suppose sion to (M  , gab    ) for p ∈ M  − M  (M , gab ) is geodesically complete. The model (M  − { p}, gab is extendible and therefore geodesically incomplete. By construction, M is a proper  ) is an extension of (M, gab ). But subset of M  − { p} and so (M  − { p}, gab    ).  (M − { p}, gab ) has only one extension: the geodesically complete (M  , gab

19 Care is required to implement this step since the collection $\mathcal{F}(\mathcal{L})$ is ‘too big’ to qualify as a set. There are two ways around the problem. On the one hand, one can construct a set $\mathcal{S}$ of all spacetimes in $\mathcal{U}$ ‘up to isometry’ where each $(M, g_{ab}) \in \mathcal{U}$ has an isometric representative $(M', g'_{ab})$ in $\mathcal{S}$ [5]. One can then consider the set $\mathcal{F}(\mathcal{S} \cap \mathcal{L})$ and proceed with the usual version of Zorn’s lemma as applied to sets. On the other hand, one can stay with the collection $\mathcal{F}(\mathcal{L})$ and use an alternate version of Zorn’s lemma as applied to classes [18].

References

1. Andréka, H., Németi, I., Madarász, J., & Székely, G. (2010). On logical analysis of relativity theories. Hungarian Philosophical Review, 4, 204–222.
2. Ashtekar, A. (2010). The big bang and the quantum. AIP Conference Proceedings, 1241, 109.
3. Bernal, A., & Sánchez, M. (2007). Globally hyperbolic spacetimes can be defined as ‘causal’ instead of ‘strongly causal’. Classical and Quantum Gravity, 24, 745–749.
4. Chrusciel, P., & Isenberg, J. (1993). Nonisometric vacuum extensions of vacuum maximal globally hyperbolic spacetimes. Physical Review D, 48, 1616.
5. Clarke, C. (1976). Spacetime singularities. Communications in Mathematical Physics, 49, 17–23.
6. Clarke, C. (1993). The analysis of space-time singularities. Cambridge: Cambridge University Press.
7. Curiel, E. (2017). A primer on energy conditions. In D. Lehmkuhl, G. Schiemann, & E. Scholz (Eds.), Towards a theory of spacetime theories (pp. 43–104). Boston: Birkhäuser.
8. Earman, J. (1995). Bangs, crunches, whimpers, and shrieks: Singularities and acausalities in relativistic spacetimes. Oxford: Oxford University Press.
9. Earman, J., Wüthrich, C., & Manchak, J. (2016). Time machines. In E. Zalta (Ed.), Stanford encyclopedia of philosophy. http://plato.stanford.edu/archives/win2016/entries/time-machine/.
10. Ellis, G., & Schmidt, B. (1977). Singular space-times. General Relativity and Gravitation, 8, 915–953.
11. Galloway, G., & Ling, E. (2017). Some remarks on the $C^0$-(in)extendibility of spacetimes. Annales Henri Poincaré, 18, 3427–3447.
12. Geroch, R. (1968). What is a singularity in general relativity? Annals of Physics, 48, 526–540.
13. Geroch, R. (1969). Limits of spacetimes. Communications in Mathematical Physics, 13, 180–193.
14. Geroch, R. (1970). Singularities. In M. Carmeli, S. Fickler, & L. Witten (Eds.), Relativity (pp. 259–291). New York: Plenum Press.
15. Geroch, R. (1971). Spacetime structure from a global viewpoint. In B. Sachs (Ed.), General relativity and cosmology (pp. 71–103). New York: Academic Press.
16. Geroch, R. (1977). Prediction in general relativity. In J. Earman, C. Glymour, & J. Stachel (Eds.), Foundations of space-time theories, Minnesota studies in the philosophy of science (Vol. 8, pp. 81–93). Minneapolis: University of Minnesota Press.
17. Geroch, R., & Horowitz, G. (1979). Global structure of spacetimes. In S. Hawking & W. Israel (Eds.), General relativity: An Einstein centenary survey (pp. 212–293). Cambridge: Cambridge University Press.
18. Harper, J., & Rubin, J. (1977). Variations of Zorn’s lemma, principles of cofinality, and Hausdorff’s maximal principle part II: Class forms. The Notre Dame Journal of Formal Logic, 18, 151–163.
19. Hawking, S. (1969). The existence of cosmic time functions. Proceedings of the Royal Society A, 308, 433–435.
20. Hawking, S., & Ellis, G. (1973). The large scale structure of space-time. Cambridge: Cambridge University Press.
21. Hawking, S., & Penrose, R. (1970). The singularities of gravitational collapse and cosmology. Proceedings of the Royal Society A, 314, 529–548.
22. Krasnikov, S. (2014). Corrigendum: No time machines in classical general relativity. Classical and Quantum Gravity, 31, 079503.
23. Krasnikov, S. (2018). Back-in-time and faster-than-light travel in general relativity. Cham: Springer.
24. Low, R. (2012). Time machines, maximal extensions and Zorn’s lemma. Classical and Quantum Gravity, 29, 097001.
25. Malament, D. (2008). Norton’s slippery slope. Philosophy of Science, 75, 799–816.
26. Malament, D. (2012). Topics in the foundations of general relativity and Newtonian gravitation theory. Chicago: University of Chicago Press.
27. Manchak, J. (2009). Can we know the global structure of spacetime? Studies in History and Philosophy of Science, 40, 53–56.
28. Manchak, J. (2011). What is a physically reasonable spacetime? Philosophy of Science, 78, 410–420.
29. Manchak, J. (2013). Global spacetime structure. In R. Batterman (Ed.), The Oxford handbook of philosophy of physics (pp. 587–606). Oxford: Oxford University Press.
30. Manchak, J. (2014). On space-time singularities, holes, and extensions. Philosophy of Science, 81, 1066–1076.
31. Manchak, J. (2016). Is the universe as large as it can be? Erkenntnis, 81, 1341–1344.
32. Manchak, J. (2017). On the inextendibility of space-time. Philosophy of Science, 84, 1215–1225.
33. Minguzzi, E. (2012). Causally simple inextendible spacetimes are hole-free. Journal of Mathematical Physics, 53, 062501.
34. Norton, J. (2003). Causation as folk science. Philosophers’ Imprint, 3, 1–22.
35. Penrose, R. (1979). Singularities and time-asymmetry. In S. Hawking & W. Israel (Eds.), General relativity: An Einstein centenary survey (pp. 581–638). Cambridge: Cambridge University Press.
36. Sbierski, J. (2018). On the proof of the $C^0$-inextendibility of the Schwarzschild spacetime. Journal of Physics: Conference Series, 968, 012012.
37. Wald, R. (1984). General relativity. Chicago: University of Chicago Press.
38. Weatherall, J. (2014). What is a singularity in geometrized Newtonian gravitation? Philosophy of Science, 81, 1077–1089.

JB Manchak is Professor of Logic and Philosophy of Science at the University of California, Irvine. He received his Ph.D. from the same department in 2009 under the direction of David Malament and Robert Geroch. In 2013, he was awarded the Distinguished Teaching Award from the University of Washington, Seattle. His research concerns the foundations of general relativity with a particular focus on the global structure of spacetime.

Chapter 18

Why Not Categorical Equivalence?

James Owen Weatherall

Abstract In recent years, philosophers of science have explored categorical equivalence as a promising criterion for when two (physical) theories are equivalent. On the one hand, philosophers have presented several examples of theories whose relationships seem to be clarified using these categorical methods. On the other hand, philosophers and logicians have studied the relationships, particularly in the first order case, between categorical equivalence and other notions of equivalence of theories, including definitional equivalence and generalized definitional (aka Morita) equivalence. In this article, I will express some skepticism about categorical equivalence as a criterion of physical equivalence, both on technical grounds and conceptual ones. I will argue that “category structure” (alone) likely does not capture the structure of a theory, and discuss some recent work in light of this claim.

Keywords Theoretical equivalence · Categorical equivalence

18.1 Introduction

In [54], I proposed a criterion of equivalence for (physical) theories.1 This criterion states that two theories are equivalent if:

1. their categories of models are equivalent;2 and
2. the functors realizing that equivalence preserve empirical content.

A “category of models”, here, is a category whose objects are models of a physical theory, and whose arrows are maps that, in a suitable sense, preserve the structure of those models.3 What is meant by “empirical content”, meanwhile, is contextual and difficult to pin down in general; in cases of interest one can make precise the sense in which it is preserved.

I will have much to say about this criterion below, but in the first instance the idea behind it is that two theories are equivalent if (a) their mathematical structures are equivalent, qua mathematics; (b) they are empirically equivalent; and (c) these two equivalences are compatible. The first of these, (a), is captured by the equivalence of categories; whereas (b) and (c) are captured by the requirement that it be precisely those functors that realize the categorical equivalence that also preserve empirical content.

As I describe in Sect. 18.2, this criterion of equivalence, which (at risk of ambiguity) I will call categorical equivalence of physical theories, has been fruitfully employed in a number of cases to clarify the senses in which candidate pairs of theories are equivalent or inequivalent. I think it has led to correct verdicts in all of these cases, and that at least in some cases it has done so in a way that provides new insight into the theories in question. But despite these successes, I think there are reasons to be cautious about the adequacy of categorical equivalence as a criterion of theoretical equivalence.4 In particular, I will argue, a category of models does not (necessarily) capture the (mathematical) “structure” of a theory. Thus, the (categorical) equivalence of two categories of models does not (necessarily) preserve the structure of a theory.5 Categorical equivalence is plausibly a necessary condition for equivalence, but it is not sufficient.

1 This proposal was inspired and strongly influenced by Halvorson [28], who had previously argued that category theory would likely be useful for representing theories, particularly for settling questions of (in)equivalence, but who did not make a concrete proposal for how to do this in practice. Since then, there has been a large literature on this subject in philosophy of physics. (Of course, mathematicians have long used category theory to explore related issues!) For a detailed review of the literature on theoretical equivalence in physics, see Weatherall [59, 60].

2 I do not review basic ideas from category theory in any detail here; for background, see Mac Lane [37], van Oosten [53], or Leinster [35].

3 In many physical theories, there are natural candidates for the “models of the theory”; the arrows require more careful attention, though in practice, ambiguities concerning what one should take as arrows of the category of models reflect real interpretational differences. For instance, the models of general relativity are relativistic spacetimes, which are four-dimensional manifolds, satisfying certain topological conditions, endowed with a smooth metric of Lorentz signature. One natural choice of arrows for this category are the isometries, which are diffeomorphisms between spacetime manifolds that preserve the metric.

4 One might be skeptical that there is any single criterion (that is, any necessary or mutually sufficient conditions) for equivalence of theories; instead, one might think that there are many criteria out there that each capture different senses of equivalence, and that the most fruitful approach is to develop a bestiary of such criteria and to ask, in particular cases, in which senses theories are and are not equivalent. From this perspective, the worry is that categorical equivalence, as described in the literature, may not adequately capture the sense of equivalence that it is intended to capture, or, perhaps, any interesting sense at all.

5 Note, here, that the ambiguity alluded to above matters: at issue is whether the categorical equivalence of two categories of models adequately captures the required relationship between the mathematical structures invoked in physical theory; if not, then categorical equivalence, as a criterion of theoretical equivalence that includes an additional condition regarding empirical significance, arguably fails.

J. O. Weatherall, Department of Logic and Philosophy of Science, University of California, Irvine, CA 92617, USA

© Springer Nature Switzerland AG 2021. J. Madarász and G. Székely (eds.), Hajnal Andréka and István Németi on Unity of Science, Outstanding Contributions to Logic 19, https://doi.org/10.1007/978-3-030-64187-0_18
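For readers who want the first clause of the criterion in symbols, the standard textbook notion of an equivalence of categories can be sketched as follows. The labels $\mathbf{C}_1$, $\mathbf{C}_2$, $F$, and $G$ are introduced here purely for illustration and are not notation taken from [54].

```latex
% Labels assumed for illustration: \mathbf{C}_1 and \mathbf{C}_2 are the
% categories of models of two theories T_1 and T_2.
\[
F \colon \mathbf{C}_1 \to \mathbf{C}_2, \qquad
G \colon \mathbf{C}_2 \to \mathbf{C}_1, \qquad
G \circ F \cong 1_{\mathbf{C}_1}, \qquad
F \circ G \cong 1_{\mathbf{C}_2},
\]
% where each \cong denotes a natural isomorphism of functors.
```

Clause 1 of the criterion asks that such a pair $F$, $G$ exist. Clause 2 adds that $F$ and $G$ preserve empirical content, a requirement that this display does not capture and that, as noted above, is made precise only case by case.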


The worries I will express are related to, and partially inspired by, considerations previously raised by Barrett and Halvorson [9] and Hudetz [29]. But the perspective I offer is different. In particular, I will start with what I take to be the successes of the categorical equivalence approach, and then question whether these successes are achieved for the adverted reasons. The remainder of the paper will proceed as follows. I will begin by introducing categorical equivalence in more detail, and then reviewing some of the ways in which categorical equivalence has been studied and applied by philosophers of science. I will then present some critiques of the approach—including several that I think are important to record, but which I will not pursue (or respond to) here. Next I will focus on one particular worry, regarding whether a category can be said to represent the structure of a theory. I will present what I call the ‘G’ property, which is a property that a category may (or may not) have, and the failure of which in particular examples seems to capture a sense in which those theories do not adequately capture the structure we care about. I will ultimately argue that the ‘G’ property is neither necessary nor sufficient for a category to capture this structure, but I will suggest that it points in a fruitful direction. In the final section, I will describe three attitudes one might adopt given the arguments I present, and argue that they suggest different—though not necessarily mutually exclusive—research programs that one might pursue.

18.2 Categorically Equivalent Theories

Categorical equivalence has been explored by philosophers from two directions.6 One largely theoretical approach, pursued, for instance, by Barrett and Halvorson [9] and Hudetz [29], has focused on properties of categories of models of theories in first or higher order logic. This approach has established that, in the first order case, categorical equivalence is strictly weaker than other compelling notions of equivalence, such as definitional equivalence and so-called “Morita”, or generalized definitional, equivalence.7 Insofar as these other notions of equivalence, which seem natural within a logical context, are well-motivated, one might take this result alone to show that categorical equivalence is too weak.

But this conclusion may be too fast, because it is not clear “how much” weaker categorical equivalence is. In particular, Barrett and Halvorson [9] conjecture that categorical equivalence is equivalent to Morita equivalence (in the sense of yielding the same verdicts) for certain classes of first order theories,8 and they suggest that categorical equivalence is much easier to apply in cases of real physical theories, where one often does not have a first order formulation. For just this reason, we do not have examples of (physical) theories that are categorically equivalent but inequivalent in some stronger, logical sense, of a sort that we might want if we were to evaluate what is missing from categorical equivalence.

This remark brings us to the second direction from which categorical equivalence has been studied by philosophers. What we do have is a growing handful of physical theories that are either categorically equivalent, or else which fail to be categorically equivalent in ways that help us to better understand their relationship.9 In particular, categorical equivalence has been used to argue for the equivalence of several pairs of theories: electromagnetism formulated in terms of vector potentials and electromagnetism formulated in terms of electromagnetic fields are equivalent if and only if one takes vector potentials related by a gauge transformation to be isomorphic [54, 57]; likewise, Newtonian gravitation and geometrized Newtonian gravitation (Newton-Cartan theory) are equivalent if and only if one takes gravitational potentials related by a certain class of transformations to be isomorphic [54]; Einstein algebras and general relativity are equivalent [47]; Lagrangian mechanics and Hamiltonian mechanics are equivalent on one way of conceiving of each theory, but one can motivate other ways of thinking of these theories on which they are not equivalent [8]; and likewise there are various formulations of Yang-Mills theory that are equivalent and inequivalent [14, 42, 48, 57].

The details of these examples do not matter for what follows. The point is that the application of categorical equivalence to these cases of antecedent interest has been fruitful. For one, I believe that categorical equivalence has given the right verdict in all of these cases: the theories that are categorically equivalent are equivalent, in the most salient senses; and the theories that are not categorically equivalent are not equivalent. More, the ways in which they fail to be categorically equivalent have provided insight into the theories in question. In particular, in the cases where one establishes inequivalence, one can still identify functors that relate the relevant theories’ categories of models, and which do so in a way that preserves empirical equivalence. Studying the properties of such functors can reveal information about the theories.

For instance, if one does not take vector potentials related by gauge transformations to be isomorphic to one another, then there is a precise sense in which vector potentials have more structure than electromagnetic fields. Of course, the idea that vector potentials have “excess structure”, as compared with electromagnetic fields, is not surprising; indeed, it is best seen as a litmus test for whether the approach is plausible.10 But in other cases, the results have been more novel. For instance, there is a certain theme in the philosophy of physics literature according to which to be a realist about the standard formulation of general relativity is to endorse some form of substantivalism.11 The reason is that general relativity is standardly formulated as a theory of fields on a four-dimensional manifold of spacetime “points”; some authors have concluded that this means general relativity posits spacetime as an independent entity, ontologically prior to matter. In contrast, Earman [22, 23] has suggested that Einstein algebras are an appropriate formal setting for a form of relationism about spacetime in general relativity. Briefly, an Einstein algebra is an algebraic structure whose elements represent possible configurations of fields [27]. Beginning with such a structure, one can proceed to define the structures one uses in general relativity: a metric, a derivative operator, and so on; and one can express Einstein’s equation. But one does so without ever introducing a manifold, and so one might think that this is a theory that does not posit, as a primitive entity, a spacetime manifold. Instead, one works only with possible configurations of matter, relations between those configurations, and structures one can define on the algebra of such configurations.

But, drawing on results due to Nestruev [40], Rosenstock et al. [47] show that these two theories are categorically equivalent.12 This result shows more clearly how one can think of a relativistic spacetime as a means of encoding, or representing, nothing more or less than the possible ways in which matter may be (spatio-temporally) configured and the relations between those possible configurations, precisely as Einstein algebras do. This, I think, substantially clarifies the structure represented by a relativistic spacetime.

In other cases, exploring functors between physical theories has provided new insight into otherwise murky philosophical disputes. For instance, in a series of recent papers addressing the relationship between Lagrangian mechanics and Hamiltonian mechanics, North [43] has argued that these theories are inequivalent and that Hamiltonian mechanics has “less structure” than Lagrangian mechanics; Curiel [20] has argued that they are inequivalent because Lagrangian mechanics has “less structure” than Hamiltonian mechanics; and Barrett [7] has argued that the structures North and Curiel focus on are actually incomparable, in a certain precise sense. More recently, Barrett [8] has substantially clarified the situation, by showing that there is a precise sense in which Lagrangian and Hamiltonian mechanics are (categorically) equivalent, if one takes Lagrangian mechanics to be a theory in which one defines a Lagrangian on the tangent bundle of a configuration space, and Hamiltonian mechanics to be a theory in which one defines a Hamiltonian on the cotangent bundle of a configuration space. But Barrett also argues that there are other, well-motivated ways of understanding these theories, one of which he identifies with North, and one of which he identifies with Curiel. He then shows that, if one follows North’s proposal, there is a sense, after all, in which the theories are inequivalent. And likewise, if one follows Curiel’s proposal, the theories are also inequivalent. And so one sees that the dispute comes down to what one should mean by “Hamiltonian mechanics” and “Lagrangian mechanics”. Of course, these technical results do not resolve this background issue. But they help to isolate precisely where the disagreements lie.

I take these results and the arguments based on them to be good reason to take categorical equivalence seriously. More, since I think that categorical equivalence has yielded the correct verdicts in each of these cases, and done so in a precise way, I think that any critique of categorical equivalence needs to account for why using categorical equivalence has been fruitful in these ways. But I also think that categorical equivalence is probably not the correct criterion of equivalence, for reasons I will elaborate in the next several sections.

6 A third direction, pursued, for instance, by Nguyen [41] and by Butterfield [18], has been more critical. These authors argue that there are further considerations, related to the interpretation of theories, that are necessary to establish equivalence. (See also Coffey [19] and Sklar [50].) In a sense, this is uncontroversial: after all, categorical equivalence, as described above, requires theories to be empirically equivalent, in a way that is compatible with their categorical equivalence, and empirical equivalence depends on interpretation. But there is still a matter of controversy over whether there are further senses in which interpretation should matter. I set this cluster of issues aside in what follows, as my goal is to raise a different set of concerns about categorical equivalence.

7 Generalized definitional equivalence of physical theories has been studied extensively by Andréka and Németi [2]; see also Andréka and Németi [1] and Lefever and Székely [34].

8 Specifically, they conjecture that categorical equivalence implies Morita equivalence for theories with finite signatures.

9 See Weatherall [58] for a more detailed discussion of the ways in which categorical equivalence has been applied to better understand classical field theories.

10 Though whether the sense of excess structure just alluded to is the right one to consider is disputed; see Nguyen et al. [42], and Bradley and Weatherall [14] for a response.

11 This idea is present throughout much of the literature on the hole argument [24, 45]; see Brighouse [15] for a particularly clear statement of the view, and Weatherall [61] for a discussion of its origins.

12 There were hints in the philosophical literature that something like this should hold. In the first instance, Earman [22, 23] suggested that Einstein algebras might resolve the apparent indeterminism in general relativity revealed by the hole argument [24]. But as Rynasiewicz [49] argued, in related cases in algebraic topology the sorts of maps that are used to run the hole argument also arise between algebras of functions, suggesting that it is hopeless to expect the hole argument to go away if one moves to Einstein algebras. (See also [6].) This relationship between the maps between Einstein algebras and relativistic spacetimes is at the heart of the equivalence result. For a different view of the hole argument, compatible with the attitudes adopted in the present paper, see Weatherall [56].
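The talk in this section of one formulation having “more structure” than another is stated without the underlying definitions. One standard way of making it precise, which we assume (the text does not spell it out) is the sense at issue, classifies a functor by what it fails to preserve. The labels $\mathbf{C}$, $\mathbf{D}$, $F$, and $F_{A,B}$ below are introduced for illustration only.

```latex
% F : \mathbf{C} \to \mathbf{D} is a functor between categories of models.
% For objects A, B of \mathbf{C}, F_{A,B} is the induced map on arrows:
%   F_{A,B} : \mathrm{Hom}_{\mathbf{C}}(A, B) \to \mathrm{Hom}_{\mathbf{D}}(F A, F B).
\begin{align*}
F \text{ is full} &\iff F_{A,B} \text{ is surjective for all } A, B, \\
F \text{ is faithful} &\iff F_{A,B} \text{ is injective for all } A, B, \\
F \text{ is essentially surjective} &\iff
  \text{for every } X \in \mathbf{D} \text{ there is } A \in \mathbf{C}
  \text{ with } F(A) \cong X.
\end{align*}
% Assumed classification: F "forgets structure" if it is not full,
% "forgets stuff" if it is not faithful, and "forgets properties"
% if it is not essentially surjective.
```

On this reading, a functor that realizes an equivalence of categories has all three properties and so forgets nothing. A claim that vector potentials have more structure than electromagnetic fields would then correspond to a functor from the potential formulation to the field formulation that fails to be full when gauge-related potentials are not counted as isomorphic.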

18.3 Interlude: Concerns I Will Not Pursue The present section lies largely outside the main thread of the paper and can be freely skipped. But since the goal of the paper is to question the adequacy of categorical equivalence, I will now note a number of concerns about the criterion that I think are largely distinct from the concern I focus on in the following sections. I think these, too, are issues that will need to be addressed in any successful future development of the proposal.13 I will not pursue these in any detail; I do not think they are dispositive, but I will not try to refute them. The first concern has to do with the heterogeneity of the examples noted in the previous section. In particular, these examples involve subtly different conceptions of what counts as a “model” of a physical theory—that is, what one takes to be the objects in a “category of models”. This is so even though in virtually all the cases discussed above, the mathematical structures under consideration are of a similar character: they tend to be smooth manifolds endowed with some further structure, often represented by fields. But the manifolds in the different cases have different representational significance. This would not be a problem were one concerned only with equivalence of theories in a purely logical or mathematical context, where one might take a general view of models as structures that realize some axioms. But in the 13 I do not take the list here to be exhaustive! For instance, I do not (otherwise) raise any concerns about using groupoids—categories where every arrow is invertible—as opposed to categories with richer arrow structure, to represent theories. I also note that Barth [10] has also criticized categorical equivalence—or rather, what he has called the solution-category approach—along lines that are related to some of the points below, though I do not reproduce all of his concerns here and I think some of my concerns are different. 
Still, my thinking has certainly been influenced by his.

18 Why Not Categorical Equivalence?

433

context of scientific theories, it becomes troubling. This is because it suggests a lack of clarity about what sorts of structures are the relevant ones to count as models for a scientific “theory” in the wild. Given a theory, as described in a textbook or review article, say, how is one to decide how to identify the relevant models to construct the required category? For instance, in the examples of (a) general relativity and Einstein algebras, (b) Newtonian gravitation and geometrized Newtonian gravitation, and (c) vector potential and electromagnetic field formulations of electromagnetic theory, the models of the theories in question are structures representing the complete history of the universe. In all of these cases, except for Einstein algebras, one considers a manifold of (all) events in space and time on which various fields are defined; in the Einstein algebra case there is no manifold, but one has algebras of global, in space and time, field configurations. But in other cases this is not what one does. For instance, in the example of Lagrangian and Hamiltonian mechanics—which should not be dismissed as an outlier, since as I argued above, it is an example in which we have learned something about the relevant theories—the models of the theories are taken to be structures representing all possible instantaneous states of a system, along with a particular dynamics governing the evolution of that system: in the case of Lagrangian mechanics, this is a configuration space along with a Lagrangian function; whereas in Hamiltonian mechanics, it is a phase space along with a Hamiltonian function. A complete history of a system—the models of the first class of examples—would be a trajectory through one of these spaces, rather than the space itself; conversely, in the first class of examples one does not consider different possible dynamics, but only different solutions for a single dynamics. Other examples are different, still. 
Consider the case of Yang-Mills theory, for instance. On a fiber bundle formulation of the theory, the models are taken to be principal bundles over a manifold of events, with a metric on the base manifold and a principal connection on the principal bundle. And so in a sense there are two manifolds: one representing the global evolution of the universe in space and time, and another carrying information about possible configurations of matter at each point, but without picking out any particular such configuration.14 One might also consider still other candidates that would be natural in theories that have not been studied with these methods. For instance, one might take a candidate “model” of quantum theory to be a Hilbert space along with a ∗ algebra of operators acting on that Hilbert space; in this case, the model would represent a state space along with a privileged set of observables. But this last observation raises another issue, which also speaks to the lack of clarity regarding what is meant by “models” here. Despite the heterogeneity just noted, there is also a striking homogeneity in the examples discussed in the last section—and those studied with these methods. In particular, they are all examples 14 How different really is this situation from the structure one considers in general relativity? It would take us too far afield to evaluate this question in detail, though Weatherall [55] argues that the structures are much more similar in character than they initially appear.

434

J. O. Weatherall

from classical physics, and except for Hamiltonian and Lagrangian mechanics, they are all “classical field theories”. But this is surely a small subset of the physical theories that may be equivalent to others! Where are the examples from quantum mechanics, quantum field theory, statistical physics, and elsewhere? The question is particularly striking given that some of the classic examples of “(in)equivalent theories in the wild” are from the history of quantum theory, such as the famous example of Schrödinger’s wave mechanics and Heisenberg’s matrix mechanics. Can categorical equivalence capture the sense in which these theories are equivalent? Arguably the reason no one has done so yet turns on precisely the ambiguity noted already concerning what should count as a model of a theory. It is not perfectly clear what the category of models of wave mechanics or matrix mechanics ought to be. Yet another cause for concern, largely orthogonal to the issues already raised, is that the sense in which philosophers use category theory to represent physical theories for the purposes of establishing equivalence and inequivalence seems pretty different from the “native” applications of categories in physics, such as in the contexts of locally covariant (quantum) field theory [16] or (higher) gauge theory [5]. In these cases, one tends to use category theory to express the models of the theory, not (in the first instance) to relate those models. So it seems that category theory is entering at a different level of analysis.15 The final two concerns are more general in character, and have less to do with the particular examples that have been studied. First, recall that the criterion of equivalence given above requires not only categorical equivalence, but equivalence given by functors that “preserve empirical content”. The idea is that two theories must be, at least, empirically equivalent if they are to be “theoretically” equivalent. 
But the notions of “empirical content” and “empirical equivalence” are not clear, and it is hard to see how they could be made precise.16 Indeed, one might worry that it is a prior, largely unanalyzed, notion of which theories are empirically equivalent that is doing the work in the examples discussed above, and that the category theoretic analysis adds little to this.17 (I return to this point in Sect. 18.6.) 15 On the other hand, once one has used categories to build models of a theory, it is often natural to construct categories of such models. But when one does so for theories whose constructions are “native” to category theory one tends to get much richer structures. 16 An anonymous referee notes that attempts to make “empirical equivalence” precise exist in the literature—as, for instance, in van Fraassen [51]. Fair enough. But as van Fraassen himself would surely acknowledge, especially given his more recent work, how our scientific theories come to represent worldly situations involves a great deal of interpretation, intention, practice, and context [52]. One can develop formal tools for expressing these intentions, etc., at which point precise standards of equivalence may be employed. But this hardly yields a formal test of empirical equivalence analogous to the way in which definitional equivalence, say, is a formal test of equivalence of first order theories. The worry, which I articulate presently, is that the vast bulk of the work of establishing equivalence occurs when one tries to take the messy details of how a scientific theory is used to represent the world and distill them into the formal apparatus—and not when one checks the properties of a certain functor. 17 This concern is implicitly raised by Norton [44], who argues that whenever two theories are empirically equivalent, it follows that they must share so much common structure that they are best conceived as notational variants. Though see Bradley [13] for a reply.


Worse, one might worry that one cannot establish “empirical equivalence” once and for all: one can, at best, establish it for some class of possible interactions or possible measurements. Indeed, in the case of the two formulations of electromagnetism discussed in the previous section, it is natural to take the empirical predictions of each theory to be captured by the measurements one could make with (classical) charged matter. From this perspective, the two theories are empirically equivalent, insofar as they predict the same motions of charged matter. But if one includes interactions with matter represented quantum mechanically, this equivalence breaks down, as shown by the Aharonov-Bohm effect, which is a measurable effect on the behavior of a particle propagating in a region of vanishing electromagnetic field but non-vanishing vector potential.18 This suggests that any equivalence one establishes is, at best, provisional, since future developments elsewhere in physics could lead to experiments that could discriminate between allegedly “equivalent” theories. At the other extreme, one could also imagine forcing equivalence by artificially restricting the possible measurements under consideration.

Finally, and most generally of all: one might be skeptical about the idea of having a single, once and for all, characterization of a physical theory—be it as a category, a set of sentences in some language, or even a textbook account.19 The idea is that physical theories are messy affairs including all sorts of arguments, intuitions, biases, interpretations, intentions, numerical methods, and so on. Worse, they are dynamic: methods develop and improve, arguments once found persuasive are rebutted, strongly held intuitions are abandoned, and so on. General relativity as conceived today—insofar as there is a univocal thing that goes by that name—is radically different from general relativity as understood by Einstein and his collaborators. But given this ever-changing richness, in what sense could one ever hope to identify a single category associated with a physical theory, much less establish, because of some relation in which that category stands to another category, that two theories are “the same”?

18 The significance of this example for our understanding of classical electromagnetic theory is discussed by Belot [11].

19 Note that this concern is a bit different from that raised by the authors noted in footnote 6, because it is not specifically about formal representations of a theory. Rather, the concern is that theories are not to be pinned down at all, whether formally or not. Indeed, one might take the remarks here to be a kind of skepticism about any account of the “structure of theories” or even the “semantics of theories”. For an interesting discussion of these issues, see [25], who argue that physical theories should be associated with a network of different formal axiomatic theories, rather than a single theory. Thank you to Hajnal Andréka and István Németi for drawing my attention to this work.

18.4 Category Structure and Ideology

As I indicated above, I will not pursue the concerns described in the previous section. Instead, I will focus on a worry that I think is more central to the program. Put briefly: a “category (of models)” does not (necessarily) capture the “structure” of a theory. Or in other words, there is more to the mathematical structure of a theory than just the “category structure” of its models.

To say what I mean by this, I first need to say what I mean by “category structure”. Here I adopt a certain ideological posture, which is that the structure of a mathematical “gadget” of a given kind is to be sharply distinguished from the procedure by which you came to construct that object.20 For instance, one may think of the group $\mathbb{Z}_2$ as the quotient group of the integers by even integers, as the symmetry group of a set with two elements, or as the sphere group $S^0$, i.e., the group of real numbers of unit length. One might also think of it as some set of ordered sets, the first of which is a domain, and the rest of which characterize multiplication, the identity element, and so on. All of these different ways of constructing $\mathbb{Z}_2$, however, are the same in one respect: they all have, or instantiate, the same group structure. They are the same, as groups. I take this to mean that, insofar as one wishes to use (just) the group $\mathbb{Z}_2$ for some representational purpose (as opposed to using some other mathematical gadget, perhaps with more or different structure), the construction procedure cannot matter to the success of the representation or the validity of inferences drawn from it. In other words, one is using $\mathbb{Z}_2$ for some representational purposes only insofar as one does not use features of an instantiation of $\mathbb{Z}_2$ that are not shared by all instantiations.

But how are we to isolate, from a particular realization of some gadget—say, $\mathbb{Z}_2$—precisely what structure is intended? The key is to study the maps that preserve that structure. Of course, we do not get such maps for free. But I take it that an essential part of mathematical practice is to define, whenever a new mathematical structure is proposed, a class of mappings that preserve the intended structure—that is, generally, but not always, the isomorphisms of the structure.
It is these mappings that capture the sense in which the different realizations of $\mathbb{Z}_2$ are “the same”: they are all related by group isomorphisms. And by looking at what these mappings preserve, one can say what the structures are. For instance, group homomorphisms are mappings between groups that preserve group multiplication; one can infer from this that groups are collections of elements that are distinguished from one another (only) by their multiplicative relations with other group elements. In the case of $\mathbb{Z}_2$, this means: $\mathbb{Z}_2$ has exactly two distinct elements, one of which is the identity and the other of which, when multiplied by itself, yields the identity. What these elements are, or which element realizes which properties, is not part of the structure, as this is not preserved by group isomorphism.

20 The term “gadget”, here, comes from John Baez, who introduced it (in conversation) because “object” has a technical meaning in category theory and “structure” seems to carry too much baggage; basically, a mathematical “gadget” is any sort of thing that mathematicians define and study. I adopt and discuss this ideology in Weatherall [56]; I think it is particularly clearly expressed in Burgess [17], though he does not go on to emphasize the relationship to “structure-preserving maps” that I introduce presently. Although the basic point is very closely related to famous arguments by Benacerraf [12], it is important to distinguish the ideology about mathematical gadgets that I am adopting from what is sometimes called “mathematical structuralism”. Structuralism, as I understand it, is a view about the ontology of mathematical objects. I do not mean to make any particular claims about ontology here, and I take the claims I make here to be compatible with many different philosophies of mathematics.
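The claim that the different realizations of the two-element group are "the same, as groups" can be checked mechanically. The following is a minimal sketch, not part of the original argument: it encodes three of the realizations mentioned above as explicit multiplication rules and searches by brute force for multiplication-preserving bijections between them. The names `mod2`, `signs`, `perms`, and `isomorphisms` are illustrative assumptions.

```python
from itertools import permutations, product

# Three realizations of the two-element group, each given as
# (elements, multiplication rule). All names here are illustrative.
mod2 = ([0, 1], lambda a, b: (a + b) % 2)        # integers modulo the even integers
signs = ([1, -1], lambda a, b: a * b)            # real numbers of unit length
perms = ([(0, 1), (1, 0)],                       # symmetries of a two-element set
         lambda p, q: tuple(p[q[i]] for i in range(2)))


def isomorphisms(g, h):
    """Return every bijection g -> h that preserves multiplication."""
    (g_elems, g_mul), (h_elems, h_mul) = g, h
    found = []
    for image in permutations(h_elems):
        f = dict(zip(g_elems, image))
        # f is a homomorphism if f(a * b) = f(a) * f(b) for all a, b.
        if all(f[g_mul(a, b)] == h_mul(f[a], f[b])
               for a, b in product(g_elems, repeat=2)):
            found.append(f)
    return found


print(isomorphisms(mod2, signs))   # [{0: 1, 1: -1}]
print(isomorphisms(mod2, perms))   # [{0: (0, 1), 1: (1, 0)}]
```

In each case exactly one bijection survives, and it sends identity to identity. Nothing in the check refers to what the elements are, only to how they multiply, which is the sense of "structure" at issue.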


From this perspective, we can now return to category structure. Here the maps that preserve (all) categorical structure are precisely the categorical equivalences.21 Hence, by reflecting on what is preserved by categorical equivalence, we learn what category structure is. In analogy to groups, we find: a category is a collection of objects distinguished (only, and only up to isomorphism) by their arrow-algebraic relations with other objects. In other words, we find that the arrows carry all the information; objects are essentially placeholders.22 In particular, this means that the “internal structure” of objects is not (automatically) preserved under categorical equivalence. Indeed, there is a classic concern along these lines, expressed by Hudetz [29] and others, that categorical equivalence may “trivialize” in some cases. Consider some well-understood, concrete category— say, the category whose objects are groups and whose arrows are homomorphisms.23 Now consider a category whose objects are giraffes, but whose arrows are chosen so that there exists an equivalence between the category of groups and the category of giraffes (with specially chosen arrows). By construction, these are equivalent. But groups and giraffes are not the same! Does this not immediately imply that categorical equivalence is too weak a notion of equivalence? Perhaps. Indeed, from a certain point of view, this claim should not be surprising, for reasons discussed at the beginning of Sect. 18.2: even in the first order case, categorical equivalence is weaker than Morita equivalence, and so if one thought that Morita equivalence was the “right” notion of equivalence for first order theories, one should conclude that categories do not capture the structure of theories. One response to this situation, proposed by Hudetz [29], is to strengthen the notion of “categorical equivalence”, by adding further constraints on the functors that realize the equivalence. 
In particular, Hudetz suggests that these functors should be definable, in the sense that if the functor takes an object of one category to an object of another, then the object of the codomain category (or an isomorph thereof) should be definable in terms of the object of the domain category.24 I will return to this proposal in Sect. 18.6. But for now, let me simply remark that this proposal is inconsistent with the ideology described above, at least if we understand theories to be represented by categories, precisely because it invokes the internal structure of objects.

21 Why not the categorical isomorphisms—that is, the invertible functors? One certainly could take these to be the relevant standard of equivalence, but this is rarely done in category theory. One reason is that categorical isomorphisms preserve “too much”, in the sense that the intended structure of a category does not include a determinate number of objects (only a determinate number of non-isomorphic objects). Another, related, reason is that it is often natural to think of functors as themselves having structure-preserving maps, known as natural isomorphisms, between them. Categorical equivalences are functors that have “almost” inverses—that is, inverses up to natural isomorphism. In this sense, categorical equivalence might be understood as isomorphism up to isomorphism.

22 These remarks are not meant to be surprising, particularly to anyone who is accustomed to working with categories. In fact, category theorists often emphasize that it is the arrows that do all the work, and there are even single-sort axiomatizations of category theory in which only arrows appear.

23 I am not worrying, here, about size considerations. But for someone who is worried about whether the category of groups is well-defined, we can consider all groups of cardinality less than some inaccessible cardinal, κ.

24 Hudetz makes this condition precise, but for present purposes an informal description suffices.

But there is another possible response to this concern, which is to deny that the trivialization concern is real. The idea is that, at least in some cases, category structure can represent the structure of a theory, precisely because the internal structure of the objects of the category is suitably reflected, or encoded, in the arrows of the category. If this is right, then the claim that the objects of a category are “giraffes” is immaterial—just as claiming that some realization of the group $\mathbb{Z}_2$ happens to have, as elements, giraffes, who have a certain multiplication relation defined on them.25

To see how this goes, consider an example: the category of sets, Set, whose objects are sets and whose arrows are functions. (Of course, by the above, that the objects are sets is immaterial—I am simply giving a construction.26) It turns out that in this case, using only arrow constructions, we can reason about sets in detail. For example, there is an object in this category, unique up to isomorphism, that has the property that there exists a unique arrow from any other object to this object. Call this object 1. (It happens to be a set with one element.) Then arrows from 1 may be thought of as elements of their codomains. Similarly, we may think of monomorphisms (i.e., injective functions) as subsets, constructions known as coproducts are disjoint unions, and so on. All of this is true irrespective of what we happen to say the internal structures of the objects are like.

This example seems to show that the trivialization concern is chimerical: much more is (or can be) encoded in the arrows of a category than is immediately apparent. But is this always the case? As I argue in the next section the answer appears to be “no”—including for cases of interest for the categorical equivalence program in philosophy of science.
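The way arrows in Set recover "internal" facts about sets can be illustrated on a small scale. The sketch below is an assumption-laden toy, not the author's construction: it works only with a handful of finite sets, represents functions as dictionaries, and uses the hypothetical helper names `arrows` and `is_terminal`. It identifies the object 1 purely by counting arrows, and then confirms that arrows out of 1 are in one-to-one correspondence with elements.

```python
from itertools import product


def arrows(domain, codomain):
    """All functions domain -> codomain between finite sets, as dicts."""
    domain, codomain = list(domain), list(codomain)
    return [dict(zip(domain, values))
            for values in product(codomain, repeat=len(domain))]


def is_terminal(candidate, objects):
    """True if every listed object has exactly one arrow into `candidate`."""
    return all(len(arrows(x, candidate)) == 1 for x in objects)


# A small sample of objects; the real category Set is of course far larger.
objects = [frozenset(), frozenset({"a"}),
           frozenset({"a", "b"}), frozenset({"a", "b", "c"})]

# The object 1 is picked out using arrows alone.
terminal = [x for x in objects if is_terminal(x, objects)]
one = terminal[0]

# Arrows 1 -> X behave like elements of X: there are exactly |X| of them.
element_counts = {len(x): len(arrows(one, x)) for x in objects}
print(terminal)        # [frozenset({'a'})]
print(element_counts)  # {0: 0, 1: 1, 2: 2, 3: 3}
```

Within this sample, the only object that receives exactly one arrow from every object is the one-element set, and the number of arrows from it to any set equals that set's cardinality. Neither computation inspects what the elements are.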

25 There is a connection here to classic work by Makkai [38], on a duality between syntax and semantics in first order theories; see also [3, 36]. At least in some cases, one can reconstruct a theory, uniquely up to a suitable notion of equivalence, from its category of models. But the relationship between such results and the categories encountered in the philosophy of physics literature is not clear.

26 In fact, the category of sets can be defined “directly”, as the category satisfying certain axioms, as opposed to by beginning with a prior definition of sets and functions. See Lawvere [33].

18.5 The ‘G’ Property

In the previous section, I argued that in the category Set, the internal structure is “externalized”, i.e., that the internal structure of sets is reflected in the arrows of the category. One way of understanding how this works is to note that objects of the category are uniquely distinguished, up to isomorphism (bijection), by their positions in the graph of arrows. For instance, two sets are isomorphic if and only if they have isomorphic automorphism groups. This feature of Set suggests a proposal. Perhaps one should say that a theory is captured by category structure only if the arrows of that category can distinguish the objects, up to isomorphism.


More precisely, we define the following property that a category may have.27

Definition 1 A category C has the ‘G’ property if every full, faithful, and essentially surjective functor $F : C \to C$ is naturally isomorphic to $1_C$.

In other words, a category has the ‘G’ property if every “autoequivalence”, that is, every symmetry of the category, in the sense of an equivalence of the category with itself, is naturally isomorphic to the identity.28 Spelling this out, it means that any way of mapping objects of a category to objects of a category that preserves the network of arrows will necessarily take objects to isomorphic objects. In this sense, then, the condition captures the idea that objects are distinguished, up to isomorphism, by their place in the network of arrows.

The category Set has the ‘G’ property. So do other “concrete” categories that one often encounters, such as the category Group of groups and group homomorphisms [26, p. 31]. But it is also easy to identify simple categories for which it fails: consider, for instance, a category with two objects and two arrows (the identity on each object). There is an autoequivalence that swaps the objects, but the objects are not isomorphic (and indeed, there are no arrows between them). Still, such examples are contrived, and it is hard to see what internal structure is being captured (or not) in such a case. And so one might ask: do categories that we might naturally associate with, say, physical theories always have the ‘G’ property?

The answer is “no”. Consider, for instance, general relativity. We might define a category of models of GR as follows: it is a category whose objects are relativistic spacetimes—that is, smooth, Hausdorff, paracompact four dimensional manifolds with smooth Lorentz-signature metrics—and whose arrows are isometries—that is, diffeomorphisms that preserve the metric. (This is the category associated with GR by, for instance, Rosenstock et al. [47].29) This category does not have the ‘G’ property. The reason is that many relativistic spacetimes have no non-trivial symmetries at all—and since the only arrows of the category are isomorphisms, these objects are not distinguished from one another. Choose any two, non-isometric, spacetimes, each with only one automorphism, and consider a functor from GR to itself that takes the first spacetime (and any spacetime isometric to it) to the second; takes the second (and any spacetime isometric to it) to the first; and acts as the identity on everything else. (We suppose the functor acts trivially on arrows.) The functor so described is an equivalence—but by construction it is not naturally isomorphic to the identity functor.30

27 I call this the ‘G’ property because it was proposed by Bob Geroch during a conversation with Hans Halvorson at a meeting in Pittsburgh in April, 2013. Essentially the same condition was also discussed, apparently independently, by Dewar and Eva [21], though their motivation for considering the condition was different: they suggested that violating it would indicate that a theory has “excess structure”. I do not engage further with their proposal here. It is also considered by mathematicians, under a different name: the ‘G’ property is precisely the condition that the automorphism class group of a category be trivial (cf. [26], Problem 1.B). (I am grateful to Hans Halvorson for bringing Freyd’s work to my attention back in 2013.)

28 Here we make use of the fact that every full, faithful, and essentially surjective functor has an almost-inverse, i.e., is an equivalence of categories. For definitions of full, faithful, and essentially surjective, see Leinster [35].

29 Below I discuss concerns about whether this category has the “right” arrows. But one might also object that it is not clear what objects the category should have, on the grounds that it is not clear if we should limit attention to spacetimes that are connected, maximal, etc. [39]. I set this issue aside here, but note that the two questions may interact in interesting ways.

So the ‘G’ property does not hold of all categories of interest—and, conversely, the category GR does not have the resources to distinguish non-isometric spacetimes. (Indeed, reflection on the argument just given suggests that GR encodes very little about generic spacetimes.) This seems to me to be a serious problem for the categorical equivalence program. The reason is that the equivalence of categories of models is supposed to capture the sense in which two theories are, mathematically, equivalent. But there is far more to even the mathematical structure of relativistic spacetimes than is encoded in the category GR.

Still, the situation is not perfectly clear. There are reasons to think that the ‘G’ property is neither necessary nor sufficient for a category to capture the “structure” of a theory. This property may seem to capture what makes Set seem suitably externalized, but it is not quite what we want. GR may well be deficient, but the ‘G’ property does not capture why.

To see that the ‘G’ property is not sufficient, consider the following two theories, with associated categories. First, we have the theory “Directions”. Directions says “the cardinal directions form a two-dimensional vector space (over the reals), with ‘north’ and ‘east’ physically distinguished”. Its category of models, Di, has, as objects, two-dimensional vector spaces with (preferred) ordered basis, and as arrows, linear bijections that preserve that (ordered) basis. Now consider the theory “Baubles”. Baubles says “there are two shiny things, one of which is red and the other of which is blue”.
Its category of models, Bau, has, as objects, ordered pairs (of distinct elements), and as arrows, bijections that preserve order. One can easily see that both Bau and Di have the ‘G’ property. This is because, in both cases, all models of the theories are (uniquely) isomorphic to one another, which means that any autoequivalence of the categories of models will necessarily take objects to isomorphic objects. But it is hard to see how either captures the structure of their respective theories. Indeed, Di and Bau are equivalent, despite the models having very different internal structures: the objects of Di are two-dimensional vector spaces, which have infinitely many elements; whereas the objects of Bau have only two elements. But the arrows of the categories do not reflect this.31 So the ‘G’ property does not seem to be sufficient for a category to have suitably captured the internal structure of its objects.

30 One might worry that this argument depends essentially on GR being a groupoid—i.e., that it has no non-invertible maps. This means that there is no information about which spacetimes might be, for instance, embeddable in one another. Perhaps by adding more arrows, such as isometric embeddings, one could produce a category that has the ‘G’ property. But I doubt it, because one could then consider spacetimes that were, roughly speaking, asymmetric at all scales. This would generate more complicated structures, but still no automorphisms. I suspect that a similar functor could be generated under these circumstances, though I do not claim that this is a proof.

31 This sort of situation arises often when one has highly structured (or highly asymmetric) objects in a category. There are very few maps available that preserve all of the relevant structure.
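For very small finite categories, Definition 1 can be tested by exhaustive search. The following sketch is a toy under stated assumptions and not anything proposed in the text: the `Category` class, the `thin_groupoid` helper, and the two example categories are all hypothetical encodings. The first example is the two-object category with only identity arrows, which can also be read as a crude stand-in for two non-isometric spacetimes with no non-trivial symmetries. The second has two uniquely isomorphic objects, loosely mimicking the situation in Bau and Di.

```python
from itertools import product


class Category:
    """A finite category given by explicit tables (a toy encoding)."""

    def __init__(self, objects, arrows, identity, compose):
        self.objects = objects    # list of object names
        self.arrows = arrows      # arrow -> (source, target)
        self.identity = identity  # object -> its identity arrow
        self.compose = compose    # (g, f) -> "g after f" for every composable pair

    def hom(self, x, y):
        return [a for a, ends in self.arrows.items() if ends == (x, y)]

    def is_iso(self, a):
        s, t = self.arrows[a]
        return any(self.compose[(b, a)] == self.identity[s]
                   and self.compose[(a, b)] == self.identity[t]
                   for b in self.hom(t, s))


def thin_groupoid(blocks):
    """Objects in one block are uniquely isomorphic; blocks share no arrows."""
    objects = [x for block in blocks for x in block]
    arrows = {(s, t): (s, t) for block in blocks for s, t in product(block, repeat=2)}
    identity = {x: (x, x) for x in objects}
    compose = {((t, u), (s, t)): (s, u)
               for block in blocks for s, t, u in product(block, repeat=3)}
    return Category(objects, arrows, identity, compose)


def is_functor(c, f0, f1):
    # Sources and targets, identities, and composition must all be preserved.
    if any(c.arrows[f1[a]] != (f0[s], f0[t]) for a, (s, t) in c.arrows.items()):
        return False
    if any(f1[c.identity[x]] != c.identity[f0[x]] for x in c.objects):
        return False
    return all(f1[h] == c.compose[(f1[g], f1[f])] for (g, f), h in c.compose.items())


def is_equivalence(c, f0, f1):
    fully_faithful = all(
        sorted(f1[a] for a in c.hom(x, y)) == sorted(c.hom(f0[x], f0[y]))
        for x in c.objects for y in c.objects)
    essentially_surjective = all(
        any(c.is_iso(a) for x in c.objects for a in c.hom(f0[x], y))
        for y in c.objects)
    return fully_faithful and essentially_surjective


def autoequivalences(c):
    names = list(c.arrows)
    for obj_images in product(c.objects, repeat=len(c.objects)):
        f0 = dict(zip(c.objects, obj_images))
        for arr_images in product(names, repeat=len(names)):
            f1 = dict(zip(names, arr_images))
            if is_functor(c, f0, f1) and is_equivalence(c, f0, f1):
                yield f0, f1


def naturally_isomorphic_to_identity(c, f0, f1):
    # Search for isomorphisms eta_x : x -> F(x) satisfying the naturality squares.
    choices = [[a for a in c.hom(x, f0[x]) if c.is_iso(a)] for x in c.objects]
    for components in product(*choices):
        eta = dict(zip(c.objects, components))
        if all(c.compose[(f1[a], eta[s])] == c.compose[(eta[t], a)]
               for a, (s, t) in c.arrows.items()):
            return True
    return False


def has_g_property(c):
    return all(naturally_isomorphic_to_identity(c, f0, f1)
               for f0, f1 in autoequivalences(c))


two_points = thin_groupoid([["A"], ["B"]])   # two objects, identity arrows only
one_class = thin_groupoid([["P", "Q"]])      # two uniquely isomorphic objects
print(has_g_property(two_points), has_g_property(one_class))  # False True
```

The search is exponential in the number of arrows, so it is only usable on examples of this size. It does make one point vivid: the second category satisfies the ‘G’ property while recording nothing about what its objects are, which is the reason the property looks insufficient.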


The Bau-Di example may seem contrived—and if so, one might think that there is some other property that it fails to have but which, in conjunction with the ‘G’ property, would capture the desired features.32 On this view, one might expect the ‘G’ property to be necessary, but not sufficient. But there are reasons to think this fails as well, as can be seen from the following, non-contrived example.

Consider the category Ring whose objects are rings and whose arrows are ring homomorphisms. Rings, recall, are Abelian groups endowed with a second operation—multiplication, as opposed to the group operation, which is called “addition” in this context—that is associative and distributive over addition.33 In general, ring multiplication is not commutative, so that the order of multiplication matters. It turns out, though, that although the order of multiplication matters, there is a certain sense in which the order is nonetheless a matter of convention. To make this observation precise, note that given any ring $R$, one can always construct an opposite ring, $R^{\mathrm{op}}$, which has precisely the same elements, but whose multiplication operation is such that for any $A, B \in R$, $A \times_R B = B \times_{R^{\mathrm{op}}} A$. Thus, we see that there is a certain intuitive sense in which the “same” multiplication relations may be captured using $\times_R$ or $\times_{R^{\mathrm{op}}}$—namely, by reading left to right versus right to left. One is tempted to say that the ring $R$ and the ring $R^{\mathrm{op}}$ have “the same” ring structure. After all, they have the same elements, and, in a sense, the same multiplicative relations, simply expressed in a different way. But in fact, a ring and its opposite are not, in general, isomorphic, and so whatever the intuitive sense in which rings and their opposites “have the same ring structure” may be, it is not captured by isomorphism.34

Instead, this relationship is captured by an autoequivalence of the category Ring. The transformation that takes rings to their opposites, and acts in the obvious way on arrows, determines a functor Op : Ring → Ring that takes rings to their opposite rings. This functor is full, faithful, and essentially surjective, and thus it is an autoequivalence (actually, it is an automorphism) of the category. But since rings are not isomorphic to their opposites, it immediately follows that the functor Op is not naturally isomorphic to the identity, because it takes objects to objects that are not isomorphic. Thus, the category Ring does not have the ‘G’ property.

32 There is another response available to the Bau-Di example, which is to say: in fact, the internal structure of the objects in these categories is not so different after all. This response is motivated by the idea that (adopting the terminology of [62]) the objects of these categories are “co-determinate”, in the sense that any two-dimensional vector space, with ordered basis, is determined “freely” by that basis, i.e., by an ordered pair; and every two-dimensional vector space with ordered basis determines, in particular, an ordered pair (consisting of the basis elements). So, perhaps, once we choose an ordered basis for a vector space, the entire vector space structure should be seen as “determined by” (or, roughly, definable from) that basis. From this perspective, that property ‘G’ holds of these categories is not a problem for property ‘G’. But alas, this response is too fast. The reason is that these categories are too rigid, and one can easily come up with other categories, equivalent to both, for which this “co-determination” relationship does not seem to hold. Consider, for instance, the category whose objects are sets with one element and whose arrows are functions preserving that element. (Or: the category 1, with a single object and a single arrow.) This category is equivalent to both Bau and Di! And yet it is hard to see how a set with one element could determine a two-dimensional vector space in any interesting sense (since the free vector space on one element is one dimensional). I am grateful to Thomas Barrett for pushing me on this point.

33 Rings are also generally taken to have a multiplicative identity.

34 In fact, although many examples of rings that are not isomorphic to their opposites are known, they are not exactly trivial to state. See Jacobson [31, Sect. 2.8] or Lam [32, Sect. 1]. I believe it was Hans Halvorson who first brought this example to my attention. Observe that groups, too, have opposites, defined in a similar way, but in general groups are isomorphic to their opposites, where the isomorphism takes group elements to their inverses.

One response to this example would be to concede that the category Ring does not adequately capture the structure of rings. But I think this is too fast. As I suggested above, the functor Op seems to take rings to other rings that are, in some sense, “the same”, but where that sense of sameness is not captured by isomorphism. In other words, one might think of this autoequivalence as reflecting a real “symmetry” of the theory of rings.35 Far from failing to capture the structure of rings with the arrows of the category, the Op functor shows that the network of arrows captures a non-trivial sense in which non-isomorphic rings can nonetheless be the same. Of course, this is not a proof of anything, because I have not provided a definitive argument that Ring does capture the structure of the theory of rings. The argument is merely suggestive that the ‘G’ property is not necessary. But whatever else is the case, the failure of the ‘G’ property in this case has a very different character from that in GR, and it is much less obvious, in light of the Ring example, that the ‘G’ property is really capturing what is wrong with the category GR, as a representation of general relativity.

Stepping back from this discussion, one might reasonably ask: if the ‘G’ property is neither necessary nor sufficient for a category to have the features that we are interested in, then who cares? I think the considerations just raised show that the ‘G’ property is not quite what we want.
But this does not change the fact that some categories, such as Set, Group, and even Ring, seem to have some feature that GR appears to lack, regarding the way in which the network of arrows reflects or expresses the structure of the objects. And this leads to a number of questions: Is there a property that better captures the intuition that motivated the ‘G’ property? Do all “natural” “concrete” categories (such as Man) share these features? Does any physical theory’s category of models have this property?
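The opposite construction used in this section is easy to state in code. The sketch below is illustrative only, with hypothetical helper names throughout. It forms the opposite multiplication for 2x2 integer matrices and for permutations of three points, and checks two standard facts on finite samples: transposition carries matrix multiplication to its opposite, and inversion does the same for a group. Note the assumption this rests on: a matrix ring is isomorphic to its own opposite (via the transpose), so this is an instance of the definition and not one of the non-isomorphic examples cited from Jacobson and Lam.

```python
from itertools import permutations, product


def opposite(mul):
    """Multiplication of the opposite structure: x *_op y is defined as y * x."""
    return lambda x, y: mul(y, x)


def mat_mul(a, b):
    """Product of 2x2 integer matrices stored as tuples of rows."""
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))


def transpose(a):
    return tuple(zip(*a))


def perm_mul(p, q):
    """Composition of permutations of {0, ..., n-1}: first q, then p."""
    return tuple(p[q[i]] for i in range(len(q)))


def perm_inv(p):
    inverse = [0] * len(p)
    for i, image in enumerate(p):
        inverse[image] = i
    return tuple(inverse)


# A finite sample of the (infinite) ring of 2x2 integer matrices.
matrices = [((a, b), (c, d)) for a, b, c, d in product(range(-1, 2), repeat=4)]
s3 = list(permutations(range(3)))

# Order of multiplication matters in this ring.
noncommutative = any(mat_mul(a, b) != mat_mul(b, a)
                     for a, b in product(matrices, repeat=2))

# Transpose sends A * B to A^T *_op B^T, so it maps R to R^op multiplicatively.
transpose_respects_op = all(
    transpose(mat_mul(a, b)) == opposite(mat_mul)(transpose(a), transpose(b))
    for a, b in product(matrices, repeat=2))

# Inversion plays the same role for groups, as noted in footnote 34.
inverse_respects_op = all(
    perm_inv(perm_mul(p, q)) == opposite(perm_mul)(perm_inv(p), perm_inv(q))
    for p, q in product(s3, repeat=2))

print(noncommutative, transpose_respects_op, inverse_respects_op)  # True True True
```

The group case always admits such a map because every element has an inverse. A general ring has no analogous canonical map, which is why a ring and its opposite can fail to be isomorphic even though the functor Op is an autoequivalence of Ring.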

35 Some readers might be tempted, in light of this, to revise the notion of “isomorphism” associated with rings, so that all rings that are suitably “the same” are isomorphic. But this strikes me as a disastrous proposal. A ring homomorphism that could not distinguish left multiplication from right multiplication would wash out the structure of non-commutative rings!

18.6 Where Do We Go from Here?

In the paper thus far, I have proposed a certain informal sense in which some categories might be said to adequately represent the structure of a theory, and argued that the categories one encounters in many discussions of categorical equivalence in the philosophy of science literature do not seem to have the necessary features. I have also proposed a formal condition intended to make the reasoning just sketched precise, but I have argued that this condition cannot be what we want. This leads to a rather unsatisfactory situation: it seems that there is a sense in which the criterion of categorical equivalence, as discussed in the philosophical literature, is inadequate as a criterion of theoretical equivalence; and yet, it is not clear how the criterion should be modified, nor even how to precisely express how it fails.

In the present section, I will sketch three ways of responding to this situation. Some philosophers have already begun exploring each of these three options. But it seems to me that these options reflect importantly different conceptions of what it means to associate a category with a theory. Thus, I think that the merits of each need to be carefully weighed in future work on this subject.

The first possible way forward would be to pursue the program suggested by Sect. 18.5: we could find a ‘G’-like property that distinguishes Set from GR in a salient way, but which also respects the sort of symmetry exhibited by Ring. We would then say that categorical equivalence yields theoretical equivalence only for theories whose associated categories satisfy the ‘G’-like property. The main advantage of this approach is that it would capture what makes Set distinctive, and it would help directly diagnose the problem with GR. In a sense, it is the brute-force solution to the problem.

On the other hand, this approach has a number of unattractive features. Perhaps the most immediate is simply this: what could the property be? One reason to be skeptical that any such property exists is that the idea the property is meant to express—that a category adequately represents the structure of a theory—is not obviously an intrinsic property of a category at all. Instead, the feature we wish to capture concerns a relation that we wish to see between a category and a theory.
We want to know not just about the category itself, but whether it has certain capacities relative to some theory or other. Of course, this is just a rough implausibility argument; it could well turn out that, precisely because categories with certain properties do capture the structure of some theory, their capacity to do so is manifest, as it were, internally. But let us suppose this strategy were successful, in the sense that some property could be found. There are still reasons to doubt that it is the right way forward for the program. First, we already have good reason to expect that this approach would limit the applicability of categorical equivalence, since the example of GR already suggests that categories that we might be interested in using to represent physical theories are unlikely to have the requisite property. Indeed, it is not clear that we should expect any physical theory, encountered in the wild, to naturally be associated with a category satisfying the sort of property envisaged by this program. If this is right, then it would suggest that some or all of the theories for which categorical equivalence has already been used would no longer count as categorically (or theoretically) equivalent. Fair enough, one might say: that is progress in our understanding of the relationship between these theories. But if this is right, then the apparent successes of the categorical equivalence program become a mystery: as I noted above, it seems to me that categorical equivalence has given the correct verdicts in the cases to which it has been applied. If categorical equivalence, as a criterion, is restricted only to theories whose categories have a certain property, and none of the theories considered thus far have that property, then we need to start from scratch in understanding in what sense the theories are equivalent.


So much for the first option, which seems to me to fail even if it succeeds.36 This leads to a second option, which would be to change the criterion of equivalence not by limiting attention to categories with certain properties, but by restricting attention to functors with certain properties. In particular, this is the sort of proposal that Hudetz [29] has defended: recall that on Hudetz’s proposal, one should require that two theories are equivalent just in case they are categorically equivalent, where the functor realizing that equivalence is definable.37 This leads to a criterion of equivalence that Hudetz calls definable categorical equivalence. This approach has some virtues. From this point of view, the problems described in previous sections arise not so much because some categories fail to capture the structure of theories, but rather because we compare those categories using a poorly behaved criterion.38 One might then conjecture that if we limited attention only to definable functors, the “problematic” examples of functors that realized (auto)equivalences in examples such as GR would go away, and we would be left only with examples such as Ring, where the failure of the ‘G’ property did not seem to rule out the possibility that the category captured the internal structure of the objects of the category. But attractive as this proposal may seem, we should be careful about what, exactly, it amounts to. Recall the ideology above: I argued that to understand the “structure” of a mathematical gadget, we must study the maps that we take to preserve that structure. In general, by changing the “structure preserving” maps we consider, we are implicitly changing the structures preserved by those maps. And this is precisely what we are doing when we move from functors to definable functors as maps between categories. Presumably something like this can be done, but we need to be careful and explicit. 
In particular, no one has clearly articulated what sorts of mathematical gadgets are related by definable functors.39 And this is a concern not only for conceptual reasons, but also for technical ones. For a functor F : C → D to be definable, certain properties must hold concerning languages associated with the objects of C and D. But a generic category does not have a language associated with its objects.40 So in general, how can we evaluate whether a functor is definable? The situation is strongly analogous to noting that not all functions between sets are well-behaved, and then restricting attention to continuous or smooth functions without first defining a notion of “topological space” or “manifold”. To properly define a notion of definable functor, we first need to introduce a new kind of structure, a Hudetz category, where Hudetz category structure is whatever is preserved by definable functors.

36 That said: there remains an interesting question raised by this first approach, which is: if categories C and D are equivalent, with F : C → D realizing that equivalence, then what, if any, structural relationship holds between objects c in C and F(c) in D? I am grateful to Thomas Barrett for emphasizing this point, which I completely endorse.

37 In a recent talk, Thomas Barrett described a similar, but distinct, program, on which it is “well-behaved” functors that realize equivalences. I will not attempt to reconstruct (or scoop) his ideas here, but note only that it is another proposal that falls into this second category, or, perhaps, somewhere in between the first and second approaches, depending on how it is spelled out.

38 One might be tempted by a possible resonance with the previous proposal, and try to modify the ‘G’ property, using the notion of definable functor, as follows: a category satisfies the ‘H’ property if every full, faithful, and essentially surjective definable functor F : C → C is naturally isomorphic to 1_C. But this proposal is unlikely to work, since the functor Op : Ring → Ring apparently counts as definable.

39 I do not mean to criticize Hudetz here. He is explicit about the assumptions he is making when he defines definable functors, and makes clear why in the cases of interest, definable functors are well-defined. But I read his assumptions as sufficient conditions for making sense of definable functors, which is weaker than a theory of the sorts of structures that definable functors relate.

These remarks are not meant to dismiss the proposal. To the contrary, I think it is a fruitful one to pursue. But the remarks do suggest that the proposal is incomplete, and they raise questions about how much help the definable functor program will be. In particular, as I have argued above, theories in the wild can, arguably, be associated with categories. But it is much less clear that they can be associated with Hudetz categories, since whatever else is the case, it seems Hudetz categories will require one to specify a (possibly higher order) language associated with a physical theory, and it is not clear that there is a canonical choice of such a language.41 But suppose that this problem can be surmounted. Then, even if we suppose we can associate a Hudetz category with any physical theory in a natural way, it seems the work will be done in identifying and justifying the choice of a language and establishing the necessary definability properties, which raises the concern that category theory plays little role. On the other hand, it is possible that although the proposal is still incomplete, it is unproblematic, for independent reasons. Consider, again, the following analogy: suppose a “definable functor” is a bit like a “continuous function”. Of course, we need a topology to make sense of continuous functions.
But some spaces, such as R, come equipped with a canonical topology, or a unique topology compatible with other structure. And some functions of interest on R, such as polynomials, are all automatically continuous in that topology. One might hope or expect that, although we have not yet defined the structure analogous to “topology” on categories of interest, once we do so, we will find that there was a unique or canonical choice, and that the functors that seemed to be the ones of interest will automatically count as “definable” or otherwise well-behaved. Indeed, it might be that we want to generate our definitions so that this turns out to be the case. This possibility strikes me as the most optimistic for the program, though its status remains unclear.

40 Observe that for some categories (toposes) there is an “internal language” associated with the category. But this notion of internal language is not the same as the notion of “language of objects of a category” associated with definable functors.

41 Hudetz has recently made some progress in this direction: see, for instance, [30].

Finally, I will now turn to the third option for a path forward. To begin, consider again the examples of “successes” mentioned above: Einstein algebras and general relativity; Lagrangian and Hamiltonian mechanics; and so on. Investigating the proofs of categorical equivalence in each of these cases, we find that the crucial step relies on some background, often deep, mathematical fact. For instance, the relationship between vector potential and electromagnetic field formulations of electromagnetism ultimately comes down to Poincaré’s lemma. The relationship between “standard” and geometrized Newtonian gravitation ultimately depends on Trautman’s theorems. The relationship between Lagrangian and Hamiltonian mechanics depends on the Legendre transformation, and that between Einstein algebras and general relativity is a special case of function-space duality. And so on. But if it is really these relationships that are in the background, what is added by proving that there is a categorical equivalence (or inequivalence)? The answer is that, in establishing a categorical equivalence, we show that these relationships are functorial, and then determine whether those functors are full, faithful, and essentially surjective. In other words, one attempts to show that the mappings on objects determined by the relationships in question take every model of each theory to an essentially unique model of the other theory; and that they do so in such a way that every structure-preserving map between the models of one theory corresponds uniquely to a structure-preserving map between the corresponding models of the other theory, and vice versa. These are natural things to (try to) establish about any mathematical relationship, and establishing whether they hold in a particular case can underwrite a claim that a given relationship really does capture a sense in which two theories are equivalent. Abstracting, then, from this discussion, one might say that what we are really doing when we establish that physical theories are categorically equivalent is abstracting “pure category” structure from a richer characterization of theories, and using that category structure to provide a heuristic for evaluating relationships of prior interest. (I call this idea “Rosenstock’s heuristic”, because I think this perspective is adopted in much of Sarita Rosenstock’s work on categorical equivalence [46].) One way of thinking about this proposal would be to say that categorical equivalence is necessary for equivalence of theories, but that it may not be sufficient.
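The three conditions on functors invoked above can be stated explicitly. What follows is the standard textbook formulation, not anything specific to this chapter; the symbols $\mathbf{C}$, $\mathbf{D}$, $F$, $c$, $c'$, $d$ and the name $F_{c,c'}$ are notation assumed here for illustration.

```latex
% Standard definitions; all notation is assumed for illustration only.
Let $F : \mathbf{C} \to \mathbf{D}$ be a functor. For objects $c, c'$ of
$\mathbf{C}$, write
\[
  F_{c,c'} : \mathrm{Hom}_{\mathbf{C}}(c, c') \longrightarrow
             \mathrm{Hom}_{\mathbf{D}}\bigl(F(c), F(c')\bigr),
  \qquad f \longmapsto F(f).
\]
\begin{itemize}
  \item $F$ is \emph{full} if every $F_{c,c'}$ is surjective.
  \item $F$ is \emph{faithful} if every $F_{c,c'}$ is injective.
  \item $F$ is \emph{essentially surjective} if for every object $d$ of
        $\mathbf{D}$ there is an object $c$ of $\mathbf{C}$ with
        $F(c) \cong d$.
\end{itemize}
% Assuming the axiom of choice, a functor with all three properties
% realizes an equivalence of categories.
```

On this reading, fullness and faithfulness together express the requirement that structure-preserving maps correspond uniquely, while essential surjectivity expresses that every model of the second theory is reached up to isomorphism.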
Sufficiency, meanwhile, requires a more subtle and contextual analysis of the proposed relationships between theories, one that, in practice, often involves establishing that the relationships under consideration are already known to preserve “empirical content” in some substantive, but context-dependent, way. In other words, much of the work is done by showing that there are alternative formulations of a theory that are, in some suitable sense, empirically equivalent; establishing that the relationship realizing that empirical equivalence is also a categorical equivalence, then, provides a still stronger sense in which the theories should be said to be equivalent.42 And if the functor is not an equivalence, then one can use the heuristic to better understand what is “lost” as one moves from one formulation to the other.43 From this perspective, the examples of GR and Ring have little bearing on the equivalence relationships under consideration: categorical equivalence is, in a sense, secondary: something we seek to establish only after determining that two theories are empirically equivalent. The fact that there exist apparently pathological autoequivalences of GR is irrelevant because those autoequivalences are pathological precisely because they do not preserve empirical content. They simply do not realize the sort of relationships that we are interested in. And whether there are autoequivalences of Ring is of interest only if it turns out that Ring is associated with some physical theory, and those autoequivalences preserve empirical content. Without that, they, too, are irrelevant. Similarly, understanding categorical equivalence in terms of the Rosenstock heuristic explains why the “successes” noted above were, in fact, successes: they were all cases in which there existed a salient relationship between theories, the status of which was clarified by recasting it in categorical terms.

42 Compare this perspective to classic arguments due to Sklar [50], recently amplified by, for instance, Coffey [19], Nguyen [41], and Butterfield [18], to the effect that a “purely formal” criterion of equivalence could never be adequate. (Recall note 6.) Here it is a semantic relationship (that is, a relationship between the interpreted, applied theories) that is ultimately the starting point, and then the formal methods are a guide to evaluating such relationships. Consider, too, a connection to [44], which argues that empirically equivalent theories are more or less certain to be equivalent in some stronger sense, or else to differ in ways that make one more clearly preferable; from the present perspective, categorical (in)equivalence is a way of establishing how much, if anything, is missing from some empirical equivalence.

Still, there are disadvantages to adopting this perspective that are important to recognize. Perhaps the most significant disadvantage is that on this view, much (but not all) of the work in establishing that two theories are theoretically equivalent falls back on the murky question of whether those theories are empirically equivalent, which arguably makes the criterion of equivalence vaguer than it at first appears. (On the other hand, insofar as empirical equivalence is necessary for theoretical equivalence, all of the approaches discussed in this section face this worry.)
A related concern is that, on the other approaches discussed, categorical equivalence is meant to capture some precise sense in which the mathematical structures used by two theories are equivalent, qua mathematical structures; empirical equivalence, then, establishes merely that in addition to being equivalent qua mathematics, the structures are used in compatible ways for representational purposes. On the present perspective, this relationship is reversed. The claim that categorical equivalence, or some modification thereof, should be expected to capture some robust notion of mathematical equivalence is dispensed with, which makes the significance of categorical equivalence more obscure.

This last set of remarks points to a significant difference between the Rosenstock heuristic and both the ‘G’ property and Hudetz category approaches. On the first two approaches, a category of models, satisfying some further properties or endowed with some further structure, is intended to capture or represent a theory, full stop. Implicit, I think, in these approaches is the idea that a theory (or at least, a mathematical theory, though perhaps also a physical theory) is the sort of thing that admits of some adequate, precise characterization, once and for all.44 To pursue these approaches would be to evaluate whether various candidate representations of a theory succeed. But the third approach I have discussed simply sets this idea aside. One does not need to suppose that a theory is or can be represented by a given category in general; one merely needs to assume that for the purposes of evaluating certain features of a proposed relationship between theories, it is valuable to represent a theory by a category.

43 On this point, see the discussions in Baez et al. [4], Weatherall [57], Nguyen et al. [42], and Bradley and Weatherall [14].

44 Defenders of these approaches might well balk at this point. Do they really need to be committed to the view that categories of models are representations of theories “once and for all”? But if the goal is to determine if two theories are equivalent as theories, then presumably that condition needs to capture everything salient about the pairs of theories. If the goal is to offer a weaker notion of equivalence then much more needs to be said about what features the standard establishes equivalence with regard to. This is what empirical equivalence offered: equivalence with regard to the predictions made by two theories, without implying “full” equivalence. One possible line, here, would be to say that categorical equivalence and its various elaborations are attempting to capture a kind of “structural equivalence”, though I think more needs to be said about just what that means.

18.7 Conclusion

My goal in this paper has been to critically re-evaluate categorical equivalence as a criterion of theoretical equivalence for physical theories. My worries turn on a prior question, of whether a category of models can be said to adequately capture the structure of a (mathematical) theory. I have argued that the answer to this question is “no”, at least in the general case, which then leads to a number of further questions. One such question is whether one can express, as a precise condition on a category, a necessary or sufficient condition for that category to encode the internal structure of its objects. Another question is where we should go from here, supposing that one accepts my arguments. On this latter question, I offer three possible paths, each of which, I think, is suggested by work already in the literature, and I discuss some advantages and disadvantages of each.

I will conclude with two remarks. One is just to clarify that the possible paths forward that I propose are not mutually exclusive, or, likely, exhaustive. Indeed, I think all three should be pursued, and that the fruits of each will bear on the others.45 That said, as I have argued in the last section, these proposals seem to turn on different conceptions of what the purpose of introducing a category of models of a physical theory is meant to be, and so some care will be needed in future work to keep these different goals clearly in sight. The second remark is that I wish to emphasize the tentative nature of the arguments here. I am expressing worries, not proving theorems or even defending particular views. From one perspective, this may make the paper seem unsatisfying or unclear. I am sympathetic. But I think the real issue, which I have tried to bring forward here, is that a program that has received considerable attention in recent years remains underspecified and inchoate, and it is my hope that the considerations raised here help move the project forward.

45 In particular, one might worry that the third approach brushes too many foundational questions aside, and that although it is pragmatically attractive, we should still be interested in the answers to those questions, which the first two approaches may yet yield.

Acknowledgements I am grateful to Thomas Barrett, Lukas Barth, Neil Dewar, Ben Eva, Ben Feintzeig, Hans Halvorson, Laurenz Hudetz, David Malament, Toby Meadows, and Sarita Rosenstock for many helpful conversations in connection with this material, and to Hajnal Andréka, Thomas Barrett, István Németi, and an anonymous referee for detailed comments on a previous draft. A version of the paper was presented at a workshop at the Munich Center for Mathematical Philosophy; I am grateful to the organizers and the audience for their valuable feedback.

References

1. Andréka, H., & Németi, I. (2014). Comparing theories: the dynamics of changing vocabulary. In: Johan van Benthem on logic and information dynamics (pp. 143–172). Springer.
2. Andréka, H., & Németi, I. (2014). Definability theory course notes. Available at https://old.renyi.hu/pub/algebraic-logic/DefThNotes0828.pdf.
3. Awodey, S., & Forssell, H. (2013). First-order logical duality. Annals of Pure and Applied Logic, 164(3), 319–348.
4. Baez, J., Bartel, T., & Dolan, J. (2004). Property, structure, and stuff. Available at: https://math.ucr.edu/home/baez/qg-spring2004/discussion.html.
5. Baez, J., & Schreiber, U. (2007). Higher gauge theory. In A. Davydov (Ed.), Categories in Algebra, Geometry, and Mathematical Physics (pp. 7–30). Providence, RI: American Mathematical Society.
6. Bain, J. (2003). Einstein algebras and the hole argument. Philosophy of Science, 70(5), 1073–1085.
7. Barrett, T. (2014). On the structure of classical mechanics. The British Journal for the Philosophy of Science, 66(4), 801–828.
8. Barrett, T. W. (2019). Equivalent and inequivalent formulations of classical mechanics. The British Journal for the Philosophy of Science, 70(4), 1167–1199.
9. Barrett, T. W., & Halvorson, H. (2016). Morita equivalence. The Review of Symbolic Logic, 9(3), 556–582.
10. Barth, L. (2018). Master’s thesis, University of Heidelberg.
11. Belot, G. (1998). Understanding electromagnetism. The British Journal for the Philosophy of Science, 49(4), 531–555.
12. Benacerraf, P. (1965). What numbers could not be. Philosophical Review, 74(1), 47–73.
13. Bradley, C. (2019). The non-equivalence of Einstein and Lorentz. The British Journal for the Philosophy of Science. Forthcoming. https://doi.org/10.1093/bjps/axz014.
14. Bradley, C., & Weatherall, J. O. (2020). On representational redundancy, surplus structure, and the hole argument. Foundations of Physics, 50(4), 270–293.
15. Brighouse, C. (2020). Confessions of a (cheap) sophisticated substantivalist. Foundations of Physics, 50(4), 348–359.
16. Brunetti, R., Fredenhagen, K., & Verch, R. (2003). The generally covariant locality principle: a new paradigm for local quantum field theory. Communications in Mathematical Physics, 237(1–2), 31–68.
17. Burgess, J. P. (2015). Rigor and structure. New York: Oxford University Press.
18. Butterfield, J. (2019). On dualities and equivalences between physical theories. In: Huggett, N., & Wüthrich, C. (Eds.), Spacetime after quantum gravity. Forthcoming.
19. Coffey, K. (2014). Theoretical equivalence as interpretive equivalence. The British Journal for the Philosophy of Science, 65(4), 821–844.
20. Curiel, E. (2013). Classical mechanics is Lagrangian; it is not Hamiltonian. The British Journal for the Philosophy of Science, 65(2), 269–321.
21. Dewar, N., & Eva, B. (2017). A categorical perspective on symmetry and equivalence.
22. Earman, J. (1986). Why space is not a substance (at least not to first degree). Pacific Philosophical Quarterly, 67(4), 225–244.
23. Earman, J. (1989). World enough and space-time. Boston: The MIT Press.


24. Earman, J., & Norton, J. (1987). What price spacetime substantivalism? The hole story. The British Journal for the Philosophy of Science, 38(4), 515–525.
25. Formica, G., & Friend, M. (2020). In the footsteps of Hilbert: The Andréka-Németi group’s logical foundations of theories in physics. In J. X. Madarász & G. Székely (Eds.), Hajnal Andréka and István Németi on unity of science: from computing to relativity theory through algebraic logic. Heidelberg: Springer.
26. Freyd, P. J. (1964). Abelian categories, vol. 1964. New York: Harper & Row.
27. Geroch, R. (1972). Einstein algebras. Communications in Mathematical Physics, 26, 271–275.
28. Halvorson, H. (2012). What scientific theories could not be. Philosophy of Science, 79(2), 183–206.
29. Hudetz, L. (2019a). Definable categorical equivalence. Philosophy of Science, 86(1), 47–75.
30. Hudetz, L. (2019b). The semantic view of theories and higher-order languages. Synthese, 196(3), 1131–1149.
31. Jacobson, N. (1951). Lectures in abstract algebra, vol. 1: basic concepts. The University Series in Higher Mathematics. Princeton, NJ: D. Van Nostrand Co., Inc.
32. Lam, T.-Y. (2013). A first course in noncommutative rings (Vol. 131). Heidelberg: Springer Science & Business Media.
33. Lawvere, F. W. (1964). An elementary theory of the category of sets. Proceedings of the National Academy of Sciences, 52(6), 1506–1511.
34. Lefever, K., & Székely, G. (2019). On generalization of definitional equivalence to non-disjoint languages. Journal of Philosophical Logic, 48(4), 709–729.
35. Leinster, T. (2014). Basic category theory. Cambridge: Cambridge University Press.
36. Lurie, J. (2018). Ultracategories. https://www.math.harvard.edu/~lurie/papers/Conceptual.pdf.
37. Mac Lane, S. (1998). Categories for the working mathematician (2nd ed.). New York: Springer.
38. Makkai, M. (1993). Duality and definability in first order logic. Providence, RI: American Mathematical Society.
39. Manchak, J. B. (2020). General relativity as a collection of collections of models. In J. X. Madarász & G. Székely (Eds.), Hajnal Andréka and István Németi on unity of science: from computing to relativity theory through algebraic logic. Heidelberg: Springer.
40. Nestruev, J. (2003). Smooth manifolds and observables. Berlin: Springer.
41. Nguyen, J. (2017). Scientific representation and theoretical equivalence. Philosophy of Science, 84(5), 982–995.
42. Nguyen, J., Teh, N. J., & Wells, L. (2020). Why surplus structure is not superfluous. The British Journal for the Philosophy of Science, 71(2), 665–695.
43. North, J. (2009). The ‘structure’ of physics: A case study. Journal of Philosophy, 106(2), 57–88.
44. Norton, J. (2008). Must evidence underdetermine theory? In J. A. Kourany, M. Carrier, & D. Howard (Eds.), The challenge of the social and the pressure of practice: Science and values revisited (pp. 17–44). Pittsburgh, PA: University of Pittsburgh Press.
45. Norton, J. D. (2011). The hole argument. In: Zalta, E. N. (Ed.), The Stanford Encyclopedia of Philosophy, Fall 2011 Edition. https://plato.stanford.edu/archives/fall2011/entries/spacetime-holearg/.
46. Rosenstock, S. (2019). A categorical consideration of physical formalisms. Ph.D. thesis, University of California, Irvine.
47. Rosenstock, S., Barrett, T. W., & Weatherall, J. O. (2015). On Einstein algebras and relativistic spacetimes. Studies in History and Philosophy of Science Part B: Studies in History and Philosophy of Modern Physics, 52, 309–316.
48. Rosenstock, S., & Weatherall, J. O. (2016). A categorical equivalence between generalized holonomy maps on a connected manifold and principal connections on bundles over that manifold. Journal of Mathematical Physics, 57(10), 102902.
49. Rynasiewicz, R. (1992). Rings, holes and substantivalism: On the program of Leibniz algebras. Philosophy of Science, 59(4), 572–589.
50. Sklar, L. (1982). Saving the noumena. Philosophical Topics, 13(1), 89–110.
51. van Fraassen, B. (1980). The scientific image. Oxford: Oxford University Press.


52. van Fraassen, B. (2008). Scientific representation. Oxford: Oxford University Press.
53. van Oosten, J. (2002). Basic category theory. BRICS Lecture Series LS-95-01. https://www.staff.science.uu.nl/ooste110/www/syllabi/catsmoeder.pdf.
54. Weatherall, J. O. (2016a). Are Newtonian gravitation and geometrized Newtonian gravitation theoretically equivalent? Erkenntnis, 81(5), 1073–1091.
55. Weatherall, J. O. (2016b). Fiber bundles, Yang-Mills theory, and general relativity. Synthese, 193(8), 2389–2425.
56. Weatherall, J. O. (2016c). Regarding the hole argument. The British Journal for the Philosophy of Science, 69(2), 329–350.
57. Weatherall, J. O. (2016d). Understanding gauge. Philosophy of Science, 83(5), 1039–1049.
58. Weatherall, J. O. (2017). Category theory and the foundations of classical space-time theories. In E. Landry (Ed.), Categories for the Working Philosopher (pp. 329–348). Oxford: Oxford University Press.
59. Weatherall, J. O. (2019a). Theoretical equivalence in physics, part 1. Philosophy Compass, 14(5), e12592.
60. Weatherall, J. O. (2019b). Theoretical equivalence in physics, part 2. Philosophy Compass, 14(5), e12591.
61. Weatherall, J. O. (2020). Some philosophical prehistory of the (Earman-Norton) hole argument. Studies in History and Philosophy of Modern Physics, 70, 79–87.
62. Winnie, J. A. (1986). Invariants and objectivity: A theory with applications to relativity and geometry. In R. Colodny (Ed.), From Quarks to Quasars (pp. 71–180). Pittsburgh: University of Pittsburgh Press.

James Owen Weatherall is Professor of Logic and Philosophy of Science at the University of California, Irvine, where he is also a member of the Institute for Mathematical Behavioral Science, the Center for Cosmology, and the Jack W. Peltason Center for the Study of Democracy. His principal research interests concern topics in philosophy of science, philosophy of physics, and mathematical physics.

Chapter 19

Time Travelling in Emergent Spacetime

Christian Wüthrich

Abstract Most approaches to quantum gravity suggest that relativistic spacetime is not fundamental, but instead emerges from some non-spatiotemporal structure. This paper investigates the implications of this suggestion for the possibility of time travel in the sense of the existence of closed timelike curves in some relativistic spacetimes. In short, will quantum gravity reverse or strengthen general relativity’s verdict that time travel is possible?

Keywords Time travel · Quantum gravity · General relativity · Emergence · Semi-classical quantum gravity · Causal set theory · Loop quantum gravity · String theory

19.1 Introduction

General relativity (GR), our currently best theory of gravity and of spacetime, permits time travel into one’s own past in the sense that it contains models of spacetime with closed timelike curves, i.e., worldlines potentially traced out by matter in spacetime, which intersect themselves. If a particle follows such a closed worldline, it returns not only to its earlier position in space (which is common enough), but in spacetime, i.e., also to its earlier position in time. An early example of such a spacetime is what has become known as Gödel spacetime [11], but in fact there are innumerably many such solutions in GR.1 Should we thus conclude that time travel is, in fact, physically possible, i.e., in accord with the laws of nature? We should not, as there are good reasons to think that, despite its phenomenal empirical success, GR is not the last word on gravity and on the fundamental structure of what plays the role of spacetime: GR assumes that matter has essentially classical properties, e.g., by having a determinate spatiotemporal distribution. But of course we have learned from quantum physics that matter degrees of freedom behave rather differently. Thus, at a minimum, GR ought to be replaced by a theory which can accommodate the quantum nature of matter. It is for this simple but conclusive reason that we need a quantum theory of gravity.2 No such theory yet exists in its fully articulated and empirically confirmed form, but candidate theories are string theory, loop quantum gravity, causal set theory, and many more.

Thus arises the question of whether these candidates for a more fundamental theory of gravity permit time travel in the same or a similar sense as does GR. In fact, there are two ways in which a quantum theory of gravity might do so. First, it may permit time travel by admitting models which contain the (analogue of) closed timelike curves. In this case, time travel would accord with the laws of nature stipulated by that theory. This would straightforwardly license time travel’s physical possibility. Second, although that theory itself may prohibit time travel in this same sense, it could allow for the emergent relativistic spacetime (which well approximates the fundamental structure at some scale) to contain closed timelike curves. Although the fundamental theory would then remain inhospitable to time travel itself, it would tolerate the possibility of time travel at some other, less fundamental, scale. It is this possibility in particular that I wish to explore in this article.

1 For a recent review, see [30].

C. Wüthrich, Department of Philosophy, University of Geneva, Geneva, Switzerland
© Springer Nature Switzerland AG 2021. J. Madarász and G. Székely (eds.), Hajnal Andréka and István Németi on Unity of Science, Outstanding Contributions to Logic 19, https://doi.org/10.1007/978-3-030-64187-0_19

After setting things up in Sect. 19.2, I will introduce four theories of quantum gravity in Sect. 19.3: semi-classical quantum gravity, causal set theory, loop quantum gravity, and string theory, and discuss the possibility of time travel directly in those theories. In Sect. 19.4, I will turn to the second possibility, viz., that these theories themselves disallow time travel, but fail to prevent it at the emergent level.
Conclusions follow in Sect. 19.5.

19.2 Global Hyperbolicity and Energy Conditions

Does a theory succeeding GR include or exclude closed timelike curves and similar causal pathologies within the bounds of what it deems physically possible? Since, despite valiant efforts, no quantum theory of gravity has been fully articulated, let alone empirically confirmed, our discussion must remain preliminary and speculative. Still, from considering candidate theories and their presumed verdict on the question, we hope to glean intimations of an answer and at least start to survey the dialectical landscape of possibilities.

There are at least three ways in which quantum gravity may prejudge the case for or against the possibility of time travel. First, it may rule it out by fiat by imposing global hyperbolicity or a kindred mathematical condition. Such an imposition may be metaphysically motivated to rule out causal pathologies, or it may be occasioned by the pragmatic desire to apply a particular mathematical apparatus, which requires the condition. This a priori restriction to causally benign structures may, of course, eventually be justified a posteriori by the empirical success of the theory.

Second, it may be the case that although no such condition is demanded at the outset, it can be derived from the resources of the theory itself. In particular, time travel may be ruled out as a consequence of well-justified assumptions concerning what is physically reasonable or even possible. In a situation like this, it might appear as if time travel is ruled out on physical grounds, and that causality-violating spacetimes ought to be deemed unphysical artefacts of the mathematical formalism of overly permissive GR.3 Although suggestive, we will see in Sect. 19.4 that this may not follow.

The third possibility is that we find closed timelike curves (or an analogous feature) prevalent in the more fundamental theory of quantum gravity, or at least in its physical applications. This outcome would suggest (though perhaps again not entail, see Sect. 19.4 below) that the intriguing possibility of violations of causality, first encountered in GR, may remain in quantum gravity.

Before turning to approaches to quantum gravity in the next section, the remainder of this section (Sect. 19.2.1) offers a brief discussion at the classical level of two notions central for the possibility of closed timelike curves and thus of time travel: global hyperbolicity and so-called ‘energy conditions’.

2 And much less for a whole list of other reasons routinely given in the literature, and critically discussed by [15, 22, 37].

19.2.1 At the Classical Level

Let me start by fixing some terminology. A relativistic spacetime $\langle M, g_{ab}\rangle$, described by a four-dimensional, differentiable manifold $M$ with a metric $g_{ab}$ of Lorentz signature, is time orientable just in case it permits a globally consistent time direction in the form of an everywhere defined continuous timelike vector field. Once a temporal direction is picked as the ‘future’, the time orientation of such a spacetime is thereby determined. A worldline is a continuous timelike curve whose orientation agrees with the time orientation of the spacetime in which it is contained. A closed timelike curve is a closed worldline.

The existence of closed timelike curves in a spacetime marks the violation of a so-called ‘causality condition’. It turns out that there is a whole hierarchy of stronger and stronger causality conditions ([13], Sects. 6.4–6.6; [33], Sects. 8.2–8.3). The weakest condition requires that there are no closed timelike curves. The strongest condition demands that the spacetime be ‘globally hyperbolic’. Thus, if a spacetime is globally hyperbolic, it does not contain closed timelike curves.

Let us unpack the notion of global hyperbolicity. A spacelike hypersurface $\Sigma \subset M$ with no edges is called a global time slice. If such a global time slice $\Sigma$ is achronal, i.e., if it is not intersected more than once by any future-directed causal curve, it is called a partial Cauchy surface. The future domain of dependence $D^{+}(\Sigma)$ of a partial Cauchy surface $\Sigma$ is the set of events $p \in M$ such that every past inextendible causal curve through $p$ intersects $\Sigma$. The past domain of dependence $D^{-}(\Sigma)$ is defined analogously. A partial Cauchy surface $\Sigma$ is a Cauchy surface just in case the total domain of dependence $D^{+}(\Sigma) \cup D^{-}(\Sigma)$ is $M$. A spacetime $\langle M, g_{ab}\rangle$ which admits a Cauchy surface is said to be globally hyperbolic.

3 It certainly appeared so to me when Smeenk and I wrote Smeenk and Wüthrich ([30], Sect. 8).

The future domain of dependence of a global time slice $\Sigma$ is of interest because it characterises the set of events for which any signal or information which reaches them must have passed through $\Sigma$. Thus, assuming that signals and information cannot travel faster than the speed of light, conditions on $\Sigma$ should determine the complete state at $p$, assuming deterministic dynamics.4 Similarly, the conditions on $\Sigma$ should determine the state at any event in the past domain of dependence. In the case of a globally hyperbolic spacetime, therefore, any event at all in the spacetime is similarly determined by the conditions on $\Sigma$. Consequently, there cannot be (among other things) closed timelike curves in such a spacetime. If a closed timelike curve intersects $\Sigma$, it does so more than once, so that $\Sigma$ is not achronal; if it does not intersect $\Sigma$, it is an inextendible causal curve which never meets $\Sigma$. Thus, regardless of whether or not these closed timelike curves intersect $\Sigma$, $\Sigma$ would not be a Cauchy surface, and so the spacetime would not be globally hyperbolic.

Global hyperbolicity and the existence of closed timelike curves are global properties of a spacetime in the sense that, although properties ascribable to spacetimes, they are not possessed by individual events and do not supervene on any such local properties.

Before we get to quantum gravity, it should be noted that it would be surprising if moving beyond GR would mean relapsing into imposing global, non-dynamical constraints on spacetime structure such as prohibiting the existence of closed timelike curves by fiat, as it appears as if GR owes its success precisely to abandoning such constraints. We should expect one kind of constraint, however, to restrict the models of classical GR: energy conditions.
These are universally valid (but local) constraints on the matter sector of the theory and capture the thought that not just any stress-energy tensor $T_{ab}$ can adequately represent the physical matter content of the universe. Thus, they express general conditions which any matter or non-gravitational field is required to satisfy in order to qualify as ‘physical’. Through the Einstein equation,

$$G_{ab} = 8\pi T_{ab}, \tag{19.1}$$

where the Einstein tensor $G_{ab} = G_{ab}[g_{ab}]$ is constructed from the spacetime metric $g_{ab}$ and its first and second derivatives, and Newton’s constant $G$ and the speed of light $c$ are set to 1, that matter content is related to the geometry of the spacetime. Just like the metric, the Einstein tensor is defined on a four-dimensional pseudo-Riemannian manifold $M$ and describes the curvature of the spacetime $\langle M, g_{ab}\rangle$. Importantly, the (classical) energy conditions are defined in tangent spaces and so obtain locally, i.e., point-wise. Since these conditions hold only strictly locally, they do not have the power to rule out causal pathologies such as closed timelike curves, which are global (or at least ‘regional’) properties of a spacetime in that they are topological features of a spacetime that can only be exemplified in at least a region of spacetime.
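For orientation, the Einstein equation (19.1) can be written with the constants restored, next to one standard example of a point-wise energy condition. The weak energy condition is chosen here purely as an illustration, since the text does not single out any particular condition, and the symbol $v^{a}$ for an arbitrary timelike vector is an assumption of this sketch.

```latex
% Einstein equation with Newton's constant G and the speed of light c restored
G_{ab} = \frac{8\pi G}{c^{4}}\, T_{ab}

% Weak energy condition (illustrative): at every point p of M
% and for every timelike vector v^a in the tangent space at p
T_{ab}\, v^{a} v^{b} \geq 0
```

The condition is imposed separately in each tangent space, which is the sense in which such constraints are strictly local.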

4 This expectation is confirmed, e.g., for physical fields in curved spacetimes, which propagate in accordance with hyperbolic wave equations ([33], Chap. 10).

As it turns out, these point-wise energy conditions can only be satisfied for types of ‘classical’ matter; they all fail for quantum fields (due to arbitrarily negative expectation values of energy densities of quantum fields at any point). Hence, the classical conditions have been relaxed to ‘non-local’ energy conditions, which hold in extended regions of spacetime, rather than at single events. Thus, they could at least potentially disqualify spacetimes with closed timelike curves as unphysical. Although the final verdict is still out, it seems, however, that this hope will not be borne out.5
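As a sketch of what a ‘non-local’ condition can look like, one much-discussed example is the averaged null energy condition, which constrains an integral along a curve rather than a value at a single event. It is offered only as an illustration and need not be the condition the author has in mind; the symbols $\gamma$, $k^{a}$ and $\lambda$ are assumptions of this sketch.

```latex
% Averaged null energy condition (illustrative): \gamma is a complete null geodesic
% with affine parameter \lambda and tangent vector k^a.
% For quantum fields, T_{ab} is read as the expectation value of the stress-energy operator.
\int_{\gamma} T_{ab}\, k^{a} k^{b} \, d\lambda \geq 0
```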

19.3 Theories of Quantum Gravity

This section will introduce four approaches to quantum gravity and discuss the viability of time travel in each of them: semi-classical quantum gravity (Sect. 19.3.1), causal set theory (Sect. 19.3.2), loop quantum gravity (Sect. 19.3.3), and string theory (Sect. 19.3.4).

19.3.1 Semi-classical Quantum Gravity

The research programme of quantum field theory on curved spacetime offers a first stab at a quantum theory of gravity. Although mathematically demanding, the approach is physically simple: take a classical relativistic spacetime and treat it as a fixed background for quantum fields. For a linear field $\phi$ defined over globally hyperbolic spacetimes $\langle M, g_{ab}\rangle$, there is a mathematically rigorous and physically well-behaved procedure for writing down an algebra $\mathcal{A}(M, g_{ab})$ of observables ([34], Chap. 4). For more general fields, however, semi-classical quantum gravity may not be well-behaved, and the procedure cannot be applied to non-globally hyperbolic spacetimes.6 If the approach demands global hyperbolicity, it cannot accommodate time travel on closed timelike curves. Since the spacetime is already set in place, and fixed, there is also no option of emergent time travel under the assumption of global hyperbolicity. This will play out rather differently in loop quantum gravity (see below).

5 For a discussion of this point, see Smeenk and Wüthrich ([30], Sect. 7); for a primer on energy conditions, see [5].
6 For a recent (and optimistic) review, see [31].

However, the most severe limitation of the approach is that the spacetime structure is assumed to be fixed. This stands in obvious tension with the insight in GR that the spacetime geometry not only acts upon the matter content of the world, but is also acted upon by it. Thus, spacetime geometry is dynamical, and one must countenance the ‘backreaction’ of the matter field on the metric. The most basic way to construct a quantum theory of gravity which does this is to combine classical relativistic spacetime geometry, the left-hand side of (19.1), with an account of quantum matter which will determine the right-hand side of (19.1). The quantum matter fields, described by an appropriate quantum field theory (QFT), propagate in a classical spacetime. The backreaction of the matter fields on the spacetime geometry is computed through the semi-classical Einstein field equation:

$$G_{ab} = 8\pi \langle\psi|\hat{T}_{ab}|\psi\rangle, \tag{19.2}$$

where $\langle\psi|\hat{T}_{ab}|\psi\rangle$ is the expectation value of the stress-energy tensor of the quantum fields (which now is of course an operator) in a (physically reasonable) state $|\psi\rangle$. Semi-classical quantum gravity is a quantum theory of gravity as defined above: it combines gravity, in the form of spacetime curvature, with a quantum theory of matter. In general, semi-classical quantum gravity is expected to offer a valid extension of GR for some relatively simple cases when quantum and gravitational effects are not too strong ([40], Sect. 2).

Does semi-classical quantum gravity permit time travel? As the only difference between the fully classical Eq. (19.1) and the semi-classical one (19.2) is in the description of the matter on the right-hand side, the relevant issue is whether the quantum nature of matter is less, equally, or more constraining on the spacetime geometry on the left-hand side than is classical matter. As mentioned above, the most direct way in which it is less constraining and so more permissive of time travel than classical matter is by violating the energy conditions believed to hold for classical matter.

However, the expectation value $\langle\psi|\hat{T}_{ab}|\psi\rangle$ may also act in ways which are more constraining than classical matter. For instance, [12] argued that since $\langle\psi|\hat{T}_{ab}|\psi\rangle$ appears to diverge on or near the boundary to the region of spacetime containing closed timelike curves (assuming there were none ‘before’),7 it effectively ‘cuts off’ the region of spacetime with the causal pathologies, rendering it inaccessible from the causally well-behaved domain and thus effectively protecting ‘chronology’. Hawking took the divergence of the expectation value of the energy-momentum tensor as the Cauchy horizon is approached and thus as closed timelike curves are ‘about to form’ to strongly support his ‘chronology protection conjecture’, according to which the “laws of physics do not allow the appearance of closed timelike curves” ([12], 603, emphasis in original).

Stated in this way, given the pervasive existence of closed timelike curves in relativistic spacetimes, there thus seems to be little reason to think that the chronology protection conjecture is true in semi-classical quantum gravity, and no reason at all to accept it in the context of GR. If successful, Hawking’s argument might well only establish that the region with the closed timelike curves is beyond the reach of physical denizens of the causally well-behaved region on this side of the Cauchy horizon as they would have to pass through a wall of arbitrarily high energy density in order to be able to travel along closed timelike curves. But these curves might still exist beyond the Cauchy horizon in an inaccessible region of spacetime, and in fact could be taken advantage of by would-be time travellers on the far side of the Cauchy horizon. In this case, the laws of physics would not prevent the existence of closed timelike curves, though perhaps they would prevent their accessibility.

7 These boundaries are so-called ‘future Cauchy horizons’, i.e., boundaries of future domains of dependence of global time slices, where these domains are defined as those regions such that every past inextendible causal or timelike curve through any event in the region intersects the global time slice.

However, it is not clear whether Hawking’s argument succeeds in the first place. A theorem due to [20] establishes that the expectation value of the energy-momentum tensor for a scalar field is not everywhere well-defined on compactly generated Cauchy horizons. The authors suggest that this result may be taken as further support of Hawking’s chronology protection conjecture in that it suggests that the Cauchy horizon cordons off the region with closed timelike curves. However, the result can just as well be taken to indicate that semi-classical quantum gravity is simply no longer valid at the horizon8 and that therefore, Hawking’s argument fails, at least if based solely on semi-classical quantum gravity. Thus, only a more fundamental theory of quantum gravity can deliver a final verdict on the matter. The prospect of probing more deeply to see whether chronology protection obtains motivates not only the present inquiry, but, as should become clear, also promises to shed light on the nature of quantum gravity itself.

19.3.2 Causal Set Theory

Just like the research programmes introduced in Sects. 19.3.3 and 19.3.4, causal set theory aims at offering a ‘full’ quantum theory of gravity, i.e., a theory in which the gravitational sector, too, is subjected to a quantum treatment. It is motivated by a result in classical GR, which shows that at least for an important class of relativistic spacetimes, the causal structure determines the metric structure of the spacetime up to a conformal factor.9 This result is interpreted to suggest that the causal structure of a (causally well-behaved) spacetime contains almost the full information concerning its geometry; in fact, all but some information about local ‘size’. In the causal set theory programme, this missing ‘size’ information is naturally supplied by the number of discrete ‘atoms’ of spacetime contained in any region. In slogan form, causal set theory assumes spacetime to be causal structure plus number. Accordingly, the fundamental structures postulated by causal set theory, the ‘causal sets’, are discrete sets of elementary events, which are partially ordered by a relation of causal precedence or of causal connectibility.

8 See for example [32], as well as the discussion in Earman et al. ([6], Sect. 5).
9 This is a paraphrase of a theorem due to [21]. More precisely, the theorem states that for any two ‘distinguishing’ (and temporally oriented) spacetimes $\langle M, g_{ab}\rangle$ and $\langle M', g'_{ab}\rangle$, a causal isomorphism $\phi : M \to M'$ is a smooth conformal isometry. A bijection $\phi : M \to M'$ is a causal isomorphism just in case for all $p, q \in M$, $p$ is in the chronological past of $q$ if and only if $\phi(p)$ is in the chronological past of $\phi(q)$. A spacetime $\langle M, g_{ab}\rangle$ is distinguishing just in case for all $p, q \in M$, if the chronological past of $p$ is identical to the chronological past of $q$, then $p = q$, and if the chronological future of $p$ is identical to the chronological future of $q$, then $p = q$. A causal isomorphism $\phi$ is a conformal isometry just in case it is a diffeomorphism and there exists a conformal factor $\Omega : M' \to \mathbb{R}$ such that $\phi_{*}(g_{ab}) = \Omega^{2} g'_{ab}$ with $\Omega \neq 0$.

As it stands today, causal set theory frames a promising research programme but is still a long way from offering a complete quantum theory of gravity. The promise of the research programme remains unfulfilled in three ways. First, merely requiring the fundamental structure of our world to be a discrete, partially ordered set falls way short of constraining the boundless possible combinations of such structures to serious candidates with a promise to reproduce our physical world: there are just too many discrete partial orders, almost all of which do not resemble our universe. How can one identify the ‘physical sector’ of the theory? The most popular strategy for taming the unruly possibilities is to impose additional constraints; in particular, advocates of causal set theory favour imposing dynamical laws in response to the problem (e.g., the classical sequential growth dynamics proposed in [24]). Second, even the successful resolution of this trouble would at best result in a purely classical theory: neither does the state space have the structure of a vector space, nor is there anything quantum about the dynamics. If the theory is truly to incorporate the quantum nature of matter, then causal set theory as it stands can at best be a stepping stone toward a full quantum theory of gravity. Third, causal set theory suffers from the same affliction as all other approaches to quantum gravity: a full understanding of the relationship between the fundamental physics postulated and the emergent relativistic spacetime with its dynamics between spacetime and matter as encoded in the Einstein field equation remains elusive.

Whatever the eventual resolution of these challenges may look like, what does the present state of the theory suggest regarding the possibility of time travel?
In special-relativistic theories, the causal structure of spacetime is expressed by the usual and well-behaved lightcones of Minkowski spacetime. In Minkowski spacetime, causal precedence thus merely partially orders events, as spacelike related events do not stand in this relation. Since causal set theory also permits ‘spacelike separated’ events, the ordering is equally merely partial. For our present purposes what matters, however, is that the ordering is also no weaker than partial. This means, in particular, that it is not a mere pre-order, i.e., a reflexive and transitive order, which is not, in general, antisymmetric.10 The demand that the causal relation be antisymmetric (and so not a mere pre-order) thus precludes the possibility of causal loops of the form of cycles containing numerically distinct events $a$ and $b$ such that $a$ causally precedes $b$ and $b$ causally precedes $a$, as was possible in spacetimes in GR which contain closed timelike curves. In other words, causal set theory prohibits, in its central axiom, that the fundamental structure accommodates what would be the natural analogue of closed timelike curves in causal set theory, i.e., closed chains of events connected by the relation of causal connectibility.

10 A binary relation $R$ on a domain $D$ is antisymmetric just in case for all $x, y \in D$, if $Rxy$ and $Ryx$, then $x = y$.

This choice simplifies the technical demands of the approach, as well as its metaphysics ([38], Chap. 3; [18]), but it imperils causal set theory’s capacity to give rise to relativistic spacetime models which do include closed timelike curves. Although physicists are generally happy to give up non-globally hyperbolic models of GR, a theory’s inability to reproduce that sector of GR may turn out to be a vice rather than a virtue, e.g., in case models with closed timelike curves turn out to be physically significant.11 That causal set theory cannot lend itself to spacetimes with closed timelike curves is not, however, a foregone conclusion: it might be that causal sets, although free of causal loops at the fundamental level, nevertheless can combine in ways such that at higher levels, causal loops emerge. If this turns out to be the case, however, then the emergent structure must necessarily violate the strictures of causal set theory and thus cannot be a model of it. I will return to this possibility in Sect. 19.4 below.
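The role of antisymmetry in excluding causal loops can be illustrated with a small sketch. It assumes that a finite causal set is given by a list of direct causal links, closes them under transitivity, and then looks for pairs of distinct events that precede each other. The function names and the two toy structures (`diamond`, `loop`) are hypothetical and introduced only for this illustration; they are not taken from the causal set literature.

```python
from itertools import product


def transitive_closure(links):
    """Close direct causal links (a, b), read 'a precedes b', under transitivity."""
    precedes = set(links)
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot so the set can be extended inside the loop.
        for (a, b), (c, d) in product(list(precedes), repeat=2):
            if b == c and (a, d) not in precedes:
                precedes.add((a, d))
                changed = True
    return precedes


def antisymmetry_violations(links):
    """List pairs of distinct events that causally precede each other."""
    closure = transitive_closure(links)
    return sorted((a, b) for (a, b) in closure if a < b and (b, a) in closure)


def spacelike(a, b, links):
    """Two distinct events are 'spacelike' if neither precedes the other."""
    closure = transitive_closure(links)
    return a != b and (a, b) not in closure and (b, a) not in closure


# Hypothetical toy structures, assumed purely for illustration.
diamond = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]  # admissible causal set
loop = [("a", "b"), ("b", "c"), ("c", "a")]                 # closed causal chain

print(antisymmetry_violations(diamond))  # no pair of events precedes each other
print(spacelike("b", "c", diamond))      # b and c are unrelated, so the order is merely partial
print(antisymmetry_violations(loop))     # every pair on the cycle violates antisymmetry
```

The `loop` structure is still a pre-order once reflexivity is added, but it fails to be a partial order, which is the sense in which the central axiom of causal set theory excludes the analogue of closed timelike curves.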

19.3.3 Loop Quantum Gravity

Just like causal set theory, loop quantum gravity also starts from GR in its attempt to articulate a quantum theory of gravity. Instead of attempting this via the formulation of a classical discrete structure, it applies a canonical quantization procedure to GR. A canonical quantization of a classical theory attempts to preserve the core structure of the classical theory and convert it, in the most faithful way possible, into a quantum theory. This core structure consists in the canonical variables and their algebraic structure expressed by their Poisson bracket. The classical variables, such as position and momentum, are turned into quantum operators on a Hilbert space and the Poisson bracket becomes the commutation relation between the basic canonical operators.

Any canonical approach to quantum gravity assumes that spacetime $\langle M, g_{ab}\rangle$ is globally hyperbolic and thus of topology $\Sigma \times \mathbb{R}$, where $\Sigma$ is again a three-dimensional spacelike submanifold of $M$.12 In this case (of global hyperbolicity), there exists a timelike vector field $t^{a}$ everywhere on $M$. This vector field is tangent to a family of curves which can be parametrized by a ‘time’ parameter $t$. The resulting three-dimensional surfaces $\Sigma_{t}$ of constant $t$ are totally ordered time slices in those spacetimes.

11 As is argued in [6] and in [30], this is a possibility that should not be ignored at the present stage of inquiry, given that it is difficult to know antecedently which parts of a new theory reveal important new physics.
12 An accessible textbook for both approaches to canonical quantum gravity described in this section is [7].

An important technical choice for any canonical approach to quantum gravity is to select a pair of canonical variables as coordinates in the classical phase space of GR. In the traditional canonical approach, the four-dimensional metric of spacetime is rewritten as a function of the spatial three-metric defined on the $\Sigma_{t}$ and of the ‘lapse’ $N$ and the ‘shift vector’ $N^{a}$ resulting from the decomposition of $t^{a} = N n^{a} + N^{a}$, where $n^{a}$ is a vector field normal to the $\Sigma_{t}$’s. The pair of canonical variables in this approach is then given by the spatial three-metric as the ‘configuration’ variable and what is essentially the extrinsic curvature as its conjugate momentum variable. Capturing the content of the globally hyperbolic sector of GR using this choice of canonical variables leads to a representational surplus resulting from expressing the physical content of the theory using more variables than are needed to capture the true degrees of freedom. As a consequence, ‘constraint equations’ arise. The symmetric three-metric encodes six configuration degrees of freedom. The four constraint equations then leave us with two degrees of freedom for each point in space, as expected from GR. Solving the constraint equations thus gives us the true physical state space. Although its canonical variables permit a natural geometric interpretation, this choice is marred by insurmountable technical difficulties: the constraint equations are non-polynomial and no technique is known for solving them. This problem has essentially halted progress along these lines.

An alternative choice of basic variables promised to revive the canonical quantization programme and led to the approach known as ‘loop quantum gravity’. In loop quantum gravity, one proceeds not by using ‘metrical’ variables to capture the geometry of spacetime, but instead variables based on the ‘connection’. Rewriting the metric in terms of ‘triads’, the connection enters their covariant derivative, yielding an expression of the geometry of spacetime equivalent to that based on metrical variables. Loop quantum gravity then selects the so-called ‘Ashtekar variables’, i.e., the (densitized) triads as momentum variables and the connection as canonically conjugate configuration variables.
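The canonical quantization step and the counting of degrees of freedom described in this section can be sketched as follows. The one-particle variables $q$ and $p$ are a textbook stand-in assumed for illustration, not the gravitational variables themselves, and the phase-space version of the count is added here as a cross-check under the usual assumption that each constraint removes two phase-space dimensions.

```latex
% Canonical quantization: the Poisson bracket becomes a commutator
\{q, p\} = 1 \quad\longrightarrow\quad [\hat{q}, \hat{p}] = i\hbar

% Counting per point of space in the metric formulation:
% a symmetric 3 x 3 metric has 6 independent components
6 \ \text{(configuration variables)} - 4 \ \text{(constraints)} = 2 \ \text{(physical degrees of freedom)}

% The same count in phase space: 6 metric components plus 6 conjugate momenta
12 - 2 \times 4 = 4 = 2 \times 2
```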
Re-expressing the Einstein-Hilbert action in terms of the components of the connection and the triads, it turns out that there are three sets of constraint equations which must be satisfied in order for the rewritten theory to be equivalent to (the globally hyperbolic sector of) GR. Among these, the Hamiltonian constraint is of particular interest and will be discussed in a moment.

Moving to the quantum theory by means of canonical quantization, it seems natural to consider the ‘connection representation’ of the wave function, i.e., expressing the wave function of the system as a function of the connection variable, similarly to Maxwell and Yang-Mills theories. However, technical difficulties suggest replacing the connection representation with the ‘loop representation’, in which the wave function is given as a functional of ‘holonomies’ around closed loops.13 Working in this loop representation renders two of the three families of constraint equations solvable. With just one constraint remaining to be solved, we arrive at what is known as the ‘kinematical Hilbert space’. The so-called ‘spin network states’ can be constructed from the loop states and constitute an orthonormal basis of this Hilbert space ([28], Sect. 6.3).

13 See Gambini and Pullin ([7], Chap. 8).

A spin network can be represented by a ‘coloured’ graph such that both its nodes and the links between them carry spin representations. The spin network states can naturally be interpreted as forming a kind of discrete space where the nodes of the network represent the ‘atoms’ of this granular space, and the links the surfaces where adjacent atoms ‘touch’ ([27], Sect. 1.2.2). On this interpretation, physical space is, fundamentally, a quantum superposition of spin networks.14 Although the quantum measurement problem prohibits a straightforward interpretation of this structure as chunky space, the geometric properties of the spin networks are at least suggestive of this natural interpretation.

Time, on the other hand, seems to have disappeared entirely in canonical quantum gravity. The remaining constraint equation to be solved turns out to demand that the Hamiltonian operator sends the physical states to zero. Unlike in quantum mechanics, where the Schrödinger equation mandates how the Hamiltonian governs the dynamical evolution of the system, the Hamiltonian constraint equation here suggests that there is no change over time for genuinely physical states. In fact, there remains no quantity that could reasonably be interpreted as time in the Hamiltonian constraint equation.15 Furthermore, this equation has so far resisted being solved, stalling the programme of loop quantum gravity. Without progress on this problem, however, we seem to have no prayer of even articulating what time travel could mean in this theory.

There are two workarounds. First, some physicists have symmetry-reduced the physical system under study, restricting the classical theory to homogeneous and isotropic spaces before subjecting it to quantization. This ‘cosmological sector’ is much simpler than the full theory such that the corresponding Hamiltonian constraint equation can be solved. Unfortunately, these systems are too simple to permit anything that could reasonably be interpreted as time travel.16

The second workaround is more relevant for our present purposes. The idea here is to forego the canonical description of the dynamical evolution in favour of a covariant formulation of the evolution. Hence, instead of the Hamiltonian operator, we express the dynamics of the theory in terms of transition amplitudes between ‘initial’ and ‘final’ kinematical states.
These transition amplitudes are computed as weighted sums over ‘histories’, i.e., ways in which the theory says the ‘final’ state could have been obtained from the ‘initial’ state. The details of how this is accomplished are irrelevant for our purposes (and are given in [29]). What matters is that on a natural, but arguably overly simplistic, interpretation, both the ‘initial’ and the ‘final’ states deserve to be unquoted and correspond to quantum states of spatial hypersurfaces, indeed of global time slices of spacetime. Thus, we seem to be faced with a temporally innocuous structure in which no meaningful sense of time travel is permitted.

14 For a further discussion concerning the physical interpretation of these spin networks, see Wüthrich ([39], Sect. 2.1).
15 Huggett et al. ([19], Sect. 2) offer a more detailed explanation of the problem and a brief survey of reactions to this ‘problem of time’.
16 Though they are philosophically rich in other ways [17].

This interpretation is supported by the fact that any canonical quantization scheme of GR starts out by restricting itself to globally hyperbolic spacetimes. The canonical quantization recipe simply requires the classical spacetime structure of the physical system to be quantized to be globally hyperbolic, and thus causally well behaved.

Just as for causal set theory, loop quantum gravity really only considers the globally hyperbolic sector of GR. One would therefore naturally assume that the theory also prohibits an analogue of closed timelike curves at the fundamental level, as did causal set theory. This conclusion would be premature, though. First, even though macroscopically the ordering of the initial and final states at earlier and later global times precludes closed curves in time, it could be that there exist tiny loops like this at the microscopic level. Second, just like causal sets, the spin networks of loop quantum gravity may combine such that causal loops emerge at a higher level even though there are none at the fundamental level. This second option will be discussed in Sect. 19.4 below, so let me finish with a brief word on the first possibility.

Given that the problem of time has so far resisted resolution in the canonical approach to solving the Hamiltonian constraint of the full theory, the possibility of microscopic causality violations remains undecidable on this approach. On the covariant alternative, the possibility can be ruled out: the transition amplitudes are constructed from oriented ‘simplices’ which are obtained by considering, among other things, the action of the Hamiltonian on the nodes of the spin network ([29], Sects. 4.4, 5.3, 7.3). Thus, the distinction between ‘timelike’ and ‘spacelike’ directions is maintained at the fundamental level. Given the construction rules of these simplices, microscopic causal loops are ruled out.
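The contrast drawn in this section between Schrödinger evolution and the Hamiltonian constraint can be written schematically as follows. This is a generic sketch in which $|\psi\rangle$ stands for a physical state; the specific form of the Hamiltonian constraint operator of loop quantum gravity is not given in the text and is not assumed here.

```latex
% Ordinary quantum mechanics: the Hamiltonian generates evolution in the time parameter t
i\hbar\, \frac{\partial}{\partial t}\, |\psi(t)\rangle = \hat{H}\, |\psi(t)\rangle

% Canonical quantum gravity: the Hamiltonian constraint annihilates physical states,
% and no time parameter appears in the equation
\hat{H}\, |\psi\rangle = 0
```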

19.3.4 String Theory

As a third example of full quantum gravity, let us consider the fate of causal loops in string theory [23, 41]. String theory is the dominant approach to quantum gravity. Unlike causal set theory and loop quantum gravity, it starts out from the standard model of particle physics and tries to extend the framework to incorporate gravity. As string theory is based on the paradigm of particle physics, it does not conceive of gravity as a feature of a dynamical spacetime, but instead as arising from an exchange of force particles, so-called ‘gravitons’. Furthermore, the point particles of earlier theories are replaced by 1-dimensional ‘strings’ (or higher-dimensional ‘branes’) in order to circumvent the problem of ‘non-renormalizability’, which befell earlier attempts to incorporate gravity into the framework of particle physics [35].

String theory exists at two levels. First, there is the perturbative level. At this level, string theorists have developed mathematical tools in order to define the string perturbative expansion over a given background spacetime. Second, the perturbative level is expected to be grounded in the more fundamental non-perturbative theory. This elusive ‘M-theory’ does not yet exist. In fact, its existence is just inferred from the usual assumption that a perturbative expansion only ever gives an approximation to the true physical situation, which must be precisely captured by a more fundamental, and non-perturbative, theory. M-theory is thought to relate five different perturbative string theories by ‘dualities’, i.e., symmetries equating strong coupling limits in one string theory to a weak coupling limit in another string theory.

As M-theory does not yet exist, it is impossible to determine its verdict on time travel. However, supersymmetric gravity, widely considered a stepping stone towards full string theory, offers guidance into whether we should expect string theory to permit time travel.
Although most of the results I am aware of have been obtained in five-dimensional supersymmetric gravity rather than in higher-dimensional theories, it turns out that solutions of five-dimensional supergravity can straightforwardly be extended to solutions of ten- or eleven-dimensional supergravity ([9], 4590). Consequently, it appears as if string theory will likely admit time travel in case five-dimensional supersymmetric gravity does. And it turns out that five-dimensional supersymmetric gravity admits many solutions with closed timelike curves.

The systematic investigation of closed timelike curves in supersymmetric gravity started two decades ago with [10]. Since then, at least three important classes of solutions in five-dimensional supersymmetric gravity which contain closed timelike curves have been identified. First, there are supersymmetric solutions of flat space with a periodically identified time coordinate, resulting in a construction analogous to a rolled-up Minkowski spacetime of topology S¹ × ℝ³ in GR. These solutions are not simply connected. In this case, passing to a covering spacetime avoids the closed timelike curves. In general, however, the supersymmetric solutions with closed timelike curves are simply connected, and so their closed timelike curves cannot be avoided in this way.

The second class of supersymmetric solutions with closed timelike curves consists in an analogue of Gödel spacetime [9]. Just like Gödel spacetime, these solutions model a topologically trivial, rotating, and homogeneous (and so not asymptotically flat) universe containing closed timelike curves. Whether these Gödel-type solutions really permit time travel has been contested: holography may effectively act to protect the chronology of Gödel-type solutions in that closed timelike curves are either hidden behind a ‘holographic screen’ and thus made inaccessible for timelike observers, or else broken up into pieces such that no closed timelike curves remain intact [3].
The third class consists of the so-called ‘BMPV black hole’ solutions, named after the initials of the authors of [4]. BMPV black holes are charged, rotating black holes in simply connected, asymptotically flat spacetime. Thus, they are the supersymmetric counterparts of the general-relativistic Kerr-Newman black holes. Just as Kerr-Newman spacetimes can be maximally analytically extended to encompass a region inside the event horizon of the black hole that contains closed timelike curves [36], so can BMPV black holes, as has been shown by [10]. More precisely, Gibbons and Herdeiro show this to be the case for extremal black holes, i.e., black holes whose angular momentum equals their mass (in natural units). It is unclear whether their result generalizes to include physically more realistic cases. Although they firmly establish their result only for a rather finely tuned combination of black hole parameters, [10] show that the presence of closed timelike curves for BMPV black holes is rather robust: this hyper-critical solution represents a simply connected, geodesically complete, asymptotically flat, non-singular, time-orientable, supersymmetric spacetime with finite mass, satisfying the dominant energy condition. Thus, ‘cosmic censorship’, whatever its details, will struggle to eliminate this case.17

None of these three classes of supersymmetric spacetimes with closed timelike curves conclusively establishes the possibility of time travel in supersymmetric gravity, let alone string theory. Having said that, however, it should be noted, with [9], that closed timelike curves appear generically in physically important classes of five-dimensional supersymmetric spacetimes. The authors of [9] even complain how difficult it is to find five-dimensional solutions of supersymmetric gravity which do not contain either closed timelike curves or singularities. Of course, this finding may be counted as a strike against supersymmetric gravity, rather than as a point in favour of time travel. Nevertheless, the emerging picture suggests that closed timelike curves arise naturally in string theory, or at least in its vassal theories. Clearly, this suggestion remains preliminary in that it is wide open to what extent these results translate into a fundamental, non-perturbative version of string theory, and indeed whether string theory or any of the other approaches presented in this section are viable approaches to quantum gravity for that matter.

17 See Smeenk and Wüthrich ([30], 623) for more details.

19.4 Emergent Time Travel?

In the last section, we have discussed the possibility that theories beyond GR directly issue a verdict on the permissibility of time travel. However, as stated in Sect. 19.1, we need to consider a second possibility, according to which an effective theory renders time travel physically possible, even though it is a valid approximation to a more fundamental theory, which in itself rules out time travel. This is the topic of this section.

Of the four approaches discussed in Sect. 19.3, two seem to directly permit closed timelike curves and so time travel: while this was conjectured to be the case for string theory based on incomplete results from five-dimensional supersymmetric gravity, the prospects of some form of chronology protection obtaining are rather remote for semi-classical quantum gravity. Leaving aside the case of semi-classical quantum gravity, a note of caution concerning string theory is in order. The results noted in the previous section pertain to the spacetime structure of ‘target space’, which is the spacetime background for strings;18 it does not correspond to observed, ‘phenomenal’ spacetime, which is an emergent phenomenon in string theory [14]. If this is right, then regardless of whether the target spacetime contains closed timelike curves, what will be of interest is whether the emergent phenomenal spacetime will have a structure such as to permit time travel. As the emergence of spacetime, and particularly of its global properties, is at present only very partially understood in string theory,19 further analysis of this will be left for another day.

18 Strictly speaking, it is not even the case that target space, or at least the metric g in it, is fundamental; rather, given a general metric in the action of a theory, one obtains a quantum theory of perturbations around a coherent state, which corresponds to the classical relativistic metric ([16], Sect. 3).

19 See [18].

What is the situation in the two approaches which ruled out closed timelike curves (or their analogues) at the fundamental level, viz., causal set theory and loop quantum gravity? In a way, the situation for both causal set theory and loop quantum gravity is similar: fundamentally, they prohibit the equivalent of closed causal curves and so rule out time travel, as we have seen in the previous section. However, depending on what the relationship between the fundamental theory and emergent spacetime may be in each case, we may find that the emergent, macroscopic spacetime structure permits time travel. A consideration of the precise role and ambit of the theory for each case is necessary in order to appreciate this point. One can distinguish between the astrophysical and the cosmological ambit of GR. On the one hand, GR furnishes a theory of gravity applicable to individual stars, or ‘small’ isolated systems consisting of stars and smaller bodies such as our solar system. As such, it can describe the orbits of planets around their central star, the gravitational collapse of a star, a black hole, the merger of two black holes and the gravitational waves emitted on the occasion, and similar astrophysical phenomena involving gravity. On the other hand, since gravity is the dominant interaction at large distances, GR also delivers a cosmological theory, i.e., a theory describing the large-scale structure of the cosmos in its entirety and throughout most of its history. This should not be confused with a ‘theory of everything’, which it clearly need not be despite the fact that it describes our world at the largest distances and over the longest durations. Qua cosmological theory, GR still supplies the backbone of the current cosmological standard model in the form of the Friedmann-Lemaître-Robertson-Walker spacetimes. These two applications of GR are, though connected, nevertheless distinct. Relativistic spacetimes describing phenomena of the astrophysical kind, just as the cosmological models, are often ‘global’ or ‘large-scale’ in that they encompass large (typically infinite) spatial distances and temporal durations as well. For instance, a Schwarzschild black hole is represented by a spacetime of infinite extent.
However, such an astrophysical spacetime is not thought to correctly describe the large-scale structure of the cosmos at all: its description is accurate only near the astrophysical object it is thought to capture. The demand that such astrophysical spacetimes be asymptotically flat (roughly, that the curvature vanishes away from the astrophysical object, i.e., in the ‘asymptotic’ region) encodes the idea that the system at stake is, at least to a good approximation, isolated from the influence of other systems or indeed the rest of the cosmos.20 In principle, spacetimes representing individual systems could then be stitched together in order to obtain a more complete description of the physics of ever larger and more encompassing parts of the cosmos. Closed timelike curves arise in both types of relativistic spacetimes. Astrophysical spacetimes such as Kerr-Newman spacetimes may contain causality-violating regions (in this case inside the maximal analytic extension of the interior of the black hole). For these spacetimes, closed timelike curves are typically confined to a region of spacetime. Thus, in general, there are events such that no closed timelike curves pass through them. Cosmological solutions such as Gödel spacetime also accommodate time travel. In those cases, closed timelike curves are sometimes not confined to a region and thus in general every event lies on some closed timelike curve. In those cases, the opportunity to time travel is thus democratically awarded to all events.

Returning to quantum gravity, if causal set theory and loop quantum gravity are considered cosmological theories, their laws reign supreme and one would not expect the possibility of time travel to arise. Indeed, since such cosmological models would have been consistent with the causality-enforcing features of these theories, the possibility of time travel would in this case be precluded universally. Let us consider this case for both theories separately.

20 To articulate precisely what asymptotic flatness amounts to, and, connectedly, what it is for a system to be isolated in a background-independent theory such as GR is far from trivial and requires some unpacking, as is offered, e.g., in Wald ([33], Sect. 11).

19.4.1 Causal Set Theory as a Cosmological Theory

Turning to causal set theory first, if the fundamental structure thus covers the cosmos and this structure is a causal set, then the condition of asymmetry entails that there cannot be a causal loop anywhere in the entire cosmos. Would it be possible, however, that even though causal loops are globally ruled out at the level of the fundamental structure, the relativistic spacetime that emerges from this fundamental causal set contains closed timelike curves? In order to answer this question, we need to consider how relativistic spacetimes are thought to emerge from causal sets. While the emergence of spacetime in causal set theory has so far resisted resolution, the outlines are sufficiently clear for us to be in a position to settle the question.21

As a necessary condition on the relationship between the underlying causal set and the emergent spacetime, there exists an embedding of the causal set into the spacetime. An embedding of a causal set into a spacetime is an injective map from the domain of the elements of the causal set into the manifold of the spacetime that preserves the causal structure in the sense that for any two elements x and y of the causal set, x causally precedes y if and only if the image of x is contained in the causal past of the image of y. This condition and the asymmetry of the causal set together entail that any spacetime events in the image of the causal set cannot be part of a closed timelike curve. Now it is consistent with the condition (and with the asymmetry of the causal set) that the emergent spacetime nevertheless contains closed timelike curves. If so, however, at most one of the events on the closed timelike curve could be in the image of the elements of the causal set. Thus, if there exist closed timelike curves in the emergent spacetime, there could be absolutely no trace of this fact in the fundamental structure.
As a causal set is discrete and a relativistic spacetime a continuum structure, it will in general not be the case that the fundamental causal set contains all the ‘information’ present in a relativistic spacetime. The requirement that the emergent spacetime not contain any relevant geometric features not already in some form present in the causal set motivates the additional demand that the embedding be faithful, i.e., that the map distributes the images of the elements of the causal set approximately uniformly on the spacetime manifold, which is assumed to be approximately flat below the discreteness scale.22 The idea behind imposing faithfulness is precisely that the geometry of the emergent spacetime be ‘boring’ below the scale captured (and capturable) by the fundamental discrete structure. If the emergent spacetime contained, presumably at Planckian scales, very thin slices disconnecting from the bulk of the spacetime, looping back to reconnect to it at earlier times in a way such that it contained closed timelike curves running along these slices, then it might not violate faithfulness: the spacetime could be flat (locally Minkowskian) everywhere with just no image point on the thin slice looping back. Though thus consistent with the letter of faithfulness, such a situation would arguably violate its spirit: that the emergent spacetime not contain any relevant features not at least implicitly present in the fundamental structure. In sum, if causal set theory is regarded as a cosmological theory, there appears to be quite literally little space for an emergent spacetime to naturally accommodate closed timelike curves.

21 For a much more detailed account of the emergence of spacetime in causal set theory, see Huggett and Wüthrich ([18], Chaps. 3, 4).
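The embedding condition described in this section can be made concrete with a small toy check. The following is only a minimal sketch under stated assumptions, not part of causal set theory's own formalism: the events are taken to live in 1+1-dimensional Minkowski spacetime with c = 1, the causal order is assumed to be irreflexive, asymmetric and transitive, and all function and variable names are hypothetical.

```python
"""Toy check of the embedding condition for a finite causal set.

Assumptions (not from the text): events live in 1+1-dimensional Minkowski
spacetime with c = 1, and all names below are hypothetical.
"""


def is_strict_partial_order(elements, precedes):
    """Return True if `precedes` (a set of ordered pairs) is irreflexive,
    asymmetric and transitive on `elements`."""
    irreflexive = all((x, x) not in precedes for x in elements)
    asymmetric = all((y, x) not in precedes for (x, y) in precedes)
    transitive = all(
        (x, z) in precedes
        for (x, y) in precedes
        for (w, z) in precedes
        if y == w
    )
    return irreflexive and asymmetric and transitive


def in_causal_past(p, q):
    """Return True if event p = (t, x) lies in the causal past of a distinct
    event q, i.e. q is on or inside the future light cone of p."""
    (t_p, x_p), (t_q, x_q) = p, q
    return p != q and (t_q - t_p) >= abs(x_q - x_p)


def is_embedding(elements, precedes, phi):
    """Return True if `phi` (dict: element -> event) is injective and
    x precedes y exactly when phi[x] is in the causal past of phi[y]."""
    images = [phi[e] for e in elements]
    if len(set(images)) != len(images):
        return False  # not injective
    return all(
        ((a, b) in precedes) == in_causal_past(phi[a], phi[b])
        for a in elements
        for b in elements
        if a != b
    )


# A three-element causal set: a precedes b and c; b and c are unrelated.
elements = ["a", "b", "c"]
precedes = {("a", "b"), ("a", "c")}
phi = {"a": (0, 0), "b": (2, -1), "c": (2, 1)}

if __name__ == "__main__":
    print(is_strict_partial_order(elements, precedes))
    print(is_embedding(elements, precedes, phi))
    # A two-element causal loop violates asymmetry.
    print(is_strict_partial_order(["a", "b"], {("a", "b"), ("b", "a")}))
```

In this toy setting, two distinct image points on a common closed causal curve would each have to lie in the causal past of the other, and the "if and only if" clause would then force both orderings into the causal set, contradicting asymmetry. Faithfulness (the approximately uniform distribution of image points) is not modelled here.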

22 For a more precise formulation, see Huggett and Wüthrich ([18], Chap. 4).

19.4.2 Loop Quantum Gravity as a Cosmological Theory

Many of the conclusions arrived at in the case of causal set theory qua cosmological theory hold in the case of loop quantum gravity in this regime as well. In fact, there is explicit consideration of a sector of loop quantum gravity, known as ‘loop quantum cosmology’, which studies symmetry-reduced models of loop quantum gravity. By imposing isotropy and homogeneity already at the classical level, the constraint equations simplify sufficiently to admit explicit solutions [2]. In those models, a ‘cosmic’ time totally orders all events and there is consequently no possibility for time travel. More generally, the Hamiltonian formalism presupposes that the physical system at stake (be it a pendulum, a planet, or spacetime itself) is spatially extended and evolves over time, following the dynamics of Hamilton’s equations. Classically, this assumes, as we saw above, that the spacetime has topology Σ × ℝ such that the spatial time slices Σ are again totally ordered in ‘time’ by the reals. Moving to the quantum theory, as (to repeat) the canonical programme has stalled, the question remains open what the states in the physical Hilbert space are, and so how they ought to be interpreted physically.

Alternatively, covariant loop quantum gravity does not easily lend itself to a cosmological interpretation. The ‘initial’ and ‘final time’ slices are intended as such, and the spacetime region they enfold is finite and generally rather small. Independently of the size of the region enveloped by the time slices, a truly cosmological model cannot in general be expected to have a first or last ‘moment’ in time. Although one could in principle identify the initial and final time slices and so create a model with the equivalent of closed timelike curves, such constructions would be an abuse of the theory clearly beyond its intended ambit. Thus, fundamentally, cosmological loop quantum gravity does not permit time travel. Could there be time travel in emergent spacetime, perhaps by means of some more or less artificial construction? Unfortunately, this cannot be conclusively answered, since the emergence of spacetime from states in loop quantum gravity is yet to be fully understood.23 Although the possibility of finely carved emergent spacetimes with closed timelike curves cannot be excluded, it seems as if such spacetimes should not emerge from full-sized cosmological states in loop quantum gravity.

23 See [39] for a more detailed sketch of the current state of the art.

19.4.3 Quantum Gravity as ‘Astrophysics’

There is an alternative to considering a quantum theory of gravity as offering a cosmological theory: it may be regarded, rather, as describing much more local phenomena, such as astrophysical black holes or the very early universe.24 In fact, these are the phenomena where most physicists expect that only a quantum theory of gravity could deliver a satisfactory account, motivating quantum gravity in the first place. Although there are some efforts in this direction (such as the estimation of an entropy bound in [25]), causal set theory lacks well-developed astrophysical applications. This is largely owing to the fact that it is to date a classical theory whose transformation to a quantum theory has been but roughly sketched. Apart from the cosmological applications mentioned in Sect. 19.4.2, loop quantum gravity has also seen some research on black holes (such as the derivation of an expression for the black hole entropy similar to the usual Bekenstein-Hawking formula in [26] or studies of black hole singularity resolution, e.g., in [8]). As far as I can tell, there is no indication of the possibility of time travel in any of these applications.

But there remains another option. Perhaps quantum gravity will only ever be concerned with local phenomena, offering a fundamental description of the finest threads of what is spacetime macroscopically, while never amounting to a theory of global structure. If so, a quantum theory of gravity should not be considered cosmological. Instead, the global structure would emerge from patching together smaller pieces of fundamental quantum gravitational structures to cosmological totalities following principles or laws distinct from those asserted in quantum gravity. In fact, in GR itself, we cannot infer from a locally causally well behaved spacetime that it contains no closed timelike curves and so is globally well behaved. There exist pairs of locally isometric spacetimes such that one of them contains closed timelike curves while the other does not. For instance, Minkowski spacetime ⟨ℝ⁴, η⟩ and a slice of Minkowski spacetime rolled up along the timelike direction are locally isometric and so physically indistinguishable, as is illustrated in Fig. 19.1. Whether or not time travel remains possible in those constructs thus depends on the nature of the laws governing the global structure. If they are as permissive as those in GR (or indeed are those of GR), then the resulting global structure will admit (whatever corresponds to) closed timelike curves and time travel in this sense is possible. Of course, these laws may also be more restrictive and preclude the possibility of time travel. For now, the question remains wide open.

Fig. 19.1 Two locally isometric spacetimes, only one of which contains closed timelike curves

24 The latter is of course not really a ‘local’ phenomenon as it concerns the early stages of the whole cosmos; however, since the description is really of a very small universe during the first few ‘Planck times’, the description would be only of what is really a very small part of spacetime. This is indeed the remit of ‘quantum cosmology’, which thus becomes an ‘astrophysical’ theory under the present use of the term.
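The rolled-up Minkowski example of this section can be spelled out in a few lines. What follows is a minimal worked sketch; the signature convention (−, +, +, +), the period T, and the coordinate names are assumptions introduced for illustration and are not fixed by the text.

```latex
% Minkowski spacetime <R^4, eta> in inertial coordinates
% (assumed signature -,+,+,+):
\[
  \eta = -\,dt^2 + dx^2 + dy^2 + dz^2 .
\]
% Rolled-up slice: keep the strip 0 <= t <= T and identify
\[
  (t, x, y, z) \sim (t + T,\, x,\, y,\, z), \qquad T > 0 .
\]
% The translation t -> t + T is an isometry of eta, so the quotient
% inherits the same flat metric: every point has a neighbourhood
% isometric to an open set of Minkowski spacetime.
% Consider the curve
\[
  \gamma(s) = (s \bmod T,\; x_0,\; y_0,\; z_0), \qquad s \in [0, T].
\]
% Its tangent is \dot\gamma = \partial_t, with
\[
  \eta(\dot\gamma, \dot\gamma) = -1 < 0 ,
\]
% so gamma is timelike, and gamma(0) = gamma(T) in the quotient,
% so gamma is closed.
```

In unrolled Minkowski spacetime, by contrast, the coordinate t increases strictly along every future-directed timelike curve, so no such curve can return to its starting point. The two spacetimes thus agree locally and differ only in their global structure.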

19.5 Conclusions

One may hope, with [1], that a theory more fundamental than GR would deliver insight into the physical mechanism (such as rotation or ‘antirotation’) behind ‘acausalities’ arising in GR such as the presence of closed timelike curves in some relativistic spacetimes. This hope may be disappointed, even though a more fundamental theory may well admit structures amounting to closed timelike curves and thus permit time travel. As the deliberations in this article show, this clearly remains a live option at the present stage of knowledge. Unfortunately, it is also presently impossible to pronounce any even tentatively conclusive lessons concerning the possibility of time travel to be drawn from quantum gravity. Any more definite insight must await a fuller development of the field.

In fact, the preliminary analysis above illustrates just how little we currently know regarding the relationship between these more fundamental theories of quantum gravity and GR. While a fuller analysis of the relationship between quantum gravity and GR is beyond the scope of the present article,25 the issue of what can be said about the causal structure of spacetime as a ‘classical’ limit of the underlying theories of quantum gravity in general, and about the emergence of closed timelike curves in particular, exemplifies that much work remains to be done in quantum gravity.26 Formulated more positively, although we have yet to learn whether time travel is possible or not, our study blazes a trail forward: using the possibility of time travel and its attendant consideration of the causal structure of spacetime as our foil, the above analysis has led us into the heart of the nature of quantum gravity, its ambit, and, centrally, its relation to GR. For this reason alone, the question of time travel beyond GR is worth our while, even as we await more determinate answers.

Acknowledgements

I am grateful to the editors for their kind invitation and to Hajnal Andréka, Stefano Furlan, Niels Linnemann, István Németi and an anonymous referee for their comments on earlier versions of this paper and for discussions. I am also grateful to Hajnal Andréka and István Németi for their collaboration on earlier projects. But most of all, I am honoured by their friendship.

25 [18] consider the state of the art regarding the relationship between quantum theories of gravity and GR much more fully.

26 I thank the anonymous referee for pressing this conclusion. I agree that this is an important upshot of my discussion.

References

1. Andréka, H., Németi, I., & Wüthrich, C. (2008). A twist in the geometry of rotating black holes: seeking the cause of acausality. General Relativity and Gravitation, 40, 1809–1823.
2. Bojowald, M. (2011). Quantum cosmology: A fundamental description of the universe. Lecture notes in physics. New York: Springer.
3. Boyda, E. K., Ganguli, S., Hořava, P., & Varadarajan, U. (2003). Holographic protection of chronology in universes of the Gödel type. Physical Review D, 67, 106003.
4. Breckenridge, J. C., Myers, R. C., Peet, A. W., & Vafa, C. (1997). D-branes and spinning black holes. Physics Letters B, 391, 93–98.
5. Curiel, E. (2017). A primer on energy conditions. In D. Lehmkuhl, G. Schiemann, & E. Scholz (Eds.), Towards a theory of spacetime theories (pp. 43–104), Einstein Studies. New York: Birkhäuser.
6. Earman, J., Smeenk, C., & Wüthrich, C. (2009). Do the laws of physics forbid the operation of time machines? Synthese, 169, 91–124.
7. Gambini, R., & Pullin, J. (2011). A first course in loop quantum gravity. Oxford: Oxford University Press.
8. Gambini, R., & Pullin, J. (2013). Loop quantization of the Schwarzschild black hole. Physical Review Letters, 110, 211301.
9. Gauntlett, J. P., Gutowski, J. B., Hull, C. M., Pakis, S., & Reall, H. S. (2003). All supersymmetric solutions of minimal supergravity in five dimensions. Classical and Quantum Gravity, 20, 4587–4634.
10. Gibbons, G. W., & Herdeiro, C. A. (1999). Supersymmetric rotating black holes and causality violation. Classical and Quantum Gravity, 16, 3619–3652.
11. Gödel, K. (1949). An example of a new type of cosmological solutions of Einstein’s field equations of gravitation. Reviews of Modern Physics, 21, 447–450.
12. Hawking, S. W. (1992). Chronology protection conjecture. Physical Review D, 46, 603–611.
13. Hawking, S. W., & Ellis, G. F. R. (1973). The large scale structure of space-time. Cambridge: Cambridge University Press.
14. Huggett, N. (2017). Target space = space. Studies in History and Philosophy of Modern Physics, 59, 81–88.
15. Huggett, N., & Callender, C. (2001). Why quantize gravity (or any other field for that matter)? Philosophy of Science, 68, S382–S394.
16. Huggett, N., & Vistarini, T. (2015). Deriving general relativity from string theory. Philosophy of Science, 82, 1163–1174.
17. Huggett, N., & Wüthrich, C. (2018). The (a)temporal emergence of spacetime. Philosophy of Science, 85, 1190–1203.
18. Huggett, N., & Wüthrich, C. (2021). Out of nowhere: The emergence of spacetime in quantum theories of gravity. Oxford: Oxford University Press, forthcoming.
19. Huggett, N., Vistarini, T., & Wüthrich, C. (2013). Time in quantum gravity. In A. Bardon, & H. Dyke (Eds.), A Companion to the philosophy of time (pp. 242–261). Chichester: Wiley-Blackwell.
20. Kay, B. S., Radzikowski, M. J., & Wald, R. M. (1997). Quantum field theory on spacetimes with a compactly generated Cauchy horizon. Communications in Mathematical Physics, 183, 533–556.
21. Malament, D. B. (1977). The class of continuous timelike curves determines the topology of spacetime. Journal of Mathematical Physics, 18, 1399–1404.
22. Mattingly, J. (2006). Why Eppley and Hannah’s thought experiment fails. Physical Review D, 73, 064025.
23. Polchinski, J. (1998). String theory. Cambridge: Cambridge University Press.
24. Rideout, D., & Sorkin, R. D. (1999). A classical sequential growth dynamics for causal sets. Physical Review D, 61, 024002.
25. Rideout, D., & Zohren, S. (2006). Evidence for an entropy bound from fundamentally discrete gravity. Classical and Quantum Gravity, 23, 6195–6213.
26. Rovelli, C. (1996). Black hole entropy from loop quantum gravity. Physical Review Letters, 77, 3288–3291.
27. Rovelli, C. (2004). Quantum gravity. Cambridge: Cambridge University Press.
28. Rovelli, C. (2008). Loop quantum gravity. Living Reviews in Relativity, 11, 5. http://www.livingreviews.org/lrr-2008-5.
29. Rovelli, C., & Vidotto, F. (2015). Covariant loop quantum gravity: An elementary introduction to quantum gravity and spinfoam theory. Cambridge: Cambridge University Press.
30. Smeenk, C., & Wüthrich, C. (2011). Time travel and time machines. In C. Callender (Ed.), The Oxford handbook of philosophy of time (pp. 577–630). Oxford: Oxford University Press.
31. Verch, R. (2012). Local covariance, renormalization ambiguity, and local thermal equilibrium in cosmology. In F. Finster, O. Müller, M. Nardmann, J. Tolksdorf, & E. Zeidler (Eds.), Quantum field theory and gravity: Conceptual and mathematical advances in the search for a unified framework (pp. 229–256). Basel: Birkhäuser.
32. Visser, M. (2003). The quantum physics of chronology protection. In G. W. Gibbons, E. P. S. Shellard, & S. J. Rankin (Eds.), The future of theoretical physics and cosmology: Celebrating Stephen Hawking’s 60th birthday (pp. 161–176). Cambridge: Cambridge University Press.
33. Wald, R. M. (1984). General relativity. Chicago: University of Chicago Press.
34. Wald, R. M. Quantum field theory in curved spacetime and black hole thermodynamics. Chicago: University of Chicago Press.
35. Witten, E. (1996). Reflections on the fate of spacetime. Physics Today, 24–30.
36. Wüthrich, C. (1999). On time machines in Kerr-Newman spacetimes. Master’s thesis, University of Bern.
37. Wüthrich, C. (2005). To quantize or not to quantize: Fact and folklore in quantum gravity. Philosophy of Science, 72, 777–788.
38. Wüthrich, C. (2012). The structure of causal sets. Journal for General Philosophy of Science, 43, 223–241.
39. Wüthrich, C. (2017). Raiders of the lost spacetime. In D. Lehmkuhl, G. Schiemann, & E. Scholz (Eds.), Towards a theory of spacetime theories (pp. 297–335), Einstein Studies. New York: Birkhäuser.
40. Wüthrich, C. Quantum gravity from general relativity. In E. Knox, & A. Wilson (Eds.), Companion to the philosophy of physics. Routledge, forthcoming.
41. Zwiebach, B. (2004). A first course in string theory. Cambridge University Press.

Christian Wüthrich is Associate Professor of Philosophy at the University of Geneva. He works in philosophy of physics, philosophy of science, and metaphysics. The primary focus of his research has long been the philosophy of quantum gravity, but his research also includes the philosophy of space and time, time travel, modality and laws of nature, emergence and reduction, as well as general methodological issues arising in fundamental physics. He is the Founder and Director of the Geneva Symmetry group and the Co-Director of the ‘Beyond Spacetime’ research project, funded by NSF, FQXi, John Templeton Foundation, and ACLS (www.beyondspacetime.net).

Appendix A

From Computing to Relativity Theory Through Algebraic Logic: A Joint Scientific Autobiography

Hajnal Andréka and István Németi

This is a two-in-one autobiography. If one omits the word “joint” from the title, omits István Németi from the authors, and changes “we” everywhere to “we, with István”, one gets a separate autobiography for Hajnal. Likewise, if one omits the word “joint” from the title, omits Hajnal Andréka from the authors, and changes “we” everywhere to “we, with Hajnal”, one gets a separate autobiography for István. The sub-titles refer to the subjects and stations we find most important in our career so far. These are arranged more or less in chronological order, but there are large overlaps. The goals and style of our research were laid down from 1966 till 1970, when István took part in the daring project of automating the Hungarian power system. The research directions, both practical (programming-oriented) and theoretical (general logic, universal algebra), suggested by this work were directly followed till about 1976. After this, till about 1986 we pursued three subjects in parallel: nonstandard-time semantics for dynamic logic of programs (continuation of the programming aspects), categorical injectivity logic and partial algebras (more specific logic and algebra), and algebraic logic, which we have pursued from 1971 till today. In these periods, we tried to focus and go deep into some specific sub-areas, besides keeping up with the general aspects of our research. From about 1991 the emphasis in our research has also been on “widening” besides “deepening”: we tried to integrate and connect algebraic logic to logic, programming, algebra and science in general as much as we could. From about 1998 we applied algebraic logic to relativity theory, physics and methodology of science, and we also connected computing with relativity theory in the form of relativistic computing.

H. Andréka · I. Németi, Alfréd Rényi Institute of Mathematics, Reáltanoda st. 13–15, 1053 Budapest, Hungary
© Springer Nature Switzerland AG 2021. J. Madarász and G. Székely (eds.), Hajnal Andréka and István Németi on Unity of Science, Outstanding Contributions to Logic 19, https://doi.org/10.1007/978-3-030-64187-0


In what follows, numbered references such as [4] refer to our joint publication list in this volume. Other publications are also referred to; these are given explicitly in the text.

Large-Scale AI Program for the Hungarian Power System (c. 1966–1970)

István got an electrical engineer’s diploma from the Technical University Budapest in 1966. His first job was at the Institute for Power System Study (its Hungarian short name was ERŐTERV, for Erőmű- és Hálózattervező Vállalat). The Hungarian electrical network was planned and maintained here; the institute operated under the Ministry of Heavy Industries. The power system, which consisted of different power-lines, power-houses and transformer-stations, was handled through several copybooks, and István’s job was to enter the regularly measured data into these copybooks. This was not a very creative job for an engineer, and we do not know how István’s scientific career would have turned out if Géza Bogdánfy had not been there. Bogdánfy was the head of the group dealing with this book-keeping process. Géza was watching and observing, and after a while, he sat down and wrote a computer program that modeled the power system, and from that time on the data were entered into this program instead of the several copybooks. Once one had this program, it could be used for many other purposes, e.g., to simulate the whole Hungarian power system and see what happened if a specific power-line or power-house was destroyed (e.g., by bad weather) or if a specific new power-line was built.

The team participating in creating this program consisted of Géza, István, Attila Bánhegyi, Károly Füle, Annamária Nitsh and Anna Simay. They constantly improved the program, always aiming at the highest quality and applicability. It was important for them that people at ERŐTERV and similar companies use the program in their everyday work. They indeed reached this goal. By the end of 1970 the program was rather big and complex, and it was still in use in the late 80s. To give a feeling of the program’s complexity, it was inverting matrices of size 500 times 500, while the largest ones other programs could invert at that time were of size 40 times 40, see [13, p. 302]. This was a big success, and seeing what this program could do, respectable professor Károly Szendy, head of ERŐTERV, jumped up and down in joy. The program system used heuristic methods, real random generators,1 and e.g., the program that inverted matrices used a repeated decomposition of matrices.

1 By an appropriate software, the program could ask the state of a specific part of the hardware (printer) at the moment of asking. This worked quite well for a random number.

At the end of this endeavor, the program was so complex that sometimes István felt as if it behaved like an intelligent being. This matched a previous meeting of István with a miracle of science. When he was drafted into the army after finishing university, he was stationed near big MIG military airplanes. These airplanes were enormous and heavy like buildings, yet from one moment to the other, they flew up and disappeared in the sky, they were flying so fast. The creation of the large-scale program system with Géza Bogdánfy and seeing the MIG supersonic airplanes gave István a vision of science that we are still pursuing.

Software Department, Theorem Prover, General Logic, Universal Algebra (c. 1970–1976)

The experience in writing the program system sparked interest in its creators towards artificial intelligence, systems theory and mathematical logic for handling semantic aspects of programming.2 After finishing the power system project, István joined a software department of the Ministry of Heavy Industries.3 He wanted to pursue the semantic aspects of programming. This meant, in short, to focus on the question of “what” a program did, and not on “how” the program did what it did. The need for this came up at ERŐTERV, where for handling this need István devised a simple formal language for specifying programs, and each sub-program had to be documented in this language before insertion into the system. Missing unambiguous documentation had cost valuable computer time in the implementation.4

2 They read books and papers such as [Ashby, W: Design for a brain, Chapman Hall Ltd, 1960], [Simon, H.: The architecture of complexity, Proc. Amer. Phil. Soc. 106,6 (1962)]. An intellectual circle was forming around the team along these lines; Tamás Gergely, Bertalan Hajnal, Ferenc Sebestyén, Ajándok Eőry and Pál Juhász-Nagy were some of its members. For example, István gave special seminars for biologists at ELTE about mathematical logic. István was also a member of the science fiction club (led by Péter Kuczka), where he met scientists like astronomer Iván Almár and biologist Tibor Gánti. Later cooperation with physicist Gyula Dávid brought up joint intellectual values that were cradled in this circle.
3 It was the Software Department of NIMIGÜSZI (shorthand for the Hungarian Nehézipari Minisztérium Ipargazdasági és Üzemszervezési Intézete); the head was Miklós Náray. Bogdánfy and Füle also joined this department.
4 Keep in mind that the program system was written at the dawn of the computer era when computer time was scarce and precious.

Hajnal joined István in making true his vision about science in 1971. She had obtained a thorough mathematical training as a member of the legendary first special mathematical class organized for mathematical talents,5 and then at the mathematics programme of the Faculty of Science, Eötvös Loránd University. Already at high school, Hajnal was attracted to computers and she read a book about cybernetics that caught her imagination. In the last year of university, she took courses about Turing machines and recursive functions (from János Urbán), about Mealy and Moore automata (from András Ádám) and about Chomsky grammars (from Tamás Legendy). Hajnal was excited about these topics, and she was glad that Miklós Náray was willing to be her degree thesis advisor in this subject. After obtaining a mathematics diploma in 1971, she was happy to join the Software Department of NIMIGÜSZI, where she met István.

5 Fazekas Mihály High School Budapest, 1962–1966.


Our abilities matched and complemented each other’s amazingly well. Hajnal learned English from István by reading Arthur C. Clarke’s science fiction novel 2001 together. István underlined the unknown words and next time of the reading Hajnal had to recall their Hungarian meanings. We also read together Cohn’s Universal Algebra book. We read the definitions and theorems together, and then we closed the book and began independently to prove the theorem. Often, after a while we both gave up, and showed each other’s half-proofs. This often resulted in two different proofs for the same theorem, because we both could finish the other’s half-proof. Then we matched the proofs with the one given in the book, and sometimes this was a third proof. Perhaps, our delight in seeing several different proofs for the same theorem was born here. Inside the software department, we formed a group called Program Languages, the head was István. Members of this group were Hajnal, Kálmán Balogh, László Gyuris, István Herényi, Katalin Lábodi, Péter Szeredi and Judit Szoldán. We participated both in very concrete software developments, and in more theoretical studies connected with programming technology. We often delivered studies written for concrete contracts with the Hungarian Bureau for Planning (Hungarian short name was OT for Országos Tervhivatal). Life in the Software Department was lively and intellectual. Not only programs were written for concrete softwares, but at the same time software technology was devised, new programming languages found in the literature were adopted, adapted, modified. On the theoretical side, regular seminars were held, for example for ALGOL 68, affix grammars, semigroup-theoretical automata. Following up interest in the semantic aspects of programming, we worked on how one could give clear and unambiguous program specifications. Once one had such a specification, verifying whether the program indeed satisfied this specification was the next natural step. 
This led to using theorem-prover programs for this checking. A working theorem prover was indeed written in our group.6 We called this program semi-automatic, because it was not very efficient. In order for it to prove a theorem, we often had to break down the theorem into several lemmas and make the prover prove the lemmas first and then prove the theorem from the lemmas. It was our conviction that a working theorem-prover should be somewhere between an automatic theorem-prover and a simple proof-checker. We emphasized that this was in harmony with the fact that natural intelligent beings were not working in ivory towers, alone after having been created, but in constant communication with their environments and other intelligent entities, constantly learning, evolving, adapting. We found it evident that an artificial intelligence had to operate along these lines, too.7

6 It was written by Kata Lábodi and delivered to OT in 1973; we believe that this was the first working theorem prover in Hungary, see [29].
7 It is satisfying for us to see nowadays the general use of theorem provers such as Isabelle/HOL and proof-checkers like the Coq proof-assistant system.

In parallel with the above programming works, we were intensively studying mathematical logic, algebra, and set theory. István ordered (that is, made NIMIGÜSZI buy) the monograph Cylindric Algebras Part I by Henkin, L., Monk, J. D. and Tarski, A. as soon as it appeared in 1971, and we began studying it right away. The zeroth chapter of the book is in fact an introduction to universal algebra, à la Tarski, and we immensely enjoyed it. We enjoyed not only its content, but we also loved its style, which was reminiscent of programming. We considered universal algebra to be the theory of complex systems. We thought that an important advantage of universal algebra was that one could use arbitrary and arbitrarily many functions and relations when modeling a complex system; it did not force a stricter format on its user. Universal algebra then deals with how to simplify (make a homomorphic image), focus on a part (form a subalgebra), decompose (write an algebra as a subdirect product of simpler algebras), analyze and then reassemble a system. Our view was in contrast with the prevailing view among mathematicians that universal algebra was the crown of classical algebra that was worth studying only after being familiar with all the classical structures such as groups, modules, vector spaces, etc. We thought that universal algebra was especially important for computer science because there the tendency was that unprecedentedly complex systems arose in no time (the reason being that to build a program one had to use only “paper and pencil” and not costly heavy materials like bricks and concrete). We supported these ideas with concrete examples from programming technology; the list evolved into a rather popular series of surveys about applications of universal algebra, model theory and categories in computer science.8

In this period, we worked mostly together with and under the guidance of Tamás Gergely. We both won 3-year stipends from the Academy for writing a thesis for the candidate’s degree. István’s official advisor was Tamás Gergely and Hajnal’s was Bálint Dömölki. We were on leave from the Software Department; our only duty was to pass exams and then each write a thesis. This time was very precious for us: we wanted to learn and understand as much as we could.
Besides learning, we developed our own ideas about what general logical systems were and we began to publish and attend conferences.9 Via passing several exams in this period, we got into contact with Hungarian logicians and algebraists László Kalmár, Rózsa Péter,10 András Hajnal,11 Ervin Fried, Tamás Schmidt.

8 This series of surveys is [36, 53, 59, 72, 77, 92, 108]. In a sense, the ideas are in the background of the 2017 book [4], too.
9 A great experience was to give a talk and take part in Banach Semester’73 in Warsaw. We met here Helena Rasiowa, Wim Blok, Larisa Maksimova, we met János Makowsky as member of an excited team in abstract model theory, and we attended a memorable talk by William Lawvere.
10 Rózsa Péter, alias Rózsi néni, knew Hajnal earlier from the special mathematical high-school class, and from university where Hajnal attended courses given by her. Rózsi néni accepted István immediately as Hajnal’s co-worker. She wrote a letter to her friend, Helena Rasiowa, when it turned out that there was no more place for talks at the 1973 Banach Semester, and indeed we gave our joint talk with Tamás Gergely at Rasiowa’s own seminar. Rózsi néni was mentor of Hajnal’s (small) Ph.D. thesis [8], we gave special seminars about mathematical logic under her umbrella at the university, and the three of us had several friendly programs together.
11 András was already Hajnal’s teacher at the special high-school mathematical class. When we showed him a paper we wrote about a simple cylindric algebraic proof of the completeness theorem of first-order logic, he sent it to Donald Monk for opinion. Monk told us that a similar proof was lying in Tarski’s drawer, but since it had not been published till that time, we could publish ours mentioning Tarski. That is what we did, see [32].


The stipend provided a half-year study in some western center in its subject. There were two main centers in the world for building artificial intelligence, one at the Machine Intelligence Department of Edinburgh University and the other at MIT. We had been following the activities at Edinburgh with great admiration. Each year they issued a thick volume with title Machine Intelligence that contained their latest research papers, and we hardly could wait for them to appear. The last one we got and read was Machine Intelligence 4. It was natural that we asked the trip to be to Edinburgh. István began the trip, together with Tamás Gergely, by visiting cybernetician Gordon Pask’s exciting interdisciplinary group in London. Then they visited the Machine Intelligence Department of Edinburgh University, Tamás for some weeks and István for some months. There they met the Edinburgh Prolog programming language. Prolog is a specific logic programming language, and the latter is a specific area of new-wave programming languages (as we called them). The “old-fashioned” programming languages, e.g., ALGOL 68, were “procedural” ones, they specified in detail some computational steps, and the purpose of these steps were not part of the program. In the new-wave programming languages, basically, you specified “what you wanted” (e.g. by the form of a statement about a specific object), and then “how to get such an object” was the result of a concrete proof from a concrete body of knowledge. In short, a “program” was a body of knowledge (a first-order theory, usually), an “input” for this program was a description of a desired object, and the “output” of the computation was how to get such an object (this could be extracted from a proof of the existence of such an object). The Edinburgh researchers were generous to print out the interpreter for the Prolog language, which Tamás Gergely brought home in a suitcase in December 1974. It was given to Kata Lábodi to make it run. 
Péter Szeredi helped her, and finally he developed the Hungarian Prolog out of this cooperation. István and Tamás saw the Freddy II robot assemble a toy car from its scattered parts, and they met many wonderful people, Rod Burstall and Gordon Plotkin among them. Hajnal joined István in Edinburgh at the end of December (for a month). We became friends with Maarten van Emden, who worked with Robert Kowalski on the minimal model and fixpoint semantics for Horn clauses; they both are great pioneers of logic programming. Kowalski moved to London soon after István arrived, and Emden moved to Canada next year. He, and Jos and Eva, provided us with important professional opportunities, in the form of teaching visits, and a warm family in Canada.

By the end of the 3-year stipend, it began to dawn on us that we had to choose between the creation of a big artificial intelligence program (an even richer version of the one made for the power-system at ERŐTERV), and pursuing a research career in mathematics. Hungary’s resources suggested the second choice. After the 3-year stipends, we both were admitted to the Mathematical Institute of the Hungarian Academy of Sciences (István in 1976 and Hajnal in 1977). We began an intensive research career: we traveled a lot, participated in and organized conferences, and had scientific visits at numerous universities; we have an extended correspondence with research fellows in all parts of the world, we have a lively circle of colleagues and students, and we formed wonderful friendships.
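The minimal-model and fixpoint semantics for Horn clauses mentioned above can be illustrated by a small sketch. It is not taken from the work of van Emden and Kowalski or from the Hungarian Prolog; it only assumes ground (variable-free) clauses written as strings, and the names `least_model` and `FAMILY` are hypothetical. Starting from the empty set, it repeatedly adds the heads of all clauses whose bodies already hold, until nothing new appears; the result is the least model of the program.

```python
def least_model(clauses):
    """Least fixpoint of the immediate-consequence operator for ground Horn clauses.

    `clauses` is a list of (head, body) pairs; a fact is a clause with an empty body.
    """
    model = set()
    while True:
        # One step of the operator: heads of all clauses whose body already holds.
        derived = {head for head, body in clauses
                   if all(atom in model for atom in body)}
        if derived <= model:      # nothing new was derived: a fixpoint is reached
            return model
        model |= derived


# Hypothetical ground program, chosen for illustration only.
FAMILY = [
    ("parent(a,b)", []),
    ("parent(b,c)", []),
    ("ancestor(a,b)", ["parent(a,b)"]),
    ("ancestor(b,c)", ["parent(b,c)"]),
    ("ancestor(a,c)", ["parent(a,b)", "ancestor(b,c)"]),
    ("ancestor(c,a)", ["parent(c,a)"]),   # its body is never derivable
]

if __name__ == "__main__":
    print(sorted(least_model(FAMILY)))
```

In this reading the "program" is the list of clauses (a body of knowledge), and asking whether an atom such as `ancestor(a,c)` holds amounts to asking whether it belongs to the least model.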


In the next period of, roughly, 1976 to 1986, we worked on three subjects mostly in parallel: categorical injectivity logic, nonstandard-time semantics for logics of programs, and algebraic logic (cylindric, relation and dynamic algebras).

Categorical Injectivity Logic, Partial Algebras (c. 1976–1983)

Partial algebras appear in all parts of mathematics. In particular, the semantics of a program as an input-output relation is best treated as a partial function by letting this function be undefined at an input where the program does not stop (i.e., enters into an infinite cycle along its execution). There was a reluctance to use partial algebras among mathematicians, because the usual algebraic notions can be understood for partial algebras in uncannily numerous ways. For example, what is the natural meaning of an equality of two terms that involve partial functions? One way of thinking is that the equality is true exactly when both sides are defined and they are equal. Another equally natural way is that the equality is true also when both sides are undefined. Similarly, there are many natural notions of subalgebras and homomorphic images. The tendency for treating partial functions was to replace them with their “total versions”. The total version took some pre-agreed value where the original partial function was not defined; otherwise the two functions agreed. However, this move distorted the nature of partial functions.

Well, the wealth of choices in partial algebra notions attracted us! Universal algebra deals with total functions (that is, functions defined everywhere) and we were quite familiar by that time with universal algebra. We felt ready to extend universal algebra to partial algebras. Instead of using an ad-hoc direct process, we decided to use category theory as a bridge. To show an example, let us concentrate on one basic theorem of universal algebra, Birkhoff’s variety theorem.
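The two readings of an equality between partially defined terms described above can be made concrete with a minimal sketch. It assumes that a partial function is modelled as a Python function returning `None` where it is undefined; the names `half`, `pred`, `existence_equal` and `strong_equal` are hypothetical and are not the authors' terminology.

```python
from typing import Callable, Optional

# Assumption: a partial function on the integers returns None where it is undefined.
Partial = Callable[[int], Optional[int]]


def half(n: int) -> Optional[int]:
    """Halving, defined only on even numbers."""
    return n // 2 if n % 2 == 0 else None


def pred(n: int) -> Optional[int]:
    """Predecessor on the natural numbers, undefined at 0."""
    return n - 1 if n > 0 else None


def existence_equal(s: Partial, t: Partial, x: int) -> bool:
    """First reading: true exactly when both sides are defined and equal."""
    a, b = s(x), t(x)
    return a is not None and b is not None and a == b


def strong_equal(s: Partial, t: Partial, x: int) -> bool:
    """Second reading: also true when both sides are undefined."""
    a, b = s(x), t(x)
    if a is None or b is None:
        return a is None and b is None
    return a == b


if __name__ == "__main__":
    # half(3) is undefined, so even "half(x) = half(x)" fails under the first reading.
    print(existence_equal(half, half, 3), strong_equal(half, half, 3))
    # half(2) and pred(2) are both defined and equal, so both readings agree.
    print(existence_equal(half, pred, 2), strong_equal(half, pred, 2))
```

The two readings differ only at inputs where both sides are undefined, which is exactly the case where the text says the choice is not forced.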
Birkhoff’s theorem says that the smallest equationally defined class containing a class K of algebras is the class of all algebras constructed from members of K by the use of direct products, subalgebras, and homomorphic images.12 This theorem involves the notions of equational logic, subalgebras and homomorphic images, the “right” versions of which we were to find out in partial algebra theory.

12 In short, Mod Eq K = HSP K.

To use category theory as a bridge, first we had to “elevate” these notions to category theory. At the categorical level, we defined the so-called injectivity logic of a category.13 In this logic, the formulas are the morphisms of the category, the models are the objects of the category, and a morphism is “true” or “valid” in an object if it is injective relative to it (that is, each morphism from the domain of the morphism to the object factors through the morphism). Given two classes H, S of morphisms, we defined the so-called HS-equations as a subclass of morphisms, in such a way that these were abstract categorical versions of the usual total algebraic equations.14 Then we proved a general Birkhoff-type variety theorem for categories. Namely, we proved that in any category (under some conditions) the HS-equational hull of a class K of objects is HSPK.15 If we choose the category to be that of total algebras, and we choose H, S to be the classes of all surjective and one-to-one homomorphisms, respectively, we obtain the usual Birkhoff variety theorem. To obtain a variety theorem for partial algebras, we choose the category to be the category of all partial algebras and we find out what the theorem says. First of all, we have to choose our favorite notions H and S of homomorphisms and subalgebras (arbitrarily in so far as the conditions of the theorem allow), and then see what (first-order logic) formulas injectivity of the HS-equations corresponds to. These formulas will be the partial algebraic equations that correspond to our choice of H, S. By this, we imported the variety theorem from total algebras to partial algebras in such a way that we did not have to do away with the plurality of the partial algebraic notions.

13 Independently of us, this logic was also defined in [Banaschewski, B., Herrlich, H., Subcategories defined by implications. Houston J. Math. 2 (1976), 149–171].
14 This means the following. If we let H and S be the classes of all surjective and one-to-one morphisms in the category of total algebras, then to each HS-equational morphism there corresponds a set of equations, and vice versa, so that in each algebra the HS-equation is injective if and only if the corresponding set of equations is true in the algebra.
15 Here, the HS-equational hull of K is the class of all objects in which all HS-equations injective in K are injective, and HSPK is the class of all objects that can be obtained from members of K by category theoretic direct products, H-morphism-images and S-morphism-subalgebras.

We did not stop here. We made the direct product part of the theorem depend on a parameter, too. This parameter was a class F of filters. Then we generalized the injectivity approach to cones of morphisms (a set of morphisms with a common domain) as formulas, we defined the HSF-cones and voilà! we had a Birkhoff-type theorem for HSF in place of HSP. The HSF-cones, the generalized analogues for equations, were defined so that, roughly, H made restrictions on the domain of the cone, S made restrictions on the co-domains of the members of the cone, and F determined how many members the cone could consist of. For example, if we chose F to be the class of all filters, this number was 1, if we chose F to be the class of all ultrafilters, then this number was “finite”, and if we chose F to be the class of all principal ultrafilters, then there was no restriction on the number. In the category of total algebras, the natural choice for H and S was the class of all surjective homomorphisms and all one-to-one homomorphisms. But they could be chosen to be the class of all isomorphisms, too. This way, all the well-known axiomatizability theorems for equations, equational implications, universal formulas, Horn-formulas, positive formulas, infinitary quasiequations were special cases of this one HSF-theorem.16 A by-product of importing the universal algebraic notions from total algebras to partial algebras through category theory was a powerful unification of the several axiomatizability theorems of total algebras, all looking similar but treated in model theory completely separately with separate proofs.

16 In symbols, for example, Mod Qeq K = S P^r K, Mod Univ K = S Up K, Mod Infqeq K = SP K.
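The notion of validity as injectivity used in this section can be tried out by brute force in a very small setting. The sketch below is only an illustration under stated assumptions: it works in the category of finite sets and functions (not in the categories of algebras treated in the text), represents a function as a dictionary, and uses the hypothetical names `all_maps` and `is_valid_in`. It checks whether every map f from A to C factors as f = g ∘ e for some g from B to C.

```python
from itertools import product


def all_maps(dom, cod):
    """All functions from dom to cod, each represented as a dict."""
    dom, cod = list(dom), list(cod)
    return [dict(zip(dom, values)) for values in product(cod, repeat=len(dom))]


def is_valid_in(e, A, B, C):
    """True if every f: A -> C factors through e: A -> B, i.e. f = g . e for some g: B -> C."""
    candidates = all_maps(B, C)
    for f in all_maps(A, C):
        if not any(all(g[e[a]] == f[a] for a in A) for g in candidates):
            return False
    return True


# Hypothetical example: e identifies the two "variables" x and y.
A = ["x", "y"]
B = ["*"]
COLLAPSE = {"x": "*", "y": "*"}

if __name__ == "__main__":
    print(is_valid_in(COLLAPSE, A, B, [0]))      # a one-element set
    print(is_valid_in(COLLAPSE, A, B, [0, 1]))   # a two-element set
```

In this toy setting the morphism `COLLAPSE` plays the role of the equation x = y: a map from {x, y} factors through it exactly when it sends x and y to the same element, so the morphism is valid precisely in sets with at most one element.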


István’s candidate’s dissertation, submitted in 1976, was about this kind of generalizing the algebraic notions to partial algebras. Already in 1974 we were advisors of two degree-dissertations in this subject, those of Anna Pásztor (title “The concept of variety in the theory of categories”) and of Ildikó Sain (title “Category theoretical investigations in order to generalize identities and quasi-identities”). They both continued working with us after university, they became dear colleagues. Ildikó Sain was part of almost all our later works, too, starting with 1977. Anna Pásztor moved to Germany and became Ana Pasztor, her Ph.D. advisor was Peter Burmeister in the subject of partial algebras and category theory. Ana defended her Ph.D. in 1979 under his supervision. We kept contact with Peter while the preparation of this Ph.D., this was the first Ph.D. dissertation where we cooperated with a foreign supervisor in directing a dissertation.17 Burmeister was an expert in partial algebra theory, his Ph.D. dissertation in 1971 was about partial algebras. It was natural, that after Ana’s Ph.D. dissertation, we continued working with Peter in partial algebra theory. We visited each other, and wrote several joint papers. He summarized the theory of partial algebras, based on our injectivity approach, in his monograph [Burmeister, P.: A model theoretic oriented approach to partial algebras. Introduction to theory and application of partial algebras. Part I. Akademie Verlag, Berlin 1986]. Our injectivity approach reminded René Guitart and Christian Lair of Ehresmann’s sketches and of their own category theoretic work. They elaborated the connections, wrote papers about this connection.18 A visit to them was financed by the Hungarian Academy of Sciences in 1982. It happened that Jacques Riguet, pioneer of cybernetics and theory of binary relations, was returning home from Budapest to Paris. He invited us to join him, and we took his car instead of a train. 
We went by car from Budapest through Vienna, Stuttgart, Munich to Paris. We had never been in this part of the world before; it was a great experience. In Paris, we lived in Riguet’s home; his kind personality and warm hospitality are a lasting memory. Our stay in Paris was rich in every way. First of all, we had long lively discussions with Guitart and Lair comparing our views, we gave talks, we visited several logic and computer science groups, attended a memorable talk by René Thom, we were invited to dinner at the homes of René Guitart and of Irène Guessarian, and we visited science fiction and cartoon book stores. We are grateful to Riguet for bringing this about. Papers we wrote on the injectivity approach are [9, 43, 49, 48, 78, 76, 75, 80, 87], further papers on partial algebras are [37, 44, 62, 64, 107], and further related papers in category theory are [52, 94].

17 Later we had several joint Ph.D. students with foreign colleagues. Maarten Marx was joint Ph.D. student of Johan van Benthem and István, Szabolcs Mikulás was joint Ph.D. student of Johan and Hajnal, Tarek Sayed was joint Ph.D. student of Mohammed Amer and István.
18 For example, a whole section is about our work in [Guitart, R., Lair, C., Calcul syntaxique des modeles et calcul des formules internes. Diagrammes, Vol 4, Dec. 1980. 106 p.].


Nonstandard-Time Semantics for Dynamic Logic of Programs (c. 1978–1986)

We continued our interest in program verification, but now on a more theoretical side. Floyd’s method for proving correctness of a program consisted of inserting statements into the nodes of the block-diagram of the program. These statements were about the contents of the computer registers at that point in the execution of the program, and then one had to prove implications between these statements. This is a most natural way of proving a program correct, and the most natural first goal was to prove that all correct programs could be proved to be correct by Floyd’s method.

The first result we got was a strong negative one about program verification in general: there is no recursive subset of all correct programs that contains a small “we-must-prove” core of the correct programs [50]. This meant that Floyd’s method could not be complete. As usual, we did not stop here. We figured that the reason for this strong incompleteness was that a halting run of a program had to be finite, and this way the (hard) theory of natural numbers filtered into the theory of program verification. Our suggested solution was to use any model of Peano arithmetic in place of the natural numbers for the time-scale, but then make restrictions on program runs along the time-scale. This restriction was that all first-order formulas survived the “jumps” in a nonstandard time-scale; we called such runs “continuous”. All standard runs (that is, runs along the natural numbers) are continuous. Now we were in the position of proving a completeness theorem: Floyd’s method proved exactly those programs correct that were correct not only with respect to standard runs, but also with respect to continuous nonstandard runs. This completeness theorem gave us a method for proving that Floyd’s method did not prove a given correct program to be correct.
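As a small illustration of the Floyd-style annotation described above, the sketch below attaches a statement about the program variables to the entry, the loop node and the exit of a tiny program. The program `sum_to` and its assertions are hypothetical examples, not taken from the papers cited here. Note the limitation: Python's `assert` only checks the statements along the particular runs that are executed, whereas Floyd's method asks one to prove the implications between consecutive statements for all possible states.

```python
def sum_to(n: int) -> int:
    """Compute 0 + 1 + ... + n, with a Floyd-style statement at each control point."""
    assert n >= 0                           # statement at the entry node
    i, s = 0, 0
    while True:
        # Statement at the loop node: it must hold every time control passes here.
        assert 0 <= i <= n and 2 * s == i * (i + 1)
        if i == n:
            break
        i += 1
        s += i
    assert 2 * s == n * (n + 1)             # statement at the exit node
    return s


if __name__ == "__main__":
    print([sum_to(n) for n in range(6)])
```

The implications to be proved in a Floyd-style argument would be: the entry statement implies the loop statement for i = 0 and s = 0; the loop statement together with i ≠ n implies the loop statement after the two updates; and the loop statement together with i = n implies the exit statement.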
All we had to do was to exhibit a continuous run of the program with respect to which the program was not correct. Indeed, we exhibited many correct programs that were not Floyd-provable. We aimed at finding as simple such programs as we could, and to our great joy, we could use ultraproducts, our favorite tool, to show about a nonstandard run that it was continuous.19 We found it intriguing that the “non-existent”, “theoretical”, infinite runs provided practical information on concrete programs. In Floyd’s program verification method, reference to time along which the program was running appeared in no way. There were more sophisticated methods, in which time appeared in the form of time-modalities such as First, Next, Always (in Burstall’s intermittent assertions method) and Always-in-the-future additionally in Pnueli’s method. If one defines the language properly, and one requires in the definition of a continuous run survival of the additional formulas, too, then one can prove similar completeness theorems for Burstall’s and Pnueli’s program verification methods. We went on, and we made dependence on time explicit by introducing a three-sorted language one sort of which was the sort of time, another sort was for program

19 We worked hard to find the simplest examples and their simplest proofs for non-Floyd-provability. When we proudly presented such in a talk in Dana Scott’s seminar at Pittsburgh in 1984, his remark was that he favored more sophisticated tools in mathematics.

Appendix A: From Computing to Relativity Theory Through Algebraic Logic …


runs, and the third sort was the sort for the data of the programs. In Floyd’s method, all formulas are about this third sort. In the three-sorted language we can “talk” about processes (programs, actions) living in time explicitly, and we can express partial and total correctness as well as many other properties of programs like concurrency, nondeterminism, fairness. This is an ordinary first-order logic, in which we made explicit all the features relevant to program semantics. Let us call it NDL for “Nonstandard-time dynamic logic”.20 Everything we did about Floyd’s, Burstall’s and Pnueli’s methods can be done in this explicit framework. NDL was used for characterizing the “information contents” of distinguished well-known program verification methods, for comparing the powers of program verification methods as well as for generating new ones. We began elaborating the idea of NDL together with Tamás Gergely; he later pursued this line mostly with László Úry. They wrote an excellent book on NDL [Gergely, T., Úry, L., First-order programming theories, Springer, 1991]. We continued work on this subject mostly with László Csirmaz, Ana Pasztor, and Ildikó Sain, who contributed many interesting results. Both Csirmaz and Sain wrote their candidate’s theses on this subject; one of Pasztor’s papers on NDL is [Pasztor, A., Recursive programs and denotational semantics in absolute logics of programs, Theoretical Computer Science 70 (1990), 127–150].21 The Madrid group of Ana Gil-Luezas, Theresa Hortalá-Gonzáles and Mario Rodriguez-Artalejo took up this subject, and we had many enjoyable discussions with them. Other colleagues, too, got interested and wrote papers on nonstandard semantics for DL, for example Martin Abadi, Robert Cartwright, Petr Hájek, David Harel, Daniel Leivant, Albert Meyer, Michael Richter, Manfred Szabo, Marek Suchenek, Andrzej Szalas.
Chinese colleagues Zhaowei Xu, Yuefei Sui, Wenhui Zhang rediscovered nonstandard-time semantics for DL in 2015, and they continue research in this line. We presented our results at several conferences. The most memorable for us were, perhaps, Poznan 1980 (Logics of Programs and their Applications, organized by Andrzej Salwicki), Yorktown Heights 1981 (Logics of Programs Workshop, organized by Dexter Kozen), Szeged 1981 (Fundamentals of Computation Theory, organized by Ferenc Gécseg), Leeds 1988 (Many-sorted logic and its application in computer science, organized by Karl Meinke and John Tucker). We want to write a few words about our New York trip. At the airport upon arrival Dexter Kozen met us, and we were taken to Jim Thatcher’s house where several other participants of the conference were staying, too. The house was built from wood, in the midst of uninhabited woods surrounding Yorktown Heights, with a creek flowing below it. Jim Thatcher explained that he was keeping the entrance unlocked so that if a thief wanted to take something, he did not have to do damage by forcing his way in. We were given a beautiful bedroom; one wall was almost entirely made of glass through which one could see the woods. It had a separate bathroom,

20 As the reader may guess already, in our case the qualifier “nonstandard” is a positive one.
21 This paper explains the connection of nonstandard-time semantics with the set theoretic notion of absoluteness, [231] writes more on this. In short, standard DL is not a KPU-absolute logic, NDL can be viewed to be its KPU-absolute version.


with large soft towels, everything was comfortable and beautiful. But at night we were cold, with just one thin blanket on the bed. We could hardly believe that in such a luxurious place there were no extra blankets. But we did not find any. Next day, when we asked for extra blankets, instead of getting them, it was simply explained to us how to use an electric blanket. We had breakfast with others and then retreated for a while. When we emerged, we found nobody but a letter saying that they had left for the workshop in Yorktown Heights (it began in the afternoon), but there was a car around and a map, and we should take the car to follow them. We were there alone in the midst of the woods, with a car and a map, but we did not drive! Thank God, we found out that John Tucker, who had had quite an adventurous trip to America the day before, was also stranded with us in the woods. We do not remember how, but eventually an arrangement was made and Jerzy Tiuryn took us to the workshop.22 We met many people at this workshop, some for the first time, for example Krzysztof Apt, Peter van Emde Boas, Erwin Engeler, David Harel, Marek Karpinski, Dexter Kozen, Albert Meyer, Grażyna Mirkowska, Rohit Parikh, Vaughan Pratt, Andrzej Salwicki, Jim Thatcher, Jerzy Tiuryn, John Tucker. The atmosphere at the workshop, and that of the whole trip, was fantastic. People were generous, friendly, open, sincere, and we felt we were accepted and appreciated. The atmosphere at the workshop was lively and intellectual, with exuberant people vibrating with intellectual thoughts. That is an atmosphere in which we feel at home, in which we are happy. We found many of these properties characteristic of our later trips to America, too. It was easy for us to make lasting friendships that make our lives rich. Papers we published on nonstandard-time dynamic logic of programs are [46, 54, 51, 50, 74, 84, 82, 88, 89, 111, 123], related papers are [155, 231].

Algebraic Logic, Tarski’s School, Cylindric and Relation Algebras (1971– )

When István met Boolean algebras as a successful algebraic version of propositional logic, he figured that there should be a similar class of algebras for predicate logic. Indeed, he found cylindric algebras in the literature, and we began to study them as soon as we had the Henkin-Monk-Tarski 1971 monograph. We read the book from beginning to end, every passage and footnote. At a 1973 Banach Semester we already gave talks related to cylindric algebras, and we were thrilled to meet James Donald Monk at the 1978 Banach semester Universal Algebra and its Applications. The first thing we asked him was about the second part of the monograph that was promised in the book. Well, Monk was not very definite about the appearance of Part II, but he was open to our questions in the subject, and we began a cooperation that was perhaps the most important thing in our careers; it gave our scientific careers an

22 That was an adventurous trip, too! We had our first car accident in the USA on the way to the workshop. The second of the two car-accidents in our lives was in Nashville on the way from the airport to Bjarni Jónsson’s home, in 1988.


upward inclination, in every way. Henkin, Monk, and Tarski had a draft of a paper about cylindric algebras that they were going to publish, Monk began to write a more detailed version of that paper, and we were happy to second-read the manuscript. On the way, we answered their open questions, improved theorems, constructed counterexamples. This led to a parallel paper written by us that became the second part of the Henkin-Monk-Tarski-Andréka-Németi Springer Lecture Notes book [2]. Working on this book together with Monk was exciting, heaven23 for us. Perhaps this Springer book inspired Monk to begin to write Part II of the monograph. We continued the well-working cooperation of our reading the drafts of the chapters as they were in the making. Our contributions are acknowledged throughout Part II. The 1978 Banach Semester on universal algebra and its applications was memorable to us not only for meeting Donald Monk and starting our collaboration with him. It was here that George Grätzer told us that he thought we should prove ourselves by solving other people’s problems and answering their questions, not just proving theorems of our own. We took his advice to heart. We began by solving a problem in his 1968 Universal Algebra book. This asked if Birkhoff’s variety theorem implied the axiom of choice or not (it did not [57, 66]).24 We went on and solved many open problems published by other mathematicians. The most difficult to crack was the one in Ralph McKenzie’s 1966 dissertation (published in 1970): is there an integral relation set algebra that has no permutational representation? (Yes, there is [126].) The oldest one is perhaps the problem in Jónsson’s 1959 paper on modular lattices: is there a weakly representable but not representable relation algebra? (Answer: there are many [132].)
We solved several problems from the 1971 and 1985 monographs of Henkin, Monk, and Tarski, solved problems from the Tarski-Givant 1987 book [Tarski, A., Givant, S., A formalization of set theory without variables. AMS Colloquium Publications, 1987], among others (e.g., [101, 157, 163, 237, 240, 245]). In 1982, after the appearance of our joint Springer book, Leon Henkin was touring Yugoslavia, giving talks about cylindric algebras at several universities. Monk told Henkin that this was a good opportunity for meeting us. But Leon was reluctant to come to Budapest, so we met in Szeged for one day; Szeged was near the border of Hungary and Yugoslavia. That was an exciting meeting! Leon explained his method twisting, a method for creating non-representable cylindric algebras from

23 Talking about heaven, in 1980 Hajnal and István were visiting Waterloo University in Canada. Our host, Maarten van Emden was so generous as to invite Monk and arrange for all three of us to stay for some days in Walhalla Inn, Waterloo. This way we could work more effectively on the manuscript. That was the period when we were making all kinds of constructions by the use of the ultrafilter-choice functions pair, their version for ultrafilters for infinitary relations. Monk did not mind our slipping the new constructions below his door at night.
24 We went on and showed that not only the axiom of choice, but the axiom of foundation also has an impact on the truth of theorems of universal algebra. We flirted with set theory at another occasion, too: we showed that the solution of the so-called finitization problem of algebraic logic depends on set theory. Roughly, while it is not solvable in Zermelo-Fraenkel set theory with the axiom of choice, the finitization problem has positive solutions in appropriate non-well-founded set theories. Papers we wrote about the influence of the set theoretical background we work in are [57, 66, 134, 161, 165, 179].


representable ones. He crossed his two fingers for illustration, and that was an unforgettable explanation. Leon was surprised to see us. He had expected one person, Andréka-Németi, not two,25 and one of them a female at that! Later, we met Leon several times in Berkeley where he lived; once we were staying in his home. He took us–Hajnal, István, and Ildikó Sain–to a nearby merry-go-round, after which his famous cylindric algebraic MGR-equation was named. This equation holds in all set cylindric algebras. He had devised the method twisting in order to show that this equation failed in some (abstract) cylindric algebras.26 It was very enjoyable to listen to Leon as he told stories about small and big things, about how and why Twin Drive near his home was created, and about his children. Getting back to the merry-go-round, his student Diane Resek proved in her 1975 Ph.D. dissertation that the cylindric algebraic equations together with infinitely many of Leon’s Merry-Go-Round equations ensured representability as a cylindric-relativized set algebra. Richard Thompson showed that of these infinitely many MGR-equations only four suffice in the theorem. These theorems are very important but their proofs were about 200 pages each. This is why Monk was sorry to leave them out of Part II. We were curious to see the proofs, so we invited Richard Thompson to enlighten us. He visited for a year in 1986 (on an IREX stipend). He was happy to talk about cylindric algebras at a weekly full-afternoon seminar, and we learnt a lot from him. The end of his visit approached, and he was still making preparations for telling the proof. We began to worry, but then on the way home after a seminar by Richard, all of a sudden, a short elegant proof emerged from all the preparations in Hajnal’s head.27 This is how finally the theorem got published with a 7-page detailed proof [106].
The class of cylindric-relativized set algebras (Crs) turned out to be rather important in the application of cylindric algebra theory to logic. For example, István proved that the equational theory of Crs is decidable [11]; this led to the decidable guarded fragment of first-order logic [169]. After meeting Henkin in 1982, we looked forward to meeting our third co-author Alfred Tarski whose work and views on science we admired. An opportunity came up when we had a one-year teaching position at Waterloo University Canada in 1983–1984. Our first visit to Berkeley was in May 1984. We could not meet Tarski, since he died at the end of the previous year, but we were received in a very warm way. We were given his room (725 Evans Hall) at the university as our office on the seventh floor. The room had a magnificent view on the bay, but it was cold. We prepared our talk on a rare sunny spot on a terrace some levels below, facing the hills. There were many questions after the talk. A person was asking rather concrete questions about the Hamate semigroup, and we had to admit that we never heard of this semigroup. It turned out that this person was Richard Thompson, and the Hamate semigroup

25 This was dual to our previously thinking that Kirby Baker working in universal algebra meant two persons, Kirby and Baker.
26 Leon did not stop at showing that some cylindric algebras (CAs) were not representable, not even as relativized ones. He conjectured that all cylindric algebras were obtainable from representable ones by his method twisting (together with another similar method of his, dilation). Indeed, our student András Simon proved this for the 3-dimensional case in his 1991 Ph.D. Dissertation.
27 This shows how useful a good foundation can be.


referred to the semigroup of substitutions in a cylindric algebra. The name Hamate was his invention for this semigroup, both to refer to Henkin-Monk-Tarski and to a move in the game go. Julia and Abraham Robinson attended the talk, and they said some kind words to us afterwards. Leon and Ginette Henkin gave a party in the evening, and here is where we met Steve Givant for the first time. The three of us talked towards the end of the party; we asked and he talked about quasi-projective relation algebras, the key player of the book28 he was finishing at that time. We offered to read and check the manuscript, and indeed he sent it to us, and we learnt the subject while reading it. This was the beginning of a long collaboration and a lifelong friendship. Another “first” on this visit was that we met William Craig. We got the opportunity of getting to know his sweet personality. We were always very happy to meet him later, in Berkeley, Budapest or Warsaw.29 We also met Diane Resek for the first time during this visit. We met Roger Maddux in person later this year, at the Charleston universal algebra conference. We had exchanged letters, and we looked forward to meeting him. We had already met Don Pigozzi30 and Steve Comer in Hungary, and Wim Blok and Bjarni Jónsson in Warsaw. These were all important meetings for us that led to professional friendships. We have joint papers with almost all of Tarski’s algebraic logic family: scientific children and grandchildren. The 1983–84 trip to America was full of events. It began with a 2-month visit to Montreal: at McGill University we discussed categorical logic with Michael Makkai, and at Concordia University we had discussions on NDL with Fred Szabo. Then, at Waterloo University we taught two courses on the foundation of computer science, and worked with Maarten van Emden and Areski Nait Abdallah on logic programming; we also had discussions with Stanley Burris on his universal algebra book.
After courses ended in Canada, we visited universities in the USA and gave several talks at each: Carnegie-Mellon University Pittsburgh (April 3–10, Dana Scott, Ken Manders), Colorado University at Boulder (April 26–June 20, July 18–August 15, Donald Monk, Jean Larson), University of California at Berkeley (May 3–10), Stanford Research Institute (May 8, Joseph Goguen, José Meseguer, Richard Waldinger, Leslie Lamport), University of California at Los Angeles (May 10–12), The University of Chicago and The University of Illinois at Chicago (June 20–25, Wim Blok, Saunders MacLane, John Baldwin). Then we visited George Grätzer and his group at The University of Manitoba at Winnipeg in Canada (July 3–10), and returned to the USA for the Charleston Universal Algebra Conference organized by Steve Comer (July 10–17). Monk and Henkin finished Part II around this time; it appeared in 1985. We were still thinking a lot about cylindric algebras; an example is the following. The set of valid formulas of first-order logic with equality but no function or relation

28 The Tarski-Givant 1987 book.
29 On one of his visits to Budapest, we wrote a joint paper about partial algebras [107]. He stayed in our Budapest home, and in the evenings he enjoyed dancing to live music in the nearby Déryné Bistro.
30 Don Pigozzi’s collaboration with Tarski on producing Part I of the Henkin-Monk-Tarski monograph on cylindric algebras may have inspired our similar cooperation with Monk. We first met at a universal algebra workshop held in Esztergom Hungary, organized by Ervin Fried, around 1979.


symbols is decidable. This may make one think that the set of equations valid in the minimal cylindric algebras (that is, cylindric algebras generated by the diagonals) is decidable. Indeed, there was a statement to this effect in Part I, and the proof was promised to appear in the second volume. In the course of writing it, Don Monk asked if we could reconstruct the proof of this statement. We started doing it, and we were almost there, only one small step was missing. Thinking about this one small gap, in 1984 at an airport when traveling home from the USA, István got the idea of the proof of the opposite statement: the equational theory of the minimal cylindric algebras is not even recursively enumerable! Later we learnt that Matti Rubin also got this result, and we elaborated its logical meaning together with him, as follows. Cylindric algebra equations correspond to formula-schemes of first-order logic. A formula-scheme is something like “ϕ → ∃xϕ where ϕ is a formula”. The logical side of the above algebraic theorem is that, while the set of valid formulas of equality logic is decidable, the set of valid formula-schemes of this logic is not even recursively enumerable! If we allow a binary relation in the language, then the set of valid formulas becomes “harder” (undecidable), but the set of valid formula-schemes becomes “easier” (recursively enumerable). For details, see [101]. We got invited to the 1985 January Boolean Algebra meeting of the Oberwolfach Research Institute in Germany. There was a parallel Set Theory meeting, and we traveled to Oberwolfach together with András Hajnal. This research institute is famous for its excellent meetings where the emphasis is on the exchange of ideas between the participants. Indeed, we had plenty of discussions with the participants of both meetings, and we returned home with a wealth of new ideas, e.g., about Boffa’s non-well-founded set theory. Most importantly, we met Matti Rubin here.
We learnt that, independently of us, he also proved that the equational theory of infinite-dimensional minimal cylindric algebras is not decidable, and he also constructed uncountably many subvarieties of infinite-dimensional representable cylindric algebras. Thus, we had an ideal ground for joint research, which we began to pursue happily. He visited us in the summer. At the end of his visit, we traveled together to Szeged to participate in the Colloquium on Ordered Sets. On the train, besides talking about various mathematical ideas, he said he was looking forward to seeing his old friend Ivo Düntsch at the conference. So we, too, looked forward to meeting Ivo. There was an immediate feeling of friendship between us. The next time we met Ivo was in 1988, at the Ames algebraic logic conference where we began to think about McKenzie’s problem. We succeeded in solving it as a joint effort; since then we have had many enjoyable meetings and joint work with Ivo. István submitted his dissertation for the doctoral degree with the Academy in 1986. The two main results were a translation of first-order logic to the equational theory of 3-dimensional cylindric algebras and showing that the equational theory of Crs is decidable.31 Hajnal submitted her dissertation for the same degree in 1991, it

31 The main result of the Tarski-Givant book is translating first-order logic to the equational theory of relation algebras. They ask how much associativity of relation-composition is needed in this result. The answer is in István’s dissertation: it can be weakened till “CA3”, but it cannot be weakened till


contained results about cylindric and relation algebras, the main subject being how hard the basic operations were definable over each other.32 Our second one-year stay in America was in 1987–88. It began with the conference Algebras, Lattices, and Logic, which was held in the Asilomar Conference Center, California, July 6–12. Together with Ildikó Sain, we arrived at Berkeley on July 4, we stayed with the Henkins, and traveled with Leon and Ginette by car to Asilomar. On the way, Leon was showing us the attractions of the landscape, for example we stopped to see the place of the seals. This Asilomar conference was one of the many conferences to come that were joint events of the universal algebra and the algebraic logic communities. The title of Richard Thompson’s talk was “High deeds in Hungary” (he had spent the year before with us in Budapest). Asilomar is on the shore of the Pacific, and the ocean is rather cold there. István grew up at lake Balaton,33 water is his element. He could not help taking a swim in the ocean. Alasdair Urquhart joined him, and there were two people in the cold water. Alasdair came out after a while but István stayed, and Richard was anxiously pacing up and down the shore. When István finally came out, all blue, we began to trot home to get him warmed a bit. Richard was running along with a heavy briefcase in his hands. István spent the rest of the evening in the bathtub, in warm water, until finally he thawed. Asilomar itself was quite a cold place with no sunshine at all. We were dying for some warm sunshine, and Peter Ladkin, a participant of the conference, took pity on us. He took the two of us on his small plane, and we flew east till we found a sunny stripe of land. There we landed, bathed in the sunshine for a while, and then got back to cool Asilomar. Following the conference, we spent ten days in Berkeley, then we flew from the west coast of America to the east coast: to Miami.
During the two weeks we stayed there, we gave several talks in the Computer Science Department of Florida International University, and most of all we spent as much time on the beach as we could. Here we got plenty of sunshine, sizzling hot sand, warm blue Atlantic ocean, and palm trees.34 After visiting the two coasts, we settled in the very middle of America, in Ames Iowa. The purpose of the one-year visit was to work with Roger Maddux, and this was made possible by a two-semester teaching position at Iowa State University. In the first semester, we taught undergraduate and graduate courses in algebra and logic. The university had an excellent library, and István spent lots of time in it browsing the books. Here he found, among others, Sullivan’s 1979 book Black Holes. This gave an idea, and in the winter break we posted an announcement of an interdisciplinary course “Infinity and the mind”, offered jointly with Ildikó Sain to graduate students in mathematics, computer science, physics and philosophy. The topics were “Understanding and resolution of logical paradoxes recurring in logic itself, black holes, relativity, time travel, infinities, artificial intelligence, foundation of mathematics. Connections between mathematical and physical infinities”. There was a huge interest and the course was a big success. We began with the elements of computation theory, arriving at the Church Thesis. When István wanted to argue for the plausibility of this thesis, all of a sudden the idea of relativistic computing occurred to him: one can use a black hole to “slow time infinitely”: for an observer approaching the black hole only a finite amount of time may pass while “at home” a computer is computing for an infinity of time. So, we gave as a homework to find a physical device (any gadget the existence of which did not contradict the laws of physics as accepted at that time) which could compute a non-Turing computable task. This was called “the hard-core science fiction homework”, and we returned to it, giving more and more clues, till the end of the course.35 At the end of the semester, the head of the Mathematics Department of ISU invited the three of us to his office and thanked us for the successful course. We lived in Don Pigozzi’s house. He was away on a sabbatical, and the house was ideal for the three of us with Ildikó. It was spacious and had a garden. It was here that Hajnal saw the Milky Way for the first time in her life, shining across the night sky with its branches clearly visible.36 Parallel to teaching, we participated in the algebra seminar of the department, and worked with Roger. Among others, we elaborated the method of splitting for relation algebras having the analogous method for cylindric algebras as a pre-image [121]. That turned out to be a rather useful method for constructing non-representable relation algebras. In the winter break, in January 1988, Hajnal and István gave talks at Florida International University, and this time they stayed at Miami Beach.

“WA”. A consequence is that the free 3-dimensional cylindric algebra is not atomic, solution of a problem from Part II of the Henkin-Monk-Tarski monograph.
32 The results also led to answering several open problems stated earlier in the literature by Craig, Jónsson, and Monk.
33 More precisely, he spent the summers–most important part of the year–at the lake with his grandparents.
34 The ocean reminded István of how he perceived lake Balaton when he was a small child. Now that he grew up, he needed something bigger than a lake to have the same sensation.
In Ames, it was so cold that Ildikó’s fingers got frozen to the metal mailbox, but we enjoyed hot sun and warm water at the beach. In the spring break, we visited Bjarni Jónsson in Nashville and after the courses ended Ildikó and István visited George Strecker in Manhattan Kansas. We mentioned that there was a series of algebraic logic and universal algebra conferences around this time. The Asilomar conference in 1987 was followed by the Algebraic Logic and Universal Algebra in Computer Science Conference June 1–4 Ames Iowa; Algebraic Logic Colloquium August 8–14 1988 Budapest Hungary; Conference honoring Don Monk May 29–31 1990 Boulder Colorado; Interconnections between model theory and algebraic logic June 9 1990 Berkeley California; Algebraic Logic Meeting June 10–20 1990 Oakland California; The Jónsson Symposium: a Symposium on Algebras, Lattices, and Logic July 2–6 1990 Laugarvatn Iceland; Algebraic Methods in Logic and in Computer Science Sept 15–Dec. 15 1991 Warsaw Poland.

35 When we came home to Budapest, the idea of this relativistic computing met some hostility among mathematicians. That is why we only returned to this idea when we were working more deeply in relativity theory and collaborated with physicist Gábor Etesi [188].
36 Ames is a small town built around the university, in the midst of vast corn-fields. There was no light-pollution there.


Why did algebraic logic attract us so much? Here are some answers. (1) Its main subject is the structure of concepts of a piece of knowledge (theory). A cylindric algebra is the “content” of a theory, without “form”37: we can associate a first-order language to this theory once we select a set of notions/concepts to be “basic” (or “observable” or “primary”). This gives a good ground for treating logical interpretations. We have always been more interested in the semantic aspects “what” as opposed to the syntactic one “how”, stemming perhaps from István’s programming experience. Algebraic logic gives the deepest understanding of the semantic aspects in logic, to our minds. (2) It is a bridge between algebra and logic, and connections attract us. As a bridge, algebraic logic connects the world of “logics” and that of classes of algebras: to any logic it associates a class of algebras (and vice versa). Then it connects logical properties to algebraic ones by stating theorems of the kind: a logic has property L if and only if the corresponding class of algebras has property A. The fact that usually natural logical and natural algebraic properties correspond to each other vindicates, to our minds, the existence of this bridge. For example, the interpolation property of a logic corresponds to the amalgamation property of a class of algebras. Logicians investigated the interpolation property as an important logical property, without ever thinking of algebra, and algebraists investigated the amalgamation property in algebra without knowing any connection to logic. (3) Algebraic logic has things to say about an important aspect of human thinking: abstracting and creating/devising abstract notions. It can model aspects of the methodology of science: the network of scientific theories. (4) It is complex enough to be intellectually entertaining.
A theorem or notion in cylindric algebra always has an algebraic, a geometric and a logical side to it, and it is most satisfying to see all three sides at the same time in parallel.38 It shows many aspects, facets of one thing: algebra, geometry, set theory, algorithmic issues, and combinatorics. Papers we wrote about cylindric algebras are [2, 11, 6, 12, 7, 32, 40, 60, 69, 70, 65, 73, 79, 86, 95, 96, 101, 106, 115, 114, 120, 153, 164, 163, 171, 170, 173, 185, 182, 183, 207, 208, 210, 226, 223, 236, 237, 244], papers on relation algebras are [97, 98, 105, 102, 116, 118, 119, 125, 121, 120, 12, 126, 133, 135, 141, 132, 148, 146, 142, 157, 3, 164, 168, 163, 173, 187, 216, 215, 220, 4, 238, 239, 241, 243, 245, 246]. Dynamic and Kleene algebras are algebraic forms of dynamic logic, papers we wrote on dynamic and Kleene algebras are [60, 71, 81, 215, 224].

37 Using Johan van Benthem’s words, it is content without wrapping.
38 For example, a theorem in algebraic logic most often has a more algebraic presentation, and a more logic-oriented one. Searching for the two extreme versions satisfies our delight in seeing more than one proof of the same theorem.


Logic Graduate School, the Amsterdam–Budapest–London Triangle (c. 1991–1998)

A colorful, intense and happy period began for us after we came home from the USA in July 1988. We felt we understood algebraic logic, and we wanted to integrate this beautiful part more into science; we wanted more people to know about it. We opened to new people, new directions, and new activities. Right after we came home, there was the Algebraic Logic Colloquium in Budapest that we organized with much help from Miklós Ferenczi and György Serény. The plenary talks were given by Wim Blok, Steve Comer, William Craig, Steve Givant, George McNulty, Don Pigozzi, Boris Plotkin, Boris Schein and Richard Thompson. In the proceedings of the conference, which we edited together with Monk, we tried to involve people outside of Tarski’s school, too. It was at this time that our fruitful and lasting relationship with the Symbolic Logic Department of the Faculty of Humanities, Eötvös Loránd University began. This department was founded by Imre Ruzsa, a strong mathematician working in philosophical logic, who wanted to stress the connection with philosophy. People from the Symbolic Logic Department we met were Anna Madarászné, András Máté, László Pólos, Tibor Szécsényi. We also began cooperations with the History and Methodology of Science Department of the Faculty of Science of the same university, led by George Kampis. We met exciting people here, too, besides George Kampis whom we knew already: Miklós Rédei, László Ropolyi, László E. Szabó, and Péter Szegedy. System theory, cybernetics and holistic views on human culture were in the air in this circle of people. New winds were blowing in Hungary as the Soviet empire was disintegrating and the borders were opened toward western countries. After a talk given by us in the Symbolic Logic Department, László Pólos approached us wanting to know if we would be willing to join the work and organization of a new independent university, Corvin University.
The idea was attractive since teaching and having students was always part of our lives. Well, organizing a new university, finding money, people, and most of all agreement between people was not easy, and finally the idea died after several years. But in the midst of the organizational efforts, we were already running our Logic Department of the would-be Corvin University, the Logic Graduate School (LGS).39 Besides Hungarian students, we had foreign ones, mostly from Amsterdam, but we had students from Berkeley, Egypt, Israel and Slovakia, too. There were courses given in English by several colleagues; for example, in 1995 one course was given to us by Lajos Soukup about the role of models of set theory in algebraic logic. We had an “eastern style of teaching”: the teachers and students of LGS formed a close community, we were like a big family, we spent much time together also outside of classes, we made each other’s lives richer. Some students from this period are Viktor Gyuris, Ben Hansen, Eva Hoogland, Ági Kurucz, Judit Madarász, Maarten Marx, Szabolcs Mikulás, Péter Rebrus, Gábor Sági, Tarek Sayed-Ahmed, András Simon.

39 This was a mentality we learnt from Géza Bogdánfy: the first step in an enterprise is that you make yourself useful.

Besides continuing our good relations with American colleagues and friends, we began to have important European relations, too. Most of all, we met Johan van Benthem and his Amsterdam school, and we met Dov Gabbay and his London school. These two schools were visiting each other regularly to exchange ideas and results. It was our honor that they accepted us, our algebraic logic group, into this cooperation, and so formed the Amsterdam–London–Budapest triangle. The Amsterdam and London groups were strong in modal logic, temporal logic, and traditional model theory of first-order logic; we brought algebraic logic into this triangle. Modal logic and algebraic logic had been developing separately, largely ignoring each other’s results and ideas that were often the same in two wrappings. Their meeting was beneficial to both of them. For example, Johan van Benthem in his talk at the 1991 algebraic logic Banach Semester posed several open problems from modal logic that were solved within a few days on the spot by using methods from algebraic logic. One of them concerned completeness of the Lambek Calculus for a relational semantics (see [131]). Robin Hirsch and Ian Hodkinson from the London group brought new ideas and new techniques from game theory; they enriched algebraic logic with new directions and many deep theorems. A completely new view of cylindric and relation algebras is contained in their book [Hirsch, R., Hodkinson, I., Relation algebras by games. North-Holland, Amsterdam, 2002]. The influence of algebraic logic on modal logic can be seen in the book [Gabbay, D.M., Kurucz, A., Wolter, F., Zakharyaschev, M., Many-dimensional modal logics: theory and applications. North-Holland, Amsterdam, 2003]. Arrow logic invented by Johan van Benthem and Yde Venema, a logic of transitions, is a kind of unification of modal logic and algebraic logic. Perhaps the most spectacular result of this cultural meeting is the guarded fragment GF of first-order logic [145, 169].
This is a large subset of first-order logic, with a decidable satisfiability problem. The finite-variable hierarchy works well for this fragment, and it is expressive in the sense that many important “derivatives” of guarded formulas can be expressed with a guarded formula. For example, if a k-variable GF formula is preserved by taking submodels, then it is equivalent to a k-variable guarded formula in which only universal quantifiers occur. The GF can be extended with features that are important in computer science, e.g., fixed-point sentences can be added to GF without destroying decidability. The GF is quite popular among computer scientists.
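To make the syntactic restriction behind GF concrete, here is a minimal Python sketch of a guardedness check. The text above does not spell out the definition; the sketch assumes the standard one, in which every quantifier occurs in the form ∃y(α ∧ φ) or ∀y(α → φ), where the guard α is an atomic formula containing all free variables of φ. The tuple encoding of formulas and the names `free_vars` and `is_guarded` are illustrative choices made for this sketch, not notation from [145, 169].

```python
# Formulas as nested tuples (a hypothetical encoding chosen for this sketch):
#   ("atom", "R", ("x", "y"))          atomic formula R(x, y)
#   ("not", f), ("and", f, g), ("or", f, g)
#   ("exists", ("y",), guard, body)    stands for  exists y (guard AND body)
#   ("forall", ("y",), guard, body)    stands for  forall y (guard -> body)

def free_vars(f):
    """Return the set of free variables of formula f."""
    tag = f[0]
    if tag == "atom":
        return set(f[2])
    if tag == "not":
        return free_vars(f[1])
    if tag in ("and", "or"):
        return free_vars(f[1]) | free_vars(f[2])
    if tag in ("exists", "forall"):
        return (free_vars(f[2]) | free_vars(f[3])) - set(f[1])
    raise ValueError(f"unknown connective: {tag!r}")

def is_guarded(f):
    """Check the assumed guardedness condition on every quantifier in f."""
    tag = f[0]
    if tag == "atom":
        return True
    if tag == "not":
        return is_guarded(f[1])
    if tag in ("and", "or"):
        return is_guarded(f[1]) and is_guarded(f[2])
    if tag in ("exists", "forall"):
        guard, body = f[2], f[3]
        return (guard[0] == "atom"                     # the guard must be atomic
                and free_vars(body) <= set(guard[2])   # and cover the body's free variables
                and is_guarded(body))
    raise ValueError(f"unknown connective: {tag!r}")

# forall y (R(x, y) -> P(y)): the guard R(x, y) covers the body's variable y.
guarded = ("forall", ("y",), ("atom", "R", ("x", "y")), ("atom", "P", ("y",)))

# exists z (R(x, z) AND S(z, y)): relation composition; the guard misses y.
unguarded = ("exists", ("z",), ("atom", "R", ("x", "z")), ("atom", "S", ("z", "y")))

print(is_guarded(guarded), is_guarded(unguarded))
```

The check is purely syntactic: it says nothing about whether an unguarded formula happens to be equivalent to a guarded one.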

Relativity Theory, Relativistic Computing, Methodology of Science (c. 1998– )

In the evenings, István read science fiction books to relax his mind before sleeping. One evening, he read that relativity theory stemmed from the fact that light was traveling with the same speed for every observer. He was wondering whether we could construct a world, any world, in which this was true. By this time, we were quite skillful in constructing models in logic. We switched off the light, and in the dark we began the play of creating worlds. We just wanted to see whether we could imagine an arbitrary model in which light was traveling with the same speed for anyone; we did not care to get a model that was confirmed by physics. That night, we had the main ideas for this model, and the main ideas for special relativity theory. This was proof for us that logic was useful in understanding the world around us. István did meet relativity theory in his university studies in the form of the Minkowski metric, but this did not leave him with a feeling of understanding. The model we created turned out to be the same as the one used in physics, and we were wondering whether one could construct a different model. Well, after formalizing special relativity in first-order logic, we could derive the model from the Light axiom together with some auxiliary axioms that are usually assumed tacitly in physics. This means that this is the only way of creating a world for special relativity. Thus, creating the same model was no coincidence. Besides continuing previous work on algebraic logic, we turned to exploring relativity theory by using first-order logic (FOL). In our previous work, we looked at logic as a model of human cognition.40 In our relativity work, we use first-order logic as a tool in this cognition, a tool that helps our cognition. For example, one can derive, informally, from the Light axiom that the clocks of a traveling observer tick slower than ours. Yet, one cannot derive this formally in FOL if one forgets about the Symmetry axiom. After noticing this, we realize: of course! When we say “slower”, it does matter what time-units the observers use for measuring time. One could measure time in seconds, the other in years. So, for comparing their time, we do need to assume that they use the same units for measuring time, and the Symmetry axiom ensures this.
In an axiomatic approach, one is forced to use concrete formulations of statements when proving them. Often, a statement has more than one formulation, each of which expresses the same intuitive content, yet these formulations may not be provably equivalent.41 In an informal argument, we may think of one formulation at one part of the argument, and we may think of another, non-equivalent, formulation at another part of the argument. This way, it is easy to arrive at false conclusions. We believe that this necessity of using concrete formulations in a formal axiomatic approach is a huge help for the intellect, maybe the most important benefit. Thus, a first-order logic approach forces one to be very explicit in formulating the statements, using concrete definitions and formulating each axiom, important ones as well as “book-keeping ones” that one often uses tacitly in informal arguments. Once one has a theory put in this form, it can be used for many purposes. One benefit of using such formalized theories is in teaching. Another one is in moving against compartmentalization of science.

40 We explain this view in [139].
41 For example, one may express that two stationary observers/coordinate-systems (who agree on the event that happened at their origin, i.e., at point (0, 0, 0, 0)) use the same time-units by saying that they “see” the same event at point (1, 0, 0, 0), their time-unit, as well. Another formulation is to say that they “see” the same events at each point of their time-axes. The two formulations are not equivalent, and have different consequences.

When the Vienna Circle wanted to formalize science in
this way, first-order logic was not available yet. In fact, we believe that mathematical logic developed thanks to the efforts of the Vienna Circle, in great part. When one wants to formalize a larger chunk of science or knowledge, it is important to introduce structure to this formalization. Breaking up a big axiom system into many small ones and indicating their interconnections with interpretations between them is one natural way of structuring a complex theory.42 Using small and well-understood theories is important in foundational thinking, too, see [Friedman, H., On foundational thinking 1. FOM (Foundations of Mathematics) Posting, Archives www.cs.nyu.edu, January 20, 2004]. In short, in an axiomatization, one uses a network of theories in place of just one huge theory. The individual theories in such a network may have different statuses: recording facts about the real world, explaining the “meaning” of a definition, or theorizing about what would have been if we had found different facts. We believe that Hilbert’s 6th problem about formalizing physics can be solved in this complex manner. Further, our faith is that one can make steps toward the original dream of the Vienna Circle about unity of science via using this kind of axiomatic approach. We began doing this for relativity theory; the first steps are available in [191]. We are now testing the usefulness of algebraic logic by applying its methods and theorems in relativity theory. When we were already working deep in algebraic logic, Leon Henkin suggested that we apply our knowledge to some real-life scientific theory, e.g., describe its concept-algebra. At that time, this task seemed futuristic to us. But now, we are doing this with Judit Madarász and Gergely Székely. Donald Monk also suggests this task in [Monk, J.D., An introduction to cylindric set algebras, IGPL, 2000, p. 455]. The concept-algebra of a theory or model is the structure of (first-order logic) definable relations in it.
The structure is given by the logical connectives. Thus, a concept algebra is a Boolean algebra together with some additional operators. We began to investigate the concept algebra of special relativity. We completely described the structure of unary and binary definable relations, these are finite. The structure of ternary definable relations is already infinite, but it is atomic. With some differences, the same is true for the concept algebra of classical Newtonian spacetime. However, there are no homomorphisms between the two algebras. For details, see [247]. We are also working on a connection between relativity theory and our original subject: computers, computability. This is elaborating the idea that István had in Ames 1988, that is, using relativistic spacetime to devise a “computer” that can compute non-Turing-computable tasks. The same idea was discovered about the same time independently by other researchers, too, and relativistic computation became one branch of unconventional computation.
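The notion of a concept algebra described above (definable relations, with operations coming from the logical connectives) can be illustrated on a toy finite model. The following Python sketch assumes a two-variable setting in which a “concept” is the set of assignments to the variables (x, y) that satisfy a formula; intersection, union and complement interpret the connectives, and a cylindrification operator interprets the existential quantifier. The three-element universe `U` and the relation `less` are made up for illustration and are unrelated to the relativity-theoretic algebras of [247].

```python
from itertools import product

U = (0, 1, 2)                      # hypothetical three-element universe
TOP = frozenset(product(U, U))     # all assignments to the variables (x, y)

def meet(a, b):
    """Interpret conjunction."""
    return a & b

def join(a, b):
    """Interpret disjunction."""
    return a | b

def complement(a):
    """Interpret negation."""
    return TOP - a

def cylindrify(a, i):
    """Interpret 'exists v_i' (i = 0 for x, i = 1 for y): free up coordinate i."""
    out = set()
    for t in a:
        for u in U:
            s = list(t)
            s[i] = u
            out.add(tuple(s))
    return frozenset(out)

DIAG = frozenset((u, u) for u in U)                    # the concept "x = y"
less = frozenset((a, b) for a, b in TOP if a < b)      # the concept "x < y"

# "x is not the largest element":  exists y (x < y)
not_max = cylindrify(less, 1)
# "x is the largest element":      not exists y (x < y)
is_max = complement(not_max)

print(sorted(is_max))
```

In this encoding a unary concept such as `is_max` is a set of pairs that does not constrain y; the Boolean operations together with `cylindrify` and `DIAG` play the role of the “additional operators” mentioned in the text.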

42 [Burstall, R., Goguen, J.: Putting theories together to make specifications. In: Proc. IJCAI’77 (Proceedings of the 5th International Joint Conference on Artificial Intelligence), Vol 2, pp. 1045–1058] is about using this method in computation, while [Konev, B., Lutz, C., Ponomaryov, D., Wolter, F.: Decomposing description logic ontologies. In: Proceedings of 12th Conf. on the Principles of Knowledge Representation and Reasoning, Association for the Advancement of Artificial Intelligence, 2010. pp. 236–246] is about the need of using this method in real-life computer applications.

Judit Madarász, Péter Németi and Gergely Székely joined the relativity theoretical research at various points. They are now furthering this project with enthusiasm and energy, contributing new ideas, directions, and interesting new results. Our papers in relativistic computing are [188, 197, 200, 201, 202, 209, 211, 212, 219, 242]. Our papers about relativity theory in general are [175, 192, 194, 195, 199, 198, 203, 204, 206, 205, 211, 213, 214, 217, 218, 221, 225, 229, 228, 230].

Appendix B

Joint Annotated Bibliography of Hajnal Andréka and István Németi

Számológép was the journal of NIMIGÜSZI, publisher: the director of NIMIGÜSZI, Budapest, editor: Péter Ihrig. It published mostly papers in Hungarian in connection with programming, computers. CL&CL was an international journal (ISBN 963 311 039 4) published in Budapest. Its full name was Computational Linguistics and Computer Languages.

Books
[1] Generalization of the concept of variety and quasi-variety to partial algebras through category theory. Dissertationes Mathematicae (Rozprawy Math.) No. 204. PWN - Polish Scientific Publishers, Warsaw, 1983. 51 p. Andréka, H. and Németi, I.
[2] Cylindric Set Algebras. Lecture Notes in Mathematics Vol 883, Springer-Verlag, Berlin, 1981. vi+323 p. Henkin, L., Monk, J. D., Tarski, A., Andréka, H. and Németi, I.
[3] Decision problems for equational theories of relation algebras. Memoirs of Amer. Math. Soc. Vol. 126, No. 604, American Mathematical Society, Providence, Rhode Island, 1997. xiv+126 p. Andréka, H., Givant, S. and Németi, I.
[4] Simple Relation Algebras. Springer International Publishing AG, 2017. xxiv+622 p. Givant, S. and Andréka, H.
Each algebra is a subdirect product of subdirectly irreducible algebras, and the standard method of studying an algebra is by decomposing it into its subdirectly irreducible factors and then reassembling the algebra from its factors. Subdirectly irreducible relation algebras are the same as simple ones (simple means having no nontrivial congruences). In this book we analyze simple relation algebras by cutting them into pieces along an arbitrary equivalence relation, and also by constructing simple relation algebras from any others.

© Springer Nature Switzerland AG 2021 J. Madarász and G. Székely (eds.), Hajnal Andréka and István Németi on Unity of Science, Outstanding Contributions to Logic 19, https://doi.org/10.1007/978-3-030-64187-0

[5] Universal Algebraic Logic. (Dedicated to the Unity of Science). Birkhäuser, in preparation. Andréka, H., Gyenis, Z., Németi, I. and Sain, I.

Books Edited
[6] Algebraic Logic. Colloq. Math. Soc. J. Bolyai Vol. 54, North-Holland, Amsterdam, 1991. vi+746 p. Editors: Andréka, H., Monk, J. D. and Németi, I.
[7] Cylindric-like algebras and algebraic logic. Bolyai Society Mathematical Studies Vol. 22, Springer Verlag, Berlin, 2012. 478 p. Editors: Andréka, H., Ferenczi, M. and Németi, I.

Dissertations
[8] Algebraic investigation of first order logic. (In Hungarian) Doctoral Dissertation with Eötvös Loránd University, Budapest, 1973. 162 p. Andréka, H.
[9] Extending the universal algebraic notions of variety and related ones to partial algebras using abstract model theory and category theory. (In Hungarian) Candidate’s Dissertation with the Hungarian Academy of Sciences, Budapest, 1976. 171 p. Németi, I.
A great portion of this is published as [87]. This was the starting point of later investigations in partial algebra theory and the injectivity approach to category theoretic logic.

[10] Universal algebraic investigations in algebraic logic. (In Hungarian) Dissertation for Candidate’s degree with the Hungarian Academy of Sciences, Budapest, 1977. 199 p. Andréka, H.
[11] Free algebras and decidability in algebraic logic. (In Hungarian) Doctoral Dissertation with the Hungarian Academy of Sciences, Budapest, 1986. xviii+169 p. Németi, I.
Parts of this are published as [125, 153, 160, 223].

[12] Complexity of equations valid in algebras of relations. Doctoral Dissertation with the Hungarian Academy of Sciences, Budapest, 1991. 103 p. Andréka, H. This is published as [163].

Publications (Articles, Book Chapters, Other)
1970 – 1974
[13] Hierarchic partition of large scale systems and its application for power system study. Acta Technica 71 (1971), pp. 285–303. Bogdánfy, G. and Németi, I.
[14] Debugging large and at the same time complex systems of programs. (In Hungarian) Számológép 71/3 (1971), 59–69. Németi, I.

[15] Computer program performing hierarchic decomposition. (In Hungarian) Számológép 72/1 (1972), 18 pp. Németi, I.
[16] Pattern recognition program for the computer EMG 830-20. (In Hungarian) Számológép 72/2 (1972), 85–93. Németi, I. and Baksza, L.
[17] Software foundations for computer simulation of biological systems. IVth International Biophysics Congress Moscow, 1972. EXXIIa2/9, EXXIIa2/10. Eöry, A. and Németi, I.
[18] Logical foundations for the formalization and application of general system theory. (In Hungarian) In: System Theory Research (Rendszerkutatás). Publisher for Economics and Law (Közgazdasági és Jogi Könyvkiadó), Budapest 1973. pp. 307–357. Gergely, T. and Németi, I.
[19] Notes on maximal congruence relations, automata and related topics. Acta Cybernetica Tom 2, Fasc 1 (Szeged 1973), pp. 71–88. Andréka, H., Horváth, S. and Németi, I.
[20] On the equivalence of sets definable by satisfaction and ultrafilters. Studia Sci. Math. Hungar. 8 (1973), pp. 463–467. Andréka, H. and Németi, I.
[21] A theorem on the semantics of first-order predicate logic. (In Hungarian) Számológép 73/1 (1973), 168–175. Andréka, H. and Németi, I.
[22] On the minimal language for defining Boolean algebras. (In Hungarian) Számológép 73/1 (1973), 25–40. Andréka, H., Németi, I. and Paizs, K.
[23] Application of cylindric algebras to data structures (news). (In Hungarian) Számológép 73/2 (1973), p. 28. Andréka, H. and Németi, I.
[24] An algebraic introduction to logic. (In Hungarian) Számológép Kiskönyvtár 73 (1973), 27–64. Andréka, H., Farkas, Zs. and Németi, I.
[25] Program writing and program verifying programs. (In Hungarian) Számológép Kiskönyvtár 73 (1973), 86–99. Andréka, H. and Németi, I.
[26] Subalgebra systems of algebras with finite and infinite, regular and singular arities. Annales Univ. Budapest. Eötvös Sec. Math. 17 (1974), pp. 103–118. Andréka, H. and Németi, I.
[27] Sufficient and necessary condition for the completeness of a calculus. Zeitschr. Math. Logic u. Grundl. Math. Bd 20 (1974), pp. 433–434. Andréka, H., Gergely, T. and Németi, I.
[28] On some questions of n–th order logic. (In Russian) Kibernyetyika 74/5, 74/6 (Kijev 1974), pp. 61–67, 77–83. Andréka, H., Gergely, T. and Németi, I.
[29] Plans to improve our semi-automatic program-verifier system. (In Hungarian) Research Report of the Institute NIM IGÜSZI - OSZI - KSH, Budapest, August 1974. Andréka, H., Balogh, K., Lábodi, K., Németi, I. and Tóth, P.

1975 – 1979
[30] On the role of general system theory in the cognitive process. In: Progress in Cybernetics and System Research, Vol 2. Hemisphere Publishing Corporation, 1975. pp. 137–150. Gergely, T. and Németi, I.
[31] Logical foundations for a general theory of systems. Acta Cybernetica Tom 2, Fasc 3 (Szeged 1975), pp. 261–276. Gergely, T. and Németi, I.
[32] A simple, purely algebraic proof of the completeness of some first order logics. Algebra Universalis 5 (1975), pp. 8–15. Andréka, H. and Németi, I.
[33] Many–sorted languages and their connection with higher order languages. (In Russian) Kibernyetyika 75,4 (Kijev 1975), pp. 86–92. Andréka, H., Gergely, T. and Németi, I.
[34] On some questions of higher order logic. (In Hungarian) Matematikai Lapok 24 (1975), pp. 63–94. Andréka, H., Gergely, T. and Németi, I.
[35] Remarks on free products in regular varieties and sink–complemented subalgebras. Studia Sci. Math. Hung. 10 (1975), pp. 23–31. Andréka, H. and Németi, I.
[36] Application of universal algebra in computer theory. (In Hungarian) Számológép Kiskönyvtár 75 (1975), 145–152. Andréka, H. and Németi, I.
[37] On a property of the category of partial algebras. CL&CL Vol XI (1976), pp. 5–10. Németi, I.
[38] On a proof of Shelah. Bulletin de l’Academie Polonaise des Sciences (Series Math.) 27 (1976), pp. 1–7. Andréka, H., Dahn, B. I. and Németi, I.
[39] On the adequateness of predicate logic programming. AISB European Newsletter Issue 23 (1976), pp. 30–32. Andréka, H. and Németi, I.
[40] On universal algebraic construction of logics. Studia Logica 36,1–2 (1977), pp. 9–47. Andréka, H., Gergely, T. and Németi, I.
[41] On the congruence lattice of pseudosimple algebras. In: Contributions to Universal Algebra (Proc. Coll. Szeged 1975), Colloq. Math. Soc. J. Bolyai Vol 17, North–Holland, Amsterdam, 1977. pp. 15–20. Andréka, H. and Németi, I.
[42] The generalised completeness of Horn predicate logic as a programming language. Acta Cybernetica Tom 4, Fasc 1 (Szeged 1978), pp. 3–10. Andréka, H. and Németi, I.
[43] Łoś lemma holds in every category. Studia Sci. Math. Hungar. 13 (1978), pp. 361–376. Andréka, H. and Németi, I.
[44] From hereditary classes to varieties in abstract model theory and partial algebras. Beiträge zur Algebra und Geometrie 7 (1978), pp. 69–78. Németi, I.
[45] Neat reducts of varieties. Studia Sci. Math. Hungar. 13 (1978), pp. 47–51. Andréka, H. and Németi, I.
[46] Completeness of Floyd logic. Bulletin of the Section of Logic 7/3 (1978), pp. 115–120. Andréka, H. and Németi, I.
The Floyd-Hoare method for proving program correctness is complete with respect to the so-called continuous-traces semantics. It is not complete with respect to the standard-traces semantics of programs. Full proofs are contained in [50].

[47] On universal algebraic logic and cylindric algebras. Bulletin of the Section of Logic 7/4 (1978), pp. 152–158. Andréka, H. and Németi, I.
[48] Injectivity in categories to represent all first order formulas. Demonstratio Mathematica 12 (1979), pp. 717–732. Andréka, H. and Németi, I.
[49] Formulas and ultraproducts in categories. Beiträge zur Algebra und Geometrie 8 (1979), pp. 133–151. Andréka, H. and Németi, I.
[50] Completeness problems in verification of programs and program schemes. In: Mathematical Foundations of Computer Science’79 (Proc. Conf. Olomouc Czechoslovakia 1979). Ed: Becvar, J. Lecture Notes in Computer Science Vol 74, Springer–Verlag, Berlin, 1979. pp. 208–218. Andréka, H., Németi, I. and Sain, I.
Thm.1 states a strong incompleteness property for program correctness statements with the standard semantics: there is no recursive set containing the “we must prove these program correctness statements” and contained in the standard-valid program correctness statements. Thm.2 states that the reason for this incompleteness is in the definition of standard semantics for program correctness statements: this standard semantics is not stable in Zermelo-Fraenkel Set Theory. These theorems justify the Henkin style semantics for dynamic logic introduced in the second part of the paper, for which Thm.3 states strong completeness.

[51] Henkin-type semantics for program schemes to turn negative results to positive. In: Fundamentals of Computation Theory’79 (Proc. Conf. Berlin 1979). Ed: L. Budach, Akademie Verlag, Berlin, 1979. Band 2., pp. 18–24. Andréka, H., Németi, I. and Sain, I.
[52] Reduced products in categories. In: Contributions to General Algebra (Proc. Conf. Klagenfurt 1978) Verlag Johannes Heyn, 1979. pp. 25–45. Andréka, H., Makai, E., Márki, L. and Németi, I.
[53] Applications of universal algebra, model theory, and categories in computer science. (Survey and bibliography) CL&CL Vol XIII (1979), pp. 251–282. Andréka, H. and Németi, I.
[54] Program verification within and without logic. Bulletin of the Section of Logic 8/3 (1979), pp. 124–129. Andréka, H., Németi, I. and Sain, I.
This paper is an abstract for the first part of [50].

[55] Not all representable cylindric algebras are neat reducts. Bulletin of the Section of Logic 8/3 (1979), pp. 145–147. Andréka, H. and Németi, I.
[56] Dimension-restricted free cylindric algebras and finitary logic of infinitary relations. Journal of Symbolic Logic 44 (1979), p. 442. Andréka, H. and Gergely, T.
1980 – 1984
[57] Does SPK ⊇ PSK imply axiom of choice?. Comm. Math. Univ. Carolinae. 21,4 (1980), pp. 699–706. Andréka, H. and Németi, I.
“SPK contains PSK for all classes K of algebras” holds in Zermelo–Fraenkel set theory if PK is understood as “the class of algebras isomorphic to direct products of elements of K”, but the same statement implies the axiom of choice if PK is understood without isomorphism. Each of IP being a closure operator and HP being a closure operator implies the axiom of choice. These give partial answers to Problem 28 in Grätzer’s 1979 Universal Algebra book.

[58] On systems of varieties definable by schemes of equations. Algebra Universalis 11 (1980), pp. 105–116. Andréka, H. and Németi, I.
[59] Additions to survey of applications of universal algebra, model theory, and categories in computer science. CL&CL Vol XIV (1980), pp. 7–20. Andréka, H. and Németi, I.
[60] Some constructions of cylindric algebra theory applied to dynamic algebras of programs. CL&CL Vol XIV (1980), pp. 43–65. Németi, I.
[61] Model theoretical semantics for many–purpose languages and language hierarchies. In: Computational Linguistics (Proc. 8th Int. Conf. Tokyo 1980) Tokyo, 1980. pp. 213–219. Andréka, H., Gergely, T. and Németi, I.
[62] Quasi equational logic of partial algebras. Bulletin of the Section of Logic 9/4 (1980), pp. 193–198. Andréka, H., Burmeister, P. and Németi, I.
[63] On problems in cylindric algebra theory. Abstracts of Amer. Math. Soc. 1 (1980), p. 588. Andréka, H. and Németi, I.
[64] Quasivarieties of partial algebras – a unifying approach towards a two–valued model theory for partial algebras. Studia Sci. Math. Hungar. 16 (1981), pp. 325–372. Andréka, H., Burmeister, P. and Németi, I.
[65] Dimension complemented and locally finite dimensional cylindric algebras are elementarily equivalent. Algebra Universalis 13 (1981), pp. 157–163. Andréka, H. and Németi, I.
[66] HSPK is an equational class, without the axiom of choice. Algebra Universalis 13 (1981), pp. 164–166. Andréka, H. and Németi, I.
Birkhoff’s variety theorem “HSP=ModEq” is proved in Zermelo-Fraenkel set theory without the axiom of choice. This answers Problem 31 in Grätzer’s 1968 Universal Algebra book in the negative.

[67] Similarity types, pseudosimple algebras, and congruence representation of chains. Algebra Universalis 13 (1981), pp. 293–306. Andréka, H. and Németi, I.
[68] On cylindric-relativized set algebras. In: Cylindric Set Algebras, Lecture Notes in Mathematics vol 883, Springer-Verlag, Berlin Heidelberg New York, 1981, pp. 131–315. Andréka, H. and Németi, I.
[69] Connections between algebraic logic and initial algebra semantics of CF languages. In: Mathematical Logic in Computer Science (Proc. Coll. Salgótarján 1978). Eds: Dömölki, B. and Gergely, T., Colloq. Math. Soc. J. Bolyai Vol 26, North-Holland, Amsterdam, 1981, pp. 25–83. Andréka, H. and Sain, I.
[70] Connections between cylindric algebras and initial algebra semantics of CF languages. In: Mathematical Logic in Computer Science (Proc. Coll. Salgótarján 1978). Eds.: Dömölki, B. and Gergely, T., Colloq. Math. Soc. J. Bolyai Vol 26, North–Holland, Amsterdam, 1981. pp. 561–605. Németi, I.
[71] Dynamic algebras of programs. In: Fundamentals of Computation Theory’81 (Proc. Conf. Szeged 1981). Ed: Gécseg, F. Lecture Notes in Computer Science Vol 117, Springer–Verlag, Berlin, 1981. pp. 281–290. Németi, I.
[72] Some universal algebraic and model theoretic results in computer science. In: Fundamentals of Computation Theory’81 (Proc. Conf. Szeged 1981). Ed: Gécseg, F. Lecture Notes in Computer Science Vol 117, Springer–Verlag, Berlin, 1981, pp. 16–23. Andréka, H. and Németi, I.
[73] Which finite cylindric algebras are generated by a single element?. In: Finite Algebra and Multiple–valued Logic (Proc. Coll. Szeged 1979). Colloq. Math. Soc. J. Bolyai Vol 28, North–Holland, Amsterdam, 1981. pp. 23–39. Andréka, H. and Németi, I.
[74] A characterization of Floyd provable programs. In: Mathematical Foundations of Computer Science’81 (Proc. Conf. Strbské Pleso, Czechoslovakia 1981). Eds.: Gruska, J. and Chytil, M. Lecture Notes in Computer Science Vol 118, Springer–Verlag, Berlin, 1981. pp. 162–171. Andréka, H., Németi, I. and Sain, I.
An explicit characterization of the information content of the Floyd program-verification method is given, as follows. A run (sequence of intensions of the registers) in perhaps nonstandard time of a program is called continuous if it satisfies induction over all possible dynamic logic formulas containing one free time variable. Thm.1: Assume that T is a theory about data containing the Peano axioms. A program p is correct for output formula ψ wrt each continuous run in each model of T if and only if p can be proved correct wrt ψ by the Floyd inductive assertion method. Detailed proof is given.

[75] A general axiomatizability theorem formulated in terms of cone–injective subcategories. In: Universal Algebra (Proc. Coll. Esztergom 1977). Colloq. Math. Soc. J. Bolyai Vol 29, North–Holland, Amsterdam, 1981. pp. 13–35. Andréka, H. and Németi, I. [76] Cone–implicational subcategories and some Birkhoff–type theorems. In: Universal Algebra (Proc. Coll. Esztergom 1977). Colloq. Math. Soc. J. Bolyai Vol 29, North–Holland, Amsterdam, 1981. pp. 535–578. Németi, I. and Sain, I. [77] Qualitative mathematics (In Hungarian). Magyar Tudomány 1983/2, (1983), pp. 99–103. Andréka, H. and Németi, I. [78] Problems with the category theoretic notions of ultraproducts. Bulletin of the Section of Logic 10,3 (1981), pp. 122–127. Buy, H. H. (Bui Huy Hien), and Németi, I. [79] Foundations for stepwise refinement of program specifications via cylindric algebra theory. Diagrammes 8,1 (1982), pp. N1–N24. Németi, I. [80] On notions of factorization systems and their applications to cone–injective subcategories. Periodica Math. Hungar. 13,3 (1982), pp. 229–235. Németi, I. [81] Every free algebra in the variety generated by the representable dynamic algebras is separable and representable. Theoretical Computer Science 17 (1982), pp. 343–347. Németi, I. [82] A complete logic for reasoning about programs via nonstandard model theory. Theoretical Computer Science 17 (1982), Part I in No 2, pp. 193–212, Part II in No 3, pp. 259–278. Andréka, H., Németi, I. and Sain, I. Part I: Nonstandard Dynamic Logic (NDL) is introduced, by replacing standard semantics of Firstorder Dynamic Logic (FoDL) with an explicit time semantics. NDL is then proved to be strongly complete with respect to a proof concept N (Thm.2, detailed proof, translating NDL into FOL). It is known that FoDL is rather wild, e.g., its validities are not recursively enumerable, hence the move to explicit time semantics was necessary to obtain the above completeness theorem (analogous to Henkin’s Nonstandard Second-order Logic). 
The proof system N is a new, rather strong method for proving properties of programs (see Part II of the paper). Part II: NDL, introduced in Part I, is shown to be useful for reasoning about programs and for characterizing the information contents of known program proving methods. First, natural properties of programs are proved under natural axioms about time (Thm.3,4). Termination of the “count-down program” is proved by using the order axioms on time, induction on data, and comprehension on intensions (Thm.7). The Naur-Floyd-Hoare inductive assertions method for proving correctness of programs F is introduced (Sect. 6). Thm.9 is a semantic characterization, in terms of statements in NDL, of the information implicitly contained in the Floyd-method. This information content is time-induction over quantifier-free formulas. If in addition we can reason about ordering in time, our program-proving ability does not increase: we do not go beyond the power of Floyd’s method. But if we can perform addition on time, or if we can quantify over time points in the induction, our reasoning ability is beyond the power of Floyd’s method. Note that quantifying over time is roughly the same as using time-modalities. However, if our theory for the data contains the Peano axioms, full induction over time does not lend more reasoning power than the Floyd method (Thm.11). The proof concept N is strictly stronger than Floyd’s method (Thm.10). slightly updated

[83] Direct limits and filtered colimits are strongly equivalent in all categories. Algebra and its applications, Banach Center Publications Vol 9, PWN – Polish Scientific Publishers, Warszawa 1982, pp. 75–88. Andréka, H. and Németi, I. [84] Nonstandard dynamic logic. (Invited paper.) In: Logics of Programs (Proc. Conf. New York, May 1981). Ed: Kozen, D. Lecture Notes in Computer Science Vol 131, Springer–Verlag, Berlin, 1982. pp. 311–348. Németi, I. NDL and a lattice of program verifying methods are introduced. In this lattice, comparison of theories in NDL is based on their strength for proving partial correctness of programs. Figure 2 contains many statements about this lattice. Thm.6: Ia + Ts is strictly weaker than Ia + To. Intuitively: When using full induction Ia over time, it does matter whether we can compare time instances by the “later than” relation, or we just have successor on time. (Used together with quantifier-free induction over time in place of full induction, this does not matter.) Thm.6 is proved in detail with figures.

[85] Remark on one–sided A–ideals of semigroups. Math. Slovaca 33,2 (1983), pp. 231–235. Andréka, H., Németi, I. and Sulka, R. [86] The class of neat–reducts of cylindric algebras is not a variety but is closed w.r.t. HP. Notre Dame J. of Formal Logic 24,3 (1983), pp. 399–409. Németi, I. [87] Generalization of variety and quasivariety concept to partial algebras through category theory. Dissertationes Mathematicae (Rozprawy Math.) No. 204. PWN – Polish Scientific Publishers, Warsaw, 1983. 56 p. Andréka, H. and Németi, I. [88] Sharpening the characterization of the power of Floyd method. In: Logics of Programs and their Applications (Proc. Conf. Poznan 1980). Ed: Salwicki, A. Lecture Notes in Computer Science Vol 148, Springer-Verlag, Berlin, 1983, pp. 1–26. Andréka, H. The main result is that induction over formulas containing a single universal quantifier of sort time is already strictly stronger than Floyd’s method, for proving partial correctness of programs (Thm.2). It was known that Floyd’s method is equivalent to induction over formulas not containing quantifiers of sort time ([82], Thm.9). The case for induction over formulas containing a single existential quantifier is left open. This paper is a continuation of [84].

[89] Nonstandard runs of Floyd–provable programs. In: Logics of Programs and their Applications (Proc. Conf. Poznan 1980). Ed: Salwicki, A. Lecture Notes in Computer Science Vol 148, Springer–Verlag, Berlin, 1983, pp. 186–204. Németi, I. The question “exactly which programs are provable by the Floyd-Hoare inductive assertions method” is investigated. Concrete examples of simple nonstandard runs of programs are constructively defined and illustrated. The emphasis is on simplicity, with the aim to make nonstandard runs and nonstandard models less esoteric, less imaginary, easy to draw, easy to touch. It is demonstrated how ultraproducts can be used to test applicability of Floyd’s method in concrete situations.

[90] Surjectiveness of cylindric algebras. Abstracts of Amer. Math. Soc. 4,3 (April 1983), p.293. *83T-08-186 Andréka, H., Comer, S. D. and Németi, I. [91] Combinatorial problem for algebraic logic. Abstracts of Amer. Math. Soc. 4,5 (August 1983), p. 389. (Issue 26) *83T-05-312 Németi, I. [92] Importance of universal algebra for computer science. In: Universal algebra and its links with logic, algebra, combinatorics, and computer science (Proc. of the “25th Arbeitstagung über Allgemeine Algebra”, Darmstadt 1983). Eds.: Burmeister, P., Ganter, B., Herrman, C., Keimel, K., Poguntke, W. and Wille, R. Research and Exposition in Math. Vol 4, Heldermann Verlag, Berlin, 1984. pp. 204–215. Andréka, H. and Németi, I. [93] Homomorphic images of weak cylindric set algebras with infinite bases. Abstracts of Amer. Math. Soc. (1984) Andréka, H., Monk, J. D. and Németi, I.

1985 – 1989

[94] Relative epis need not be surjective. Algebra Universalis 20 (1985), pp. 197–204. Andréka, H. and Pásztor, A. [95] On the number of generators of cylindric algebras. Journal of Symbolic Logic 50,4 (1985), pp. 865–873. Andréka, H. and Németi, I. [96] Cylindric–relativized set algebras have strong amalgamation. Journal of Symbolic Logic 50 (1985), pp. 689–700. Németi, I. [97] Clones of operations on relations. In: Universal Algebra and Lattice Theory (Proc. Conf. Charleston 1984). Lecture Notes in Mathematics Vol 1149, Springer–Verlag, Berlin, 1985. pp. 7–21. Andréka, H., Comer, S. D. and Németi, I. [98] A non–representable cylindric algebra with pairing function. Algebra Universalis 22 (1986), pp. 117–119. Németi, I. [99] On logic in computer science. In: Information Processing 86, H-J Kugler (ed), North-Holland, IFIP 1986. p.395. Németi, I. [100] Boolean reducts of relation and cylindric algebras and the cube problem. Proc. Amer. Math. Soc. 100,1 (1987), pp. 148–153. Andréka, H. [101] On varieties of cylindric algebras with applications to logic.
Annals of Pure and Applied Logic 36 (1987), pp. 235–277. Németi, I. [102] Decidability of relation algebras with weakened associativity. Proc. Amer. Math. Soc. 100,2 (1987), pp. 340–344. Németi, I. [103] A unifying theorem for algebraic semantics and dynamic logics. Information and Computation 72,1 (1987), pp. 31–45. Andréka, H., Guessarian, I. and Németi, I. [104] Rózsa Péter (1905–1977). In: Women of mathematics. A biobibliographic Sourcebook. Eds: Louise S. Grinstein and Paul J. Campbell, Greenwood Press, New York, 1987. pp. 171–174. Andréka, H.

[105] On taking subalgebras of relativized relation algebras. Algebra Universalis 25 (1988), pp. 96–100. Andréka, H. [106] A Stone-type representation theorem for algebras of relations of higher rank. Trans. Amer. Math. Soc. 309,2 (1988), pp. 671–682. Andréka, H. and Thompson, R. J. [107] A system of logic for partial functions under existence–dependent Kleene equality. Journal of Symbolic Logic 53 (1988), pp. 834–839. Andréka, H., Craig, W. and Németi, I. [108] Foundations of computer science: basic research. (In Hungarian) Filozófiai Figyelő 1988/4, (1988), pp. 26–55. Andréka, H. and Németi, I. The parts of mathematics needed here are those whose subject is abstraction itself (and which are not merely abstract themselves). Such are mathematical logic, its model theory, algebraic logic, universal algebra.

[109] Nonfinite axiomatizability of the polyadic operations in algebraic logic. Abstracts of Amer. Math. Soc. 9,6 (1988), p.500. *88T-03-264. Andréka, H. and Tuza, Zs. [110] On residuated approximations. In: Categorical Methods in Computer Science (with aspects from Topology). Eds: Ehrig, J., Herrlich, H., Kreowski, H-J. and Preuss, G. Lecture Notes in Computer Science Vol 393, Springer-Verlag, Berlin, 1989, pp. 333–339. Andréka, H., Greechie, R. J. and Strecker, G. E. [111] On the strength of temporal proofs. In: Mathematical Foundations of Computer Science’89 (Proc. Porabka–Kozubnik, Poland, 1989). Eds.: Kreczmar, A. and Mirkowska, G. Lecture Notes in Computer Science Vol 379, Springer–Verlag, Berlin, 1989. pp. 135–144. Andréka, H., Németi, I. and Sain, I. This is a shorter, abstract version of [123].

[112] On the “union-relation composition” reducts of relation algebras. Abstracts of Amer. Math. Soc. 10,2 (1989), p.174. *89T-08-21. Andréka, H. [113] There are 140 smallest possible near-boolean algebras. Abstracts of Amer. Math. Soc. (July 1989). Andréka, H.

1990 – 1994

[114] Weak cylindric set algebras and weak subdirect indecomposability. Journal of Symbolic Logic 55,2 (1990), pp. 577–588. Andréka, H., Németi, I. and Thompson, R. J. [115] On cylindric algebraic model theory. In: Algebraic Logic and Universal Algebra in Computer Science (Proc. Conf. Ames 1988). Lecture Notes in Computer Science Vol 425, Springer–Verlag, Berlin, 1990. pp. 37–76. Németi, I. [116] Relatively free relation algebras. (Extended abstract) In: Algebraic Logic and Universal Algebra in Computer Science (Proc. Conf. Ames 1988). Lecture Notes in Computer Science Vol 425, Springer–Verlag, Berlin, 1990. pp. 1–14. Andréka, H., Jónsson, B. and Németi, I. [117] Review of Tarski, A. and Givant, S. A formalization of set theory without variables. Journal of Symbolic Logic 55,1 (1990), pp. 350–352. Németi, I.

[118] One variable is not enough for defining relation algebras but two are. Algebra Universalis 28 (1991), pp. 274–279. Andréka, H. [119] Representations of distributive lattice-ordered semigroups with binary relations. Algebra Universalis 28 (1991), pp. 12–25. Andréka, H. [120] Algebraizations of quantifier logics, an introductory overview. Studia Logica 50, 3–4 (1991), pp. 485–569. Special issue on Algebraic Logic (eds: W. J. Blok and D. Pigozzi) Németi, I. [121] Splitting in relation algebras. Proceedings of Amer. Math. Soc. 111,4 (1991), pp. 1085–1093. Andréka, H., Maddux, R. and Németi, I. [122] Free algebras in discriminator varieties. Algebra Universalis 28 (1991), pp. 401–447. Andréka, H., Jónsson, B. and Németi, I. [123] On the strength of temporal proofs. Theoretical Computer Science 80 (1991), pp. 125–151. Andréka, H., Németi, I. and Sain, I. Shorter version appeared in Lecture Notes in Computer Science Vol 379, 1989.

[124] Open problems. In: Algebraic Logic (Proc. Conf. Algebraic Logic Budapest 1988). Colloq. Math. Soc. J. Bolyai Vol. 54, North-Holland 1991. pp. 727–746. Andréka, H., Monk, J. D. and Németi, I. [125] On Jónsson’s clones of operations on binary relations. In: Algebraic Logic. (Coll. Math. Soc. J. Bolyai Vol. 54), North–Holland, 1991. pp. 431–442. Andréka, H. and Németi, I. [126] A nonpermutational integral relation algebra. Michigan Math. J. 39 (1992), pp. 371–384. Andréka, H., Düntsch, I. and Németi, I. [127] Decidability of weakened versions of first-order logic. In: Logic at Work (Proc. Conf. Amsterdam, December 1992), University of Amsterdam, 1992. Németi, I. [128] Investigations in arrow logics. In: Logic at Work (Proc. Conf. Amsterdam, December 1992), University of Amsterdam, 1992. Marx, M., Mikulás, Sz., Németi, I. and Sain, I. [129] Associativity implies undecidability in arrow logics. In: Logic at Work (Proc. Conf. Amsterdam, December 1992), University of Amsterdam, 1992. Gyuris, V., Kurucz, Á., Németi, I. and Sain, I. [130] Undecidable varieties of semilattice-ordered semigroups, of Boolean algebras with operators, and logics extending Lambek calculus. Bulletin of IGPL 1/1 (1993), pp. 91–98. Kurucz, Á., Németi, I., Sain, I. and Simon, A. [131] Lambek Calculus and its relational semantics: Completeness and incompleteness. Journal of Logic, Language and Information 3 (1994), pp. 1–37. Andréka, H. and Mikulás, Sz. [132] Weakly representable but not representable relation algebras. Algebra Universalis 32 (1994), pp. 31–43. Andréka, H. [133] Representations for small relation algebras. Notre Dame Journal of Formal Logic 35,4 (1994), pp. 550–562. Andréka, H. and Maddux, R. D.

[134] Connections between axioms of set theory and basic theorems of universal algebra. Journal of Symbolic Logic 59,3 (1994), pp. 912–922. Andréka, H., Kurucz, Á. and Németi, I. The axiom of Foundation (AF) is needed for deriving Birkhoff’s variety theorem (VT) even in the presence of the axiom of choice (AC). VT is equivalent to the Collection principle of set theory. The role of AC is investigated in several other theorems of universal algebra.

[135] The lattice of varieties of representable relation algebras. Journal of Symbolic Logic 59,2 (1994), pp. 631–661. Andréka, H., Givant, S. and Németi, I. [136] Exactly which logics touched by the dynamic trend are decidable?. In: Proceedings of 9th Amsterdam Colloquium (Dec.14–17, 1993), ILLC, Department of Philosophy, University of Amsterdam, 1994. Eds: P. Dekker and M. Stokhof. pp. 67–86. Andréka, H., Kurucz, Á., Németi, I., Sain, I. and Simon, A. [137] Craig property of a logic and decomposability of theories. In: Proceedings of 9th Amsterdam Colloquium (Dec.14–17, 1993), ILLC, Department of Philosophy, University of Amsterdam, 1994. Eds: P. Dekker and M. Stokhof. pp. 87–93. Andréka, H., Németi, I. and Sain, I. [138] Applying algebraic logic to logic. In: Algebraic methodology and software technology (AMAST’93, Proc. Twente, The Netherlands, June 1993), Nivat, M., Rattray, C., Rus, T. and Scollo, G. eds., Springer-Verlag, London, 1994. pp. 7–28. Andréka, H., Németi, I. and Sain, I. [139] General algebraic logic: a perspective on “what is logic”. In: What is a logical system. Ed: D. M. Gabbay, Clarendon Press, Oxford, 1994. pp. 393–444. Andréka, H. and Németi, I. [140] Some new landmarks on the roadmap of two dimensional logics. In: Logic and Information Flow. Ed: J. van Eijck and A. Visser, MIT Press, Cambridge, 1994. pp. 163–169. Andréka, H., Németi, I. and Sain, I. [141] Decision problems for equational theories of relation algebras. Bulletin of Section of Logic 23,2 (1994), pp. 47–52. Andréka, H., Givant, S. and Németi, I.

1995 – 1999

[142] The equational theory of union-free algebras of relations. Algebra Universalis 33,4 (1995), pp. 516–532. Andréka, H. and Bredikhin, D. [143] Decidable and undecidable logics with a binary modality. Journal of Logic, Language and Information 4 (1995), pp. 191–206. Kurucz, Á., Németi, I., Sain, I. and Simon, A. [144] Taming Logic. Journal of Logic, Language, and Information 4 (1995), pp. 207–226. Marx, M., Mikulás, Sz.
and Németi, I. [145] Back and forth between modal logic and classical logic. Journal of the IGPL 3,5 (1995), pp. 685–720. Andréka, H., van Benthem, J. and Németi, I. [146] Expressibility of properties of relations. Journal of Symbolic Logic 60,3 (1995), pp. 970–991. Andréka, H., Düntsch, I. and Németi, I. [147] Perfect extensions and derived algebras. Journal of Symbolic Logic 60,3 (1995), pp. 775–796. Andréka, H., Givant, S. and Németi, I.

[148] Binary relations and permutation groups. Mathematical Logic Quarterly 41 (1995), pp. 197–216. Andréka, H., Düntsch, I. and Németi, I. [149] Undecidability of the equational theories of some classes of residuated Boolean algebras with operators. Bulletin of the IGPL 3,1 (1995), pp. 93–107. Németi, I., Sain, I. and Simon, A. [150] Operators and laws for combining preferential relations (Extended abstract). In: Information Systems: Correctness and Reusability (Selected papers). Eds: Wieringa, R. J. and Feenstra, R. B., World Scientific Publishing Co, 1995, pp. 191–206. Andréka, H., Ryan, R. and Schobbens, P-Y. [151] General Algebraic Logic including Algebraic Model Theory: An Overview. In: Logic Colloquium’92. Eds: Csirmaz, L. Gabbay, D. and de Rijke, M., Studies in Logic, Language and Computation, CSLI Publications, 1995. pp. 1–60. Andréka, H., Németi, I., Sain, I. and Kurucz, Á. [152] Decidable Logics of the Dynamic Trend, and Relativized Relation Algebras. In: Logic Colloquium’92. Eds: Csirmaz, L., Gabbay, D. and de Rijke, M., Studies in Logic, Language and Computation, CSLI Publications, 1995. pp. 165–175. Mikulás, Sz., Németi, I. and Sain, I. [153] Decidable versions of first order logic and cylindric-relativized set algebras. In: Logic Colloquium’92 (Proc. Veszprém, Hungary 1992). Eds: Csirmaz, L., Gabbay, D. M. and de Rijke, M., Studies in Logic, Language and Computation, CSLI Publications, 1995. pp. 177–241. Németi, I. [154] Submodel preservation theorems in finite variable fragments. In: Modal Logic and Process Algebra. A Bisimulation Perspective. Eds: Ponse, A. de Rijke, M. and Venema, Y. CSLI Lecture Notes No. 53, CSLI Publications, 1995. pp. 1–11. Andréka, H., van Benthem, J. and Németi, I. [155] Effective temporal logics of programs. In: Time and Logic, a computational approach. Eds: Bolc, L. and Szalas, A., UCL Press, London, 1995. pp. 51–129. Andréka, H., Goranko, V., Mikulás, Sz., Németi, I. and Sain, I. 
[156] Fork Algebras in Usual and in Non–well–founded Set Theories. (Extended abstract) Parts I-II. Bulletin of Section of Logic 24,3 and 24,4 (1995), pp. 158–168, 182–192. Németi, I. and Sain, I. [157] Axiomatization of identity-free equations valid in relation algebras. Algebra Universalis 35 (1996), pp. 256–264. Andréka, H. and Németi, I. [158] Investigations in Arrow Logic. In: Arrow Logic and Multi-Modal Logic, M. Marx, L. Pólos, and M. Masuch eds, CSLI Publications, Stanford, California, 1996. pp. 35–61. Marx, M., Mikulás, Sz., Németi, I. and Sain, I. [159] Causes and remedies for undecidability in arrow logics and in multi-modal logics. Arrow Logic and Multi-Modal Logic, M. Marx, L. Pólos, and M. Masuch eds, CSLI Publications, Stanford, California, 1996. pp. 63–99. Andréka, H., Kurucz, Á., Németi, I., Sain, I. and Simon, A. [160] Fine-structure analysis of first order logic. Arrow Logic and Multi-Modal Logic, M. Marx, L. Pólos, and M. Masuch eds, CSLI Publications, Stanford, California, 1996. pp. 221–247. Németi, I.

[161] Ontology can turn negative results to positive. (An overview of recent results) Bulletin of Section of Logic 25,1 (1996), pp. 29–40. Németi, I. [162] Decision problems for equational theories of relation algebras. Memoirs of Amer. Math. Soc. 126,604 (1997), xiv+126 p. Andréka, H., Givant, S. and Németi, I. [163] Complexity of equations valid in algebras of relations, Parts I-II. Annals of Pure and Applied Logic 89 (1997), 149–229. Andréka, H. [164] Relation algebras from cylindric and polyadic algebras. Logic Journal of the IGPL 5,4 (1997), pp. 575–588. Németi, I. and Simon, A. [165] Strong representability of fork algebras, a set theoretic foundation. Journal of the IGPL 5,1 (1997), pp. 3–28. Németi, I. [166] Persistent properties and an application to algebras of logic. Algebra Universalis 38 (1997), pp. 141–149. Andréka, H., Givant, S., Németi, I. and Simon, A. [167] On neat reducts of algebras of logic. Bulletin of Journal of Symbolic Logic 3,2 (1997), p. 249 Andréka, H., Németi, I. and Sayed-Ahmed, T. [168] On the finitization problem of relation algebras. Bulletin of Section of Logic 26,3 (1997), pp. 139–143. Madarász, J. X., Németi, I. and Sági, G. [169] Modal languages and bounded fragments of predicate logic. Journal of Philosophical Logic 27 (1998), pp. 217–274. Andréka, H., van Benthem, J. and Németi, I. [170] Notions of density that imply representability in algebraic logic. Annals of Pure and Applied Logic 91 (1998), pp. 93–190. Andréka, H., Givant, S., Mikulás, Sz., Németi, I. and Simon, A. [171] Relativised quantification: some canonical varieties of sequence-set algebras. Journal of Symbolic Logic 63,1 (1998), pp. 163–184. Andréka, H., Goldblatt, R. and Németi, I. [172] On the equational theory of representable polyadic equality algebras. (Extended abstract) Logic Journal of the IGPL 6,3 (1998), pp. 3–15. Németi, I. and Sági, G. [173] Finite algebras of relations are representable on finite sets. Journal of Symbolic Logic 64,1 (1999), pp. 
243–267. Andréka, H., Hodkinson, I. and Németi, I. [174] Fork Algebras in Usual and in Non–well–founded Set Theories (An overview). In: Logic at Work. Ed: E. Orlowska, Physica-Verlag, 1999. pp. 669–694. Németi, I. and Sain, I. [175] Logical analysis of special relativity theory. In: Essays dedicated to Johan van Benthem on the occasion of his 50th birthday CD-ROM, University of Amsterdam. 1999. Andréka, H., Madarász, J. X. and Németi, I. http://www.illc.uva.nl/j50/contribs/andreka-nemeti/index.html [176] Omitting types in logics with finitely many variables (abstract). Bulletin of Symbolic Logic 5,1 (1999) p.88 Andréka, H. and Sayed-Ahmed, T. [177] Relation algebras and groups. In: Algebra and model theory, 2 (Erlogol, 1999), Novosibirsk State Tech. Univ., Novosibirsk, 1999. pp. 34–36. Andréka, H. and Givant, S.

2000 – 2004

[178] Editorial to the special issue of Logic Journal of the IGPL on Algebraic Logic. Logic Journal of the IGPL 8,4 (2000), pp. 379–381 Németi, I. and Sain, I. [179] Representability of pairing relation algebras depends on your ontology. Fundamenta Informaticae 44,4 (2000), pp. 397–420. Kurucz, Á. and Németi, I. Representation theorems do depend on the set theory they are proved in: pairing algebras do not have a strong representation theorem in Zermelo-Fraenkel set theory with the axiom of choice, but they do in an appropriate non-well-founded set theory.

[180] On the equational theory of representable polyadic equality algebras. Journal of Symbolic Logic 65,3 (2000), pp. 1143–1167. Németi, I. and Sági, G. [181] Developments after 1991. Chapter in J. D. Monk: Introduction to cylindric algebras. Logic Journal of IGPL (Special issue on Algebraic Logic) 8,4 (2000), pp. 451–506. Andréka, H. [182] A finite axiomatization of locally square cylindric-relativized set algebras. Studia Sci. Math. Hungar. 38 (2001), pp. 1–11. Andréka, H. [183] On neat reducts of algebras of logic. Studia Logica 68,2 (2001), pp. 229–262. Németi, I. and Sayed-Ahmed, T. [184] Free Boolean algebras with closure operators and a conjecture of Henkin, Monk, and Tarski. Studia Sci. Math. Hungar. 38 (2001), pp. 273–278. Madarász, J. X. and Németi, I. [185] Algebraic Logic. In: Handbook of Philosophical Logic, Vol. 2, second edition. Eds. D. M. Gabbay and F. Guenthner, Kluwer Academic Publishers, 2001. pp. 133–247. Andréka, H., Németi, I. and Sain, I. [186] Operators and laws for combining preferential relations. Journal of Logic and Computation 12,1 (2002), pp. 13–53. Andréka, H., Ryan, M. and Schobbens, P-Y. [187] Groups and algebras of relations. Bulletin of Symbolic Logic 8,1 (2002), pp. 38–64. Andréka, H. and Givant, S. R. [188] Non-Turing computations via Malament-Hogarth space-times. International Journal of Theoretical Physics 41,2 (2002), pp. 341–370. Etesi, G. and Németi, I. [189] Relational Algebras. In: The Concise Handbook of Algebra. Eds: Mikhalev, A. V. and Pilz, G. F. Kluwer Academic Publishers, Dordrecht, Boston, London, 2002. pp. 478–482. Andréka, H., Madarász, J. X. and Németi, I. [190] Algebraic Logic. In: Supplement III of Encyclopaedia of Mathematics. Ed: Hazewinkel, M. Kluwer Academic Publishers, 2002. pp. 31–34. Andréka, H., Madarász, J. X. and Németi, I. [191] On the logical structure of relativity theories. Alfréd Rényi Institute of Mathematics, Budapest, July 5, 2002. Andréka, H., Madarász, J. X. and Németi, I., with contributions from A. Andai, G. Sági, I. Sain and Cs. Tőke. 1312 p. https://old.renyi.hu/pub/algebraic-logic/Contents.html

[192] A logical investigation of inertial and accelerated observers in flat space-time. In: Kalmár Workshop on Logic and Computer Science, (Gécseg, F. Csirik, J. and Turán, Gy. eds) Department of Informatics, University of Szeged, Szeged, Hungary, 2003. pp. 45–57. Andréka, H., Madarász, J. X., Németi, I. and Székely, G. [193] Algebras of relations of various ranks, some current trends and applications. Journal of Relational Methods in Computer Science 1 (2004), pp. 27–49. Andréka, H., Madarász, J. X. and Németi, I. [194] Logical analysis of relativity theories. In: First-order Logic Revisited (Hendricks et al. eds), Logos Verlag, Berlin, 2004. pp. 7–36. Andréka, H., Madarász, J. X. and Németi, I. [195] On generalizing the logic-approach to space-time towards general relativity: first steps. In: First-order Logic Revisited (Hendricks et al. eds), Logos Verlag, Berlin, 2004. pp. 225–268. Madarász, J. X., Németi, I. and Tőke, Cs.

2005 – 2009

[196] Mutual definability does not imply definitional equivalence, a simple example. Mathematical Logic Quarterly 51,6 (2005), pp. 591–597. Andréka, H., Madarász, J. X. and Németi, I. [197] Relativistic computers and the Turing barrier. Journal of Applied Mathematics and Computation 178 (2006), pp. 118–142. Németi, I. and Dávid, Gy. [198] Twin Paradox and the logical foundation of relativity theory. Foundations of Physics 36,5 (2006), pp. 681–714. Madarász, J. X., Németi, I. and Székely, G. [199] Logical axiomatizations of space-time. Samples from the literature. In: Non-Euclidean Geometries: János Bolyai Memorial Volume (Prékopa, A., Molnár, E. eds), Mathematics and Its Applications Vol. 581, Springer Verlag, 2006. pp. 155–185. Andréka, H., Madarász, J. X. and Németi, I. [200] Can general relativistic computers break the Turing barrier?. In: Logical Approaches to Computational Barriers (Proc. Conf. CiE 2006, Swansea, UK, July 2006). Eds.: Beckmann, A., Berger, U., Löwe, B. and Tucker, J.
V., Lecture Notes in Computer Science Vol 3988, Springer–Verlag, Berlin, 2006. pp. 398–412. Németi, I. and Andréka, H. [201] Relativity theory for logicians and new computing paradigms. Abstract of talk. In: Logical Approaches to Computational Barriers (Second Conference on Computability in Europe CiE 2006, Swansea, UK, June/July 2006). Eds: Beckmann, A., Berger, U., Löwe, B. and Tucker, J. V., University of Wales Swansea, Computer Science, Report No CSR 7–2006, pp. 12–14. Andréka, H. [202] New physics and hypercomputation. In: SOFSEM 2006: Theory and Practice of Computer Science (32nd Conf. on Current Trends in Theory and Practice of Computer Science, Merin, Czech Republic, January 2006). Eds: Wiedermann, J., Tel, G., Pokorny, J. Bielikova, M. and Stuller, J., Lecture Notes in Computer Science Vol 3831, Springer Verlag, 2006, Invited talks section, p.63. Németi, I. and Andréka, H. [203] Logic of space-time and relativity theory. In: Handbook of Spatial Logics. Eds.: Aiello, M., Pratt-Hartmann, I., Benthem, J. F. A. K. van, Springer Verlag, 2007. pp. 607–711. Andréka, H., Madarász, J. X. and Németi, I. [204] First-order logic foundation of relativity theories. In: New Logics for the XXIst Century II, Mathematical Problems from Applied Logics, International Mathematical Series Vol 5. Eds.: Gabbay, D., Goncharov, S. and Zakharyaschev, M., Springer Verlag, 2007. pp. 217–252. Madarász, J. X., Németi, I., and Székely, G. [205] A twist in the geometry of rotating black holes: seeking the cause of acausality. General Relativity and Gravitation 40,9 (2008), pp. 1809–1823. Andréka, H., Németi, I. and Wüthrich, C. [206] Axiomatizing relativistic dynamics without conservation postulates. Studia Logica 89,2 (2008), pp. 163–186. Andréka, H., Madarász, J. X., Németi, I. and Székely, G. [207] Omitting types for finite variable fragments and complete representations of algebras. Journal of Symbolic Logic 73,1 (2008), pp. 65–89. Andréka, H., Németi, I. and Sayed-Ahmed, T. [208] Epimorphisms in cylindric algebras and definability in finite variable logic. Algebra Universalis 61, 3–4 (2009), pp. 261–282. Andréka, H., Comer, S. C., Madarász, J. X., Németi, I. and Sayed-Ahmed, T. [209] General relativistic hypercomputing and foundation of mathematics. Natural Computing 8,3 (2009), pp. 499–516. Andréka, H., Németi, I. and Németi, P. [210] Weakly higher order cylindric algebras and finite axiomatization of the representables. Studia Logica 91,1 (2009), pp. 53–62. Németi, I. and Simon, A. [211] Logical foundation and introduction for relativity theory and for relativistic computing (extended abstract). In: Pre-proceedings of the Unconventional Computation 2009 Hypercomputation Workshop, Ponta Delgada, Azores, Portugal, September 2009. Ed: M. Stannett, pp. 1–2. Andréka, H., Madarász, J. X., Németi, P., Németi, I. and Székely, G. [212] Relativistic hypercomputing (and physical realisticity issues). Extended abstract. In: Pre-proceedings of the Unconventional Computation 2009 Hypercomputation Workshop, Ponta Delgada, Azores, Portugal, September 2009. Ed: M. Stannett, pp. 2–3. Andréka, H., Madarász, J. X., Németi, P., Németi, I. and Székely, G.

2010 – 2014

[213] On logical analysis of relativity theories.
Hungarian Philosophical Review 54,4 (2010), pp. 204–222. Andréka, H., Madarász, J. X., Németi, I. and Székely, G. [214] Visualizing ideas about Gödel-type rotating universes. In: Gödel-type spacetimes: history and new developments. Scherfner, M., Plaue, M. eds, Kurt Gödel Society, Collegium Logicum Vol X, 2010. pp. 77–127. Németi, I., Madarász, J. X., Andréka, H. and Andai, A. [215] The equational theory of Kleene lattices. Theoretical Computer Science 412 (2011), pp. 7099–7108. Andréka, H., Mikulás, Sz. and Németi, I. [216] Axiomatizability of positive algebras of binary relations. Algebra Universalis 66,1 (2011), pp. 7–34. Andréka, H. and Mikulás, Sz. [217] Vienna Circle and Logical Analysis of Relativity Theory. In: The Vienna Circle in Hungary (Der Wiener Kreis in Ungarn). Máté, A., Rédei, M., Stadler, F. eds, Veröffentlichungen des Instituts Wiener Kreis, Collegium Logicum Band 16, 2011. pp. 247–268. Andréka, H., Madarász, J. X., Németi, I., Németi, P. and Székely, G. [218] Decidability, undecidability, and Gödel incompleteness in relativity theory. In: Proceedings of the Satellite Workshops of UC2011, (Stannett, M., Makowiec, D., Lawniczak, A. T., Di Stefano, B. N. eds) TUCS Lecture Notes 14, Turku Centre for Computer Science, Turku, Finland, 2011. pp. 61–78. Andréka, H., Madarász, J. X. and Németi, I. [219] Closed timelike curves in relativistic computation. In: Proceedings of the Satellite Workshops of UC2011, (Stannett, M., Makowiec, D., Lawniczak, A. T., Di Stefano, B. N. eds) TUCS Lecture Notes 14, Turku Centre for Computer Science, Turku, Finland, 2011. pp. 155–171. Andréka, H., Németi, I. and Székely, G. [220] Functionally dense relation algebras. Algebra Universalis 68, 1–2 (2012), pp. 151–191. Andréka, H. and Givant, S. [221] A logic road from special relativity to general relativity. Synthese 186,3 (2012), pp. 633–649. Andréka, H., Madarász, J. X., Németi, I. and Székely, G. [222] The development of symbolic logic in Hungary. In: Logic in Central and Eastern Europe: History, Science and Discourse. A. Schumann ed, University Press of America, 2012. pp. 201–216. Máté, A., Andréka, H. and Németi, I. [223] Reducing first-order logic to Df3, free algebras. In: Cylindric-like algebras and algebraic logic. Andréka, H., Ferenczi, M., Németi, I. eds, Springer Verlag, Bolyai Society Mathematical Studies 22, 2012. pp. 15–35. Andréka, H. and Németi, I. [224] Residuated Kleene Algebras. In: Logic and program semantics. Essays dedicated to Dexter Kozen on the occasion of his 60th birthday, R. I. Constable and A. Silva eds, Lecture Notes in Computer Science Vol. 7230, Springer-Verlag, Berlin, 2012. pp. 35–61. Andréka, H., Mikulás, Sz. and Németi, I. [225] A note on ‘Einstein’s special relativity beyond the speed of light by James M. Hill and Barry J. Cox’. Proc. R. Soc. A. 469 (2013), 2154. Andréka, H., Madarász, J. X., Németi, I. and Székely, G. [226] A non representable infinite dimensional quasi-polyadic equality algebra with a representable cylindric reduct. Studia Sci. Math. Hungar. 50,1 (2013), pp. 1–16. Andréka, H., Németi, I. and Sayed Ahmed, T.
[227] Finitely axiomatized cylindric Gödel-Bernays set theory. (In Hungarian) In: Essays dedicated to András Máté on the occasion of his 60th birthday. Zvolenszky, Zs., Molnár, A., Mekis, P., Markovich, R., Jellinek, S., Gömöri, M., Bitai, T. eds, L’Harmattan, Budapest, 2013. pp. 184–192. Andréka, H. and Németi, I.
[228] Faster than light motion does not imply time travel. Classical and Quantum Gravity 21 (2014), 095005 (11pp). Andréka, H., Madarász, J. X., Németi, I., Stannett, M. and Székely, G.
[229] Using Isabelle/HOL to verify first order relativity theory. Journal of Automated Reasoning 52,4 (2014), pp. 361–378. Németi, I. and Stannett, M.
[230] Comparing theories: the dynamics of changing vocabulary. A case-study in relativity theory. In: Johan van Benthem on Logical and Informational Dynamics. A. Baltag, S. Smets eds, Springer Series Outstanding Contributions to Logic Vol. 5, Springer Verlag, 2014. pp. 143–172. Andréka, H. and Németi, I.
[231] Changing a semantics: opportunism or courage? In: The life and work of Leon Henkin. Essays on his contributions. M. Manzano, I. Sain and E. Alonso eds, Studies in Universal Logic, Springer Verlag, 2014, pp. 307–337. Andréka, H., van Benthem, J. F. A. K., Bezhanishvili, N. and Németi, I.

2015 – 2019

[232] Finite-variable logics do not have weak Beth definability property. In: The road to universal logic Vol II. A. Koslow and A. Buchsbaum eds, Studies in Universal Logic, Birkhäuser Basel, 2015, pp. 125–133. Andréka, H. and Németi, I.
[233] Ultraproducts of continuous posets. Algebra Universalis 76,2 (2016), pp. 231–235. Andréka, H., Gyenis, Z. and Németi, I.
[234] Adventures in the network of theories. Invited presentation at Conference for Young Analytic Philosophy, Salzburg September 7, 2016. Andréka, H. and Németi, I. https://old.renyi.hu/pub/algebraic-logic/TextSalzburgAN16.htm
[235] On a new semantics for first-order predicate logic. Journal of Philosophical Logic 46,3 (2017), pp. 259–267. Andréka, H., van Benthem, J. and Németi, I.
[236] On Tarski’s axiomatic foundations of the calculus of relations. Journal of Symbolic Logic 82,3 (2017), pp. 966–994. Andréka, H., Givant, S. R., Jipsen, P. and Németi, I.
[237] How many varieties of cylindric algebras are there. Transactions of the AMS 369,12 (2017), pp. 8903–8937. Andréka, H. and Németi, I.
[238] Coset relation algebras. Algebra Universalis 79:28 (2018). 53 p. Andréka, H. and Givant, R.
[239] A representation theorem for measurable relation algebras. Annals of Pure and Applied Logic 169,11 (2018), pp. 1117–1189. Givant, R. and Andréka, H.
[240] Term algebras of elementarily equivalent atom structures. Algebra Universalis 79:61 (2018). 9 p. Andréka, H. and Németi, I.
[241] The variety of coset relation algebras. The Journal of Symbolic Logic 83,4 (2018), pp. 1595–1609. Givant, R. and Andréka, H.
[242] Relativistic Computation. In: M. E. Cuffaro, S. C. Fletcher eds., Physical Perspectives on Computation, Computational Perspectives on Physics. Cambridge University Press, Cambridge, 2018. pp. 195–215. Andréka, H., Madarász, J. X., Németi, I., Németi, P. and Székely, G.
[243] A representation theorem for measurable relation algebras with cyclic groups. Transactions of the American Mathematical Society 371,10 (2019), pp. 7175–7198. Andréka, H. and Givant, S.
[244] Atoms in infinite dimensional free sequence-set algebras. Algebra Universalis 80, 41 (2019). Khaled, M. and Németi, I.
[245] Varieties generated by completions. Algebra Universalis 80, 30 (2019). Andréka, H. and Németi, I.
[246] Nonrepresentable relation algebras from groups. The Review of Symbolic Logic 1–21, (2019). Andréka, H., Givant, S. and Németi, I.
[247] Conceptual structure of spacetimes and category of concept algebras. Invited presentation at the workshop Foundations of Categorical Philosophy of Science, Munich April 26–27, 2019. Andréka, H., Madarász, J. X., Németi, I. and Székely, G. https://old.renyi.hu/pub/algebraic-logic/textofslides-Munich19.pdf