314 7 4MB
English Pages XV, 201 [215] Year 2020
LNCS 12094
Daniela Petris¸an Jurriaan Rot (Eds.)
Coalgebraic Methods in Computer Science 15th IFIP WG 1.3 International Workshop, CMCS 2020 Colocated with ETAPS 2020 Dublin, Ireland, April 25–26, 2020, Proceedings
Lecture Notes in Computer Science Founding Editors Gerhard Goos Karlsruhe Institute of Technology, Karlsruhe, Germany Juris Hartmanis Cornell University, Ithaca, NY, USA
Editorial Board Members Elisa Bertino Purdue University, West Lafayette, IN, USA Wen Gao Peking University, Beijing, China Bernhard Steffen TU Dortmund University, Dortmund, Germany Gerhard Woeginger RWTH Aachen, Aachen, Germany Moti Yung Columbia University, New York, NY, USA
12094
More information about this series at http://www.springer.com/series/7407
Daniela Petrişan Jurriaan Rot (Eds.) •
Coalgebraic Methods in Computer Science 15th IFIP WG 1.3 International Workshop, CMCS 2020 Colocated with ETAPS 2020 Dublin, Ireland, April 25–26, 2020 Proceedings
123
Editors Daniela Petrişan Université de Paris, CNRS, IRIF Paris, France
Jurriaan Rot Radboud University Nijmegen, The Netherlands
ISSN 0302-9743 ISSN 1611-3349 (electronic) Lecture Notes in Computer Science ISBN 978-3-030-57200-6 ISBN 978-3-030-57201-3 (eBook) https://doi.org/10.1007/978-3-030-57201-3 LNCS Sublibrary: SL1 – Theoretical Computer Science and General Issues © IFIP International Federation for Information Processing 2020 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
The 15th International Workshop on Coalgebraic Methods in Computer Science (CMCS 2020) was originally planned to be held on April 25–26, 2020, in Dublin, Ireland, as a satellite event of the European Joint Conferences on Theory and Practice of Software (ETAPS 2020). Due to the 2020 COVID-19 pandemic, it was canceled, and held in alternative form as a series of online seminars, ranging over several weeks in the fall of 2020. The aim of the workshop is to bring together researchers with a common interest in the theory of coalgebras, their logics, and their applications. Coalgebras allow for a uniform treatment of a large variety of state-based dynamical systems, such as transition systems, automata (including weighted and probabilistic variants), Markov chains, and game-based systems. Over the last two decades, coalgebra has developed into a field of its own interest, presenting a deep mathematical foundation, a growing field of applications, and interactions with various other fields such as reactive and interactive system theory, object-oriented and concurrent programming, formal system specification, modal and description logics, artificial intelligence, dynamical systems, control systems, category theory, algebra, analysis, etc. Previous workshops of the CMCS series were held in Lisbon (1998), Amsterdam (1999), Berlin (2000), Genova (2001), Grenoble (2002), Warsaw (2003), Barcelona (2004), Vienna (2006), Budapest (2008), Paphos (2010), Tallinn (2012), Grenoble (2014), Eindhoven (2016), and Thessaloniki (2018). Since 2004, CMCS has been a biennial workshop, alternating with the International Conference on Algebra and Coalgebra in Computer Science (CALCO). The CMCS 2020 program featured a keynote talk by Yde Venema (ILLC, University of Amsterdam, The Netherlands), an invited talk by Nathanaël Fijalkow (CNRS, LaBRI, University of Bordeaux, France), and an invited talk by Koko Muroya (RIMS, Kyoto University, Japan). In addition, a special session on probabilistic couplings was planned, featuring invited tutorials by Marco Gaboardi (Boston University, USA) and Justin Hsu (University of Wisconsin-Madison, USA). This volume contains revised regular contributions (9 papers were accepted out of 13 submissions; 1 submission was conditionally accepted but unfortunately had to be rejected ultimately as the author did not agree to implement the changes required by the Program Committee) and the abstracts of 3 keynote/invited talks. All regular contributions were refereed by three reviewers. We wish to thank all the authors who submitted to CMCS 2020, and the external reviewers and the Program Committee members for their thorough reviewing and help in improving the papers accepted at CMCS 2020. July 2020
Daniela Petrişan Jurriaan Rot
Organization
Program Committee Chairs Daniela Petrişan Jurriaan Rot
Université de Paris, CNRS, IRIF, France Radboud University, The Netherlands
Steering Committee Filippo Bonchi Marcello Bonsangue Corina Cîrstea Ichiro Hasuo Bart Jacobs Bartek Klin Alexander Kurz Marina Lenisa Stefan Milius (Chair) Larry Moss Dirk Pattinson Lutz Schröder Alexandra Silva
University of Pisa, Italy CWI/Leiden University, The Netherlands University of Southampton, UK National Institute of Informatics, Japan Radboud University, The Netherlands University of Warsaw, Poland Chapman University, USA University of Udine, Italy FAU Erlangen-Nürnberg, Germany Indiana University, USA The Australian National University, Australia FAU Erlangen-Nürnberg, Germany University College London, UK
Program Committee Henning Basold Nick Bezhanishvili Corina Cîrstea Mai Gehrke Helle Hvid Hansen Shin-Ya Katsumata Bartek Klin Ekaterina Komendantskaya Barbara König Dexter Kozen Clemens Kupke Alexander Kurz Daniela Petrişan Andrei Popescu Damien Pous Jurriaan Rot Davide Sangiorgi
Leiden University, The Netherlands University of Amsterdam, The Netherlands University of Southampton, UK CNRS and Université Côte d’Azur, France Delft University of Technology, The Netherlands National Institute of Informatics, Japan Warsaw University, Poland Heriot-Watt University, UK Universität Duisburg-Essen, Germany Cornell University, USA University of Strathclyde, UK Chapman University, USA Université de Paris, France Middlesex University London, UK CNRS, ENS Lyon, France Radboud University, The Netherlands University of Bologna, Italy
viii
Organization
Ana Sokolova David Sprunger Henning Urbat Fabio Zanasi
University of Salzburg, Austria National Institute of Informatics, Japan FAU Erlangen-Nürnberg, Germany University College London, UK
Additional Reviewers Alexandre Goy Gerco van Heerdt Stefan Milius Laureline Pinault Luigi Santocanale
Sponsoring Institutions IFIP WG 1.3 Logical Methods in Computer Science e.V.
Abstracts of Invited Talks
Logic and Automata: A Coalgebraic Perspective
Yde Venema Institute for Logic, Language and Computation, Universiteit van Amsterdam [email protected] https://staff.fnwi.uva.nl/y.venema/ Abstract. In Theoretical Computer Science, the area of Logic and Automata combines a rich mathematical theory with many applications in for instance the specification and verification of software. Of particular relevance in this area is the design of formal languages and derivation systems for describing and reasoning about nonterminating or ongoing behavior, and the study of such formalisms through the theory of automata operating on potentially infinite objects. We will take a foundational approach towards these issues using insights from Universal Coalgebra and Coalgebraic Logic. Specifically, we will review the basic theory of coalgebraic modal fixpoint logic and its link with coalgebra automata – finite automata that operate on coalgebras.We will show that much of the theory linking logic and automata in the setting of streams, trees or transition systems, transfers to the coalgebraic level of generality, and we will argue that this theory is essentially coalgebraic in nature. Topics that will be covered include closure properties of classes of coalgebra automata, expressive completeness results concerning fixpoint logics, and the design of sound and complete derivation systems for modal fixpoint logics. The key instruments in our analysis will be those of a coalgebraic modal signature and its associated one-step logic. As we will see, many results in the theory of logic and automata are ultimately grounded in fairly simple observations on the underlying one-step logic. Keywords: Coalgebra • Modal logic • Fixed points • Automata
The Theory of Universal Graphs for Games: Past and Future Nathanaël Fijalkow1,2 1
CNRS, LaBRI, Université de Bordeaux, Bordeaux, France nathanael.fi[email protected] 2 The Alan Turing Institute of Data Science, London, UK
Abstract. This paper surveys recent works about the notion of universal graphs. They were introduced in the context of parity games for understanding the recent quasipolynomial time algorithms, but they are defined for arbitrary objectives yielding a new approach for constructing efficient algorithms for solving different classes of games. Keywords: Games • Parity games • Universal graphs
Hypernet Semantics and Robust Observational Equivalence
Koko Muroya RIMS, Kyoto University, Kyoto, Japan [email protected] Abstract. In semantics of programming languages, observational equivalence [6] is the fundamental and classical question, asking whether two program fragments have the same behaviour in any program context. Establishing observational equivalence is vital in validating compiler optimisation and refactoring, and there have been proposed proof methodologies for observational equivalence, such as logical relations [9, 10] and applicative bisimulations [1]. A challenge is, however, to establish robustness of observational equivalence, namely to prove that observational equivalence is preserved by a language extension. In this talk I will discuss how hypernet semantics can be used to reason about such robustness. The semantics was originally developed for cost analysis of program execution [8], in particular for analysis of a trade-off between space and time efficiency. It has been used to model exotic programming features [2, 7] inspired by Google’s TensorFlow1, a machine learning library, and also to design a dataflow programming language [3]. In hypernet semantics, program execution is modelled as dynamic rewriting of a hypernet, that is, graph representation of a program. Inspired by Girard’s Geometry of Interaction [4], the rewriting process is notably guided and controlled by a dedicated object (token) of the graph. The use of graphs together with the token enables a direct proof of observational equivalence. The key step of the proof is to establish robustness of observational equivalence relative to language features, using step-wise reasoning enriched with the so-called up-to technique [5].
References 1. Abramsky, S.: The Lazy Lambda-Calculus, pp. 65–117. Addison Wesley (1990) 2. Cheung, S., Darvariu, V., Ghica, D.R., Muroya, K., Rowe, R.N.S.: A functional perspective on machine learning via programmable induction and abduction. In: Gallagher, J., Sulzmann, M. (eds.) FLOPS 2018. LNCS, vol. 10818. Springer, Cham (2018). https://doi. org/10.1007/978-3-319-90686-7_6 3. Cheung, S.W.T., Ghica, D.R., Muroya, K.: Transparent synchronous dataflow. arXiv preprint arXiv:1910.09579 (2019) 4. Girard, J.Y.: Geometry of interaction I: interpretation of system F. In: Logic Colloquium 1988. Studies in Logic and the Foundations of Mathematics, vol. 127, pp. 221–260. Elsevier (1989). https://doi.org/10.1016/S0049-237X(08)70271-4
1
https://www.tensorow.org/.
xiv
K. Muroya
5. Milner, R.: Communicating and Mobile Systems: the Pi-calculus. Cambridge University Press (1999) 6. Morris, Jr., J.H.: Lambda-calculus models of programming languages. Ph.D. thesis, Massachusetts Institute of Technology (1969) 7. Muroya, K., Cheung, S.W.T., Ghica, D.R.: The geometry of computation-graph abstraction. In: LICS 2018, pp. 749–758. ACM (2018). https://doi.org/10.1145/3209108.3209127 8. Muroya, K., Ghica, D.R.: The dynamic geometry of interaction machine: a token-guided graph rewriter. LMCS 15(4) (2019). https://lmcs.episciences.org/5882 9. Plotkin, G.D.: Lambda-Definability and Logical Relations. Memorandum SAI-RM-4 (1973) 10. Statman, R.: Logical relations and the typed lambda-calculus. Inf. Control 65(2/3), 85–97 (1985). https://doi.org/10.1016/S0019-9958(85)80001-2
Contents
The Theory of Universal Graphs for Games: Past and Future . . . . . . . . . . . . Nathanaël Fijalkow
1
Approximate Coalgebra Homomorphisms and Approximate Solutions . . . . . . Jiří Adámek
11
Duality for Instantial Neighbourhood Logic via Coalgebra . . . . . . . . . . . . . . Nick Bezhanishvili, Sebastian Enqvist, and Jim de Groot
32
Free-Algebra Functors from a Coalgebraic Perspective. . . . . . . . . . . . . . . . . H. Peter Gumm
55
Learning Automata with Side-Effects. . . . . . . . . . . . . . . . . . . . . . . . . . . . . Gerco van Heerdt, Matteo Sammartino, and Alexandra Silva
68
De Finetti’s Construction as a Categorical Limit . . . . . . . . . . . . . . . . . . . . . Bart Jacobs and Sam Staton
90
Injective Objects and Fibered Codensity Liftings. . . . . . . . . . . . . . . . . . . . . Yuichi Komorida
112
Explaining Non-bisimilarity in a Coalgebraic Approach: Games and Distinguishing Formulas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Barbara König, Christina Mika-Michalski, and Lutz Schröder A Categorical Approach to Secure Compilation . . . . . . . . . . . . . . . . . . . . . Stelios Tsampas, Andreas Nuyts, Dominique Devriese, and Frank Piessens
133 155
Semantics for First-Order Affine Inductive Data Types via Slice Categories. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Vladimir Zamdzhiev
180
Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
201
The Theory of Universal Graphs for Games: Past and Future Nathana¨el Fijalkow1,2(B) 1
CNRS, LaBRI, Universit´e de Bordeaux, Bordeaux, France [email protected] 2 The Alan Turing Institute of Data Science, London, UK
Abstract. This paper surveys recent works about the notion of universal graphs. They were introduced in the context of parity games for understanding the recent quasipolynomial time algorithms, but they are defined for arbitrary objectives yielding a new approach for constructing efficient algorithms for solving different classes of games. Keywords: Games
1
· Parity games · Universal graphs
The Quasipolynomial Era for Parity Games
Fig. 1. A parity game. The vertices controlled by Eve are represented by circles and the ones controlled by Adam as squares. The edges are labelled by priorities inducing the parity objective. The dotted region is the winning region for Eve.
Parity games are a central model in the study of logic and automata over infinite trees and for their tight relationship with model-checking games for the modal c IFIP International Federation for Information Processing 2020 Published by Springer Nature Switzerland AG 2020 D. Petri¸san and J. Rot (Eds.): CMCS 2020, LNCS 12094, pp. 1–10, 2020. https://doi.org/10.1007/978-3-030-57201-3_1
2
N. Fijalkow
μ-calculus. The central question left open for decades is whether there exists a polynomial time algorithm for solving parity games. This section is a historical account on the progress made on this question since 2017 (Fig. 1). The breakthrough result of Calude, Jain, Khoussainov, Li, and Stephan [2] was to construct a quasipolynomial time algorithm for solving parity games. Following decades of exponential and subexponential algorithms, this very surprising result triggered further research: soon after further quasipolynomial time algorithms were constructed reporting almost the same complexity, which is roughly nO(log d) . Let us classify them in two families. The first family of algorithms includes the original algorithm by Calude, Jain, Khoussainov, Li, and Stephan [2] (see also [6] for a presentation of the algorithm as value iteration), then the succinct progress measure algorithm by Jurdzi´ nski and Lazi´c [9] (see also [7] for a presentation of the algorithm using universal trees explicitly) and the register games algorithm by Lehtinen [11] (see also [14] for a presentation of the algorithm using good-for-small-games automata explicitly). Boja´ nczyk and Czerwi´ nski [1] introduced the separation question, describing a family of algorithms for solving parity games based on separating automata, and showed that the first quasipolynomial time algorithm yields a quasipolynomial solution to the separation question. Later Czerwi´ nski, Daviaud, Fijalkow, Jurdzi´ nski, Lazi´c, and Parys [5] showed that the other two algorithms also yield quasipolynomial solutions to the separation question. The main contribution of [5] is to show that any separating automaton contains a universal tree in its set of states; in other words, the three algorithms in this first family induce three (different) constructions of universal trees. The second family of algorithms is so-called Zielonka-type algorithms, inspired by the exponential time algorithm by Zielonka [15]. The first quasipolynomial time algorithm is due to Parys [13], its complexity was improved by Lehtinen, Schewe, and Wojtczak [12]. Recently Jurdzi´ nski and Morvan [10] constructed a universal attractor decomposition algorithm encompassing all three algorithms: each algorithm is parametrised by the choice of two universal trees (one of each player). All quasipolynomial time algorithms for parity games fall in one of two families, and both are based on the combinatorial notion of universal trees. This notion is by now well understood, with almost matching upper and lower bounds on the size of universal trees [5,7]. The lower bound implies a complexity barrier applying to both families of algorithms, hence to all known quasipolynomial time algorithms.
2
Universal Graphs
Colcombet and Fijalkow [3,4] revisited the relationship between separating automata and universal trees by introducing the notions of good-for-small-games automata and of universal graphs. The notion of good-for-small-games automata extends separating automata and resolves a disturbing situation that we describe now. The first
The Theory of Universal Graphs for Games: Past and Future
3
three quasipolynomial time algorithms discussed above construct separating automata, but for the third algorithm the automaton is non-deterministic. For that reason its correctness does not follow from the generic argument applying to the class of algorithms based on deterministic separating automata. (We note however that the lower bound result of [5] does apply to non-deterministic separating automata.) The notion of good-for-small-games automata resolves this conundrum, since the automaton constructed in Lehtinen’s algorithm is indeed good-for-small-games and the generic argument for deterministic separating automata naturally extends to this class of automata1 . Universal trees arise in the study of the parity objective, the tree structure representing the nested behaviour of priorities. The notion of universal graphs extends universal trees from parity objectives to arbitrary (positionally determined) objectives. The main result of [4] is an equivalence result between good-for-small-games automata and universal graphs. More specifically, a goodfor-small-games automaton induces a universal graph of the same size, and conversely. This equivalence extends the results of [5] relating separating automata and universal trees to any positionally determined objectives, so in particular for parity and mean payoff games. The remainder of this section gives the formal definitions of universal graphs, good-for-small-games automata, and separating automata, their use for solving games, and state the equivalence result. Universal Graphs We deliberately postpone the definitions related to games: indeed the notion of universal graphs is a purely combinatorial notion on graphs and ignores the interactive aspects of games. Graphs. We consider edge labelled directed graphs: a graph is given by a (finite) set V of vertices and a (finite) set E ⊆ V × C × V of edges. A vertex v for which there exists no outgoing edge (v, , v ) ∈ E is called a sink. We let n denote the number of vertices and m the number of edges. The size of a graph is its number of vertices. Homomorphisms. For two graphs G, G , a homomorphism φ : G → G maps the vertices of G to the vertices of G such that (v, , v ) ∈ E =⇒ (φ(v), , φ(v )) ∈ E . As a simple example that will be useful later on, note that if G is a super graph of G, meaning they have the same set of vertices and every edge in G is also in G , then the identity is a homomorphism G → G . Paths. A path π is a (finite or infinite) sequence of consecutive edges, where consecutive means that the third component of a triple in the sequence matches 1
A closely related solution was given in [14] by giving syntactic properties implying that a non-deterministic automaton is good-for-small-games and showing that the automaton constructed in Lehtinen’s algorithm has this property.
4
N. Fijalkow
the first component of the next triple. In the case of a finite path we write last(π) for the last vertex in π. We write π = (v0 , 0 , v1 )(v1 , 1 , v2 ) · · · and let π≤i denote the prefix of π of length i, meaning π≤i = (v0 , 0 , v1 ) · · · (vi−1 , i−1 , vi ). Objectives. An objective is a set Ω ⊆ C ω of infinite sequences of colours. A sequence of colours belonging to Ω is said to satisfy Ω. We say that a path in a graph satisfies Ω, or that it is winning when Ω is clear from context, if the sequence of labels it visits belongs to Ω. A path is said to be maximal if it is either infinite or ends in a sink. We will work with a generic objective Ω. However it is convenient to assume that Ω is prefix independent. Formally, this means that Ω = C ∗ · Ω, or equivalently for all w ∈ C ∗ and ρ ∈ C ω , we have ρ ∈ Ω ⇐⇒ w · ρ ∈ Ω. This assumption can be lifted at the price of working with graphs with a distinguished initial vertex v0 which must be preserved by homomorphisms. Definition 1 (Graphs satisfying an objective). Let Ω be a prefix independent objective. A graph satisfies Ω if all maximal paths are infinite and winning. Note that if a graph contains a sink, it does not satisfy Ω because it contains some finite maximal path. Definition 2 (Universal graphs). Let Ω be a prefix independent objective. A graph U is (n, Ω)-universal if it satisfies the following two properties: 1. it satisfies Ω, 2. for any graph G of size at most n satisfying Ω, there exists a homomorphism φ : G → U. It is not clear that for any objective Ω and n ∈ N there exists an (n, Ω)-universal graph. Indeed, the definition creates a tension between “satisfying Ω”, which restricts the structure, and “homomorphically mapping any graph of size at most n satisfying Ω”, implying that the graph witnesses many different behaviours. A simple construction of an (n, Ω)-universal graph is to take the disjoint union of all (n, Ω)-graphs satisfying Ω. Since up to renaming of vertices there are finitely many such graphs this yields a very large but finite (n, Ω)-universal graph. When considering a universal graph U we typically let VU and EU denote the sets of vertices and edges of U, respectively. Solving Games Using Universal Graphs by Reduction to Safety Games We now introduce infinite duration games on graphs and construct a family of algorithms for solving games using universal graphs by reduction to safety games [1,4]. Arenas. An arena is given by a graph containing no sink together with a partition VEve VAdam of its set V of vertices describing which player controls each vertex. We represent vertices controlled by Eve with circles and those controlled by Adam with squares.
The Theory of Universal Graphs for Games: Past and Future
5
Games. A game is given by an arena and an objective. We often let G denote a game, its size is the size of the underlying graph. It is played as follows. A token is initially placed on some initial vertex v0 , and the player who controls this vertex pushes the token along an edge, reaching a new vertex; the player who controls this new vertex takes over, and this interaction goes on forever describing an infinite path. Strategies. We write Pathv0 for the set of finite paths of G starting from v0 and ending in VEve . A strategy for Eve is a map σ : Pathv0 → E such that for all π ∈ Pathv0 , σ(π) is an edge in E from last(π). Note that we always take the point of view of Eve, so a strategy implicitly means a strategy of Eve, and winning means winning for Eve. We say that an infinite path π = (v0 , 0 , v1 )(v1 , 1 , v2 ) · · · is consistent with the strategy σ if for all i, if vi ∈ VEve , then σ(π≤i ) = (vi , i , vi+1 ). A strategy σ is winning from v0 if all infinite paths starting from v0 and consistent with σ are winning. Solving a game is the following decision problem: INPUT: a game G and an initial vertex v0 . OUTPUT: “yes” if Eve has a winning strategy from v0 , “no” otherwise. We say that v0 is a winning vertex of G when the answer to the above problem is positive. The winning region of Eve of G is the set of winning vertices. Positional Strategies. Positional strategies make decisions only considering the : current vertex: σ : VEve → E. A positional strategy induces a strategy σ Pathv0 → E by σ (π) = σ(last(π)), where by convention the last vertex of the empty path is the initial vertex v0 . Definition 3 (Positionally determined objectives). We say that an objective Ω is positionally determined if for every game with objective Ω and initial vertex v0 , whenever there exists a winning strategy from v0 then there exists a positional winning strategy from v0 . Given a game G, an initial vertex v0 , and a positional strategy σ we let G[σ, v0 ] denote the graph obtained by restricting G to vertices reached by σ from v0 and to the moves prescribed by σ. Formally, the set of vertices and edges is V [σ, v0 ] = {v ∈ V : there exists a path from v0 to v consistent with σ} , E[σ, v0 ] = {(v, , v ) ∈ E : v ∈ VAdam or (v ∈ VEve and σ(v) = (v, , v ))} ∩ V [σ, v0 ] × C × V [σ, v0 ]. Fact 1. Let Ω be a prefix independent objective, G be a game, v0 be an initial vertex, and σ a positional strategy. Then the strategy σ is winning from v0 if and only if the graph G[σ, v0 ] satisfies Ω. Safety Games. The safety objective Safe on two colours C = {ε, Lose} is given by Safe = {εω }. In words, an infinite path is winning if it avoids the letter Lose. The following lemma is folklore. Lemma 1. Given a safety game with n vertices and m edges, there exists an algorithm running in time and space O(n + m) which computes the winning region of Eve.
6
N. Fijalkow
Consider a game G with objective Ω of size n and an (n, Ω)-universal graph U, we construct a safety game G U as follows. The arena for the game G U is given by the following set of vertices and edges VEve = VEve × VU V × VU × C, VAdam = VAdam × VU , = {((v, s), ε, (v , s, )) : (v, , v ) ∈ E} E ∪ {((v , s, ), ε, (v , s )) : (s, , s ) ∈ EU } ∪ {((v, s, ), Lose, (v, s, )) : v ∈ VG , s ∈ VU , ∈ C} . In Words: the game G U simulates G. From (v, s), the following two steps occur. First, the player who controls v picks an edge (v, , v ) ∈ E as he would in G which leads to (v , s, ), and second, Eve chooses which edge (s, , s ) to follow in the universal graph U which leads to (v , s ). If Eve is unable to play in U (because there are no outgoing edges of the form (s, , s )), she is forced to choose ((v, s, ), Lose, (v, s, )) and lose the safety game. Theorem 1. Let Ω be a prefix independent positionally determined objective. Let G be a game of size n with objective Ω and U an (n, Ω)-universal graph. Let v0 be an initial vertex. Then Eve has a winning strategy in G from v0 if and only if there exists a vertex s0 in U such that she has a winning strategy in G U from (v0 , s0 ). This theorem can be used to reduce games with objective Ω to safety games, yielding an algorithm whose complexity is proportional to the number of edges of U. Indeed, recall that the complexity of solving a safety game is proportional to the number of vertices plus edges. The number of vertices of G U is O(n·nU |C|), and the number of edges is O(n · mU ) so the overall complexity is O(n · (nU |C| + mU )). Proof. Let us assume that Eve has a winning strategy σ in G from v0 , which can be chosen to be positional, Ω being positionally determined. We consider the graph G[σ, v0 ]: it has size at most n and since σ is winning from v0 it satisfies Ω. Because U is an (n, Ω)-universal graph there exists a homomorphism φ from G[σ, v0 ] to U. We construct a winning strategy in G U from (v0 , φ(v0 )) by playing as in σ in the first component and following the homomorphism φ on the second component. Conversely, let σ be a winning strategy from (v0 , s0 ) in G U. Consider the strategy σ in G which mimics σ by keeping an additional component in U which is initially set to s0 and updated following σ. The strategy σ is winning since all infinite paths in U satisfy Ω. Good-for-Small-Games Automata The automata we consider are non-deterministic safety automata over infinite words on the alphabet C, where safety means that all states are accepting: a word is rejected if there exist no run for it. Formally, an automaton is given by a
The Theory of Universal Graphs for Games: Past and Future
7
set of states Q, an initial state q0 ∈ Q, and a transition relation Δ ⊆ Q × C × Q. A run is a finite or infinite sequence of consecutive transitions starting with the initial state q0 , where consecutive means that the third component of a triple in the sequence matches the first component of the next triple. We say that it is a run over a word if the projection of the run on the colours matches the word. An infinite word is accepted if it admits an infinite run. We let L(A) denote the set of infinite words accepted by an automaton A. We introduce the notion of a guiding strategy for resolving non-determinism. A guiding strategy is a function σ : C + → Δ: given a finite word ρ ∈ C + it picks a transition. More specifically for an infinite word ρ = c0 c1 · · · ∈ C ω the guiding strategy chooses the following infinite sequence of transitions: σ(c0 )σ(c0 c1 )σ(c0 c1 c2 ) . . . . It may fail to be a run over ρ even if there exists one: the choice of transitions may require knowing the whole word ρ in advance, and the guiding strategy has to pick the transition only based on the current prefix. We say that the automaton guided by σ accepts a word ρ if the guiding strategy indeed induces an infinite run. In the following we say that a path in a graph is accepted or rejected by an automaton; this is an abuse of language since what the automaton reads is only the sequence of colours of the corresponding path. Definition 4. An automaton is (n, Ω)-good-for-small-games if the two following properties hold: 1. there exists a guiding strategy σ such that for all graphs G of size n satisfying Ω, the automaton guided by σ accepts all paths in the graph G, 2. the automaton rejects all paths not satisfying Ω. In a very similar way as universal graphs, good-for-small-games automata can be used to construct reductions to safety games. Consider a game G with objective Ω of size n and an (n, Ω)-good-for-smallgames A, we construct a safety game G A as follows. We let Q denote the set of states of A and Δ the transition relation. The arena for the game G A is given by the following set of vertices and edges VEve = VEve × Q VEve × Q × C, VAdam = VAdam × Q, = {((v, q), ε, (v , q, ) : (v, , v ) ∈ E} E ∪ {((v , q, ), ε, (v , q )) : (q, , q ) ∈ Δ} ∪ {((v, q, ), Lose, (v, q, )) : v ∈ V, q ∈ Q, ∈ C} . In words: the game G A simulates G. From (v, q), the following two steps occur. First, the player who controls v picks an edge (v, , v ) ∈ E as he would in G which leads to (v , q, ), and second, Eve chooses which transition (q, , q ) to follow in the automaton A which leads to (v , q ). If Eve is unable to choose a transition in A (because there are no transitions of the form (q, , q )), she is forced to choose ((v, q, ), Lose, (v, q, )) and lose the safety game.
8
N. Fijalkow
Theorem 2. Let Ω be a prefix independent positionally determined objective. Let G be a game of size n with objective Ω and A an (n, Ω)-good-for-smallgames automaton. Let v0 be an initial vertex. Then Eve has a winning strategy in G from v0 if and only if she has a winning strategy in G A from (v0 , q0 ). As for universal graphs, this theorem can be used to reduce games with objective Ω to safety games, and the complexity is dominated by the number of transitions of A. Separating Automata Definition 5. An (n, Ω)-separating automaton is a deterministic (n, Ω)-goodfor-small-games automaton. In the deterministic case the first property reads: for all graphs G of size n satisfying Ω, the automaton accepts all paths in the graph G. The advantage of using deterministic automata is that the resulting safety game is smaller, and indeed the complexity of the algorithm above is proportional to the number of states of A, more specifically it is O(n · |Q|) where Q is the set of states of A. The Equivalence Result Theorem 3 ([4]). Let Ω be a prefix independent positionally determined objective and n a natural number. – For any (n, Ω)-separating automaton one can construct an (n, Ω)-good-forsmall-games automaton of the same size. – For any (n, Ω)-good-for-small-games automaton one can construct an (n, Ω)universal graph of the same size. – For any (n, Ω)-universal graph one can construct an (n, Ω)-separating automaton of the same size. The first item is trivial because by definition a separating automaton is a goodfor-small-games automaton. The other two constructions are non-trivial, and in particular it is not true that a good-for-small-games automaton directly yields a universal graph as is, and the same holds conversely. Both constructions are based on a saturation method which constructs a linear order either on the set of states or on the set of vertices. As a corollary of this equivalence result we can refine the reduction to safety games and present it in a value iteration style which has the same time complexity but also achieves a very small space complexity. We refer to [8] for more details.
The Theory of Universal Graphs for Games: Past and Future
3
9
Perspectives
The notion of universal graphs is a new tool for constructing algorithms for several classes of games by reducing them to safety games. For each objective, the question becomes whether there exist small universal graphs. A benefit of this approach is that this becomes a purely combinatorial question about graphs which does not involve reasoning about games. Let us briefly discuss some of the results obtained so far, and gives some perspectives for future research. The first example is parity objectives. In that case universal graphs have been combinatorially well understood: they are equivalently described as universal trees and their size is quasipolynomial with almost matching upper and lower bounds [7]. A more systematic study of ω-regular objectives reveals that similar ideas lead to small universal graphs for objectives such as Rabin or disjunctions of parity objectives. The second example is mean payoff objectives, which extend parity objectives. Upper and lower bounds have been obtained recently for this class of objectives, yielding new algorithms with (slightly) improved complexity [8]. In all examples mentioned above, the algorithms obtained using the universal graph technology match the best known complexity for solving these games. Universal Graphs for Families of Graphs One can easily extend the theory of universal graphs to consider games over a given family of graphs. This motivates the question whether there exist small universal graphs for different families of graphs such as graphs of bounded tree or clique width. Combining Objectives: Compositional Constructions of Universal Graphs An appealing question is whether universal graphs can be constructed compositionally. Indeed, it is often useful to consider objectives which are described as boolean combinations of objectives. Can we give generic constructions using universal graphs for basic objectives to build universal graphs for more complicated objectives? As examples, let us mention unions of mean payoff objectives (with both infimum limit or supremum limit semantics) and disjunctions of parity and mean payoff objectives.
References 1. Boja´ nczyk, M., Czerwi´ nski, W.: An automata toolbox, February 2018. https:// www.mimuw.edu.pl/∼bojan/papers/toolbox-reduced-feb6.pdf 2. Calude, C.S., Jain, S., Khoussainov, B., Li, W., Stephan, F.: Deciding parity games in quasipolynomial time. In: STOC, pp. 252–263 (2017). https://doi.org/10.1145/ 3055399.3055409 3. Colcombet, T., Fijalkow, N.: Parity games and universal graphs. CoRR abs/1810.05106 (2018)
10
N. Fijalkow
4. Colcombet, T., Fijalkow, N.: Universal graphs and good for games automata: new tools for infinite duration games. In: FoSSaCS, pp. 1–26 (2019). https://doi.org/ 10.1007/978-3-030-17127-8 1 5. Czerwi´ nski, W., Daviaud, L., Fijalkow, N., Jurdzi´ nski, M., Lazi´c, R., Parys, P.: Universal trees grow inside separating automata: quasi-polynomial lower bounds for parity games. CoRR abs/1807.10546 (2018) 6. Fearnley, J., Jain, S., Schewe, S., Stephan, F., Wojtczak, D.: An ordered approach to solving parity games in quasi polynomial time and quasi linear space. In: SPIN, pp. 112–121 (2017) 7. Fijalkow, N.: An optimal value iteration algorithm for parity games. CoRR abs/1801.09618 (2018) 8. Fijalkow, N., Gawrychowski, P., Ohlmann, P.: The complexity of mean payoff games using universal graphs. CoRR abs/1812.07072 (2018) 9. Jurdzi´ nski, M., Lazi´c, R.: Succinct progress measures for solving parity games. In: LICS, pp. 1–9 (2017) 10. Jurdzi´ nski, M., Morvan, R.: A universal attractor decomposition algorithm for parity games. CoRR abs/2001.04333 (2020) 11. Lehtinen, K.: A modal-µ perspective on solving parity games in quasi-polynomial time. In: LICS, pp. 639–648 (2018) 12. Lehtinen, K., Schewe, S., Wojtczak, D.: Improving the complexity of Parys’ recursive algorithm. CoRR abs/1904.11810 (2019) 13. Parys, P.: Parity games: Zielonka’s algorithm in quasi-polynomial time. In: MFCS, pp. 10:1–10:13 (2019). https://doi.org/10.4230/LIPIcs.MFCS.2019.10 14. Parys, P.: Parity games: another view on Lehtinen’s algorithm. In: CSL, pp. 32:1– 32:15 (2020). https://doi.org/10.4230/LIPIcs.CSL.2020.32 15. Zielonka, W.: Infinite games on finitely coloured graphs with applications to automata on infinite trees. Theor. Comput. Sci. 200(1–2), 135–183 (1998). https:// doi.org/10.1016/S0304-3975(98)00009-7
Approximate Coalgebra Homomorphisms and Approximate Solutions Jiˇr´ı Ad´ amek1,2(B) 1
2
Faculty of Electrical Engineering, Technical University Prague, Prague, Czech Republic Institute of Theoretical Computer Science, Technical University Braunschweig, Braunschweig, Germany [email protected] Abstract. Terminal coalgebras νF of finitary endofunctors F on categories called strongly lfp are proved to carry a canonical ultrametric on their underlying sets. The subspace formed by the initial algebra μF has the property that for every coalgebra A we obtain its unique homomorphism into νF as a limit of a Cauchy sequence of morphisms into μF called approximate homomorphisms. The concept of a strongly lfp category includes categories of sets, posets, vector spaces, boolean algebras, and many others. For the free completely iterative algebra Ψ B on a pointed object B we analogously present a canonical ultrametric on its underlying set. The subspace formed by the free algebra ΦB on B has the property that for every recursive equation in Ψ B we obtain the unique solution as a limit of a Cauchy sequence of morphisms into ΦB called approximate solutions. A completely analogous result holds for the free iterative algebra RB on B.
1
Introduction
A number of important types of state-based systems can be expressed as coalgebras for finitary endofunctors F of Set. That is, endofunctors such that every element of F X lies in the image of F m for some finite subset m : M → X. Examples include deterministic automata, dynamic systems and finitely branching LTS. The terminal coalgebra νF carries a canonical ultrametric, see [2] or [4]. Recall from [12] that νF represents the set of all possible state behaviors, and given a system as a coalgebra A for F , the unique homomorphism h : A → νF assigns to every state its behavior. We are going to prove that whenever F is grounded, i.e., F ∅ = ∅, then the initial algebra μF forms a dense subspace of the ultrametric space νF . And the coalgebra structure of νF is determined as the unique continuous extension of the inverted algebra structure of μF . Moreover, for every coalgebra A we present a canonical Cauchy sequence of morphisms J. Ad´ amek—Supported by the Grant Agency of the Czech Republic under the grant 19-00902S. c IFIP International Federation for Information Processing 2020 Published by Springer Nature Switzerland AG 2020 D. Petri¸san and J. Rot (Eds.): CMCS 2020, LNCS 12094, pp. 11–31, 2020. https://doi.org/10.1007/978-3-030-57201-3_2
12
J. Ad´ amek
hk : A → μF (k < ω) converging to h in the power space (νF )A . We thus see that given state-based systems represented by a finitary set functor, (infinite) behaviour of states has canonical finite approximations, see Proposition 18. All this generalizes from Set to every strongly locally finitely presentable category. These categories, introduced in [3] and recalled in Sect. 2 below, include vector spaces, posets and all locally finite varieties (e.g., boolean algebras or SMod, semimodules over finite semirings S). Every such category has a natural forgetful functor U into Set. And the underlying set U (νF ) of the terminal coalgebra νF carries a canonical ultrametric for every finitary endofunctor preserving monomorphisms. For example, consider the endofunctor of S-Mod representing weighted automata, F X = S × X Σ (where Σ is a finite set of inputs). It has ∗ the terminal coalgebra νF = SΣ , the semimodule of weighted languages. The ultrametric assigns to distinct languages L1 , L2 the distance 2−k for the least k such that for some word w ∈ Σ ∗ of length k the two weights are different, see Example 13(4). If F is grounded, we again define, for every coalgebra A, a canonical sequence of morphisms hk : A → μF converging to the unique coalgebra homomorphism h : A → νF in the power of the above ultrametric space. A concrete example: for every object Σ consider the endofunctor F X = X ×Σ +1 where 1 is the terminal object. The underlying set of νF is the set of all finite and infinite words over U Σ. And in case U preserves finite coproducts, the underlying set of μF is the set of all finite words. The distance of two distinct words is 2−k for the least k such that their prefixes of lengths at most k are distinct. Recursive equations in the terminal coalgebra (considered as an algebra) have a unique solution. That is, the algebra νF is completely iterative, see [11]. We define, analogously, a canonical Cauchy sequence of approximate solutions into the initial algebra. And we prove that this sequence converges to the unique solution in νF . We recall from Milius [11] the concept of an algebra A for F being completely iterative in Sect. 4. Shortly, this means that for every object X (of recursion variables) and every equation morphism e : X → F X + A there exists a unique solution, which is a morphism e† : X → A satisfying a simple recursive condition. Milius proved that the free completely iterative algebra Ψ B on an object B is precisely the terminal coalgebra for the endofunctor F (−) + B: Ψ B = νX.F X + B. The functor F (−) + B is grounded whenever B has a global element. Thus we have the canonical ultrametric on the underlying set of Ψ B. Moreover, the free algebra ΦB for F on B is well-known to be the initial algebra for the functor F (−) + B: ΦB = μX.F X + B. We are going to present, for every recursive equation e in Ψ B, a canonical Cauchy sequence of ‘approximate solutions’ e‡k : X → ΦB in the free algebra which converges to the solution e† in the power space of νF , see Theorem 31. Finally, the same result is proved for the free iterative algebra RB: solutions are limits of sequences of approximate solutions. Recall that ‘iterative’ is the
Approximate Coalgebra Homomorphisms
13
weakening of ‘completely iterative’ by restricting the objects X (of recursion variables) to finitely presentable ones. Related Work. For finitary set functors preserving ω op -limits Barr proved that the canonical ultrametric of νF is complete, and in case F is grounded this is the Cauchy completion of the subspace μF , see [10]. This was generalized to strongly lfp categories in [3]. The fact that preservation of ω op -limits is not needed is new, and so is the fact that the algebra structure of μF determines the coalgebra structure of νF . The idea of approximate homomorphisms and approximate solutions has been used for set functors in the recent paper [5] where instead of ultrametrics complete partial orders were applied. It turns out that the present approach is much more natural and needs less technical preparations.
2
Strongly lfp Categories
Throughout the paper we work with a finitary endofunctor F of a locally finitely presentable (shortly lfp) category. In this preliminary section we recall basic properties of lfp categories and introduce strongly lfp categories. Recall that an object A of a category K is called finitely presentable if its hom-functor K(A, −) preserves filtered colimits. And a category K is called lfp, or locally finitely presentable , if it has colimits and a set of finitely presentable objects whose closure under filtered colimits is all of K. Every lfp category is complete ([8], Remark 1.56). Recall further that an endofunctor is called finitary if it preserves filtered colimits. Example 1 (See [6]). (1) Set is lfp. An endofunctor F is finitary iff every element of F X lies in the image of F m for some finite subset m : M → X. (2) K-Vec, the category of vector space over a field K, is lfp. An endofunctor F is finitary iff every element of F X lies in the image of F m for some finitedimensional subspace m : M → X. Remark 1. (1) In an lfp category, given a filtered colimit at : At → A (t ∈ T ) of a diagram D with all connecting morphisms monic, it follows that (a) each at is monic, and (b) for every cocone ft : At → X of monomorphisms for D the unique factorization morphism f : A → X is monic. See [8], Proposition 1.62. (2) In any category the corresponding statement is true for limits at : A → At , t < λ, of λ-chains of monomorphisms for all limit ordinals λ: (a) each at is monic, and (b) for every cone ft : X → At of monomorphisms the factorization morphism f : X → A is monic. Indeed, (a) follows from the fact that the cone of all as with t ≤ s is collectively monic. And (b) is trivial: since a0 · f is monic, so is f .
14
J. Ad´ amek
(3) Every lfp category has (strong epi, mono)-factorizations of morphisms, see [8], Proposition 1.61. An epimorphism e : A → B is called strong if for every monomorphism m : U → V the following diagonal fill-in property holds: given a commutative square v · e = m · u (for some u : A → U and v : B → V ) there exists a unique ‘diagonal’ morphism d : B → U with u = d · e and v = m · d. (4) An object A of an lfp category K is called finitely generated if it is a strong quotient of a finitely presentable object. Equivalently: K(A, −) preserves filtered colimits of monomorphisms, see [8], Proposition 1.69. (5) The initial object of K is denoted by 0. We call it simple if it has no proper strong quotients or, equivalently, if the unique morphism from it to any given object is monic. This holds e.g. in Set and K-Vec. In the lfp category Alg1,0 of unary algebras with a constant the intial object is the algebra of natural numbers with the successor operation, and it is not simple. Notation 2. (1) We denote by F n 0 (n < ω) the initial-algebra ω-chain F 2!
F!
!
0 −−→ F 0 −−−→ F 2 0 −−−→ · · · with connecting maps wn,k : F n 0 → F k 0, where w0,1 =! is unique and wn+1,k+1 = F wn,k . The colimit cocone wn,ω : F n 0 → F ω 0 (n < ω) of this chain yields, since F is finitary, a colimit cocone F wn,ω : F n+1 0 → F (F ω 0). Consequently, we obtain a unique morphism ϕ : F (F ω 0) → F ω 0
with
ϕ · F wn,ω = wn+1,ω
(n < ω).
Then F ω 0 is the initial algebra with the algebra structure ϕ, see the second proposition of [1]. (2) We denote by F i 1 (i ∈ Ord) the terminal-coalgebra chain indexed by all ordinals, dually ordered: !
F!
F 2!
F i1 ←
1 ←−− F 1 ←−−− F F 1 ←−−− · · ·
···
It is given by transfinite recursion: F 0 1 = 1, F i+1 1 = F (F i 1) and, for limit ordinals i, F i 1 = limj ω the connecting maps vjω are monic by transfinite induction. (1) vω+1,ω : F (F ω 1) → F ω 1 is monic. To prove this, use the fact that K is lfp and choose a filtered diagram D of finitely presentable objects At (t ∈ T ) with a colimit cocone at : At → F ω 1. Due to Remark 1(3) we have a factorization at = bt ·et for a strong epimorphism et : At → Bt and a monomorphism bt : Bt → F ω 1. ¯ whose connecting morphisms The objects Bt (t ∈ T ) form a filtered diagram D are derived via diagonal fill-in: given a connecting morphism at,t : At → At
Approximate Coalgebra Homomorphisms
17
¯ by means of the in D, we obtain a connecting morphism bt,t : Bt → Bt of D following diagonal: et
/ / Bt at,t b bt At t,t et / F ω1 Bt / At
bt
¯ a filtered diagram (since D is) and the morphisms It is easy to verify that D ω ¯ Moreover, each connecting bt : Bt → F 1 (t ∈ T ) form the colimit cocone of D. morphism bt,t is monic (since its composite with bt is). Hence, our assumption that F is finitary and preserves monomorphisms implies that we have a filtered ¯ of all F Bt (t ∈ T ) whose connecting morphisms are monic and diagram F D whose colimit is (F bt )t∈T . Using Remark 1(1) it is sufficient, for proving that vω+1,ω is monic, to verify that (for every t ∈ T ) vω+1,ω · F bt : F Bt F ω 1 is monic. Since our category is strongly lfp and Bt is finitely generated (being a strong quotient of the finitely presentable object At ), it follows that for the monomorphism bt there exists k < ω with vω,k ·bt monic. Thus, the morphism F vω,k ·F bt = vω,k+1 · vω+1,ω · F bt = vω+1,ω · F bt is monic, as desired. (2) Each vi,ω for i > ω is monic. This is easily seen by transfinite induction on i. The case i = ω+1 has just been proved. If vi,ω is monic, so is F vi,ω = vi+1,ω+1 , therefore, the composite vi+1,ω = vω+1,ω · vi+1,ω+1 is also monic. And given a limit ordinal i > ω, then each connecting map vk,j , where ω ≤ j ≤ k < i is monic, since vj,ω · vk,j = vk,ω is monic, see Remark 1(2). Now F i 1 is a limit of the chain F j 1 for all ω ≤ j < i whose connecting morphisms are monic. This implies that the limit cone consists of monomorphisms. In particular, vi,ω is monic, as required. Lemma 9. For i in Theorem 8 the following squares commute: μF O
m
wk,ω
F k0
/ νF = F i 1
F k!
(k < ω)
vi,k
/ F k1
Proof. We proceed by induction, the case k = 0 is clear. Recall that m is an algebra homomorphism: m · ϕ = τ −1 · F m = vi+1,i · F m.
(2.1)
18
J. Ad´ amek
By definition of ϕ we have ϕ · F wk,ω = wk+1,ω (Notation 2(1)), thus, m · wk+1,ω = vi+1,i · F m · F wk,ω .
(2.2)
Assuming the above square commutes for k, we prove it for k + 1: F k+1 ! = F (vi,k · m · wk,ω ) = vi+1,k+1 · F m · F wk,ω = vi,k+1 · vi+1,i · F m · F wk,ω
induction hypothesis vi+1,k+1 = F vi,k
= vi,k+1 · m · wk+1,ω
by (2.2)
Since vi,k is the composite vω,k · vi,ω , we obtain the following corollary of Theorem 8: Corollary 10. If the initial object is simple, then the following monomorphism vi,ω
m
u ¯ ≡ μF −−→ F i 1 −−−−→ F ω 1 makes the squares below commutative μF O
u ¯
wk,ω
F k0
3
/ F k1
F k!
(k < ω)
vi,k
/ F k1
Approximate Homomorphisms
Assumption 11. Throughout the rest of the paper K denotes a strongly lfp category with a simple initial object, and F a finitary endofunctor preserving monomorphisms. Recall that the initial algebra μF is a canonical subobject of the terminal coalgebra νF (Proposition 4). For the forgetful functor of Notation 7 we present a canonical ultrametric on the underlying set U (νF ) of the terminal coalgebra. Then U (μF ) is a subspace of that ultrametric space, since U preserves monomorphisms. We prove that those two ultrametric spaces have the same Cauchy completion. Moreover, given an arbitrary coalgebra α : A → F A, the unique homomorphism into νF is determined by a Cauchy sequence of (approximate) morphisms from A to μF . Remark 4. (1) Recall that a metric d is called an ultrametric if the triangle inequality can be strengthened to d(x, z) ≤ max d(x, y), d(y, z) . We denote by UMet the category of ultrametric space with distances at most 1 and nonexpanding functions. That is, functions f : (X, d) → (X , d ) with d(x, y) ≥ d f (x), f (y) for all x, y ∈ X.
Approximate Coalgebra Homomorphisms
19
(2) The category UMet has products w.r.t. the supremum metric. In particular, for every set M the power (X, d)M is the ultrametric space of all functions f : M → X with distances given by d(f, f ) = sup d(f (m), f (m) for f, f ∈ X M . m∈M
(3) Completeness of an ultrametric space means that every Cauchy sequence has a limit. Every ω op -sequence of morphisms ak+1,k : Ak+1 → Ak (k < ω) in Set carries a canonical complete ultrametric on its limit A = lim Ak in Set k 2, we shall show that it is already (n − 1)−permutable. Let p1 , ..., pn−1 be the terms from the Mal’cev condition for n-permutability, with the Eqs. 1. According to Lemma 9, we find some term s(x, y, z, u) such that s(x, y, z, z) ≈ p1 (x, y, z) and s(x, x, y, z) ≈ p2 (x, y, z). We now define a new term m(x, y, z) := s(x, y, y, z), and calculate: m(x, y, y) ≈ s(x, y, y, y) ≈ p1 (x, y, y) ≈ x as well as m(x, x, y) ≈ s(x, x, x, y) ≈ p2 (x, x, y) ≈ p3 (x, y, y). From this we obtain a shorter chain of equations, discarding p1 and p2 and replacing them by m: x ≈ m(x, y, y) m(x, x, y) ≈ p3 (x, y, y) pi (x, x, y) ≈ pi+1 (x, y, y) for all 2 < i < n − 1 pn−1 (x, x, y) ≈ y.
7
Conclusion and Further Work
We have shown that variety functors FΣ preserve preimages if and only if Σ Σ where Σ is the derivative of the set of equations Σ. For each Mal’cev variety, FΣ weakly preserves kernel pairs. For the other direction, if FΣ weakly preserves kernel pairs, then every equation of the shape p(x, x, y) ≈ q(x, y, y) ensures the existence of a term s(x, y, z, u), such that p(x, y, z) ≈ s(x, y, z, z) and q(x, y, y) ≈ s(x, x, y, z). This intriguing algebraic condition appears to be new and certainly deserves further study from a universal algebraic perspective. Adding it to Freese’s characterization of n-permutable varieties by means of his “order-derivative”, allows to distinguish between the cases n = 2 (Mal’cev varieties) and n > 2, which cannot be achieved using his order derivative, alone. It would be desirable to see if this consequence of weak kernel preservation can be turned into a concise if-and-only-if statement.
Free-Algebra Functors from a Coalgebraic Perspective
67
References 1. Barr, M.: Terminal coalgebras in well-founded set theory. Theoret. Comput. Sci. 114(2), 299–315 (1993). https://doi.org/10.1016/0304-3975(93)90076-6 2. Day, A.: A characterization of modularity for congruence lattices of algebras. Can. Math. Bull. 12(2), 167–173 (1969). https://doi.org/10.4153/CMB-1969-016-6 ´ An easy test for congruence modular3. Dent, T., Kearnes, K.A., Szendrei, A.: ity. Algebra Univers. 67(4), 375–392 (2012). https://doi.org/10.1007/s00012-0120186-z 4. Freese, R.: Equations implying congruence n-permutability and semidistributivity. Algebra Univers. 70(4), 347–357 (2013). https://doi.org/10.1007/s00012-0130256-x 5. Gumm, H.P.: Congruence modularity is permutability composed with distributivity. Arch. Math 36, 569–576 (1981). https://doi.org/10.1007/BF01223741 6. Gumm, H.P.: Elements of the general theory of coalgebras. In: LUATCS 99. Rand Afrikaans University, Johannesburg, South Africa (1999). https://www. mathematik.uni-marburg.de/∼gumm/Papers/Luatcs.pdf 7. Fiadeiro, J.L., Harman, N., Roggenbach, M., Rutten, J. (eds.): CALCO 2005. LNCS, vol. 3629. Springer, Heidelberg (2005). https://doi.org/10.1007/11548133 8. Gumm, H.P.: Connected monads weakly preserve products. Algebra Univers. 81(2), 18 (2020). https://doi.org/10.1007/s00012-020-00654-w 9. Gumm, H.P., Schr¨ oder, T.: Types and coalgebraic structure. Algebra Univers. 53, 229–252 (2005). https://doi.org/10.1007/s00012-005-1888-2 10. Hagemann, J., Mitschke, A.: On n-permutable congruences. Algebra Univers. 3(1), 8–12 (1973). https://doi.org/10.1007/BF02945100 11. Henkel, C.: Klassifikation Coalgebraischer Typfunktoren. Diplomarbeit, Universit¨ at Marburg (2010) 12. J´ onsson, B.: Algebras whose congruence lattices are distributive. Math. Scand. 21, 110–121 (1967). https://doi.org/10.7146/math.scand.a-10850 13. Mal’cev, A.I.: On the general theory of algebraic systems. Matematicheskii Sbornik (N.S.) 35(77), 3–20 (1954). (in Russian) 14. Mal’tsev, A.I.: On the general theory of algebraic systems. Am. Math. Soc. 27, 125–142 (1963). Transl., Ser. 2 15. Rutten, J.: Universal coalgebra: a theory of systems. Theoret. Comput. Sci. 249, 3–80 (2000). https://doi.org/10.1016/S0304-3975(00)00056-6 16. Rutten, J.: Universal coalgebra: a theory of systems. Techical report, CWI, Amsterdam (1996). CS-R9652 17. Selinger, P.: Functionality, polymorphism, and concurrency: a mathematical investigation of programming paradigms. Technical report, IRCS 97–17, University of Pennsylvania (1997). https://repository.upenn.edu/ircs reports/88/ 18. Trnkov´ a, V.: On descriptive classification of set-functors. I. Comm. Math. Univ. Carolinae 12, 143–174 (1971). https://eudml.org/doc/16419
Learning Automata with Side-Effects Gerco van Heerdt1(B) , Matteo Sammartino1,2 , and Alexandra Silva1
2
1 University College London, London, UK {gerco.heerdt,alexandra.silva}@ucl.ac.uk Royal Holloway, University of London, London, UK [email protected]
Abstract. Automata learning has been successfully applied in the verification of hardware and software. The size of the automaton model learned is a bottleneck for scalability, and hence optimizations that enable learning of compact representations are important. This paper exploits monads, both as a mathematical structure and a programming construct, to design and prove correct a wide class of such optimizations. Monads enable the development of a new learning algorithm and correctness proofs, building upon a general framework for automata learning based on category theory. The new algorithm is parametric on a monad, which provides a rich algebraic structure to capture non-determinism and other side-effects. We show that this allows us to uniformly capture existing algorithms, develop new ones, and add optimizations. Keywords: Automata
1
· Learning · Side-effects · Monads · Algebras
Introduction
The increasing complexity of software and hardware systems calls for new scalable methods to design, verify, and continuously improve systems. Black-box inference methods aim at building models of running systems by observing their response to certain queries. This reverse engineering process is very amenable for automation and allows for fine-tuning the precision of the model depending on the properties of interest, which is important for scalability. One of the most successful instances of black-box inference is automata learning, which has been used in various verification tasks, ranging from finding bugs in implementations of network protocols [15] to rejuvenating legacy software [29]. Vaandrager [30] has written a comprehensive overview of the widespread use of automata learning in verification. A limitation in automata learning is that the models of real systems can become too large to be handled by tools. This demands compositional methods and techniques that enable compact representation of behaviors. This work was partially supported by the ERC Starting Grant ProFoundNet (grant code 679127), a Leverhulme Prize (PLP–2016–129), and the EPSRC Standard Grant CLeVer (EP/S028641/1). c IFIP International Federation for Information Processing 2020 Published by Springer Nature Switzerland AG 2020 D. Petri¸san and J. Rot (Eds.): CMCS 2020, LNCS 12094, pp. 68–89, 2020. https://doi.org/10.1007/978-3-030-57201-3_5
Learning Automata with Side-Effects
69
In this paper, we show how monads can be used to add optimizations to learning algorithms in order to obtain compact representations. We will use as playground for our approach the well known L algorithm [2], which learns a minimal deterministic finite automaton (DFA) accepting a regular language by interacting with a teacher, i.e., an oracle that can reply to specific queries about the target language. Monads allow us to take an abstract approach, in which category theory is used to devise an optimized learning algorithm and a generic correctness proof for a broad class of compact models. The inspiration for this work is quite concrete: it is a well-known fact that non-deterministic finite automata (NFAs) can be much smaller than deterministic ones for a regular language. The subtle point is that given a regular language, there is a canonical deterministic automaton accepting it—the minimal one—but there might be many “minimal” non-deterministic automata accepting the same language. This raises a challenge for learning algorithms: which non-deterministic automaton should the algorithm learn? To overcome this, Bollig et al. [11] developed a version of Angluin’s L algorithm, which they called NL , in which they use a particular class of NFAs, Residual Finite State Automata (RFSAs), which do admit minimal canonical representatives. Though NL indeed is a first step in incorporating a more compact representation of regular languages, there are several questions that remain to be addressed. We tackle them in this paper. DFAs and NFAs are formally connected by the subset construction. Underpinning this construction is the rich algebraic structure of languages and of the state space of the DFA obtained by determinizing an NFA. The state space of a determinized DFA—consisting of subsets of the state space of the original NFA—has a join-semilattice structure. Moreover, this structure is preserved in language acceptance: if there are subsets U and V , then the language of U ∪ V is the union of the languages of the first two. Formally, the function that assigns to each state its language is a join-semilattice map, since languages themselves are just sets of words and have a lattice structure. And languages are even richer: they have the structure of complete atomic Boolean algebras. This leads to several questions: Can we exploit this structure and have even more compact representations? What if we slightly change the setting and look at weighted languages over a semiring, which have the structure of a semimodule (or vector space, if the semiring is a field)? The latter question is strongly motivated by the widespread use of weighted languages and corresponding weighted finite automata (WFAs) in verification, from the formal verification of quantitative properties [13,17,25], to probabilistic model-checking [5], to the verification of on-line algorithms [1]. Our key insight is that the algebraic structures mentioned above are in fact algebras for a monad T . In the case of join-semilattices this is the powerset monad, and in the case of vector spaces it is the free vector space monad. These monads can be used to define a notion of T -automaton, with states having the structure of an algebra for the monad T , which generalizes non-determinism as a side-effect. From T -automata we can derive a compact, equivalent version by
70
G. van Heerdt et al.
taking as states a set of generators and transferring the algebraic structure of the original state space to the transition structure of the automaton. This general perspective enables us to generalize L to a new algorithm LT , which learns compact automata featuring non-determinism and other side-effects captured by a monad. Moreover, LT incorporates further optimizations arising from the monadic representation, which lead to more scalable algorithms. We start by giving an overview of our approach, which also states our main contributions in greater detail and ends with a road map of the rest of the paper.
2
Overview and Contributions
In this section, we explain the original L algorithm and discuss the challenges in adapting the algorithm to learn automata with side-effects, illustrating them through a concrete example—NFAs. We then highlight our main contributions. L Algorithm. This algorithm learns the minimal DFA accepting a language L ⊆ A∗ over a finite alphabet A. It assumes the existence of a minimally adequate teacher, which is an oracle that can answer two types of queries: 1. Membership queries: given a word w ∈ A∗ , does w belong to L? and 2. Equivalence queries: given a hypothesis DFA H, does H accept L? If not, the teacher returns a counterexample, i.e., a word incorrectly classified by H. The algorithm incrementally builds an observation table made of two parts: a top part, with rows ranging over a finite set S ⊆ A∗ ; and a bottom part, with rows ranging over S · A (· is pointwise concatenation). Columns range over a finite E ⊆ A∗ . For each u ∈ S ∪ S · A and v ∈ E, the corresponding cell in the table contains 1 if and only if uv ∈ L. Intuitively, each row u contains enough information to fully identify the Myhill–Nerode equivalence class of u with respect to an approximation of the target language—rows with the same content are considered members of the same equivalence class. Cells are filled in using membership queries. As an example, and to set notation, consider the table below over A = {a, b}. It shows that L contains the word aa and does not contain the words ε (the empty word), a, b, ba, aaa, and baa. E ε a aa S ε 0 0 1 a 0 1 0 S·A b 0 0 0
rowt : S → 2E E A
rowb : S → (2 )
rowt (u)(v) = 1 ⇐⇒ uv ∈ L rowb (u)(a)(v) = 1 ⇐⇒ uav ∈ L
We use functions rowt and rowb to describe the top and bottom parts of the table, respectively. Notice that S and S · A may intersect. For conciseness, when tables are depicted, elements in the intersection are only shown in the top part. A key idea of the algorithm is to construct a hypothesis DFA from the different rows in the table. The construction is the same as that of the minimal DFA from the Myhill–Nerode equivalence, and exploits the correspondence between table rows and Myhill–Nerode equivalence classes. The state space of the hypothesis DFA is given by the set H = {rowt (s) | s ∈ S}. Note that there may
Learning Automata with Side-Effects 1 2 3 4 5 6 7 8 9 10 11 12 13 14
71
S, E ← {ε} repeat while the table is not closed or not consistent if the table is not closed find t ∈ S, a ∈ A such that rowb (t)(a) = rowt (s) for all s ∈ S S ← S ∪ {ta} if the table is not consistent find s1 , s2 ∈ S, a ∈ A, and e ∈ E such that rowt (s1 ) = rowt (s2 ) and rowb (s1 )(a)(e) = rowb (s2 )(a)(e) E ← E ∪ {ae} Construct the hypothesis H and submit it to the teacher if the teacher replies no, with a counterexample z S ← S ∪ prefixes(z) until the teacher replies yes return H
Fig. 1. L algorithm.
be multiple rows with the same content, but they result in a single state, as they all belong to the same Myhill–Nerode equivalence class. The initial state is rowt (ε), and we use the ε column to determine whether a state is accepting: rowt (s) is accepting whenever rowt (s)(ε) = 1. The transition function is defined a as rowt (s) − → rowb (s)(a). (Notice that the continuation is drawn from the bottom part of the table). For the hypothesis automaton to be well-defined, ε must be in S and E, and the table must satisfy two properties: – Closedness states that each transition actually leads to a state of the hypothesis. That is, the table is closed if for all t ∈ S and a ∈ A there is s ∈ S such that rowt (s) = rowb (t)(a). – Consistency states that there is no ambiguity in determining the transitions. That is, the table is consistent if for all s1 , s2 ∈ S such that rowt (s1 ) = rowt (s2 ) we have rowb (s1 ) = rowb (s2 ). The algorithm updates the sets S and E to satisfy these properties, constructs a hypothesis, submits it in an equivalence query, and, when given a counterexample, refines the hypothesis. This process continues until the hypothesis is correct. The algorithm is shown in Fig. 1. Example Run. We now run the algorithm with the target language L = {w ∈ {a}∗ | |w| = 1}. The minimal DFA accepting L is M=
a
a
a
(1)
Initially, S = E = {ε}. We build the observation table given in Fig. 2a. This table is not closed, because the row with label a, having 0 in the only column, does not appear in the top part of the table: the only row ε has 1. To fix this,
72
G. van Heerdt et al.
ε ε1 a0
ε ε1 a0 aa 1
(a)
(b)
ε a aa aaa aaaa
a a
(d)
(c)
ε 1 0 1 1 1
ε a aa aaa aaaa
ε 1 0 1 1 1
a 0 1 1 1 1
(e)
Fig. 2. Example run of L on L = {w ∈ {a}∗ | |w| = 1}.
we add the word a to the set S. Now the table (Fig. 2b) is closed and consistent, so we construct the hypothesis that is shown in Fig. 2c and pose an equivalence query. The teacher replies no and informs us that the word aaa should have been accepted. L handles a counterexample by adding all its prefixes to the set S. We only have to add aa and aaa in this case. The next table (Fig. 2d) is closed, but not consistent: the rows ε and aa both have value 1, but their extensions a and aaa differ. To fix this, we prepend the continuation a to the column ε on which they differ and add a · ε = a to E. This distinguishes rowt (ε) from rowt (aa), as seen in the next table in Fig. 2e. The table is now closed and consistent, and the new hypothesis automaton is precisely the correct one M. As mentioned, the hypothesis construction approximates the theoretical construction of the minimal DFA, which is unique up to isomorphism. That is, for S = E = A∗ the relation that identifies words of S having the same value in rowt is precisely the Myhill–Nerode’s right congruence. Learning Non-deterministic Automata. As is well known, NFAs can be smaller than the minimal DFA for a given language. For example, the language L above is accepted by the NFA a N=
a
(2)
a
which is smaller than the minimal DFA M. Though in this example, which we chose for simplicity, the state reduction is not massive, it is known that in general NFAs can be exponentially smaller than the minimal DFA [24]. This reduction of the state space is enabled by a side-effect—non-determinism, in this case. Learning NFAs can lead to a substantial gain in space complexity, but it is challenging. The main difficulty is that NFAs do not have a canonical minimal representative: there may be several non-isomorphic state-minimal NFAs accepting the same language, which poses problems for the development of the learning algorithm. To overcome this, Bollig et al. [11] proposed to use a particular class of NFAs, namely RFSAs, which do admit minimal canonical representatives. However, their ad-hoc solution for NFAs does not extend to other automata, such as weighted or alternating. In this paper we present a solution that works for any side-effect, specified as a monad.
Learning Automata with Side-Effects
73
The crucial observation underlying our approach is that the language semantics of an NFA is defined in terms of its determinization, i.e., the DFA obtained by taking sets of states of the NFA as state space. In other words, this DFA is defined over an algebraic structure induced by the powerset, namely a (complete) join semilattice (JSL) whose join operation is set union. This automaton model does admit minimal representatives, which leads to the key idea for our algorithm: learning NFAs as automata over JSLs. In order to do so, we use an extended table where rows have a JSL structure, defined as follows. The join of two rows is given by an element-wise or, and the bottom element is the row containing only zeroes. More precisely, the new table consists of the two functions rowt : P(S) → 2E
rowb : P(S) → (2E )A
given by rowt (U ) = {rowt (s) | s ∈ U } and rowb (U )(a) = {rowb (s)(a) | s ∈ U }. Formally, these functions are JSL homomorphisms, and they induce the following general definitions: – The table is closed if for all U ⊆ S, a ∈ A there is U ⊆ S such that rowt (U ) = rowb (U )(a). – The table is consistent if for all U1 , U2 ⊆ S s.t. rowt (U1 ) = rowt (U2 ) we have rowb (U1 ) = rowb (U2 ). We remark that our algorithm does not actually store the whole extended table, which can be quite large. It only needs to store the original table over S because all other rows in P(S) are freely generated and can be computed as needed, with no additional membership queries. The only lines in Fig. 1 that need to be adjusted are lines 5 and 8, where closedness and consistency are replaced with the new notions given above. Moreover, H is now built from the extended table. Optimizations. In this paper we also present two optimizations to our algorithm. For the first one, note that the state space of the hypothesis constructed by the algorithm can be very large since it encodes the entire algebraic structure. We show that we can extract a minimal set of generators from the table and compute a succinct hypothesis in the form of an automaton with side-effects, without any algebraic structure. For JSLs, this consists in only taking rows that are not the join of other rows, i.e., the join-irreducibles. By applying this optimization to this specific case, we essentially recover the learning algorithm of Bollig et al. [11]. The second optimization is a generalization of the optimized counterexample handling method of Rivest and Schapire [28], originally intended for L and DFAs. It consists in processing counterexamples by adding a single suffix of the counterexample to E, instead of adding all prefixes of the counterexample to S. This can avoid the algorithm posing a large number of membership queries. Example Revisited. We now run the new algorithm on the language L = {w ∈ {a}∗ | |w| = 1} considered earlier. Starting from S = E = {ε}, the observation table (Fig. 3a) is immediately closed and consistent. (It is closed because we
74
G. van Heerdt et al.
ε ε1 a0
a
(a)
(b)
ε a aa aaa
a (c)
(d)
ε 1 0 1 1
ε a aa aaa
ε 1 0 1 1
a 0 1 1 1
(e)
Fig. 3. Example run of the L adaptation for NFAs on L = {w ∈ {a}∗ | |w| = 1}.
have rowt ({a}) = rowt (∅).) This gives the JSL hypothesis shown in Fig. 3b, which leads to an NFA hypothesis having a single state that is initial, accepting, and has no transitions (Fig. 3c). The hypothesis is incorrect, and the teacher may supply us with counterexample aa. Adding prefixes a and aa to S leads to the table in Fig. 3d. The table is again closed, but not consistent: rowt ({a}) = rowt (∅), but rowb ({a})(a) = rowt ({aa}) = rowt (∅) = rowb (∅)(a). Thus, we add a to E. The resulting table (Fig. 3e) is closed and consistent. We note that row aa is the union of other rows: rowt ({aa}) = rowt ({ε, a}) (i.e., it is not a join-irreducible), and therefore can be ignored when building the succinct hypothesis. This hypothesis has two states, ε and a, and indeed it is the correct one N. Contributions and Road Map of the Paper. After some preliminary notions in Sect. 3, we present the main contributions: – In Sect. 4, we develop a general algorithm LT , which generalizes the NFA one presented in Sect. 2 to an arbitrary monad T capturing side-effects, and we provide a general correctness proof for our algorithm. – In Sect. 5, we describe the first optimization and prove its correctness. – In Sect. 6 we describe the second optimization. We also show how it can be combined with the one of Sect. 5, and how it can lead to a further small optimization, where the consistency check on the table is dropped. – Finally, in Sect. 7 we show how LT can be applied to several automata models, highlighting further case-specific optimizations when available.
3
Preliminaries
In this section we define a notion of T -automaton, a generalization of nondeterministic finite automata parametric in a monad T . We assume familiarity with basic notions of category theory: functors (in the category Set of sets and functions) and natural transformations. Side-effects can be conveniently captured as a monad. A monad T = (T, η, μ) is a triple consisting of an endofunctor T on Set and two natural transformations: a unit η : Id ⇒ T and a multiplication μ : T 2 ⇒ T , which satisfy the compatibility laws μ ◦ ηT = idT = μ ◦ T η and μ ◦ μT = μ ◦ T μ. Example 1 (Monads). An example of a monad is the triple (P, {−}, ), where P denotes the powerset functor associating a collection of subsets to a set, {−}
Learning Automata with Side-Effects
75
is the singleton operation, and is just union of sets. Another example is the triple (V (−), e, m), where V (X) is the free semimodule (over a semiring S) over X, namely {ϕ | ϕ : X → S having finite support}. The support of a function ϕ : X → S is the set of x ∈ X such that ϕ(x) = 0. Then e : X → V (X) is the characteristic function for each x ∈ X, and m : V (V (X)) → V (X) is defined for ϕ ∈ V (V (X)) and x ∈ X as m(ϕ)(x) = ψ∈V (X) ϕ(ψ) × ψ(x). Given a monad T , a T -algebra is a pair (X, h) consisting of a carrier set X and a function h : T X → X such that h ◦ μX = h ◦ T h and h ◦ ηX = idX . A T homomorphism between two T -algebras (X, h) and (Y, k) is a function f : X → Y such that f ◦h = k◦T f . The abstract notion of T -algebra instantiates to expected notions, as illustrated in the following example. Example 2 (Algebras for a monad). The P-algebras are the (complete) joinsemilattices, and their homomorphisms are join-preserving functions. If S is a field, V -algebras are vector spaces, and their homomorphisms are linear maps. We will often refer to a T -algebra (X, h) as X if h is understood or if its specific definition is irrelevant. Given a set X, (T X, μX ) is a T -algebra called the free T -algebra on X. One can build algebras pointwise for some operations. For instance, if Y is a set and (X, x) a T -algebra, then we have a T -algebra (X Y , f ), where f : T (X Y ) → X Y is given by f (W )(y) = (x ◦ T (evy ))(W ) and evy : X Y → X by evy (g) = g(y). If U and V are T -algebras and f : U → V is a T -algebra homomorphism, then the image img(f ) of f is a T -algebra, with the T -algebra structure inherited from V . The following proposition connects algebra homomorphisms from the free T -algebra on a set U to an algebra V with functions U → V . We will make use of this later in the section. Proposition 3. Given a set U and a T -algebra (V, v), there is a bijective correspondence between T -algebra homomorphisms T U → V and functions U → V : for a T -algebra homomorphism f : T U → V , define f † = f ◦ η : U → V ; for a function g : U → V , define g = v ◦ T g : T U → V . Then g is a T -algebra homomorphism called the free T -extension of g, and we have f † = f and g † = g. We now have all the ingredients to define our notion of automaton with sideeffects and their language semantics. We fix a monad (T, η, μ) with T preserving finite sets, as well as a T -algebra O that models outputs of automata. Definition 4 (T -automaton). A T -automaton is a quadruple (Q, δ : Q → QA , out : Q → O, init ∈ Q), where Q is a T -algebra, the transition map δ and output map out are T -algebra homomorphisms, and init is the initial state. Example 5. DFAs are Id-automata when O = 2 = {0, 1} is used to distinguish accepting from rejecting states. For the more general case of O being any set, DFAs generalize into Moore automata. Example 6. Recall that P-algebras are JSLs, and their homomorphisms are joinpreserving functions. In a P-automaton, Q is equipped with a join operation, and
76
G. van Heerdt et al.
QA is a join-semilattice with pointwise join: (f ∨ g)(a) = f (a) ∨ g(a) for a ∈ A. Since the automaton maps preserve joins, we have, in particular, δ(q1 ∨ q2 )(a) = δ(q1 )(a) ∨ δ(q2 )(a). One can represent an NFA over a set of states S as a Pautomaton by taking Q = (P(S), ) and O = 2, the Boolean join-semilattice with the or operation as its join. Let init ⊆ S be the set of initial states and out : P(Q) → 2 and δ : P(S) → P(S)A the respective extensions (Proposition 3) of the NFA’s output and transition functions. The resulting P-automaton is precisely the determinized version of the NFA. More generally, an automaton with side-effects given by a monad T always represents a T -automaton with a free state space. Proposition 7. A T -automaton of the form ((T X, μX ), δ, out, init), for any set X, is completely defined by the set X with the element init ∈ T X and functions δ † : X → (T X)A
out† : X → O.
We call such a T -automaton a succinct automaton, which we sometimes identify with the representation (X, δ † , out† , init). These automata are closely related to the ones studied in [18]. A (generalized) language is a function L : A∗ → O. For every T -automaton we have an observability and a reachability map, telling respectively which state is reached by reading a given word and which language each state recognizes. Definition 8 (Reachability/Observability Maps). The reachability map of a T -automaton A is a function rA : A∗ → Q inductively defined as: rA (ε) = init and rA (ua) = δ(rA (u))(a). The observability map of A is a function oA : Q → ∗ OA given by: oA (q)(ε) = out(q) and oA (q)(av) = oA (δ(q)(a))(v). The language accepted by A is the map LA = oA (init) = outA ◦ rA : A∗ → O. Example 9. For an NFA A represented as a P-automaton, as seen in Example 6, oA (q) is the language of q in the traditional sense. Note that q, in general, is a set of states: oA (q) takes the union of languages of singleton states. The set LA is the language accepted by the initial states, i.e., the language of the NFA. The reachability map rA (u) returns the set of states reached via all paths reading u. Given a language L : A∗ → O, there exists a (unique) minimal T -automaton ML accepting L, which is minimal in the number of states. Its existence follows from general facts. See for example [19]. ∗
Definition 10 (Minimal T -Automaton for L). Let tL : A∗ → OA be the function giving the residual languages of L, namely tL (u) = λv.L(uv). The minimal T -automaton ML accepting L has state space M = img(tL ), initial state init = tL (ε), and T -algebra homomorphisms out : M → O and δ : M → M A given by out(tL (U )) = L(U ) and δ(tL (U ))(a)(v) = tL (U )(av). In the following, we will also make use of the minimal Moore automaton accepting L. Although this always exists—by instantiating Definition 10 with T = Id—it need not be finite. The following property says that finiteness of Moore automata and of T -automata accepting the same language are related.
Learning Automata with Side-Effects
77
Proposition 11. The minimal Moore automaton accepting L is finite if and only if the minimal T -automaton accepting L is finite.
4
A General Algorithm
In this section we introduce our extension of L to learn automata with sideeffects. The algorithm is parametric in the notion of side-effect, represented as the monad T , and is therefore called LT . We fix a language L : A∗ → O that is to be learned, and we assume that there is a finite T -automaton accepting L. This assumption generalizes the requirement of L that L is regular (i.e., accepted by a specific class of T -automata, see Example 5). An observation table consists of a pair of functions rowt : S → OE
rowb : S → (OE )A
given by rowt (s)(e) = L(se) and rowb (s)(a)(e) = L(sae), where S, E ⊆ A∗ are finite sets with ε ∈ S ∩ E. For O = 2, we recover exactly the L observation table. The key idea for LT is defining closedness and consistency over the free T -extensions of those functions. Definition 12 (Closedness and Consistency). The table is closed if for all U ∈ T (S) and a ∈ A there exists a U ∈ T (S) such that rowt (U ) = rowb (U )(a). The table is consistent if for all U1 , U2 ∈ T (S) such that rowt (U1 ) = rowt (U2 ) we have rowb (U1 ) = rowb (U2 ). For closedness, we do not need to check all elements of T (S) × A against elements of T (S), but only those of S × A, thanks to the following result. Lemma 13. If for all s ∈ S and a ∈ A there is U ∈ T (S) such that rowt (U ) = rowb (s)(a), then the table is closed. Example 14. For NFAs represented as P-automata, the properties are as presented in Sect. 2. Recall that for T = P and O = 2, the Boolean joinsemilattice, rowt and rowb describe a table where rows are labeled by subsets of S. Then we have, for instance, rowt ({s1 , s2 })(e) = rowt (s1 )(e) ∨ rowt (s2 )(e), i.e., rowt ({s1 , s2 })(e) = 1 if and only if L(s1 e) = 1 or L(s2 e) = 1. Closedness amounts to check whether each row in the bottom part of the table is the join of a set of rows in the top part. Consistency amounts to check whether, for all sets of rows U1 , U2 ⊆ S in the top part of the table whose joins are equal, the joins of rows U1 · {a} and U2 · {a} in the bottom part are also equal, for all a ∈ A. If closedness and consistency hold, we can define a hypothesis T -automaton H, with state space H = img(rowt ), init = rowt (ε), and output and transitions out : H → O δ : H → HA
out(rowt (U )) = rowt (U )(ε) δ(rowt (U )) = rowb (U ).
78 1 2 3 4 5 6 7 8 9 10 11 12 13 14
G. van Heerdt et al. S, E ← {ε} repeat while the table is not closed or not consistent if the table is not closed find s ∈ S, a ∈ A such that rowb (s)(a) = rowt (U ) for all U ∈ T (S) S ← S ∪ {sa} if the table is not consistent find U1 , U2 ∈ T (S), a ∈ A, and e ∈ E such that rowt (U1 ) = rowt (U2 ) and rowb (U1 )(a)(e) = rowb (U2 )(a)(e) E ← E ∪ {ae} Construct the hypothesis H and submit it to the teacher if the teacher replies no, with a counterexample z S ← S ∪ prefixes(z) until the teacher replies yes return H
Fig. 4. Adaptation of L for T -automata.
The correctness of this definition follows from the abstract treatment of [21], instantiated to the category of T -algebras and their homomorphisms. We can now give algorithm LT . Similarly to the example in Sect. 2, we only have to adjust lines 5 and 8 in Fig. 1. The resulting algorithm is shown in Fig. 4. Correctness. Correctness for LT amounts to proving that, for any target language L, the algorithm terminates returning the minimal T -automaton ML accepting L. As in the original L algorithm, we only need to prove that the algorithm terminates, that is, that only finitely many hypotheses are produced. Correctness follows from termination, since line 13 causes the algorithm to terminate only if the hypothesis automaton coincides with ML . In order to show termination, we argue that the state space H of the hypothesis increases while the algorithm loops, and that H cannot be larger than M , the state space of ML . In fact, when a closedness defect is resolved (line 6), a row that was not previously found in the image of rowt : T (S) → OE is added, so the set H grows larger. When a consistency defect is resolved (line 9), two previously equal rows become distinguished, which also increases the size of H. As for counterexamples, adding their prefixes to S (line 11) creates a consistency defect, which will be fixed during the next iteration, causing H to increase. This is due to the following result, which says that the counterexample z has a prefix that violates consistency. Note that the hypothesis H in the statement below is the hypothesis obtained before adding the prefixes of z to S. Proposition 15. If z ∈ A∗ is such that LH (z) = L(z) and prefixes(z) ⊆ S, then there are a prefix ua of z, with u ∈ A∗ and a ∈ A, and U ∈ T (S) such that rowt (u) = rowt (U ) and rowb (u)(a) = rowb (U )(a). Now, note that by increasing S or E, the hypothesis state space H never decreases in size. Moreover, for S = A∗ and E = A∗ , rowt = tL . Therefore,
Learning Automata with Side-Effects
79
since H and M are defined as the images of rowt and tL , respectively, the size of H is bounded by that of M . Since H increases while the algorithm loops, the algorithm must terminate and is thus correct. Note that the learning algorithm of Bollig et al. does not terminate using this counterexample processing method [10, Appendix F]. This is due to their notion of consistency being weaker than ours: we have shown that progress is guaranteed because a consistency defect, in our sense, is created using this method. Query Complexity. The complexity of automata learning algorithms is usually measured in terms of the number of both membership and equivalence queries asked, as it is common to assume that computations within the algorithm are insignificant compared to evaluating the system under analysis in applications. The cost of answering queries themselves is not considered, as it depends on the implementation of the teacher, which the algorithm abstracts from. The table is a T -algebra homomorphism, so membership queries for rows labeled in S are enough to determine all other rows. We measure the query complexities in terms of the number of states n of the minimal Moore automaton, the number of states t of the minimal T -automaton, the size k of the alphabet, and the length m of the longest counterexample. Note that t cannot be smaller than n, but it can be much bigger. For example, when T = P, t may be in O(2n ).1 The maximum number of closedness defects fixed by the algorithm is n, as a closedness defect for the setting with algebraic structure is also a closedness defect for the setting without that structure. The maximum number of consistency defects fixed by the algorithm is t, as fixing a consistency defect distinguishes two rows that were previously identified. Since counterexamples lead to consistency defects, this also means that the algorithm will not pose more than t equivalence queries. A word is added to S when fixing a closedness defect, and O(m) words are added to S when processing a counterexample. The number of rows that we need to fill using queries is therefore in O(tmk). The number of columns added to the table is given by the number of times a consistency defect is fixed and thus in O(t). Altogether, the number of membership queries is in O(t2 mk).
5
Succinct Hypotheses
We now describe the first of two optimizations, which is enabled by the use of monads. Our algorithm produces hypotheses that can be quite large, as their state space is the image of rowt , which has the whole set T (S) as its domain. For instance, when T = P, T (S) is exponentially larger than S. We show how we can compute succinct hypotheses, whose state space is given by a subset of S. We start by defining sets of generators for the table. 1
Take the language {ap }, for some p ∈ N and a singleton alphabet {a}. Its residual languages are ∅ and {ai } for all 0 ≤ i ≤ p, thus the minimal DFA accepting the language has p + 2 states. However, the residual languages w.r.t. sets of words are all the subsets of {ε, a, aa, . . . , ap }—hence, the minimal T -automaton has 2p+1 states.
80
G. van Heerdt et al.
Definition 16. A set S ⊆ S is a set of generators for the table whenever for all s ∈ S there is U ∈ T (S ) such that rowt (s) = rowt (U ).2 Intuitively, U is the decomposition of s into a “combination” of generators. When T = P, S generates the table whenever each row can be obtained as the join of a set of rows labeled by S . Explicitly: for all s ∈ S there is {s1 , . . . , sn } ⊆ S such that rowt (s) = rowt ({s1 , . . . , sn }) = rowt (s1 ) ∨ · · · ∨ rowt (sn ). Recall that H, with state space H, is the hypothesis automaton for the table. The existence of generators S allows us to compute a T -automaton with state space T (S ) equivalent to H. We call this the succinct hypothesis, although T (S ) may be larger than H. Proposition 7 tells us that the succinct hypothesis can be represented as an automaton with side-effects in T that has S as its state space. This results in a lower space complexity when storing the hypothesis. We now show how the succinct hypothesis is computed. Observe that, if generators S exist, rowt factors through the restriction of itself to T (S ). Denote t t . Since we have T (S ) ⊆ T (S), the image of row this latter function row t coincides with img(rowt ) = H, and therefore the surjection restricting row to its image has the form e : T (S ) → H. Any right inverse i : H → T (S ) of the function e (that is, e ◦ i = idH , but whereas e is a T -algebra homomorphism, i need not be one) yields a succinct hypothesis as follows. Definition 17 (Succinct Hypothesis). The succinct hypothesis is the T automaton S = (T (S ), δ, out, init) given by init = i(rowt (ε)) and out† : S → O δ † : S → T (S )A
out† (s) = rowt (s)(ε) δ † (s)(a) = i(rowb (s)(a)).
This definition is inspired by that of a scoop, due to Arbib and Manes [4]. Proposition 18. Any succinct hypothesis of H accepts the language of H. We now give a simple procedure to compute a minimal set of generators, that is, a set S such that no proper subset is a set of generators. This generalizes a procedure defined by Angluin et al. [3] for non-deterministic, universal, and alternating automata. Proposition 19. The following algorithm returns a minimal set of generators for the table: S ← S while there are s ∈ S and U ∈ T (S \ {s}) s.t. rowt (U ) = rowt (s) S ← S \ {s} return S 2
Here and hereafter we assume that T (S ) ⊆ T (S), and more generally that T preserves inclusion maps. To eliminate this assumption, one could take the inclusion map f : S → S and write rowt (T (f )(U )) instead of rowt (U ).
Learning Automata with Side-Effects
81
To determine whether U as in the above algorithm exists, one can always naively enumerate all possibilities, using that T preserves finite sets. This is what we call the basic algorithm. For specific algebraic structures, one may find more efficient methods, as we show in the following example. Example 20. Consider the powerset monad T = P. We now exemplify two ways of computing succinct hypotheses, which are inspired by canonical RFSAs [16]. The basic idea is to start from a deterministic automaton and to remove states that are equivalent to a set of other states. The algorithm given in Proposition 19 computes a minimal S that only contains labels of rows that are not the join of other rows. (In case two rows are equal, only one of their labels is kept.) In other words, as mentioned in Sect. 2, S contains labels of join-irreducible rows. To concretize the algorithm efficiently, we use a method introduced by Bollig et al. [11], which essentially exploits the natural order on the JSL of table rows. In contrast to the basic exponential algorithm, this results in a polynomial one.3 Bollig et al. determine whether a row is a join of other rows by comparing the row just to the join of rows below it. Like them, we make use of this also to compute right inverses of e, for which we will formalize the order. The function e : P(S ) → H tells us which sets of rows are equivalent to a single state in H. We show two right inverses H → P(S ) for it. The first one, i1 (h) = {s ∈ S | rowt (s) ≤ h}, stems from the construction of the canonical RFSA of a language [16]. Here we use the order a ≤ b ⇐⇒ a ∨ b = b induced by the JSL structure. The resulting construction of a succinct hypothesis was first used by Bollig et al. [11]. This succinct hypothesis has a “maximal” transition function, meaning that no more transitions can be added without changing the language of the automaton. The second inverse is i2 (h) = {s ∈ S | rowt (s) ≤ h and for all s ∈ S s.t. rowt (s) ≤ rowt (s ) ≤ h we have rowt (s) = rowt (s )}, resulting in a more economical transition function, where some redundancies are removed. This corresponds to the simplified canonical RFSA [16]. Example 21. Consider T = P, and recall the table in Fig. 3e. When S = S, the right inverse given by i1 yields the succinct hypothesis shown below. a
a
a
a a
a
a
Note that i1 (rowt (aa)) = {ε, a, aa}. Taking i2 instead, the succinct hypothesis is just the DFA (1) because i2 (rowt (aa)) = {aa}. Rather than constructing a 3
When we refer to computational complexities, as opposed to query complexities, they are in terms of the sizes of S, E, and A.
82
G. van Heerdt et al.
succinct hypothesis directly, our algorithm first reduces the set S . In this case, we have rowt (aa) = rowt ({ε, a}), so we remove aa from S . Now i1 and i2 coincide and produce the NFA (2). Minimizing the set S in this setting essentially comes down to determining what Bollig et al. [11] call the prime rows of the table. Remark 22. The algorithm in Proposition 19 implicitly assumes an order in which elements of S are checked. Although the algorithm is correct for any such order, different orders may give results that differ in size.
6
Optimized Counterexample Handling
The second optimization we give generalizes the counterexample processing method due to Rivest and Schapire [28], which improves the worst case complexity of the number of membership queries needed in L . Maler and Pnueli [26] proposed to add all suffixes of the counterexample to the set E instead of adding all prefixes to the set S. This eliminates the need for consistency checks in the deterministic setting. The method by Rivest and Schapire finds a single suffix of the counterexample and adds it to E. This suffix is chosen in such a way that it either distinguishes two existing rows or creates a closedness defect, both of which imply that the hypothesis automaton will grow. The main idea is finding the distinguishing suffix via the hypothesis automaton H. Given u ∈ A∗ , let qu be the state in H reached by reading u, i.e., qu = rH (u). For each q ∈ H, we pick any Uq ∈ T (S) that yields q according to the table, i.e., such that rowt (Uq ) = q. Then for a counterexample z we have that the residual language w.r.t. Uqz does not “agree” with the residual language w.r.t. z. ∗ The above intuition can be formalized as follows. Let R : A∗ → OA be given by R(u) = tL (Uqu ) for all u ∈ A∗ , the residual language computation. We have the following technical lemma, saying that a counterexample z distinguishes the residual languages tL (z) and R(z). Lemma 23. If z ∈ A∗ is such that LH (z) = L(z), then tL (z)(ε) = R(z)(ε). We assume that Uqε = η(ε). For a counterexample z, we then have R(ε)(z) = tL (ε)(z) = R(z)(ε). While reading z, the hypothesis automaton passes a sequence of states qu0 , qu1 , qu2 , . . . , qun , where u0 = , un = z, and ui+1 = ui a for some a ∈ A is a prefix of z. If z were correctly classified by H, all residuals R(ui ) would classify the remaining suffix v of z, i.e., such that z = ui v, in the same way. However, the previous lemma tells us that, for a counterexample z, this is not case, meaning that for some suffix v we have R(ua)(v) = R(u)(av). In short, this inequality is discovered along a transition in the path to z. Corollary 24. If z ∈ A∗ is such that LH (z) = L(z), then there are u, v ∈ A∗ and a ∈ A such that uav = z and R(ua)(v) = R(u)(av).
Learning Automata with Side-Effects
83
To find such a decomposition efficiently, Rivest and Schapire use a binary search algorithm. We conclude with the following result that turns the above property into the elimination of a closedness witness. That is, given a counterexample z and the resulting decomposition uav from the above corollary, we show that, while currently rowt (Uqua ) = rowb (Uqu )(a), after adding v to E we have rowt (Uqua )(v) = rowb (Uqu )(a)(v). (To see that the latter follows from the proposition below, note that for all U ∈ T (S) and e ∈ E, rowt (U )(e) = tL (U )(e) and for each a ∈ A, rowb (U )(a )(e) = tL (U )(a e).) The inequality means that either we have a closedness defect, or there still exists some U ∈ T (S) such that rowt (U ) = rowb (Uqu )(a). In this case, the rows rowt (U ) and rowt (Uqua ) have become distinguished by adding v, which means that the size of H has increased. A closedness defect also increases the size of H, so in any case we make progress. Proposition 25. If z ∈ A∗ is such that LH (z) = L(z), then there are u, v ∈ A∗ and a ∈ A such that rowt (Uqua ) = rowb (Uqu )(a) and tL (Uqua )(v) = tL (Uqu )(av). We now show how to combine this optimized counterexample processing method with the succinct hypothesis optimization from Sect. 5. Recall that the succinct hypothesis S is based on a right inverse i : H → T (S ) of e : T (S ) → H. Choosing such an i is equivalent to choosing Uq for each q ∈ H. We then redefine R using the reachability map of the succinct hypothesis. Specifically, R(u) = tL (rS (u)) for all u ∈ A∗ . Unfortunately, there is one complication. We assumed earlier that Uqε = η(ε), or more specifically R(ε)(z) = L(z). This now may be impossible because we do not even necessarily have ε ∈ S . We show next that if this equality does not hold, then there are two rows that we can distinguish by adding z to E. Thus, after testing whether R(ε)(z) = L(z), we either add z to E (if the test fails) or proceed with the original method. Proposition 26. If z ∈ A∗ is such that R(ε)(z) = L(z), then rowt (initS ) = rowt (ε) and tL (initS )(z) = tL (ε)(z). To see that the original method still works, we prove the analogue of Proposition 25 for the new definition of R. Proposition 27. If z ∈ A∗ is such that LS (z) = L(z) and R(ε)(z) = L(z), then there are u, v ∈ A∗ and a ∈ A such that rowt (rS† (ua)) = rowb (rS† (u))(a) and tL (rS† (ua))(v) = tL (rS† (u))(av). Example 28. Recall the succinct hypothesis S from Fig. 3c for the table in Fig. 2a. Note that S = S cannot be further reduced. The hypothesis is based on the right inverse i : H → P(S) of e : P(S) → H given by i(rowt (ε)) = {ε} and i(rowt (∅)) = ∅. This is the only possible right inverse because e is bijective. For the prefixes of the counterexample aa we have rS (ε) = {ε} and rS (a) = rS (aa) = ∅. Note that tL ({ε})(aa) = 1 while tL (∅)(a) = tL (∅)(ε) = 0. Thus, R(ε)(aa) = R(a)(a). Adding a to E would indeed create a closedness defect.
84
G. van Heerdt et al.
Query Complexity. Again, we measure the membership and equivalence query complexities in terms of the number of states n of the minimal Moore automaton, the number of states t of the minimal T -automaton, the size k of the alphabet, and the length m of the longest counterexample. A counterexample now gives an additional column instead of a set of rows, and we have seen that this leads to either a closedness defect or to two rows being distinguished. Thus, the number of equivalence queries is still at most t, and the number of columns is still in O(t). However, the number of rows that we need to fill using membership queries is now in O(nk). This means that a total of O(tnk) membership queries is needed to fill the table. Apart from filling the table, we also need queries to analyze counterexamples. The binary search algorithm mentioned after Corollary 24 requires for each counterexample O(log m) computations of R(x)(y) for varying words x and y. Let r be the maximum number of queries required for a single such computation. Note that for u, v ∈ A∗ , and letting α : T O → O be the algebra structure on O, we have R(u)(v) = α(T (evv ◦ tL )(Uqu )) for the original definition of R and R(u)(v) = α(T (evv ◦ tL )(rS† (u))) in the succinct hypothesis case. Since the restricted map T (evv ◦ tL ) : T S → T O is completely determined by evv ◦ tL : S → O, r is at most |S|, which is bounded by n in this optimized algorithm. For some examples (see for instance the writer automata in Sect. 7), we even have r = 1. The overall membership query complexity is O(tnk + tr log m). Dropping Consistency. We described the counterexample processing method based around Proposition 25 in terms of the succinct hypothesis S rather than the actual hypothesis H by showing that R can be defined using S. Since the definition of the succinct hypothesis does not rely on the property of consistency to be well-defined, this means we could drop the consistency check from the algorithm altogether. We can still measure progress in terms of the size of the set H, but it will not be the state space of an actual hypothesis during intermediate stages. This observation also explains why Bollig et al. [11] are able to use a weaker notion of consistency in their algorithm. Interestingly, they exploit the canonicity of their choice of succinct hypotheses to arrive at a polynomial membership query complexity that does not involve the factor t.
7
Examples
In this section we list several examples that can be seen as T -automata and hence learned via an instance of LT . We remark that, since our algorithm operates on finite structures (recall that T preserves finite sets), for each automaton type one can obtain a basic, correct-by-construction instance of LT for free, by plugging the concrete definition of the monad into the abstract algorithm. However, we note that this is not how LT is intended to be used in a real-world context; it should be seen as an abstract specification of the operations each concrete
Learning Automata with Side-Effects
85
implementation needs to perform, or, in other words, as a template for real implementations. For each instance below, we discuss whether certain operations admit a more efficient implementation than the basic one, based on the specific algebraic structure induced by the monad. Due to our general treatment, the optimizations of Sects. 5 and 6 apply to all of these instances. Non-deterministic Automata. As discussed before, non-deterministic automata are P-automata with a free state space, provided that O = 2 is equipped with the “or” operation as its P-algebra structure. We also mentioned that, as Bollig et al. [11] showed, there is a polynomial time algorithm to check whether a given row is the join of other rows. This gives an efficient method for handling closedness straight away. Moreover, as shown in Example 20, it allows for an efficient construction of the succinct hypothesis. Unfortunately, checking for consistency defects seems to require a number of computations exponential in the number of rows. However, as explained at the end of Sect. 6, we can in fact drop consistency altogether. Universal Automata. Just like non-deterministic automata, universal automata can be seen as P-automata with a free state space. The difference is that the P-algebra structure on O = 2 is dual: it is given by the “and” rather than the “or” operation. Universal automata accept a word when all paths reading that word are accepting. One can dualize the optimized specific algorithms for the case of non-deterministic automata. This is precisely what Angluin et al. [3] have done. Partial Automata. Consider the maybe monad Maybe(X) = 1 + X, with natural transformations having components ηX : X → 1 + X and μX : 1 + 1 + X → 1 + X defined in the standard way. Partial automata with states X can be represented as Maybe-automata with state space Maybe(X) = 1 + X, where there is an additional sink state, and output algebra O = Maybe(1) = 1 + 1. Here the left value is for rejecting states, including the sink one. The transition map δ : 1 + X → (1 + X)A represents an undefined transition as one going to the sink state. The algorithm LMaybe is mostly like L , except that implicitly the table has an additional row with zeroes in every column. Since the monad only adds a single element to each set, there is no need to optimize the basic algorithm for this specific case. Weighted Automata. Recall from Sect. 3 the free semimodule monad V , sending a set X to the free semimodule over a finite semiring S. Weighted automata over a set of states X can be represented as V -automata whose state space is the semimodule V (X), the output function out : V (X) → S assigns a weight to each state, and the transition map δ : V (X) → V (X)A sends each state and each input symbol to a linear combination of states. The obvious semimodule structure on S extends to a pointwise structure on the potential rows of the table. The basic algorithm loops over all linear combinations of rows to check closedness and over
86
G. van Heerdt et al.
all pairs of combinations of rows to check consistency, making them extremely expensive operations. If S is a field, a row can be decomposed into a linear combination of other rows in polynomial time using standard techniques from linear algebra. As a result, there are efficient procedures for checking closedness and constructing succinct hypotheses. It was shown by Van Heerdt et al. [21] that consistency in this setting is equivalent to closedness of the transpose of the table. This trick is due to Bergadano and Varricchio [7], who first studied learning of weighted automata. Alternating Automata. We use the characterization of alternating automata due to Bertrand and Rot [9]. Recall that, given a partially ordered set (P, ≤), an upset is a subset U of P such that, if x ∈ U and x ≤ y, then y ∈ U . Given Q ⊆ P , we write ↑Q for the upward closure of Q, that is the smallest upset of P containing Q. We consider the monad A that maps a set X to the set of all upsets of P(X). Its unit is given by ηX (x) =↑{{x}} and its multiplication by μX (U ) = {V ⊆ X | ∃W ∈U ∀Y ∈W ∃Z∈Y Z ⊆ V }. Algebras for the monad A are completely distributive lattices [27]. The sets of sets in A(X) can be seen as DNF formulae over elements of X, where the outer powerset is disjunctive and the inner one is conjunctive. Accordingly, we define an algebra structure β : A(2) → 2 on the output set 2 by letting β(U ) = 1 if {1} ∈ U , 0 otherwise. Alternating automata with states X can be represented as A-automata with state space A(X), output map out : A(X) → 2, and transition map δ : A(X) → A(X)A , sending each state to a DNF formula over X. The only difference with the usual definition of alternating automata is that A(X) is not the full set PP(X), which is not a monad [23]. However, for each formula in PP(X) there is an equivalent one in A(X). An adaptation of L for alternating automata was introduced by Angluin et al. [3] and further investigated by Berndt et al. [8]. The former found that given a row r ∈ 2E and a set of rows X ⊆ 2E , r is equal to a DNF combination of rows from X (where logical operators are applied component-wise) if and only if it is equal to the combination defined by Y = {{x ∈ X | x(e) = 1} | e ∈ E∧r(e) = 1}. We can reuse this idea to efficiently find closedness defects and to construct the hypothesis. Even though the monad A formally requires the use of DNF formulae representing upsets, in the actual implementation we can use smaller formulae, e.g., Y above instead of its upward closure. In fact, it is easy to check that DNF combinations of rows are invariant under upward closure. Similar as before, we do not know of an efficient way to ensure consistency, but we could drop it. Writer Automata. The examples considered so far involve existing classes of automata. To further demonstrate the generality of our approach, we introduce a new (as far as we know) type of automaton, which we call writer automaton. The writer monad Writer(X) = M × X for a finite monoid M has a unit ηX : X → M × X given by adding the unit e of the monoid, ηX (x) = (e, x), and a multiplication μX : M × M × X → M × X given by performing the monoid
Learning Automata with Side-Effects
87
multiplication, μX (m1 , m2 , x) = (m1 m2 , x). In Haskell, the writer monad is used for such tasks as collecting successive log messages, where the monoid is given by the set of sets or lists of possible messages and the multiplication adds a message. The algebras for this monad are sets Q equipped with an M-action. One may take the output object to be the set M with the monoid multiplication as its action. Writer-automata with a free state space can be represented as deterministic automata that have an element of M associated with each transition. The semantics is as expected: M-elements multiply along paths and finally multiply with the output of the last state to produce the actual output. The basic learning algorithm has polynomial time complexity. To determine whether a given row is a combination of rows in the table, i.e., whether it is given by a monoid value applied to one of the rows in the table, one simply tries all of these values. This allows us to check for closedness, to minimize the generators, and to construct the succinct hypothesis, in polynominal time. Consistency involves comparing all ways of applying monoid values to rows and, for each comparison, at most |A| further comparisons between one-letter extensions. The total number of comparisons is clearly polynomial in |M|, |S|, and |A|.
8
Conclusion
We have presented LT , a general adaptation of L that uses monads to learn an automaton with algebraic structure, as well as a method for finding a succinct equivalent based on its generators. Furthermore, we adapted the optimized counterexample handling method of Rivest and Schapire [28] to this setting and discussed instantiations to non-deterministic, universal, partial, weighted, alternating, and writer automata. Related Work. This paper builds on and extends the theoretical toolkit of Van Heerdt et al. [19,21], who are developing a categorical automata learning framework (CALF) in which learning algorithms can be understood and developed in a structured way. An adaptation of L that produces NFAs was first developed by Bollig et al. [11]. Their algorithm learns a special subclass of NFAs consisting of RFSAs, which were introduced by Denis et al. [16]. Angluin et al. [3] unified algorithms for NFAs, universal automata, and alternating automata, the latter of which was further improved by Berndt et al. [8]. We are able to provide a more general framework, which encompasses and goes beyond those classes of automata. Moreover, we study optimized counterexample handling, which [3,8,11] do not consider. The algorithm for weighted automata over an arbitrary field was studied in a category theoretical context by Jacobs and Silva [22] and elaborated on by Van Heerdt et al. [21]. The algorithm itself was introduced by Bergadano and Varricchio [7]. The theory of succinct automata used for our hypotheses is based on the work of Arbib and Manes [4], revamped to more recent category theory.
88
G. van Heerdt et al.
Future Work. Whereas our general algorithm effortlessly instantiates to monads that preserve finite sets, a major challenge lies in investigating monads that do not enjoy this property. The algorithm for weighted automata generalizes to an infinite field [7,21,22] and even a principal ideal domain [20]. However, for an infinite semiring in general we cannot guarantee termination, which is because a finitely generated semimodule may have an infinite chain of strict submodules [20]. Intuitively, this means that while fixing closedness defects increases the size of the hypothesis state space semimodule, an infinite number of steps may be needed to resolve all closedness defects. In future work we would like to characterize more precisely for which semirings we can learn, and ideally formulate this characterization on the monad level. As a result of the correspondence between learning and conformance testing [6,21], it should be possible to include in our framework the W-method [14], which is often used in case studies deploying L (e.g. [12,15]). We defer a thorough investigation of conformance testing to future work.
References 1. Aminof, B., Kupferman, O., Lampert, R.: Reasoning about online algorithms with weighted automata. ACM Trans. Algorithms 6(2), 28:1–28:36 (2010) 2. Angluin, D.: Learning regular sets from queries and counterexamples. Inf. Comput. 75, 87–106 (1987) 3. Angluin, D., Eisenstat, S., Fisman, D.: Learning regular languages via alternating automata. In: IJCAI, pp. 3308–3314 (2015) 4. Arbib, M.A., Manes, E.G.: Fuzzy machines in a category. Bull. AMS 13, 169–210 (1975) 5. Baier, C., Gr¨ oßer, M., Ciesinski, F.: Model checking linear-time properties of probabilistic systems. In: Droste, M., Kuich, W., Vogler, H. (eds.) Handbook of Weighted Automata. EATCS, pp. 519–570. Springer, Heidelberg (2009) 6. Berg, T., Grinchtein, O., Jonsson, B., Leucker, M., Raffelt, H., Steffen, B.: On the correspondence between conformance testing and regular Inference. In: Cerioli, M. (ed.) FASE 2005. LNCS, vol. 3442, pp. 175–189. Springer, Heidelberg (2005). https://doi.org/10.1007/978-3-540-31984-9 14 7. Bergadano, F., Varricchio, S.: Learning behaviors of automata from multiplicity and equivalence queries. SIAM J. Comput. 25, 1268–1280 (1996) 8. Berndt, S., Li´skiewicz, M., Lutter, M., Reischuk, R.: Learning residual alternating automata. In: AAAI, pp. 1749–1755 (2017) 9. Bertrand, M., Rot, J.: Coalgebraic determinization of alternating automata. arXiv preprint arXiv:1804.02546 (2018) 10. Bollig, B., Habermehl, P., Kern, C., Leucker, M.: Angluin-style learning of NFA (research report LSV-08-28). Technical report, ENS Cachan (2008) 11. Bollig, B., Habermehl, P., Kern, C., Leucker, M.: Angluin-style learning of NFA. In: IJCAI, vol. 9, pp. 1004–1009 (2009) 12. Chalupar, G., Peherstorfer, S., Poll, E., de Ruiter, J.: Automated reverse engineerR In: WOOT (2014) ing using Lego. 13. Chatterjee, K., Doyen, L., Henzinger, T.A.: Quantitative languages. In: CSL, pp. 385–400 (2008)
Learning Automata with Side-Effects
89
14. Chow, T.S.: Testing software design modeled by finite-state machines. IEEE Trans. Softw. Eng. 4, 178–187 (1978) 15. de Ruiter, J., Poll, E.: Protocol state fuzzing of TLS implementations. In: USENIX Security, pp. 193–206 (2015) 16. Denis, F., Lemay, A., Terlutte, A.: Residual finite state automata. Fundamenta Informaticae 51, 339–368 (2002) 17. Droste, M., Gastin, P.: Weighted automata and weighted logics. In: Caires, L., Italiano, G.F., Monteiro, L., Palamidessi, C., Yung, M. (eds.) ICALP 2005. LNCS, vol. 3580, pp. 513–525. Springer, Heidelberg (2005). https://doi.org/10.1007/ 11523468 42 18. Goncharov, S., Milius, S., Silva, A.: Towards a coalgebraic Chomsky hierarchy. In: Diaz, J., Lanese, I., Sangiorgi, D. (eds.) TCS 2014. LNCS, vol. 8705, pp. 265–280. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-44602-7 21 19. van Heerdt, G.: An abstract automata learning framework. Master’s thesis, Radboud University Nijmegen (2016) 20. van Heerdt, G., Kupke, C., Rot, J., Silva, A.: Learning weighted automata over principal ideal domains. arXiv preprint arXiv:1911.04404 (2019) 21. van Heerdt, G., Sammartino, M., Silva, A.: CALF: categorical automata learning framework. In: CSL, pp. 29:1–29:24 (2017) 22. Jacobs, B., Silva, A.: Automata learning: a categorical perspective. In: van Breugel, F., Kashefi, E., Palamidessi, C., Rutten, J. (eds.) Horizons of the Mind. A Tribute to Prakash Panangaden. LNCS, vol. 8464, pp. 384–406. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-06880-0 20 23. Klin, B., Salamanca, J.: Iterated covariant powerset is not a monad. Electron. Notes Theor. Comput. Sci. 341, 261–276 (2018) 24. Kozen, D.C.: Automata and Computability. Springer, New York (2012) 25. Kuperberg, D.: Linear temporal logic for regular cost functions. Log. Methods Comput. Sci. 10(1) (2014) 26. Maler, O., Pnueli, A.: On the learnability of infinitary regular sets. Inf. Comput. 118, 316–326 (1995) 27. Markowsky, G.: Free completely distributive lattices. Proc. Am. Math. Soc. 74(2), 227–228 (1979) 28. Rivest, R.L., Schapire, R.E.: Inference of finite automata using homing sequences. Inf. Comput. 103, 299–347 (1993) 29. Schuts, M., Hooman, J., Vaandrager, F.: Refactoring of legacy software using model ´ learning and equivalence checking: an industrial experience report. In: Abrah´ am, E., Huisman, M. (eds.) IFM 2016. LNCS, vol. 9681, pp. 311–325. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-33693-0 20 30. Vaandrager, F.W.: Model learning. Commun. ACM 60(2), 86–95 (2017)
De Finetti’s Construction as a Categorical Limit Bart Jacobs1(B) and Sam Staton2 1
Institute for Computing and Information Sciences, Radboud University, Nijmegen, NL, The Netherlands [email protected] Department of Computer Science, University of Oxford, Oxford, UK [email protected]
2
Abstract. This paper reformulates a classical result in probability theory from the 1930s in modern categorical terms: de Finetti’s representation theorem is redescribed as limit statement for a chain of finite spaces in the Kleisli category of the Giry monad. This new limit is used to identify among exchangeable coalgebras the final one.
1
Introduction
An indifferent belief about the bias of a coin can be expressed in various ways. One way is to use elementary sentences like “When we toss the coin 7 times, the probability of 5 heads is 18 .” 1 “When we toss the coin k times, the probability of n heads is k+1 .”
(1)
Another way is to give a probability measure on the space [0, 1] of biases: “The bias of the coin is uniformly distributed across the interval [0, 1].”
(2)
This sentence (2) is more conceptually challenging than (1), but we may deduce (1) from (2) using Lebesgue integration: 0
1
7 5
r5 (1 − r)2 dr = 18 ,
0
1
k n
rn (1 − r)k−n dr =
1 k+1
k! where nk = n!(k−n)! is the binomial coefficient. Here we have considered indifferent beliefs as a first illustration, but more generally de Finetti’s theorem gives a passage from sentences about beliefs like (1) to sentences like (2). Theorem 1 (De Finetti, Informal). If we assign a probability to all finite experiment outcomes involving a coin (such as (1)), and these probabilities are all suitably consistent, then by doing so we have given a probability measure on [0, 1] (such as the uniform distribution, (2)). In this paper we give two categorical/coalgebraic formulations of this theorem. c IFIP International Federation for Information Processing 2020 Published by Springer Nature Switzerland AG 2020 D. Petri¸san and J. Rot (Eds.): CMCS 2020, LNCS 12094, pp. 90–111, 2020. https://doi.org/10.1007/978-3-030-57201-3_6
De Finetti’s Construction as a Categorical Limit
91
– In Theorem 8, we state the result as: the unit interval [0, 1] is an inverse limit over a chain of finite spaces in a category of probability channels. In particular, a cone over this diagram is an assignment of finite probabilities such as (1) that is suitably consistent. – In Theorem 12, we state the result as: [0, 1] carries a final object in a category of (exchangeable) coalgebras for the functor 2 × (−) in the category of probability channels. A coalgebra can be thought of as a process that coinductively generates probabilities of the form (1). A coalgebra is a little more data than strictly necessary, but this extra data corresponds to the concept of sufficient statistics which is useful in Bayesian inference. Multisets and Probability Distributions. A multiset is a ‘set’ in which elements may occur multiple times. We can write such a multiset over a set {R, G, B}, representing red, green and blue balls, as: 3|R + 5|G + 2|B . This multiset can be seen as an urn containing three red, five green and two blue balls. A probability distribution is a convex combination of elements, where the frequencies (or probabilities) with which elements occur are numbers in the unit interval [0, 1] that add up to one, as in: 3 10 |R
+ 12 |G + 15 |B .
Obviously, distributions, whether discrete (as above) or continuous, play a central role in probability theory. Multisets also play a crucial role, for instance, an urn with coloured balls is very naturally represented as a multiset. Similarly, multiple data items in learning scenarios are naturally captured by multisets (see e.g. [12]). The two concepts are related, since any multiset gives rise to a probability distribution, informally by thinking of it as an urn and then drawing from it (sampling with replacement). This paper develops another example of how the interplay between multisets and distributions involves interesting structure, in an emerging area of categorical probability theory. Beliefs as Cones. In brief, following an experiment involving K tosses of a coin, we can set up an urn that approximately simulates the coin according to the frequencies that we have seen. So if we see 5 heads out of 7 tosses, we put 5 black balls into an urn together with 2 white balls, so that the chance of black (=heads) is 57 . We can describe a belief about a coin by describing the probabilities of ending up with certain urns (multisets) after certain experiments. Clearly, if we draw and discard a ball from an urn, then this is the same as stopping the experiment one toss sooner. So a consistent belief about our coin determines a cone over the chain M[1](2) o
◦
M[2](2) o
◦
M[3](2) o
◦
...
(3)
92
B. Jacobs and S. Staton
in the category of finite sets and channels (aka probability kernels). Here M[K](2) is the set of multisets over 2 of size K, i.e. urns containing K balls coloured black and white, and the leftward arrows describe randomly drawing and discarding a ball. We use the notation → for channels, a.k.a. Kleisli map, see below for details. Our formulation of de Finetti’s theorem says that the limit of this chain is [0, 1]. So to give a belief, i.e. a cone over (3), is to give a probability measure on [0, 1]. We set up the theory of multisets and cones in Sect. 3, proving the theorem in Sect. 6. Exchangeable Coalgebras. In many situations, it is helpful to describe a belief about a coin in terms of a stateful generative process, i.e. a channel (aka probability kernel) h : X → 2 × X, for some space X, which gives coinductively a probability for a sequence of outcomes. This is a coalgebra for the functor 2 × (−). An important example is P´ olya’s urn, which is an urn-based process, but one where the urn changes over time. Another important example puts the carrier set X = [0, 1] as the set of possible biases, and the process merely tosses a coin with the given bias. We show that [0, 1] carries the final object among the ‘exchangeable’ coalgebras. Thus there is a canonical channel X → [0, 1], taking the belief described by a generative process such as P´ olya’s urn to a probability measure on [0, 1] such as the uniform distribution. We set up the theory of exchangeable coalgebras in Sect. 4, proving the final coalgebra theorem in Sect. 7 by using the limit reformulation of de Finetti’s theorem.
2
Background on Mulitsets and Distributions
Multisets. A multiset (or bag) is a ‘set’ in which elements may occur multiple times. We shall write M(X) for the set of (finite) multisets with elements from a set X. Elements ϕ ∈ M(X) may be described as functions ϕ : X → N with finite support, that is, with supp(ϕ) := {x ∈ X| ϕ(x) = 0} is finite. Alternatively, multisets can be described as formal sums i ni |xi , where ni ∈ N describes the multiplicity of xi ∈ X, that is, the number of times that the element xi occurs in the multiset. The mapping X → M(X) is a monad on the category Sets of sets and functions. In fact, M(X) with pointwise addition and empty multiset 0, is the free commutative monoid on X. Frequently we need to have a grip on the total number of elements occurring in a multiset. Therefore we write, for K ∈ N, M[K](X) := {ϕ ∈ M(X) | ϕhas precisely K elements} = {ϕ ∈ M(X) | x∈X ϕ(x) = K}. Clearly, M[0](X) is a singleton, containing only the empty multiset 0. Each M[K](X) is a quotient of the set of sequences X K via an accumulator map: XK
accX
/ M[K](X)
with
acc X (x1 , . . . , xK ) :=
x∈X
i
1xi (x) x
(4)
De Finetti’s Construction as a Categorical Limit
93
where 1xi is the indicator function sending x ∈ X to1 if x = xi and to 0 otherwise. In particular, for X = 2, acc 2 (b1 , . . . , bK ) = ( i bi )|1 + (K − i bi )|0. The map acc X is permutation-invariant, in the sense that: acc X (x1 , . . . , xK ) = acc X (xσ(1) , . . . , xσ(K) )
for each permutation σ ∈ SK .
Such permutation-invariance, or exchangeability, plays an important role in de Finetti’s work [5], see the end of Sect. 7. Discrete Probability. A discrete probability distribution on a set X is a finite r |x of elements xi ∈ X with multiplicity ri ∈ [0, 1] such that formal sum i i i r = 1. It can also be described as a function X → [0, 1] with finite support. i i We write D(X) for the set of distributions on X. This gives a monad D on Sets; it satisfies D(1) ∼ = 1 and D(2) ∼ = [0, 1], where 1 = {0} and 2 = {0, 1}. Maps in the associated Kleisli category K(D) will be called channels and are written with a special arrow →. Thus f : X → Y denotes a function f : X → D(Y ). For a distribution ω ∈ D(X) and a channel g : Y → Z we define a new distribution f = ω ∈ D(Y ) and a new channel g ◦· f : X → Z by Kleisli composition:
g ◦· f (x) := g = f (x). f = ω := x ω(x) · f (x)(y) y y∈Y
Discrete Probability over Multisets. For each number K ∈ N and probability r ∈ [0, 1] there is the familiar binomial distribution binom[K](r) ∈ D({0, 1, . . . , K}). It captures probabilities for iterated flips of a known coin, and is given by the convex sum: K k K−k k . binom[K](r) := k · r · (1 − r) 0≤k≤K
The multiplicity probability before |k in this expression is the chance of getting k heads of out K coin flips, where each flip has fixed bias r ∈ [0, 1]. In this way we obtain a channel binom[K] : [0, 1] → {0, 1, . . . , K}. More generally, one can define a multinomial channel: D(X)
mulnom[K] ◦
/ M[K](X).
It is defined as: mulnom[K](ω) :=
ϕ∈M[K](X)
K! · ω(x)ϕ(x) ϕ . x x ϕ(x)!
The binomial channel is a special case, since [0, 1] ∼ = D(2) and {0, 1, . . . , K} ∼ = M[K](2), via k → k|0 + (K − k)|1. The binary case will play an important role in the sequel.
94
B. Jacobs and S. Staton
3
Drawing from an Urn
Recall our motivation. We have an unknown coin, and we are willing to give probabilities to the outcomes of different finite experiments with it. From this we ultimately want to say something about the coin. The only experiments we consider comprise tossing a coin a fixed number K times and recording the number of heads. The coin has no memory, and so the order of heads and tails doesn’t matter. So we can imagine an urn containing black balls and white balls, and in an experiment we start with an empty urn and put a black ball in the urn for each head and a white ball for each tail. The urn forgets the order in which the heads and tails appear. We can then describe our belief about the coin in terms of our beliefs about the probability of certain urns occurring in the different experiments. These probabilities should be consistent in the following informal sense, that connects the different experiments. – If we run the experiment with K + 1 tosses, and then discard a random ball from the urn, it should be the same as running the experiment with just K tosses. In more detail: because the coin tosses are exchangeable, that is, the coin is memoryless, to run the experiment with K + 1 tosses, is to run the experiment with K + 1 tosses and then report some chosen permutation of the results. This justifies representing the outcome of the experiment with an unordered urn. It follows that the statistics of the experiment are the same if we choose a permutation of the results at random. To stop the experiment after K tosses, i.e. to discard the ball corresponding to the last toss, is thus to discard a ball unifomly at random. As already mentioned: an urn filled withcertain objects of the kind x1 , . . . , xn is very naturally described as a multiset i ni |xi , where ni is the number of objects xi in the urn. Thus an urn containing objects from a set X is a multiset in M(X). For instance, an urn with three black and two white balls can be described as a multiset 3|B + 2|W ∈ M(2), where B stands for black and W for white. We can thus formalise a belief about the K-th experiment, where we toss our coin K times, as a distribution ωK ∈ D(M[K](2)), on urns with K black and white balls. For the consistency property, we need to talk about drawing and discarding from an urn. We define a channel that draws a single item from an urn:
D / D X × M[K](X) (5) M[K + 1](X) It is defined as: D(ϕ) :=
x∈supp(ϕ)
ϕ(x) x, ϕ − 1|x . ϕ(y) y
Concretely, in the above example: D 3|B + 2|W = 35 B, 2|B + 2|W + 25 W, 3|B + 1|W .
De Finetti’s Construction as a Categorical Limit
95
Notice that on the right-hand-side we use brackets | − inside brackets | − . The outer | − are for D and the inner ones for M. We can now describe the ‘draw-and-delete’ channel, by post-composing D with the second marginal: M[K + 1](X)
DD := D(π2 )◦D
/ D M[K](X)
ϕ − 1|x . Explicitly, DD(ϕ) = x ϕ(x) ϕ(y) y Now the consistency property requires that the beliefs about the coin comprise a sequence of distributions ωK ∈ D(M[K](2)) such that (DD = ωK+1 ) = ωK . Thus they should form a cone over the sequence M[0](2) o
DD ◦
M[1](2) o
DD ◦
M[2](2) o
DD ◦
... o
DD ◦
M[K](2) o
DD ◦
... (6)
This means that for all K, ωK (ϕ) =
ϕ(B)+1 K+1
· ωK+1 (ϕ + |B ) +
ϕ(W )+1 K+1
· ωK+1 (ϕ + |W ).
(7)
Known Bias is a Consistent Belief. If we know for sure that the bias of a coin is r ∈ [0, 1], then we would put ωK = binom[K](r). This is a consistent belief, which is generalised in the following result to the multinomial case. This cone forms the basis for this paper. Proposition 2. The multinomial channels form a cone for the chain of drawand-delete channels, as in:
Our reformulation of the de Finetti theorem explains the sense in which above cone is a limit when X = 2, see Theorem 8. Proof. For ω ∈ D(X) and ϕ ∈ M[K](X) we have: DD ◦· mulnom[K + 1] (ω)(ϕ) = mulnom[K + 1](ω)(ψ) · DD[K](ψ)(ϕ) ψ = mulnom[K + 1](ω)(ϕ + 1|x ) · ϕ(x)+1 K+1 x (K+1)! (ϕ+1| x )(y) = · ω(y) · ϕ(x)+1 K+1 y (ϕ+1| x )(y)! x y K! = ω(y)ϕ(y) · ω(x) ϕ(y)! · x
y
y
K! · ω(y)ϕ(y) · = x ω(x) y y ϕ(y)! = mulnom[K](ω)(ϕ).
96
B. Jacobs and S. Staton
Unknown Bias (Uniform) is a Consistent Belief. If we do not know the bias of our coin, it may be reasonable to say that all the (K + 1) outcomes in M[K](2) are equiprobable. This determines the discrete uniform distribution υK ∈ D(M[K](2)), υK :=
K k=0
1 K+1 k|B
+ (K − k)|W .
Proposition 3. The discrete uniform distributions υK form a cone for the chain of draw-and-delete channels, as in:
Proof. For ϕ ∈ M[K](2) we have: DD = υK+1 (ϕ) = υK+1 (ψ) · DD[K](ψ)(ϕ) ψ
= υK+1 (ϕ + |W ) · DD[K](ϕ + |W )(ϕ) + υK+1 (ϕ + |B ) · DD[K](ϕ + |B )(ϕ) = = =
4
ϕ(W )+1 ϕ(B)+1 1 1 K+2 · K+1 + K+2 · K+1 ϕ(W )+ϕ(B)+2 (K+2)(K+1) 1 since ϕ(W ) + ϕ(B) = K+1
K.
Coalgebras and P´ olya’s Urn
In the previous section we have seen two examples of cones over the chain (6): one for a coin with known bias (Proposition 2), and one where the bias of the coin is uniformly distributed (Proposition 3). In practice, it is helpful to describe our belief about the coin in terms of a data generating process. This is formalised as a channel h : X → 2 × X, i.e. as a coalgebra for the functor 2 × (−) on the category of channels. The idea is that our belief about the coin is captured by some parameters, which are the states of the coalgebra x ∈ X. The distribution h(x) describes our belief about the outcome of the next coin toss, but also provides a new belief state according to the outcome. For example, if we have uniform belief about the bias of the coin, we predict the outcome to be heads or tails equally, but afterward the toss our belief is refined: if we saw heads, we believe the coin to be more biased towards heads. P´ olya’s urn is an important example of such a coalgebra. We describe it as a multiset over 2 = {0, 1} with a certain extraction and re-fill operation.
De Finetti’s Construction as a Categorical Limit
97
The number 0 corresponds to black balls, and 1 to white ones, so that a multiset ϕ ∈ M(2) captures an urn with ϕ(0) black and ϕ(1) white ones. When a ball of color x ∈ 2 is extracted, it is put back together with an extra ball of the same color. We choose to describe it as a coalgebra on the set M∗ (2) of non-empty multisets over 2. Explicitly, pol : M∗ (2) → 2 × M∗ (2) is given by: pol(ϕ) :=
ϕ(0) ϕ(1) 0, ϕ + 1|0 + 1, ϕ + 1|1 . ϕ(0) + ϕ(1) ϕ(0) + ϕ(1)
So the urn keeps track of the outcomes of the coin tosses so far, and our belief about the bias of the coin is updated according to successive experiments. 4.1
Iterating Draws from Coalgebras
By iterating a coalgebra h : X → 2 × X, we can simulate the experiment with K coin tosses hK : X → M[K](2), by repeatedly applying h. Formally, we build a sequence of channels hK : X → M[K](2) × X by h0 (x) := (0, x) and: hK+1 := X
hK ◦
/ M[K](2) × X
id×h ◦ /
M[K](2) × 2 × X
/ M[K + 1](2) × X
add×id
where the rightmost map add : M[K](2) × 2 → M[K + 1](2) is add(ϕ, b) = ϕ + 1|b . (Another way to see this is to regard a coalgebra h : X → 2 × X in particular as coalgebra X → M(2) × X. Now M(2) × − is the writer monad for the commutative monoid M(2), i.e. the monad for M(2)-actions. Iterated Kleisli composition for this monad gives a sequence of maps h : X → M(2) × X.) For example, starting with P´olya’s urn pol : M∗ (2) → 2 × M∗ (2), we look at a few iterations: 0 pol (b|0 + w|1) = 1 0, b|0 + w|1 1 pol (b|0 + w|1) = pol(b|0 + w|1) b w 1|0, ϕ + |0 + b+w 1|1, ϕ + |1 = b+w 2 b b+1 pol (b|0 + w|1) = b+w 2|0, · b+w+1 (b + 2)|0 + w|1 b w + b+w · b+w+1 1|0 + 1|1, (b + 1)|0 + (w + 1)|1 w b 1|0 + 1|1, (b + 1)|0 + (w + 1)|1 + b+w · b+w+1 w w+1 + b+w · b+w+1 2|1, b|0 + (w + 2)|1 b(b+a) 2|0, (b + 2)|0 + w|1 = (b+w)(b+w+1) 2bw 1|0 + 1|1, (b + 1)|0 + (w + 1)|1 + (b+w)(b+w+1) w(w+1) 2|1, b|0 + (w + 2)|1 . + (b+w)(b+w+1) (10) It is not hard to see that we get as general formula, for K ∈ N: K i0 and measurable M ⊆ [0, 1] as: beta(α, β)(M ) :=
M
xα−1 · (1 − x)β−1 dx. B(α, β)
(20)
We prove some basic properties of the beta-binomial channel. These properties are well-known, but are formulated here in slightly non-standard form, using channels. Lemma 7. 1. A betabinomial channel can itself be decomposed as:
2. The beta-binomial channels R>0 × R>0
betabin[K] ◦
/ M[K](2)
for K ∈ N, form a cone for the infinite chain of draw-and-delete channels (6).
De Finetti’s Construction as a Categorical Limit
103
Proof. 1. Composition in the Kleisli category K(G) of the Giry monad G yields: binom[K] ◦· beta (α, β) k|0+(K − k)|1 1 α−1 K k · (1 − x)β−1 K−k x dx · x = · (1 − x) · k B(α, β) 0 1 α+k−1 0 x · (1 − x)β+(K−k)−1 dx = K k · B(α, β) (18) K B(α + k, β + K − k) = k · B(α,β) (17) = betabin[K](α, β) k|0+(K − k)|1 . 2. The easiest way to see this is by observing that the channels factorise via the binomial channels binom[K] : [0, 1] → M[K](2). In Proposition 2 we have seen that these binomial channels commute with draw-and-delete channels DD.
6
De Finetti as a Limit Theorem
We first describe our ‘limit’ reformulation of the de Finetti theorem and later on argue why this is a reformulation of the original result. Theorem 8. For X = 2 the cone (8) is a limit in the Kleisli category K(G) of the Giry monad; this limit is the unit interval [0, 1] ∼ = D(2). This means that if we have channels cK : Y → M[K](2) ∼ = {0, 1, . . . , K} with DD ◦· cK+1 = cK , then there is a unique mediating channel c : Y → [0, 1] with binom[K] ◦· c = cK , for each K ∈ N. In the previous section we have seen an example with Y = R>0 × R>0 and cK = betabin[K] : Y → M[K](2) and mediating channel c = beta : Y → [0, 1]. Below we give the proof, in the current setting. We first illustrate the central role of the draw-and-delete maps DD : M[K + 1](X) → M[K](X) and how they give rise to hypergeometric and binomial distributions. Via the basic isomorphism M[K](2) ∼ = {0, 1, . . . , K} the draw-and-delete map (7) becomes a map DD : {0, . . . , K + 1} → {0, . . . , K}, given by: DD() =
K+1 |
− 1 +
K+1− K+1 |.
(21)
Notice that the border cases = 0 and = K + 1 are handled implicitly, since the corresponding probabilities are then zero. We can iterate DD, say N times, giving DD N : {0, . . . , K +N } → {0, . . . , K}. If we draw-and-delete N times from an urn with K + N balls, the result is the same as sampling without replacement K times from the same urn. Sampling without replacement is modelled by the hypergeometric distribution. So we can calculate DD N explicitly, using the hypergeometric distribution formula: − k · K+N K−k N k . DD () = 0≤k≤K
K+N K
104
B. Jacobs and S. Staton
Implicitly the formal sum is over k ≤ , since otherwise the corresponding probabilities are zero. As is well known, for large N , the hypergeometric distribution can be approximated by the binomial distribution. To be precise, for fixed K, if ( /K+N ) → p as N → ∞, then the hypergeometric distribution DD N () approaches the binomial distribution binom[K](p) as N goes to infinity. This works as follows. Using = (n − i) = n(n − 1)(n − 2) · · · (n − m + 1) we can write the notation (n) m i