Program = Proof

This course provides a first introduction to the Curry-Howard correspondence between programs and proofs, from a theoretical programmer's perspective: it covers the theory behind logic and programming languages, together with concrete programs (in OCaml) and proofs (in Agda).

Samuel MIMRAM

Contents

0 Introduction
  0.1 Proving instead of testing
  0.2 Typing as proving
  0.3 Checking programs
  0.4 Checking proofs
  0.5 Searching for proofs
  0.6 Foundations
  0.7 In this course
  0.8 Other references on programs and proofs
  0.9 About this document

1 Typed functional programming
  1.1 Introduction
    1.1.1 Hello world
    1.1.2 Execution
    1.1.3 A statically typed language
    1.1.4 A functional language
    1.1.5 Other features
  1.2 Basic constructions
    1.2.1 Declarations
    1.2.2 Functions
    1.2.3 Booleans
    1.2.4 Products
    1.2.5 Lists
    1.2.6 Strings
    1.2.7 Unit
  1.3 Recursive types
    1.3.1 Trees
    1.3.2 Usual recursive types
    1.3.3 Abstract description
    1.3.4 Option types and exceptions
  1.4 The typing system
    1.4.1 Usefulness of typing
    1.4.2 Properties of typing
    1.4.3 Safety
  1.5 Typing as proving
    1.5.1 Arrow as implication
    1.5.2 Other connectives
    1.5.3 Limitations of the correspondence

2 Propositional logic
  2.1 Introduction
    2.1.1 From provability to proofs
    2.1.2 Intuitionism
    2.1.3 Formalizing proofs
    2.1.4 Properties of the logical system
  2.2 Natural deduction
    2.2.1 Formulas
    2.2.2 Sequents
    2.2.3 Inference rules
    2.2.4 Intuitionistic natural deduction
    2.2.5 Proofs
    2.2.6 Fragments
    2.2.7 Admissible rules
    2.2.8 Definable connectives
    2.2.9 Equivalence
    2.2.10 Structural rules
    2.2.11 Substitution
  2.3 Cut elimination
    2.3.1 Cuts
    2.3.2 Proof substitution
    2.3.3 Cut elimination
    2.3.4 Consistency
    2.3.5 Intuitionism
    2.3.6 Commutative cuts
  2.4 Proof search
    2.4.1 Reversible rules
    2.4.2 Proof search
  2.5 Classical logic
    2.5.1 Axioms for classical logic
    2.5.2 The intuition behind classical logic
    2.5.3 A variant of natural deduction
    2.5.4 Cut-elimination in classical logic
    2.5.5 De Morgan laws
    2.5.6 Boolean models
    2.5.7 DPLL
    2.5.8 Resolution
    2.5.9 Double-negation translation
    2.5.10 Intermediate logics
  2.6 Sequent calculus
    2.6.1 Sequents
    2.6.2 Rules
    2.6.3 Intuitionistic rules
    2.6.4 Cut elimination
    2.6.5 Proof search
  2.7 Hilbert calculus
    2.7.1 Proofs
    2.7.2 Other connectives
    2.7.3 Relationship with natural deduction
  2.8 Kripke semantics
    2.8.1 Kripke structures
    2.8.2 Completeness

3 Pure λ-calculus
  3.1 λ-terms
    3.1.1 Definition
    3.1.2 Bound and free variables
    3.1.3 Renaming and α-equivalence
    3.1.4 Substitution
  3.2 β-reduction
    3.2.1 Definition
    3.2.2 An example
    3.2.3 Reduction and redexes
    3.2.4 Confluence
    3.2.5 β-reduction paths
    3.2.6 Normalization
    3.2.7 β-equivalence
    3.2.8 η-equivalence
  3.3 Computing in the λ-calculus
    3.3.1 Identity
    3.3.2 Booleans
    3.3.3 Pairs
    3.3.4 Natural numbers
    3.3.5 Fixpoints
    3.3.6 Turing completeness
    3.3.7 Self-interpreting
    3.3.8 Adding constructors
  3.4 Confluence of the λ-calculus
    3.4.1 Confluence
    3.4.2 The parallel β-reduction
    3.4.3 Properties of the parallel β-reduction
    3.4.4 Confluence and the Church-Rosser theorem
  3.5 Implementing reduction
    3.5.1 Reduction strategies
    3.5.2 Normalization by evaluation
  3.6 Nameless syntaxes
    3.6.1 The Barendregt convention
    3.6.2 De Bruijn indices
    3.6.3 Combinatory logic

4 Simply typed λ-calculus
  4.1 Typing
    4.1.1 Types
    4.1.2 Contexts
    4.1.3 λ-terms
    4.1.4 Typing
    4.1.5 Basic properties of the typing system
    4.1.6 Type checking, type inference and typability
    4.1.7 The Curry-Howard correspondence
    4.1.8 Subject reduction
    4.1.9 η-expansion
    4.1.10 Confluence
  4.2 Strong normalization
    4.2.1 A normalization strategy
    4.2.2 Strong normalization
    4.2.3 First consequences
    4.2.4 Deciding convertibility
    4.2.5 Weak normalization
  4.3 Other connectives
    4.3.1 Products
    4.3.2 Unit
    4.3.3 Coproducts
    4.3.4 Empty type
    4.3.5 Commuting conversions
    4.3.6 Natural numbers
    4.3.7 Strong normalization
  4.4 Curry style typing
    4.4.1 A typing system
    4.4.2 Principal types
    4.4.3 Computing the principal type
    4.4.4 Hindley-Milner type inference
    4.4.5 Bidirectional type checking
  4.5 Hilbert calculus and combinators
  4.6 Classical logic
    4.6.1 Felleisen's C
    4.6.2 The λµ-calculus
    4.6.3 Classical logic as a typing system
    4.6.4 A more symmetric calculus

5 First-order logic
  5.1 Definition
    5.1.1 Signature
    5.1.2 Terms
    5.1.3 Substitutions
    5.1.4 Formulas
    5.1.5 Bound and free variables
    5.1.6 Natural deduction rules
    5.1.7 Classical first order logic
    5.1.8 Sequent calculus rules
    5.1.9 Cut elimination
    5.1.10 Eigenvariables
    5.1.11 Curry-Howard
  5.2 Theories
    5.2.1 Equality
    5.2.2 Properties of theories
    5.2.3 Models
    5.2.4 Presburger arithmetic
    5.2.5 Peano and Heyting arithmetic
  5.3 Set theory
    5.3.1 Naive set theory
    5.3.2 Zermelo-Fraenkel set theory
    5.3.3 Intuitionistic set theory
  5.4 Unification
    5.4.1 Equation systems
    5.4.2 Most general unifier
    5.4.3 The unification algorithm
    5.4.4 Implementation
    5.4.5 Efficient implementation
    5.4.6 Resolution

6 Agda
  6.1 What is Agda?
    6.1.1 Features of proof assistants
    6.1.2 Installation
  6.2 Getting started with Agda
    6.2.1 Getting help
    6.2.2 Shortcuts
    6.2.3 The standard library
    6.2.4 Hello world
    6.2.5 Our first proof
    6.2.6 Our first proof, step by step
    6.2.7 Our first proof, again
  6.3 Basic Agda
    6.3.1 The type of types
    6.3.2 Arrow types
    6.3.3 Functions
    6.3.4 Postulates
    6.3.5 Records
    6.3.6 Modules
  6.4 Inductive types: data
    6.4.1 Natural numbers
    6.4.2 Pattern matching
    6.4.3 The induction principle
    6.4.4 Booleans
    6.4.5 Lists
    6.4.6 Options
    6.4.7 Vectors
    6.4.8 Finite sets
  6.5 Inductive types: logic
    6.5.1 Implication
    6.5.2 Product
    6.5.3 Unit type
    6.5.4 Empty type
    6.5.5 Negation
    6.5.6 Coproduct
    6.5.7 Π-types
    6.5.8 Σ-types
    6.5.9 Predicates
  6.6 Equality
    6.6.1 Equality and pattern matching
    6.6.2 Main properties
    6.6.3 Half of even numbers
    6.6.4 Reasoning
    6.6.5 Definitional equality
    6.6.6 More properties with equality
    6.6.7 The J rule
    6.6.8 Decidable equality
    6.6.9 Heterogeneous equality
  6.7 Proving programs in practice
    6.7.1 Extrinsic vs intrinsic proofs
    6.7.2 Insertion sort
    6.7.3 The importance of the specification
  6.8 Termination
    6.8.1 Termination and consistency
    6.8.2 Structural recursion
    6.8.3 A bit of computability
    6.8.4 The number of bits
    6.8.5 The fuel technique
    6.8.6 Well-founded induction
    6.8.7 Division and modulo

7 Formalization of important results
  7.1 Safety of a simple language
  7.2 Natural deduction
  7.3 Pure λ-calculus
    7.3.1 Naive approach
    7.3.2 De Bruijn indices
    7.3.3 Keeping track of free variables
    7.3.4 Normalization by evaluation
    7.3.5 Confluence
  7.4 Combinatory logic
  7.5 Simply typed λ-calculus
    7.5.1 Definition
    7.5.2 Strong normalization
    7.5.3 Normalization by evaluation

8 Dependent type theory
  8.1 Core dependent type theory
    8.1.1 Expressions
    8.1.2 Free variables and substitution
    8.1.3 Contexts
    8.1.4 Definitional equality
    8.1.5 Sequents
    8.1.6 Rules for contexts
    8.1.7 Rules for equality
    8.1.8 Axiom rule
    8.1.9 Terms and rules for type constructors
    8.1.10 Rules for Π-types
    8.1.11 Admissible rules
  8.2 Universes
    8.2.1 The type of Type
    8.2.2 Russell's paradox in type theory
    8.2.3 Girard's paradox
    8.2.4 The hierarchy of universes
  8.3 More type constructors
    8.3.1 Empty type
    8.3.2 Unit type
    8.3.3 Products
    8.3.4 Dependent sums
    8.3.5 Coproducts
    8.3.6 Booleans
    8.3.7 Natural numbers
    8.3.8 Other type constructors
  8.4 Inductive types
    8.4.1 W-types
    8.4.2 Rules for W-types
    8.4.3 More inductive types
    8.4.4 The positivity condition
    8.4.5 Disjointedness and injectivity of constructors
  8.5 Implementing type theory
    8.5.1 Expressions
    8.5.2 Evaluation
    8.5.3 Convertibility
    8.5.4 Typechecking
    8.5.5 Testing

9 Homotopy type theory
  9.1 Identity types
    9.1.1 Definitional and propositional equality
    9.1.2 Propositional equality in Agda
    9.1.3 The rules
    9.1.4 Leibniz equality
    9.1.5 Extensionality of equality
    9.1.6 Uniqueness of identity proofs
  9.2 Types as spaces
    9.2.1 Intuition about the model
    9.2.2 The structure of paths
  9.3 n-types
    9.3.1 Propositions
    9.3.2 Sets
    9.3.3 n-types
    9.3.4 Propositional truncation
  9.4 Univalence
    9.4.1 Operations with paths
    9.4.2 Equivalences
    9.4.3 Univalence
    9.4.4 Applications of univalence
    9.4.5 Describing identity types
    9.4.6 Describing propositions
    9.4.7 Incompatibility with set theoretic interpretation
    9.4.8 Equivalences
    9.4.9 Function extensionality
    9.4.10 Propositional extensionality
  9.5 Higher inductive types
    9.5.1 Rules for higher types
    9.5.2 Paths over
    9.5.3 Circle as a higher inductive type
    9.5.4 Useful higher inductive types

A Appendix
  A.1 Relations
    A.1.1 Definition
    A.1.2 Closure
    A.1.3 Quotient
    A.1.4 Congruence
  A.2 Monoids
    A.2.1 Definition
    A.2.2 Free monoids
  A.3 Well-founded orders
    A.3.1 Partial orders
    A.3.2 Well-founded orders
    A.3.3 Lexicographic order
    A.3.4 Trees
    A.3.5 Multisets
  A.4 Cantor's diagonal argument
    A.4.1 A general Cantor argument
    A.4.2 Agda formalization

Chapter 0

Introduction

These are the extended notes for the INF551 course which I taught at École Polytechnique starting from 2019. The goal is to give a first introduction to the Curry-Howard correspondence between programs and proofs, from a theoretical programmer’s perspective: we want to understand the theory behind logic and programming languages, but also to write concrete programs (in OCaml) and proofs (in Agda). Although most of the material is self-contained, the reader is supposed to be already acquainted with logic and programming.

0.1 Proving instead of testing

Most current software development is validated by performing tests: we run the programs with various values of the parameters, chosen in order to cover most branches of the program, and, if no bug occurs during those executions, we consider the program good enough for production. The rationale is that if the program uses “small” constants and “regular enough” functions, then a large number of tests should cover all the general behaviors of the program. Seriously following such a discipline greatly reduces the number of bugs, especially the simple ones, but we all know that it does not completely eliminate them: in some very particular and unlucky situations, problems still do happen.

In mathematics, the usual approach is quite different. For instance, when proving a property P(n) over natural numbers, a typical mathematician will not test that P(0) holds, P(1) holds, P(2) holds, and so on, up to a big number, and, if the property is experimentally always verified, claim: “I am almost certain that the property P is always true”. He will maybe perform some tests in order to check whether the conjecture is plausible, but in the end he will write down a proof, which ensures that the property P(n) is always verified, for eternity, even if someone makes a particularly unlucky or perverse choice for n. Proving instead of testing does require some extra work, but the confidence it brings to the results is incomparable.

Let us present an extreme example of why this is the right way to proceed. On day one, our mathematician finds out, using a formal computation software, that

    \int_0^\infty \frac{\sin(t)}{t} \, dt = \frac{\pi}{2}

On day two, he tries to play around a bit with such formulas and finds out that

    \int_0^\infty \frac{\sin(t)}{t} \frac{\sin(t/101)}{t/101} \, dt = \frac{\pi}{2}

On day three, he thinks maybe a pattern could emerge and discovers that

    \int_0^\infty \frac{\sin(t)}{t} \frac{\sin(t/101)}{t/101} \frac{\sin(t/201)}{t/201} \, dt = \frac{\pi}{2}


On day four, he gets quite confident and conjectures that, for every n ∈ N,

    \int_0^\infty \left( \prod_{i=0}^{n} \frac{\sin(t/(100i+1))}{t/(100i+1)} \right) dt = \frac{\pi}{2}

He then spends the rest of the year heating his computer and the planet, successfully proving the conjecture for increasing values of n. This approach seems justified, since the most complicated function involved here is the sine, which is quite regular (it is periodic), and all the constants are small (we get factors such as 100), so that if something bad ought to happen, it should happen for a not-so-big value of n and testing should discover it. In fact, the conjecture breaks starting at

    n = 15 341 178 777 673 149 429 167 740 440 969 249 338 310 889

and none of the usual tests would have found this out. There is a nice explanation for this, which we will not give here, see [BB01, Bae18], but the moral is: if you want to be sure of something, don’t test it, prove it.

On the computer science side, analogous examples abound where errors have been found in programs, although heavily tested. They have recently increased with the advent of parallel computing (for instance, in order to exploit all the cores of your laptop or even your smartphone), where bugs might be triggered by some particular and very rare scheduling of processes. Already in the 70s, Dijkstra claimed that “program testing can be used to show the presence of bugs, but never to show their absence!” [Dij70], and the idea of formally verifying programs can be traced back 20 years earlier, to, as usual, Turing [Tur49]. If we want software we can really trust (and not trust most of the time), we should move from testing to proving in computer science too. In this course, you will learn how to perform such proofs, as well as the theory behind them.

Actually, the most complicated program we will prove correct here is a sorting algorithm, and I can already hear you thinking “come on, we have been writing sorting algorithms for decades, we should know how to write one by now”. While I understand your point, I have two arguments to provide. Firstly, proving a more realistic program is only a matter of time (and experience): the course covers most of the concepts required to perform proofs, and attacking full-fledged code will not require new techniques, only patience. Secondly, in 2015, some researchers found out, using formal methods, that the default sorting algorithm (the one in the standard library, not in some obscure library found on GitHub) of both Python and Java (not some obscure programming languages) was flawed, and the bug had been standing there for more than a decade [dGdBB+19]...

0.2 Typing as proving

But how can we achieve this goal of applying proof techniques to programs? It turns out that we do not even need to come up with new ideas, thanks to the so-called proof-as-program correspondence discovered in the 1960s by Curry and Howard: a program is secretly the same as a proof! More precisely, in a typed functional programming language, the type of a program can be read as a formula, and the program itself contains exactly the information required to prove this formula. This is the one thing to remember from this course:

    PROGRAM = PROOF

This deep relationship allows the use of techniques from mathematics in order to study programs, but it can also be used to extract computational content from proofs in mathematics. The goal of this course is to give a precise meaning to this vague description, but let us first give an example in order to understand it better. In a functional language such as OCaml, we can write a function such as

    let comp f g x = g (f x)

and the compiler will automatically infer a type for it. Here, it will be

    ('a -> 'b) -> ('b -> 'c) -> ('a -> 'c)

meaning that, for any types 'a, 'b and 'c,

  - if f is a function which takes a value of type 'a as argument and returns a value of type 'b,
  - if g is a function which takes a value of type 'b as argument and returns a value of type 'c, and
  - if x is a value of type 'a,

then the result is of type 'c. For instance, with the function succ of type int -> int (it adds one to an integer) and the function string_of_int of type int -> string (it converts an integer to a string), the expression

    comp succ string_of_int 2

will be of type string (it will evaluate to "3"). Now, if we read -> as a logical implication ⇒, the type can be written as

    (A ⇒ B) ⇒ (B ⇒ C) ⇒ (A ⇒ C)

which is a valid formula. This is not by chance: in some sense, the program comp can be considered as a way of proving that this formula is valid. Of course, if we want to prove richer properties of programs (or use programs to prove more interesting formulas), we should use a logic which is more expressive than propositional logic. In this course, we will present dependent types, which achieve this while keeping the proof-as-program correspondence. For instance, Euclidean division, which computes the quotient and remainder of two integers, is usually given the type

    int -> int -> int * int

stating that it takes two integers as arguments and returns a pair of integers. This typing is very weak, in the sense that there are many different functions which also have this type. With dependent types, we will be able to give it the type

    (m : int) → (n : int') → Σ(q : int).Σ(r : int).((m = nq + r) × (0 ≤ r < n))


which can be read as the formula

    ∀m ∈ int. ∀n ∈ int'. ∃q ∈ int. ∃r ∈ int. ((m = nq + r) ∧ (0 ≤ r < n))

and entirely specifies its behavior (here, int' stands for the type of non-zero integers, division by zero being undefined).
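To see concretely why the simple type is weak, here is a minimal OCaml sketch (the names div_mod and bogus_div_mod are ours, and we ignore signs and division by zero): both functions have the type int -> int -> int * int discussed above, so the type alone cannot tell them apart.

    (* Euclidean division: returns the pair (quotient, remainder). *)
    let div_mod m n = (m / n, m mod n)

    (* Also of type int -> int -> int * int, yet obviously not a division:
       it returns the pair in the wrong order. *)
    let bogus_div_mod m n = (m mod n, m / n)

The dependent type above rules out bogus_div_mod, since its result does not satisfy m = nq + r with 0 ≤ r < n.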

0.3 Checking programs

In order to help develop proofs with the aid of a computer, people have designed proof assistants such as Coq or Agda: those are programs which help the user gradually develop proofs (which is necessary due to their typical size) and automatically check that they are correct. While those have progressed much over the years, in practice, proving that a program is correct still takes much more time than testing it (but, again, the result is infinitely superior). For this reason, they have been used in areas where there is a strong incentive to do so: applications where human lives (or large amounts of money) are involved.

This technology is part of a bigger family of tools and techniques called formal methods, whose aim is to guarantee the functioning of programs, with various levels of automation and expressiveness. As usual, the more precise the invariants you are going to prove about your programs, the less automated you can be:

    expressiveness
        ^
        |   proof assistants (Agda, Coq, ...)
        |
        |           Hoare logic (Why3, ...)
        |
        |                   abstract interpretation (AbsInt, Astrée, ...)
        +-----------------------------------------------------> automation

There is quite a number of industrial success stories for formal methods. For instance, the line 14 of the Paris metro and the CDGVAL shuttle at Roissy airport have been proved correct using the B method, Airbus is heavily using various formal tools (AbsInt, Astrée, CompCert, Frama-C), etc. We should also mention here the CompCert project, which provides a fully certified compiler for a (subset of) C: even if your program is proved to be bug-free, an unverified compiler (which is not an easy piece of software) might itself introduce problems into your program...

The upside of automated methods is that, of course, they allow for reaching one’s goal much more quickly. Their downside is that, when they do not apply to a particular problem, one is left without a way out. On the other hand, virtually any property can be shown in a proof assistant, provided that one is smart enough and has enough time to spend.

0.4 Checking proofs

Verifying that a proof is correct is a task which can be performed automatically and efficiently (it amounts to checking that a program is correctly typed); in contrast, finding a proof is an art. The situation is somewhat similar to analysis in mathematics, where differentiating a function is a completely mechanical task, while integrating requires the use of many methods and tricks [Mun19]. This means that computer science can also be of some help to mathematicians: we can formalize mathematical proofs in proof assistants and ensure that no one has made a mistake.

And it happens that mathematicians, even famous ones, make subtle mistakes. For instance, in the 1990s, the Fields medalist Voevodsky wrote a paper solving an important conjecture of Grothendieck [KV91], which roughly states that spaces are the same as strict ∞-categories in which all morphisms are weakly invertible (don’t worry if you do not precisely understand all the terms in this sentence). A few years later, this was shown to be wrong because someone provided a counterexample [Sim98], but no one could exactly point out what the mistake was in the original proof. Because of this, Voevodsky thought for more than 20 years that his proof was still correct. Understanding that there was indeed a mistake led him to use proof assistants for all his proofs and, in fact, to propose a new foundation for mathematics based on logic, which is nowadays called homotopy type theory [Uni13]. Quoting him [Voe14]:

    I now do my mathematics with a proof assistant and do not have to worry all the time about mistakes in my arguments or about how to convince others that my arguments are correct. But I think that the sense of urgency that pushed me to hurry with the program remains. Sooner or later computer proof assistants will become the norm, but the longer this process takes the more misery associated with mistakes and with unnecessary self-verification the practitioners of the field will have to endure.

As a much simpler example, suppose that we want to prove that all horses have the same color (sic). We show by induction on n the property P(n) = “every set of n horses is monochromatic”. For n = 0 and n = 1, the property is obvious. Now suppose that P(n) holds and consider a set H of n + 1 horses. We can picture H as a big set, in which we can pick two distinct elements (horses) h1 and h2 and consider the sets H1 = H \ {h2} and H2 = H \ {h1}:

    [Figure: the set H, with the two overlapping subsets H1 (omitting h2) and H2 (omitting h1).]

By induction hypothesis, all the horses in H1 have the same color and all the horses in H2 have the same color. Therefore, by transitivity, all the horses in H have the same color. Of course this proof is not valid, because we all know that there are horses of various colors (can you spot the mistake?). Formalizing the proof in a proof assistant would force you to fill in all the details, thus removing the possibility of errors hiding in vague arguments, and would ensure that the arguments given are actually valid, so that flaws such as the one in the above proof are uncovered. This is not limited to small reasoning: large and important proofs have been fully checked in Coq, for instance the four color theorem [Gon08] in graph theory or the Feit-Thompson theorem [GAA+13] in group theory.

0.5 Searching for proofs

Closely related to proof checking is proof search, or automated theorem proving, i.e. having the computer try to find a proof of a given formula by itself. For simple enough fragments of logic (e.g. propositional logic), this can be done: proof theory allows us to carefully design efficient proof search procedures. For richer logics, it quickly becomes undecidable. However, modern proof assistants (e.g. Coq) have so-called tactics which can fill in some specific proofs, even though the logic is rich. Typically, they are able to take care of showing boring identities such as (x + y) − x = y in abelian groups.

Understanding proof theory allows us to formulate problems in a logical fashion and solve them. It thus applies to various fields, even outside theoretical computer science. For instance, McCarthy, a founder of Artificial Intelligence (the name is due to him!), was a strong advocate of using mathematical logic to represent knowledge and manipulate it [McC60]. Neural networks are admittedly more fashionable these days, but one never knows what the future will be made of.

Although we will see some proof search techniques in this course, this will not be a central subject. The reason is that the main message is that we should take proofs seriously: since a proof is the same as a program, we are not interested in provability, but rather in proofs themselves, and proof search techniques give us very little control over the proofs they produce.

0.6 Foundations

At the beginning of the 20th century, some annoying paradoxes surfaced in mathematics, such as Russell’s paradox, motivating Hilbert’s program to provide an axiomatization on which all mathematics could be founded, and to show that this axiomatization is consistent: this is sometimes called the foundational crisis. Although Gödel’s incompleteness theorems established that there is no definite answer to this question, various formalisms have been proposed in which one can develop most of usual mathematics. One of the most widely used is set theory, as axiomatized by Zermelo and Fraenkel, but other formalisms have been proposed, such as Russell’s theory of types [WR12], from which current type theory originates: in fact, type theory can be taken as a foundation of mathematics. People usually see set theory as more fundamental, since we can see a type as representing a set (e.g. A ⇒ B is the set of functions from the set corresponding to A to the one corresponding to B), but we can also view type theory as more fundamental, since we can formalize set theory in type theory. Which one you take as foundations is a matter of taste: are you more into chickens or into eggs?

Type theory also provides a solid framework in which one can study basic philosophical questions such as: What is reasoning? What is a proof? If I know that something exists, do I know this thing? What does it mean for two things to be equal? And so on. We could spend pages discussing those matters (and others have done so), but we rather like to formalize things, and we will see that very satisfactory answers to those questions can be given with a few inference rules. The meaning of life remains an open question, though.

By taking an increasingly important part in our lives and influencing the way we see the (mathematical) world, it has even evolved for some of us into some sort of religion based on computational trinitarianism, which stems from the observation that computation manifests itself in three forms [Har11]:

            categories
            /        \
        logic ------ programming

The aim of the present text is to explain the bottom line of the above diagram, leaving categories to other books [Mac71, LS88, Jac99]. Another closely related religion is constructivism, a doctrine according to which something can be accepted only if it can actually be constructed. It will play a central role here, because programs precisely constitute a means to describe the construction of things.

0.7 In this course

As a first introduction to functional and typed languages, we begin by presenting OCaml in chapter 1, in which most of the example programs given here are written. We present propositional logic in chapter 2 (the proofs), the λ-calculus in chapter 3 (the programs), and the simply typed λ-calculus in chapter 4 (the programs are the proofs). We then generalize the correspondence between proofs and programs to richer logics: we present first-order logic in chapter 5, and the proof assistant Agda in chapter 6, which is used in chapter 7 to formalize the most important results of this book, and is based on the dependent types detailed in chapter 8. We finally give an introduction to the recent developments in homotopy type theory in chapter 9.

0.8 Other references on programs and proofs

Although we claim some originality in the treatment and the covered topics, this book is certainly not the first one on the subject. Excellent references include Girard’s Proofs and Types [Gir89], Girard’s Blind Spot [Gir11], Leroy’s Collège de France course Programmer = démontrer ? La correspondance de Curry-Howard aujourd’hui, Pierce’s Types and Programming Languages [Pie02], Sørensen and Urzyczyn’s Lectures on the Curry-Howard isomorphism [SU06], the “HoTT book” Homotopy Type Theory: Univalent Foundations of Mathematics [Uni13], Programming Language Foundations in Agda [WK19] and Software Foundations [PdAC+10].

0.9 About this document

This book was first published in 2020, and the version you are currently reading was compiled on February 15, 2021. Regular revisions can be expected if mistakes are found. Should you find one, please send me a mail at [email protected].

Reading on the beach. A printed copy of this course can be ordered from Amazon: https://www.amazon.fr/dp/B08C97TD9G/.

Color of the cover. In case you wonder, the color of the cover was chosen because it seemed obvious to me that program = proof = purple.

Code snippets. Most of the code shown in this book is excerpted from larger files which are regularly compiled in order to ensure their correctness. The process of extracting snippets for inclusion into LaTeX is automated with a tool whose code is freely available at https://github.com/smimram/snippetor.

Thanks. Many people have (knowingly or not) contributed to the elaboration of this book. I would like to particularly express my thanks to David Baelde, Olivier Bournez, Eric Finster, Emmanuel Haucourt, Stéphane Lengrand, Assia Mahboubi, Paul-André Melliès, Gabriel Scherer, Pierre-Yves Strub and Benjamin Werner. I would also like to thank the readers of the book who have suggested corrections and improvements: Eduardo Jorge Barbosa, Brian Berns, Florian Chudigiewitsch, Chhi’mèd Künzang, Yuxi Liu and Uma Zalakain.

Chapter 1

Typed functional programming

1.1 Introduction

As an illustration of typed functional programming, we present here the OCaml programming language, which was developed by Leroy and collaborators, following ideas from Milner. We recall some of the basics of the language, both because it will be used to provide illustrative implementations, and because we will detail the theory behind it and generalize it in later chapters. This is not meant to be a complete introduction to programming in OCaml: advanced courses and documentation can be found on the website http://ocaml.org/, as well as in books [CMP00, MMH13]. After a brief tour of the language, we present the most important constructions in section 1.2 and detail recursive types, which are the main way of constructing types throughout the book, in section 1.3. In section 1.4, we present the ideas behind the typing system and the guarantees it brings. Finally, we illustrate how types can be thought of as formulas in section 1.5.

1.1.1 Hello world. The mandatory “Hello world!” program, which prints Hello world!, can be written as follows:

    (* Our first program. *)
    print_endline "Hello, world!"

This illustrates the concise syntax of the language (compared to Java for instance). Comments are written using (* ... *). Application of a function to arguments does not require parentheses. Indentation is not significant in programs (contrary to e.g. Python), but you are of course strongly encouraged to indent your programs correctly.

1.1.2 Execution. Programs written in OCaml can be compiled to efficient native code using ocamlopt, but there is also a “toplevel” which allows commands to be evaluated interactively. It can be launched by typing ocaml, or utop if you want a fancier version. For instance:

    # 2 + 2;;
    - : int = 4

We have typed 2 + 2, followed by ;; to indicate the end of our program. The toplevel then indicates that this is an integer (it is of type int) and that the result is 4. We call a value an expression which cannot be reduced further: 2+2 is not a value whereas 4 is; the execution of a program consists in reducing expressions to values in an appropriate way (e.g. 2+2 reduces to 4).
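To complement the toplevel session, here is a sketch of the compilation workflow with ocamlopt, assuming the hello world program above has been saved in a file named hello.ml (the file name is our choice):

    $ ocamlopt -o hello hello.ml
    $ ./hello
    Hello, world!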


1.1.3 A statically typed language. The OCaml language is typed, meaning that every term has a type indicating the kind of data it is. A type can be thought of as a particular set of values: e.g. int represents the set of integers, string represents the set of strings, and so on. In this way, the expressions 2+2 and 4 have the type int (they are integers), and the function string_of_int, which gives the string representation of an integer, has type int -> string, meaning that it is a function which takes an integer as argument and returns a string.

Moreover, typing is statically checked: when compiling a program, the compiler ensures that all the types match and that we use values of the expected type. For instance, if we try to compile the program

    let s = string_of_int 3.2

the compiler will complain with

    Error: This expression has type float but an expression was expected of type int

because the string_of_int function expects an integer whereas we have provided a floating-point number as argument. This discipline is very strict in OCaml (we sometimes say that the typing is strong): it ensures that the program will not raise an error during execution because an unexpected value was provided to a function (this is a theorem!). Otherwise said, quoting Milner [Mil78]: “Well-typed programs cannot go wrong.”

Moreover, types are inferred, meaning that the user never has to specify them: they are guessed automatically. For instance, in the definition

    let f x = x + 1

we know that the addition takes two integers as arguments and returns an integer: therefore x must be of type int and the compiler infers that f must be a function of type int -> int. However, if for some reason we want to specify the types, it is still possible:

    let f (x : int) : int = x + 1

1.1.4 A functional language. The language is functional, meaning that it has good support for defining functions and manipulating them just like any other expressions. For instance, suppose that we have a list l of integers and we want to double all its elements. We can use the List.map function from the standard library, which is of type

    ('a -> 'b) -> 'a list -> 'b list

meaning that it takes as arguments a function f of type 'a -> 'b (here 'a and 'b stand for “any type”) and a list whose elements are of type 'a, and returns the list whose elements are of type 'b obtained by applying f to all the elements of the list. We can then define the “doubled list” by

    let l2 = List.map (fun x -> 2 * x) l

where we apply the function x ↦ 2 × x to every element. Note that the function List.map takes a function as argument and that we did not even have to give a name to this function, thanks to the construction fun (such a function is sometimes called anonymous).
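As a self-contained variant of the doubling example (the particular list contents are our own illustration):

    let l = [1; 2; 3]
    let l2 = List.map (fun x -> 2 * x) l
    (* l2 is [2; 4; 6] *)
    let () = List.iter (fun n -> Printf.printf "%d " n) l2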


1.1.5 Other features. There are some other important features of the OCaml language, which we mention only briefly here, because we will not use them much.

References. As in most programming languages, OCaml has support for values that we can modify, called references here: they can be thought of as memory cells, from which we can retrieve the value and also change it. We can create a reference r containing x with let r = ref x, then we can obtain its contents with !r and change it to y with r := y. For instance, incrementing a counter 10 times is done by

let () =
  let r = ref 0 in
  for i = 0 to 9 do
    r := !r + 1
  done

Garbage collection. Contrary to languages such as C, OCaml has a garbage collector which takes care of allocating and freeing memory. This means that freeing the memory for a value which is not used anymore is taken care of for us. This prevents many common bugs, such as writing in a part of memory which was already freed, or freeing a memory region twice by mistake.

Other traits. In addition to the functional programming style, OCaml has support for many other styles of programming, including imperative (e.g. the references described above), objects, etc. OCaml also has support for records, arrays, modules, generalized algebraic data types, etc.
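As a quick illustration, here is a sketch of a toplevel session manipulating a reference:

# let r = ref 0;;
val r : int ref = {contents = 0}
# r := !r + 1;;
- : unit = ()
# !r;;
- : int = 1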

1.2 Basic constructions

1.2.1 Declarations. Declarations of values are performed with the let keyword. For instance,

let x = 3

declares that the variable x refers to the value 3. Note that, unless we explicitly use the reference mechanism, these values cannot be modified. An expression can also contain some local definitions which are only valid in the rest of the expression. For instance, in

let x = let y = 2 in y * y

the definition of the variable y is only valid in the following expression y * y. A program consists of a sequence of such definitions. In the case where a function “does not return anything”, such as printing, by convention it returns a value of type unit, and we generally use the construction let () = ... in order to ensure that this is the case. For instance:

let x = "hello"
let () = print_string ("The value of x is " ^ x)


1.2.2 Functions. Functions are also defined using let definitions, specifying the arguments after the name of the variable. For instance,

let add x y = x + y

which would be of type

int -> int -> int

Note that arrows are implicitly bracketed on the right: this type means

int -> (int -> int)

Application of a function to arguments is obtained by juxtaposing the function and the arguments, e.g. let x = add 3 4 (no need for parentheses). There is support for partial application, meaning that we do not have to give all the arguments to a function (functions are sometimes called curryfied). For instance, the incrementation of an integer can be defined by

let incr = add 1

The value incr thus defined is the function which takes an argument y and returns add 1 y, so that the above definition is equivalent to

let incr y = add 1 y

This is in accordance with the bracketing of the type above: add is a function which, when given an integer argument, returns a function of type int -> int. As mentioned above, anonymous functions can be defined by the construction fun x -> .... The add function could thus have equivalently been defined by

let add = fun x y -> x + y

or even

let add x = fun y -> x + y

Functions can be recursive, meaning that they can call themselves. In this case, the rec keyword has to be used. For instance, the factorial function is defined by

let rec fact n = if n = 0 then 1 else n * fact (n - 1)

1.2.3 Booleans. The type corresponding to booleans is bool, its two values being true and false. The usual operators are present: conjunction &&, disjunction ||, and negation not. In order to test whether two values x and y are equal or different, one should use x = y and x <> y. Booleans can be used in conditional branchings

if ... then ... else ...
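Partial application can again be observed in the toplevel; a short session sketch:

# let add x y = x + y;;
val add : int -> int -> int = <fun>
# let incr = add 1;;
val incr : int -> int = <fun>
# incr 41;;
- : int = 42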


or loops

while ... do ... done

as illustrated above. Beware that the operators == and != also exist, but they compare values physically, i.e. they check whether the values have the same memory location, not whether they have the same contents. For instance, using the toplevel, we have:

# let x = ref 0;;
val x : int ref = {contents = 0}
# let y = ref 0;;
val y : int ref = {contents = 0}
# x = x;;
- : bool = true
# x = y;;
- : bool = true
# x == x;;
- : bool = true
# x == y;;
- : bool = false

1.2.4 Products. The pair of x and y is written x,y. For instance, we can consider the pair 3,"hello" which has the product type int * string (it is a pair consisting of an integer and a string). Note that addition could have been defined as

let add' (x,y) = x + y

resulting in a slightly different function than above: it now has the type (int * int) -> int, meaning that it takes one argument, which is a pair of integers, and returns an integer. This means that partial application is not directly available as before, although we could still write

let incr' = fun y -> add' (1,y)

1.2.5 Lists. We quite often use lists in OCaml. The empty list is denoted [], and x::l is the list obtained by putting the value x before the list l. Most expected functions on lists are available in the module List. For instance,
– @ concatenates two lists,
– List.length computes the length of a list,
– List.map maps a function on all the elements of a list,
– List.iter executes a function on all the elements of a list,
– List.mem tests whether a value belongs to a list.
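A few of these functions in action in the toplevel (session sketch):

# [1; 2] @ [3];;
- : int list = [1; 2; 3]
# List.length [1; 2; 3];;
- : int = 3
# List.map (fun x -> 2 * x) [1; 2; 3];;
- : int list = [2; 4; 6]
# List.mem 4 [2; 4; 6];;
- : bool = true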


1.2.6 Strings. Strings are written as "this is a string" and the related functions can be found in the module String. For instance, the function String.length computes the length of a string and String.sub computes a substring (at given indices) of a string. Concatenation is obtained by ^.

1.2.7 Unit. In OCaml, the type unit contains only one element, written (). As explained above, this is the value returned by functions which only have an effect and return no meaningful value (e.g. printing a string). It is also quite useful in order to define functions which have an effect. For instance, if we define

let f = print_string "hello"

the program will write “hello” at the beginning of the execution, because the expression defining f is evaluated. However, if we define

let f () = print_string "hello"

nothing will be printed, because we define a function taking a unit as argument. In the course of the program, we can then use f () in order to print “hello”.

1.3 Recursive types

A very useful way of defining new data types in OCaml is by recursive types, whose elements are constructed from other types according to specific constructions, called constructors.

1.3.1 Trees. As a first example, consider trees (more specifically, planar binary trees with integer labels) such as the following one (tree diagram omitted; it is the tree t constructed below, with root 3).
Here, a tree consists of finitely many nodes which are labeled by an integer and can either have two sons, which are themselves trees, or none (in which case they are called leaves). This description translates immediately to the following type definition in OCaml:

type tree =
  | Node of int * tree * tree
  | Leaf of int

This says that a tree is recursively characterized as being either Node applied to a triple consisting of an integer and two trees, or Leaf applied to an integer. For instance, the above tree is represented as

let t = Node (3, Node (4, Leaf 1, Node (3, Leaf 5, Leaf 2)), Leaf 1)


Here, Node and Leaf are not functions (Leaf 1 does not reduce to anything): they are called constructors. By convention, constructors have to begin with a capital letter, in order to distinguish them from functions (which have to begin with a lowercase letter).

Pattern matching. Any element of the type tree is obtained as a constructor applied to some arguments. OCaml provides the match construction which allows distinguishing between the various possible cases of constructors and returning a result accordingly: this is called pattern matching. For instance, the function computing the height of a tree can be programmed as

let rec height t =
  match t with
  | Node (n, t1, t2) -> 1 + max (height t1) (height t2)
  | Leaf n -> 0

Here, Node (n, t1, t2) is called a pattern and n, t1 and t2 are variables which will be bound to the values corresponding to the tree we are currently matching, and could be given any other name. As another example, the sum of the labels in a tree can be computed as

let rec sum t =
  match t with
  | Node (n, t1, t2) -> n + sum t1 + sum t2
  | Leaf n -> n

It is sometimes useful to add conditions to matching cases, which can be done using the when construction. For instance, if we wanted to match only nodes with strictly positive labels, we could have, in our pattern matching, a case of the form

| Node (n, t1, t2) when n > 0 -> ...

In the case where multiple patterns apply, the first one is chosen. OCaml tests that all the possible values are handled in a pattern matching (and issues a warning otherwise). Finally, one can write let f = function ... instead of let f x = match x with ... (this shortcut was introduced because it is very common to directly match on the argument of a function).
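For instance, the sum function above can equivalently be written with this shortcut as:

let rec sum = function
  | Node (n, t1, t2) -> n + sum t1 + sum t2
  | Leaf n -> n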


although OCaml chose not to do that for performance reasons. A case construction if b then e1 else e2 could then be encoded as

match b with
| True -> e1
| False -> e2

Lists. Lists are also a recursive type:

type 'a list =
  | Nil
  | Cons of 'a * 'a list

In OCaml, [] is a notation for Nil and x::l a notation for Cons (x, l). The length of a list can be computed as

let rec length l =
  match l with
  | x::l -> 1 + length l
  | [] -> 0

Note that the type of lists is parametrized over a type 'a. We are thus able to define, at once, the type of lists containing elements of type 'a, for any type 'a.

Coproducts. We have seen that the elements of a product type 'a * 'b are pairs x,y consisting of an element x of type 'a and an element y of type 'b. We can define coproducts, consisting of an element of type 'a or an element of type 'b, by

type ('a, 'b) coprod =
  | Left of 'a
  | Right of 'b

An element of this type is either of the form Left x with x of type 'a, or of the form Right y with y of type 'b. For instance, we can define a function which provides the string representation of a value which is either an integer or a float by

let to_string = function
  | Left n -> string_of_int n
  | Right x -> string_of_float x

which is of type (int, float) coprod -> string.

Unit. The type unit has () as its only value. It could have been defined as

type unit = | T

with () being a notation for T.


Empty type. The “empty type” can be defined as

type empty = |

i.e. a recursive type with no constructor (we still need to write | for syntactical reasons). There is thus no value of that type.

Natural numbers. Natural numbers, in unary notation, can be defined as

type nat =
  | Zero
  | Suc of nat

(any natural number is either 0 or the successor of a natural number) and addition programmed as

let rec add m n =
  match m with
  | Zero -> n
  | Suc m -> Suc (add m n)

Of course, it would be a bad idea to use this type for heavy computations: int provides access to machine binary integers (and thus natural numbers), which are much more efficient.

1.3.3 Abstract description. As indicated before, a type can be thought of as a set of values. We would now like to briefly sketch a mathematical definition of the set of values corresponding to inductive types. Suppose fixed a set U, which we can think of as the set of all possible values an OCaml program can manipulate. We write P(U) for the powerset of U, i.e. the set of all subsets of U, which is ordered by inclusion. Any recursive definition induces a function F : P(U) → P(U) sending a set X to the set obtained by applying the constructors to the elements of X. For instance, with the definition tree of section 1.3.1, the induced function is

F(X) = {Node(n, t1, t2) | n ∈ N and t1, t2 ∈ X} ∪ {Leaf(n) | n ∈ N}

The set associated to tree is intuitively the smallest set X ⊆ U which is closed under adding nodes and leaves, i.e. such that F(X) = X, provided that such a set exists. Such a set X satisfying F(X) = X is called a fixpoint of F. In order to be able to interpret the type of trees as the smallest fixpoint of F, we should first show that such a fixpoint indeed exists. A crucial observation in order to do so is the fact that the function F : P(U) → P(U) is monotone, in the sense that, for X, Y ∈ P(U), X ⊆ Y implies F(X) ⊆ F(Y).

Theorem 1.3.3.1 (Knaster-Tarski [Kna28, Tar55]). The set

fix(F) = ⋂ {X ∈ P(U) | F(X) ⊆ X}

is the least fixpoint of F: we have

F(fix(F)) = fix(F)


and, for every fixpoint X of F,

fix(F) ⊆ X

Proof. We write C = {X ∈ P(U) | F(X) ⊆ X} for the set of prefixpoints of F. Given X ∈ C, we have

fix(F) = ⋂ C ⊆ X                                  (1.1)

and therefore, since F is monotone,

F(fix(F)) ⊆ F(X) ⊆ X                              (1.2)

Since this holds for any X ∈ C, we have

F(fix(F)) ⊆ ⋂ C = fix(F)                          (1.3)

Moreover, by monotonicity again, we have F(F(fix(F))) ⊆ F(fix(F)), therefore F(fix(F)) ∈ C, and thus by (1.1)

fix(F) ⊆ F(fix(F))                                (1.4)

From (1.3) and (1.4), we deduce that fix(F) is a fixpoint of F. An arbitrary fixpoint X of F necessarily belongs to C and, by (1.2), we have

fix(F) = F(fix(F)) ⊆ X

fix(F) is thus the smallest fixpoint of F.

Remark 1.3.3.2. The attentive reader will have noticed that all we really used in the course of the proof is the fact that P(U) equipped with ∩ is a semilattice with ∅ as smallest element. Under the more subtle hypotheses of the Kleene fixpoint theorem (P(U) is a directed complete partial order and F is Scott-continuous), one can even show that

fix(F) = ⋃_{n ∈ N} F^n(∅)

i.e. the fixpoint can be obtained by iterating F from the empty set. In the case of trees,

F^0(∅) = ∅
F^1(∅) = {Leaf(n) | n ∈ N}
F^2(∅) = {Leaf(n) | n ∈ N} ∪ {Node(n, t1, t2) | n ∈ N and t1, t2 ∈ F^1(∅)}

and more generally, F^n(∅) is the set of trees of height strictly below n. The theorem states that any tree is a tree of some (finite) height.

Remark 1.3.3.3. In general, there are multiple fixpoints. For instance, for the function F corresponding to trees, the set of all “trees” where we allow an infinite number of nodes is also a fixpoint of F.


As a direct corollary of theorem 1.3.3.1, we obtain the following induction principle:

Corollary 1.3.3.4. Given a set X such that F(X) ⊆ X, we have fix(F) ⊆ X.

Example 1.3.3.5. With the type nat of natural numbers, we have

F(X) = {Zero} ∪ {Suc(n) | n ∈ X}

We have

fix(F) = {Suc^n(Zero) | n ∈ N} = {Zero, Suc(Zero), Suc(Suc(Zero)), . . .}

In the following, we write 0 (resp. S n) instead of Zero (resp. Suc(n)), and fix(F) = N. The induction principle states that if X contains 0 and is closed under successor then it contains all natural numbers. Given a property P(n) on natural numbers, consider the set

X = {n ∈ N | P(n)}

The requirement F(X) ⊆ X translates as: P(0) holds and P(n) implies P(S n). The induction principle is thus the classical recurrence principle:

P(0) ⇒ (∀n ∈ N. P(n) ⇒ P(S n)) ⇒ (∀n ∈ N. P(n))

Example 1.3.3.6. Consider the type empty. We have F(X) = ∅ and thus fix(F) = ∅. The induction principle states that any property is necessarily valid on all the elements of the empty set: ∀x ∈ ∅. P(x).

Exercise 1.3.3.7. Define the function F associated to the type of lists. Show that it also has a greatest fixpoint, distinct from the smallest fixpoint, and provide a concrete description of it.

1.3.4 Option types and exceptions. Another quite useful recursive type defined in the standard library is the option type

type 'a option =
  | Some of 'a
  | None

A value of this type is either of the form Some x, for some x of type 'a, or None. It can be thought of as the type 'a extended with the default value None and can be used for functions which normally return a value except in some cases (in other languages such as C or Java, one would return a NULL pointer in this case). For instance, the function returning the head of a list is almost always defined, except when the argument is the empty list. It thus makes sense to implement it as the function of type 'a list -> 'a option defined by

let hd l =
  match l with
  | x::l -> Some x
  | [] -> None


It is however quite cumbersome to use, because each time we want to use the result of this function, we have to match it in order to decide whether the result is None or not. For instance, in order to double the head of a list l of integers known to be non-empty, we still have to write something like

match hd l with
| Some n -> 2*n
| None -> 0 (* This case cannot happen *)

See figure 1.2 for a more representative example.

Exceptions. In order to address this, OCaml provides the mechanism of exceptions, which are kinds of errors that can be raised and caught. For instance, in the standard library, the exception Not_found is defined by

exception Not_found

and the head function by

let hd l =
  match l with
  | x::l -> x
  | [] -> raise Not_found

It now has type 'a list -> 'a, meaning that we can write 2 * (hd l) to double the head of a list l. In the case where we take the head of the empty list, the exception Not_found is raised. We can catch it with the following construction if we need to:

try ... with
| Not_found -> ...
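For instance, here is a small sketch (the function head_or_default is ours, not from the standard library) returning a default value when the list is empty:

let head_or_default l d =
  try hd l with
  | Not_found -> d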

1.4 The typing system

We have already explained in section 1.1.3 that OCaml is a strongly typed language. We detail here some of the advantages and properties of such a typing system.

1.4.1 Usefulness of typing. One of the main advantages of typed languages is that typing ensures their safety: if the program passes the type-checking phase, we are guaranteed that we will not have errors at execution because unexpected data was provided to a function, see section 1.4.3. But there are other advantages too.


Documentation. Knowing the type of a function is very useful for documenting it: from the type we can generally deduce the order of the arguments of the function, what it returns, and so on. For instance, the function Queue.add of the module implementing queues in OCaml has type

'a -> 'a queue -> unit

This allows us to conclude that the function takes two arguments: the first argument must be the element we want to add and the second one the queue to which we want to add it. Finally, the function does not return anything (to be more precise, it returns the only value of the type unit): this must mean that the queue structure is modified in place (otherwise, we would have had the modified queue as return value).

Abstraction. Having a typing system is also good for abstraction: we can use a data structure without knowing the details of its implementation, or even having access to them. Taking the Queue.add function as example again, we only know that the second argument is of type 'a queue, without any more details on this type. This means that we cannot mess with the internals of the data structure, and that the implementation of queues can be radically modified without us having to change our code.

Efficiency. Static typing can also be used to improve the efficiency of compiled programs. Namely, since we know in advance the type of the values we are going to handle, our code can be specific to the corresponding data structure, and avoid performing some safety checks. For instance, in OCaml, the concatenation function on strings can simply put the two strings together; in contrast, in a dynamically typed programming language such as Python, the concatenation function on strings has first to check at runtime that its arguments are indeed strings, failing with an error otherwise, before it can put them together.

1.4.2 Properties of typing. There are various flavors of typing systems.

Dynamic vs static. The types of programs can either be checked during the execution (the typing is dynamic) or during the compilation (the typing is static); OCaml uses the latter. Static typing has many advantages: potential errors are found very early, without having to perform tests, it can help to optimize programs, and it provides very strong guarantees on the execution of the program. The dynamic approach also has some advantages though: the code is more flexible, the runtime can automatically perform conversions between types if necessary, etc.

Weak vs strong. The typing system of OCaml is strong, which means that it ensures that the values in a type are actually of that type: there is no implicit or dynamic type conversion, no NULL pointers, no explicit manipulation of pointers, and so on. By opposition, when those requirements are not met, the typing system is said to be weak.


Decidability of typing. A basic requirement of a typing system is that we should be able to decide whether a given term has a given type, i.e. we should have a type-checking algorithm. For OCaml (and all decent programming languages) this is the case, and type checking is performed during each compilation of a program.

Type inference. It is often cumbersome to have to write the type of all the terms (or even to give many type annotations). In OCaml, the compiler performs type inference, which means that it automatically generates a type for the program, when there is one.

Polymorphism. The types in OCaml are polymorphic, which means that they can contain universally quantified variables. For instance, the identity function let id x = x has the type 'a -> 'a, which can also be read as the universally quantified type ∀A.A → A. This means that we can substitute 'a for any type and still get a valid type for the identity.

Principal types. A program can admit multiple types. For instance, the identity function admits the types

'a -> 'a        int -> int        ('a -> 'b) -> ('a -> 'b)

and infinitely many others. The first one, 'a -> 'a, is however “more general” than the others, because all the other types can be obtained by substituting 'a by some type. Such a most general type is called a principal type. The type inference of OCaml has the property that it always generates a principal type.

1.4.3 Safety. The programs which are well-typed in OCaml are safe, in the sense that types are preserved during execution and programs do not get stuck. In order to formalize these properties, we first need to introduce a notion of reduction, which formalizes the way programs are executed. We will first do this on a very small (but representative) subset of the language. Most of the concepts used here, such as reduction or typing derivation, will be further detailed in subsequent chapters.

A small language. We first introduce a small programming language, which can be considered as a very restricted subset of OCaml, and which we will implement in OCaml itself. In this language, the values are either integers or booleans, and can be encoded as

type value =
  | Int of int
  | Bool of bool

We define a program as being either a value, an addition (p + p'), a comparison of integers (p < p'), or a conditional branching (if p then p' else p''):


type prog =
  | Val of value
  | Add of prog * prog
  | Lt of prog * prog
  | If of prog * prog * prog

  ───────────── (n is the integer sum of n1 and n2)
  n1 + n2 −→ n

       p1 −→ p1′                       p2 −→ p2′
  ─────────────────────           ─────────────────────
  p1 + p2 −→ p1′ + p2             p1 + p2 −→ p1 + p2′

  ───────────────── (if n1 < n2)       ───────────────── (if n1 ≥ n2)
  n1 < n2 −→ true                      n1 < n2 −→ false

       p1 −→ p1′                       p2 −→ p2′
  ─────────────────────           ─────────────────────
  p1 < p2 −→ p1′ < p2             p1 < p2 −→ p1 < p2′

  if true then p1 else p2 −→ p1        if false then p1 else p2 −→ p2

               p −→ p′
  ────────────────────────────────────────────────
  if p then p1 else p2 −→ if p′ then p1 else p2

Figure 1.1: Reduction rules.

A typical program would thus be

if 3 < 2 then 5 else 1                                    (1.5)

which would be encoded as the term

If (Lt (Val (Int 3), Val (Int 2)), Val (Int 5), Val (Int 1))

Reduction. Given programs p and p′, we write p −→ p′ when p reduces to p′. This reduction relation is defined as the smallest one such that, for each of the rules listed in figure 1.1, if the relation above the horizontal bar holds, the relation below it also holds. In these rules, we write ni to denote an arbitrary integer, and the first rule indicates that the formal sum of two integers reduces to their actual sum (e.g. 3 + 2 −→ 5). An implementation of the reduction in OCaml is given in figure 1.2: given a program p, the function red either returns Some p′ if there exists a program p′ with p −→ p′, or None otherwise. Note that the reduction is not deterministic, i.e. a program can reduce to distinct programs:

5 + (5 + 4) ←− (3 + 2) + (5 + 4) −→ (3 + 2) + 9

The implementation provided in figure 1.2 chooses a particular reduction when there are multiple possibilities: we say that it implements a reduction strategy.


(** Perform one reduction step. *)
let rec red : prog -> prog option = function
  | Val _ -> None
  | Add (Val (Int n1), Val (Int n2)) -> Some (Val (Int (n1 + n2)))
  | Add (p1, p2) ->
    (
      match red p1 with
      | Some p1' -> Some (Add (p1', p2))
      | None ->
        match red p2 with
        | Some p2' -> Some (Add (p1, p2'))
        | None -> None
    )
  | Lt (Val (Int n1), Val (Int n2)) -> Some (Val (Bool (n1 < n2)))
  | Lt (p1, p2) ->
    (
      match red p1 with
      | Some p1' -> Some (Lt (p1', p2))
      | None ->
        match red p2 with
        | Some p2' -> Some (Lt (p1, p2'))
        | None -> None
    )
  | If (Val (Bool true), p1, p2) -> Some p1
  | If (Val (Bool false), p1, p2) -> Some p2
  | If (p, p1, p2) ->
    match red p with
    | Some p' -> Some (If (p', p1, p2))
    | None -> None

Figure 1.2: Implementation of the reduction.
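Iterating red until no rule applies yields the evaluation of a program; a small sketch of such a driver (not part of the figure):

(* Reduce a program as much as possible, i.e. compute an irreducible
   program (hopefully a value) by iterating [red]. *)
let rec normalize p =
  match red p with
  | Some p' -> normalize p'
  | None -> p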


  ──────────────                  ──────────────
  ⊢ n : int                       ⊢ b : bool

  ⊢ p1 : int    ⊢ p2 : int        ⊢ p1 : int    ⊢ p2 : int
  ──────────────────────────      ──────────────────────────
  ⊢ p1 + p2 : int                 ⊢ p1 < p2 : bool

  ⊢ p : bool    ⊢ p1 : A    ⊢ p2 : A
  ────────────────────────────────────
  ⊢ if p then p1 else p2 : A

Figure 1.3: Typing rules.

A program is irreducible when it does not reduce to another program. It can be remarked that
– values are irreducible,
– there are irreducible programs which are not values, e.g. 3 + true.
In the second case, the reason why the program 3 + true cannot be further reduced is that an unexpected value was provided to the sum: we were hoping for an integer instead of the value true. We will see that the typing system precisely prevents such situations from arising.

Typing. A type in our language is either an integer (int) or a boolean (bool), which can be represented by the type

type t = TInt | TBool

We write ⊢ p : A to indicate that the program p has the type A and call this a typing judgment. This relation is defined inductively by the rules of figure 1.3. This means that a program p has type A when ⊢ p : A can be derived using the above rules. For instance, the program (1.5) has type int, as the following derivation shows (the conclusion is written first, with the premises of each rule indented below it):

⊢ if 3 < 2 then 5 else 1 : int
├ ⊢ 3 < 2 : bool
│ ├ ⊢ 3 : int
│ └ ⊢ 2 : int
├ ⊢ 5 : int
└ ⊢ 1 : int

Such a tree showing that a typing judgment is derivable is called a derivation tree. The principle of the type checking and type inference algorithms of OCaml is to try to construct such a derivation of a typing judgment, using the above rules. In our small toy language, this is quite easy, and it is presented in figure 1.4. For a language with many more features, such as OCaml (we have functions and polymorphism, not to mention objects or generalized algebraic data types), this is much more subtle, but it still follows the same general principle. It can be observed that a term can have at most one type. We can thus speak of the type of a typable program:

Theorem 1.4.3.1 (Uniqueness of types). Given a program p, if p is both of types A and A′ then A = A′.

exception Type_error

(** Infer the type of a program. *)
let rec infer = function
  | Val (Bool _) -> TBool
  | Val (Int _) -> TInt
  | Add (p1, p2) -> check p1 TInt; check p2 TInt; TInt
  | Lt (p1, p2) -> check p1 TInt; check p2 TInt; TBool
  | If (p, p1, p2) -> check p TBool; let t = infer p1 in check p2 t; t

(** Check that a program has a given type. *)
and check p t = if infer p <> t then raise Type_error

Figure 1.4: Type inference and type checking.
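Assuming the definitions of figure 1.4, these functions can be tried on the program (1.5); a toplevel session sketch:

# infer (If (Lt (Val (Int 3), Val (Int 2)), Val (Int 5), Val (Int 1)));;
- : t = TInt
# infer (Add (Val (Int 3), Val (Bool true)));;
Exception: Type_error.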


Proof. By induction on p: depending on the form of p, at most one rule applies. For instance, if p is of the form if p0 then p1 else p2, the only rule which allows typing p is

  ⊢ p0 : bool    ⊢ p1 : A    ⊢ p2 : A
  ─────────────────────────────────────
  ⊢ if p0 then p1 else p2 : A

Since p1 and p2 admit at most one type A by induction hypothesis, p also does. Other cases are similar.

As explained in section 1.4.2, full-fledged languages such as OCaml do not generally satisfy such a strong property. The type of a program is not generally unique, but in good typing systems there exists instead a type which is “the most general”.

Safety. We are now ready to formally state the safety properties ensured for typed programs. The first one, called subject reduction, states that reduction preserves typing:

Theorem 1.4.3.2 (Subject reduction). Given programs p and p′ such that p −→ p′, if p has type A then p′ also has type A.

Proof. By hypothesis, we have both a derivation of p −→ p′ and a derivation of ⊢ p : A. We reason by induction on the former. For instance, suppose that the last rule is

       p1 −→ p1′
  ─────────────────────
  p1 + p2 −→ p1′ + p2

The derivation of ⊢ p : A necessarily ends with

  ⊢ p1 : int    ⊢ p2 : int
  ──────────────────────────
  ⊢ p1 + p2 : int

In particular, we have ⊢ p1 : int and thus, by induction hypothesis, ⊢ p1′ : int is derivable. We conclude using the derivation

  ⊢ p1′ : int    ⊢ p2 : int
  ───────────────────────────
  ⊢ p1′ + p2 : int

Other cases are similar.

The second important property is called progress, and states that a program is either a value or reduces.

Theorem 1.4.3.3 (Progress). Given a program p of type A, either p is a value or there exists a program p′ such that p −→ p′.

Proof. By induction on the derivation of ⊢ p : A. For instance, suppose that the last rule is

  ⊢ p1 : int    ⊢ p2 : int
  ──────────────────────────
  ⊢ p1 + p2 : int

By induction hypothesis, the following cases can happen:


– p1 −→ p1′: in this case, we have p1 + p2 −→ p1′ + p2,
– p2 −→ p2′: in this case, we have p1 + p2 −→ p1 + p2′,
– p1 and p2 are values: in this case, they are necessarily integers and p1 + p2 reduces to their sum.
Other cases are similar.

The safety property finally states that typable programs never encounter errors, in the sense that their execution is never stuck: typically, we will never try to evaluate a program such as 3 + true during the reduction.

Theorem 1.4.3.4 (Safety). A program p of type A is safe: either
– p reduces to a value v in finitely many steps

p −→ p1 −→ p2 −→ · · · −→ pn −→ v

– or p loops: there is an infinite sequence of reductions

p −→ p1 −→ p2 −→ · · ·

Proof. Consider a maximal sequence of reductions from p. If this sequence is finite then, by maximality, its last element p′ is an irreducible program. Since p is of type A and reduces to p′, by the subject reduction theorem 1.4.3.2, p′ also has type A. We can thus apply the progress theorem 1.4.3.3 and deduce that either p′ is a value or there exists p″ such that p′ −→ p″. The second case is impossible since it would contradict the maximality of the sequence of reductions.

Of course, in our small language, a program cannot give rise to an infinite sequence of reductions, but the formulation and proof of the previous theorem generalize to languages in which this is not the case. The previous properties of subject reduction and progress are entirely formalized in section 7.1.

Limitations of typing. Typing systems (such as the one described above or the one of OCaml) reject legit programs such as

(if true then 3 else false) + 1

which reduces to a value. Namely, the system imposes that the two branches of a conditional branching should have the same type, which is not the case here, even though we know that only the first branch will be taken, because the condition is the constant boolean true. We thus ensure that typable programs are safe, but not that all safe programs are typable. In fact, this has to be this way, since an easy reduction from the halting problem shows that the safety of programs is undecidable as soon as the language is rich enough. Also, the typing system does not prevent all errors from occurring during the execution, such as dividing by zero or accessing an array out of its bounds. This is because the typing system is not expressive enough. For instance, the function

let f x = 1 / (x - 2)


should intuitively be given the type {n : int | n ≠ 2} → int, which states that this function is correct as long as its input is an integer different from 2, but this is of course not a valid type in OCaml. We will see in chapters 6 and 8 that some languages do allow such rich typing, at the cost of losing type inference (but type checking is still decidable).

1.5 Typing as proving

We would now like to give the intuition for the main idea of this course, that programs correspond to proofs. Understanding this correspondence in detail will allow us to design very rich typing systems which allow formally proving fine theorems and reasoning about programs.

1.5.1 Arrow as implication. As a first illustration of this, we will see here that simple types (such as the ones used in OCaml) can be read as propositional formulas. The translation is simply a matter of slightly changing the way we read types: a type variable 'a can be read as a propositional variable A, and the arrow -> can be read as an implication ⇒. Now, we can observe that there is a program of a given type (in a reasonable subset of OCaml, see below) precisely when the corresponding formula is true (for a reasonable notion of true formula). For instance, we expect that the formula A ⇒ A, corresponding to the type 'a -> 'a, is provable. And indeed, there is a program of this type, the identity:

let id : 'a -> 'a = fun x -> x

We have specified here the type of this function for clarity. We can give many other such examples. For instance, A ⇒ B ⇒ A is proved by

let k : 'a -> 'b -> 'a = fun x y -> x

The formula (A ⇒ B) ⇒ (B ⇒ C) ⇒ (A ⇒ C) can be proved by the composition

let comp : ('a -> 'b) -> ('b -> 'c) -> ('a -> 'c) = fun f g x -> g (f x)

The formula (A ⇒ B ⇒ C) ⇒ (A ⇒ B) ⇒ (A ⇒ C) is proved by

let s : ('a -> 'b -> 'c) -> ('a -> 'b) -> ('a -> 'c) = fun f g x -> f x (g x)

and so on.

Remark 1.5.1.1. In general, there is not a unique proof of a given formula. For instance, A ⇒ A can also be proved by fun x -> k x 3, where k is the function defined above.


1.5.2 Other connectives. For now, the fragment of the logic we have is very poor (we only have implication as a connective), but the other usual connectives also have counterparts in types.

Conjunction. A conjunction A ∧ B means that both A and B hold. In terms of types, the counterpart is a product: A ∧ B corresponds to 'a * 'b, and we have programs implementing usual propositions such as A ∧ B ⇒ A:

let proj1 : ('a * 'b) -> 'a = fun (a , b) -> a

or the commutativity of conjunction A ∧ B ⇒ B ∧ A:

let comm : ('a * 'b) -> ('b * 'a) = fun (a , b) -> b , a

Truth. The formula ⊤ corresponding to truth is always provable, and we expect that there is exactly one reason for which it should be true. Thus ⊤ corresponds to unit, and we can prove A ⇒ ⊤:

let unit_intro : 'a -> unit = fun x -> ()

Falsity. The formula ⊥ corresponds to falsity and we do not expect that it can be proved (because false is never true). We can make it correspond to the empty type, which can be defined as a type with no constructor:

type empty = |

The formula ⊥ ⇒ A is then shown by

let empty_elim : empty -> 'a = fun x -> match x with _ -> .

(the “.” is a “refutation case”, meaning that the compiler should ensure that this case can never happen; it is almost never used in OCaml unless you are doing tricky stuff such as the above).

Negation. The negation can then be defined as usual, ¬A being a notation for A ⇒ ⊥, and we can prove the reasoning by contraposition (A ⇒ B) ⇒ (¬B ⇒ ¬A) by

let contr : ('a -> 'b) -> (('b -> empty) -> ('a -> empty)) = fun f g a -> g (f a)

or A ⇒ ¬¬A by

let nni : 'a -> (('a -> empty) -> empty) = fun a f -> f a


Disjunction. A disjunction formula A ∨ B can be thought of as being either A or B. We can implement it as a coproduct type, which is an inductive type containing either an 'a or a 'b, see section 1.3.2:

type ('a , 'b) coprod = Left of 'a | Right of 'b

We can then prove the formula A ∨ B ⇒ B ∨ A, stating that disjunction is commutative, by

let comm : ('a , 'b) coprod -> ('b , 'a) coprod = fun x ->
  match x with
  | Left a -> Right a
  | Right b -> Left b

or the distributivity A ∧ (B ∨ C) ⇒ (A ∧ B) ∨ (A ∧ C) of conjunction over disjunction by

let dist : ('a * ('b , 'c) coprod) -> ('a * 'b , 'a * 'c) coprod = fun (a , x) ->
  match x with
  | Left b -> Left (a , b)
  | Right c -> Right (a , c)

or the formula (¬A ∨ B) ⇒ (A ⇒ B) by

let de_Morgan : ('a -> empty , 'b) coprod -> ('a -> 'b) = fun x a ->
  match x with
  | Left f -> empty_elim (f a)
  | Right b -> b

1.5.3 Limitations of the correspondence. This correspondence has some limitations, due to the fact that OCaml is, after all, a language designed for programming, not logic. It is easy to prove formulas which are not true if we use “advanced” features of the language such as exceptions. For instance, the following “proves” A ⇒ B:

let absurd : 'a -> 'b = fun x -> raise Not_found

More annoying counter-examples come from functions which are not terminating (i.e. looping). For instance, we can also “prove” A ⇒ B by

let rec absurd : 'a -> 'b = fun x -> absurd x

Note that, in particular, both allow us to “prove” ⊥:

let fake : empty = absurd ()

Finally, we can notice that there does not seem to be any reasonable way to implement the classical formula ¬A ∨ A (apart from using the above tricks), which would correspond to a program of the type

('a -> empty , 'a) coprod

In the next chapters, we will see that it is indeed possible to design languages in which a formula is provable precisely when there is a program of the corresponding type. Such languages do not have functions with “side effects” (such as raising an exception) and enforce that all programs are terminating.

Chapter 2

Propositional logic

In this chapter, we present propositional logic: this is the fragment of logic consisting of propositions (very roughly, something which can either be true or false) joined by connectives. We will see various ways of formalizing the proofs in propositional logic – with a particular focus on natural deduction – and study the properties of those formalisms. We begin with the formalism of natural deduction in section 2.2, show that it enjoys the cut elimination property in section 2.3 and discuss strategies for searching for proofs in section 2.4. The classical variant of logic is presented in section 2.5. We then present two alternative logical formalisms: sequent calculus (section 2.6) and Hilbert calculus (section 2.7). Finally, we introduce Kripke semantics in section 2.8, which can be considered as an intuitionistic counterpart of boolean models for classical logic.

2.1 Introduction

2.1.1 From provability to proofs. Most of you are acquainted with boolean logic, based on the booleans, which we write here 0 for false and 1 for true. In this setting, every propositional formula can be interpreted as a boolean, provided that we have an interpretation for the variables. The truth tables for the usual connectives are (rows correspond to the value of A, columns to the value of B):

A ∧ B | 0  1        A ∨ B | 0  1        A ⇒ B | 0  1
  0   | 0  0          0   | 0  1          0   | 1  1
  1   | 0  1          1   | 1  1          1   | 0  1

For instance, we know that the formula A ⇒ A is valid because, for whichever interpretation of A as a boolean, the induced interpretation of the formula is 1. We have this idea that propositions should correspond to types. Therefore, rather than booleans, propositions should be interpreted as sets of values, and implications as functions between the corresponding values. For instance, if we write N for a proposition interpreted as the set of natural numbers, the type N ⇒ N would correspond to the set of functions from natural numbers to themselves. We now see that the boolean interpretation is very weak: it only cares about whether sets are empty or not. For instance, depending on whether the sets A and B are empty (∅) or non-empty (¬∅), the following table indicates whether the set A → B of functions from A to B is empty or not:

A → B | ∅    ¬∅
  ∅   | ¬∅   ¬∅
 ¬∅   | ∅    ¬∅

Reading ∅ as “false” and ¬∅ as “true”, we see that we recover the usual truth table for implication. In this sense, the fact that the formula N ⇒ N is true only shows that there exists such a function; but in fact there are many such functions, and we would like to be able to reason about the various functions themselves.


Actually, the interpretation of implications as sets of functions is still not entirely satisfactory because, given a function of type N → N, there are many ways to implement it. We could have programs of different complexities, using their arguments in different ways, and so on. For instance, the constant function x ↦ 0 can be implemented as

let f x = 0

or

let f x = x - x

or

let rec f x = if x = 0 then x else f (x - 1)

respectively in constant, linear and exponential complexity (if natural numbers are coded in binary). We thus want to shift from an extensional perspective, where two functions are equal when they have the same values on the same inputs, to an intensional one, where the way the result is computed matters. This means that we should be serious about what a program, or equivalently a proof, is, and define it precisely, so that we can reason about the proofs of a proposition instead of its provability: we want to know what the proofs are and not only whether there exists one or not. This is the reason why Girard advocates that there are three levels for interpreting proofs [Gir11, section 7.1]:
0. the boolean level: propositions are interpreted as booleans and we are interested in whether a proposition is provable or not,
1. the extensional level: propositions are interpreted as sets and we are interested in which functions can be implemented,
2. the intensional level: we are interested in the proofs themselves (and how they evolve via cut elimination).

2.1.2 Intuitionism. This shift from provability to proofs was started by the philosophical position of Brouwer from the early twentieth century, called intuitionism. According to this point of view, mathematics does not consist in discovering the properties of a preexisting objective reality, but is rather a mental subjective construction, which is independent of reality and has an existence of its own, whose validity follows from the intuition of the mathematician. From this point of view
– the conjunction A ∧ B of two propositions should be seen as having both a proof of A and a proof of B: if we interpret propositions as sets, A ∧ B should not be interpreted as the intersection A ∩ B, but rather as the product A × B,
– a disjunction A ∨ B should be interpreted as having a proof of A or a proof of B, i.e. it does not correspond to the union A ∪ B, but rather to the disjoint union A ⊔ B,
– an implication A ⇒ B should be interpreted as having a way to construct a proof of B from a proof of A,


– a negation ¬A = A ⇒ ⊥ should be interpreted as having a counter-example to A, i.e. a way to produce an absurdity from a proof of A.
Interestingly, this led Brouwer to reject principles which are classically valid. For instance, according to this point of view, ¬¬A should not be considered as equivalent to A, because the implication ¬¬A ⇒ A should not hold: if we can show that there is no counter-example to A, this does not mean that we actually have a proof of A. For instance, suppose that I cannot find my key inside my apartment and my door is locked: I must have locked my door, so I know that my key is somewhere in the apartment and is not lost, but I still cannot find it. Not having lost my key (i.e. not not having my key) does not mean that I have my key; in other words, ¬¬Key ⇒ Key does not hold (explanation borrowed from Ingo Blechschmidt). For similar reasons, Brouwer also rejected the excluded middle ¬A ∨ A for an arbitrary proposition A: in order to have a proof of it, we should have a way, whichever the proposition A is, to produce a counter-example to it or a proof of it. Logic rejecting these principles is called intuitionistic and, by opposition, we speak of classical logic when they are admitted.

2.1.3 Formalizing proofs. Our goal is to give a precise definition of what a proof is. This will be done by formalizing the rules with which we usually construct our reasoning. For instance, suppose that we want to prove that the function x ↦ 2 × x is continuous at 0: we have to prove the formula

∀ε.(ε > 0 ⇒ ∃η.(η > 0 ∧ ∀x.|x| < η ⇒ |2x| < ε))

This is done according to the following steps, resulting in the following transformed formulas to be proved.
– Suppose given ε, we have to show: ε > 0 ⇒ ∃η.(η > 0 ∧ ∀x.|x| < η ⇒ |2x| < ε)
– Suppose that ε > 0 holds, we have to show: ∃η.(η > 0 ∧ ∀x.|x| < η ⇒ |2x| < ε)
– Take η = ε/2, we have to show: ε/2 > 0 ∧ ∀x.|x| < ε/2 ⇒ |2x| < ε
– We have to show both ε/2 > 0 and ∀x.|x| < ε/2 ⇒ |2x| < ε


– For ε/2 > 0:
  – because 2 > 0, this amounts to showing (ε/2) × 2 > 0 × 2,
  – which, by usual identities, amounts to showing ε > 0,
  – which is a hypothesis.
– For ∀x.|x| < ε/2 ⇒ |2x| < ε:
  – suppose given x, we have to show: |x| < ε/2 ⇒ |2x| < ε,
  – suppose that |x| < ε/2 holds, we have to show: |2x| < ε,
  – since 2 > 0, this amounts to showing: |2x|/2 < ε/2,
  – which, by usual identities, amounts to showing: |x| < ε/2,
  – which is a hypothesis.

Now that we have decomposed the proof in very small steps, it seems possible to give a list of all the generic rules that we are allowed to apply in a reasoning. We will do so and will introduce a convenient formalism and notations, so that the above proof will be written as the following derivation (the conclusion is written first, with the premises of each rule indented below it; read in this direction, you should be able to see the precise correspondence with the previous description of the proof):

⊢ ∀ε.(ε > 0 ⇒ ∃η.(η > 0 ∧ ∀x.|x| < η ⇒ |2x| < ε))
└ ⊢ ε > 0 ⇒ ∃η.(η > 0 ∧ ∀x.|x| < η ⇒ |2x| < ε)
  └ ε > 0 ⊢ ∃η.(η > 0 ∧ ∀x.|x| < η ⇒ |2x| < ε)
    └ ε > 0 ⊢ ε/2 > 0 ∧ ∀x.|x| < ε/2 ⇒ |2x| < ε
      ├ ε > 0 ⊢ ε/2 > 0
      │ └ ε > 0 ⊢ (ε/2) × 2 > 0 × 2
      │   └ ε > 0 ⊢ ε > 0
      └ ε > 0 ⊢ ∀x.|x| < ε/2 ⇒ |2x| < ε
        └ ε > 0 ⊢ |x| < ε/2 ⇒ |2x| < ε
          └ ε > 0, |x| < ε/2 ⊢ |2x| < ε
            └ ε > 0, |x| < ε/2 ⊢ |2x|/2 < ε/2
              └ ε > 0, |x| < ε/2 ⊢ |x| < ε/2

2.1.4 Properties of the logical system. Once we have formalized our logical system, we should do some sanity checks. The first requirement is that it should be consistent: there is at least one formula A which is not provable (otherwise, the system would be entirely pointless). The second requirement is that typechecking should be decidable: there should be an algorithm which checks whether a proof is a valid proof or not. In contrast, the question of deciding whether a formula is provable or not will not be decidable in general, and we do not expect to have an algorithm for that.

2.2 Natural deduction

Natural deduction is the first formalism for proofs that we will study. It was introduced by Gentzen [Gen35]. We first present the intuitionistic version.


2.2.1 Formulas. We suppose fixed a countably infinite set X of propositional variables. The set A of formulas or propositions is generated by the following grammar:

A, B ::= X | A ⇒ B | A ∧ B | ⊤ | A ∨ B | ⊥ | ¬A

where X denotes a propositional variable (in X) and A and B denote propositions. These constructions are respectively read as a propositional variable, implication, conjunction, truth, disjunction, falsity and negation. By convention, ¬ binds the most tightly, then ∧, then ∨, then ⇒:

¬A ∨ B ∧ C ⇒ D    reads as    ((¬A) ∨ (B ∧ C)) ⇒ D

Moreover, all binary connectives are implicitly bracketed on the right:

A1 ∧ A2 ∧ A3 ⇒ B ⇒ C    reads as    (A1 ∧ (A2 ∧ A3)) ⇒ (B ⇒ C)

This is particularly important for ⇒; for the connectives ∧ and ∨ the other convention could be chosen with almost no impact. We sometimes write A ⇔ B for (A ⇒ B) ∧ (B ⇒ A). A subformula of a formula A is a formula occurring in A. The set of subformulas of A can formally be defined by induction on A by

Sub(X) = {X}
Sub(A ⇒ B) = {A ⇒ B} ∪ Sub(A) ∪ Sub(B)
Sub(A ∧ B) = {A ∧ B} ∪ Sub(A) ∪ Sub(B)
Sub(A ∨ B) = {A ∨ B} ∪ Sub(A) ∪ Sub(B)
Sub(⊤) = {⊤}
Sub(⊥) = {⊥}
Sub(¬A) = {¬A} ∪ Sub(A)
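As a side note, such formulas are easily represented in OCaml as a recursive type, in the style of section 1.3; a sketch (the type and function names here are ours, not fixed by the text):

type formula =
  | Var of int                  (* propositional variable *)
  | Imp of formula * formula    (* implication *)
  | And of formula * formula    (* conjunction *)
  | True                        (* truth *)
  | Or of formula * formula     (* disjunction *)
  | False                       (* falsity *)
  | Not of formula              (* negation *)

(* The subformulas of a formula, computed as a list (possibly with duplicates). *)
let rec sub a =
  match a with
  | Var _ | True | False -> [a]
  | Imp (b, c) | And (b, c) | Or (b, c) -> a :: (sub b @ sub c)
  | Not b -> a :: sub b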

2.2.2 Sequents. A context

Γ = A1, . . . , An

is a list of propositions. A sequent, or judgment, is a pair

Γ ⊢ A

consisting of a context Γ and a formula A. Such a sequent should be read as “under the hypotheses in Γ, I can prove A” or “supposing that I can prove the propositions in Γ, I can prove A”. The comma in a context can thus be read as a “meta” conjunction (the logical conjunction being ∧) and the sign ⊢ as a “meta” implication (the logical implication being ⇒).

Remark 2.2.2.1. The notation derives from Frege’s Begriffsschrift [Fre79], an axiomatization of first-order logic based on a graphical notation, in which logical connectives are represented by wires of particular shapes (the diagrams for ¬A, A ⇒ B and ∀x.A are omitted here). In this system, given a proposition drawn in this notation, prefixing it with a vertical stroke means that it is provable; the assertion that (∀x.A) ⇒ (∃x.B) is provable would for instance be drawn in this way (in classical logic, the formula ∃x.B is equivalent to ¬∀x.¬B). The symbol ⊢ used in sequents, as well as the symbol ¬ for negation, originate from there.
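Continuing the OCaml sketch above, contexts and sequents can be represented as follows (again, these names are ours):

type context = formula list
type sequent = context * formula

(* The axiom rule: a sequent Γ, A, Γ' ⊢ A is valid whenever the formula A
   occurs in the context. *)
let ax ((gamma, a) : sequent) : bool = List.mem a gamma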


  ─────────────── (ax)
  Γ, A, Γ′ ⊢ A

  Γ ⊢ A ⇒ B    Γ ⊢ A
  ──────────────────── (⇒E)
  Γ ⊢ B

  Γ, A ⊢ B
  ──────────── (⇒I)
  Γ ⊢ A ⇒ B

  Γ ⊢ A ∧ B
  ─────────── (∧lE)
  Γ ⊢ A

  Γ ⊢ A ∧ B
  ─────────── (∧rE)
  Γ ⊢ B

  Γ ⊢ A    Γ ⊢ B
  ──────────────── (∧I)
  Γ ⊢ A ∧ B

  ─────── (⊤I)
  Γ ⊢ ⊤

  Γ ⊢ A ∨ B    Γ, A ⊢ C    Γ, B ⊢ C
  ─────────────────────────────────── (∨E)
  Γ ⊢ C

  Γ ⊢ A
  ─────────── (∨lI)
  Γ ⊢ A ∨ B

  Γ ⊢ B
  ─────────── (∨rI)
  Γ ⊢ A ∨ B

  Γ ⊢ ⊥
  ─────── (⊥E)
  Γ ⊢ A

  Γ ⊢ ¬A    Γ ⊢ A
  ───────────────── (¬E)
  Γ ⊢ ⊥

  Γ, A ⊢ ⊥
  ────────── (¬I)
  Γ ⊢ ¬A

Figure 2.1: NJ: rules of intuitionistic natural deduction.

2.2.3 Inference rules. An inference rule, written

  Γ1 ⊢ A1    . . .    Γn ⊢ An
  ───────────────────────────                    (2.1)
  Γ ⊢ A

consists of n sequents Γi ⊢ Ai, called the premises of the rule, and a sequent Γ ⊢ A, called the conclusion of the rule. We sometimes identify the rules by a name given to them, which is written on the right of the rule. Some rules also come with external hypotheses on the formulas occurring in the premises: those are called side conditions. There are two ways to read an inference rule:
– the deductive way, from top to bottom: from a proof for each of the premises Γi ⊢ Ai we can deduce Γ ⊢ A,
– the inductive or proof search way, from bottom to top: if we want to prove Γ ⊢ A we need to prove all the premises Γi ⊢ Ai.
Both are valid ways of thinking about proofs, but one might be more natural than the other one depending on the application.

2.2.4 Intuitionistic natural deduction. The rules for intuitionistic natural deduction are shown in figure 2.1, the resulting system often being called NJ (N for natural deduction and J for intuitionistic). Apart from the axiom rule (ax), each rule is specific to a connective and the rules can be classified in two families depending on whether this connective appears in the conclusion or in the premises:


– the elimination rules allow the use of a formula with a given connective (which is the formula in the leftmost premise, called the principal premise),
– the introduction rules construct a formula with a given connective.
In figure 2.1, the elimination (resp. introduction) rules bear names of the form (. . .E) (resp. (. . .I)). The axiom rule allows the use of a formula in the context Γ: supposing that a formula A holds, we can certainly prove it. This rule is the only one to really make use of the context: when read from bottom to top, all the other rules either propagate the context or add hypotheses to it, but never inspect it. The introduction rules are the easiest to understand: they allow proving a formula with a given logical connective from proofs of the immediate subformulas. For instance, (∧I) states that from a proof of A and a proof of B, we can construct a proof of A ∧ B. Similarly, the rule (⇒I) follows the usual reasoning principle for implication: if, after supposing that A holds, we can show B, then A ⇒ B holds. In contrast, the elimination rules allow the use of a connective. For instance, the rule (⇒E), which is traditionally called modus ponens or the detachment rule, says that if A implies B and A holds then certainly B must hold. The rule (∨E) is more subtle and corresponds to a case analysis: if we can prove A ∨ B then, intuitively, we can prove A or we can prove B. If in both cases we can deduce C, then C must hold. The elimination rule (⊥E) is sometimes called ex falso quodlibet or the explosion principle: it states that if we can prove false then the whole logic collapses, and we can prove anything. We can notice that there is no elimination rule for ⊤ (knowing that ⊤ is true does not bring any new information), and no introduction rule for ⊥ (we do not expect that there is a way to prove falsity). There are two elimination rules for ∧, which are respectively called left and right rules, and similarly there are two introduction rules for ∨.

2.2.5 Proofs. The set of proofs (or derivations) is the smallest set such that, given proofs πi of the sequents Γi ⊢ Ai, for 1 ⩽ i ⩽ n, and an inference rule of the form (2.1), there is a proof of Γ ⊢ A, often written in the form of a tree as

  π1            πn
  Γ1 ⊢ A1  ...  Γn ⊢ An
  ───────────────────────
  Γ ⊢ A

A sequent Γ ⊢ A is provable (or derivable) when it is the conclusion of a proof. A formula A is provable when it is provable without hypotheses, i.e. when the sequent ⊢ A is provable. In the following examples, we display derivation trees with the conclusion first and the premises of each rule indented below it.

Example 2.2.5.1. The formula (A ∧ B) ⇒ (A ∨ B) is provable (for any formulas A and B):

⊢ A ∧ B ⇒ A ∨ B (⇒I)
└ A ∧ B ⊢ A ∨ B (∨lI)
  └ A ∧ B ⊢ A (∧lE)
    └ A ∧ B ⊢ A ∧ B (ax)


Example 2.2.5.2. The formula (A ∨ B) ⇒ (B ∨ A) is provable:

⊢ A ∨ B ⇒ B ∨ A (⇒I)
└ A ∨ B ⊢ B ∨ A (∨E)
  ├ A ∨ B ⊢ A ∨ B (ax)
  ├ A ∨ B, A ⊢ B ∨ A (∨rI)
  │ └ A ∨ B, A ⊢ A (ax)
  └ A ∨ B, B ⊢ B ∨ A (∨lI)
    └ A ∨ B, B ⊢ B (ax)

Example 2.2.5.3. The formula A ⇒ ¬¬A is provable:

⊢ A ⇒ ¬¬A (⇒I)
└ A ⊢ ¬¬A (¬I)
  └ A, ¬A ⊢ ⊥ (¬E)
    ├ A, ¬A ⊢ ¬A (ax)
    └ A, ¬A ⊢ A (ax)

Example 2.2.5.4. The formula (A ⇒ B) ⇒ (¬B ⇒ ¬A) is provable:

⊢ (A ⇒ B) ⇒ ¬B ⇒ ¬A (⇒I)
└ A ⇒ B ⊢ ¬B ⇒ ¬A (⇒I)
  └ A ⇒ B, ¬B ⊢ ¬A (¬I)
    └ A ⇒ B, ¬B, A ⊢ ⊥ (¬E)
      ├ A ⇒ B, ¬B, A ⊢ ¬B (ax)
      └ A ⇒ B, ¬B, A ⊢ B (⇒E)
        ├ A ⇒ B, ¬B, A ⊢ A ⇒ B (ax)
        └ A ⇒ B, ¬B, A ⊢ A (ax)

Example 2.2.5.5. The formula (¬A ∨ B) ⇒ (A ⇒ B) is provable:

⊢ (¬A ∨ B) ⇒ (A ⇒ B) (⇒I)
└ ¬A ∨ B ⊢ A ⇒ B (⇒I)
  └ ¬A ∨ B, A ⊢ B (∨E)
    ├ ¬A ∨ B, A ⊢ ¬A ∨ B (ax)
    ├ ¬A ∨ B, A, ¬A ⊢ B (⊥E)
    │ └ ¬A ∨ B, A, ¬A ⊢ ⊥ (¬E)
    │   ├ ¬A ∨ B, A, ¬A ⊢ ¬A (ax)
    │   └ ¬A ∨ B, A, ¬A ⊢ A (ax)
    └ ¬A ∨ B, A, B ⊢ B (ax)

Other typical provable formulas are the following:
– ∧ and ⊤ satisfy the axioms of idempotent commutative monoids:

(A ∧ B) ∧ C ⇔ A ∧ (B ∧ C)        A ∧ B ⇔ B ∧ A
⊤ ∧ A ⇔ A ⇔ A ∧ ⊤                A ∧ A ⇔ A

– ∨ and ⊥ satisfy the axioms of commutative monoids,
– ∧ distributes over ∨ and conversely:

A ∧ (B ∨ C) ⇔ (A ∧ B) ∨ (A ∧ C)
A ∨ (B ∧ C) ⇔ (A ∨ B) ∧ (A ∨ C)

– ⇒ is reflexive and transitive:

A ⇒ A        (A ⇒ B) ⇒ (B ⇒ C) ⇒ (A ⇒ C)

– curryfication:

((A ∧ B) ⇒ C) ⇔ (A ⇒ (B ⇒ C))


– usual reasoning structures with Latin names, such as

(A ⇒ B) ⇒ (¬B ⇒ ¬A)        (modus tollens)
(A ∨ B) ⇒ (¬A ⇒ B)         (modus tollendo ponens)
¬(A ∧ B) ⇒ (A ⇒ ¬B)        (modus ponendo tollens)

Reasoning on proofs. In this formalism, the proofs are defined inductively and therefore we can reason by induction on them, which is often useful. Precisely, the induction principle on proofs is the following one:

Theorem 2.2.5.6 (Induction on proofs). Suppose given a predicate P(π) on proofs π. Suppose moreover that, for every rule of figure 2.1 and every proof π ending with this rule, i.e. of the form

  π1            πn
  Γ1 ⊢ A1  ...  Γn ⊢ An
  ───────────────────────
  Γ ⊢ A

if P(πi) holds for every index i, with 1 ⩽ i ⩽ n, then P(π) also holds. Then P(π) holds for every proof π.

2.2.6 Fragments. A fragment of intuitionistic logic is a system obtained by restricting to formulas containing certain connectives and to the rules concerning these connectives. By convention, the axiom rule (ax) is present in every fragment. For instance, the implicational fragment of intuitionistic logic is obtained by restricting to implication: formulas are generated by the grammar

A, B ::= X | A ⇒ B

and the rules are

  ─────────────── (ax)
  Γ, A, Γ′ ⊢ A

  Γ ⊢ A ⇒ B    Γ ⊢ A
  ──────────────────── (⇒E)
  Γ ⊢ B

  Γ, A ⊢ B
  ──────────── (⇒I)
  Γ ⊢ A ⇒ B

The cartesian fragment is obtained by restricting to product and implication. Another useful fragment is minimal logic, obtained by considering formulas without ⊥, and thus removing the rule (⊥E).

2.2.7 Admissible rules. A rule is admissible when, whenever the premises are provable, the conclusion is also provable. An important point here is that the way the proof of the conclusion is constructed might depend on the proofs of the premises, and not only on the fact that we know that the premises are provable.

Structural rules. We begin by showing that the structural rules are admissible. Those rules are named in this way because they concern the structure of the logical proofs, as opposed to the particular connectives we are considering for formulas. They express some resource management possibilities for the hypotheses in sequents: we can permute, merge and weaken them, see section 2.2.10. A first admissible rule is the weakening rule, which states that whenever we can prove a formula with some hypotheses, we can still prove it with more hypotheses. The proof with more hypotheses is “weaker” in the sense that it applies in fewer cases (since more hypotheses have to be satisfied).

CHAPTER 2. PROPOSITIONAL LOGIC

50

Proposition 2.2.7.1 (Weakening). The weakening rule Γ, Γ0 ` B (wk) Γ, A, Γ0 ` B is admissible. Proof. By induction on the proof of the hypothesis Γ, Γ0 ` B. – If the proof is of the form Γ, Γ0 ` B

(ax)

with B occurring in Γ or Γ0 , then we conclude with Γ, A, Γ0 ` B

(ax)

– If the proof is of the form π1 π2 Γ, Γ ` B ⇒ C Γ, Γ0 ` B (⇒E ) Γ, Γ0 ` C 0

then we conclude with π10 π20 Γ, A, Γ0 ` B ⇒ C Γ, A, Γ0 ` B (⇒E ) 0 Γ, A, Γ ` C where π10 and π20 are respectively obtained from π1 and π2 by induction hypothesis: π1 0 Γ, Γ ` B⇒C (wk) π10 = 0 Γ, A, Γ ` B ⇒ C

π2 0 Γ, Γ `B π20 = (wk) Γ, A, Γ0 ` B

– If the proof is of the from π Γ, Γ0 , B ` C (⇒I ) Γ, Γ0 ` B ⇒ C then we conclude with π0 Γ, A, Γ0 , B ` C (⇒I ) Γ, A, Γ0 ` B ⇒ C where π 0 is obtained from π by induction hypothesis. – Other cases are similar.

CHAPTER 2. PROPOSITIONAL LOGIC

51

Also admissible is the exchange rule, which states that we can reorder hypothesis in the contexts: Proposition 2.2.7.2 (Exchange). The exchange rule Γ, A, B, Γ0 ` C (xch) Γ, B, A, Γ0 ` C is admissible. Proof. By induction on the proof of the hypothesis Γ, A, B, Γ0 ` C. Given a proof π of some sequent, we often write w(π) for a proof obtained by weakening. Another admissible rule is contraction, which states that if we can prove a formula with two occurrences of a hypothesis, we can also prove it with one occurrence. Proposition 2.2.7.3 (Contraction). The contraction rule Γ, A, A, Γ0 ` B (contr) Γ, A, Γ0 ` B is admissible. Proof. By induction on the proof of the hypothesis Γ, A, A, Γ0 ` B. We can also formalize the fact that knowing > does not bring information, what we call here truth strengthening (we are not aware of a standard terminology for this one): Proposition 2.2.7.4 (Truth strengthening). The following rule is admissible: Γ, >, Γ0 ` A (tstr) Γ, Γ0 ` A Proof. By induction on the proof of the hypothesis Γ, >, Γ0 ` A, the only “subtle” case being that we have to transform Γ, >, Γ0 ` >

(ax)

into

Γ, Γ0 ` >

(>I )

The cut rule. A most important admissible rule is the cut rule, which states that if we can prove a formula B using a hypothesis A (thought of as a lemma used in the proof) and we can prove the hypothesis A, then we can directly prove the formula B. Theorem 2.2.7.5 (Cut). The cut rule Γ`A

is admissible.

Γ, A, Γ0 ` B (cut) Γ, Γ0 ` B

CHAPTER 2. PROPOSITIONAL LOGIC

52

Proof. For simplicity, we restrict to the case where Γ0 is the empty context, which is not an important limitation because the exchange rule admissible. The cut rule can be derived from the rules of implication by Γ`A

Γ, A ` B (⇒I ) Γ`A⇒B (⇒E ) Γ`B

We will see in section 2.3.2 that the above proof is not satisfactory and will provide another one, which brings much more information about the dynamics of the proofs. Admissible rules via implication. Many rules can be proved to be admissible by eliminating provable implications: Lemma 2.2.7.6. Suppose that the formula A ⇒ B is provable. Then the rule Γ`A Γ`B is admissible. Proof. We have

Γ`A

.. . `A⇒B (wk) Γ`A⇒B (⇒E ) Γ`B

For instance, we have seen in example 2.2.5.4 that the implication (A ⇒ B) ⇒ (¬B ⇒ ¬A) is provable. We immediately deduce: Lemma 2.2.7.7 (Modus tollens). The following two variants of the modus tollens rule Γ`A⇒B Γ ` ¬B ⇒ ¬A

Γ`A⇒B Γ ` ¬B Γ ` ¬A

are admissible. 2.2.8 Definable connectives. A logical connective is definable when it can be expressed from other connectives in such a way that replacing the connective by its expression and removing the associated logical rules preserves provability. Lemma 2.2.8.1. Negation is definable as ¬A = A ⇒ ⊥. Proof. The introduction and elimination rules of ¬ are derivable by Γ, A ` ⊥ (¬I ) Γ ` ¬A Γ ` ¬A Γ`A (¬E ) Γ`⊥

Γ, A ` ⊥ (⇒I ) Γ`A⇒⊥ Γ`A⇒⊥ Γ`A (⇒E ) Γ`⊥

CHAPTER 2. PROPOSITIONAL LOGIC

53

from which it follows that, given a provable formula A, the formula A0 obtained from A by changing all connectives ¬− into − ⇒ ⊥ is provable, without using (¬E ) and (¬I ). Conversely, suppose given a formula A, such that the transformed formula A0 is provable. We have to show that A is also provable, which is more subtle. In the proof of A0 , for each subproof of the form π Γ`B⇒⊥ where the conclusion B ⇒ ⊥ corresponds to the presence of ¬B as a subformula of A, we can transform the proof as follows: π Γ`B (wk) Γ, B ` B ⇒ ⊥ Γ, B ` ⊥ Γ ` ¬B

Γ, B ` B

(ax) (⇒E ) (¬I )

Applying this transformation enough times, we can transform the proof of A0 into a proof of A. A variant of this proof is given in corollary 2.2.9.2. Lemma 2.2.8.2. Truth is definable as A ⇒ A, for any provable formula A not involving >. For instance: > = (⊥ ⇒ ⊥). Remark 2.2.8.3. In intuitionistic logic, contrarily to what we expect from the usual de Morgan formulas, the implication is not definable as A ⇒ B = ¬A ∨ B see sections 2.3.5 and 2.5.1. 2.2.9 Equivalence. We could have added to the syntax of our formulas an equivalence connective ⇔ with associated rules Γ`A⇔B (⇔lI ) Γ`A⇒B

Γ`A⇔B (⇔rI ) Γ`B⇒A

Γ`A⇒B Γ`B⇒A (⇔E ) Γ`A⇔B

It would have been definable as A ⇔ B = (A ⇒ B) ∧ (B ⇒ A) Two formulas A and B are equivalent when A ⇔ B is provable. This notion of equivalence relates in the expected way to provability: Lemma 2.2.9.1. If A and B are equivalent then, for every context Γ, Γ ` A is provable if and only if Γ ` B is provable. Proof. Immediate application of lemma 2.2.7.6. In this way, we can give a variant of the proof of lemma 2.2.8.1: Corollary 2.2.9.2. Negation is definable as ¬A = (A ⇒ ⊥).

CHAPTER 2. PROPOSITIONAL LOGIC

54

Proof. We have ¬A ⇔ (A ⇒ ⊥): (ax)

(ax)

(ax)

(ax)

¬A, A ` ¬A ¬A, A ` A A ⇒ ⊥, A ` A ⇒ ⊥ A ⇒ ⊥, A ` A (¬E ) (⇒E ) ¬A, A ` ⊥ A ⇒ ⊥, A ` ⊥ (⇒I ) (¬I ) ¬A ` A ⇒ ⊥ A ⇒ ⊥ ` ¬A (⇒I ) (⇒I ) ` ¬A ⇒ A ⇒ ⊥ ` (A ⇒ ⊥) ⇒ ¬A (⇔E ) ` ¬A ⇔ (A ⇒ ⊥)

and we conclude using lemma 2.2.9.1. 2.2.10 Structural rules. The rules of exchange, contraction, weakening and truth strengthening are often called structural rules: Γ, A, B, Γ0 ` C (xch) Γ, B, A, Γ0 ` C

Γ, A, A, Γ0 ` B (contr) Γ, A, Γ0 ` B

Γ, Γ0 ` B (wk) Γ, A, Γ0 ` B

Γ, >, Γ0 ` A (tstr) Γ, Γ0 ` A

We have seen in section 2.2.7 that they are admissible our system. Contexts as sets. The rules of exchange and contraction allow to think of contexts as sets (rather than lists) of formulas, because a set is a list “up to permutation and duplication of its elements”. More precisely, given a set A, we write P(A) for the set of subsets of A, and A∗ for the set of lists over A. We define an equivalence relation ∼ on A∗ as the smallest equivalence relation such that Γ, A, B, ∆ ∼ Γ, B, A, ∆

Γ, A, A, ∆ ∼ Γ, A, ∆

Lemma 2.2.10.1. The function f : A∗ → P(A) which to a list associates its set of elements is surjective. Moreover, given Γ, ∆ ∈ A∗ , we have f (Γ) = f (∆) if and only if Γ ∼ ∆. We could therefore have directly defined contexts to be sets of formulas, as is sometimes done, but this would be really unsatisfactory. Namely, a formula A in a context can be thought of as some kind of hypothesis which is to be proved by an auxiliary lemma and we might have twice the same formula A, but proved by different means: in this case, we would like to be able to refer to a particular instance of A (which is proved in a particular way), and we cannot do this if we have a set of hypothesis. For instance, there are intuitively two proofs of A ⇒ A ⇒ A: the one which uses the left A to prove A and the one which uses the right one (this will become even more striking with the Curry-Howard correspondence, see remark 4.1.7.2). However, with contexts as sets, both are the same: (ax)

A`A (⇒I ) A`A⇒A (⇒I ) `A⇒A⇒A A less harmful simplification which is sometimes done is to quotient by exchange only (and not contraction), in which case the contexts become multisets, see appendix A.3.5. We will refrain from doing that either here.

CHAPTER 2. PROPOSITIONAL LOGIC

55

Variants of the proof system. The structural rules are usually taken as “real” (as opposed to admissible) rules of the proof system. Here, we have carefully chosen the formulation of rules, so that they are admissible, but it would not hold anymore if we had used subtle variants instead. For instance, if we replace the axiom rule by Γ, A ` A

(ax)

or

A`A

(ax)

or replace the introduction rule for conjunction by Γ`A ∆`B (∧I ) Γ, ∆ ` A ∧ B the structural rules are not all admissible anymore. The fine study of the structure behind this lead Girard to introduce linear logic [Gir87]. 2.2.11 Substitution. Given types A and B and a variable X, we write A[B/X] for the substitution of X by B in A, i.e. the type A where all the occurrences of X have been replaced by B. More generally, a substitution for A is a function which to every variable X ∈ FV(A) assigns a type σ(X) and we also write A[σ] for the type A where every variable X has been replaced by σ(X). Similarly, given a context Γ = x1 : A1 , . . . , xn : An , we define Γ[σ] = x1 : A1 [σ], . . . , xn : An [σ] We often write [A1 /X1 , . . . , An /Xn ] for the substitution σ such that σ(Xi ) = Ai and σ(X) = X for X different from each Xi . It satisfies A[A1 /X1 , . . . , An /Xn ] = A[A1 /X1 ] . . . [An /Xn ] We always suppose that, for a substitution σ, the set {X ∈ X | σ(X) 6= X} is finite so that the substitution can be represented as the list of images of elements of this set. Provable formulas are closed under substitution: Proposition 2.2.11.1. Given a provable sequent Γ ` A and a substitution σ, the sequent Γ[σ] ` A[σ] is also provable. Proof. By induction on the proof of Γ ` A.

CHAPTER 2. PROPOSITIONAL LOGIC

56

2.3 Cut elimination In mathematics, one often uses lemmas to show results. For instance, suppose that we want to show that 6 admits a half, i.e. there exists a number n such that n + n = 6. We could proceed in this way by observing that – every even number admits a half, and – 6 is even. In this proof, we have used a lemma (even numbers can be halved) that we have supposed to be already proved. Of course, there was another, much shorter, proof of the fact that 6 admits a half: simply observe that 3 + 3 = 6. Somehow, we should be able to extract the second proof (giving directly 3) from the first one, but looking in details at the proof of the lemma: this process of extracting a direct proof from a proof of a lemma is called cut elimination. We will see that is has a number of applications and will allow us to take a “dynamic” point of view on proofs: removing cuts somehow corresponds to “executing” proofs. Let us illustrate how this process works in more details on the above example. We first need to make precise the notions we are using here, see section 6.6.3 for a full formalization. We say that a number m is a half of a number n when m + m = n, and the set of even numbers is defined here to be the smallest set containing 0 and such that n + 2 is even when n is. Moreover, our lemma is proved in this way: Lemma 2.3.0.1. Every even number admits a half. Proof. Suppose given an even number n. By definition of evenness, it can be of the two following forms and we can reason by induction. – If n = 0 then it admits 0 as half, since 0 + 0 = 0. – If n = n0 + 2 with n0 even, then by induction n0 admits a half m, i.e. m + m = n, and therefore n admits m + 1 as half since n = n0 + 2 = (m + m) + 2 = (m + 1) + (m + 1) In our reasoning to prove that 6 can be halved, we have used the fact that 6 is even, which we must have proved in this way: – 6 is even because 6 = 4 + 2 and 4 is even, where – 4 is even because 4 = 2 + 2 and 2 is even, where – 2 is even because 2 = 0 + 2 and 0 is even, where – 0 is even by definition. From the proof of the lemma, we know that the half of 6 is the successor of the half of 4, which is the successor of the half of 2 which is the successor of the half of 0, which is 0. Writing, as usual, n/2 for a half of n, we have 6/2 = (4/2) + 1 = (2/2) + 1 + 1 = (0/2) + 1 + 1 + 1 = 0 + 1 + 1 + 1 = 3 Therefore the half of 6 is 3: we have managed to extract the actual value of the half of 6 from the proofs the 6 is even and the above lemma. This example is further formalized in section 6.6.3.

CHAPTER 2. PROPOSITIONAL LOGIC

57

2.3.1 Cuts. In logic, the use of a lemma to show a result is called a “cut”. This must not be confused with the (cut) rule presented in theorem 2.2.7.5, although they are closely related. Formally, a cut in a proof is an elimination rule whose principal premise is proved by an introduction rule of the same connective. For instance, the following are cuts: π0 π Γ`A Γ`B (∧I ) Γ`A∧B (∧lE ) Γ`A

π Γ, A ` B (⇒I ) Γ`A⇒B Γ`B

π0 Γ`A

(⇒E )

The formula in the principal premise is called the cut formula: above, the cut formulas are respectively A∧B and A ⇒ B. A proof containing a cut intuitively does “useless work”. Namely, the one on the left starts from a proof π of A in the context Γ, which it uses to prove A ∧ B, from which it deduces A: in order to prove A, the proof π was already enough and the proof π 0 of B was entirely superfluous. Similarly, for the proof on the right, we show in π that supposing A we can prove B, and also in π 0 that we can prove A: we could certainly directly prove B, replacing in π all the places where the hypothesis A is used (say by an axiom) by the proof π 0 . For this reason, cuts are sometimes also called detours. From a proof-theoretic point of view, it might seem a bit strange that someone would use such a kind of proof structure, but this is actually common in mathematics: when we want to prove a result, we often prove a lemma which is more general than the result we want to show and then deduce the result we were aiming at. One of the reasons for proceeding in this way is that we can use the same lemma to cover multiple cases, and thus have shorter proofs (not to mention that they are generally more conceptual and modular, since we can reuse the lemmas for other proofs). We will see that, however, we can always avoid using cuts in order to prove formulas. Before doing so, we first need to introduce the main technical result which allows this. 2.3.2 Proof substitution. A different kind of substitution (than the one of section 2.2.11) consists in replacing some axioms in a proof by another proof. For instance, consider two proofs (ax)

(ax)

Γ, A ` A Γ, A ` A (∧I ) Γ, A, B ` A ∧ A π= (⇒I ) Γ, A ` B ⇒ A ∧ A

.. . π = Γ`A 0

The proof π 0 allows to deduce A from the hypothesis in Γ. Therefore, in the proof π, each time the hypothesis A of the context is used (by an axiom rule), we can instead use the proof π 0 and reprove A. Doing so, the hypothesis A in the context becomes superfluous and we can remove it. The proof resulting from this transformation is thus obtained by “re-proving” A each time we need

CHAPTER 2. PROPOSITIONAL LOGIC

58

it instead of having it as an hypothesis: π0 π0 Γ`A Γ`A (wk) (wk) Γ, B ` A Γ, B ` A (∧I ) Γ, B ` A ∧ A (⇒I ) Γ`B ⇒A∧A This process generalizes as follows: Proposition 2.3.2.1. Given provable sequents π Γ, A, Γ0 ` B

and

π0 Γ`A

the sequent Γ, Γ0 ` B is also provable, by a proof that we denote π[π 0 /A]: π[π 0 /A] Γ, Γ0 ` B Otherwise said, the (cut) rule Γ`A

Γ, A, Γ0 ` B (cut) Γ, Γ0 ` B

is admissible. Proof. By induction on π. We will see that the admissibility of this rule is the main ingredient to prove cut elimination, thus its name. 2.3.3 Cut elimination. A logic has the cut elimination property when whenever a formula is provable then it is also provable with a proof which does not involve cuts: we can always avoid doing unnecessary things. This procedure was introduced by Gentzen under the name Hauptsatz [Gen35]. In general, we not only want to know that such a proof exists, but also to have an effective cut elimination procedure which transforms a proof into one without cuts. The reason for this is that we will see in section 4.1.8 that this corresponds to somehow “executing” the proof (or the program corresponding to it): this is why Girard [Gir87] claims that A logic without cut elimination is like a car without an engine. Although the proof obtained after eliminating cuts is “simpler” in the sense that it does not contain unnecessary steps (cuts), it cannot always be considered as “better”: it is generally much bigger than the original one. The quote above explains it: think of a program computing the factorial of 1000. We see that a result can be much bigger than the program computing it [Boo84], and it can take much time to compute [Ore82]. Theorem 2.3.3.1. Intuitionistic natural deduction has the cut elimination property.

CHAPTER 2. PROPOSITIONAL LOGIC

π Γ, A ` B (⇒I ) Γ`A⇒B Γ`B

π Γ`A (∨lI ) Γ`A∨B π Γ`B (∨rI ) Γ`A∨B

π0 Γ`A

59

(⇒E )

π[π 0 /A] Γ`B

π π0 Γ`A Γ`B (∧I ) Γ`A∧B (∧lE ) Γ`A

π Γ`A

π π0 Γ`A Γ`B (∧I ) Γ`A∧B (∧rE ) Γ`B

π0 Γ`B

π0 Γ, A ` C Γ`C

π 00 Γ, B ` C

(∨E )

π 0 [π/A] Γ`C

π0 Γ, A ` C Γ`C

π 00 Γ, B ` C

(∨E )

π 00 [π/B] Γ`C

Figure 2.2: Transforming proofs in NJ in order to eliminate cuts. Proof. Suppose given a proof which contains a cut. This means that at some point in the proof we encounter one of the following situations (i.e. we have a subproof of one of the following forms), in which case we transform the proof as indicated by on figure 2.2 (we do not handle the cut on ¬ since ¬A can be coded as A ⇒ ⊥). For instance, (ax)

(ax)

Γ, A ` A Γ, A ` A (∧I ) Γ, A ` A ∧ A (⇒I ) Γ`A⇒A∧A Γ`A∧A

π Γ`A

(⇒E )

is transformed into π π Γ`A Γ`A (∧I ) Γ`A∧A We iterate the process on the resulting proof until all the cuts have been re-

CHAPTER 2. PROPOSITIONAL LOGIC

60

moved. As it can be noticed on the above example, applying the transformation might duplicate cuts: if the above proof π contained cuts, then the transformed proof contains twice the cuts of π. It is therefore not clear that the process actually terminates, whichever order we choose to eliminate cuts. We will see in section 4.2 that it indeed does, but the proof will be quite involved. It is sufficient for now to show that a particular strategy for eliminating cuts is terminating: at each step, we suppose that we eliminate a cut of highest depth, i.e. there is no cut “closest to the axioms” (for instance, we could apply the above transformation only if π has not cuts). We define the size |A| of a formula A as its number of connectives and variables: |X| = |>| = |⊥| = 1

|A ⇒ B| = |A ∧ B| = |A ∨ B| = 1 + |A| + |B|

The degree of a cut is the size of the cut formula (e.g. of A ⇒ A ∧ A in the above example, whose size is 2 + 3|A|), and the degree of a proof is then defined as the multiset (see appendix A.3.5) of the degrees of the cuts it contains. It can then be checked that whenever we apply , the newly created cuts are of strictly lower degree than the cut we eliminated and therefore the degree of the proof decreases according to the multiset order, see appendix A.3.5. For instance, if we apply a transformation π Γ, A ` B (⇒I ) Γ`A⇒B Γ`B

π0 Γ`A

(⇒E )

π[π 0 /A] Γ`B

we suppose that π 0 has no cuts (otherwise the eliminated cut would not be of highest depth). The degree of the cut is |A ⇒ B|. All the cuts present in the resulting proof where already present in the original proof, excepting new cuts on A which might be created by the substitution of π 0 in π, which are of degree |A| < |A ⇒ B|. Since the multiset order is well-founded, see theorem A.3.5.1, the process will eventually come to an end: we cannot have an infinite sequence of transformations, chosen according to our strategy. The previous theorem states that, as long as we are interested in provability, we can restrict to cut-free proofs. This is of interest because we often have a good idea of which rules can be used in those. In particular, we have the following useful result: Proposition 2.3.3.2. For any formula A, a cut-free proof of ` A necessarily ends with an introduction rule. Proof. Consider the a cut-free proof π of ` A. We reason by induction on it. This proof cannot be an axiom because the context is empty. Suppose that π ends with an elimination rule: .. . π= (?E ) `A For each of the elimination rules, we observe that the principal premise is necessarily of the form ` A0 , and therefore ends with an introduction rule, by

CHAPTER 2. PROPOSITIONAL LOGIC

61

induction hypothesis. The proof is then of the form .. . (?I ) ` A0 `A

...

(?E )

and thus contains a cut, which is excluded since we have supposed π to be cutfree. Since π cannot end with an axiom nor an elimination rule, it necessarily ends with an introduction rule. In the above proposition, it is crucial that we consider a formula in an empty context: a cut-free proof of Γ ` A does not necessarily end with an introduction rule if Γ is arbitrary. 2.3.4 Consistency. The least one can expect from a non-trivial logical system is that not every formula is provable, otherwise the system is of no use. A logical system is consistent when there is at least one formula which cannot be proved in the system. Since, by (⊥E ), one can deduce any formula from ⊥, we have: Lemma 2.3.4.1. The following are equivalent: (i) the logical system is consistent, (ii) the formula ⊥ cannot be proved, (iii) the principle of non-contradiction holds: there is no formula A such that both A and ¬A can be proved. Theorem 2.3.4.2. The system NJ is consistent. Proof. Suppose that it is inconsistent, i.e. by lemma 2.3.4.1 that it can prove ` ⊥. By theorem 2.3.3.1, there is a cut-free proof of ` ⊥ and, by proposition 2.3.3.2, this proof necessarily ends with an introduction rule. However, there is no introduction rule for ⊥, contradiction. Remark 2.3.4.3. As a side note, we would like to point out that if we naively allowed proofs to be infinite or cyclic (i.e. contain themselves as subproofs), then the system would not be consistent anymore. For instance, we could prove ⊥ by (ax)

π

=

⊥`⊥ `⊥⇒⊥

(⇒I )

`⊥

π `⊥

(⇒E )

(this proof is infinite in the sense that we should replace π by the proof itself above). Also, for such a proof, the cut elimination procedure would not terminate...

CHAPTER 2. PROPOSITIONAL LOGIC

62

2.3.5 Intuitionism. We have explained in the introduction that the intuitionistic point of view about proofs is that they should be “accessible to intuition” or “constructive”. This entails in particular that a proof of a disjunction A ∨ B should imply that one of the two formulas A or B is provable: we not only know that the disjunction is true, but we can explicitly say which one is true. This property is satisfied for the system NJ we have defined above, and explains why we have said that it is intuitionistic: Proposition 2.3.5.1. If a formula A ∨ B is provable in NJ then either A or B is provable. Proof. Suppose that we have a proof of A ∨ B. By theorem 2.3.3.1, we can suppose that this proof is cut-free and thus ends with an introduction rule by proposition 2.3.3.2. The proof is thus of one of the following two forms π `B (∨r ) `A∨B I

π `A (∨l ) `A∨B I

which means that we either have a proof of A or a proof of B. While quite satisfactory, this property means that truth in our logical systems behaves differently from usual systems (e.g. validity in boolean models), which are called classical by contrast. Every formula provable in NJ is true in classical systems, but the converse is not true. One of the most striking example is the so-called principle of excluded middle stating that, for any formula A, the formula ¬A ∨ A should hold. While this is certainly classically true, this cannot be proved intuitionistically for a general formula A: Lemma 2.3.5.2. Given a propositional variable X, the formula ¬X ∨ X cannot be proved in NJ. Proof. Suppose that it is provable. By proposition 2.3.5.1, either ¬X or X is provable and by theorem 2.3.3.1 and proposition 2.3.3.2, we can assume that this proof is cut-free and ends with an introduction rule. Clearly, ` X is not provable (because there is no corresponding introduction rule), so that we must have a cut-free proof of the form π X`⊥ (¬I ) ` ¬X By proposition 2.2.11.1, if we had such a proof, we would in particular have one where X is replaced by >: π0 >`⊥ but, by proposition 2.2.7.4, we could remove > from the hypothesis and obtain a proof π 00 `⊥

CHAPTER 2. PROPOSITIONAL LOGIC

63

which is excluded by the consistency of NJ, see theorem 2.3.4.2. Of course, the above theorem does not state that, for a particular given formula A, the formula ¬A ∨ A is not provable. For instance, with A = >, we have (>I )

`> (∨rI ) ` ¬> ∨ > It however states that we cannot prove ¬A ∨ A without entering the details of A. This will be studied in more detail in section 2.5, where other examples of non-provable formulas are given. Since the excluded-middle is not provable, maybe could it false in our logic? It is not the case because we can show that the excluded-middle is not falsifiable either, since we can prove the formula ¬¬(¬A ∨ A) as follows: (ax)

¬(¬A ∨ A), A ` A (∨rI ) ¬(¬A ∨ A), A ` ¬(¬A ∨ A) ¬(¬A ∨ A), A ` ¬A ∨ A (¬E ) ¬(¬A ∨ A), A ` ⊥ (¬I ) ¬(¬A ∨ A) ` ¬A (∨lI ) ¬(¬A ∨ A) ` ¬A ∨ A (¬E ) ¬(¬A ∨ A) ` ⊥ (¬I ) ` ¬¬(¬A ∨ A) (ax)

¬(¬A ∨ A) ` ¬(¬A ∨ A)

(ax)

(2.2) This proof will be analyzed in more details in section 2.5.2. A variant of the above lemma which is sometimes useful is the following one: Lemma 2.3.5.3. Given a propositional variable X, the formula ¬X ∨¬¬X cannot be proved in NJ. Proof. Let us prove this in a slightly different way than in lemma 2.3.5.2. It can be proved in NJ that ¬> ⇒ ⊥: ¬> ` ¬>

(ax)

¬> ` >

¬> ` ⊥ ` ¬> ⇒ ⊥

(>I ) (¬E ) (⇒I )

and that ¬¬⊥ ⇒ ⊥: (ax)

¬¬⊥, ⊥ ` ⊥ (ax) (¬I ) ¬¬⊥ ` ¬¬⊥ ¬¬⊥ ` ¬⊥ (¬E ) ¬¬⊥ ` ⊥ (⇒I ) ` ¬¬⊥ ⇒ ⊥ Now, suppose that we have a proof of ¬X ∨ ¬¬X. By proposition 2.3.5.1, either ¬X or ¬¬X is provable. By proposition 2.2.11.1, either ¬> or ¬¬⊥ is provable. In both cases, by (⇒E ), using the above proof, ⊥ is provable, which we know is not the case by consistency, see theorem 2.3.4.2

CHAPTER 2. PROPOSITIONAL LOGIC

Γ`⊥ (⊥E ) Γ`A Γ`B π Γ`A∨B

π Γ`A∨B

64

...

Γ`⊥ (⊥E ) Γ`B

(?E )

π0 π 00 Γ, A ` C Γ, B ` C (∨E ) Γ`C Γ`D π0 ... Γ, A ` C (?E ) Γ, A ` D Γ`D

... (?E )

π 00 ... Γ, B ` C (?E ) Γ, B ` D

(∨E )

Figure 2.3: Elimination of commutative cuts. 2.3.6 Commutative cuts. Are the cuts the only situations where one is doing useless work in proofs? No. It turns out that falsity and disjunction induce some more situations where we would like to eliminate “useless work”. For instance, consider the following proof: (ax)

⊥`⊥ (⊥E ) ⊥`A∨A

⊥, A ` A ⊥`A

(ax)

⊥, A ` A

(ax) (∨E )

For the hypothesis ⊥, we deduce the “general” statement that A ∨ A holds, from which we deduce that A holds. Clearly, we ought to be able to simplify this proof by into (ax)

⊥`⊥ (⊥E ) ⊥`A where we directly prove A instead of using the “lemma” A∨A as an intermediate step. Another example of a situation is the following one: (ax)

A, B ∨ C ` B ∨ C

(ax)

(ax)

A, B ∨ C, B ` A A, B ∨ C, B ` A (∧I ) A, B ∨ C, B ` A ∧ A A, B ∨ C ` A ∧ A A, B ∨ C ` A

(ax)

(ax)

A, B ∨ C, C ` A A, B ∨ C, C ` A (∧I ) A, B ∨ C, C ` A ∧ A

(∨E ) (∧lE )

Here, in a context containing A, we prove A ∧ A, from which we deduce A, whereas we could have directly proved A instead. This is almost a typical cut situation between the rule (∧I ) and (∧lE ), excepting that we cannot eliminate the cut because the two rules are separated by the intermediate rule (∨E ). In order for the system to have nice properties, we should thus add to the usual cut-elimination rules, the rules of figure 2.3, where (?E ) is meant to denote an arbitrary elimination rule. Those rules eliminate what we call commutative cuts, see [Gir89, section 10.3].

CHAPTER 2. PROPOSITIONAL LOGIC

65

2.4 Proof search An important question is whether there is an automated procedure in order to perform proof-search in NJ, i.e. answer the question: Is a given sequent Γ ` A provable? In general, the answer is yes, but the complexity is hard. In order to do so, the basic idea of course consists in trying to construct a proof derivation whose conclusion is our sequent, from bottom up. 2.4.1 Reversible rules. A rule is reversible when, if its conclusion is provable, then its hypothesis are provable. Such rules are particularly convenient in order to search for proofs since we know that we can always apply them: if the conclusion sequent was provable then the hypothesis still are. For instance, the rule Γ`A (∨lI ) Γ`A∨B is not reversible: if, while searching for a proof of Γ ` A ∨ B, we apply it, we might have to backtrack in the case where Γ ` A is not provable, since maybe Γ ` B was provable instead, the most extreme example being .. . `⊥ (∨lI ) `⊥∨> where we have picked the wrong branch of the disjunction and try to prove ⊥, whereas > was directly provable. On the contrary, the rule Γ`A Γ`B (∧I ) Γ`A∧B is reversible: during proof search, we can apply it without regretting our choice. Proposition 2.4.1.1. In NJ, the reversible rules are (ax), (⇒I ), (∧I ), (>I ) and (¬I ). Proof. Consider the case of (⇒I ), the other cases being similar. In order to show that this rule (recalled on the left) is reversible, we have to show that if the conclusion is provable then the premise also is, otherwise said that the rule on the right is admissible: Γ, A ` B (⇒I ) Γ`A⇒B

Γ`A⇒B Γ, A ` B

Suppose that we have a proof π of the conclusion Γ ` A ⇒ B. We can construct a proof of Γ, A ` B by π Γ`A⇒B (wk) Γ, A ` A ⇒ B Γ, A ` B

Γ, A ` A

(ax) (⇒E )

CHAPTER 2. PROPOSITIONAL LOGIC

66

For instance, want to prove the formula X ⇒ Y ⇒ X ∧ Y . We can try to apply the reversible rules as long as we can, and indeed, we end up on a proof: (ax)

(ax)

X, Y ` X X, Y ` Y X, Y ` X ∧ Y X `Y ⇒X ∧Y `X ⇒Y ⇒X ∧Y

(∧I ) (⇒I ) (⇒I )

2.4.2 Proof search. In order to have an idea of how proof search can be performed in general in NJ, we now describe an algorithm, due to Statman, see [Sta79] and [SU06, section 6.6]. For simplicity, we restrict to the implicational fragment, where formulas are built out of variables and implication, and the rules are (ax), (⇒E ) and (⇒I ). Suppose fixed a sequent Γ ` A. A naive search for a proof tree of this sequent might end up in a loop: the search space is not finite. For instance, looking for a proof of X ⇒ X ` X, our search might try to construct an infinite proof tree looking like this: (ax)

Γ`X⇒X

(ax)

Γ`X⇒X

Γ`X⇒X Γ`X Γ`X

(ax)

Γ`X

.. . (⇒E ) Γ`X

(⇒E ) (⇒E ) (⇒E )

where Γ = X ⇒ X. However, we can observe that such a proof contains infinitely many cuts, whereas theorem 2.3.3.1 ensures that we can look for cutfree proofs. We see another application of this theorem: it drastically shrinks the search space. After some more thought, it can be observed that we can always look on cut-free proofs of the following form, depending on the form of the formula we are trying to prove: – Γ ` A ⇒ B: the last rule must be Γ, A ` B (⇒I ) Γ`A⇒B and we look for a proof of Γ, A ` B, – Γ ` X: the proof must end with (ax)

.. . Γ ` A1

Γ ` A1 ⇒ A2 ⇒ . . . ⇒ An ⇒ X Γ ` A2 ⇒ . . . ⇒ An ⇒ X .. . Γ ` An ⇒ X

(⇒E )

.. . Γ ` A2

(⇒E ) (⇒E )

Γ`X

where the particular case n = 0 is Γ`X

(ax)

.. . Γ ` An

(⇒E )

CHAPTER 2. PROPOSITIONAL LOGIC

67

(** Formulas. *) type t = | Var of string | Imp of t * t (** let | |

Arguments of an implication ending with x. *) rec split_imp x = function Var y -> if x = y then Some [] else None Imp (a, b) -> match split_imp x b with | Some l -> Some (a::l) | None -> None

(** Determine whether a sequent is provable in a given context. *) let rec provable env : t -> bool = function | Var x -> List.exists (fun a -> match split_imp x a with | Some l -> List.for_all (provable env) l | None -> false ) env | Imp (a, b) -> provable (a::env) b Figure 2.4: Statman’s algorithm. we thus try to find a formula of the form A1 ⇒ . . . ⇒ An ⇒ X in the context such that all the Γ ` Ai are provable. This results in an algorithm which is relatively easy to implement, see figure 2.4. This algorithm can be shown to be in PSPACE and the problem is actually PSPACE-complete. Other methods for searching proofs in intuitionistic logic are presented in section 2.6.5.

2.5 Classical logic As we have seen in section 2.3.5, not all the formulas that we expect to hold in logic are provable in intuitionistic logic, such as the excluded middle (lemma 2.3.5.2). By opposition, the usual notion of validity (e.g. coming from boolean models) is called classical logic. If classical logic is closer to the usual intuition of validity, the main drawback for us is that this logic is not constructive, in the sense that we cannot necessarily extract witnesses from proofs: if we have proved ¬A ∨ A, we do not necessarily know which one of ¬A or A actually holds. A well-known typical classical reasoning is the following. We want to prove that there √ exist two irrational√ numbers a and b such that ab is rational. We know that 2 is irrational: if 2 = p/q then p2 = 2q 2 , but the number of prime factors is even on the left and odd on√the right. Reasoning using the excluded √ 2 middle, we know that the number 2 is either rational or irrational:

CHAPTER 2. PROPOSITIONAL LOGIC

68

– if it is rational, we conclude with a = b = – otherwise, we take a = concludes the proof.



2

√ 2

and b =





2,

2, and we have ab = 2 which

We have been able to prove the property, but we are not able to exhibit a concrete value for a and b. From the proof-as-program correspondence, the excluded middle is also quite puzzling. Suppose that we are in a logic rich enough to encode Turing machines (or, equivalently, execute a program in a usual programming language) and that we have a predicate Halts(M ) which is holds when M is halting (you should find this quite plausible after having read chapter 6). In classical logic, the formula ¬ Halts(M ) ∨ Halts(M ) holds for every Turing machine M , which seems to mean that we should be able to decide whether a Turing machine is halting or not, but there is no hope of finding such an algorithm since Turing has shown that the halting problem is undecidable [Tur37]. 2.5.1 Axioms for classical logic. A logical system for classical logic, called NK (for K lassical N atural deduction), can be obtained from NJ (figure 2.1) by adding a new rule corresponding to the excluded middle Γ ` ¬A ∨ A

(lem)

In this sense, the excluded middle is the only thing which is missing in intuitionistic logic to be classical. This is shown in theorems 2.5.6.1 and 2.5.6.5. In fact, excluded middle is not the only possible choice, and other equivalent axioms can be added instead. Most of those axioms correspond to usual reasoning patterns, which have been known for a long time, and thus bear latin names. Theorem 2.5.1.1. The following principles are equivalent in NJ: (i) excluded middle, also called tertium non datur: ¬A ∨ A (ii) double-negation elimination or reductio ad absurdum: ¬¬A ⇒ A (iii) contraposition: (¬B ⇒ ¬A) ⇒ (A ⇒ B) (iv) counter-example principle: ¬(A ⇒ B) ⇒ A ∧ ¬B (v) Peirce law: ((A ⇒ B) ⇒ A) ⇒ A

CHAPTER 2. PROPOSITIONAL LOGIC

69

(vi) Clavius’ law or consequentia mirabilis: (¬A ⇒ A) ⇒ A (vii) Tarski’s formula: A ∨ (A ⇒ B) (viii) one of the following de Morgan laws: ¬(¬A ∧ ¬B) ⇒ A ∨ B ¬(¬A ∨ ¬B) ⇒ A ∧ B (ix) material implication: (A ⇒ B) ⇒ (¬A ∨ B) (x) ⇒/∨ distributivity: (A ⇒ (B ∨ C)) ⇒ ((A ⇒ B) ∨ C) By “equivalent”, we mean here that if we suppose that one holds for every formulas A, B and C then the other one also holds for every formulas A, B and C, and conversely. Proof. We only show here the equivalence between the first two, the other ones being left as an exercise. Suppose that the excluded middle holds, we can show reductio ad absurdum by (ax)

¬¬A ` ¬A ∨ A

(ax)

¬¬A, ¬A ` ¬¬A ¬¬A, ¬A ` ¬A (¬E ) ¬¬A, ¬A ` A ¬¬A ` A ` ¬¬A ⇒ A

¬¬A, A ` A

(ax) (∨E ) (⇒I )

(2.3) Suppose that reductio ad absurdum holds, we can show the excluded middle by

` ¬¬(¬A ∨ A) ⇒ (¬A ∨ A) ` ¬A ∨ A

π ` ¬¬(¬A ∨ A)

(∨E )

(2.4)

where π is the proof (2.2) on page 63. Remark 2.5.1.2. One should be careful about the quantifications over formulas involved in theorem 2.5.1.1. In order to illustrate this, let us detail the equivalence between excluded middle and reductio ad absurdum. We say that a formula A is decidable when ¬A ∨ A holds and regular when ¬¬A ⇒ A holds. The derivation (2.3) shows that every decidable formula is regular, but the converse does not hold: the derivation (2.4) only shows that A is decidable when ¬A ∨ A (as opposed to A) is regular. In fact a concrete example of a formula which is regular but not decidable, can be given by taking A = ¬X: the formula ¬¬¬X ⇒ ¬X holds (lemma 2.5.9.4), but ¬¬X ∨ ¬X cannot be proved (lemma 2.3.5.3). Thus, it is important to remark that theorem 2.5.1.1 does not say that a formula is regular if and only if it is decidable, but rather that every formula is regular if and only if every formula is decidable.

CHAPTER 2. PROPOSITIONAL LOGIC

70

Among those axioms, the Pierce law is less natural than others but has the advantage of requiring only implication, so that it still makes sense in some small fragments of logic such as minimal implicational logic. Also, note that the fact that material implication occurs in this list means that A ⇒ B is not equivalent to ¬A ∨ B in NJ, as it is in NK. For each of these axioms, we could add more or less natural forms of rules. For instance, the law of the excluded middle can also be implemented by the nicer looking rule Γ, ¬A ` B Γ, A ` B (lem) Γ`B similarly, reductio ad absurdum can be implemented by one of the following rules Γ ` ¬¬A ⇒ A

Γ ` ¬¬A (¬¬E ) Γ`A

Γ, ¬A ` ⊥ Γ`A

Γ, ¬A ` A Γ`A

Since classical logic is obtained from intuitionistic by adding axioms, it is obvious that Lemma 2.5.1.3. An intuitionistic proof is a valid classical proof. We have seen that the converse does not hold (lemma 2.3.5.2), but we will see in section 2.5.9 that we can still embed classical proofs in intuitionistic logic. 2.5.2 The intuition behind classical logic. Let us try to give some proof theoretic intuition about how classical logic works. Proof irrelevance. We have already mentioned that we can interpret a formula as a set JAK, intuitively corresponding to all the proofs of A, and implications as function spaces: JA ⇒ BK = JAK → JBK. In this interpretation, ⊥ of course corresponds to the empty set since we do not expect to have a proof of it: J⊥K = ∅. Now, given a formula A, its negation ¬A = (A ⇒ ⊥) is interpreted as the set of functions from JAK to ∅: – if JAK is non-empty, the set J¬AK = JAK → ∅ is empty, – if JAK is empty, the set J¬AK = ∅ → ∅ contains exactly one element. The last point might seem surprising, but if we think hard about it it makes sense. For instance, in set theory, a function f : JAK → JBK is usually defined as a relation f ⊆ JAK × JBK which satisfies some properties, expressing that each element of JAK should have exactly one image in JBK. Now, when both the sets are empty, we are looking for a relation f ⊆ ∅ × ∅ and there is exactly one such relation: the empty set (which trivially satisfies the axioms for functions). Applying twice the reasoning above, we get that – if JAK is non-empty, J¬¬AK contains exactly one element, – if JAK is empty, J¬¬AK is empty. In other words, ¬¬A can be seen as the formula A where the only thing which matters is not all the proofs of A (i.e. the elements of JAK), but only whether there exists a proof of A or not, since we have reduced its contents to at most

CHAPTER 2. PROPOSITIONAL LOGIC

71

one point. For this reason, doubly negated formulas are sometimes said to be proof irrelevant: again, the actual proof does not matter, only its existence. For instance, we now understand why ¬¬(¬A ∨ A) is provable intuitionistically: it states that it is true that there exists a proof of ¬A or a proof of A, as opposed to ¬A ∨ A which states that we have a proof of ¬A or a proof of A. From this point of view, the classical axiom ¬¬A ⇒ A now seems like deep magic: it means that if we know that there exists a proof of A we can actually extract a proof of A. This can only be true if we assume that there can be at most one proof for a formula, i.e. formulas are interpreted as booleans and not sets (see section 2.5.4 for a logical point of view on this). This also explains why we can actually embed classical logic into intuitionistic logic by double-negating formulas, see section 2.5.9: if we are only interested in their existence, intuitionistic proofs behave classically. Resetting proofs. Let us give another, more operational, point of view on the axiom ¬¬A ⇒ A. We have mentioned that it is equivalent to having the rule Γ ` ¬¬A (¬¬E ) Γ`A so that when searching for a proof of A, we can instead prove ¬¬A. What do we gain in doing so? At first it does not seem much, since we can go back to proving A: .. . (ax)

Γ, ¬A ` ¬A Γ, ¬A ` A (¬E ) Γ, ¬A ` ⊥ (¬I ) Γ ` ¬¬A But there is one difference, we now have the additional hypothesis ¬A in our context, and we can use it at any point in the proof to go back to proving A instead of the current goal B, while keeping the current context: (ax)

.. . 0 Γ , ¬A ` A

Γ0 , ¬A ` ¬A Γ0 , ¬A ` ⊥ Γ0 , ¬A ` B

(⊥E ) (¬E )

In other words, we can “reset proofs” during proof search, i.e. we can implement the following behavior (up to minor details such as weakening): .. . 0 Γ `A (reset) Γ0 ` B .. . Γ`A

CHAPTER 2. PROPOSITIONAL LOGIC

72

Note that we keep the context Γ0 after the reset. Now, let us show how we can use this to prove ¬A ∨ A. When faced with the disjunction, we choose the left branch, i.e. prove ¬A, which by (¬I ) this amounts to prove ⊥, supposing A as hypothesis. Instead of going on and proving ⊥, which is quite hopeless, we use our reset mechanism and go back to proving ¬A ∨ A: while doing so we have kept A as hypothesis! So, this time we chose to prove A, which can be done by an axiom. If we think of reset as the possibility of “going back in time” and changing one’s mind, this proof implements the following conversation between us, trying to build the proof, and an opponent trying to prove us wrong: — Show me the formula ¬A ∨ A. — Ok, I will show that ¬A holds. — Here is a proof π of A, show me how to deduce ⊥. — Actually, I changed my mind, I will prove A, here is the proof: π. The formal proof goes on like this (ax)

A`A (∨rI ) A ` ¬A ∨ A (reset) A`⊥ (¬I ) ` ¬A (∨lI ) ` ¬A ∨ A In more details, the proof begins by proving ¬¬(¬A ∨ A) instead of ¬A ∨ A and proceeds as in (2.2) on page 63. This idea of resetting will be explored again, under a different form, in section 4.6. 2.5.3 A variant of natural deduction. The presentation given in section 2.5.1 is not very “canonical” in the sense that it consists in randomly adding an axiom in the list given in theorem 2.5.1.1. We would like to present another approach which consists in slightly changing the calculus, and allows for a much more pleasant proof system. We now consider sequents of the form Γ`∆ where both Γ and ∆ are contexts. Such a sequent should be read as “supposing all the formula in Γ, I can prove some formula in ∆”. This is a generalization of previous sequents, where ∆ was restricted to exactly one formula. The rules for this sequent calculus are given in figure 2.5. In order to ease the presentation, we consider here that the formulas of ∆ can be explicitly permuted, duplicated, and so on, using the structural rules (xchR ), (wkR ), (contrR ) and (⊥R ), which we generally leave implicit in examples. Those rules are essentially the same as those for NJ, with contexts added on the right, excepting for the rule (∨lI ) and (∨rI ), which are now combined into the rule Γ ` A, B, ∆ (∨I ) Γ ` A ∨ B, ∆

CHAPTER 2. PROPOSITIONAL LOGIC

Γ, A, Γ0 ` A, ∆

73

(ax)

Γ ` A ⇒ B, ∆ Γ ` A, ∆ (⇒E ) Γ ` B, ∆ Γ ` A ∧ B, ∆ l (∧E ) Γ ` A, ∆

Γ, A ` B, ∆ (⇒I ) Γ ` A ⇒ B, ∆

Γ ` A ∧ B, ∆ r (∧E ) Γ ` B, ∆

Γ ` A, ∆ Γ ` B, ∆ (∧I ) Γ ` A ∧ B, ∆ Γ`∆ (>I ) Γ ` >, ∆

Γ ` A ∨ B, ∆

Γ, A ` C, ∆ Γ ` C, ∆

Γ, B ` C, ∆

(∨E )

Γ ` A, B, ∆ (∨I ) Γ ` A ∨ B, ∆ Γ`∆ (⊥I ) Γ ` ⊥, ∆

Γ ` ¬A, ∆ Γ ` A, ∆ (¬E ) Γ ` ⊥, ∆

Γ, A ` ⊥, ∆ (¬I ) Γ ` ¬A, ∆

structural rules: Γ ` ∆, A, B, ∆0 (xchR ) Γ ` ∆, B, A, ∆0

Γ ` ∆, ∆0 (wkR ) Γ ` ∆, A, ∆0

Γ ` ∆, A, A, ∆0 (contrR ) Γ ` ∆, A, ∆0

Γ ` ∆, ⊥, ∆0 (⊥R ) Γ ` ∆, ∆0

Figure 2.5: NK: rules of classical natural deduction.

CHAPTER 2. PROPOSITIONAL LOGIC

74

In order to prove a disjunction A ∨ B, we do not have to choose anymore if we want to prove A or B: we can try to prove both at the same time. This means that there can be some “exchange of information” between the proofs of A and of B (via the context Γ). For instance, we have that the excluded middle can be proved by (ax)

A ` A, ⊥ (xchR ) A ` ⊥, A (¬I ) ` ¬A, A (∨I ) ` ¬A ∨ A Note that the formula A in the context, obtained from the ¬A, is used in the axiom in order to prove the other A. Similarly, double negation elimination is proved by (ax)

¬¬A, A ` A (ax) (¬I ) ¬¬A ` ¬¬A, A ¬¬A ` ¬A, A (¬E ) ¬¬A ` ⊥, A (⊥E ) ¬¬A ` A, A (contrR ) ¬¬A ` A (⇒I ) ` ¬¬A ⇒ A Again, instead of proving A, we decide to either prove ⊥ or A. The expected elimination rule for the constant ⊥ (shown on the left) is not present, but it can be derived (as shown on the right):

Γ ` ⊥, ∆ (⊥E ) Γ ` A, ∆

Γ ` ⊥, ∆ (⊥R ) Γ`∆ (wkR ) Γ ` A, ∆

In fact the constant ⊥ is now superfluous, since one can convince himself that proving ⊥ amounts to proving the empty sequent ∆. 2.5.4 Cut-elimination in classical logic. Classical logic also does have the cut-elimination property, see section 2.3.3, although this is more subtle to show than in the case of intuitionistic logic due to the presence of structural rules. In particular, in addition to the usual cut elimination steps, we need to add rules making elimination rules “commute” with structural rules: namely, an introduction and the corresponding elimination rules can be separated by structural rules. For instance, suppose that we want to eliminate the following “cut”: π π0 Γ`A Γ`B (∧I ) Γ`A∧B (wkR ) Γ ` A ∧ B, C (∧lE ) Γ ` A, C

CHAPTER 2. PROPOSITIONAL LOGIC

75

We first need to make the elimination rule for conjunction commute with the weakening: π π0 Γ`A Γ`B (∧I ) Γ`A∧B (∧lE ) Γ`A (wkR ) Γ ` A, C and then we can finally properly eliminate the cut: π (∧lE ) Γ`A (wkR ) Γ ` A, C Another surprising phenomenon was observed by Lafont [Gir89, section B.1]. Depending on the order in which we eliminate cuts, the following proof π0 π Γ`A Γ`B (wkR ) (wkR ) Γ ` ¬C, A Γ ` C, B (¬E ) Γ ` ⊥, A, B (⊥R ) Γ ` A, B both cut-eliminates to π Γ`A (wkR ) Γ ` A, B

and

π0 Γ`B (wkR ) Γ ` A, B

This is sometimes called Lafont’s critical pair. We like to identify proofs up to cut elimination (much more on this in chapter 4) and therefore those two proofs should be considered as being “the same”. In particular, when both π and π 0 are proofs of Γ ` A, i.e. A = B, this forces us to identify the two proofs π Γ`A (wkR ) Γ ` A, A (contrR ) Γ`A

and

π0 Γ`A (wkR ) Γ ` A, A (contrR ) Γ`A

and thus to identify the two proofs π and π 0 . More generally, by a similar reasoning, any two proofs of a same sequent Γ ` ∆ should be identified. Cuts can hurt! This gives another, purely logical, explanation about why classical logic is “proof irrelevant”, as already mentioned in section 2.5.2: up to cutelimination, there is at most one proof of a given sequent. 2.5.5 De Morgan laws. In classical logic, the well-known de Morgan laws hold: ¬(A ∧ B) ⇔ ¬A ∨ ¬B

¬> ⇔ ⊥

¬(A ∨ B) ⇔ ¬A ∧ ¬B

¬⊥ ⇔ >

A ⇒ B ⇔ ¬A ∨ B ¬¬A ⇔ A

CHAPTER 2. PROPOSITIONAL LOGIC

76

Definable connectives. Because of these laws, many connectives are superfluous. For instance, classical logic can be axiomatized with ⇒ and ⊥ as only connectives, since we can define A ∨ B = ¬A ⇒ B

A ∧ B = ¬(A ⇒ ¬B)

¬A = A ⇒ ⊥

>=⊥⇒⊥

and the logical system can be reduced to the four following rules: 0

Γ, A, Γ ` A, ∆

Γ`∆ (⊥I ) Γ ` ⊥, ∆

(ax)

Γ ` A ⇒ B, ∆ Γ ` A, ∆ (⇒E ) Γ ` B, ∆

Γ, A ` B, ∆ (⇒I ) Γ ` A ⇒ B, ∆

together with the four structural rules. Several other choices of connectives are possible. Clausal form. It is natural to consider the equivalence relation on formulas which relates any two formulas A and B such that A ⇔ B. The de Morgan laws can be used to put every formula into some canonical representative in its equivalence class up to this equivalence relation. We first need to introduce some classes of formulas. A literal L is either a variable or a negated variable: L ::= X | ¬X A clause C is a disjunction of literals: C ::= L | C ∨ C | ⊥ A formula A is in clausal form or in conjunctive normal form when it is a conjunction of clauses: A ::= C | A ∧ A | > Proposition 2.5.5.1. Every formula is equivalent to one in clausal form. One way to show this result is to use the de Morgan laws, as well as usual intuitionistic laws (section 2.2.5), in order to push negations toward variables and disjunctions below conjunctions, i.e. we replace subformulas according to the following rules, until no rule applies: ¬(A ∧ B)

¬A ∨ ¬B

¬>



¬(A ∨ B)

¬A ∧ ¬B

¬⊥

>

(A ∧ B) ∨ C

(A ∨ C) ∧ (B ∨ C)

>∨C

>

A ∨ (B ∧ C)

(A ∨ B) ∧ (A ∨ C)

A∨>

>

¬¬A

A

A⇒B

¬A ∨ B

Those rules rewrite formulas into classically equivalent ones, since those are instances of de Morgan laws. However, it is not clear that the process terminates. It does, but it is not efficient, and we will see below a better way to put a formula in clausal form.

CHAPTER 2. PROPOSITIONAL LOGIC

77

Example 2.5.5.2. A clausal form of (X ⇒ Y ) ⇒ (Y ⇒ Z) can be computed by ¬(¬X ∨ Y ) ∨ (¬Y ∨ Z)

(¬¬X ∧ ¬Y ) ∨ (¬Y ∨ Z) (X ∧ ¬Y ) ∨ (¬Y ∨ Z) (X ∨ ¬Y ∨ Z) ∧ (¬Y ∨ ¬Y ∨ Z)

Efficient computation of the clausal form. Given a clause C, we write L(C) for the set of literals occurring in it: L(X) = {X}

L(¬X) = {¬X}

L(C ∨ D) = L(C) ∪ L(D)

L(⊥) = ∅

A variable X occurs positively (resp. negatively) in A if we have X ∈ L(C) (resp. ¬X ∈ L(C)). Up to equivalence, formulas satisfy the laws of commutative idempotent monoids with respect to ∨ and ⊥: (A ∨ B) ∨ C ⇔ A ∨ (B ∨ C)

⊥∨A⇔A

B∨A⇔A∨B

A∨⊥⇔A

A∨A⇔A

Because of this, a clause is characterized by the set of literals occurring in it, see appendix A.2: Lemma 2.5.5.3. Given clauses C and D, if L(C) = L(D) then C ⇔ D. Similarly, a formula in clausal form is characterized by the set of clauses occurring in it. A formula in clausal form A can thus be encoded as a set of sets of literals: A = {{L11 , . . . , L1k1 }, . . . , {Ln1 , . . . , Lnkn }} Note that the empty set ∅ corresponds to the formula > whereas the set {∅} corresponds to the formula ⊥. In practice, we can represent a formula as a list of lists of clauses (where the order or repetitions of the elements of the lists do not matter). Based on this, an algorithm for putting a formula in clausal form is provided in figure 2.6. A literal is encoded as a pair consisting of a variable and a boolean indicating whether it is negated or not (by convention, false means negated), a clause as a list of literals, and clausal form as a list of clauses. Given a formula A, the functions pos and neg, respectively compute the clausal form of A and ¬A. They are using the function merge which, given two formulas in clausal form A = {C1 , . . . , Cm }

and

B = {D1 , . . . , Dn }

compute the clausal form of A ∨ B, which is A ∨ B = {Ci ∪ Dj | 1 6 i 6 m, 1 6 j 6 n} The notion clausal form can be further improved as follows. We say that a formula is in canonical clausal form when 1. it is in clausal form, 2. it does not contain twice the same clause or > (this is automatic if it is represented as a set of clauses), 3. no clause contains twice the same literal or ⊥ (this is automatic if they are represented as sets of literals),

CHAPTER 2. PROPOSITIONAL LOGIC

type var = int (** Formulas. *) type t = | Var of var | And of t * t | Or of t * t | Imp of t * t | Not of t | True | False type litteral = bool * var (** Litteral. *) (* false = negated *) type clause = litteral list (** Clause. *) type cnf = clause list (** Clausal formula. *) let clausal a : cnf = let merge a b = List.flatten (List.map (fun c in let rec pos = function | Var x -> [[true, x]] | And (a, b) -> let a = pos a | Or (a, b) -> let a = pos a | Imp (a, b) -> let a = neg a | Not a -> neg a | True -> [] | False -> [[]] and neg = function | Var x -> [[false, x]] | And (a, b) -> let a = neg a | Or (a, b) -> let a = neg a | Imp (a, b) -> let a = pos a | Not a -> pos a | True -> [[]] | False -> [] in pos a

-> List.map (fun d -> c@d) b) a)

in let b = pos b in a@b in let b = pos b in merge a b in let b = pos b in merge a b

in let b = neg b in merge a b in let b = neg b in a@b in let b = neg b in a@b

Figure 2.6: Putting a formula in clausal form.

78

CHAPTER 2. PROPOSITIONAL LOGIC

79

4. no clause contains both a literal and its negation. For the last point, given a clause C containing both X and ¬X, the equivalences ¬X ∨ X ⇔ > and > ∨ A ⇔ A imply that the whole clause is equivalent to > and can thus be removed from the formula. For instance, the clausal form computed in example 2.5.5.2 is not canonical because it does not satisfy the second point above. Exercise 2.5.5.4. Modify the algorithm of figure 2.6 so that it computes canonical clausal forms. De Morgan laws in intuitionistic logic. Let us insist once again on the fact that the de Morgan laws do not hold in intuitionistic logic. Namely, the following implications are intuitionistically true, but not their converse: A ∨ B ⇒ ¬(¬A ∧ ¬B)

¬A ∨ ¬B ⇒ ¬(A ∧ B)

A ∧ B ⇒ ¬(¬A ∨ ¬B)

¬A ∨ B ⇒ A ⇒ B

However, the following equivalence does hold intuitionistically: ¬A ∧ ¬B ⇔ ¬(A ∨ B) 2.5.6 Boolean models. Classical natural deduction matches exactly the notion of truth one would get from usual boolean models. Let us detail this. We write B = {0, 1} for the set of booleans. A valuation ρ is a function X → B, assigning booleans to variables. Such a valuation can be extended as a function ρ : Prop → B, from propositions to booleans, by induction over the propositions by ρ(X) = 1 iff ρ(X) = 1 ρ(A ⇒ B) = 1 iff ρ(A) = 0 or ρ(B) = 1 ρ(A ∧ B) = 1 iff ρ(A) = 1 and ρ(B) = 1 ρ(>) = 1 ρ(A ∨ B) = 1 iff ρ(A) = 1 or ρ(B) = 1 ρ(⊥) = 0 Given a formula A and a valuation ρ, we write ρ A whenever ρ(A) = 1 and say that the formula A is satisfied in the valuation ρ. Given a context Vn Γ = A1 , . . . , An , we write Γ ρ A whenever ρ ( i=1 Ai ) ⇒ A. Finally, we write Γ  A whenever Γ ρ A for every valuation ρ and, in this case, say that A is valid in the context Γ or that the sequent Γ ` A is valid. The system NK is correct in the sense that is only allows the derivation of valid sequents. Theorem 2.5.6.1 (Soundness). If Γ ` A is derivable then Γ  A. Proof. By induction on the proof of Γ ` A. Since NJ is a subsystem of NK, it thus also allows only the derivation of valid sequents. As simple as it may seem, the above theorem allows proving the consistency of intuitionistic and classical logic (which was already demonstrated in theorem 2.3.4.2 for intuitionistic logic):

CHAPTER 2. PROPOSITIONAL LOGIC

80

Corollary 2.5.6.2. The system NK (and thus also NJ) is consistent. Proof. Suppose that NK is not consistent. By lemma 2.3.4.1, we would have a proof of ` ⊥. By theorem 2.5.6.1, we would have ρ(⊥) = 1. But ρ(⊥) = 0 by definition, contradiction. Conversely, we can show that the system NK is complete, meaning that if a sequent Γ ` A is valid, i.e. we have Γ  A, then it is derivable. As a particular case, we will have that if a formula A valid then it is provable, i.e. ` A is derivable. We first need the following lemmas. Lemma 2.5.6.3. For any formulas A and B, variable X and valuation ρ, we have ρ(A[B/X]) = ρ0 (A), where ρ0 (X) = ρ(B) and ρ0 (Y ) = ρ(X) for X 6= Y . Proof. By induction on A. Lemma 2.5.6.4. For any formula A, the formula ((X ⇒ A[>/X]) ∧ (¬X ⇒ A[⊥/X])) ⇒ A is derivable in NK. Proof. For concision, we write δX A = (X ⇒ A[>/X]) ∧ (¬X ⇒ A[⊥/X]) We reason by induction on the formula A. If A = X then δX X = (X ⇒ >) ∧ (¬X ⇒ ⊥) and we have (ax)

δX X, ¬X ` (X ⇒ >) ∧ (¬X ⇒ ⊥) (∧rE ) δX X, ¬X ` ¬X ⇒ ⊥ δX X, ¬X ` ⊥ δX X ` ¬¬X δX X ` X ` δX X ⇒ X

δX X, ¬X ` ¬X

(ax) (⇒E ) (¬I ) (¬¬E ) (⇒I )

If A = Y with Y 6= X, we have δX Y = (X ⇒ Y ) ∧ (¬X ⇒ Y ) and, using the fact that X ∨ ¬X is derivable,

δX Y ` X ∨ ¬X

δX Y, X ` (X ⇒ Y ) ∧ (¬X ⇒ Y ) δX Y, X ` X ⇒ Y δX Y, X ` Y δX Y ` Y

Other cases are left to the reader.

δX Y, X ` X

.. . δX Y, ¬X ` Y

CHAPTER 2. PROPOSITIONAL LOGIC

81

Theorem 2.5.6.5 (Completeness). If Γ ⊨ A holds then Γ ⊢ A is derivable.
Proof. We proceed by induction on the number of free variables of A. If FV(A) = ∅ then we easily show that Γ ⊢ A by induction on A. Otherwise, pick a variable X ∈ FV(A). By lemma 2.5.6.3, the sequents Γ, X ⊢ A[⊤/X] and Γ, ¬X ⊢ A[⊥/X] are valid, and thus derivable by induction hypothesis. Finally, using lemma 2.5.6.4 and the fact that the excluded middle X ∨ ¬X holds (it is needed in the proof of that lemma), we have the derivation

1. Γ, X ⊢ A[⊤/X]  by induction hypothesis
2. Γ ⊢ X ⇒ A[⊤/X]  by (⇒I) on 1
3. Γ, ¬X ⊢ A[⊥/X]  by induction hypothesis
4. Γ ⊢ ¬X ⇒ A[⊥/X]  by (⇒I) on 3
5. Γ ⊢ δX A  by (∧I) on 2 and 4
6. Γ ⊢ δX A ⇒ A  by lemma 2.5.6.4
7. Γ ⊢ A  by (⇒E) on 6 and 5

which allows us to conclude. A detailed and formalized version of this proof can be found in [CKA].

Of course, intuitionistic natural deduction is not complete with respect to boolean models, since there are formulas, such as ¬X ∨ X, which evaluate to true under any valuation but are not derivable (lemma 2.3.5.2). One way to understand this is that there are "not enough boolean models" to detect that such formulas are not valid. A natural question is thus: is there a way to generalize the notion of boolean model, so that intuitionistic natural deduction is complete with respect to this notion of model, i.e. a formula which is valid in any such model is necessarily intuitionistically provable? We will see in section 2.8 that such a notion of model exists: Kripke models.

2.5.7 DPLL. As an aside, we would like to present the usual algorithm to decide the satisfiability of boolean formulas, which is based on the previous observations. A propositional formula A is satisfiable when there exists a valuation ρ making it true, i.e. such that ρ ⊨ A. An efficient way to test whether this is the case is the DPLL algorithm, due to Davis, Putnam, Logemann and Loveland [DLL62]. The basic idea is the one we have already seen in lemma 2.5.6.4: if the formula A is satisfiable by a valuation ρ then, given a variable X occurring in A, we have either ρ(X) = 0 or ρ(X) = 1, and we can recursively test whether A is satisfiable in both cases, since this makes the number of variables in the formula decrease (we call this splitting on the variable X):

Lemma 2.5.7.1. Given a variable X, a formula A is satisfiable if and only if the formula A[⊥/X] or the formula A[⊤/X] is satisfiable.
Proof. If A is satisfiable, then there is a valuation ρ such that ρ(A) = 1. If ρ(X) = 0 (resp. ρ(X) = 1) then, by lemma 2.5.6.3, ρ(A[⊥/X]) = ρ(A) = 1 (resp. ρ(A[⊤/X]) = ρ(A) = 1) and therefore A[⊥/X] (resp. A[⊤/X]) is satisfiable. Conversely, if A[⊥/X] (resp. A[⊤/X]) is satisfiable by a valuation ρ then, writing ρ′ for the valuation such that ρ′(X) = 0 (resp. ρ′(X) = 1) and ρ′(Y) = ρ(Y) for Y ≠ X, by lemma 2.5.6.3 we have ρ(A[⊥/X]) = ρ′(A) = 1 (resp. ρ(A[⊤/X]) = ρ′(A) = 1), so that ρ′ satisfies A.
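Before turning to the implementation, here is a small OCaml sketch (an illustration, not from the text, reusing the formula type t of figure 2.7 below): the extension of a valuation ρ to formulas, and a brute-force validity check which tries every valuation, the variables being assumed to be numbered 0, . . . , n − 1.

(* Extension of a valuation rho : int -> bool to formulas, following
   the inductive definition of section 2.5.6. *)
let rec evalv (rho : int -> bool) = function
  | Var x -> rho x
  | And (a, b) -> evalv rho a && evalv rho b
  | Or (a, b) -> evalv rho a || evalv rho b
  | Not a -> not (evalv rho a)
  | True -> true
  | False -> false

(* Brute-force validity: a formula whose variables are numbered
   0, ..., n-1 is valid when it evaluates to true under all 2^n
   valuations. *)
let valid n a =
  let rec aux rho k =
    if k = n then evalv rho a
    else
      aux (fun x -> if x = k then false else rho x) (k + 1)
      && aux (fun x -> if x = k then true else rho x) (k + 1)
  in
  aux (fun _ -> false) 0

For instance, valid 1 (Or (Var 0, Not (Var 0))) returns true, consistently with the excluded middle being classically valid.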


In the base case, the formula A has no variable: it thus evaluates to the same value in any environment, and we can easily compute this value. The formula is satisfiable if and only if this value is true. This directly leads to a very simple implementation of a satisfiability algorithm, see figure 2.7: the function subst computes the substitution of a formula into another one, the function var finds a free variable, and the function sat tests the satisfiability of a formula.

As is, this algorithm is not very efficient: some subformulas get evaluated many times during the search. It can however be much improved by using formulas in canonical clausal form, as described in proposition 2.5.5.1. First, substitution can be implemented on those as follows:

Lemma 2.5.7.2. Given a canonical clausal formula A and a variable X, a canonical clausal formula for A[⊤/X] (resp. A[⊥/X]) can be obtained from A by
– removing all clauses containing X (resp. ¬X),
– removing ¬X (resp. X) from all remaining clauses.

The computation can be further improved by carefully choosing the variables we split on first. A unitary clause in a clausal formula A is a clause containing exactly one literal L. If L is X (resp. ¬X) then, if we split on X, the branch A[⊥/X] (resp. A[⊤/X]) will fail. Therefore:

Lemma 2.5.7.3. Consider a clausal formula A containing a unitary clause consisting of a literal X (resp. ¬X). Then the formula A is satisfiable if and only if the formula A[⊤/X] (resp. A[⊥/X]) is.

A literal X (resp. ¬X) is pure in a clausal formula A if ¬X (resp. X) does not occur in any clause of A: the variable X always occurs with the same polarity in the formula.

Lemma 2.5.7.4. A clausal formula A containing a pure literal X (resp. ¬X) is satisfiable if and only if the formula A[⊤/X] (resp. A[⊥/X]) is satisfiable.

Another way to state the above lemma is that the clauses containing the pure literal can be removed from the formula without changing its satisfiability. The DPLL algorithm exploits these optimizations in order to test the satisfiability of a formula A:
1. it first tries to see if A is obviously satisfiable (if it is ⊤) or unsatisfiable (if it contains the clause ⊥),
2. otherwise it tries to find a unitary clause and apply lemma 2.5.7.3,
3. otherwise it tries to find a pure literal and apply lemma 2.5.7.4,
4. otherwise it splits on an arbitrary variable by lemma 2.5.7.1.
For the last step, various heuristics have been proposed for choosing the splitting variable, such as MOM (a variable with Maximal number of Occurrences in the clauses of Minimum size) or Jeroslow–Wang (a variable X maximizing J(X) = Σ_C 2^(−|C|), where C ranges over the clauses containing X and |C| is the number of literals of C), and so on. A concrete implementation is provided in figure 2.8.

(** Formulas. *)
type t =
  | Var of int
  | And of t * t
  | Or of t * t
  | Not of t
  | True
  | False

(** Substitute a variable by a formula in a formula. *)
let rec subst x c = function
  | Var y -> if x = y then c else Var y
  | And (a, b) -> And (subst x c a, subst x c b)
  | Or (a, b) -> Or (subst x c a, subst x c b)
  | Not a -> Not (subst x c a)
  | True -> True
  | False -> False

(** Find a free variable in a formula. *)
let var a =
  let exception Found of int in
  let rec aux = function
    | Var x -> raise (Found x)
    | And (a, b) | Or (a, b) -> aux a; aux b
    | Not a -> aux a
    | True | False -> ()
  in
  try aux a; raise Not_found with Found x -> x

(** Evaluate a closed formula. *)
let rec eval = function
  | Var _ -> assert false
  | And (a, b) -> eval a && eval b
  | Or (a, b) -> eval a || eval b
  | Not a -> not (eval a)
  | True -> true
  | False -> false

(** Simple-minded satisfiability. *)
let rec sat a =
  try
    let x = var a in
    sat (subst x True a) || sat (subst x False a)
  with Not_found -> eval a

Figure 2.7: Naive implementation of the satisfiability algorithm.
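As a hypothetical sanity check of the sat function (not from the text):

let () =
  (* X ∨ ¬X is satisfiable, X ∧ ¬X is not (X is variable 0) *)
  assert (sat (Or (Var 0, Not (Var 0))));
  assert (not (sat (And (Var 0, Not (Var 0)))))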

type var = int              (** Variable. *)
type literal = bool * var   (** Literal (false means negated). *)
type clause = literal list  (** Clause. *)
type cnf = clause list      (** Clausal formula. *)

(** Substitution a[v/x]. *)
let subst (a:cnf) (v:bool) (x:var) : cnf =
  let a = List.filter (fun c -> not (List.mem (v,x) c)) a in
  List.map (fun c -> List.filter (fun l -> l <> (not v, x)) c) a

(** Find a unitary clause. *)
let rec unit : cnf -> literal = function
  | [n,x]::_ -> n,x
  | _::a -> unit a
  | [] -> raise Not_found

(** Find a pure literal in a clausal formula. *)
let pure (a : cnf) : literal =
  let rec clause vars = function
    | [] -> vars
    | (n,x)::c ->
      try match List.assoc x vars with
        | Some n' ->
          if n' = n then clause vars c
          else
            let vars = List.filter (fun (y,_) -> y <> x) vars in
            clause ((x,None)::vars) c
        | None -> clause vars c
      with Not_found -> clause ((x,Some n)::vars) c
  in
  let vars = List.fold_left clause [] a in
  let x, n = List.find (function (_, Some _) -> true | _ -> false) vars in
  Option.get n, x

(** DPLL procedure. *)
let rec dpll a =
  if a = [] then true
  else if List.mem [] a then false
  else
    try let n,x = unit a in dpll (subst a n x)
    with Not_found ->
    try let n,x = pure a in dpll (subst a n x)
    with Not_found ->
      let x = snd (List.hd (List.hd a)) in
      dpll (subst a false x) || dpll (subst a true x)

Figure 2.8: Implementation of the DPLL algorithm.
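A hypothetical usage of the dpll function above (not from the text), on the clausal formulas (X ∨ Y) ∧ (¬X ∨ Y) ∧ ¬Y, which is unsatisfiable, and (X ∨ Y) ∧ ¬X, which is satisfiable, encoding X as 0 and Y as 1:

let () =
  assert (not (dpll [[true,0; true,1]; [false,0; true,1]; [false,1]]));
  assert (dpll [[true,0; true,1]; [false,0]])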


The function subst implements substitution as described in lemma 2.5.7.2, the function unit finds a unitary clause (or raises Not_found if there is none), the function pure finds a pure literal (or raises Not_found), and finally the function dpll implements the above algorithm. The function pure uses an auxiliary list vars of pairs (x, b), where x is a variable and b is either Some true or Some false if the variable x occurs only positively or negatively, or None if it occurs both positively and negatively.

2.5.8 Resolution. The resolution procedure is a generalization of the previous DPLL algorithm, which was introduced by Davis and Putnam [DP60]. It is not the most efficient algorithm, but one of its main interests is that it generalizes well to first-order logic, see section 5.4.6. It stems from the following observation.

Lemma 2.5.8.1 (Correctness). Suppose given two clauses of the form C ∨ X and ¬X ∨ D, respectively containing a variable X and its negation ¬X. Then the formula C ∨ D is a consequence of them.
Proof. Given a valuation ρ such that ρ(C ∨ X) = ρ(¬X ∨ D) = 1,
– if ρ(X) = 1 then necessarily ρ(D) = 1 and thus ρ(C ∨ D) = 1,
– if ρ(X) = 0 then necessarily ρ(C) = 1 and thus ρ(C ∨ D) = 1.

From a logical point of view, this deduction corresponds to the following resolution rule:

  Γ ⊢ C ∨ X    Γ ⊢ ¬X ∨ D
  ------------------------ (res)
         Γ ⊢ C ∨ D

In the following, we implicitly consider formulas up to commutativity of disjunction, i.e. we identify the formulas A ∨ B and B ∨ A, so that the above rule also applies to clauses containing X and its negation anywhere:

  Γ ⊢ C1 ∨ X ∨ C2    Γ ⊢ D1 ∨ ¬X ∨ D2
  ------------------------------------ (res)
        Γ ⊢ C1 ∨ C2 ∨ D1 ∨ D2

The previous lemma can be reformulated in classical logic as follows.

Lemma 2.5.8.2. The resolution rule is admissible in classical natural deduction.
Proof. Given proofs of Γ ⊢ C′ ∨ X and Γ ⊢ ¬X ∨ D′, we have

1. Γ ⊢ C′, X  (splitting the disjunction, see below)
2. Γ ⊢ ¬X, D′  (idem)
3. Γ ⊢ C′, X, D′  by (wkR) on 1
4. Γ ⊢ C′, ¬X, D′  by (wkR) on 2
5. Γ ⊢ C′, ⊥, D′  by (¬E) on 4 and 3
6. Γ ⊢ C′, D′  by (⊥R) on 5

where the rule

  Γ ⊢ A ∨ B
  ----------
  Γ ⊢ A, B

is derivable by (∨E) from Γ ⊢ A ∨ B and the two axioms Γ, A ⊢ A, B and Γ, B ⊢ A, B (otherwise said, the rule (∨I) is reversible).


Remark 2.5.8.3. If we recall that, in classical logic, implication can be defined as A ⇒ B = ¬A ∨ B, the resolution rule simply corresponds to the transitivity of implication:

  Γ ⊢ ¬C ⇒ X    Γ ⊢ X ⇒ D
  ------------------------
       Γ ⊢ ¬C ⇒ D

For simplicity, in the following, a context Γ will be seen as a set of clauses (as opposed to a list of clauses, see section 2.2.10) and will also be interpreted as a clausal form (the conjunction of its clauses, see section 2.5.5). We will always implicitly suppose that it is canonical (see section 2.5.5): a clause cannot contain twice the same literal, nor a literal and its negation. The previous lemmas entail that, if we can prove the sequent Γ ⊢ ⊥ using the axiom and resolution rules only, then Γ is not satisfiable: otherwise, ⊥ would also be satisfiable, which it is not by definition. We are going to show in theorem 2.5.8.7 that this observation admits a converse.

Resolvent. Given clauses C ∨ X and ¬X ∨ D, the clause C ∨ D does not contain the variable X, which gives us the idea of using resolution to remove the variable X from a set of formulas, by performing all the possible deductions we can. Suppose given a set Γ of clauses and a variable X. We write

  ΓX = {C | C ∨ X ∈ Γ}        Γ¬X = {D | ¬X ∨ D ∈ Γ}

and Γ0 for the set of clauses in Γ which contain neither X nor ¬X. We supposed that the clauses are in canonical form, so that we have a partition of Γ as

  Γ = Γ0 ⊎ {C ∨ X | C ∈ ΓX} ⊎ {¬X ∨ D | D ∈ Γ¬X}

The resolvent Γ \ X of Γ with respect to X is

  Γ \ X = Γ0 ∪ {C ∨ D | C ∈ ΓX, D ∈ Γ¬X}

Remark 2.5.8.4. As defined above, the resolvent might contain clauses not in canonical form, even if C and D are. In order to keep this invariant, we should remove all clauses of the form C ∨ D such that C contains a literal and D its negation, which we will implicitly do; within clauses, we should also remove duplicate literals.

As indicated above, computing the resolvent reduces the number of free variables of Γ:

Lemma 2.5.8.5. Given a clausal form Γ and a variable X, we have FV(Γ \ X) = FV(Γ) \ {X}.

Its main interest lies in the fact that it preserves satisfiability:

Lemma 2.5.8.6. Given a clausal form Γ and a variable X, Γ is satisfiable if and only if Γ \ X is satisfiable.
Proof. The left-to-right implication is immediate by correctness of the resolution rule (lemma 2.5.8.1). For the right-to-left implication, suppose that Γ \ X is satisfied under a valuation ρ. We are going to show that Γ is satisfied under either ρ0 or ρ1, where ρi is defined, for i = 0 or i = 1, by ρi(X) = i and ρi(Y) = ρ(Y) whenever Y ≠ X. We distinguish two cases.


– Suppose that we have ρ(C′) = 1 for every clause C = C′ ∨ X in ΓX. Then we can take i = 0. Namely, given a clause C ∈ Γ = Γ0 ⊎ ΓX ⊎ Γ¬X:
  – if C ∈ Γ0 then ρ0(C) = ρ(C) = 1 because C does not contain the literal X,
  – if C ∈ ΓX then C = C′ ∨ X and ρ0(C) = 1 because, by hypothesis, ρ0(C′) = ρ(C′) = 1,
  – if C ∈ Γ¬X then C = C′ ∨ ¬X and ρ0(C) = 1 because ρ0(¬X) = 1, since ρ0(X) = 0 by definition of ρ0.
– Otherwise, there exists a clause C = C′ ∨ X ∈ ΓX such that ρ(C′) = 0. Then we can take i = 1. Namely, given a clause D ∈ Γ = Γ0 ⊎ ΓX ⊎ Γ¬X:
  – if D ∈ Γ0 then ρ1(D) = ρ(D) = 1 because D does not contain the literal X,
  – if D ∈ ΓX then D = D′ ∨ X and ρ1(D) = 1 because ρ1(X) = 1,
  – if D ∈ Γ¬X then D = D′ ∨ ¬X and ρ(C′ ∨ D′) = 1 by hypothesis, thus ρ1(D′) = ρ(D′) = 1 because ρ(C′) = 0, and thus ρ1(D) = ρ1(D′ ∨ ¬X) = 1.

The previous lemma implies that resolution is refutation complete, in the sense that it can always be used to show that a set of clauses cannot be satisfied (by any valuation):

Theorem 2.5.8.7 (Refutation completeness). A set Γ of clauses is unsatisfiable if and only if Γ ⊢ ⊥ can be proved using the axiom and resolution rules only.
Proof. Writing FV(Γ) = {X1, . . . , Xn} for the free variables of Γ, define the sequence (Γi)0≤i≤n of sets of clauses by Γ0 = Γ and Γi+1 = Γi \ Xi+1:
– the clauses of Γ0 can be deduced from those of Γ using the axiom rule,
– the clauses of Γi+1 can be deduced from those of Γi using the resolution rule.
Lemma 2.5.8.6 ensures that Γi is satisfiable if and only if Γi+1 is satisfiable, and thus, by induction, Γ0 is satisfiable if and only if Γn is satisfiable. Moreover, by lemma 2.5.8.5, we have FV(Γn) = ∅, thus Γn = ∅ or Γn = {⊥}, and therefore Γn is unsatisfiable if and only if Γn = {⊥}. Finally, Γ is unsatisfiable if and only if Γn = {⊥}, i.e. if and only if ⊥ can be deduced from Γ using the axiom and resolution rules.

Completeness. Resolution is not complete: given a context Γ, there are clauses which are consequences of Γ but cannot be deduced using resolution only. For instance, from Γ = {X}, we cannot deduce X ∨ Y using resolution only. However, resolution can be used to decide whether a formula A is a consequence of a context Γ in the following way:

Lemma 2.5.8.8. A formula A is a consequence of a context Γ if and only if Γ ∪ {¬A} is unsatisfiable.
Proof. Given a clausal form Γ, the formula Γ ⇒ A is equivalent to ¬¬(Γ ⇒ A), which is equivalent to ¬(Γ ∧ ¬A), i.e. to Γ ∪ {¬A} not being satisfiable.


This lemma is the usual way we use resolution.

Example 2.5.8.9. We can show that, given the hypotheses

  X ⇒ Y        Y ⇒ Z        X

we can deduce Z. Putting those in clausal form and using the previous lemma, this amounts to showing that the context Γ consisting of the clauses

  ¬X ∨ Y        ¬Y ∨ Z        ¬Z        X

is not satisfiable. Indeed, we have the derivation

1. Γ ⊢ ¬X ∨ Y  by (ax)
2. Γ ⊢ ¬Y ∨ Z  by (ax)
3. Γ ⊢ ¬X ∨ Z  by (res) on 1 and 2
4. Γ ⊢ X  by (ax)
5. Γ ⊢ Z  by (res) on 3 and 4
6. Γ ⊢ ¬Z  by (ax)
7. Γ ⊢ ⊥  by (res) on 5 and 6

Implementation. We implement clausal forms using lists, as in section 2.5.7. Using this representation, the resolvent of a clausal form Γ (noted g) with respect to a variable X (noted x) can be computed using the following function:

let resolve x g =
  let gx = List.filter (List.mem (true ,x)) g in
  let gx = List.map (List.remove (true ,x)) gx in
  let gx' = List.filter (List.mem (false,x)) g in
  let gx' = List.map (List.remove (false,x)) gx' in
  let g' = List.filter (List.for_all (fun (_,y) -> y <> x)) g in
  let disjunction c d =
    let union c d =
      List.fold_left (fun d l -> if List.mem l d then d else l::d) d c
    in
    if c = [] && d = [] then raise False
    else if List.exists (fun (n,x) -> List.mem (not n,x) d) c then None
    else Some (union c d)
  in
  g' @ (List.filter_map_pairs disjunction gx gx')

Here, g' is Γ0, gx is ΓX and gx' is Γ¬X, and we return the resolvent computed following the definition

  Γ \ X = Γ0 ∪ {C ∨ D | C ∈ ΓX, D ∈ Γ¬X}

The function List.filter_map_pairs, which is of type

  ('a -> 'b -> 'c option) -> 'a list -> 'b list -> 'c list

takes a function and two lists as arguments, maps the function to every pair of elements of one list and the other, and returns the list of images which are not None. It is used to compute the clauses C ∨ D in the definition of Γ \ X.

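The above function relies on helpers which are not part of the OCaml standard library (List.remove and List.filter_map_pairs), on an exception False, and, below, on a function free_var whose implementation is left to the reader. A minimal sketch of plausible definitions for these, under the types of section 2.5.7, could be:

exception False

module List = struct
  include List

  (* remove x l : the list l with all occurrences of x removed *)
  let remove x l = filter (fun y -> y <> x) l

  (* filter_map_pairs f l1 l2 : apply f to every pair of elements of
     l1 and l2, keeping the images which are not None *)
  let filter_map_pairs f l1 l2 =
    concat_map (fun a -> filter_map (fun b -> f a b) l2) l1
end

(* free_var g : some variable occurring in the clausal form g,
   raising Not_found if there is none *)
let free_var (g : cnf) : var =
  match List.concat g with
  | (_, x) :: _ -> x
  | [] -> raise Not_found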

The disjunction is computed by the function disjunction, with some subtleties. Firstly, as noted in remark 2.5.8.4, we should be careful to produce formulas in canonical form:
– a disjunction C ∨ D containing both a literal and its negation should not be added,
– in a disjunction C ∨ D, if a literal occurs twice (once in C and once in D), we should only keep one instance.
Secondly, since we want to detect as early as possible when ⊥ can be deduced, we raise the exception False when we find it (the empty disjunction).

We can then see whether a clausal form is inconsistent by iteratively eliminating free variables using resolution. We use the auxiliary function free_var in order to find a free variable (it raises Not_found if there is none), its implementation being left to the reader. By theorem 2.5.8.7, if the clausal form is inconsistent then ⊥ will be produced during the process (which is detected by the exception False), otherwise the free variables will be exhausted (which is detected by the exception Not_found). This can thus be computed with the following function:

let rec inconsistent g =
  try inconsistent (resolve (free_var g) g) with
  | False -> true
  | Not_found -> false

We can then decide whether a clause is a consequence of a set of other clauses by applying lemma 2.5.8.8 (the negation of a clause L1 ∨ · · · ∨ Lk being the set of unit clauses ¬L1, . . . , ¬Lk):

let neg c = List.map (fun (n,x) -> [not n, x]) c

let prove g c = inconsistent (neg c @ g)

As an application, we can compute example 2.5.8.9 with

let () =
  let g = [
    [false,0; true,1];
    [false,1; true,2];
    [true,0]
  ] in
  let c = [true,2] in
  assert (prove g c)

2.5.9 Double-negation translation. We have seen in section 2.3.5 that some formulas are not provable in intuitionistic logic whereas they are valid in classical logic, a typical example being the excluded middle ¬A ∨ A. But can we really prove less in intuitionistic logic than in classical logic? A starting observation is that, even though ¬A ∨ A is not provable, its double negation ¬¬(¬A ∨ A)


becomes provable:

1. ¬(¬A ∨ A), A ⊢ A  by (ax)
2. ¬(¬A ∨ A), A ⊢ ¬A ∨ A  by (∨rI) on 1
3. ¬(¬A ∨ A), A ⊢ ¬(¬A ∨ A)  by (ax)
4. ¬(¬A ∨ A), A ⊢ ⊥  by (¬E) on 3 and 2
5. ¬(¬A ∨ A) ⊢ ¬A  by (¬I) on 4
6. ¬(¬A ∨ A) ⊢ ¬A ∨ A  by (∨lI) on 5
7. ¬(¬A ∨ A) ⊢ ¬(¬A ∨ A)  by (ax)
8. ¬(¬A ∨ A) ⊢ ⊥  by (¬E) on 7 and 6
9. ⊢ ¬¬(¬A ∨ A)  by (¬I) on 8

One of the main ingredients behind this proof is that having ¬(¬A ∨ A) as an hypothesis in a context Γ allows us to discard the current proof goal B and go back to proving ¬A ∨ A:

1. Γ ⊢ ¬(¬A ∨ A)  by (ax)
2. Γ ⊢ ¬A ∨ A  (to be proved)
3. Γ ⊢ ⊥  by (¬E) on 1 and 2
4. Γ ⊢ B  by (⊥E) on 3

What do we gain compared to directly proving ¬A ∨ A? The fact that, during the proof, we can reset our proof goal to ¬A ∨ A! We thus start proving ¬A ∨ A by proving ¬A, which requires proving ⊥ from A. At this point, we change our mind and start the proof of ¬A ∨ A again, but this time we prove A, which we can do because we gained this information from the previously "aborted" proof. A more detailed explanation of this kind of behavior was already developed in section 2.5.2. This actually generalizes to any formula, by a result due to Glivenko [Gli29]. Given a context Γ, we write ¬¬Γ for the context obtained from Γ by double-negating every formula.

Theorem 2.5.9.1 (Glivenko's theorem). Given a context Γ and a propositional formula A, the sequent Γ ⊢ A is provable in classical logic if and only if the sequent ¬¬Γ ⊢ ¬¬A is provable in intuitionistic logic.

This result allows us to relate the consistency of classical and intuitionistic logic in the following way.

Theorem 2.5.9.2. Intuitionistic logic is consistent if and only if classical logic is consistent.
Proof. Suppose that intuitionistic logic is inconsistent: there is an intuitionistic proof of ⊥. This proof is also a valid classical proof and thus classical logic is inconsistent. Conversely, suppose that classical logic is inconsistent. There is a classical proof of ⊥ and thus, by theorem 2.5.9.1, an intuitionistic proof π of ¬¬⊥. However, the implication ¬¬⊥ ⇒ ⊥ holds intuitionistically:

1. ¬¬⊥ ⊢ ¬¬⊥  by (ax)
2. ¬¬⊥, ⊥ ⊢ ⊥  by (ax)
3. ¬¬⊥ ⊢ ¬⊥  by (¬I) on 2
4. ¬¬⊥ ⊢ ⊥  by (¬E) on 1 and 3
5. ⊢ ¬¬⊥ ⇒ ⊥  by (⇒I) on 4


We thus have an intuitionistic proof of ⊥, obtained by (⇒E) from the above proof of ⊢ ¬¬⊥ ⇒ ⊥ and the proof π of ⊢ ¬¬⊥, and intuitionistic logic is inconsistent.

Remark 2.5.9.3. Theorem 2.5.9.1 does not generalize as is to first-order logic, but some other translations of classical formulas into intuitionistic logic do. The most brutal one is due to Kolmogorov and consists in adding ¬¬ in front of every subformula. Interestingly, it corresponds to the call-by-name continuation-passing style translation of functional programming languages. A more economic translation is due to Gödel, transforming a formula A into the formula A∗ defined by induction by

  X∗ = X
  (A ∧ B)∗ = A∗ ∧ B∗
  (A ∨ B)∗ = ¬(¬A∗ ∧ ¬B∗)
  (A ⇒ B)∗ = A∗ ⇒ B∗
  (¬A)∗ = ¬A∗
  ⊤∗ = ⊤
  ⊥∗ = ⊥
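As a small illustration (a sketch, not from the text, with hypothetical constructor names), Gödel's translation can be implemented directly on a type of formulas:

type formula =
  | FVar of string
  | FImp of formula * formula
  | FAnd of formula * formula
  | FOr of formula * formula
  | FNot of formula
  | FTrue
  | FFalse

(* Gödel's negative translation: disjunction is the only connective
   that is actually transformed. *)
let rec godel = function
  | FVar x -> FVar x
  | FImp (a, b) -> FImp (godel a, godel b)
  | FAnd (a, b) -> FAnd (godel a, godel b)
  | FOr (a, b) -> FNot (FAnd (FNot (godel a), FNot (godel b)))
  | FNot a -> FNot (godel a)
  | FTrue -> FTrue
  | FFalse -> FFalse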

Finally, one could wonder whether, by adding four negations to a formula, we could gain even more proof power, but this is not the case: the process stabilizes after the first iteration.

Lemma 2.5.9.4. For every natural number n ≥ 0, we have ¬^(n+2) A ⇔ ¬^n A.
Proof. The implication A ⇒ ¬¬A is intuitionistically provable, as already shown in example 2.2.5.3, as well as the implication ¬¬¬A ⇒ ¬A:

1. ¬¬¬A, A ⊢ ¬¬¬A  by (ax)
2. ¬¬¬A, A, ¬A ⊢ ¬A  by (ax)
3. ¬¬¬A, A, ¬A ⊢ A  by (ax)
4. ¬¬¬A, A, ¬A ⊢ ⊥  by (¬E) on 2 and 3
5. ¬¬¬A, A ⊢ ¬¬A  by (¬I) on 4
6. ¬¬¬A, A ⊢ ⊥  by (¬E) on 1 and 5
7. ¬¬¬A ⊢ ¬A  by (¬I) on 6
8. ⊢ ¬¬¬A ⇒ ¬A  by (⇒I) on 7

We conclude by induction on n. In particular, ¬¬¬¬A ⇔ ¬¬A, so that we gain nothing by performing the double-negation translation twice.

2.5.10 Intermediate logics. Once again, classical logic is obtained by adding the excluded middle ¬A ∨ A (or any of the equivalent axioms, see theorem 2.5.1.1) to intuitionistic logic, so that new formulas become provable. A natural question is: are there intermediate logics between intuitionistic and classical logic? This means: can we add axioms to intuitionistic logic so that we get strictly more than intuitionistic logic, but strictly less than classical logic? In more detail: are there families of formulas, classically provable but not intuitionistically,


such that, by adding those as axioms to intuitionistic logic, we obtain a logic in which some classical formulas are not provable? The answer is yes. A typical such family of axioms is the weak excluded middle:

  ¬¬A ∨ ¬A

Namely, the formula ¬¬X ∨ ¬X is not provable in intuitionistic logic (see lemma 2.3.5.3), so that assuming the weak excluded middle (for every formula A) allows proving new formulas. However, the formula ¬X ∨ X does not follow from the weak excluded middle (example 2.8.1.4). There are many other possible families of axioms giving rise to intermediate logics, such as
– the linearity (or Gödel-Dummett) axiom [God32, Dum59]: (A ⇒ B) ∨ (B ⇒ A)
– the Kreisel-Putnam axiom [KP57]: (¬A ⇒ (B ∨ C)) ⇒ ((¬A ⇒ B) ∨ (¬A ⇒ C))
– Scott's axiom [KP57]: ((¬¬A ⇒ A) ⇒ (A ∨ ¬A)) ⇒ (¬¬A ∨ ¬A)
– Smetanich's axiom [WZ07]: (¬B ⇒ A) ⇒ (((A ⇒ B) ⇒ A) ⇒ A)
– and many more [DMJ16].

Exercise 2.5.10.1. Show that the above linearity principle (A ⇒ B) ∨ (B ⇒ A) is equivalent to the following global choice principle for disjunctions: (A ⇒ B ∨ C) ⇒ (A ⇒ B) ∨ (A ⇒ C).

2.6 Sequent calculus

Natural deduction is "natural" for making a precise correspondence between logic and computation, see chapter 4. However, it has some flaws. From an aesthetic point of view, the rules for ∧ and ∨ are not entirely dual, contrarily to what one would expect: if they were, we could think of reducing the work during proofs or implementations by handling them in the same way. More annoyingly, proof search is quite difficult. Namely, suppose that we are trying to prove

  A ∨ B ⊢ B ∨ A


The proof cannot begin with an introduction rule, because we would have no hope of filling the dots in either of

  A ∨ B ⊢ B                 A ∨ B ⊢ A
  ------------- (∨lI)       ------------- (∨rI)
  A ∨ B ⊢ B ∨ A             A ∨ B ⊢ B ∨ A

This means that we have to use another rule such as (∨E)

  Γ ⊢ A ∨ B    Γ, A ⊢ C    Γ, B ⊢ C
  ---------------------------------- (∨E)
               Γ ⊢ C

which requires us to come up with a formula A ∨ B which is not directly indicated in the conclusion Γ ⊢ C, and it is not clear how to automatically generate such formulas. Starting in this way, the proof can be ended as in example 2.2.5.2. In order to overcome this problem, Gentzen invented the sequent calculus, which is another presentation of logic. In natural deduction, all rules operate on the formula on the right of ⊢, and there are introduction and elimination rules. In sequent calculus, there are only introduction rules, but those can operate on formulas either on the left or on the right of ⊢. This results in a highly symmetrical calculus.

2.6.1 Sequents. In sequent calculus, sequents are of the form

  Γ ⊢ ∆

where Γ and ∆ are contexts: the intuition is that we have the conjunction of the formulas in Γ as hypotheses, from which we can deduce the disjunction of the formulas in ∆.

2.6.2 Rules. In all the systems we consider, unless otherwise stated, we always suppose that we can permute, duplicate and erase formulas in contexts, i.e. that the structural rules of figure 2.9 are always present. The additional rules for sequent calculus are shown in figure 2.10 and the resulting system is called LK. In sequent calculus, as opposed to natural deduction, the symmetry between disjunction and conjunction is restored: excepting the axiom and cut rules, all rules come in a left and a right flavor. Although the presentation is quite different, the provability power of this system is the same as the one of classical natural deduction presented in section 2.5:

Theorem 2.6.2.1. A sequent Γ ⊢ ∆ is provable in NK (figure 2.5) if and only if it is provable in LK (figure 2.10).
Proof. The idea is that we can translate, by induction, a proof in NK into a proof in LK, and back. The introduction rules in NK correspond to right rules in LK, the axiom rules match in both systems, the cut rule is admissible in NK (the proof is similar to the one of proposition 2.3.2.1 for NJ), as well as various structural rules (shown as in section 2.2.7), so that we only have to show that the elimination rules of NK are admissible in LK and the left rules of LK are admissible in NK. We only handle the case of conjunction here:


  Γ, B, A, Γ′ ⊢ ∆                Γ ⊢ ∆, B, A, ∆′
  --------------- (xchL)         --------------- (xchR)
  Γ, A, B, Γ′ ⊢ ∆                Γ ⊢ ∆, A, B, ∆′

  Γ, A, A, Γ′ ⊢ ∆                Γ ⊢ ∆, A, A, ∆′
  --------------- (contrL)       --------------- (contrR)
  Γ, A, Γ′ ⊢ ∆                   Γ ⊢ ∆, A, ∆′

  Γ, Γ′ ⊢ ∆                      Γ ⊢ ∆, ∆′
  --------------- (wkL)          --------------- (wkR)
  Γ, A, Γ′ ⊢ ∆                   Γ ⊢ ∆, A, ∆′

  Γ, ⊤, Γ′ ⊢ ∆                   Γ ⊢ ∆, ⊥, ∆′
  ------------- (⊤L)             ------------- (⊥R)
  Γ, Γ′ ⊢ ∆                      Γ ⊢ ∆, ∆′

Figure 2.9: Structural rules for sequent calculus.

  Γ, A ⊢ A, ∆  (ax)

  Γ ⊢ A, ∆    Γ, A ⊢ ∆
  --------------------- (cut)
         Γ ⊢ ∆

  Γ, A, B ⊢ ∆                    Γ ⊢ A, ∆    Γ ⊢ B, ∆
  ------------- (∧L)             --------------------- (∧R)
  Γ, A ∧ B ⊢ ∆                       Γ ⊢ A ∧ B, ∆

  Γ ⊢ ⊤, ∆  (⊤R)

  Γ, A ⊢ ∆    Γ, B ⊢ ∆           Γ ⊢ A, B, ∆
  --------------------- (∨L)     ------------- (∨R)
      Γ, A ∨ B ⊢ ∆               Γ ⊢ A ∨ B, ∆

  Γ, ⊥ ⊢ ∆  (⊥L)

  Γ ⊢ A, ∆    Γ, B ⊢ ∆           Γ, A ⊢ B, ∆
  --------------------- (⇒L)     ------------- (⇒R)
      Γ, A ⇒ B ⊢ ∆               Γ ⊢ A ⇒ B, ∆

  Γ ⊢ A, ∆                       Γ, A ⊢ ∆
  ---------- (¬L)                ---------- (¬R)
  Γ, ¬A ⊢ ∆                      Γ ⊢ ¬A, ∆

Figure 2.10: LK: rules of classical sequent calculus.


– the rule (∧lE) is admissible in LK:

  1. Γ ⊢ A ∧ B, ∆  (given)
  2. Γ ⊢ A ∧ B, A, ∆  by (wkR) on 1
  3. Γ, A, B ⊢ A, ∆  by (ax)
  4. Γ, A ∧ B ⊢ A, ∆  by (∧L) on 3
  5. Γ ⊢ A, ∆  by (cut) on 2 and 4

  and admissibility of (∧rE) is similar,

– the rule (∧L) is admissible in NK:

  1. Γ, A ∧ B ⊢ A ∧ B, ∆  by (ax)
  2. Γ, A ∧ B ⊢ A, ∆  by (∧lE) on 1
  3. Γ, A ∧ B, A ⊢ A ∧ B, ∆  by (ax)
  4. Γ, A ∧ B, A ⊢ B, ∆  by (∧rE) on 3
  5. Γ, A, B ⊢ ∆  (given)
  6. Γ, A ∧ B, A, B ⊢ ∆  by weakening on 5
  7. Γ, A ∧ B, A ⊢ ∆  by (cut) on 4 and 6
  8. Γ, A ∧ B ⊢ ∆  by (cut) on 2 and 7

Other cases are similar.

Remark 2.6.2.2. As noted in [Gir89, chapter 5], the correspondence between proofs in NK and LK is not bijective. For instance, the following two proofs in LK, which differ only in the order in which the two (∧L) rules are applied,

1. A, B ⊢ A  by (ax)
2. A, B ⊢ B  by (ax)
3. A, B ⊢ A ∧ B  by (∧R) on 1 and 2
4. A, B, B′ ⊢ A ∧ B  by weakening on 3
5. A, A′, B, B′ ⊢ A ∧ B  by weakening on 4
6. A, A′, B ∧ B′ ⊢ A ∧ B  by (∧L) on 5, grouping B and B′
7. A ∧ A′, B ∧ B′ ⊢ A ∧ B  by (∧L) on 6, grouping A and A′

and

6′. A ∧ A′, B, B′ ⊢ A ∧ B  by (∧L) on 5, grouping A and A′
7′. A ∧ A′, B ∧ B′ ⊢ A ∧ B  by (∧L) on 6′, grouping B and B′

get mapped to the same proof in NK.

Multiplicative presentation. An alternative presentation (called the multiplicative presentation) of the rules is given in figure 2.11: instead of supposing that the premises have the same context, we can "merge" the contexts of the premises in the conclusion.

Single-sided presentation. By the de Morgan laws, in classical logic, we can suppose that only variables are negated, and negated at most once, see section 2.5.5. For instance, ¬(X ∨ ¬Y) is equivalent to ¬X ∧ Y, which satisfies this property. Given a formula A of this form, we write A∗ for a formula of this form equivalent to ¬A: it can be defined by induction by

  X∗ = ¬X              (¬X)∗ = X
  (A ∧ B)∗ = A∗ ∨ B∗    (A ∨ B)∗ = A∗ ∧ B∗
  ⊤∗ = ⊥               ⊥∗ = ⊤

We omit implication here, since it can be defined as A ⇒ B = A∗ ∨ B. Now, it can be observed that proving a sequent of the form Γ, A ⊢ ∆ is essentially the same as proving the sequent Γ ⊢ A∗, ∆, excepting that all the rules get replaced by their opposite:

  A ⊢ A  (ax)

  Γ ⊢ A, ∆    Γ′, A ⊢ ∆′
  ----------------------- (cut)
     Γ, Γ′ ⊢ ∆, ∆′

  Γ, A, B ⊢ ∆                    Γ ⊢ A, ∆    Γ′ ⊢ B, ∆′
  ------------- (∧L)             ------------------------ (∧R)
  Γ, A ∧ B ⊢ ∆                   Γ, Γ′ ⊢ A ∧ B, ∆, ∆′

  Γ ⊢ ⊤, ∆  (⊤R)

  Γ, A ⊢ ∆    Γ′, B ⊢ ∆′         Γ ⊢ A, B, ∆
  ------------------------ (∨L)  ------------- (∨R)
  Γ, Γ′, A ∨ B ⊢ ∆, ∆′           Γ ⊢ A ∨ B, ∆

  Γ, ⊥ ⊢ ∆  (⊥L)

  Γ ⊢ A, ∆    Γ′, B ⊢ ∆′         Γ, A ⊢ B, ∆
  ------------------------ (⇒L)  ------------- (⇒R)
  Γ, Γ′, A ⇒ B ⊢ ∆, ∆′           Γ ⊢ A ⇒ B, ∆

  Γ ⊢ A, ∆                       Γ, A ⊢ ∆
  ---------- (¬L)                ---------- (¬R)
  Γ, ¬A ⊢ ∆                      Γ ⊢ ¬A, ∆

Figure 2.11: LK: rules of classical sequent calculus (multiplicative presentation).

Lemma 2.6.2.3. A sequent Γ, A ⊢ ∆ is provable in LK if and only if Γ ⊢ A∗, ∆ is.

For instance, the proof on the left below corresponds to the proof on the right:

Left:

1. X ⊢ X, ⊥  by (ax)
2. X, ¬X ⊢ ⊥  by (¬L) on 1
3. X ∧ ¬X ⊢ ⊥  by (∧L) on 2

Right:

1. X ⊢ X, ⊥  by (ax)
2. ⊢ ¬X, X, ⊥  by (¬R) on 1
3. ⊢ ¬X ∨ X, ⊥  by (∨R) on 2

Because of this, we can restrict to proving sequents of the form ⊢ ∆, which are called single-sided. All the rules preserve single-sidedness excepting the axiom rule, which is easily modified in order to satisfy this property. With some extra care, we can even come up with a presentation which does not require any structural rule (those are admissible): the resulting presentation of the calculus is given in figure 2.12. If we do not want to restrict to formulas where only variables can be negated, then the de Morgan laws can be added as the following explicit rules:

  ⊢ ¬A ∨ ¬B, ∆        ⊢ ¬A ∧ ¬B, ∆        ⊢ ⊥, ∆        ⊢ ⊤, ∆        ⊢ A, ∆
  --------------      --------------      --------      --------      ---------
  ⊢ ¬(A ∧ B), ∆       ⊢ ¬(A ∨ B), ∆       ⊢ ¬⊤, ∆       ⊢ ¬⊥, ∆       ⊢ ¬¬A, ∆

2.6.3 Intuitionistic rules. In order to obtain a sequent calculus adapted to intuitionistic logic, one should restrict the two-sided proof system to sequents of the form Γ ⊢ A, i.e. those where the context on the right of ⊢ contains exactly one formula. We also have to take variants of rules such as (∨R), which would

  ⊢ ∆, A∗, ∆′, A, ∆′′  (ax)        ⊢ ∆, A, ∆′, A∗, ∆′′  (ax)

  ⊢ ∆, A, ∆′    ⊢ ∆, A∗, ∆′
  -------------------------- (cut)
          ⊢ ∆, ∆′

  ⊢ ∆, A, ∆′    ⊢ ∆, B, ∆′
  ------------------------- (∧)
       ⊢ ∆, A ∧ B, ∆′

  ⊢ ∆, ⊤, ∆′  (⊤)

  ⊢ ∆, A, B, ∆′
  -------------- (∨)
  ⊢ ∆, A ∨ B, ∆′

  ⊢ ∆, ∆′
  ------------ (⊥)
  ⊢ ∆, ⊥, ∆′

Figure 2.12: LK: single-sided presentation.

otherwise not maintain the invariant of having one formula on the right. With a little more care, one can write rules which do not require adding structural rules (they are admissible): the resulting calculus is presented in figure 2.13. Remark that, in order for contraction to be admissible, one has to keep A ⇒ B in the context of the left premise of (⇒L). Similarly to theorem 2.6.2.1, one shows:

Theorem 2.6.3.1. A sequent Γ ⊢ A is provable in NJ if and only if it is provable in LJ.

2.6.4 Cut elimination. By a similar argument as in section 2.3.3, it can be shown:

Theorem 2.6.4.1. A sequent Γ ⊢ ∆ (resp. Γ ⊢ A) is provable in LK (resp. LJ) if and only if it admits a proof without the (cut) rule.

2.6.5 Proof search. From a proof-search point of view, sequent calculus is much more well-behaved since, excepting the cut rule, we do not have to come up with new formulas when searching for proofs:

Proposition 2.6.5.1. LK has the subformula property: apart from the (cut) rule, all the formulas occurring in the premises of a rule are subformulas of the formulas occurring in the conclusion.

Since, by theorem 2.6.4.1, we can look for proofs without cuts, this means that we never have to come up with a new formula during proof search! Moreover, there is no harm in applying a rule whenever it applies, thanks to the following property:

Proposition 2.6.5.2. In LK, all the rules are reversible.

Implementation. We now implement proof search, which is most simply done using the single-sided presentation, see figure 2.12. We describe formulas as

type t =
  | Var of bool * string (* false means negated variable *)
  | Imp of t * t
  | And of t * t
  | Or of t * t
  | True
  | False

  Γ, A, Γ′ ⊢ A  (ax)

  Γ ⊢ A    Γ, A ⊢ B
  ------------------ (cut)
        Γ ⊢ B

  Γ, A, B, Γ′ ⊢ C                Γ ⊢ A    Γ ⊢ B
  ----------------- (∧L)         --------------- (∧R)
  Γ, A ∧ B, Γ′ ⊢ C                  Γ ⊢ A ∧ B

  Γ ⊢ ⊤  (⊤R)

  Γ, A, Γ′ ⊢ C    Γ, B, Γ′ ⊢ C   Γ ⊢ A                Γ ⊢ B
  ----------------------------- (∨L)  ---------- (∨lR)  ---------- (∨rR)
       Γ, A ∨ B, Γ′ ⊢ C          Γ ⊢ A ∨ B            Γ ⊢ A ∨ B

  Γ, ⊥, Γ′ ⊢ A  (⊥L)

  Γ, A ⇒ B, Γ′ ⊢ A    Γ, B, Γ′ ⊢ C        Γ, A ⊢ B
  ---------------------------------- (⇒L)  ----------- (⇒R)
         Γ, A ⇒ B, Γ′ ⊢ C                  Γ ⊢ A ⇒ B

  Γ, ¬A, Γ′ ⊢ A                  Γ, A ⊢ ⊥
  --------------- (¬L)           --------- (¬R)
  Γ, ¬A, Γ′ ⊢ ⊥                  Γ ⊢ ¬A

Figure 2.13: Rules of intuitionistic sequent calculus (LJ).

Using this representation, the negation of a formula can be computed with the function

let rec neg = function
  | Var (n, x) -> Var (not n, x)
  | Imp (a, b) -> And (a, neg b)
  | And (a, b) -> Or (neg a, neg b)
  | Or (a, b) -> And (neg a, neg b)
  | True -> False
  | False -> True

Finally, the following function implements proof search in LK:

let rec prove venv = function
  | [] -> false
  | a::env ->
    match a with
    | Var (n, x) ->
      List.mem (Var (not n, x)) venv
      || prove ((Var (n, x))::venv) env
    | Imp (a, b) -> prove venv ((neg a)::b::env)
    | And (a, b) -> prove venv (a::env) && prove venv (b::env)
    | Or (a, b) -> prove venv (a::b::env)
    | True -> true
    | False -> prove venv env
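As a quick hypothetical sanity check (not from the text), this procedure proves the excluded middle, which is classically valid, and fails on a bare variable:

let () =
  assert (prove [] [Or (Var (false, "X"), Var (true, "X"))]);
  assert (not (prove [] [Var (true, "X")]))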

Since we are considering single-sided sequents here, those can be encoded as lists of terms. The above function takes as arguments a sequent Γ′ = venv and the sequent Γ to be proved. It picks a formula A in Γ and applies the rules of figure 2.12 to it, until A is split into a list of literals: once this is the case, those literals are put into the sequent Γ′ (of already handled formulas). Initially, the context Γ′ is empty, and we usually want to prove one formula A, so that we can define

let prove a = prove [] [a]

Proof search in intuitionistic logic. Proof search can be performed in LJ, but the situation is more subtle. First remark that, similarly to the situation in LK (proposition 2.6.5.1), we have

Proposition 2.6.5.3. LJ has the subformula property.

As an immediate consequence, we deduce

Theorem 2.6.5.4. We can decide whether a sequent Γ ⊢ A is provable in LJ or not.
Proof. There is only a finite number of subformulas of Γ ⊢ A. We can restrict to sequents where a formula occurs at most 3 times in the context [Gir11, section 4.2.2] and therefore there is a finite number of possible sequents formed with those subformulas. By testing all the possible rules, we can determine which of those are provable, and thus determine whether the initial sequent is provable.

The previous theorem is constructive, but the resulting algorithm is quite inefficient. The problem of finding proofs is more delicate than for LK because not all the rules are reversible: (∨lR), (∨rR) and (⇒L) are not. The rules (∨lR) and (∨rR) are easy to handle when performing proof search: when trying to prove a formula A ∨ B, we either try to prove A or to prove B. The rule

  Γ, A ⇒ B, Γ′ ⊢ A    Γ, B, Γ′ ⊢ C
  --------------------------------- (⇒L)
         Γ, A ⇒ B, Γ′ ⊢ C

is more difficult to handle. If we apply it naively, it can loop for the same reasons as in section 2.4.2: proving Γ, A ⇒ B ⊢ A by (⇒L) generates, as left premise, the very same sequent Γ, A ⇒ B ⊢ A (together with Γ, B ⊢ B), which can again be proved by (⇒L), and so on forever. Although we can detect loops by checking whether we encounter the same sequent twice during proof search, this is quite impractical. Also, since the rule (⇒L) is not reversible, the order in which we apply it during proof search


  Γ, A, Γ′ ⊢ B
  ----------------- (⇒X)   where X ∈ Γ, Γ′
  Γ, X ⇒ A, Γ′ ⊢ B

  Γ, B ⇒ C, Γ′ ⊢ A ⇒ B    Γ, C ⊢ D
  ---------------------------------- (⇒⇒)
       Γ, (A ⇒ B) ⇒ C, Γ′ ⊢ D

  Γ, A ⇒ (B ⇒ C), Γ′ ⊢ D           Γ, A ⇒ C, B ⇒ C, Γ′ ⊢ D
  ----------------------- (⇒∧)     ------------------------ (⇒∨)
  Γ, (A ∧ B) ⇒ C, Γ′ ⊢ D           Γ, (A ∨ B) ⇒ C, Γ′ ⊢ D

  Γ, A, Γ′ ⊢ B                     Γ, Γ′ ⊢ B
  ----------------- (⇒⊤)           ----------------- (⇒⊥)
  Γ, ⊤ ⇒ A, Γ′ ⊢ B                 Γ, ⊥ ⇒ A, Γ′ ⊢ B

Figure 2.14: Left implication rules in LJT.

is relevant, and we would like to minimize the number of times we have to backtrack. The logic LJT was introduced by Dyckhoff in order to overcome this problem [Dyc92]. It is obtained from LJ by replacing the (⇒L) rule with the six rules of figure 2.14, which allow proving sequents of the form Γ, A ⇒ B, Γ′ ⊢ C depending on the form of A.

Proposition 2.6.5.5. A sequent is provable in LJ if and only if it is provable in LJT.

The main interest of this variant is that proof search always terminates (whence the T in LJT). Moreover, the rules (⇒∧), (⇒∨), (⇒⊤) and (⇒⊥) are reversible and can thus always be applied during proof search. Many variants of this idea have been explored, such as the SLJ calculus [GLW99]. A proof search procedure based on this sequent calculus can be implemented as follows. We describe terms as usual as

type t =
  | Var of string
  | Imp of t * t
  | And of t * t
  | Or of t * t
  | True
  | False

The procedure which determines whether a formula is provable is then shown in figure 2.15. It takes as arguments two contexts Γ′ and Γ (respectively called env' and env) and a formula A. Initially, the context Γ′ is empty; it is used to store the formulas of Γ which have already been "processed". The procedure first applies all the reversible right rules, then all the reversible left rules; a formula of Γ which does not give rise to a reversible left rule is put in Γ′. Once this is done, the procedure tries to apply the axiom rule, handles disjunctions by trying to apply either (∨lR) or (∨rR), and finally successively tries

let rec prove env' env a =
  match a with
  | True -> true
  | And (a, b) -> prove env' env a && prove env' env b
  | Imp (a, b) -> prove env' (a::env) b
  | _ ->
    match env with
    | b::env ->
      (match b with
       | And (b, c) -> prove env' (b::c::env) a
       | Or (b, c) -> prove env' (b::env) a && prove env' (c::env) a
       | True -> prove env' env a
       | False -> true
       | Imp (And (b, c), d) -> prove env' ((Imp (b, Imp (c, d)))::env) a
       | Imp (Or (b, c), d) -> prove env' ((Imp (b, d))::(Imp (c, d))::env) a
       | Imp (True , b) -> prove env' (b::env) a
       | Imp (False, b) -> prove env' env a
       | Var _ | Imp (Var _, _) | Imp (Imp (_, _), _) -> prove (b::env') env a)
    | [] ->
      match a with
      | Var _ when List.mem a env' -> true
      | Or (a, b) -> prove env' env a || prove env' env b
      | a ->
        List.exists
          (fun (b, env') ->
            match b with
            | Imp (Var x, b) when List.mem (Var x) env' -> prove env' [b] a
            | Imp (Imp (b, c), d) ->
              prove env' [Imp (c, d)] (Imp (b, c)) && prove env' [d] a
            | _ -> false)
          (context_formulas env')

let prove a = prove [] [] a

Figure 2.15: Proof search in LJT.


all the possible applications of the non-reversible rules (⇒X) and (⇒⇒). Here, the function context_formulas returns, given a context Γ, the list of all pairs consisting of a formula A and a context Γ′, Γ′′ such that Γ = Γ′, A, Γ′′, i.e. the context Γ where some formula A has been removed.
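As a hypothetical sanity check (not from the text, and assuming context_formulas has been provided), the identity A ⇒ A is provable intuitionistically, while the excluded middle, with negation encoded as implication into False, is not:

let () =
  assert (prove (Imp (Var "A", Var "A")));
  assert (not (prove (Or (Imp (Var "A", False), Var "A"))))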

2.7 Hilbert calculus

The Hilbert calculus is another formalism, due to Hilbert [Hil22], which makes the opposite "design choices" to the previous formalisms (natural deduction and sequent calculus): it has lots of axioms and very few logical rules.

2.7.1 Proofs. In this formalism, sequents are of the form Γ ⊢ A, with Γ a context and A a formula, and are deduced according to the two following rules only:

  Γ, A, Γ′ ⊢ A  (ax)

  Γ ⊢ A ⇒ B    Γ ⊢ A
  ------------------- (⇒E)
        Γ ⊢ B

respectively called axiom and modus ponens. Of course, there is very little that we can deduce with these two rules only. The other necessary logical principles are added in the form of axiom schemes, which can be assumed at any time during proofs. In the case of the implicational fragment (where implication is the only connective from which formulas are built), those are

(K) A ⇒ B ⇒ A,
(S) (A ⇒ B ⇒ C) ⇒ (A ⇒ B) ⇒ A ⇒ C.

By "axiom schemes", we mean that the above formulas can be assumed for whichever given formulas A, B and C. In other words, this amounts to adding the rules

  Γ ⊢ A ⇒ B ⇒ A  (K)        Γ ⊢ (A ⇒ B ⇒ C) ⇒ (A ⇒ B) ⇒ A ⇒ C  (S)

A sequent is provable when it is the conclusion of a proof built from the above rules, and a formula A is provable when the sequent ⊢ A is provable.

Example 2.7.1.1. For any formula A, the formula A ⇒ A is provable:

1. ⊢ (A ⇒ (B ⇒ A) ⇒ A) ⇒ (A ⇒ B ⇒ A) ⇒ A ⇒ A  by (S)
2. ⊢ A ⇒ (B ⇒ A) ⇒ A  by (K)
3. ⊢ (A ⇒ B ⇒ A) ⇒ A ⇒ A  by (⇒E) on 1 and 2
4. ⊢ A ⇒ B ⇒ A  by (K)
5. ⊢ A ⇒ A  by (⇒E) on 3 and 4

Note the complexity compared to NJ or LJ. None of the rules modify the context Γ, so that it is generally not written. Also, traditionally, instead of using proof trees, a proof of A in the context Γ is rather formalized as a finite sequence of formulas A1, . . . , An, with An = A, such that, for every index i, either
– Ai belongs to Γ, or
– Ai is an instance of an axiom, or


– there are indices j, k < i such that Ak = Aj ⇒ Ai, i.e. Ai can be deduced by

  Γ ⊢ Aj ⇒ Ai    Γ ⊢ Aj
  ---------------------- (⇒E)
         Γ ⊢ Ai

This corresponds to describing the proof tree by some traversal of it.

Example 2.7.1.2. The proof of example 2.7.1.1 is generally noted as follows:
1. (A ⇒ (B ⇒ A) ⇒ A) ⇒ (A ⇒ B ⇒ A) ⇒ A ⇒ A  by (S)
2. A ⇒ (B ⇒ A) ⇒ A  by (K)
3. (A ⇒ B ⇒ A) ⇒ A ⇒ A  by modus ponens on 1. and 2.
4. A ⇒ B ⇒ A  by (K)
5. A ⇒ A  by modus ponens on 3. and 4.

2.7.2 Other connectives. In the case where other connectives than implication are considered, appropriate axioms should be added:

conjunction:
  A ∧ B ⇒ A                               A ⇒ B ⇒ A ∧ B
  A ∧ B ⇒ B
truth:
                                          A ⇒ ⊤
disjunction:
  A ∨ B ⇒ (A ⇒ C) ⇒ (B ⇒ C) ⇒ C           A ⇒ A ∨ B
                                          B ⇒ A ∨ B
falsity:
  ⊥ ⇒ A
negation:
  ¬A ⇒ A ⇒ ⊥                              A ⇒ ⊥ ⇒ ¬A

It can be observed that the axioms are somehow in correspondence with the elimination and introduction rules of natural deduction (respectively the left and right columns above). The classical variants of the system can be obtained by further adding one of the axioms of theorem 2.5.1.1.

2.7.3 Relationship with natural deduction. In order to show that proofs in Hilbert calculus correspond to proofs in natural deduction, we first need to study some of its properties. The usual structural rules are admissible in this system:

Proposition 2.7.3.1. The rules of exchange, contraction, truth strengthening and weakening are admissible in Hilbert calculus:

  Γ, A, B, Γ′ ⊢ C        Γ, A, A ⊢ C        Γ, ⊤ ⊢ A        Γ ⊢ C
  ---------------        -----------        ---------       --------
  Γ, B, A, Γ′ ⊢ C        Γ, A ⊢ C           Γ ⊢ A           Γ, A ⊢ C

Proof. By induction on the proof of the premise.


The introduction rule for implication is also admissible. This is sometimes called the deduction theorem and is due to Herbrand.

Proposition 2.7.3.2. The introduction rule for implication is admissible:

  Γ, A, Γ′ ⊢ B
  -------------- (⇒I)
  Γ, Γ′ ⊢ A ⇒ B

Proof. By induction on the proof of Γ, A, Γ′ ⊢ B.
– If it is of the form Γ, A, Γ′ ⊢ A by (ax), then we can show ⊢ A ⇒ A by example 2.7.1.1 and thus Γ, Γ′ ⊢ A ⇒ A by weakening.
– If it is of the form Γ, A, Γ′ ⊢ B by (ax), with B different from A and belonging to Γ or Γ′, then we can show B ⊢ A ⇒ B by

  1. B ⊢ B ⇒ A ⇒ B  by (K)
  2. B ⊢ B  by (ax)
  3. B ⊢ A ⇒ B  by (⇒E) on 1 and 2

  and thus Γ, Γ′ ⊢ A ⇒ B by weakening.
– If it is of the form

  Γ, A, Γ′ ⊢ C ⇒ B    Γ, A, Γ′ ⊢ C
  --------------------------------- (⇒E)
          Γ, A, Γ′ ⊢ B

  then, by induction hypothesis, we have proofs of Γ, Γ′ ⊢ A ⇒ C and of Γ, Γ′ ⊢ A ⇒ C ⇒ B, and the derivation

  1. Γ, Γ′ ⊢ (A ⇒ C ⇒ B) ⇒ (A ⇒ C) ⇒ A ⇒ B  by (S)
  2. Γ, Γ′ ⊢ A ⇒ C ⇒ B  by induction hypothesis
  3. Γ, Γ′ ⊢ (A ⇒ C) ⇒ A ⇒ B  by (⇒E) on 1 and 2
  4. Γ, Γ′ ⊢ A ⇒ C  by induction hypothesis
  5. Γ, Γ′ ⊢ A ⇒ B  by (⇒E) on 3 and 4

allows us to conclude.

We can thus show that provability in this system is the usual one.

Theorem 2.7.3.3. A sequent Γ ⊢ A is provable in Hilbert calculus if and only if it is provable in natural deduction.
Proof. For simplicity, we restrict to the case of the implicational fragment. In order to show that a proof in Hilbert calculus induces a proof in NJ, we should show that the rules (ax) and (⇒E) are admissible in NJ (this is the case by definition) and that the axioms (S) and (K) can be derived in NJ, which is easy. For (S), writing Γ = A ⇒ B ⇒ C, A ⇒ B, A, we have

1. Γ ⊢ A ⇒ B ⇒ C  by (ax)
2. Γ ⊢ A  by (ax)
3. Γ ⊢ B ⇒ C  by (⇒E) on 1 and 2
4. Γ ⊢ A ⇒ B  by (ax)
5. Γ ⊢ B  by (⇒E) on 4 and 2
6. Γ ⊢ C  by (⇒E) on 3 and 5
7. A ⇒ B ⇒ C, A ⇒ B ⊢ A ⇒ C  by (⇒I) on 6
8. A ⇒ B ⇒ C ⊢ (A ⇒ B) ⇒ A ⇒ C  by (⇒I) on 7
9. ⊢ (A ⇒ B ⇒ C) ⇒ (A ⇒ B) ⇒ A ⇒ C  by (⇒I) on 8


(you should prove this by yourself) and, for (K),

1. A, B ⊢ A  by (ax)
2. A ⊢ B ⇒ A  by (⇒I) on 1
3. ⊢ A ⇒ B ⇒ A  by (⇒I) on 2

Conversely, in order to show that a proof in NJ induces one in Hilbert calculus, we should show that the rules (ax), (⇒E) and (⇒I) are admissible in Hilbert calculus: the first two hold by definition, and the third one was proved in proposition 2.7.3.2.
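To illustrate the sequence-of-formulas presentation of Hilbert proofs, here is a sketch (not from the text) of a checker for the implicational fragment: a proof is a list of formulas, each being an hypothesis, an instance of (K) or (S), or obtained by modus ponens from two previous formulas.

type formula =
  | Var of string
  | Imp of formula * formula

(* Is the formula an instance of (K) : A => B => A ? *)
let is_k = function
  | Imp (a, Imp (_, a')) -> a = a'
  | _ -> false

(* Is the formula an instance of (S) :
   (A => B => C) => (A => B) => A => C ? *)
let is_s = function
  | Imp (Imp (a, Imp (b, c)), Imp (Imp (a', b'), Imp (a'', c'))) ->
    a = a' && a = a'' && b = b' && c = c'
  | _ -> false

(* check gamma proof : every formula is justified by the context
   gamma, an axiom scheme, or modus ponens on previous formulas. *)
let check gamma proof =
  let rec ok seen = function
    | [] -> true
    | a :: rest ->
      (List.mem a gamma || is_k a || is_s a
       || List.exists (fun b -> List.mem (Imp (b, a)) seen) seen)
      && ok (a :: seen) rest
  in
  ok [] proof

(* The proof of A => A from example 2.7.1.2 passes the check. *)
let () =
  let a = Var "A" and b = Var "B" in
  let s2 = Imp (a, Imp (Imp (b, a), a)) in
  let s4 = Imp (a, Imp (b, a)) in
  let s5 = Imp (a, a) in
  let s3 = Imp (s4, s5) in
  let s1 = Imp (s2, s3) in
  assert (check [] [s1; s2; s3; s4; s5])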

2.8 Kripke semantics

We have seen in section 2.5.6 that the usual boolean interpretation of formulas is correct and complete, meaning that a formula is classically provable if and only if it is valid in every boolean model. One can wonder if there is an analogous notion of model for intuitionistic logic, and this is indeed the case: Kripke models are correct and complete for intuitionistic logic. They were discovered in the 1960s by Kripke [Kri65] and Joyal for modal logic, and can be thought of as a semantics of possible worlds evolving through time: as time progresses, more propositions may become true. The morale is that intuitionistic logic is somehow a logic where the notion of truth is "local", contrarily to classical logic.

2.8.1 Kripke structures. A Kripke structure (W, ≤, ρ) consists of a partially ordered set (W, ≤) of worlds together with a valuation ρ : W × X → B, which indicates whether, in a given world, a given propositional variable is true or not. The valuation is always assumed to be monotone, i.e. to satisfy that ρ(w, X) = 1 implies ρ(w′, X) = 1 for all worlds such that w ≤ w′. We sometimes simply write W for a Kripke structure (W, ≤, ρ). Given a Kripke structure W and a world w ∈ W, we write w ⊩W A (or simply w ⊩ A when W is clear from the context) when a formula A is satisfied in w: this is defined by induction on A by

  w ⊩ X      iff ρ(w, X) = 1
  w ⊩ ⊤      always
  w ⊩ ⊥      never
  w ⊩ A ∧ B  iff w ⊩ A and w ⊩ B
  w ⊩ A ∨ B  iff w ⊩ A or w ⊩ B
  w ⊩ A ⇒ B  iff, for every w′ ≥ w, w′ ⊩ A implies w′ ⊩ B
  w ⊩ ¬A     iff, for every w′ ≥ w, w′ ⊩ A does not hold
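To make this definition concrete, here is a sketch (not from the text) of forcing in a finite Kripke structure; the representation — worlds as integers, the order and the valuation as explicit lists — is a choice made for illustration only:

type formula =
  | Var of string
  | True | False
  | And of formula * formula
  | Or of formula * formula
  | Imp of formula * formula
  | Not of formula

type structure = {
  worlds : int list;            (* the set of worlds W *)
  le : (int * int) list;        (* pairs (w, w') with w <= w' *)
  rho : (int * string) list;    (* (w, x) present iff rho(w, x) = 1 *)
}

(* forces k w a : does the world w force the formula a in k? *)
let rec forces k w = function
  | Var x -> List.mem (w, x) k.rho
  | True -> true
  | False -> false
  | And (a, b) -> forces k w a && forces k w b
  | Or (a, b) -> forces k w a || forces k w b
  | Imp (a, b) ->
    (* quantification over all future worlds w' >= w *)
    List.for_all
      (fun w' ->
        not (List.mem (w, w') k.le)
        || not (forces k w' a) || forces k w' b)
      k.worlds
  | Not a ->
    List.for_all
      (fun w' -> not (List.mem (w, w') k.le && forces k w' a))
      k.worlds

(* The structure of example 2.8.1.4 below: w0 <= w1, X true only
   in w1. Double negation elimination fails at w0. *)
let () =
  let k = { worlds = [0; 1];
            le = [(0,0); (1,1); (0,1)];
            rho = [(1, "X")] } in
  assert (not (forces k 0 (Imp (Not (Not (Var "X")), Var "X"))))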

A Kripke structure is often pictured as a graph whose vertices correspond to worlds, an edge from w to w′ indicating that w ≤ w′, with the variables X such that ρ(w, X) = 1 written next to the node w, see examples 2.8.1.4 and 2.8.1.5. We can think of a Kripke structure as describing the evolution of a world through time: given two worlds such that w ≤ w′, we think of w′ as a possible future for w. Since the order is not necessarily total, a given world


might have different possible futures. In each world, the valuation indicates which formulas we know to be true, and the monotonicity condition imposes that our knowledge can only grow: if we know that a formula is true, then we will still know it in the future.

Lemma 2.8.1.1. Satisfaction is monotone: given a formula A, a Kripke structure W and a world w, if w ⊩ A then w′ ⊩ A for every world w′ ≥ w.
Proof. By induction on the formula A.

Given a context Γ = A1, . . . , An, a formula A, and a Kripke structure W, we write Γ ⊩W A when, for every world w ∈ W in which all the formulas Ai are satisfied, the formula A is also satisfied. We write Γ ⊩ A when Γ ⊩W A holds for every structure W: in this case, we say that A is valid in the context Γ.

Remark 2.8.1.2. It should be observed that the notion of Kripke structure generalizes the notion of boolean model recalled in section 2.5.6. Namely, a boolean valuation ρ : X → B can be seen as a Kripke structure W with a single world w, the valuation being given by ρ. The notion of validity for Kripke structures defined above then coincides with the one for boolean models.

The following theorem ensures that Kripke semantics is sound: a provable formula is valid.

Theorem 2.8.1.3 (Soundness). If a sequent Γ ⊢ A is derivable in intuitionistic logic then Γ ⊩ A.
Proof. By induction on the proof of Γ ⊢ A.

The contrapositive of this theorem says that if we can find a Kripke structure in which there is a world where a formula A is not satisfied, then A is not intuitionistically provable. This thus provides an alternative to methods based on cut elimination (see section 2.3.5) in order to discard the provability of formulas.

Example 2.8.1.4. Consider the formula expressing double negation elimination, ¬¬X ⇒ X, and the Kripke structure with W = {w0, w1}, w0 ≤ w1, ρ(w0, X) = 0 and ρ(w1, X) = 1, which can be pictured as the graph w0 → w1, with X written next to w1. We have w0 ⊮ ¬X because there is the future world w1 in which X holds, and w1 ⊮ ¬X, and thus w0 ⊩ ¬¬X (in fact, it can be shown that w ⊩ ¬¬X in an arbitrary structure iff for every world w′ ≥ w there exists a world w′′ ≥ w′ such that w′′ ⊩ X). Moreover, we have w0 ⊮ X, and thus w0 ⊮ ¬¬X ⇒ X. This shows that ¬¬X ⇒ X is not intuitionistically provable. In the same Kripke structure, we have w0 ⊮ ¬X ∨ X, and thus the excluded middle ¬X ∨ X is not intuitionistically provable either. Given an arbitrary formula A, by lemma 2.8.1.1, in this structure, the formula is satisfied either in both w0 and w1, or only in w1, or in no world. In the first two cases, ¬¬A is satisfied, and in the last one ¬A is satisfied. Therefore, the weak excluded middle ¬¬A ∨ ¬A is always satisfied: this shows


that the weak excluded middle does not imply the excluded middle. By a similar reasoning, it can be shown that the linearity axiom (A ⇒ B) ∨ (B ⇒ A) does not imply the excluded middle either. Both thus give rise to intermediate logics, see section 2.5.10.

Example 2.8.1.5. The Kripke structure consisting of a root world with two incomparable future worlds, one in which X holds and one in which Y holds,

shows that the linearity formula (X ⇒ Y) ∨ (Y ⇒ X) is not intuitionistically provable (whereas it is classically).

2.8.2 Completeness. We now consider the converse of theorem 2.8.1.3. We will show that if a formula is valid then it is intuitionistically provable; or, equivalently, that if a formula is not provable, then we can always exhibit a Kripke structure in which it does not hold. Given a possibly infinite set Φ of formulas, we write Φ ⊢ A whenever there exists a finite subset Γ ⊆ Φ such that Γ ⊢ A is intuitionistically provable. Such a set is consistent if Φ ⊬ ⊥. By lemma 2.3.4.1, we have:

Lemma 2.8.2.1. A set Φ of formulas is consistent if and only if there is a formula A such that Φ ⊬ A.

A set Φ of formulas is disjunctive if Φ ⊢ A ∨ B implies Φ ⊢ A or Φ ⊢ B.

Lemma 2.8.2.2. Given a set Φ of formulas and a formula A such that Φ ⊬ A, there exists a disjunctive set Φ̄ such that Φ ⊆ Φ̄ and Φ̄ ⊬ A.
Proof. Suppose fixed an enumeration of all the formulas of the form B ∨ C occurring in Φ. We construct by recurrence a sequence (Φn) of sets of formulas such that Φn ⊬ A. We set Φ0 = Φ. Supposing Φn constructed, consider the n-th formula B ∨ C in Φ:
– if Φn, B ⊬ A, we define Φn+1 = Φn ∪ {B},
– if Φn, B ⊢ A, we define Φn+1 = Φn ∪ {C}.
In the first case, it is immediate that Φn+1 ⊬ A. In the second one, we have Φn ⊢ B ∨ C and Φn, B ⊢ A; if we also had Φn, C ⊢ A then, by (∨E), we would also have Φn ⊢ A, which is excluded by the recurrence hypothesis. Finally, we take Φ̄ = ⋃n∈N Φn.

A set Φ of formulas is saturated if, for every formula A, Φ ⊢ A implies A ∈ Φ.

Lemma 2.8.2.3. Given a set Φ of formulas, the set Φ̄ = {A | Φ ⊢ A} is saturated.

A set is complete if it is consistent, disjunctive and saturated. Combining the above lemmas, we obtain:

Lemma 2.8.2.4. Given a set Φ of formulas and a formula A such that Φ ⊬ A, there exists a complete set Φ̄ such that Φ ⊆ Φ̄ and A ∉ Φ̄.
Proof. By lemma 2.8.2.2, there exists a disjunctive set of formulas Φ̄ such that Φ ⊆ Φ̄ and Φ̄ ⊬ A. This set is consistent by lemma 2.8.2.1. Moreover, by lemma 2.8.2.3, we can suppose that this set is saturated (the construction is easily checked to preserve consistency and disjunctiveness) and such that A ∉ Φ̄.


The universal Kripke structure W is defined by W = {wΦ | Φ is complete}, with wΦ ≤ wΦ′ whenever Φ ⊆ Φ′, and ρ(wΦ, X) = 1 iff X ∈ Φ.

Lemma 2.8.2.5. Let Φ be a complete set and A a formula. Then wΦ ⊩ A iff A ∈ Φ.
Proof. By induction on the formula A.
– If A = X is a propositional variable, we have wΦ ⊩ X iff ρ(wΦ, X) = 1 iff X ∈ Φ.
– If A = ⊤, we always have wΦ ⊩ ⊤ and we always have ⊤ ∈ Φ, because Φ ⊢ ⊤ by (⊤I) and Φ is saturated.
– If A = ⊥, we never have wΦ ⊩ ⊥ and we never have ⊥ ∈ Φ, because Φ is consistent.
– If A = B ∧ C: suppose that wΦ ⊩ B ∧ C. Then wΦ ⊩ B and wΦ ⊩ C, and therefore Φ ⊢ B and Φ ⊢ C by induction hypothesis. We deduce Φ ⊢ B ∧ C by (∧I) and weakening, and thus B ∧ C ∈ Φ by saturation. Conversely, suppose B ∧ C ∈ Φ, and thus Φ ⊢ B ∧ C. This entails Φ ⊢ B and Φ ⊢ C by (∧lE) and (∧rE), and thus B ∈ Φ and C ∈ Φ by saturation. By induction hypothesis, we have wΦ ⊩ B and wΦ ⊩ C, and thus wΦ ⊩ B ∧ C.
– If A = B ∨ C: suppose that wΦ ⊩ B ∨ C. Then wΦ ⊩ B or wΦ ⊩ C, and therefore Φ ⊢ B or Φ ⊢ C by induction hypothesis. We deduce Φ ⊢ B ∨ C by (∨lI) or (∨rI), and thus B ∨ C ∈ Φ by saturation. Conversely, suppose B ∨ C ∈ Φ. Since Φ is disjunctive, we have B ∈ Φ or C ∈ Φ. By induction hypothesis, we have wΦ ⊩ B or wΦ ⊩ C, and thus wΦ ⊩ B ∨ C.
– If A = B ⇒ C: suppose that wΦ ⊩ B ⇒ C. Our goal is to show B ⇒ C ∈ Φ. By (⇒I) and saturation, it is enough to show Φ, B ⊢ C. Suppose that this is not the case. By lemma 2.8.2.4, we can construct a complete set Φ′ with Φ ⊆ Φ′, B ∈ Φ′ and C ∉ Φ′. Since B ∈ Φ′, by induction hypothesis we have wΦ′ ⊩ B. Therefore, since wΦ ⊩ B ⇒ C and wΦ ≤ wΦ′, we have wΦ′ ⊩ C, and thus C ∈ Φ′ by induction hypothesis, a contradiction. Conversely, suppose B ⇒ C ∈ Φ. Given Φ′ such that wΦ ≤ wΦ′, i.e. Φ ⊆ Φ′, if wΦ′ ⊩ B we have to show wΦ′ ⊩ C. By induction hypothesis, we have B ∈ Φ′. Moreover, we have Φ′ ⊢ B ⇒ C (because Φ ⊢ B ⇒ C and Φ ⊆ Φ′) and therefore Φ′ ⊢ C by (⇒E). By saturation we have C ∈ Φ′, and thus wΦ′ ⊩ C by induction hypothesis.

Theorem 2.8.2.6 (Completeness). If Γ ⊩ A then Γ ⊢ A.
Proof. Suppose Γ ⊩ A and Γ ⊬ A. By lemma 2.8.2.4, there exists a complete set of formulas Φ such that Γ ⊆ Φ and Φ ⊬ A. All the formulas of Γ are satisfied in wΦ and thus wΦ ⊩ A, because we have Γ ⊩ A. By lemma 2.8.2.5, we therefore have A ∈ Φ, a contradiction.


Remark 2.8.2.7. It can be shown that we can restrict to Kripke models which are tree-shaped and finite without losing completeness. With further restrictions, various completeness results can be obtained. As an extreme example, if we restrict to models with only one world, then we obtain boolean models (remark 2.8.1.2), which are complete for classical logic (theorem 2.5.6.5). For a more unexpected example, Kripke models which are total orders are complete for intuitionistic logic extended with the linearity axiom (section 2.5.10), whence its name.

Chapter 3

Pure λ-calculus

We now introduce the λ-calculus, which is the functional core of a programming language: this is what you obtain when you remove everything from a functional programming language except variables, functions and applications. In this language, everything is thus a function. In OCaml syntax, a typical λ-term would be

fun f x -> f (f x) (fun y -> y)

Since the λ-calculus was actually invented before computers existed, the traditional notation is somewhat different from the above: we write λx.t instead of fun x -> t, so that the above term would rather be written

λfx.f (f x) (λy.y)

Bound variables. In a function, the name of the variable is not important: it could be replaced by any other name without changing the meaning of the function, and we should consider λx.x and λy.y as the same. In a term of the form λx.t, we say that the abstraction λ binds the variable x in the term t: the name of the variable x in t is not really relevant, what matters is that it refers to the variable declared by this λ. In mathematics, we are somehow used to this in situations other than functions. For instance, in the first definition below, t is bound by the limit operator, in the second t is bound by the dt operator coming with the integral, and in the last one the summation sign binds i:

  f(x) = lim_{t→∞} x/t        f(x) = ∫₀¹ tˣ dt        f(x) = Σᵢ₌₀ⁿ iˣ

This means that we can replace the name of the bound variable by any other (as above) without changing the meaning of the expression. For instance, the first one is equivalent to

  f(x) = lim_{z→∞} x/z

This process of changing the name of a variable is called α-conversion and is more subtle than it seems at first: there are actually some restrictions on the names of the variables we can use. For instance, in the above example, we cannot rename t to x, since the following reasoning is clearly not valid:

  0 = lim_{t→∞} x/t = lim_{x→∞} x/x = lim_{x→∞} 1 = 1

The problem here is that we tried to rename t to a variable name which was already used somewhere else. These issues are generally glossed over in mathematics, but in computer science we cannot simply do that: we have to understand these α-conversion mechanisms in detail when implementing functional programming languages, otherwise we will evaluate programs incorrectly. Believe it or not, this simple matter is a major source of bugs and headaches.

CHAPTER 3. PURE λ-CALCULUS

111

Evaluation. Another aspect we have to make precise is the notion of evaluation or reduction of a functional programming language. In mathematics, if f is the doubling function f(x) = x + x, then f(3) is 3 + 3, i.e. 6: with our λ notation, we have (λx.x + x)3 = 6. In computer science, we want to see the way the program is executed, and we will consider that (λx.x + x)3 reduces to 3 + 3, which itself reduces to 6, which is the result of our program. The general definition of reduction in the language is given by the β-reduction rule, which is
(λx.t)u −→β t[u/x]
It means that the function which to x associates some expression t, when applied to an argument u, reduces to t where all the occurrences of x have been replaced by u. The properties of this reduction relation are one of our main objects of interest here.

In this chapter. We introduce the λ-calculus in section 3.1 and the β-reduction in section 3.2. We then study the computational power of the resulting calculus in section 3.3 and show that reduction is confluent in section 3.4. We discuss the various ways in which reduction can be implemented in section 3.5, and the ways to handle α-conversion in section 3.6.

References. Should you need a more detailed presentation of the λ-calculus, its properties and applications, good introductions include [Bar84, SU06, Sel08].

3.1 λ-terms

3.1.1 Definition. Suppose fixed an infinite countable set X = {x, y, z, . . .} whose elements are called variables. The set Λ of λ-terms is generated by the following grammar:
t, u ::= x | t u | λx.t
This means that a λ-term is either
– a variable x,
– an application t u, which is a pair of terms t and u, thought of as applying the function t to an argument u,
– an abstraction λx.t, which is a pair consisting of a variable x and a term t, thought of as the function which to x associates t.
For instance, we have the following λ-terms:
λx.x        (λx.(xx))(λy.(yx))        λx.(λy.(x(λz.y)))
By convention,
– application is associative to the left, i.e. tuv = (tu)v and not t(uv),
– application binds more tightly than abstraction, i.e. λx.xy = λx.(xy) and not (λx.x)y (otherwise said, abstraction extends as far as possible on the right),
– we sometimes group abstractions, i.e. λxyz.xz(yz) is read as λx.λy.λz.xz(yz).

3.1.2 Bound and free variables. In a term of the form λx.t, the variable x is said to be bound in the term t: in a sense the abstraction “declares” the variable in the term t, and all occurrences of x in t will make reference to the variable declared here (unless it is bound again). Thus, in the term (λx.xy)x, the first occurrence of x refers to the variable declared by the abstraction whereas the second does not. Intuitively, this term is the same as (λz.zy)x, but not the same as (λz.zy)z; this will be made formal below through the notion of α-equivalence, but we should keep in mind that there is always the possibility of renaming bound variables.

A free variable of a term is a variable which occurs at a position where it is not bound. The set FV(t) of free variables of a term t is formally defined by induction on t by

FV(x) = {x}
FV(t u) = FV(t) ∪ FV(u)
FV(λx.t) = FV(t) \ {x}

A term t is closed when it has no free variable: FV(t) = ∅.

Example 3.1.2.1. The set of free variables of the term (λx.xy)z is {y, z}. This term is thus not closed. The term λxy.x is closed.

A variable x is fresh with respect to a term t when it does not occur in t, i.e. x ∈ X \ FV(t). Note that the set of variables of a term t is finite and the set X of variables is infinite, so that we can always find a fresh variable with respect to any term.

3.1.3 Renaming and α-equivalence. In order to define α-equivalence, we first define the operation of renaming a variable x to y in a term t, and write t{y/x} for the resulting term. There is one subtlety though: we only want to rename free occurrences of x, since the other ones refer to the abstraction to which they are bound. Formally, the renaming t{y/x} is defined by

x{y/x} = y
z{y/x} = z                                if z ≠ x
(t u){y/x} = (t{y/x}) (u{y/x})
(λx.t){y/x} = λx.t
(λy.t){y/x} = λx.(t{y/x})
(λz.t){y/x} = λz.(t{y/x})                 if z ≠ x and z ≠ y


The three last lines handle the possible cases when renaming a variable in an abstraction: either we are trying to rename the bound variable, or we are trying to rename a variable into the bound variable, or the bound variable and the variables involved in the renaming are distinct.

The α-equivalence =α (or α-conversion) is the smallest congruence on terms which identifies terms differing only by a renaming of bound variables, i.e.
λx.t =α λy.(t{y/x})        whenever y is not free in t.
For instance, we have
λx.x(λx.xy) =α λz.z(λx.xy)        but        λx.x(λx.xy) ≠α λy.y(λx.xy)
Formally, the fact that α-equivalence is a congruence means that it is the smallest relation closed under the following rules:
– λx.t =α λy.(t{y/x}) whenever y ∉ FV(t),
– if t =α t′ and u =α u′ then t u =α t′ u′,
– if t =α t′ then λx.t =α λx.t′,
– t =α t,
– if t =α t′ then t′ =α t,
– if t =α t′ and t′ =α t″ then t =α t″.
The first rule is the one we have already seen above, the next two ensure that α-equivalence is compatible with application and abstraction, and the last three impose that α-equivalence is an equivalence relation (i.e. reflexive, symmetric and transitive).

3.1.4 Substitution. Given λ-terms t and u and a variable x, we can define a new term t[u/x], which is the λ-term obtained from t by replacing free occurrences of x by u. Again, we have to properly take care of issues related to the fact that some variables are bound:
– we only want to replace free occurrences of the variable x in t, since the bound ones refer to the corresponding abstractions in t and might be renamed, i.e.
(x(λxy.x))[u/x] = u(λxy.x)        but not u(λxy.u),
– we do not want free variables in u to become accidentally bound by some abstraction in t, i.e.
(λx.xy)[x/y] = (λz.zy)[x/y] = λz.zx        but not λx.xx.


Formally, the substitution t[u/x] is defined by induction on t by

x[u/x] = u
y[u/x] = y                                    if y ≠ x
(t1 t2)[u/x] = (t1[u/x]) (t2[u/x])
(λx.t)[u/x] = λx.t
(λy.t)[u/x] = λy.(t[u/x])                     if y ≠ x and y ∉ FV(u)
(λy.t)[u/x] = λy′.(t{y′/y}[u/x])              if y ≠ x, y ∈ FV(u) and y′ ∉ FV(t) ∪ FV(u)

Because of the last line, the result of the substitution depends on an arbitrary choice of a fresh variable y′ and is thus not well-defined on raw terms, but one can show that it is a well-defined operation on λ-terms up to α-equivalence. For this reason, as soon as we want to perform substitutions, it only makes sense to consider the set of λ-terms quotiented by the α-equivalence relation: we will implicitly do so in the following, and implicitly ensure that all the constructions we perform are compatible with α-equivalence. The only time when we should take α-conversion seriously is when dealing with implementation matters, see section 3.6.2 for instance. Adopting this convention, the three last cases can be replaced by
(λy.t)[u/x] = λy.(t[u/x])
where we suppose that y ∉ FV(u) ∪ {x}, which we can always do up to α-conversion.
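As a quick illustration of the definition of free variables, here is a minimal OCaml sketch (the term type mirrors the one used later in section 3.5, with variables represented as strings; the function names are ours, not fixed by the text):

type term = Var of string | App of term * term | Abs of string * term

(* Free variables, following the three defining equations of FV above. *)
let rec fv = function
  | Var x -> [x]
  | App (t, u) -> fv t @ fv u
  | Abs (x, t) -> List.filter (fun y -> y <> x) (fv t)

(* A variable is fresh for t when it does not occur in fv t. *)
let is_fresh x t = not (List.mem x (fv t))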

3.2 β-reduction

Consider a term of the form
(λx.t) u        (3.1)
It intuitively consists of a function, expecting an argument x and returning a result t, which is given an argument u. We therefore expect the computation to reach the term t[u/x], consisting of the term t where all the free occurrences of x have been replaced by u. This is what the notion of β-reduction does, and we write
(λx.t) u −→β t[u/x]        (3.2)
to indicate that the term on the left reduces to the term on the right. Actually, we also want to be able to perform this kind of reduction within a term: we call a β-redex of a term t a subterm of the form (3.1), and the β-reduction consists in performing the replacement (3.2) in the term.

3.2.1 Definition. Formally, the β-reduction is defined as the smallest binary relation −→β on terms such that
(λx.t)u −→β t[u/x]   (βs)
and which is closed under the following rules:
if t −→β t′ then λx.t −→β λx.t′   (βλ)
if t −→β t′ then t u −→β t′ u     (βl)
if u −→β u′ then t u −→β t u′     (βr)


A “proof tree” showing that t −→β u is called a derivation of it. For instance, a derivation of λx.(λy.y)xz −→β λx.xz is built as follows:
(λy.y)x −→β x   by (βs),
hence (λy.y)xz −→β xz   by (βl),
hence λx.(λy.y)xz −→β λx.xz   by (βλ).
Such derivations are often useful to reason about β-reduction steps, by induction on the derivation tree.

3.2.2 An example. For instance, we have the following sequence of β-reductions, where the β-redexes being contracted are, respectively, (λz.zz)(λt.t), then (λt.t)(λt.t), then the whole term:
(λx.y)((λz.zz)(λt.t)) −→β (λx.y)((λt.t)(λt.t)) −→β (λx.y)(λt.t) −→β y

3.2.3 Reduction and redexes. Let us now make some basic observations about how reductions interact with redexes. Reduction can create β-redexes:
(λx.xx)(λy.y) −→β (λy.y)(λy.y)
In the initial term there was only one redex, and after reducing it a new redex has appeared. Reduction can duplicate β-redexes:
(λx.xx)((λy.y)(λz.z)) −→β ((λy.y)(λz.z))((λy.y)(λz.z))
The β-redex (λy.y)(λz.z) occurs once in the initial term and twice in the reduced one. Reduction can also erase β-redexes:
(λx.y)((λy.y)(λz.z)) −→β y
There were two redexes in the initial term, but there is none left after reducing one of them.

3.2.4 Confluence. The reduction is not deterministic, since some terms can reduce in multiple ways:
λy.y β←− (λxy.y)((λx.x)(λx.x)) −→β (λxy.y)(λx.x)
We thus have to be careful when studying properties of reduction: in particular, we always have to specify whether those properties hold for some reduction or for every reduction. It can be noted that, although the two above reductions differ, they end up in the same term: the term on the right above reduces to λy.y, which is the term on the left. This property is called “confluence”: eventually, the order in which we choose to perform β-reductions does not matter. This will be detailed and proved in section 3.4.


3.2.5 β-reduction paths. A reduction path
t = t0 −→β t1 −→β t2 −→β . . . −→β tn−1 −→β tn = u
from t to u is a finite sequence of terms t0, t1, . . . , tn such that ti β-reduces to ti+1 for every index i, with t0 = t and tn = u. The natural number n is called the length of the reduction. We write t −→β* u when there exists a reduction path from t to u as above, and say that t reduces in multiple steps to u. The relation −→β* on terms is the reflexive and transitive closure of the relation −→β.

3.2.6 Normalization. Some terms cannot reduce: they are called normal forms. For instance:
x        x(λy.λz.y)        . . .

Those can be thought of as “values” or “results” for computations in λ-calculus. Those terms are easily characterized:

Proposition 3.2.6.1. The λ-terms in normal form can be characterized inductively as the terms of the form
λx.t        or        x t1 . . . tn
where t and the ti are normal forms.

Proof. We reason by induction on the size of λ-terms (the size being the number of abstractions and applications). Suppose given a λ-term in normal form: by definition, it can be of the following forms.
– x: it is a normal form and is of the expected form.
– λx.t: by the rule (βλ), this term is a normal form if and only if t is, i.e. by induction, if and only if t is itself of the expected form.
– t u: by the rules (βl) and (βr), if it is a normal form then both t and u are in normal form. By induction, t must be of the form λx.t′ or x t1 . . . tn with t′ and the ti of the expected form. The first case is impossible: otherwise, t u = (λx.t′)u would reduce by (βs). Therefore, t u is of the form x t1 . . . tn u with the ti and u in normal form. Conversely, any term of this form is a normal form.
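The inductive characterization of proposition 3.2.6.1 translates directly into a normal-form test; here is a small OCaml sketch of ours, reusing the naive term type from the sketch above:

(* A term is normal iff it is λx.t with t normal, or x t1 ... tn with all ti normal. *)
let rec is_normal = function
  | Var _ -> true
  | Abs (_, t) -> is_normal t
  | App (t, u) -> is_neutral t && is_normal u
(* is_neutral recognizes the "x t1 ... tn" shape with normal arguments. *)
and is_neutral = function
  | Var _ -> true
  | App (t, u) -> is_neutral t && is_normal u
  | Abs _ -> false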


Having identified normal forms as the notion of “result” in the λ-calculus, it is natural to study whether every term eventually gives rise to a result: we will see that this is not the case. A term t is weakly normalizing when it can reduce to a normal form, i.e. there exists a normal form u such that t −→β* u. It is strongly normalizing when every sequence of reductions eventually reaches a normal form; otherwise said, there is no infinite sequence of reductions starting from t:
t = t0 −→β t1 −→β t2 −→β . . .

Not every term is strongly normalizing. For instance, the term Ω = (λx.xx)(λx.xx) reduces to itself, and thus reduces infinitely:
(λx.xx)(λx.xx) −→β (λx.xx)(λx.xx) −→β (λx.xx)(λx.xx) −→β . . .
As a variant, the following term keeps growing during the reduction:
(λx.xx)(λy.yyy) −→β (λy.yyy)(λy.yyy) −→β (λy.yyy)(λy.yyy)(λy.yyy) −→β . . .
Clearly, a strongly normalizing term is weakly normalizing, but the converse does not hold. For instance, the term
(λx.y)((λx.xx)(λx.xx))
can reduce to y, which is a normal form, and is thus weakly normalizing. It can also reduce to itself and is thus not strongly normalizing.

3.2.7 β-equivalence. We write =β for the β-equivalence, which is the smallest equivalence relation containing −→β. It is not difficult to show that this relation can be characterized in terms of zigzags of reductions: we have t =β u whenever there exist terms t0, . . . , t2n such that
t = t0 *←β t1 −→β* t2 *←β t3 −→β* . . . *←β t2n−1 −→β* t2n = u
The notion of β-equivalence is very natural on λ-terms: it identifies two terms whenever they give rise to the same result. Two β-equivalent terms are sometimes also said to be β-convertible.

3.2.8 η-equivalence. In OCaml, the functions sin and fun x -> sin x are clearly “the same”: one can be used in place of the other without changing anything, both will compute the sine of their input. However, they are not identical: their syntaxes differ. In λ-calculus, the η-equivalence relation relates two such terms: it identifies a term t (which is a function, since everything is a function in λ-calculus) with the function which to x associates t x. Formally, the η-equivalence relation =η is the smallest congruence such that
t =η λx.tx
for every term t, where x ∉ FV(t). By analogy with β-reduction, it will sometimes be useful to consider the η-reduction relation, which is the smallest congruence such that
λx.tx −→η t
for every term t. The opposite relation
t ⟶ λx.tx
is also useful and called η-expansion. We have that η-equivalence is the reflexive, symmetric and transitive closure of this relation. Finally, we write =βη for the βη-equivalence relation, which is the smallest equivalence relation containing both =β and =η. In this course, we will mostly focus on β-equivalence, although most proofs generalize to βη-equivalence.

3.3 Computing in the λ-calculus

The λ-calculus contains only functions. Even though we have removed most of what is usually found in a programming language, we will see that it is far from trivial as a programming language. In order to do so, we will gradually show that the usual programming constructions can be encoded in λ-calculus.

3.3.1 Identity. A first interesting term is the identity λ-term
I = λx.x
It has the property that, for any term t, we have
I t −→β t

3.3.2 Booleans. The booleans true and false can respectively be encoded as
T = λxy.x        F = λxy.y
With this encoding, the usual if-then-else conditional construction can be encoded as
if = λbxy.bxy
Namely, we have
if T t u −→β* t        if F t u −→β* u
For instance, the first reduction is
if T t u = (λbxy.bxy)(λxy.x)tu −→β (λxy.(λxy.x)xy)tu −→β (λy.(λxy.x)ty)u −→β (λxy.x)tu −→β (λy.t)u −→β t
and the second one is similar. From there, the usual boolean operations of conjunction, disjunction and negation are easily defined by
and = λxy.x y F        or = λxy.x T y        not = λx.x F T
For instance, one can check that we have
and T T −→β* T    and T F −→β* F    and F T −→β* F    and F F −→β* F
Above, we have defined conjunction (and the other operations) from conditionals, which is quite classical. In OCaml, we would have written


let and x y = if x then y else false

(strictly speaking, and is a reserved keyword in OCaml, so an actual definition would have to use another name), which translates in λ-calculus as
and =η λxy.and x y = λxy.if x y F −→β* λxy.x y F
and suggests the definition we made. There are of course other possible implementations, e.g.
and = λxy.xyx
In the above implementations, we only guarantee that the expected reductions will happen when the arguments are booleans, but nothing is specified when the arguments are arbitrary λ-terms.

3.3.3 Pairs. The encoding of pairs can be deduced from booleans. We can namely encode the pairing operator as
pair = λxyb.if b x y
When applied to two terms t and u, it reduces to

pair t u −→β* λb.if b t u
which can be thought of as an encoding of the pair ⟨t, u⟩. In order to recover the components of the pair, we can simply apply it to either T or F:
(pair t u) T −→β* t        (pair t u) F −→β* u
We thus define the two projections as
fst = λp.p T        snd = λp.p F
and we have, as expected,
fst (pair t u) −→β* t        snd (pair t u) −→β* u
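These computations can be replayed in OCaml, where the Church encodings are just polymorphic functions (an illustrative snippet of ours, not part of the original development; the names tru, fls, fst_ and snd_ are chosen to avoid clashing with OCaml keywords and the standard library):

(* Church booleans and pairs as OCaml functions *)
let tru x y = x
let fls x y = y
let pair x y b = b x y
let fst_ p = p tru
let snd_ p = p fls
let () = assert (fst_ (pair 1 2) = 1 && snd_ (pair 1 2) = 2)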

More generally, n-tuples can be encoded as
uple_n = λx1 . . . xn b.b x1 . . . xn
with the associated projections
proj_i^n = λp.p (λx1 . . . xn.xi)
and one checks that
proj_i^n (uple_n t1 . . . tn) −→β* ti

3.3.4 Natural numbers. Given λ-terms f and x, and a natural number n ∈ N, we write f^n x for the λ-term f (f (. . . (f x))) with n occurrences of f:
f^0 x = x        f^{n+1} x = f (f^n x)
The n-th Church numeral is the λ-term
n = λf x.f^n x = λf x.f (f (. . . (f x)))
Otherwise said, the λ-term n is such that, when applied to arguments f and x, it iterates n times the application of f to x. For low values of n, we have
0 = λf x.x        1 = λf x.f x        2 = λf x.f (f x)        3 = λf x.f (f (f x))        . . .
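Church numerals can likewise be tried out directly in OCaml, since the numeral n is just the function iterating its first argument n times (again an illustrative sketch of ours; to_int is a hypothetical helper converting back to machine integers by iterating the successor on 0):

(* Church numerals as OCaml functions *)
let zero f x = x
let one f x = f x
let succ n f x = f (n f x)
let to_int n = n (fun k -> k + 1) 0
let () = assert (to_int (succ (succ one)) = 3)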


Successor. The successor function can be encoded as
succ = λnf x.f (nf x)
which applies f one more time to f^n x. It behaves as expected since
succ n = (λnf x.f (nf x))(λf x.f^n x)
       −→β λf x.f ((λf x.f^n x)f x)
       −→β λf x.f ((λx.f^n x)x)
       −→β λf x.f (f^n x)
       = λf x.f^{n+1} x = n + 1
Another natural possible definition of the successor would be
succ = λnf x.nf (f x)

Arithmetic functions. Addition, multiplication and exponentiation can similarly be encoded as
add = λmn.m succ n        mul = λmn.m (add n) 0        exp = λmn.n (mul m) 1
or, alternatively, as
add = λmnf x.mf (nf x)        mul = λmnf x.m(nf )x        exp = λmn.nm
It can be checked that addition is such that, for every m, n ∈ N, we have add m n =β m + n: it computes the function which to x associates f applied m times to f applied n times to x, i.e. f^{m+n} x. And similarly for the other operations.

Comparisons. The test-to-zero function takes a natural number n as argument and returns the boolean true or false depending on whether n is 0 or not. It can be encoded as
iszero = λnxy.n(λz.y)x
Given n, x and y, it applies n times the function f = λz.y to x: if the function is applied 0 times then x is returned, otherwise, if the function is applied at least once, then y is returned. The predecessor function can also be encoded, although this is more difficult (it is detailed below):
pred = λnf x.n(λgh.h(gf ))(λy.x)(λy.y)
It allows defining subtraction as
sub = λmn.n pred m
where, by convention, the result of m − n is 0 when m < n. From there, we can define comparisons of natural numbers, such as the ≤ relation, since m ≤ n is equivalent to m − n = 0:
leq = λmn.iszero (sub m n)
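The alternative definitions of addition and multiplication above translate literally into OCaml, reusing zero, one, succ and to_int from the numerals sketch (again ours, for illustration only):

let add m n f x = m f (n f x)
let mul m n f x = m (n f) x
let () = assert (to_int (add one (succ one)) = 3
              && to_int (mul (succ one) (succ one)) = 4)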


Exercise 3.3.4.1. The Ackermann function [Ack28], from pairs of natural numbers to natural numbers, is the function A defined by
A(0, n) = n + 1
A(m + 1, 0) = A(m, 1)
A(m + 1, n + 1) = A(m, A(m + 1, n))
Show that, in λ-calculus, it can be implemented as
ack = λm.m (λf n.n f (f 1)) succ

Predecessor. We are now going to see how we can implement the predecessor function mentioned above. Before going into that, let us see how we can implement the Fibonacci sequence fn defined by f0 = 0, f1 = 1 and fn+1 = fn + fn−1. A naive implementation would be

let rec fib n =
  if n = 0 then 0
  else if n = 1 then 1
  else fib (n-1) + fib (n-2)

This function is highly inefficient because many computations are performed multiple times. For instance, to compute fn, we compute both fn−1 and fn−2, but the computation of fn−1 will require computing fn−2 another time, and so on. The usual strategy to improve this consists in computing two successive values (fn−1, fn) of the Fibonacci sequence at a time. Given such a pair, the next pair is computed by
(fn, fn+1) = (fn, fn−1 + fn)
We thus define the function

let fib_fun (q, p) = (p, p + q)

which computes the next pair from the current pair. If we iterate this function n times on the pair (f0, f1) = (0, 1), we obtain the pair (fn, fn+1), and we can thus obtain the n-th term of the Fibonacci sequence by projecting onto the first component:

let fib n = fst (iter n fib_fun (0, 1))

where the function iter applies a function f n times to some element x:

let rec iter n f x = if n = 0 then x else f (iter (n-1) f x)

Now, suppose that we want to implement the predecessor function on natural numbers without using subtraction. Given n ∈ N, there is one value for which we obviously know the predecessor: the predecessor of n + 1 is n. We will use this fact, and the above trick, in order to remember the previous predecessor, which is the n − 1 we are looking for! Let us write pn for the

predecessor of n. We can compute the pair (pn, pn+1) of two successive values from the previous pair (pn−1, pn) by
(pn, pn+1) = (pn, pn + 1)
We thus define the function

let pred_fun (q, p) = (p, p + 1)

If we iterate this function n times starting from the pair (p0, p1) = (0, 0), we obtain the pair (pn, pn+1) and can thus compute pn as its first component:

let pred n = fst (iter n pred_fun (0, 0))

In λ-calculus, this translates as
pred = λn.fst (n (λx.pair (snd x) (succ (snd x))) (pair 0 0))
The formula for the predecessor provided above is a variant of this one.

3.3.5 Fixpoints. In order to define more elaborate functions on natural numbers, such as the factorial, we need the possibility of defining functions recursively. This can be achieved in λ-calculus thanks to so-called fixpoint combinators. In mathematics, a fixpoint of a function f is a value x such that f(x) = x. Note that such a value may or may not exist: for instance f = x ↦ x² has 0 and 1 as fixpoints, whereas f = x ↦ x + 1 has no fixpoint. Similarly, in λ-calculus a fixpoint for a term t is a term u such that
t u =β u
A distinguishing feature of the λ-calculus is that
1. every term t admits a fixpoint,
2. this fixpoint can be computed within λ-calculus: there is a term Y such that Y t is a fixpoint of t:
t (Y t) =β Y t
A term Y as above is called a fixpoint combinator.

Fixpoints in OCaml. Before giving a λ-term which is a fixpoint combinator, let us see how it can be implemented in OCaml and used to program recursive functions. In practice, we will look for a function Y such that
Y t −→β t (Y t)
Note that such a function is necessarily non-terminating, since there is an infinite sequence of reductions
Y t −→β t (Y t) −→β t (t (Y t)) −→β t (t (t (Y t))) −→β . . .
but it might still be useful because there might be other possible reductions reaching a normal form. Following the conventions, we will write fix instead of Y. A function which behaves as proposed is easily implemented:

let rec fix f = f (fix f)

Let us see how this can be used in order to implement the factorial function without explicitly resorting to recursion. The factorial function satisfies 0! = 1 and n! = n × (n − 1)!, so that it can be implemented as

let rec fact n = if n = 0 then 1 else n * fact (n-1)

In order to implement it without using recursion, the trick is to first transform this function into one which takes, as first argument, a function f which is to be the factorial itself, and to replace recursive calls by calls to this function:

let fact_fun f n = if n = 0 then 1 else n * f (n - 1)

We then expect the factorial function to be obtained as its fixpoint:

let fact = fix fact_fun

Namely, this function will reduce to fact_fun (fix fact_fun), i.e. the above function where f was replaced by the function itself, as expected. However, if we try to define the function fact in this way, OCaml complains:

Stack overflow during evaluation (looping recursion?).

This is because OCaml always evaluates arguments first, so that it falls into the infinite sequence of reductions mentioned above (the stack grows at each recursive call and exceeds the maximal authorized value):

fix fact_fun −→β fact_fun (fix fact_fun) −→β . . .

The trick in order to avoid that is to add an argument in the definition of fix:

let rec fix f x = f (fix f) x

and now the above definition of factorial computes as expected: this time, the argument fix f does not evaluate further, because it is a function which is still expecting its second argument. It is interesting to remark that the two definitions of fix (the looping one and the working one) are η-equivalent, see section 3.2.8, so that two η-equivalent terms can act differently depending on the properties we consider.

Fixpoints in λ-calculus. The above definition of fix does not easily translate to λ-calculus, because there is no simple way of defining recursive functions there. A possible implementation of the fixpoint combinator can be obtained by a variant of the looping term Ω (see section 3.2.6). The Curry fixpoint combinator is
Y = λf.(λx.f (xx))(λx.f (xx))
Namely, given a term t, we have
Y t = (λf.(λx.f (xx))(λx.f (xx))) t
    −→β (λx.t(xx))(λx.t(xx))
    −→β t((λx.t(xx))(λx.t(xx)))
    β←− t (Y t)

which shows that we indeed have Y t =β t (Y t), i.e. Y t is a fixpoint of t. Another possible fixpoint combinator is Turing's one, defined as
Θ = (λf x.x(f f x))(λf x.x(f f x))
which satisfies, for any term t,
Θ t −→β* t (Θ t)
(we have here a proper sequence of β-reductions, as opposed to a mere β-equivalence for Curry's combinator). The OCaml definition of the factorial

let fact = fix (fun f n -> if n = 0 then 1 else n * f (n - 1))

translates into λ-calculus as
fact = Y(λf n.if (iszero n) 1 (mul n (f (pred n))))
For instance, writing F for the term λf n.if (iszero n) 1 (mul n (f (pred n))), the factorial of 2 computes as
fact 2 = (Y F) 2
       =β F (Y F) 2
       −→β* if (iszero 2) 1 (mul 2 ((Y F) (pred 2)))
       −→β* if false 1 (mul 2 ((Y F) (pred 2)))
       −→β* mul 2 ((Y F) (pred 2))
       −→β* mul 2 ((Y F) 1)
       ...
       −→β* mul 2 (mul 1 1)
       −→β* 2

Remark 3.3.5.1. Following Church's initial intuition when introducing the λ-calculus, we can think of λ-terms as describing sets, in the sense of set theory (see section 5.3). Namely, a set t can be thought of as a predicate, i.e. a function which takes an element u as argument and returns true or false depending on whether the element u belongs to t or not. Following this point of view, instead of writing u ∈ t, we write t u. Similarly, given a predicate t, the set {x | t} is naturally written λx.t:

set theory        λ-calculus
u ∈ t             t u
{x | t}           λx.t

In this context, the paradoxical Russell set r = {x | ¬(x ∈ x)} of naive set theory, see section 5.3.1, is written as
r = λx.¬(xx)


This set has the property that r ∈ r iff ¬(r ∈ r), i.e.
r r = ¬(r r)
Otherwise said, r r is a fixpoint of ¬. Generalizing this to an arbitrary f instead of ¬, we recover the definition of Y:
Y = λf.r r        with r = λx.f (xx).
In this sense, the fixpoint combinator is the Russell paradox in disguise!

Church's combinator in OCaml. If we try to implement Church's combinator in OCaml:

let fix = fun f -> (fun x -> f (x x)) (fun x -> f (x x))

we get a typing error concerning the variable x. Namely, x is applied to something in the above expression, so it should be of type 'a -> 'b, but its argument is x itself, which imposes that we should have 'a = 'a -> 'b. The type of x should thus be the infinite type

... -> 'b -> 'b -> 'b -> 'b

which is not allowed by default. There are two ways around this. The first one consists in using the -rectypes option of OCaml, in order to allow such types. If we use this function to define the factorial by

let fact = fix fact_fun

we get a stack overflow, meaning that the program is looping, which can be solved with an η-expansion (we have already seen this trick above). We can thus define instead

let fix = fun f -> (fun x y -> f (x x) y) (fun x y -> f (x x) y)

and now the definition of the factorial works as expected. The second way, if you do not want to use some exotic flag, consists in using a recursive type, which allows such recursions in types. Namely, we can define the type

type 'a t = Arr of ('a t -> 'a)

with which we can define the fixpoint operator as

let fix f = (fun x y -> f (arr x x) y) (Arr (fun x y -> f (arr x x) y))

where we use the shorthand

let arr (Arr f) = f

In the same spirit, the Turing fixpoint combinator can be implemented as

let turing = let t f x y = x (arr f f x) y in t (Arr t)


3.3.6 Turing completeness. The previous encodings of usual functions should make it more or less clear that the λ-calculus is a full-fledged programming language. In particular, from the classical undecidability results [Tur37] we can deduce:

Theorem 3.3.6.1 (Undecidability). The following problems are undecidable:
– whether two terms are β-equivalent,
– whether a term can β-reduce to a normal form.

In order to make this result more precise, we should encode Turing machines into λ-terms. Instead of doing this directly, we can rather encode recursive functions, which are already known to have the same expressiveness as Turing machines. The class of recursive functions is the smallest class of partially defined functions f : N^k → N, for some k ∈ N, which contains the zero constant function z, the successor function s and the projections p_i^k, for k ∈ N and 1 ≤ i ≤ k:

z : N^0 → N        s : N^1 → N         p_i^k : N^k → N
() ↦ 0             (n) ↦ n + 1         (n1, . . . , nk) ↦ ni

and is closed under
– composition: given recursive functions
f : N^l → N        and        g1, . . . , gl : N^k → N
the function
comp_{f,g1,...,gl} : N^k → N
(n1, . . . , nk) ↦ f (g1(n1, . . . , nk), . . . , gl(n1, . . . , nk))
is also recursive,
– primitive recursion: given recursive functions
f : N^k → N        and        g : N^{k+2} → N
the function rec_{f,g} : N^{k+1} → N defined by
(0, n1, . . . , nk) ↦ f (n1, . . . , nk)
(n0 + 1, n1, . . . , nk) ↦ g(rec_{f,g}(n0, n1, . . . , nk), n0, n1, . . . , nk)
is also recursive,
– minimization: given a recursive function f : N^{k+1} → N, the function min_f : N^k → N which to (n1, . . . , nk) ∈ N^k associates the smallest n0 ∈ N such that f (n0, n1, . . . , nk) = 0 is also recursive.


The presence of minimization is the reason why we need to consider partially defined functions. A function f : N^k → N is definable by a λ-term t when, for every tuple of natural numbers (n1, . . . , nk) ∈ N^k, we have
t n1 . . . nk −→β* f (n1, . . . , nk)
where n is the Church numeral associated to n.

Theorem 3.3.6.2 (Kleene). The functions definable in λ-calculus are precisely the recursive ones.

Proof. The terms constructed in section 3.3 easily allow us to encode total recursive functions f as λ-terms JfK: we define
JzK = 0 = λf x.x
JsK = succ = λnf x.f (nf x)
Jp_i^kK = λx1 . . . xk.xi
Composition is given by
Jcomp_{f,g1,...,gl}K = λx1 . . . xk.JfK (Jg1K x1 . . . xk) . . . (JglK x1 . . . xk)
primitive recursion by
Jrec_{f,g}K = Y(λr x0 x1 . . . xk.if (iszero x0) (JfK x1 . . . xk) (JgK (r (pred x0)) (pred x0) x1 . . . xk))
and minimization by
Jmin_fK = Y(λr x0 x1 . . . xk.if (iszero (JfK x0 x1 . . . xk)) x0 (r (succ x0) x1 . . . xk)) 0
In order to handle general recursive functions, which might be partial, there is a subtlety with composition: if g is not defined on x, then comp_{f,g}(x) should not be defined, even if f is a constant function for instance, and this is not the case with the current encoding. This is easily overcome with the following construction: we write
t ↓ u = u (λx.x) t
for the term which should be read as “t, provided u terminates”. It can be checked that t ↓ u does not reduce to a normal form if u does not, and that t ↓ n −→β* t. We can now use this trick to correct the behavior of our encoding. For instance, the projection should be encoded as
Jp_i^kK = λx1 . . . xk.(xi ↓ x1 ↓ . . . ↓ xk)
For the converse property, i.e. that the definable functions are recursive, we should encode λ-terms and their reduction into natural numbers, sometimes called Gödel numbers. This can be done, see [Bar84] (or, if you are willing to accept that recursive functions are Turing-equivalent to usual programming languages, this amounts to showing that we can write a program which reduces λ-terms, which we can, see section 3.5).


3.3.7 Self-interpreting. We see that the λ-calculus provides yet another model of computation which is equivalent to Turing machines. This means that the functions we can compute in both models are the same, but not that they are equally simple to implement in both models. For instance, constructing a universal Turing machine is not an easy task: we have to decide on an encoding of the transitions on the tape, and then build a Turing machine using this encoding, and the resulting machine is usually neither small nor particularly elegant. In λ-calculus, however, this is easy. For instance, we can encode a λ-term t as a λ-term ⌜t⌝ as follows. We first pick a fresh variable i (by fresh we mean here i ∉ FV(t)), replace every application u v in t by i u v, and prepend λi to the resulting term. For instance, the term
t = succ 0 = (λnf x.f (nf x))(λf x.x)
is encoded as
⌜t⌝ = λi.i (λnf x.i f (i (i n f ) x)) (λf x.x)
Even though the original term t could reduce, the term ⌜t⌝ cannot (because of the manipulation we have performed on applications), and it can thus be considered as a decent encoding of t. We can then define an interpreter as
int = λt.t (λx.x)
This term has the property that, for every λ-term t, int ⌜t⌝ β-reduces to the normal form of t. More details can be found in [Bar91, Mog92, Lyn17].

3.3.8 Adding constructors. Even though we have seen that all the usual constructions can be encoded in the λ-calculus, it is often convenient to add those as new explicit constructions to the calculus. For instance, products can be added to the λ-calculus by extending the syntax of λ-terms to
t, u ::= x | t u | λx.t | ⟨t, u⟩ | πl | πr
The new expressions are
– ⟨t, u⟩: the pair of two terms t and u,
– πl and πr: the left and right projections respectively.
The β-reduction also has to be extended in order to account for those. We add the two new reduction rules
πl ⟨t, u⟩ −→β t        πr ⟨t, u⟩ −→β u
which express the fact that the left (resp. right) projection extracts the left (resp. right) component of a pair. Although most important properties (such as confluence) generalize to such variants of the λ-calculus, we stick here to the plain one for simplicity. Some extensions are used and detailed in section 4.3.
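To see what such an extension looks like concretely, here is a sketch of ours of the extended syntax with the two projection rules, on top of a naive term type (names are ours, not fixed by the text):

type term =
  | Var of string
  | App of term * term
  | Abs of string * term
  | Pair of term * term   (* the pair ⟨t, u⟩ *)
  | Pl                    (* left projection πl *)
  | Pr                    (* right projection πr *)

(* The two new reduction rules: πl⟨t,u⟩ −→ t and πr⟨t,u⟩ −→ u. *)
let reduce_projection = function
  | App (Pl, Pair (t, _)) -> Some t
  | App (Pr, Pair (_, u)) -> Some u
  | _ -> None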

3.4 Confluence of the λ-calculus

In order to be reasonably useful, the λ-calculus should be reasonably deterministic, i.e. we should somehow be able to speak about “the” result of the evaluation (by which we mean the β-reduction) of a λ-term. The first observation, already made above, is that, given a term, multiple distinct reductions may be performed. For instance, the term (λxy.y)((λa.a)(λb.b)) reduces in one step both to λy.y and to (λxy.y)(λb.b).

Another hope might be that if we reduce a term long enough, we will end up with a normal form (a term that cannot be reduced further), which can be considered as the result of the computation, and that if we perform two such reductions on a term, we will end up with the same normal form: the intermediate steps might not be the same, but in the end we always obtain the same result. For instance, on natural numbers, we can speak of 10 as the result of (1 + 2) + (3 + 4) because it does not depend on the intermediate steps used to compute it: whether we first reduce to 3 + (3 + 4) or to (1 + 2) + 7, both paths lead to 3 + 7 and then to 10.

However, in the case of λ-calculus, this hope is vain, because we have seen that some terms may lead to infinite sequences of β-reductions, thus never reaching a normal form.

3.4.1 Confluence. The property which turns out to be satisfied in the case of the λ-calculus is called confluence: it states that if a term t reduces in many steps to a term u1 and also to a term u2, then there exists a term v such that both u1 and u2 reduce in many steps to v:
if t −→β* u1 and t −→β* u2, then u1 −→β* v and u2 −→β* v for some v.
In other words, computation starting from a term t might lead to different intermediate results, but there is always a way for those results to converge to a common term.

Note that this result would not be valid if we required exactly one reduction step each time. For instance, we need two reductions to complete the

following square: the term (λyx.xyy)(I I), where I = λx.x is the identity, reduces in one step both to (λyx.xyy) I and to λx.x(I I)(I I); the former reaches λx.x I I in one step, whereas the latter needs two steps, λx.x(I I)(I I) −→β λx.x(I I)I −→β λx.x I I, to reach the common term.

The easiest way to prove this confluence result first requires introducing a variant of the β-reduction.

3.4.2 The parallel β-reduction. The parallel β-reduction ⇉ is the smallest relation on λ-terms closed under the following rules:
x ⇉ x   (β∥x)
if t ⇉ t′ and u ⇉ u′ then (λx.t)u ⇉ t′[u′/x]   (β∥s)
if t ⇉ t′ and u ⇉ u′ then t u ⇉ t′ u′   (β∥a)
if t ⇉ t′ then λx.t ⇉ λx.t′   (β∥λ)

As usual, we write ⇉* for the reflexive and transitive closure of the relation ⇉. Informally, t ⇉ u means that u is obtained from t by reducing in one step many of the β-redexes present in t at once. For instance, we have
(λxy.I x y)(I I) ⇉ λy.Iy ⇉ λy.y
where the first step intuitively corresponds to simultaneously performing the three β-reductions
(λxy.I x y)(I I) −→β λy.I (I I) y        I x −→β x        I I −→β I
As for the usual β-reduction, the parallel β-reduction might create some β-redexes which were not present in the original term, and which could thus not be reduced at first. For this reason, even though we can reduce in multiple places at once, we cannot perform a parallel β-reduction step directly from the term on the left to the term on the right in the above example.

In a parallel β-reduction step, we are allowed not to perform all the available β-reduction steps. In particular, we may perform none:

Lemma 3.4.2.1. For every λ-term t, we have t ⇉ t.

Proof. By induction on the term t.

3.4.3 Properties of the parallel β-reduction. We now study some properties of the parallel β-reduction. Since it corresponds to performing β-reduction steps in parallel, the relations ⇉* and −→β* coincide: we can simulate parallel β-reduction with β-reduction and conversely. Moreover, we will see that parallel β-reduction is easily shown to be confluent, from which we will be able to deduce the confluence of β-reduction. First, any β-reduction step can be simulated by a parallel reduction step:


Lemma 3.4.3.1. If t −→β u then t ⇉ u.

Proof. By induction on the derivation of t −→β u.

Conversely, any parallel β-reduction step can be simulated by multiple β-reduction steps:

Lemma 3.4.3.2. If t ⇉ u then t −→β* u.

Proof. By induction on the derivation of t ⇉ u.

From this, we immediately deduce that the reflexive and transitive closures of the two relations coincide:

Lemma 3.4.3.3. We have t ⇉* u if and only if t −→β* u.

Proof. If t ⇉* u, this means that we have a sequence of parallel reduction steps
t = t0 ⇉ t1 ⇉ t2 ⇉ . . . ⇉ tn = u
Therefore, by lemma 3.4.3.2,
t = t0 −→β* t1 −→β* t2 −→β* . . . −→β* tn = u
and thus t −→β* u. Conversely, if t −→β* u, this means that we have a sequence of β-reduction steps
t = t0 −→β t1 −→β t2 −→β . . . −→β tn = u
Therefore, by lemma 3.4.3.1,
t = t0 ⇉ t1 ⇉ t2 ⇉ . . . ⇉ tn = u
and thus t ⇉* u.

Next, the parallel β-reduction is compatible with substitution.

Lemma 3.4.3.4. If t ⇉ t′ and u ⇉ u′ then t[u/x] ⇉ t′[u′/x].

Proof. By induction on the derivation of t ⇉ t′.
– If the last rule is (β∥x), i.e. t = y = t′, we conclude with y[u/x] = y ⇉ y = y[u/x] or x[u/x] = u ⇉ u′ = x[u/x], depending on whether y ≠ x or y = x.

– If the last rule is (β∥s), i.e. t = (λy.t1) t2 and t′ = t1′[t2′/y] with t1 ⇉ t1′ and t2 ⇉ t2′, and with y ≠ x, then, by induction hypothesis, we have
t1[u/x] ⇉ t1′[u′/x]        t2[u/x] ⇉ t2′[u′/x]
and thus, by (β∥s),
(λy.t1[u/x]) (t2[u/x]) ⇉ t1′[u′/x][t2′[u′/x]/y]
which can be rewritten as
((λy.t1) t2)[u/x] ⇉ t1′[t2′/y][u′/x]

t1 − t01 t2 − t02 k (βa ) t1 t2 − t01 t02

then, by induction hypothesis, we have t1 [u/x] − t01 [u0 /x]

t2 [u/x] − t02 [u0 /x]

k

and thus, by (βa ), (t1 [u/x]) (t2 [u/x]) − (t01 [u0 /x]) (t02 [u0 /x]) otherwise said (t1 t2 )[u/x] − (t01 t02 )[u0 /x] – If the last rule is

t1 − t01 k (β ) λy.t1 − λy.t01 λ

the by induction hypothesis we have t1 [u/x] − t01 [u0 /x] k

and thus, by (βλ ), (λy.t1 )[u/x] = λy.t1 [u/x] − λy.t01 [u0 /x] = (λy.t01 )[u0 /x] and we are done. We can use this lemma to show that the β-reduction satisfies a variant of the confluence property called the diamond property, or local confluence: Lemma 3.4.3.5 (Diamond property). Suppose that t − u and t − u0 . Then there exists v such that u − v and u0 − v: t u0

u v

CHAPTER 3. PURE λ-CALCULUS

133

Proof. Suppose that t − u and t − u0 . We show the result by induction on the derivation of t − u. – If the last rule of the derivation of t − u is k

x − x

(βx )

then t = x = u and, by lemma 3.4.2.1, we have u0 − u0 : x u0

x u0

– If the last rule of the derivation of t − u is t1 − u1 t2 − u2 k (βs ) (λx.t1 )t2 − u1 [u2 /x] we have two possible cases depending on the derivation of t − u0 . – If the last rule of the derivation of t − u0 is t1 − u01 t2 − u02 k (βs ) (λx.t1 )t2 − u01 [u02 /x] then, by induction hypothesis, there exists a term vi such that ui − vi and u0i − vi , for i = 1 or i = 2: t1

t2 u01

u1

u02

u2

v1

v2

Therefore, we have both u1 [u2 /x] − v1 [v2 /x]

and

u01 [u02 /x] − v1 [v2 /x]

by lemma 3.4.3.4 and we can conclude: (λx.t1 ) t2 u01 [u02 /x]

u1 [u2 /x] v1 [v2 /x]

– If the last rule of the derivation of t − u0 is λx.t1 − t01 t2 − u02 k (βa ) (λx.t1 ) t2 − t01 u02

CHAPTER 3. PURE λ-CALCULUS

134

then the last rule of the derivation of λx.t1 − t01 is necessarily of the form t1 − u01 k (β ) λx.t1 − λx.u01 λ with t01 = λx.u01 . By induction hypothesis, we have the existence of the dotted reductions t1

t2 u01

u1

u02

u2

v1

v2

We thus have u1 [u2 /x] − v1 [v2 /x] by lemma 3.4.3.4 and (λx.u01 ) u02 − v1 [v2 /x] k

by (βs ), from which we conclude: (λx.t1 ) t2 (λx.u01 ) u02

u1 [u2 /x] v1 [v2 /x] – If the last rule of the derivation of t − u is

t1 − u1 t2 − u2 k (βa ) t1 t2 − u1 u2 k

k

the derivation of t − u0 ends either with (βs ) or (βa ) and both cases are handled similarly as above. – If the last rule of the derivation of t − u is t1 − u1 k (β ) λx.t1 − λx.u1 λ we can reason similarly as above. From this follows easily the confluence property of the relation − in two steps: ∗

Lemma 3.4.3.6. Suppose that t − u and t − u0 . Then there exists v such ∗ ∗ that u − v and u0 − v: t ∗

u0

u ∗

v

CHAPTER 3. PURE λ-CALCULUS

135 ∗

Proof. By recurrence on the length of the upper-right reduction t − u0 , using lemma 3.4.3.5. ∗



Theorem 3.4.3.7 (Confluence). Suppose that t − u and t − u0 . Then there ∗ ∗ exists v such that u − v and u0 − v: ∗




u0

u ∗



v



Proof. By recurrence on the length of the upper-left reduction t − u, using lemma 3.4.3.6. 3.4.4 Confluence and the Church-Rosser theorem. As a consequence of the above lemmas, we can finally deduce the confluence property of λ-calculus, first proved by Church and Rosser [CR36], the proof presented here being due to Tait and Martin-Löf: ∗

Theorem 3.4.4.1 (Confluence). The β-reduction is confluent: if t −→β u1 and ∗ ∗ ∗ t −→β u2 then there exists v such that u1 −→β v and u2 −→β v: ∗

t



u0

u ∗



v





Proof. Suppose that t −→β u1 and t −→β u2 . By lemma 3.4.3.3, we have ∗



t − u1 and t − u2 . From theorem 3.4.3.7, we deduce the existence of v such ∗ ∗ ∗ that u1 − v and u2 − v and, by lemma 3.4.3.3 again, we have u1 −→β v ∗ and u2 −→β v. This implies the following theorem, sometimes called the Church-Rosser property of λ-calculus: Theorem 3.4.4.2 (Church-Rosser). Given two terms t and u such that t ===β u, ∗ ∗ there exists a term v such that t −→β v and u −→β v: ∗

t ∗

v

u ∗

Proof. By definition of β-equivalence, see section 3.2.7, there is n ∈ N and terms ti for 0 6 i 6 2n such that ∗











t = t0β ←− t1 −→β t2β ←− t3 −→β . . . β ←− t2n−1 −→β t2n = u

We show the result by induction on n. For n = 0, the result is immediate. Otherwise, we first close the leftmost peak: since t1 −→β* t0 and t1 −→β* t2, theorem 3.4.4.1 provides a term v0 such that t0 −→β* v0 and t2 −→β* v0. Composing t3 −→β* t2 with t2 −→β* v0, we obtain a zigzag
v0 *←β t3 −→β* . . . *←β t2n−1 −→β* t2n = u
with one peak less, and the induction hypothesis provides a term v such that v0 −→β* v and u −→β* v. Since t = t0 −→β* v0 −→β* v, we are done.

One of the most important consequences is that a λ-term cannot reduce to two distinct normal forms: if the computation terminates, then its result is uniquely defined.

Proposition 3.4.4.3. If t and u are two β-equivalent terms in normal form then t = u.

Proof. By theorem 3.4.4.2, there exists v such that t −→β* v and u −→β* v. Since t and u are normal forms, they cannot reduce, and thus t = v = u.

Another byproduct is the so-called consistency of the λ-calculus, which states that the β-equivalence relation does not identify all terms:

Theorem 3.4.4.4 (Consistency). There are terms which are not β-equivalent.

Proof. The terms λxy.x and λxy.y are normal forms. If they were equivalent, they would be equal by the previous proposition, which is not the case.

3.5 Implementing reduction

3.5.1 Reduction strategies. We have seen that a λ-term can reduce in many ways, but in practice one implements a particular deterministic way of choosing the reductions to perform: this is called a reduction strategy. This is the case for OCaml, as is easily observed by inserting prints. For instance, the program

let p = print_endline
let _ = (p "a"; (fun x y -> p "b"; x + y)) (p "c"; 2) (p "d"; 3)

will always print dcab in the toplevel. We shall now look at the options we have here, in order to choose a strategy. A first question we have to answer is: should we reduce functions or arguments first? Namely, given a term of the form (λx.t)u such that u reduces to u′, we have two possible ways of reducing it:
t[u/x] β←− (λx.t)u −→β (λx.t)u′
which correspond to reducing functions or arguments first, giving rise to strategies which are respectively called call-by-name and call-by-value. Call-by-value has a tendency to be more efficient: even if the argument is used multiple times in the function, we reduce it only once beforehand, whereas the call-by-name strategy reduces it each time it is used. For instance, if u −→β* û, where û is a normal form, we have the following sequences of reductions:



– in call-by-value: (λx.f xx)u −→β* (λx.f xx)û −→β f û û,
– in call-by-name: (λx.f xx)u −→β f uu −→β* f û u −→β* f û û.
The function λx.f xx uses its argument twice, and therefore we have to reduce u twice in the second case, compared to only once in the first (and this can make a huge difference if the argument is used much more than twice, or if the reduction of u requires many steps). However, there is a case where the call-by-value strategy is inefficient: when the argument is not used in the function. Namely, we always reduce the argument, even if it is not used afterwards. For instance, we have the following sequences of reductions:
– in call-by-value: (λx.y)u −→β* (λx.y)û −→β y,
– in call-by-name: (λx.y)u −→β y.
Otherwise said, we have already observed in section 3.2.3 that β-reduction can duplicate and erase β-redexes. The call-by-value strategy is optimized for duplication and the call-by-name strategy is optimized for erasure. In practice, people often write programs where they use a result multiple times and rarely discard the result of computations, so that call-by-value strategies are generally implemented (this is for instance the case in OCaml). However, for theoretical purposes, call-by-value strategies can be a problem: it might happen that a term has a normal form and that this strategy does not find it. Namely, consider the term
(λx.y)Ω
A call-by-value strategy will first try to compute the normal form of Ω and thus loop, whereas a call-by-name strategy will directly reduce the term to y. A strategy is called normalizing when it reaches a normal form whenever the term has one: we have just seen that call-by-value does not have this property.

Orders on redexes. In more precise terms, to define a reduction strategy, we have to choose the order in which we reduce the redexes. Two partial orders can be defined on redexes:
– the imbrication order: a redex is inside another redex when it is a subterm of it, i.e. the redexes of t or of u are inside the redex (λx.t)u,
– the horizontal order: in a subterm tu, every redex in t is on the left of every redex in u.
Any two redexes in a term can be compared with one of those orders; a strategy can thus be specified by which redexes it favors with respect to each of these orders:
– a strategy is innermost (resp. outermost) when it begins with redexes which are the most inside (resp. outside),

138

– a strategy is left (resp. right) when it begins with redexes which are the most on the left (resp. right). For instance, the above examples illustrate the fact that the call-by-value and call-by-name strategies are respectively innermost and outermost. Partial evaluation. Another possibility which is generally offered when defining a reduction strategy is to allow not reducing some terms which are not in normal form. These terms can be thought of as “incomplete” and we are waiting for some more information (e.g. an argument or the value of a free variable) in order to further reduce them. Two such families are mainly considered: – a strategy is weak when it does not reduce abstractions: a term of the form λx.t will never be reduced, even if the term t contains redexes, – a left strategy is head when it does not reduce variables applied to terms: a term of the form x t1 . . . tn will never be reduced, even if some term ti contains redexes. A strategy is full when it is neither weak nor head. The reason for considering weak strategies is that a function is usually thought of as describing the actions to perform once arguments are given and it is therefore natural to delay their execution until we actually know those arguments. For instance, the strategy implemented in OCaml is weak: if it was not the case then the program let f n = print_endline "Incrementing!"; n+1 would always print the message exactly once, even if the function is never called, whereas we expect that the message is printed each time the function is called (which is the case with a weak evaluation strategy). In pure λ-calculus, there is no printing but one thing is easily observed: non-termination. For instance, we want to be able to define a function which loops or not depending on a boolean as follows: λb.if b Ω I This function takes a boolean as argument: if it is true it will return the term Ω whose evaluation is going to loop, otherwise it returns the identity. If we evaluate it with a weak strategy it will behave as expected, whereas if we use a nonweak one, we might evaluate the body of the abstraction and thus loop when reducing Ω, even if we give false as argument to the function. Head reductions mostly make sense for strategies like call-by-name: in a term (λx.t)u, we reduce to t[u/x] even if u is not in normal form because we want to delay the evaluation of u until it is actually used. Now, in a term x u, it might be the case that the free variable x will be replaced by an abstraction later on, and we want therefore to delay the evaluation of u until this is the case. A term which cannot be reduced by a weak or head strategy is not necessarily in normal form in the usual sense. Recall from proposition 3.2.6.1 that λ-terms in normal form can be described by the following grammar: v ::= λx.v | x v1 . . . vn

CHAPTER 3. PURE λ-CALCULUS

139

where v and vi are normal forms. A term is – a weak normal form when it is generated by v ::= λx.t | x v1 . . . vn where t is a term and the vi are weak normal forms, – a head normal form when it is generated by v ::= λx.v | x t1 . . . tn where v is a head normal form and the ti are terms, – a weak head normal form when it is generated by v ::= λx.t | x t1 . . . tn where t and the ti are terms. The terms which cannot be reduced in a weak (resp. head, resp. weak head) strategy are precisely weak (resp. head, resp. weak head) normal forms. Weak normal forms coincide with normal forms for terms which are not abstractions (resp. closed terms): they do the job if we are mostly interested in those terms, which we usually are. However, there are function which are weak (resp. head) normal forms such as λx.I I (resp. x (I I)) and are not normal forms, so that a weak (resp. head) strategy is never normalizing. Summary of strategies. We will detail below four reduction strategies, whose main properties are summarized below. Those strategies are the most wellknown and used ones, but other variants could of course be considered: AO CBV NO CBN

left. X X X X

inner. X X

weak

head

X X

X

norm. X

The columns respectively indicate whether the strategy is leftmost (or rightmost), innermost (or outermost), weak, head and normalizing. Implementing λ-terms. In order to illustrate implementations of reduction strategies in OCaml, we will encode λ-terms using the type type term = | Var of var | App of term * term | Abs of var * term where var is an arbitrary type for identifying variables (in practice, we would choose int or maybe string). We will also need a substitution function, such that subst x t u computes the term u where all occurrences of the variable x have been replaced by the term t:

CHAPTER 3. PURE λ-CALCULUS let | | |

rec Var App Abs let let Abs

140

subst x t = function y -> if x = y then t else Var y (u, v) -> App (subst x t u, subst x t v) (y', u) -> y = fresh () in u = subst y' (Var y) u in (y, subst x t u)

In order to avoid name captures, we always refresh the names of the abstracted variables when substituting under an abstraction (this is correct, but quite inefficient): in order to do so we use a function fresh which generates a new variable name each time it is called, e.g. using an internal counter incremented at each call. For each of the considered strategies below, we will define a function reduce which performs multiple β-reduction steps, in the order specified by the strategy. Call-by-value. The call-by-value strategy (CBV) is by far the most common, this is the one used by OCaml for instance. Its name comes from the fact that it computes the value of the argument of a function before applying the function to the argument. It is defined as the weak leftmost innermost strategy. This means that, given an application t u, 1. we evaluate t until we obtain a term of the form λx.t0 (where t0 is not necessarily a normal form), 2. we then evaluate the argument u to a normal form u ˆ, 3. we then evaluate t0 [ˆ u/x]. The reduction function associated to this strategy can be implemented as follows: let | | |

rec reduce = function Var x -> Var x Abs (x, t) -> Abs (x, t) App (t, u) -> match reduce t with | Abs (x, t') -> subst x (reduce u) t' | t -> App (t, reduce u)

In the case App (t, u), it can be observed that both terms t and u are always reduced, so that taking the rightmost variant of the strategy has little influence. Since it is a weak strategy, it is not normalizing, and normal forms for this strategy will be weak normal forms. The above function does not directly compute the weak normal form: it has to be iterated. For instance, applying it to (λx.xy)(λx.x) will result in (λx.x)y, which further reduces to y. Applicative order. The applicative order strategy (AO) is the leftmost innermost strategy, i.e. the variant of call-by-name where we are allowed to reduce under abstractions.

CHAPTER 3. PURE λ-CALCULUS let | | |

141

rec reduce = function Var x -> Var x Abs (x, t) -> Abs (x, reduce t) App (t, u) -> match reduce t with | Abs (x, t') -> subst x (reduce u) t' | t -> App (t, reduce u)

Normal forms are normal forms in the usual sense. As illustrated above, by the term (λx.y)Ω, this strategy might not terminate even though the term has a normal form, otherwise said the strategy is not normalizing. Call-by-name. The call-by-name strategy (CBN) is the weak head leftmost outermost strategy. Here, arguments are computed at each use and not once for all as in call-by-value strategy. An implementation of the corresponding reduction is let | | |

let rec reduce = function
  | Var x -> Var x
  | Abs (x, t) -> Abs (x, t)
  | App (t, u) ->
    match reduce t with
    | Abs (x, t') -> subst x u t'
    | t -> App (t, u)
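As an illustration (a usage sketch, assuming var = string as above), the argument of an erasing function is discarded without being evaluated:

(* (λx.y) Ω reduces to y in one call-by-name pass: Ω is never evaluated. *)
let () =
  let o = Abs ("x", App (Var "x", Var "x")) in
  let omega = App (o, o) in
  assert (reduce (App (Abs ("x", Var "y"), omega)) = Var "y")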

Iterating this reduce function computes the weak head normal form of a term, which is the appropriate notion of normal form for this strategy. This strategy being weak and head, it is not normalizing. However, it can be shown that if a term has a weak head normal form, this strategy will compute it.

Normal order. The normal order strategy (NO) is the leftmost outermost strategy. An implementation is

let rec reduce = function
  | Var x -> Var x
  | Abs (x, t) -> Abs (x, reduce t)
  | App (t, u) ->
    match reduce_cbn t with
    | Abs (x, t') -> subst x u t'
    | t -> App (reduce t, reduce u)

where reduce_cbn is the above call-by-name reduction function. Normal forms for this strategy are normal forms, and this strategy is actually normalizing: it can be shown that if there is a way to reduce a λ-term to a normal form then this strategy will find it, whence its name; this is a consequence of the so-called standardization theorem in λ-calculus [Bar84, chapters 12 and 13].

Normalization. As indicated earlier, the reduce functions perform multiple reduction steps, in the order specified by the strategy, so that iterating them reduces a term to its normal form for the strategy. A term can thus be normalized according to one of the strategies using the following function:


let rec normalize t =
  let u = reduce t in
  if t = u then t else normalize u

3.5.2 Normalization by evaluation. The implementations provided in section 3.5.1 are not really efficient, for one main reason: the substitution function is not efficiently implemented. Doing this efficiently, while properly taking bound variables into account, is actually quite difficult, see section 3.6.2. When using a functional language such as OCaml, the compiler already has support for that, and we can use the reduction of the host language in order to implement the β-reduction and compute normal forms. This is called normalization by evaluation: we implement the normalization function using the evaluation of the language. We shall now see how to perform that in practice.

Evaluation. We begin by describing our λ-terms as usual:

type term =
  | Var of string
  | Abs of string * term
  | App of term * term

For convenience, variable names are described by strings. For instance, we can define the looping λ-term Ω = (λx.xx)(λx.xx) by

let omega =
  let o = Abs ("x", App (Var "x", Var "x")) in
  App (o, o)

We are going to evaluate those terms to normal forms, which we call here values. We know from proposition 3.2.6.1 that those normal forms can be characterized as the λ-terms v generated by the grammar

v ::= λx.v | x v1 ... vn

The terms of the second form x v1 ... vn intuitively correspond to computations which are "stuck" because we do not know the function which is to be applied to the arguments: we only have a variable x here, and will only be able to perform the reduction when this variable is substituted by an actual abstraction. Those are called neutral values and can be described by the grammar

n ::= x | n v

where v is a value. With this notation, values can be described by

v ::= λx.v | n

We can thus describe values as the following datatype:

type value =
  | VAbs of string * value
  | VNeu of neutral
and neutral =
  | NVar of string
  | NApp of neutral * value


Now, remember that our idea is to use the evaluation of the language. In order to do so, the trick consists in describing the λ-term λx.t not as a pair consisting of the variable x and the term t, but as the function u ↦ t[u/x] which to a term u associates the term t with the occurrences of x replaced by u: after all, the only thing we want to be able to perform with the λ-term λx.t is β-reduction! Instead of the above type, we thus actually describe values in this way by

type value =
  | VAbs of (value -> value)
  | VNeu of neutral
and neutral =
  | NVar of string
  | NApp of neutral * value

We can then implement a function which evaluates a term to a value as follows:

let rec eval env = function
  | Var x -> (try List.assoc x env with Not_found -> VNeu (NVar x))
  | Abs (x, t) -> VAbs (fun v -> eval ((x,v)::env) t)
  | App (t, u) -> vapp (eval env t) (eval env u)
and vapp v w =
  match v with
  | VAbs f -> f w
  | VNeu n -> VNeu (NApp (n, w))

The function eval takes as second argument the term to be evaluated (i.e. normalized) and, as first argument, an "environment" env, which is a list of pairs (x, v) consisting of a variable x and a value v, such a pair indicating that the free variable x has to be replaced by the value v in the term during the evaluation (initially, this environment will typically be the empty list []). We then evaluate terms as follows:
– if the term is a variable x, we look up in the environment whether there is a value for it, in which case we return it, and return the variable x otherwise,
– if the term is an abstraction λx.t, we return the value which is the function which to a value v associates the evaluation of t in the environment where x is bound to v,
– if the term is an application t u, we evaluate t to a value t̂ and u to a value û; depending on the form of t̂, we have two cases:
  – if t̂ = f is a function, we simply apply it to û,
  – if t̂ = x v1 ... vn, we return x v1 ... vn û,
this last part being taken care of by the auxiliary function vapp (which applies a value to another).
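As a quick check (a usage sketch), evaluating (λx.x) y in the empty environment yields the neutral value y:

(* (λx.x) y evaluates to the neutral value y. *)
let () =
  match eval [] (App (Abs ("x", Var "x"), Var "y")) with
  | VNeu (NVar x) -> assert (x = "y")
  | _ -> assert false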


Finally, the environment is only really used during the evaluation, and we define

let eval t = eval [] t

because from now on we will only use it in the empty environment. You should remark that abstractions are not evaluated right away: we rather construct a function which will evaluate the body when an argument is given. In this sense, the function actually computes the weak normal form of the term. This function will not terminate if the β-reduction of the term does not. Typically, the evaluation of Ω will not terminate:

let _ = eval omega

Readback. We are now pleased because we have a short and efficient implementation of normalization, except on one point: we cannot easily print or serialize values because they contain functions. We now explain how we can convert a value back to a term: this procedure is called readback. We will need an infinite countable pool of fresh variables, so we define the function

let fresh i = "x@" ^ string_of_int i

which generates the name of the i-th fresh variable, called here "x@i", supposing that initial terms will never contain variable names of this form (say that the user cannot use "@" in a variable name). The readback function can be implemented as follows:

let rec readback i v =
  (* Read back a neutral term. *)
  let rec neutral = function
    | NVar x -> Var x
    | NApp (n, v) -> App (neutral n, readback i v)
  in
  match v with
  | VAbs f ->
    let x = fresh i in
    Abs (x, readback (i+1) (f (VNeu (NVar x))))
  | VNeu n -> neutral n

It takes as arguments an integer i (the index of the first fresh variable we have not used yet) and a value v, and returns the term corresponding to the value:
– if the value is a function f, we return the term λx.t where t corresponds to f(x) for some fresh variable x,
– otherwise, it is of the form x v1 ... vn and we return x t1 ... tn where ti is the term corresponding to the value vi.
We can then define a function which normalizes a term by evaluating it to a value and reading back the result:

let normalize t = readback 0 (eval t)

For instance, we can compute the normal form of the λ-term (λxy.x)y, which is λz.y, by


let _ =
  let t = App (Abs ("x", Abs ("y", Var "x")), Var "y") in
  normalize t

which gives the expected result

Abs ("x@0", Var "y")

Note that this reduction requires α-converting the abstraction on y, and this was correctly taken care of for us here.

Equivalence. Finally, we can test for β-equivalence of two λ-terms by comparing their normal forms (see section 4.2.4):

let eq t u = (normalize t) = (normalize u)

This is not as obvious as it seems: it also takes care of α-conversion! Namely, the readback function does not "randomly" generate fresh variables, but increments the counter i, starting from 0, when progressing into the term. Because of this, it canonically renames the variables. For instance, one can check that the functions λx.x and λy.y are equal:

let () =
  let id = Abs ("x", Var "x") in
  let id' = Abs ("y", Var "y") in
  assert (eq id id')

Namely, both identity functions are going to be normalized into the term

Abs ("x@0", Var "x@0")

The above equality normalizes the two terms in order to compare them. It is not as efficient as it could be in the case where the two terms are not equivalent: we might ensure that two terms are not equivalent without fully normalizing them. In order to understand why, first observe the following:

Lemma 3.5.2.1. Given a term t, the normal form of a term
– λx.t is necessarily of the form λx.t̂,
– x t is necessarily of the form x t̂,
where t̂ is the normal form of t.

Proof. This follows from the fact that the only possible way to β-reduce a term λx.t (resp. x t) is of the form λx.t −→β λx.t′ (resp. x t −→β x t′) by the rule (βλ) (resp. (βr)).

For this reason, we know that two terms of the form λx.t and x u are never β-convertible, no matter what the terms t and u are. In such a situation, there is thus no need to fully normalize the two terms to compare them. More generally, a term x t1 ... tn is never equivalent to an abstraction. Based on this observation, we can implement the test of β-equivalence as follows:


let eq t u =
  (* Equality of values. *)
  let rec veq i v w =
    match v, w with
    | VAbs f, VAbs g ->
      let x = VNeu (NVar (fresh i)) in
      veq (i+1) (f x) (g x)
    | VNeu m, VNeu n -> neq i m n
    | _, _ -> false
  (* Equality of neutral terms. *)
  and neq i m n =
    match m, n with
    | NVar x, NVar y -> x = y
    | NApp (m, v), NApp (n, w) -> neq i m n && veq i v w
    | _, _ -> false
  in
  veq 0 (eval t) (eval u)

Given two terms t and u, we reduce them to their weak normal forms, i.e. we reduce them until we find abstractions:
– if they are of the form λx.t′ and x u1 ... un (or conversely), we know that they are not equivalent (even though we have not computed the normal form of t),
– if they are of the form λx.t′ and λx.u′, then we compare t′ and u′ (which requires evaluating them further),
– if they are of the form x t1 ... tm and y u1 ... un, where the ti and ui are weak normal forms, then they are equivalent if and only if x = y, m = n and ti is equivalent to ui for every index i.
For instance, this procedure allows ensuring that λx.Ω is not convertible to x:

let () =
  let t = Abs ("x", omega) in
  assert (not (eq t (Var "x")))

whereas the former equality procedure would loop when comparing the two terms, because it tries to fully evaluate λx.Ω.
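As a usage sketch, this lazy comparison also identifies α-equivalent terms, without ever reading values back:

(* λx.λy.x and λu.λv.u are α-equivalent, hence eq returns true. *)
let () =
  let k = Abs ("x", Abs ("y", Var "x")) in
  let k' = Abs ("u", Abs ("v", Var "u")) in
  assert (eq k k')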

3.6 Nameless syntaxes

It might be difficult to believe at first, but a great source of bugs in software implementing compilers, proof assistants, tools for proving functional programs, and so on, comes from the incorrect handling of α-conversion. For instance, a naive implementation of substitution is:

x[u/x] = u
y[u/x] = y                                (when y ≠ x)
(t t′)[u/x] = (t[u/x]) (t′[u/x])
(λy.t)[u/x] = λy.t[u/x]


The last case is incorrect, because we do not suppose that y ∉ FV(u): we are substituting x by u under the abstraction λy without taking into account the fact that y might get bound in u in this way. For instance, this implementation would lead to the following sequence of β-reductions:

(λy.yy)(λfx.fx) −→ (λfx.fx)(λfx.fx) −→ λx.(λfx.fx)x −→ λxx.xx

In the second reduction step, there is an erroneous capture of x: a correct normal form for the above term is λxy.xy. There are multiple ways around this, and we have already seen in section 3.5.2 that normalization by evaluation provides a satisfactory answer to this question. However, there are cases where this technique is not an option (e.g. the host language is not functional, or we want to perform more subtle manipulations than simply normalizing terms, or we want to formalize λ-calculus in a proof assistant). We present below some alternative syntaxes for λ-calculus which allow taking care of α-conversion in terms and implementing β-reduction correctly and efficiently.

3.6.1 The Barendregt convention. A first idea in order to avoid incorrect captures of variables is to use the so-called Barendregt convention for naming the variables of λ-terms: all variables which are λ-abstracted should be pairwise distinct, and distinct from all free variables.

Lemma 3.6.1.1. Every term is α-equivalent to one satisfying the Barendregt convention.

This convention sometimes simplifies things. For instance, the above naive implementation of β-reduction works on terms satisfying the convention. However, after one β-reduction step, the λ-term is not guaranteed to satisfy the Barendregt convention anymore (see the above example), and it is quite expensive to have to α-convert the whole term at each reduction step in order to enforce the convention.

3.6.2 De Bruijn indices. A more serious solution to this problem is given by de Bruijn indices. The idea is that, in a closed term, every variable is created by a specific abstraction in the term, so that instead of referring to a variable by its name, we can identify it by the abstraction which created it. Moreover, it turns out that there is a very convenient way to refer to an abstraction: the number of abstractions we have to step over when going up in the syntactic tree, starting from the variable, in order to reach the corresponding abstraction. This number is called the de Bruijn index of the variable. For instance, consider the λ-term

λx.x(λy.yx)

This λ-term can be graphically represented as a tree where a node labeled "λx" corresponds to an abstraction and a node "@" corresponds to an


application:

[tree of λx.x(λy.yx): a λx node above an @ node whose children are the variable x and the subtree λy, itself above an @ node with children y and x]

The dotted arrows of the original drawing link each variable to the abstraction which created it. For the first occurrences of the variables x and y, the abstraction we are referring to is the one immediately above (we have to skip 0 λ's), whereas for the last occurrence of x, when going up from x in the syntactic tree, the corresponding abstraction is not the first one encountered (which is λy) but the second one (we have to skip 1 λ). The information in the λ-term can thus equivalently be represented by

λx.0(λy.01)

where each variable has been replaced by the number of λ's we have to skip when going up to reach the corresponding abstraction (note that a given variable, such as x above, can have different indices, depending on its position in the term). Now, the names of the variables do not really matter since we are working modulo α-conversion: we might as well drop them and simply write

λ.0(λ.01)

This is a very convenient notation because it does not mention variables anymore. What is not entirely clear yet is that we can implement β-reduction in this formalism. We will see that it is indeed possible, but quite subtle and difficult to get right.

Terms with de Bruijn indices. We thus consider a variant of the λ-calculus where terms are generated by the grammar

t, u ::= i | t u | λ.t

where i ∈ ℕ is the de Bruijn index of a variable. Following the preceding remarks, a conversion function of_term from closed λ-terms into terms with de Bruijn indices is provided in figure 3.1. It takes an auxiliary argument l which is the list of variables already declared by abstractions: the de Bruijn index of a variable is then the index of the variable in this list. This function will raise the exception Not_found if the term contains free variables. It is however possible to adapt it in order to represent terms with free variables using de Bruijn indices. The idea is that we should represent a term t with n free variables FV(t) = {x0, ..., xn−1} as if we were computing the de Bruijn representation of t in the term λxn−1. ... λx0.t, i.e. the free variables are implicitly abstracted. For instance,

λx.x x0 x2

is represented as

λ.0 1 3

(** Traditional λ-terms. *)
type lambda =
  | LVar of string
  | LApp of lambda * lambda
  | LAbs of string * lambda

(** De Bruijn λ-terms. *)
type deBruijn =
  | Var of int
  | App of deBruijn * deBruijn
  | Abs of deBruijn

(** Index of an element in a list. *)
let rec index x = function
  | y::l -> if x = y then 0 else 1 + index x l
  | [] -> raise Not_found

(** De Bruijn representation of a closed term. *)
let of_term t =
  let rec aux l = function
    | LVar x -> Var (index x l)
    | LApp (t, u) -> App (aux l t, aux l u)
    | LAbs (x, t) -> Abs (aux (x::l) t)
  in
  aux [] t

Figure 3.1: Converting λ-terms into de Bruijn representation.
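As a quick check of figure 3.1 (a usage sketch), the term λx.x(λy.yx) discussed above is converted to λ.0(λ.01):

(* of_term computes the de Bruijn representation λ.0(λ.01). *)
let () =
  let t = LAbs ("x", LApp (LVar "x",
                           LAbs ("y", LApp (LVar "y", LVar "x")))) in
  assert (of_term t = Abs (App (Var 0, Abs (App (Var 0, Var 1)))))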


In practice, it is also possible (and convenient) to "mix" the two conventions: have names for free variables and de Bruijn indices for bound variables. This is called the locally nameless representation of λ-terms [Cha12].

Reduction. Our goal is now to implement β-reduction in the de Bruijn representation of terms. The rule is, as usual,

(λ.t)u −→β t[u/0]

meaning that the variable 0 has to be replaced by u in t, where the substitution t[u/0] remains to be defined. We actually need to define the substitution of any variable since, when going under an abstraction, the index of the variable to be substituted is increased by one. For instance, the β-reduction

λx.(λy.λz.y) (λt.t) −→β λx.(λz.y)[λt.t/y] = λx.λz.y[λt.t/y] = λx.λz.λt.t

(tree diagrams omitted) in de Bruijn notation

should correspond to the following steps:

λ.(λ.λ.1) λ.0 −→β λ.(λ.1)[λ.0/0] = λ.λ.1[λ.0/1] = λ.λ.λ.0

We are thus tempted to define substitution by

i[u/i] = u
j[u/i] = j                                (for j ≠ i)
(t t′)[u/i] = (t[u/i]) (t′[u/i])
(λ.t)[u/i] = λ.t[u/i + 1]

But it is incorrect because, in the last case, u might contain free variables, which refer to abstractions above, and those have to be increased by 1 when going under the abstraction. For instance,

λx.(λy.λz.y) x −→β λx.(λz.y)[x/y] = λx.λz.y[x/y] = λx.λz.x


In de Bruijn notation, this gives rise to

λ.(λ.λ.1) 0 −→β λ.(λ.1)[0/0] = λ.λ.1[1/1] = λ.λ.1

where the substituted term 0 has been lifted to 1 when going under the abstraction (otherwise we would obtain the wrong result λ.λ.0). The morale is that the last case of substitution should actually be

(λ.t)[u/i] = λ.t[u′/i + 1]        (3.3)

where u′ is the term obtained from u by increasing by 1 all its free variables (and leaving the other variables untouched), which we will write u′ = ↑0 u in the following. The "corrected version" with (3.3) still contains a bug, which comes from the fact that β-reduction removes an abstraction, and therefore the indices of the variables in t referring to the variables abstracted above the removed abstraction have to be decreased by 1. For instance,

λx.(λy.x) (λt.t) −→β λx.x[λt.t/y] = λx.x

(tree diagrams omitted) in de Bruijn notation


corresponds to

λ.(λ.1) (λ.0) −→β λ.1[λ.0/0] = λ.0

This means that we should also correct the second case of substitution, in order to decrease the indices of the variables which were free in the original term. And now we have it right. In order to distinguish between bound and free variables in a term, it will be convenient to maintain an index l, called the cutoff level, such that the indices strictly below l correspond to bound variables and those above are free variables. We thus first define a function ↑l such that ↑l t is the term obtained from t by


increasing by one all the variables with index i ≥ l, called the lifting of t at level l. By induction,

↑l i = i              if i < l
↑l i = i + 1          if i ≥ l
↑l (t u) = (↑l t) (↑l u)
↑l (λ.t) = λ.(↑l+1 t)

The right way to think about it is that ↑l t is the term obtained from t by adding a "new variable" of index l: the variables of index i ≥ l have to be increased by 1 in order to make room for the new variable. Similarly, we can define a function ↓l such that, for every term t which does not contain the variable l, ↓l t is the term obtained by removing the variable l (the unlifting of t): all the variables of index i > l have to be decreased by one. It turns out that we will only need it when t is a variable, so that we define

↓l i = i − 1          if i > l
↓l i = i              if i < l

(it is not defined when i = l). With those at hand, we can finally correctly define substitution:

Definition 3.6.2.1 (Substitution). Given terms t and u and a variable i, we define the substitution of i by u in t, written t[u/i], by induction by

i[u/i] = u
j[u/i] = ↓i j                             (for j ≠ i)
(t t′)[u/i] = (t[u/i]) (t′[u/i])
(λ.t)[u/i] = λ.t[↑0 u/i + 1]

As indicated above, β-reduction can then be implemented with the rule

(λ.t)u −→β t[u/0]

An implementation of call-by-value β-reduction (see section 3.5.1) on λ-terms in de Bruijn representation is given in figure 3.2.

3.6.3 Combinatory logic. Combinatory logic, introduced by Schönfinkel [Sch24] and further studied by Curry [Cur30, CF58], is another possible representation of λ-terms which does not need to use variable binding or α-conversion: in this syntax, there is simply no need for variables. Introductory references on the subject are [Bar84, chapter 7] and [Sel02]. Our starting point is the following question: is there a small number of "basic" λ-terms such that every λ-term can be obtained (up to β-equivalence) by applying the basic λ-terms to one another? This would mean that all the abstractions we need in λ-terms can somehow be generated from those contained in the basic λ-terms. It turns out that we only need three basic λ-terms, which somehow encode possible manipulations of variables:

(** λ-terms. *)
type term =
  | Var of int
  | App of term * term
  | Abs of term

(** Lift a term at l. *)
let rec lift l = function
  | Var i -> if i < l then Var i else Var (i + 1)
  | App (t, u) -> App (lift l t, lift l u)
  | Abs t -> Abs (lift (l+1) t)

(** Unlift a variable i at l. *)
let unlift l i =
  assert (l <> i);
  if i < l then i else i - 1

(** Substitute variable i for u in t. *)
let rec sub i u = function
  | Var j -> if j = i then u else Var (unlift i j)
  | App (t, t') -> App (sub i u t, sub i u t')
  | Abs t -> Abs (sub (i+1) (lift 0 u) t)

(** Call-by-value reduction. *)
let rec reduce = function
  | Var i -> Var i
  | Abs t -> Abs t
  | App (t, u) ->
    match reduce t with
    | Abs t' -> sub 0 (reduce u) t'
    | t -> App (t, reduce u)

Figure 3.2: Normalization of λ-terms using de Bruijn indices.
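As a sanity check of figure 3.2 (a usage sketch, assuming its definitions are in scope), the reduction (λ.λ.1)(λ.0) −→β λ.λ.0, i.e. (λy.λz.y)(λt.t) −→β λz.λt.t, can be verified as follows:

(* (λ.λ.1) (λ.0) reduces to λ.λ.0, i.e. λz.λt.t. *)
let () =
  let t = App (Abs (Abs (Var 1)), Abs (Var 0)) in
  assert (reduce t = Abs (Abs (Var 0)))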


– I = λx.x corresponds to using a variable,
– S = λxyz.(xz)(yz) corresponds to duplicating a variable,
– K = λxy.x corresponds to erasing a variable.
Namely, it can be observed that the last abstracted variable of those terms is respectively used, duplicated and erased. As surprising as it seems at first, we can actually obtain any λ-term by application of those only. For instance, the term λxy.yy can be obtained as K ((S I) I): you can check that

K ((S I) I) −→∗β λxy.yy

Moreover, we can give the β-reduction rules directly for the basic terms: given any terms t, u and v, we have

I t −→β t        K t u −→β t        S t u v −→β (t v)(u v)

These are the only possible reductions for terms made of basic terms, and we have thus described β-reduction without using variables. This motivates the study, in the following, of the terms constructed from those, with the above rules as reduction.

Definition. The terms of combinatory logic are generated by variables, application and the three above constants. Formally, they are generated by the grammar

T, U ::= x | T U | S | K | I

where x is a variable, T and U are terms of combinatory logic, and S, K and I are constants. The reduction rules are

S T U V −→ (T V) (U V)        K T U −→ T        I T −→ T

together with the context rules

T −→ T′                  U −→ U′
─────────────            ─────────────
T U −→ T′ U              T U −→ T U′

We implicitly bracket application on the left, i.e. T U V is read as (T U) V. As usual, we write −→∗ for the reflexive and transitive closure of the relation −→, and ←→∗ for its reflexive, symmetric and transitive closure. A normal form is a term which does not reduce. We write FV(T) for the set of variables of a term T. A combinator is a term without variables.

Implementation. In OCaml, the terms of combinatory logic can be described by the type

type term =
  | Var of var
  | App of term * term
  | S
  | K
  | I

The leftmost outermost reduction strategy (see section 3.5.1) can be shown to be normalizing: if a term admits a normal form then this strategy will reach it (we will see in theorem 3.6.3.3 that this normal form is necessarily unique). In OCaml, it can be implemented as follows:


let rec normalize t =
  match t with
  | Var _ | S | K | I -> t
  | App (t, v) ->
    match normalize t with
    | I -> normalize v
    | App (K, t) -> normalize t
    | App (App (S, t), u) -> normalize (App (App (t, v), App (u, v)))
    | t -> App (t, normalize v)

An alternative, more elegant and efficient, implementation of this normalization procedure can be achieved by taking an additional argument env, which is a list of the arguments the current term is applied to; this is sometimes called Krivine's trick: compared to the previous implementation, we avoid normalizing the same term multiple times.

let rec normalize t env =
  match t, env with
  | App (t, u), _ -> normalize t (u::env)
  | I, t::env -> normalize t env
  | K, t::u::env -> normalize t env
  | S, t::u::v::env -> normalize t (v::(App(u,v))::env)
  | t, env ->
    (* apply to normalized arguments *)
    List.fold_left (fun t u -> App (t, normalize u [])) t env

Example 3.6.3.1. Consider the combinator S K K. It satisfies, for any term T,

S K K T −→ K T (K T) −→ T

Therefore the combinator I is superfluous in our system, since it can be implemented as

I = S K K

which means that we could have restricted ourselves to the two combinators S and K only.

Example 3.6.3.2. The term (S I I) (S I I) leads to an infinite sequence of reductions:

(S I I) (S I I) −→ I (S I I) (I (S I I)) −→ (S I I) (I (S I I)) −→ (S I I) (S I I) −→ ···

Theorem 3.6.3.3. The reduction relation −→ is confluent.

Proof. This can be shown as in section 3.4, by introducing a notion of parallel reduction and showing that it has the diamond property.

Abstraction. We can simulate abstractions in combinatory logic as follows. Given a variable x and a term T, we define a new term Λx.T by

Λx.x = I
Λx.T = K T                        if x ∉ FV(T),
Λx.(T U) = S (Λx.T) (Λx.U)        otherwise.
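This bracket abstraction translates directly to OCaml (a minimal sketch on the term type above, assuming var = string, with a hypothetical helper fv listing the variables of a term):

(* Variables occurring in a combinatory term. *)
let rec fv = function
  | Var x -> [x]
  | App (t, u) -> fv t @ fv u
  | S | K | I -> []

(* Bracket abstraction: lam x t implements Λx.t. *)
let rec lam x = function
  | Var y when y = x -> I
  | App (t, u) when List.mem x (fv t) || List.mem x (fv u) ->
    App (App (S, lam x t), lam x u)
  | t -> App (K, t)  (* x does not occur in t *)

One can check that lam "x" (lam "y" (Var "x")) evaluates to S (K K) I, in accordance with example 3.6.3.4 below.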


Example 3.6.3.4. We have

Λx.Λy.x = S (K K) I

Note that the term on the right is a normal form (in particular, it does not reduce to K). Given terms T, U, we write T[U/x] for the term T where the variable x has been replaced by U.

Lemma 3.6.3.5. For any terms T, U and variable x, we have

(Λx.T) U −→∗ T[U/x]

Proof. By induction on T.

Translation. We can now define translations between λ-terms and combinatory terms:
– we translate a λ-term t as a combinatory term ⟦t⟧cl,
– we translate a combinatory term T as a λ-term ⟦T⟧λ.
These transformations are defined inductively as follows:

⟦x⟧cl = x                       ⟦x⟧λ = x
⟦t u⟧cl = ⟦t⟧cl ⟦u⟧cl            ⟦T U⟧λ = ⟦T⟧λ ⟦U⟧λ
⟦λx.t⟧cl = Λx.⟦t⟧cl              ⟦S⟧λ = λxyz.(xz)(yz)
                                ⟦K⟧λ = λxy.x
                                ⟦I⟧λ = λx.x

Example 3.6.3.6. For instance, we have the following translations of λ-terms:

⟦λxy.x⟧cl = S (K K) I        ⟦λxy.yy⟧cl = K (S I I)

and of a combinatory term:

⟦S (K K) I⟧λ = (λxyz.(xz)(yz))((λxy.x)(λxy.x))(λx.x)
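The translation ⟦·⟧cl can be implemented on top of the bracket abstraction lam sketched above (a sketch under our assumptions: it reuses the type lambda of named λ-terms from figure 3.1, with var = string, and the combinatory term type of this section):

(* Translation of a λ-term into a combinatory term. *)
let rec to_cl = function
  | LVar x -> Var x
  | LApp (t, u) -> App (to_cl t, to_cl u)
  | LAbs (x, t) -> lam x (to_cl t)

For instance, to_cl (LAbs ("x", LAbs ("y", LVar "x"))) gives S (K K) I, matching example 3.6.3.6.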



Lemma 3.6.3.7. For any terms T, U, if T −→ U then ⟦T⟧λ −→∗β ⟦U⟧λ.

Proof. By induction on T.

The reduction of combinatory terms can thus be simulated in the λ-calculus.

Lemma 3.6.3.8. For any term T, ⟦Λx.T⟧λ −→∗β λx.⟦T⟧λ.

Proof. By induction on T.

Translating a λ-term back and forth has no effect up to β-equivalence:

Lemma 3.6.3.9. For any λ-term t, ⟦⟦t⟧cl⟧λ −→∗β t.

Proof. By induction on t, using the previous lemma in the case of abstraction.


The previous lemma, together with lemma 3.6.3.7, can be seen as the fact that combinatory logic embeds into λ-calculus (modulo β-reduction). It also implies that the basic combinators S, K and I can be thought of as a "basis" from which all the λ-terms can be generated:

Corollary 3.6.3.10. Every closed λ-term is β-equivalent to one obtained from S, K and I by application.

This dictionary between λ-calculus and combinatory logic unfortunately has a number of minor defects. First, it is not true that

t −→∗β u        implies        ⟦t⟧cl −→∗ ⟦u⟧cl

For instance, we have

⟦λx.(λy.y) x⟧cl = S (K I) I        ⟦λx.x⟧cl = I        (3.4)

where both combinatory terms are normal forms. If we try to go through the induction, the problem comes from the fact that β-reduction satisfies the rule on the left below, often called (ξ), whereas the corresponding principle on the right is not valid in combinatory logic:

t −→β t′                        T −→ T′
──────────────── (ξ)            ─────────────────
λx.t −→β λx.t′                  Λx.T −→ Λx.T′

as the above example illustrates. Intuitively, this is due to the fact that we have not yet provided enough arguments to the terms. Namely, if we apply both terms of (3.4) to an arbitrary term T, we obtain the same result:

S (K I) I T −→ K I T (I T) −→ I (I T) −→ I T −→ T        and        I T −→ T

In general, it can be shown that

t −→∗β u        implies        ⟦t⟧cl T1 ... Tn −→∗ ⟦u⟧cl T1 ... Tn

for all terms Ti, provided that n is a large enough natural number depending on t and u. It is also not true that the translation of a combinatory term in normal form is a λ-term in normal form:

⟦K x⟧λ = (λxy.x) x −→β λy.x

Again, the term K x is intuitively a normal form only because it is not applied to enough arguments. Finally, given a term T, the terms ⟦⟦T⟧λ⟧cl and T are not convertible in general. For instance,

⟦⟦K⟧λ⟧cl = ⟦λxy.x⟧cl = S (K K) I ≠ K

Both terms are normal forms and, if they were convertible, they would reduce to a common term by theorem 3.6.3.3. This is again due to the lack of arguments: for every term T, we have

S (K K) I T −→∗ K T

Two combinatory terms T and T′ are extensionally equivalent when, for every term U, we have T U ←→∗ T′ U. It can be shown that combinatory terms modulo reduction and extensional equivalence are in bijection with λ-terms modulo β- and η-equivalence, via the translations we have defined.


Iota. We have seen in example 3.6.3.1 that the combinator I is superfluous, so that the two combinators S and K are sufficient. Can we remove another combinator? With S and K alone we cannot. We can however come up with one combinator which subsumes both S and K: if we define the λ-term

ι = λx.x S K

we have

I = ι ι        K = ι (ι (ι ι))        S = ι (ι (ι (ι ι)))

We can therefore base combinatory logic on the sole combinator ι, the reduction rule being

ι T −→ T S K = T (ι (ι (ι (ι ι)))) (ι (ι (ι ι)))

In the sense described above, any λ-term can thus be encoded as a combinator based on ι, i.e. as a term generated by the grammar

t, u ::= ι | t u

Any such term t can then be encoded as a binary word [t] defined by

[ι] = 1        [t u] = 0[t][u]

so that ι (ι (ι ι)) is encoded as 0101011.
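A minimal sketch of this encoding, with a hypothetical OCaml type for ι-terms:

(* Terms built from ι and application, and their binary encoding. *)
type iota = Iota | IApp of iota * iota

let rec encode = function
  | Iota -> "1"
  | IApp (t, u) -> "0" ^ encode t ^ encode u

(* encode (IApp (Iota, IApp (Iota, IApp (Iota, Iota)))) = "0101011" *)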

Chapter 4

Simply typed λ-calculus

If the λ-calculus introduced in chapter 3 can be seen as the functional core of a programming language, the simply typed λ-calculus studied in this chapter is the core of a typed programming language. It will allow us to make the title of this book formal: we will see that a type can be seen as a formula, and a typable λ-term corresponds precisely to a proof of its type. This is the so-called Curry-Howard correspondence which is at the heart of this course. From a historical point of view, this calculus was introduced by Church in the 1940s [Chu40], in order to provide a foundation for logic and mathematics. Good further reading material on the subject includes [Pie02, SU06]. We introduce types for the λ-calculus in section 4.1, show that typable terms are terminating in section 4.2, extend typing to other type constructors than arrows in section 4.3, discuss the variant where abstracted variables are not typed in section 4.4, discuss the relationship between the Hilbert calculus and combinators in section 4.5, and finally present extensions to classical logic in section 4.6.

4.1 Typing

4.1.1 Types. A simple type is an expression made of variables and arrows. Those are generated by the grammar

A, B ::= X | A → B

A simple type is thus either
– a type variable X, or
– an arrow type A → B, read as the type of functions from A to B (where A and B are themselves simple types).
By convention, arrows are implicitly bracketed on the right: A → B → C is read as A → (B → C).

4.1.2 Contexts. A context

Γ = x1 : A1, ..., xn : An

is a list of pairs consisting of a variable xi (in the sense of λ-calculus, see section 3.1.1) and a type Ai. A context is thus either the empty context or of the form Γ, x : A for some context Γ, which is useful to reason by induction on contexts. The domain dom(Γ) of the context Γ is the set of variables occurring in it:

dom(Γ) = {x1, ..., xn}

Given a variable x ∈ dom(Γ), we sometimes write Γ(x) for the type associated with it. Here, we do not require that all the variables xi in a context are


distinct: to be precise, Γ(x) is the rightmost pair x : A occurring in Γ, which can be defined by induction by

(Γ, x : A)(x) = A
(Γ, y : A)(x) = Γ(x)        for y ≠ x.

4.1.3 λ-terms. We are going to consider a small variation of λ-terms: we suppose that all λ-abstractions specify the type of the abstracted variable. The syntax for terms is thus

t, u ::= x | t u | λx^A.t

where x is a variable, t and u are terms, and A is a type. An abstraction λx^A.t should be read as a function taking an argument x of type A and returning t.

Church vs Curry style for λ-terms. The above convention, where abstractions are typed, is called Church style for λ-terms. We will see that adopting it greatly simplifies the questions one is usually interested in for those terms (such as type checking, see section 4.1.6), at the cost of requiring small annotations from the user (the types of the abstractions). A variant of the theory where abstractions are not typed can also be developed and is called Curry style, see section 4.4. This is for instance the convention used in OCaml: one would typically write

let f = fun x -> x

although the Church style is also supported, i.e. we can also write

let f = fun (x:int) -> x

4.1.4 Typing. A sequent is a triple, written

Γ ⊢ t : A        (4.1)

consisting of a context Γ, a λ-term t and a type A. A term t has type A in a context Γ when the sequent (4.1) is derivable using the three rules of figure 4.1, where, in the rule (ax), the side condition x ∈ dom(Γ) is supposed to be satisfied. Those rules can be read as follows:
– (ax): in an environment where x is of type A, we know that x is of type A,
– (→I): if, supposing x is of type A, t is of type B, then the function λx.t which to x associates t is of type A → B,
– (→E): given a function t of type A → B and an argument u of type A, the result t u of the application is of type B.
We simply say that the term t has type A when it is so in the empty context. A derivation in this system is sometimes called a typing derivation. A term t is typable when it has some type A in some context Γ.

──────────────── (ax)
Γ ⊢ x : Γ(x)

Γ, x : A ⊢ t : B
────────────────────── (→I)
Γ ⊢ λx^A.t : A → B

Γ ⊢ t : A → B        Γ ⊢ u : A
─────────────────────────────── (→E)
Γ ⊢ t u : B

Figure 4.1: Typing rules of the simply typed λ-calculus.

Example 4.1.4.1. The term λf^{A→A}.λx^A.f (f x) has type

(A → A) → A → A

Namely, we have the typing derivation

Γ ⊢ f : A → A (ax)        Γ ⊢ x : A (ax)
─────────────────────────────────────────── (→E)
Γ ⊢ f x : A

Γ ⊢ f : A → A (ax)        Γ ⊢ f x : A (above)
─────────────────────────────────────────── (→E)
Γ ⊢ f (f x) : A
─────────────────────────────────────────── (→I)
f : A → A ⊢ λx^A.f (f x) : A → A
─────────────────────────────────────────── (→I)
⊢ λf^{A→A}.λx^A.f (f x) : (A → A) → A → A

with Γ = f : A → A, x : A.

Remark 4.1.4.2. Although this will mostly remain implicit in the following, we consider sequents up to α-conversion: this means that, in a sequent Γ ⊢ t : A, we can change a variable x into y, both in Γ and in t at the same time, provided that y ∉ dom(Γ). Because of this, we can always assume that all the variables are distinct in the contexts we consider. This assumption is sometimes useful to reason about proofs; e.g. with this convention, the axiom rule is equivalent to

────────────────────── (ax)
Γ, x : A, Γ′ ⊢ x : A

We do however feel bad about systematically assuming this because, in practice, implementations of logical or typing systems do not maintain this invariant.

4.1.5 Basic properties of the typing system. We state here some basic properties of the typing system, which will be used in the sequel. First, the following variants of the structural rules (see section 2.2.10) hold.


Lemma 4.1.5.1 (Weakening rule). The weakening rule is admissible:

Γ, Γ′ ⊢ t : B
────────────────────── (wk)
Γ, x : A, Γ′ ⊢ t : B

provided that x ∉ dom(Γ).

Proof. By induction on the proof of Γ, Γ′ ⊢ t : B. The case of the axiom rule uses the fact that we can suppose x ∉ dom(Γ), since we are considering sequents up to α-conversion, see remark 4.1.4.2.

Lemma 4.1.5.2 (Exchange rule). The exchange rule is admissible:

Γ, x : A, y : B, Γ′ ⊢ t : C
──────────────────────────── (xch)
Γ, y : B, x : A, Γ′ ⊢ t : C

provided that x ≠ y.

Proof. By induction on the proof of the premise.

Lemma 4.1.5.3 (Contraction rule). The contraction rule is admissible:

Γ, x : A, y : A, Γ′ ⊢ t : B
──────────────────────────── (contr)
Γ, x : A, Γ′ ⊢ t[x/y] : B

Proof. By induction on the proof of the premise.

All the free variables of a typable term are bound in the context:

Lemma 4.1.5.4. Given a derivable sequent Γ ⊢ t : A, we have FV(t) ⊆ dom(Γ).

Proof. By induction on the proof of the sequent.

In particular, a term t typable in the empty context is necessarily closed, i.e. FV(t) = ∅. Conversely, a variable which does not occur in the term can be removed from the context:

Lemma 4.1.5.5. Given a derivable sequent Γ, x : A, Γ′ ⊢ t : B with x ∉ FV(t), the sequent Γ, Γ′ ⊢ t : B is also derivable.

Proof. By induction on the proof of the sequent.

4.1.6 Type checking, type inference and typability. The three most important algorithmic questions when considering a typing system are the following ones.
– The type checking problem consists, given a context Γ, a term t and a type A, in deciding whether t has type A in the context Γ.
– The type inference problem consists, given a context Γ and a term t which is typable in the context Γ, in finding a type A such that t has type A in the context Γ.
– The typability problem consists, given a context Γ and a term t, in deciding whether t admits a type in this context.


In the simply typed λ-calculus all three problems are very easy: they can be answered in linear time in the size of the term t (neglecting the size of Γ).

Theorem 4.1.6.1 (Uniqueness of typing). Given a context Γ and a term t, there is at most one type A such that t has type A in the context Γ, and at most one derivation of Γ ⊢ t : A.

Proof. By induction on the term t. We have the following cases, depending on its shape:
– if the term is of the form x, then it is typable iff x ∈ dom(Γ) and, in this case, the typing derivation is

  ──────────── (ax)
  Γ ⊢ x : A

  with A = Γ(x),
– if the term is of the form t u, then it is typable iff both t and u are typable in Γ, with respective types of the form A → B and A, and in this case the typing derivation is

  Γ ⊢ t : A → B        Γ ⊢ u : A
  ─────────────────────────────── (→E)
  Γ ⊢ t u : B

– if the term is of the form λx^A.t, then it is typable iff t is typable in the context Γ, x : A with some type B, and in this case the typing derivation is

  Γ, x : A ⊢ t : B
  ────────────────────── (→I)
  Γ ⊢ λx^A.t : A → B

This concludes the proof.

The above theorem allows one to speak of "the" type and "the" typing derivation of a typable term. Moreover, its proof is constructive, in the sense that it allows us to explicitly construct the type of a term when it exists (i.e. perform type inference) and to determine that the term admits no type otherwise (i.e. decide typability), by induction on the term. Since a term admits at most one type, the type checking problem reduces to type inference: in a given context, a term t admits a type A if and only if the type inferred for t is A.

Implementation. An implementation is provided in figure 4.2.
– The function infer infers the type of a given term in a given context env, which is a list of pairs consisting of a variable and a type, encoding the typing context Γ (in reversed order). Depending on whether the term is a variable, an abstraction or an application, the function recursively looks for derivations, as indicated by the rules (ax), (→I) and (→E) respectively. The function raises the exception Type_error when no type exists.


– The function check performs type checking: given an environment env, a term t and a type a, it returns () if the term admits the given type, and raises Type_error otherwise. The implementation of this function corresponds to the proof of theorem 4.1.6.1.
– The function typable determines whether a term admits a type or not in a given environment.
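As a usage sketch of figure 4.2, one can check the type computed for the term of example 4.1.4.1 above:

(* Inferring the type of λf^{A→A}.λx^A.f (f x). *)
let () =
  let a = TVar "A" in
  let t = Abs ("f", Arr (a, a),
               Abs ("x", a, App (Var "f", App (Var "f", Var "x")))) in
  assert (infer [] t = Arr (Arr (a, a), Arr (a, a)))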

4.1.7 The Curry-Howard correspondence. The presentation and naming of the rules of section 4.1.4 is intended to make the relation with logic clear: if we erase the term annotations and replace → by ⇒, we obtain precisely the rules of the implicational fragment of intuitionistic logic, see section 2.2.6. This parallel between the typing rules (on the left) and the rules of natural deduction (on the right) is shown in the table below:

──────────────────── (ax)                ────────────── (ax)
Γ, x : A, Γ′ ⊢ x : A                     Γ, A, Γ′ ⊢ A

Γ, x : A ⊢ t : B                         Γ, A ⊢ B
────────────────────── (→I)              ──────────── (⇒I)
Γ ⊢ λx^A.t : A → B                       Γ ⊢ A ⇒ B

Γ ⊢ t : A → B    Γ ⊢ u : A               Γ ⊢ A ⇒ B    Γ ⊢ A
─────────────────────────── (→E)         ───────────────────── (⇒E)
Γ ⊢ t u : B                              Γ ⊢ B

If we start from a typing derivation, we thus obtain a derivation in NJ by erasing the terms, i.e. by replacing each rule of the left column above by the corresponding rule of the right column: this process is called here the term erasing procedure. Abstractions thus correspond to introduction rules of ⇒, applications to elimination rules of ⇒, and variables to axiom rules. In fact, the relationship between typable terms and proofs in NJ is very tight: this is known as the Curry-Howard correspondence, also called the proofs-as-programs or propositions-as-types correspondence. It was first made explicit by Howard in notes which circulated starting from 1969 and were ultimately published a decade later [How80]. The name of Curry is due to his closely related discovery, in the late 1950s, of the correspondence between the Hilbert calculus and combinatory logic [CF58], detailed in section 4.5. We recall that natural deduction [Gen35] and the simply typed λ-calculus [Chu40] were introduced in 1935 and 1940: this correspondence might look like an obvious fact once the concepts are properly elaborated and the right notations set up, but it took 30 years to get there.

Theorem 4.1.7.1 (Curry-Howard correspondence). Given a context Γ and a type A, the term erasing procedure induces a one-to-one correspondence between
(i) λ-terms of type A in the context Γ, and
(ii) proofs in the implicational fragment of NJ of Γ ⊢ A.

Proof. Suppose given a proof π of a sequent. We construct a term t having this proof as typing derivation, by induction on the proof:

type var = string

(** Types. *)
type ty =
  | TVar of string
  | Arr of ty * ty

(** Terms. *)
type term =
  | Var of var
  | App of term * term
  | Abs of var * ty * term

exception Type_error

(** Type inference. *)
let rec infer env = function
  | Var x -> (try List.assoc x env with Not_found -> raise Type_error)
  | Abs (x, a, t) -> Arr (a, infer ((x,a)::env) t)
  | App (t, u) ->
    match infer env t with
    | Arr (a, b) -> check env u a; b
    | _ -> raise Type_error

(** Type checking. *)
and check env t a =
  if infer env t <> a then raise Type_error

(** Typability. *)
let typable env t =
  try let _ = infer env t in true
  with Type_error -> false

Figure 4.2: Type checking, type inference and typability.


– if the proof is of the form

  ────────────── (ax)
  Γ, A, Γ′ ⊢ A

then necessarily the corresponding typing derivation is

  ──────────────────── (ax)
  Γ, x : A, Γ′ ⊢ x : A

– if the proof is of the form

  π                π′
  Γ ⊢ A ⇒ B        Γ ⊢ A
  ─────────────────────── (⇒E)
  Γ ⊢ B

then by induction hypothesis we have terms t and u with typing derivations of Γ ⊢ t : A → B and Γ ⊢ u : A, and necessarily the typing derivation is

  Γ ⊢ t : A → B        Γ ⊢ u : A
  ─────────────────────────────── (→E)
  Γ ⊢ t u : B

– if the proof is of the form

  π
  Γ, A ⊢ B
  ──────────── (⇒I)
  Γ ⊢ A ⇒ B

then by induction hypothesis we have a typing derivation of Γ, x : A ⊢ t : B, and necessarily the typing derivation is of the form

  Γ, x : A ⊢ t : B
  ────────────────────── (→I)
  Γ ⊢ λx^A.t : A → B

Conversely, given a term of type A in the context Γ, theorem 4.1.6.1 ensures that there is at most one typing derivation for it, and erasing it provides a proof of Γ ⊢ A. Finally, it can be shown that these two translations establish a bijective correspondence.

In the light of the previous theorem, typable λ-terms can be thought of as witnesses for proofs.


Remark 4.1.7.2 (Contexts as sets). The two λ-terms λx^A.λy^A.x and λx^A.λy^A.y both have the type A → A → A:

  ───────────────────── (ax)            ───────────────────── (ax)
  x : A, y : A ⊢ x : A                   x : A, y : A ⊢ y : A
  ────────────────────────── (→I)        ────────────────────────── (→I)
  x : A ⊢ λy^A.x : A → A                 x : A ⊢ λy^A.y : A → A
  ────────────────────────────── (→I)    ────────────────────────────── (→I)
  ⊢ λx^A.λy^A.x : A → A → A              ⊢ λx^A.λy^A.y : A → A → A

and they are clearly different (they respectively correspond to the first and the second projection). This sheds a new light on our remark of section 2.2.10, stating that contexts should be lists and not sets in proof systems. If we handled them as sets, we would not be able to distinguish these terms, since both would correspond, via the Curry-Howard correspondence, to the proof

  ─────── (ax)
  A ⊢ A
  ──────────── (⇒I)
  A ⊢ A ⇒ A
  ──────────────── (⇒I)
  ⊢ A ⇒ A ⇒ A

In other words, it is important, in axiom rules, to know exactly which hypothesis of the context we are using when there are two of the same type.

Remark 4.1.7.3 (Equivalence vs isomorphism). In the same vein as the previous remark, there is a difference between equivalence and isomorphism in type theory. For instance, we have an equivalence

(A ⇒ A ⇒ B) ⇔ (A ⇒ B)

but the types A ⇒ A ⇒ B and A ⇒ B are not isomorphic. The equivalence amounts to having terms corresponding to both implications of the equivalence:

t : (A → A → B) → (A → B)        u : (A → B) → (A → A → B)

Here, we can take

t = λf^{A→A→B}.λx^A.f x x        u = λf^{A→B}.λx^A.λy^A.f x

Such a pair of terms is an isomorphism when both composites are (βη-equivalent to) the identity:

λf^{A→B}.t (u f) =βη λf^{A→B}.f        λf^{A→A→B}.u (t f) =βη λf^{A→A→B}.f

In the above example, the first equality does hold, but not the second, since

λf^{A→A→B}.u (t f) =β λf^{A→A→B}.λx^A.λy^A.f x x ≠βη λf^{A→A→B}.f

4.1.8 Subject reduction. An important property relating typing and β-reduction in the λ-calculus is the subject reduction property, already encountered in theorem 1.4.3.2: typing does not change during evaluation, by which we here mean β-reduction. We first need an auxiliary lemma:


Lemma 4.1.8.1 (Substitution lemma). Suppose that we have typing derivations of Γ, x : A, Γ′ ⊢ t : B and Γ, Γ′ ⊢ u : A. Then we have a typing derivation of Γ, Γ′ ⊢ t[u/x] : B. In other words, the rule

Γ, x : A, Γ′ ⊢ t : B        Γ, Γ′ ⊢ u : A
──────────────────────────────────────────
Γ, Γ′ ⊢ t[u/x] : B

is admissible.

Proof. By induction on the typing derivation of Γ, x : A, Γ′ ⊢ t : B.
– If it is of the form

  ──────────────────── (ax)
  Γ, x : A, Γ′ ⊢ x : A

then we conclude with the proof of Γ, Γ′ ⊢ u : A given in the hypothesis.
– If it is of the form

  ──────────────────── (ax)
  Γ, x : A, Γ′ ⊢ y : B

where x ≠ y and y : B occurs in Γ or Γ′, then we conclude with

  ────────────── (ax)
  Γ, Γ′ ⊢ y : B

– If it is of the form

  π1                              π2
  Γ, x : A, Γ′ ⊢ t : B → C        Γ, x : A, Γ′ ⊢ t′ : B
  ────────────────────────────────────────────────────── (→E)
  Γ, x : A, Γ′ ⊢ t t′ : C

then we conclude with

  π′1                           π′2
  Γ, Γ′ ⊢ t[u/x] : B → C        Γ, Γ′ ⊢ t′[u/x] : B
  ────────────────────────────────────────────────── (→E)
  Γ, Γ′ ⊢ (t[u/x]) (t′[u/x]) : C

where π′1 and π′2 are respectively obtained from π1 and π2 by induction hypothesis.
– If it is of the form

  π
  Γ, x : A, Γ′, y : B ⊢ t : C
  ───────────────────────────── (→I)
  Γ, x : A, Γ′ ⊢ λy.t : B → C

then we conclude with

  π′
  Γ, Γ′, y : B ⊢ t[u/x] : C
  ───────────────────────────── (→I)
  Γ, Γ′ ⊢ λy.t[u/x] : B → C


where π′ is obtained by induction hypothesis from π and from the derivation

  Γ, Γ′ ⊢ u : A
  ────────────────────── (wk)
  Γ, Γ′, y : B ⊢ u : A

Remark 4.1.8.2. Note that, through the Curry-Howard correspondence, the substitution lemma precisely corresponds to the "proof substitution" of proposition 2.3.2.1: the term erasure of the rule of lemma 4.1.8.1 is the cut rule

Γ, A, Γ′ ⊢ B        Γ, Γ′ ⊢ A
────────────────────────────── (cut)
Γ, Γ′ ⊢ B

This should not come as a surprise: under the Curry-Howard correspondence, substituting proofs corresponds to substituting terms.

Theorem 4.1.8.3 (Subject reduction). Suppose given a term t of type A in a context Γ. If t β-reduces to t′ then t′ also has type A in the context Γ.

Proof. By induction on the derivation of t −→β t′ (see section 3.2.1).
– If the derivation ends with (βs), it is of the form (λx.t)u −→β t[u/x], and the typing derivation of the term on the left is of the form

  Γ, x : A ⊢ t : B
  ───────────────────── (→I)
  Γ ⊢ λx.t : A → B        Γ ⊢ u : A
  ──────────────────────────────────── (→E)
  Γ ⊢ (λx.t) u : B

We conclude by lemma 4.1.8.1, which ensures the existence of a derivation of Γ ⊢ t[u/x] : B.
– If the derivation ends with (βl), it is of the form t u −→β t′ u with t −→β t′, and the typing derivation of the term on the left is of the form

  π1                   π2
  Γ ⊢ t : A → B        Γ ⊢ u : A
  ─────────────────────────────── (→E)
  Γ ⊢ t u : B

We conclude with the derivation

  π′1                   π2
  Γ ⊢ t′ : A → B        Γ ⊢ u : A
  ──────────────────────────────── (→E)
  Γ ⊢ t′ u : B

where π′1 is obtained by induction hypothesis.


– The cases of (βr) and (βλ) are similar to the previous one.

Example 4.1.8.4. We have the typing derivation

  ───────────────────── (ax)
  x : A, y : A ⊢ y : A
  ────────────────────────── (→I)        ─────────────── (ax)
  x : A ⊢ λy^A.y : A → A                  x : A ⊢ x : A
  ──────────────────────────────────────────────────────── (→E)
  x : A ⊢ (λy^A.y)x : A
  ──────────────────────────────────────────────────────── (→I)
  ⊢ λx^A.(λy^A.y)x : A → A

and the reduction

λx^A.(λy^A.y)x −→β λx^A.x

It can be checked that the reduced term does admit the same type A → A:

  ─────────────── (ax)
  x : A ⊢ x : A
  ───────────────────── (→I)
  ⊢ λx^A.x : A → A

The proof of the above theorem deserves some attention. It should be observed that, by erasing the terms, the β-reduction step on a typable term described in the above proof corresponds precisely to the procedure we used in section 2.3.3 in order to eliminate a cut in the corresponding proof:

  π
  Γ, A ⊢ B
  ──────────── (⇒I)
  Γ ⊢ A ⇒ B        Γ ⊢ A
  ──────────────────────── (⇒E)        ⇝        Γ ⊢ B
  Γ ⊢ B

where the proof of Γ ⊢ B on the right is obtained by substituting the proof of Γ ⊢ A for the hypothesis A in π.

Thus:

Theorem 4.1.8.5 (Dynamical Curry-Howard correspondence). Through the Curry-Howard correspondence, β-reduction corresponds to eliminating cuts.

This explains the remark already made in section 2.3.3: although cut-free proofs are "simpler" in the sense that they do not contain cuts, they can be much bigger than the corresponding proofs with cuts, in the same way that executing a program can give rise to a much bigger result than the program itself (e.g. a program computing the factorial of 1000). As a direct consequence of the previous theorem, we have:

Corollary 4.1.8.6. Through the Curry-Howard correspondence, typable terms in normal form correspond to cut-free proofs.

4.1.9 η-expansion. We have seen that a β-reduction step corresponds to eliminating a cut, which consists of an introduction rule followed by an elimination rule. Similarly, an η-expansion step corresponds to introducing a "co-cut" (we are not aware of an official name for those), consisting of an elimination rule followed by an introduction rule. For instance, supposing that in some context Γ we can show t : A → B, the η-expansion step

t −→η λx^A.t x


corresponds to the following transformation of typing derivations:

  π                              π
  Γ ⊢ t : A → B                  Γ ⊢ t : A → B
                                 ────────────────────── (wk)
                        ⇝        Γ, x : A ⊢ t : A → B        Γ, x : A ⊢ x : A (ax)
                                 ─────────────────────────────────────────── (→E)
                                 Γ, x : A ⊢ t x : B
                                 ─────────────────────────── (→I)
                                 Γ ⊢ λx^A.t x : A → B

which, after term erasure, corresponds to the following proof transformation:

  π                      π
  Γ ⊢ A ⇒ B              Γ ⊢ A ⇒ B
                         ──────────────── (wk)
                ⇝        Γ, A ⊢ A ⇒ B        Γ, A ⊢ A (ax)
                         ──────────────────────────── (⇒E)
                         Γ, A ⊢ B
                         ───────────── (⇒I)
                         Γ ⊢ A ⇒ B

4.1.10 Confluence. Recall from section 3.4 that the β-reduction of λ-terms is confluent. By theorem 4.1.8.3, we can immediately extend this result to typable terms:

Theorem 4.1.10.1 (Confluence). The β-reduction is confluent on typable terms (in some fixed context): given typable terms t, u1 and u2 such that t −→∗β u1 and t −→∗β u2, there exists a typable term v such that u1 −→∗β v and u2 −→∗β v.

4.2 Strong normalization

4.2.1 A normalization strategy. We have seen in theorem 4.1.8.5 that, under the Curry-Howard correspondence, β-reduction corresponds to cut elimination. Since, in theorem 2.3.3.1, we have established that every proof reduces to a cut-free proof, this means that every typable term β-reduces to a term in normal form. More precisely, the proof produces a strategy to reduce a term to a normal form: we can reduce a β-redex (λx.t) u whenever t and u do not contain β-redexes. In fact, the proof only depends on the hypothesis that u does not contain β-redexes, and we have to suppose this because those redexes could be "duplicated" during the reduction, making it unclear that it will terminate. For instance, writing I = λx.x for the identity, with t = λy.yxx and u = I I, we have the following reductions:

(λxy.yxx)(I I) −→β λy.y(I I)(I I) −→β λy.y I (I I) −→β λy.y I I
(λxy.yxx)(I I) −→β (λxy.yxx) I −→β λy.y I I

We see that the redex I I −→β I, which is reduced only once on the second line, has become two redexes on the first line: this is because the term λxy.yxx contains the variable x twice, so that reducing the outer redex first duplicates the term substituted for x. Following the terminology introduced in section 3.5.1, what theorem 2.3.3.1 establishes is thus that innermost reduction strategies, such as call-by-value, terminate on typable λ-terms.


4.2.2 Strong normalization. We would now like to show a stronger result, called strong normalization: every typable term is strongly normalizing. This means that, starting from a given typable term t, we will always end up on a normal form after a finite number of steps, whichever way we choose to reduce it, see section 3.2.6. We show below a proof based on "reducibility candidates" which is due to Tait and was later refined by Girard, see [Gir89, Chapter 6]. Before entering the details of this subtle proof, let us first explain why the naive idea for a proof does not work.

Failure of the naive proof. A first attempt to show the result would consist in showing, by induction on the proof of a derivable sequent Γ ⊢ t : A, that the term t is strongly normalizing.
– For the rule (ax), this is immediate since a variable is strongly normalizing (it is even a normal form).
– For the rule (→I), we have to show that a term λx.t is strongly normalizing knowing that t is strongly normalizing. A sequence of reductions starting from λx.t is of the form λx.t −→β λx.t1 −→β λx.t2 −→β ... with t −→β t1 −→β t2 −→β ..., and is thus finite, since t is strongly normalizing by induction hypothesis.
– For the rule (→E), we have to show that a term t u is strongly normalizing knowing that both t and u are strongly normalizing. However, a reduction in t u is not necessarily generated by a reduction in t or in u when t is an abstraction, and we cannot conclude.
If we try to identify the cause of the failure, we see that we do not really use the fact that the terms are typable in the last case. We are left proving that if t and u are strongly normalizable then t u is strongly normalizable, and there is a counter-example to that, already encountered in section 3.2.6: take t = λx.xx and u = λx.xx; both are strongly normalizable, but t u is not, since it leads to an infinite sequence of reductions. This is however not a counter-example to the strong normalization property, because λx.xx cannot be typed, but we have no easy way of exploiting this fact.

Reducibility candidates. Instead, we now take an "optimistic" approach and, given a type A, we define a set R_A of terms, called the reducibility candidates at A, such that
(i) for every term t such that Γ ⊢ t : A is derivable, we have t ∈ R_A,
(ii) a term t in R_A is "obviously" strongly normalizing,
which will allow us to immediately conclude, once we have shown those properties. The definition is by induction on the type A:
– for a type variable X, R_X is the set of all strongly normalizable terms,
– for an arrow type A → B, R_{A→B} is the set of terms t such that, for every u ∈ R_A, we have t u ∈ R_B.


In the first case, we have not been particularly subtle: we wanted a set of strongly normalizable terms containing all the terms of type X, and we simply took all strongly normalizable terms. In the second case, however, we have crafted our definition to avoid the previous problem: in the case of the rule (→E), it will be immediate to deduce, given t ∈ R_{A→B} and u ∈ R_A, that t u ∈ R_B. However, it is not immediate that every term in R_{A→B} is strongly normalizing, and we will have to prove it. A term is said to be reducible when it belongs to the set of reducibility candidates R_A for some type A. We begin by showing that every term t ∈ R_A is strongly normalizing, by induction on the type A; but in order to do so we need to strengthen the induction hypothesis and show additional properties of A at the same time. A term is neutral when it is not an abstraction; in other words, a neutral term is of the form t u or x.

Proposition 4.2.2.1. Given a type A and a term t, we have
(CR1) if t ∈ R_A then t is strongly normalizing,
(CR2) if t ∈ R_A and t −→β t′ then t′ ∈ R_A,
(CR3) if t is neutral, and t −→β t′ implies t′ ∈ R_A, then t ∈ R_A.

Proof. Consider a term t. We show the three properties simultaneously by induction on A. In the base case, the type A is a type variable X.
(CR1) If t ∈ R_X then it is strongly normalizable by definition of R_X.
(CR2) Suppose that t ∈ R_X (i.e. t is strongly normalizing) and t −→β t′. Every sequence of reductions t′ −→β ... starting from t′ can be extended into a sequence of reductions t −→β t′ −→β ... starting from t, and is thus finite. Therefore t′ is strongly normalizing and thus belongs to R_X.
(CR3) Suppose that t is neutral and such that, for every term t′ with t −→β t′, we have t′ ∈ R_X. Any sequence of reductions t −→β t′ −→β ... starting from t is such that t′ ∈ R_X, and is thus finite. Therefore t ∈ R_X.
Consider now the case of an arrow type A → B.
(CR1) Suppose that t ∈ R_{A→B}, i.e. for every u ∈ R_A we have t u ∈ R_B. A variable x is neutral and a normal form, and thus belongs to R_A by (CR3). By definition of R_{A→B}, we have t x ∈ R_B. Any sequence of reductions t −→β t′ −→β ... induces a sequence of reductions t x −→β t′ x −→β ..., which is finite by (CR1) on B. Thus t is strongly normalizing.
(CR2) Suppose that t ∈ R_{A→B} and t −→β t′. Given a term u ∈ R_A, by definition of R_{A→B}, we have t u ∈ R_B. Since t u −→β t′ u, by (CR2) on B, we have t′ u ∈ R_B. Therefore t′ ∈ R_{A→B}.
(CR3) Suppose that t is neutral and such that, for every term t′ with t −→β t′, we have t′ ∈ R_{A→B}. Suppose given a term u ∈ R_A. By (CR1) on A, the term u is strongly normalizing, and we can show that t u ∈ R_B by well-founded induction on u (theorem A.3.2.1). Since t is neutral, the term t u can only reduce in two ways.

CHAPTER 4. SIMPLY TYPED λ-CALCULUS

174

– If t u −→β t0 u then t0 u ∈ RB because, by hypothesis, we have t0 ∈ RA→B . – If t u −→β t u0 with u −→β u0 then u0 ∈ RA by (CR2) on A and, by recurrence hypothesis on u, we have t u0 ∈ RB . Therefore, by (CR3) on B, we have t u ∈ RB . t ∈ RA→B .

We conclude that

Lemma 4.2.2.2. Suppose given a term t such that t[u/x] ∈ RB for every term u ∈ RA . Then λxA .t ∈ RA→B . Proof. Given u ∈ RA , we have to show (λxA .t)u ∈ RB . By (CR1), the terms t and u are strongly normalizing. We can thus show (λxA .t)u ∈ RB by induction on the pair (t, u). The term (λxA .t)u can either reduce to – t[u/x], which is in RB by hypothesis, – (λxA .t0 )u with t −→β t0 , which is in RB by induction hypothesis, – (λxA .t)u0 with u −→β u0 , which is in RB by induction hypothesis. In every case, the neutral term (λxA .t)u reduces to a term in RB and therefore belongs to RB by (CR3). Lemma 4.2.2.3. Suppose given a term t such that Γ ` t : A is derivable for some context Γ = x1 : A1 , . . . , xn : An and type A. Then, for every terms ti ∈ RAi , for 1 6 i 6 n, we have t[t1 /x1 , . . . , tn /xn ] ∈ RA . Proof. We write t[t∗ /x∗ ] for the above substitution, and show the result by induction on t. By induction on the derivation of Γ ` t : A. – If the last rule is Γ ` x i : Ai

(ax)

then, for every terms ti ∈ RAi , we have t[t∗ /x∗ ] = ti ∈ RAi . – If the last rule is Γ`u:A→B Γ`v:A (→E ) Γ ` uv : B then, for every terms ti ∈ RAi , by induction hypothesis, u[t∗ /x∗ ] ∈ RA→B and v[t∗ /x∗ ] ∈ RA and therefore t[t∗ /x∗ ] = (u[t∗ /x∗ ])(v[t∗ /x∗ ]) ∈ RB . – If the last rule is

Γ, x : A ` u : B (→I ) Γ ` λx.u : A → B

then, by induction hypothesis, for every terms ti ∈ RAi and for every term v ∈ RA , we have u[t∗ /x∗ ][v/x] = u[t∗ /x∗ , v/x] ∈ RB . Therefore, by lemma 4.2.2.2, we have t[t∗ /x∗ ] = λx.(u[t∗ /x∗ ]) ∈ RA→B . Proposition 4.2.2.4 (Adequacy). Given a term t such that Γ ` t : A is derivable, we have t ∈ RA .

CHAPTER 4. SIMPLY TYPED λ-CALCULUS

175

Proof. We write Γ = x1 : A1 , . . . , xn : An . A variable being neutral and in normal form, by (CR3), we have xi ∈ RAi for every index i. Therefore, by lemma 4.2.2.3, t = t[x∗ /x∗ ] ∈ RA . Theorem 4.2.2.5 (Strong normalization). Every typable term t is strongly normalizing. Proof. By proposition 4.2.2.4, the term t is reducible, and thus strongly normalizing by (CR1). One of the remarkable strengths of this approach is that is generalizes well to usual extensions of simply typed λ-calculus, see section 4.3.7. Remark 4.2.2.6. There are many possible variants on the definition of reducibility candidates, see [Gal89]. The version presented here has the advantage of being simple to define and leads to simple proofs. One of its drawbacks is that the λ-terms of RA are not necessarily of type A (for instance, when A = X any strongly normalizable term belongs to RA by definition). We can however define a “typed variant” of reducibility candidates, by defining sets RΓ`A , indexed by both a context Γ and a type A, by induction on A by – for a type variable X, RΓ`X is the set of strongly normalizable terms t such that Γ ` t : X is derivable, – for an arrow type A → B, RΓ`A→B is the set of terms t such that Γ ` t : A → B is derivable and for every u ∈ RΓ`A , we have t u ∈ RΓ`B . The expected adaptation of the above properties hold in this context, see the formalization proposed in section 7.5.2. In particular, the variant of proposition 4.2.2.4 ensures that every term t such that Γ ` t : A is derivable belongs to RΓ`A ; conversely, one easily shows by induction on A that every term t of RΓ`A is such that Γ ` t : A is derivable. With this formulation, it thus turns out that reducibility candidates are simply a complicated way of defining RΓ`A = {t | Γ ` t : A is derivable} However, the way the definition is formulated allows to perform the proofs by induction! 4.2.3 First consequences. We shall now present some easy consequences of the strong normalization theorem. Non-typable terms. A fist consequence of the strong normalization theorem 4.2.2.5 (or rather its contrapositive) is that there are terms which are not typable. For instance, the λ-term Ω = (λx.xx)(λx.xx) is not typable because it is not terminating, see section 3.2.6. Termination of cut elimination. By theorem 4.1.8.5, cut-elimination in the implicational fragment of natural deduction corresponds to β-reduction. Since for typable terms β-reduction is always terminating (theorem 4.2.2.5), we have shown Theorem 4.2.3.1. The cut elimination procedure of section 2.3.3 always terminates (on a cut-free proof), whichever strategy we choose to eliminate the cuts.

CHAPTER 4. SIMPLY TYPED λ-CALCULUS

176

4.2.4 Deciding convertibility. In practice, the most important consequence of the strong normalization theorem is that it provides us with an algorithm to decide the β-convertibility of typable λ-terms t and u. Namely, suppose that we start reduce t: t = t0 −→β t1 −→β t2 −→β . . . This means that we start from t, reduce it to a term t1 , then reduce t1 to a term t2 , and so on. Note that we do not impose anything on the way ti+1 is constructed from ti : any reduction strategy would be acceptable. By theorem 4.2.2.5, such a sequence cannot be infinite, which means that this process will eventually give rise to a term tn which cannot be reduced: tn is a normal ∗ form. This shows that there exists a term in normal form tˆ such that t −→β tˆ. Similarly, u admits a normal form u ˆ. Clearly, t and u are β-convertible if and only if tˆ and u ˆ are β-convertible. By proposition 3.4.4.3, this is the case if and only if tˆ and u ˆ are equal: t ∗? u ∗





?

u ˆ

We have thus reduced the problem of deciding whether two terms are convertible to deciding whether two terms are equal, which is easily done. Using the functions defined in section 3.5, the following function eq tests for the β-equivalence of two λ-terms which are supposed to be typable: let eq t u = (normalize t) = (normalize u) Remark 4.2.4.1. In fact, even if we do not suppose that the terms t and u given as input of the above function are typable, it is still correct in the sense that – if it answers true then t and u are convertible, and – if it answers false then t and u are not convertible. However, there is now a third possibility: nothing guarantees that the normalization of t or u will terminate. This means that the procedure will not provide a result in such a case. As such it is not an algorithm, since, by convention, those should terminate on every input. 4.2.5 Weak normalization. The strong normalization theorem is indeed strong: it shows that, starting from a typable term, whichever way we chose to reduce a typable term, we will eventually end up on a normal form. In practice, we care about much less than this: when implementing reduction and normalization, we implement a particular reduction strategy (see section 3.5.1), and all we want to know is that this particular strategy will end up on a normal form. We suppose fixed a deterministic reduction strategy −→, i.e. we suppose here that −→ is a relation on λ-terms such that – t −→ u implies t −→β u, – if t −→ u and t −→ u0 then u = u0 ,

CHAPTER 4. SIMPLY TYPED λ-CALCULUS

177

the last point being the determinism of the strategy. In other words, the strategy −→ picks, for every term t, at most one way to reduce it. We can then show that every closed typable term is terminating with respect to −→ using a much simplified version of the above proof, see [Pie02, Chapter 12]. We define sets RA of λ-terms by induction on A by – t ∈ RX if ` t : X is derivable and t is strongly normalizing, – t ∈ RA→B if ` t : A → B is derivable, t is strongly normalizing and t u ∈ RB for every u ∈ RA . Contrarily to section 4.2.2, by lemma 4.1.5.4, the sets RA contain only closed terms of type A. Above, we have been using the following property of reduction when showing the properties of reducibility candidates in proposition 4.2.2.1: Lemma 4.2.5.1. If t −→ t0 and t is strongly normalizing then t0 is. Proof. An infinite sequence of reductions t0 −→ . . . starting from t0 can be extended as one t −→ t0 −→ . . . starting from t, i.e. if t0 not strongly normalizing then t is not either. We conclude by contraposition. The main consequence of having a deterministic reduction is that the converse now also holds: Lemma 4.2.5.2. If t −→ t0 and t0 is strongly normalizing then t is. Proof. By determinism, an infinite sequence of reductions starting from t is necessarily of the form t −→ t0 −→ . . ., and thus induces one starting from t0 . Remark 4.2.5.3. Again, this property would not be true with the relation −→β , which is not deterministic. For instance, we have (λx.y)Ω −→β y, where (λx.y)Ω is not strongly normalizing but y is. We can now easily show variants of the properties of proposition 4.2.2.1. Note that the proof is greatly simplified because we do not need to prove them all at once. Lemma 4.2.5.4 (CR1). If t ∈ RA then it is strongly normalizing. Proof. By induction on A, immediate by definition of RA . Lemma 4.2.5.5 (CR2). If t ∈ RA and t −→ t0 then t0 ∈ RA . Proof. By induction on the type A. – Suppose that t ∈ RX . Then t0 ∈ RX by lemma 4.2.5.1. – Suppose that t ∈ RA→B . Then t is strongly normalizing, and thus also t0 by lemma 4.2.5.1. Given u ∈ RA , we have t u ∈ RB by definition of RA→B and thus t0 u ∈ RB by induction hypothesis. Thus t0 ∈ RA→B . The last one uses lemma 4.2.5.2 and thus relies on the fact that we have a deterministic reduction: Lemma 4.2.5.6 (CR3). If t −→ t0 and t0 ∈ RA then t ∈ RA . Proof. By induction on the type A.

CHAPTER 4. SIMPLY TYPED λ-CALCULUS

178

– Suppose that t0 ∈ RX . Then t ∈ RX by lemma 4.2.5.2. – Suppose that t0 ∈ RA→B . Then t0 is strongly normalizing, and thus also t by lemma 4.2.5.2. Given u ∈ RA , we have t0 u ∈ RB by definition of RA→B and thus t u ∈ RB by induction hypothesis. Thus t ∈ RA→B . Similarly, to lemma 4.2.2.3, we can then show the following. Lemma 4.2.5.7 (Adequacy). Suppose given a term t such that Γ ` t : A is derivable for some context Γ = x1 : A1 , . . . , xn : An and type A. Then, for every terms ti ∈ RAi , for 1 6 i 6 n, we have t[t1 /x1 , . . . , tn /xn ] ∈ RA . Finally, we can deduce Theorem 4.2.5.8. Given a term t, if there is a type A such that ` t : A is derivable then t is strongly normalizing (with respect to −→). Proof. Suppose that ` t : A holds. By lemma 4.2.5.7, we have that t ∈ RA . Thus t is strongly normalizing by lemma 4.2.5.4. A formalization of these properties is provided in section 7.5.2. A strategy is complete when it is such that, for every term t, if there is a term u such that t −→β u then there is a term u0 such that t −→ u. In other words, our strategy can reduce any β-reducible term. By applying the previous theorem to such a strategy, we deduce that Theorem 4.2.5.9 (Weak normalization). Every typable term t is weakly normalizing. Of course, this theorem is also immediately implied by theorem 4.2.2.5.

4.3 Other connectives Up to now, for simplicity, we have been limiting ourselves to types built using arrows as the only connective. The Curry-Howard correspondence would be sad if it stopped there: it actually extends to other usual connectives. For instance, the product of two types corresponds to taking the conjunction of the two corresponding formulas. Other cases of the correspondence are given in the following table: Typing function product unit coproduct empty

→ × 1 + 0

⇒ ∧ > ∨ ⊥

Logic implication conjunction truth disjunction falsity

In order to study this, we will now consider types generated by the following syntax: A, B ::= X | A → B | A × B | 1 | A + B | 0

CHAPTER 4. SIMPLY TYPED λ-CALCULUS

Γ ` x : Γ(x)

(ax)

Γ, x : A ` t : B

Γ`t:A→B Γ`u:A (→E ) Γ ` tu : B Γ`t:A×B (×lE ) Γ ` πl (t) : A

179

Γ ` λxA .t : A → B

Γ`t:A×B (×rE ) Γ ` πr (t) : B

(→I )

Γ`t:A Γ`u:B (×I ) Γ ` ht, ui : A × B Γ ` hi : 1

(1I )

Γ`t:A+B Γ, x : A ` u : C Γ, y : B ` v : C (+E ) Γ ` case(t, x 7→ u, y 7→ v) : C Γ` Γ`t:0

Γ ` caseA (t) : A

Γ`t:A

ιB l (t)

:A+B

(+lI )

Γ`

Γ`B

ιA r (t)

:A+B

(+rI )

(0E )

Figure 4.3: Typing rules for λ-calculus with products and sums. For each of those connectives, we add a connective between types, as well as new constructions to the λ-calculus which correspond to introduction and elimination rules, and the full syntax for λ-terms will be t, u ::= x | t u | λxA .t | ht, ui | πl (t) | πr (t) | hi A A | ιA l (t) | ιr (t) | case(t, x 7→ u, y 7→ v) | case (t)

Moreover, each such connective will give rise to typing rules and the full list of rules is given in figure 4.3. In addition, we need to add new rules for β-reduction, which correspond to cut elimination for the new rules (see section 4.1.8), resulting in the rules of figure 4.4, and η-expansion rules which correspond to introducing “co-cuts” (see section 4.1.9). We now gradually introduce each of those. Most of the important theorems extend to the λ-calculus with these new added constructors and types, although we will not detail these: – confluence (theorem 3.4.3.7), – subject reduction (theorem 4.1.8.3), – strong normalization (theorem 4.2.2.5), – Curry-Howard correspondence (theorems 4.1.7.1 and 4.1.8.5).

CHAPTER 4. SIMPLY TYPED λ-CALCULUS

180

β-reduction rules: (λx.t) u −→β t[u/x] πl (ht, ui) −→β t πr (ht, ui) −→β u case(ιB l (t), x 7→ u, y 7→ v) −→β u[t/x] case(ιA r (t), x 7→ u, y 7→ v) −→β v[t/y] Commuting reduction rules: caseA→B (t) u −→β caseB (t) πl (caseA×B (t)) −→β caseA (t) πr (caseA×B (t)) −→β caseB (t) case(caseA+B (t), x 7→ u, y 7→ v) −→β caseC (t) caseA (case0 (t)) −→β caseA (t) case(t, x 7→ u, y 7→ v) w −→β case(t, x 7→ u w, y 7→ v w) πl (case(t, x 7→ u, y 7→ v)) −→β case(t, x 7→ πl (u), y 7→ πl (v)) πr (case(t, x 7→ u, y 7→ v)) −→β case(t, x 7→ πr (u), y 7→ πr (v)) caseC (case(t, x 7→ u, y 7→ v)) −→β case(t, x 7→ caseC (u), y 7→ caseC (v)) case(case(t, x 7→ u, y 7→ v), x0 7→ u0 , y 0 7→ v 0 ) −→β 0

case(t, x 7→ case(u, x 7→ u0 , y 0 7→ v 0 ), y 7→ case(v, x0 7→ u0 , y 0 7→ v 0 )) Figure 4.4: Reduction rules for λ-calculus with products and sums.

CHAPTER 4. SIMPLY TYPED λ-CALCULUS

181

4.3.1 Products. In order to accommodate for products, we add the construction A×B to the syntax of our types, denoting the product of two types A and B. We also extend λ-terms with three new constructions: t, u ::= . . . | ht, ui | πl (t) | πr (t) where – ht, ui is the pair of two λ-terms t and u, – πl (t) takes the first component of the λ-term t, – πr (t) takes the second component of the λ-term t. We add three new typing rules to our system, one for each of the newly added constructors: Γ`t:A×B (×lE ) Γ ` πl (t) : A

Γ`t:A×B (×rE ) Γ ` πr (t) : B

Γ`t:A Γ`u:B (×I ) Γ ` ht, ui : A × B

The first one states that if t is a term, which is a pair consisting of an element of A and an element of B, then the term πl (t), obtained by taking its first projection, has type A. The second rule is similar. The last rule establishes that if t is of type A and u is of type B then the pair ht, ui is of type A × B. If we apply our term erasing procedure of section 4.1.7, and replace the symbols × by ∧, we recover the rules for conjunction: Γ`A∧B l (∧E ) Γ`A

Γ`A∧B r (∧E ) Γ`B

Γ`A Γ`B (∧I ) Γ`A∧B

This means that our extension of simply typed λ-calculus is compatible with the Curry-Howard correspondence (theorem 4.1.7.1). Recall that the cut-elimination rules for conjunction consist in the following two cases: π π0 Γ`A Γ`B (∧I ) Γ`A∧B (∧lE ) Γ`A

π Γ`A

π π0 Γ`A Γ`B (∧I ) Γ`A∧B (∧rE ) Γ`B

π0 Γ`B

By the Curry-Howard correspondence, they correspond to the following trans-

CHAPTER 4. SIMPLY TYPED λ-CALCULUS

182

formations of typing derivations: π π0 Γ`t:A Γ`u:B (×I ) Γ ` ht, ui : A × B (×lE ) Γ ` πl (ht, ui) : A

π Γ`t:A

π0 π Γ`t:A Γ`u:B (×I ) Γ ` ht, ui : A × B (×rE ) Γ ` πr (ht, ui) : B

π0 Γ`u:A

which indicate that the reduction rules associated to the new connectives should be πl (ht, ui) −→β t

πr (ht, ui) −→β u

as expected: taking the first component of a pair ht, ui returns t, and similarly for the second component. Finally, the η-expansion rule corresponds to the following transformation of the proof derivation:

π Γ`t:A×B

π π Γ`t:A×B Γ`t:A×B (×lE ) (×rE ) Γ ` πl (t) : A Γ ` πr (t) : B (×I ) Γ ` hπl (t), πr (t)i : A × B

It should thus consist in the rule t −→η hπl (t), πr (t)i which states some form of “extensionality” of products: a term which is a product should be the same as the pair consisting of its components. Alternative formulations. Alternative formulations are possible for those connectives. Instead of taking the first and second projection of some term t, i.e. πl (t) and πr (t), we could simply add to our calculus the first and second projection operators πlA,B and πrA,B , whose associated typing rules would be Γ ` πlA,B : A × B → A

Γ ` πrA,B : A × B → B

This would correspond to an approach using combinators, also called Hilbertstyle, see section 4.5. Another alternative, could be to add a constructor unpair(t, xy 7→ u) which would corresponds to OCaml’s let (x,y) = t in u

CHAPTER 4. SIMPLY TYPED λ-CALCULUS

183

It binds the two components of a pair t as x and y in u. The corresponding typing rule would be Γ`t:A×B Γ, x : A, y : B ` u : C Γ ` unpair(t, xy 7→ u) : C This is the flavor of rules which has to be used when working with dependent types, see section 8.3.3. We did not use it here because, through the CurryHoward correspondence, it corresponds to the following variant of elimination rule for conjunction Γ`A∧B

Γ, A, B ` C (∧E ) Γ`C

which is not the one which is traditionally used (the main reason is that it involves a “new” formula C, whereas the usual rules (∧lE ) and (∧rE ) only use A and B). Curryfication. An important property of product types in λ-calculus is that they are closely related to arrow types through an isomorphism called curryfication, which states that a function with two arguments is the same as a function taking a pair of arguments. More precisely, given types A, B and C, the two types A×B →C

and

A→B→C

are isomorphic, see remark 4.1.7.3, where the first type is implicitly bracketed as (A × B) → C. In OCaml, this means that it is roughly the same to write a function of the form let f x y = ... or a function of the form let f (x, y) = ... More precisely, the isomorphism between the two types means that we have λ-terms which allow converting elements of on type into an element of the other type, in both directions: λf A×B→C .λaA .λbB .f ha, bi : (A × B → C) → (A → B → C) λf A→B→C .λxA×B .f πl (x) πr (x) : (A → B → C) → (A × B → C) whose composites are both identities (up to βη-equivalence). Namely, respectively writing t and u for those terms, we have ∗

λf A×B→C .u (t f ) −→β λf A×B→C .λxA×B .f hπl (x), πr (x)i ===η λf A×B→C .f ∗

λf A→B→C .t (u f ) −→β λf A→B→C .λaA .λbB .f a b ===η λf A→B→C .f In most programming languages (Java, C, etc), a function with two arguments would be given a type of the form A × B → C. Functional programming languages such as the OCaml tend to prefer types of the form A → B → C because they allows partial evaluation, meaning that we can give the argument of type A, without giving the other one.

CHAPTER 4. SIMPLY TYPED λ-CALCULUS

184

4.3.2 Unit. We can add a unit type by adding a constant type 1 called unit. It corresponds to the unit type of OCaml and, through the CurryHoward correspondence to the formula >. We add a new constant λ-term hi which is the only element of 1 (up to β-equivalence), and corresponds to () in OCaml. The typing rule is Γ ` hi : 1

(1I )

which corresponds to the usual rule for truth by term erasure: Γ`>

(>I )

There are no β- or η-reduction rules. 4.3.3 Coproducts. For coproducts, we add the construction A+B on types, which represents the coproduct of the two types A and B. Intuitively, this corresponds to the set-theoretic disjoint union: an element of A+B is either an element of type A or an element of type B. We add three new constructions to the syntax of λ-terms: A t, u, v ::= . . . | case(t, x 7→ u, y 7→ v) | ιA l (t) | ιr (t)

where t, u and v are terms, x and y are variables and A is a type. Since A + B is the disjoint union of A and B, we should be able to see a term t of type A (resp. B) as a term of type A + B: this is precisely represented by the A term ιB l (t) (resp. ιr (t)), which can be thought of as the term t “cast” into an A element of type A + B. For this reason, ιA l and ιr are often called the canonical injections. Conversely, any element of A + B should either be an element of A or an element of B. This means that we should be able to construct new values by case analysis: typically, given a term t of type A + B, – if t is of type A then we return u(t), – if t is of type B then we return v(t). Above, u (resp. v) should be a λ-term with a distinguished free variable x (resp. y) which is to be replaced by t. In formal notation, such a case analysis is written case(t, x 7→ u, y 7→ v) The symbol “7→” is purely formal here (it indicates bound variables), and our operation takes 5 arguments t, x, u, y and w. With the above intuitions, it should be no surprise that the typing rules are Γ`t:A+B Γ, x : A ` u : C Γ, y : B ` v : C (+E ) Γ ` case(t, x 7→ u, y 7→ v) : C Γ`t:A Γ`

ιB l (t)

:A+B

(+lI )

Γ`B Γ`

ιA r (t)

:A+B

(+rI )

CHAPTER 4. SIMPLY TYPED λ-CALCULUS

185

From a Curry-Howard perspective, + corresponds to disjunction ∨, and term erasure of the above typing rules do indeed allow us to recover the usual rules for disjunction: Γ`A∨B

Γ, A ` C Γ`C

Γ, B ` C

Γ`A (∨lI ) Γ`A∨B

(∨E )

Γ`B (∨rI ) Γ`A∨B

The β-reduction rules correspond to the cut-elimination step reducing π Γ`t:A Γ ` ιB l (t) : A + B

(+lI )

π0 Γ, x : A ` u : C

π 00 Γ, y : B ` v : C

Γ ` case(ιB l (t), x 7→ u, y 7→ v) : C to

(+E )

π 0 [π/x] Γ ` u[t/x] : C

as well as the symmetric one, obtained by suitably using ιr instead of ιl . The β-reduction rules are thus case(ιB l (t), x 7→ u, y 7→ v) −→β u[t/x] case(ιA r (t), x 7→ u, y 7→ v) −→β v[t/y] The η-expansion rule is A t −→η case(t, x 7→ ιB l (t), y 7→ ιr (t))

In OCaml. We recall from section 1.3.2 that coproducts can be implemented in OCaml as the type type ('a, 'b) coprod = | Left of 'a | Right of 'b The injections ιl and ιr respectively correspond to Left and Right, and the eliminator case(t, x 7→ u, y 7→ v) to match t with | Left x -> u | Right y -> v Church vs Curry style. Note that if we remove the type annotations on the injections, i.e. write ιl (t) instead of ιB l (t), then the typing of a λ-term is not unique anymore. Namely, the type rules become Γ`t:A (+lI ) Γ ` ιl (t) : A + B

Γ`t:B (+rI ) Γ ` ιr (t) : A + B

and, in the first rule, there is no way of guessing the type B in the conclusion from the premise (and similarly for the other rule). Similar issues happen if we remove the type annotations from abstractions, these are detailed in section 4.4.

CHAPTER 4. SIMPLY TYPED λ-CALCULUS

186

α-conversion and substitution. The reason why we use the symbol “7→” in terms case(t, x 7→ u, y 7→ v) is that it indicates that x is bound in u (and similarly for v), in the sense that our α-equivalence should include case(t, x 7→ u, y 7→ v) ===α case(t, x0 7→ u[x0 /x], y 0 7→ v[y 0 /y]) This also means that substitution should take care not to accidentally bind variables: the equation (case(t, x 7→ u, y 7→ v))[w/z] = case(t[w/z], x 7→ u[w/z], y 7→ v[w/z]) is valid only when x 6∈ FV(w) and y 6∈ FV(w). Such details can be cumbersome in practice when performing implementations, and we already have spent a great deal of time doing this correctly for abstractions. It is possible to use an alternative formulation of the eliminator which allows the use of abstractions, thus simplifying implementations by having abstractions being the only case where we have to be careful about capture of variables: in the construction case(t, x 7→ u, y 7→ v), instead of having u (resp. v) be a λ-term with a distinguished free variable x (resp. y), we can directly describe it as the function λx.u (resp. λy.v). Our eliminator thus now has the form case(t, u, v) taking three terms in argument, with associated typing rule Γ`t:A+B

Γ`u:A→C Γ`v:B→C (+E ) Γ ` case(t, u, v) : C

and the β-reduction rules become case(ιB l (t), u, v) −→β u t

case(ιA r (t), u, v) −→β v t

4.3.4 Empty type. The empty type is usually denoted 0. We extend the syntax of λ-terms t ::= . . . | caseA (t) by adding one eliminator caseA (t) which allows us to construct an element of an arbitrary type A, provided that we have constructed an element of the empty type 0 (which we do not expect to be possible). The typing rule is thus Γ`t:0 Γ ` caseA (t) : A

(0E )

Through the Curry-Howard correspondence, the type 0 corresponds to the falsity formula ⊥, and we recover the usual elimination rule Γ`⊥ (⊥E ) Γ`A There is no β-reduction rule and the η-reduction rule is case0 (t) −→η t

CHAPTER 4. SIMPLY TYPED λ-CALCULUS

187

for t of type 0, which corresponds to the transformation π Γ`t:0 0

Γ ` case (t) : 0

π Γ`t:0

(0E )

As usual, negation can be implemented ¬A = A ⇒ ⊥, i.e. we could define the corresponding type ¬A = A → 0. 4.3.5 Commuting conversions. As explained in section 2.3.6, when considering both conjunctions and disjunctions, usual cuts are not the only situations where we want to simplify proofs, we also want to be able to remove the commutative cuts. By the Curry-Howard correspondence, this means that when having λ-terms with both products and coproducts, we want some additional reduction rules, called commuting conversions, which are all listed in figure 4.4. For instance, we have the following commutative cut π Γ`A∨B

π0 Γ, A ` C ∧ D Γ`C ∧D Γ`C

π 00 Γ, B ` C ∧ D

(∨E ) (∧lE )

which reduces to

π Γ`A∨B

π0 Γ, A ` C ∧ D (∧lE ) Γ, A ` C Γ`C

π 00 Γ, B ` C ∧ D (∧lE ) Γ, B ` C

(∨E )

By the Curry-Howard correspondence, this means that the typing derivation π Γ`t:A+B

π0 π 00 Γ, x : A ` u : C ∧ D Γ, y : B ` v : C ∧ D (+E ) Γ ` case(t, x 7→ u, y 7→ v) : C ∧ D (×lE ) Γ ` πl (case(t, x 7→ u, y 7→ v)) : C

should reduce to π Γ`t:A+B

π0 π 00 Γ, x : A ` u : C × D Γ, y : B ` v : C × D (×lE ) (×lE ) Γ, x : A ` πl (u) : C Γ, y : B ` πr (v) : C (+E ) Γ ` case(t, x 7→ πl (u), y 7→ πr (v)) : C

and thus that we should add the reduction rule πl (case(t, x 7→ u, y 7→ v)) −→β case(t, x 7→ πl (u), y 7→ πr (v)) which states that projections can “go through” case operators. Other rules are obtained similarly.

CHAPTER 4. SIMPLY TYPED λ-CALCULUS

188

4.3.6 Natural numbers. In order to grow λ-calculus into a more full-fledged programming language, it is also possible to add basic types (integers, strings, etc.) as well as constants and functions to operate on those. In order to illustrate this, we explain here how to extend simply typed λ-calculus with natural numbers. The resulting system, called system T , was originally studied by Gödel [Göd58]. The types are generated by A ::= X | A → B | Nat where the newly added type Nat stands for natural numbers. Terms are generated by t, u, v ::= x | t u | λx.t | Z | S(t) | rec(t, u, xy 7→ v) where the term Z stands for the zero constant, and S(t) for the successor of a term t (supposed to be a natural number). The construction rec(t, u, xy 7→ v) allows to define a function by induction: – if t is 0, it returns u, – if t is n + 1, it returns v where x has been replaced by n and y by the value recursively computed for n. In OCaml. In OCaml, using int as representation for natural numbers, Z would correspond to 0, S(t) to t+1 and the recursor to let rec recursor t u v = if t = 0 then u else v (t-1) (recursor t u v) Alternatively, we can represent natural numbers as the type type nat = Z | S of nat where Z corresponds to Z, S to S and the recursor to let rec match | Z | S n

recursor t u v = t with -> u -> v n (recursor n u v)

Traditionally, addition can be defined by recurrence by let rec match | Z | S m

add m n = m with -> n -> S (add m n)

However, it can be observed that all the recurrence power we usually need is already contained in the recursor, so that this can equivalently be defined as let add m n = recursor m n (fun m r -> S r) Similarly, multiplication can be defined as let mul m n = recursor m Z (fun m r -> add r n)

CHAPTER 4. SIMPLY TYPED λ-CALCULUS

189

and other traditional functions (exponentiation, Ackermann, etc.) are left to the reader. There are however some functions that can be written using recursion in OCaml, but cannot be encoded in recursor. For instance, all functions written with the recursor are total and therefore the function let rec omega n = omega n cannot be implemented using it, since it never produces a result whereas the recursor always defines well-defined functions. Rules. The typing rules for the new terms are the following ones: Γ ` Z : Nat Γ ` t : Nat

Γ ` t : Nat (SI ) Γ ` S(t) : Nat

(ZI )

Γ`u:A Γ, x : Nat, y : A ` v : A (NatE ) Γ ` rec(t, u, xy 7→ v) : A

The reduction rules ensure that the recursor implement the primitive recursion rules: rec(Z, u, xy 7→ v) −→β u rec(S(t), u, xy 7→ v) −→β v[t/x, rec(t, u, xy 7→ v)/y] Properties. It can be shown, see section 4.3.7, that this system is terminating and confluent. Moreover, the functions of type Nat → Nat which can be implemented in this system are precisely the recursive functions which are provably total (in Peano Arithmetic, see section 5.2.5), i.e. recursive functions for which there is a proof that they terminate on every input. This class of functions strictly includes the primitive recursive ones, and it is strictly included in the class of total recursive functions. 4.3.7 Strong normalization. The strong normalization proof presented in section 4.2.2 for simply typed λ-calculus extends to the other connectives presented above. For instance, following [Gir89, chapter 7], let us briefly explain how to adapt the proof for a λ-calculus with products, unit and natural numbers. Types are thus generated by A, B ::= X | A → B | A × B | 1 | Nat and terms by t, u, v ::= x | λxA .t | t u | ht, ui | πl (t) | πr (t) | hi | Z | S(t) | rec(t, u, xy 7→ v) We extend the notion of reducibility candidate by RX = R1 = RNat = {t | t is strongly normalizable} RA→B = {t | u ∈ RB implies t u ∈ RA } RA×B = {t | πl (t) ∈ RA and πr (t) ∈ RB }

CHAPTER 4. SIMPLY TYPED λ-CALCULUS

190

We also extend the notion of neutral term: a term is neutral when it is not of the following forms λxA .t

ht, ui

hi

Z

S(t)

which correspond to the possible introduction rules in our system. With those definitions, the proofs can be performed following the same structure as in section 4.2.2.

4.4 Curry style typing In this section, we investigate simply typed λ-calculus à la Curry when abstractions are of the form λx.t instead of λxA .t, i.e. we do not indicate the type of abstracted variables. A detailed presentation of this topic can be found in [Pie02, chapter 12] and [SU06, chapter 6]. 4.4.1 A typing system. Curry-style typing is closer to languages such as OCaml, where we do not have to indicate the type of the arguments of a function. For simplicity, we consider functions only, i.e. types are defined by A, B ::= X | A → B and terms are defined by t, u ::= x | λx.t | t u similarly to the beginning of this chapter. The typing rules are

Γ ` x : Γ(x)

(ax)

Γ, x : A ` t : B (→I ) Γ ` λx.t : A → B Γ`t:A→B Γ`u:A (→E ) Γ ` tu : B This seemingly minor change of not writing types for abstractions has strong consequences on the properties of typing. In particular, theorem 4.1.6.1 does not hold anymore: a given λ-term might admit multiple types. For instance, the identity λ-term admits the following types: (ax)

x:X`x:X (→I ) ` λx.x : X → X

(ax)

x:Y →Z`x:Y →Z (→I ) ` λx.x : (Y → Z) → (Y → Z)

and in fact, every type of the form A → A for some type A is an admissible type for the identity. The reason for this is that when we derive a type containing a type variable in this system, we can always replace this variable by any other type. Formally,

CHAPTER 4. SIMPLY TYPED λ-CALCULUS

191

given types A and B and a type variable X, we write A[B/X] for the type obtained from A by replacing every occurrence of X by B in A. Similarly, given a context Γ, we write Γ[B/X] for the context where X has been replaced by B in every type. We have Lemma 4.4.1.1. If Γ ` t : A is derivable then Γ[B/X] ` t : A[B/X] is also derivable for every type B and variable X. Proof. By induction on the derivation of Γ ` t : A. For instance, since the identity admits the type X → X, it also admits the same type where X has been replaced by Y → Z, i.e. (Y → Z) → (Y → Z). The first type is “more general” than the second, in the sense that the second can be obtained by substituting type variables in the first. We will see that any term admits a type which is “most general”, in the sense that it is more general than any other of its types. For instance, the most general type for identity is X → X. Again, this phenomenon is not present in Church style typing, e.g. the two terms λxX .x : X → X

λxY →Z .x : (Y → Z) → (Y → Z)

are distinct: Curry is more spicy than Church. 4.4.2 Principal types. Recall from section 2.2.11 that a substitution σ is a function which to type variables in X associates a type. Its domain dom(σ) is the set of type variables dom(σ) = {X ∈ X | σ(X) 6= X} This set will always be finite for the substitutions we consider here, so that, in practice, a substitution can be described by the images of the variables X in its domain. Given a type A, we write A[σ] for the type A where every variable X has been replaced by σ(X). Formally, it is defined by induction on the type A by X[σ] = σ(X) (A → B)[σ] = A[σ] → B[σ] We say that a type A is more general than a type B, what we write A v B, when there is a substitution σ such that B = A[σ]. Lemma 4.4.2.1. The relation v is a partial order on types modulo α-conversion. Given a context Γ = x1 : A1 , . . . , xn : An , we also write Γ[σ] = x1 : A1 [σ], . . . , xn : An [σ] In this case, we sometimes say that the context Γ[σ] is a refinement of the context Γ. It is easily shown that if a term admits a type, it also admits a less general type: lemma 4.4.1.1 generalizes as follows. Lemma 4.4.2.2. Given a term t such that Γ ` t : A is derivable and a substitution σ then Γ[σ] ` t : A[σ] is also derivable. Proof. By induction on the derivation of Γ ` t : A.

CHAPTER 4. SIMPLY TYPED λ-CALCULUS

192

Definition 4.4.2.3 (Principal type). Given a context Γ and a λ-term t, a principal type (or most general type) for t in the context Γ consists of a substitution σ and a type A such that – Γ[σ] ` t : A is derivable, – for every substitution τ such that Γ[τ ] ` t : B is derivable, there exists a substitution τ 0 such that τ = τ 0 ◦ σ and B = A[τ 0 ]. In other words, the most general type is a type A for t in some refinement of the context Γ such that every other type can be obtained by substitution, in the sense of lemma 4.4.2.2. This is often used in the case where the context Γ is empty, in which case the substitution σ is not relevant. In this case, the principal type for t is a type A such that ` t : A is derivable and which is minimal: given a type B, we have ` t : B derivable if and only if A v B. Example 4.4.2.4. The principal type for t = λx.x is X → X: the types of t are those of the form A → A for some type A. Example 4.4.2.5. The principal types for the λ-terms λxyz.(xz)(yz)

and

λxy.x

are respectively (X → Y → Z) → (X → Y ) → X → Z

and

X→Y →X

4.4.3 Computing the principal type. We now give an algorithm to compute the principal type of a λ-term. A type equation system is a finite set E = {A1 = ? B1 , . . . , A n = ? Bn }

(4.2)

consisting of pairs types Ai and Bi , written Ai = ? Bi and called type constraints. A substitution σ is a solution of the equation system (4.2) when applying it makes every equation of E valid, i.e. for every 1 6 i 6 n we have Ai [σ] = Bi [σ] Typing with constraints. The idea is that to every context Γ and λ-term t, we are going to associate a type A and a type equation system E which are complete in the sense that – for every solution σ of E, we have Γ[σ] ` t : A[σ] – if there is a substitution σ such that Γ[σ] ` t : B then σ is a solution of E such that B = A[σ].

CHAPTER 4. SIMPLY TYPED λ-CALCULUS

193

In this sense, the solutions of E describe all the possible types of t in the refinements of the context Γ. Its elements are sometimes called constraints since they encode constraints on acceptable substitutions. We will do so by imposing the “minimal amount of equations” to E so that t admits a type A in the context Γ. As usual, this is performed by induction on t, distinguishing the three possible cases: – x: we have a type A if and only if x ∈ dom(Γ), in which case A = Γ(x), – λx.t: the type A should be of the form B → C for where C is the type of t. Writing At for the type inferred for t, we thus define A = X → At for some fresh variable X, – t u: we have a type A if and only if t is of the form B → A and u is of type B. Writing At for the type inferred for t and Au for the type inferred for u, we thus define A = X for some fresh variable X and add the equation At = ? (Au → X) Above, the fact that X is “fresh” means that it does not occur somewhere else (in the contexts, the types or the equation systems). Sequent presentation. More formally, this can be presented in the form of a “sequent” calculus, where the sequents are of the form Γ ` t : A|E where Γ is a context, t is a term, A is a type and E is a type equation system: given Γ and t, a the derivation of such a sequent will be seen as producing the type A and the equations E. The rules are Γ ` x : Γ(x) |

(ax)

with x ∈ dom(Γ)

Γ, x : X ` t : At | Et (→I ) Γ ` λx.t : X → At | Et

with X fresh

Γ ` t : At | Et Γ ` u : Au | Eu (→E ) Γ ` t u : X | Et ∪ Eu ∪ {At = ? (Au → X)}

with X fresh

Example 4.4.3.1. For instance, for the term λf x.f x, we have the following derivation: (ax)

(ax)

f : Z, x : X ` f : Z | f : Z, x : X ` x : X | (→E ) f : Z, x : X ` f x : Y | Z = X → Y (→I ) f : Z ` λx.f x : X → Y | Z = X → Y (→I ) ` λf.λx.f x : Z → (X → Y ) | Z = X → Y The type A and the equations E describe exactly all the possible types for t in the context Γ in the following sense.

CHAPTER 4. SIMPLY TYPED λ-CALCULUS

194

Lemma 4.4.3.2. Suppose that Γ ` t : A | E is derivable using the above rules, then – for every solution σ of E the sequent Γ[σ] ` t : A[σ] is derivable (in the sense of section 4.1.4), – if there is a substitution σ and a type B such that Γ[σ] ` t : B is derivable then σ is a solution of E and B = σ[A]. Proof. By induction on the derivation of Γ ` t : A | E. It is easily seen that given a context Γ and term t there is exactly one type A and system E such that Γ ` t : A | E is derivable (up to the choice of type variables), we can thus speak of the type A and the system E associated to a term t in a context Γ. Moreover, the above rules are easily translated into a method for computing those. An implementation of the resulting algorithm is provided in figure 4.5: the function infer generates, given a an environment env describing the context Γ, the type A and the equation system E, encoded as a list of pairs of terms. Computing the principal type. What is not clear yet is – how to compute the solutions of E, – how to compute the most general type for t in the context Γ. We will see in section 5.4 that if a system of equations admits a solution then it admits a most general one: provided there is a solution, there is a solution σ such that the solutions are exactly substitutions of the form τ ◦ σ for some substitution τ . Moreover, we will see an algorithm to actually compute this most general solution: this is called the unification algorithm. This finally provides us with what we were looking for. Theorem 4.4.3.3. Suppose given a context Γ and a term t. Consider the type A and the system E such that Γ ` t : A | E is derivable, and write σ for the most general solution of E. Then the substitution σ together with the type A[σ] is a principal type for t in the environment Γ. In place unification. In practice, people do not implement the computation of most general types by first generating equations and then solving them, although there are notable exceptions [PR05]: we can directly change the value of the variables instead of deferring this using equations. Moreover, this can be done efficiently by using references. We will see in section 5.4.5 that unification can always be performed in this way, and only describe here the implementation specialized to our problem. Instead of generating equations, we can replace type variables as follows: – when we have an equation of the form X = ? A, we can directly replace X by A, provided that X 6∈ FV(A), – when we have an equation of the form A = ? X, we can directly replace X by A, provided that X 6∈ FV(A),

CHAPTER 4. SIMPLY TYPED λ-CALCULUS

(** Types *) type ty = | TVar of int | TArr of ty * ty (** Generate a fresh type variable. *) let fresh = let n = ref (-1) in fun () -> incr n; TVar !n (** Terms. *) type term = | Var of string | Abs of string * term | App of term * term (** Type constraints. *) type teq = (ty * ty) list (** let | |

Type and equations. *) rec infer env : term -> ty * teq = function Var x -> List.assoc x env, [] Abs (x, t) -> let ax = fresh () in let at, et = infer ((x,ax)::env) t in TArr (ax, at), et | App (t, u) -> let at, et = infer env t in let au, eu = infer env u in let ax = fresh () in ax, (at, TArr (au, ax))::(et@eu) Figure 4.5: Typability with constraints in OCaml.

195

CHAPTER 4. SIMPLY TYPED λ-CALCULUS

196

– when we have an equation of the form (A → B) = ? (A0 → B 0 ), we can 0 0 replace it by the two equations A = ? A and B = ? B , and recursively act on those. In order to perform this efficiently, we change the representation of type variables to the following: (** Types *) type ty = | TVar of tvar ref | TArr of ty * ty (** and | |

Type tvar Link AVar

variables. *) = of ty (* a substituted type variable *) of int (* a type variable *)

A variable, corresponding to the constructor TVar, is now a reference, meaning that its value can be changed. Initially, this reference will point to a value of the form AVar n, meaning that it is the variable with number n. However, we can replace its contents by another type A, in which case we make the reference point to a value of the form Link A (it is a “link” to the type A): this method has the advantage of changing at once the contents of all the occurrences of the variable. The type tvar thus indicates the possible values for a variable: it is either a real variable (AVar) or a substituted variable (Link). With this representation, a variable containing a link to a type A should be handled as if it was the type A. To this end, we implement a function which will remove links at toplevel of types: let rec unlink = function | TArr (a, b) -> TArr (a, b) | TVar v as a -> match !v with | Link a -> unlink a | AVar _ -> a In order to check the side condition X 6∈ FV(A) above, we need to implement a function which checks whether a variable X occurs in a type A, i.e. whether X ∈ FV(A). This easily done by induction on A: let rec occurs x = function | TArr (a, b) -> occurs x a || occurs x b | TVar v -> match !v with | Link a -> occurs x a | AVar _ as y -> x = y Next, instead of generating an equation A = ? B, we will use the following function which will replace the type variables in A and B following the method described above, called unification: let rec unify a b = match unlink a, b with

CHAPTER 4. SIMPLY TYPED λ-CALCULUS

197

| TVar v, b -> assert (not (occurs !v b)); v := Link b | a, TVar v -> v := Link a | TArr (a, b), TArr (a', b') -> unify a a'; unify b b' Finally, the type inference algorithm can be implemented as before, excepting that we do not return the equations anymore (only the type), since type variables are changed in place: in the case of application, instead of generating the equation At = ? (Au → X), we instead call the function unify which will replace type variables in a minimal way, in order to make the types At and Au → X equal: let rec infer env = function | Var x -> List.assoc x env | App (t, u) -> let a = infer env u in let b = fresh () in unify (infer env t) (TArr (a,b)); b | Abs (x, t) -> let a = fresh () in let b = infer ((x,a)::env) t in TArr (a, b) Example 4.4.3.4. The term λf x.f x can be represented as the term Abs ("f", Abs ("x", App (Var "f", Var "x"))) If we infer its type (in the empty environment) using the above function infer, we obtain the following result Arr (TVar {contents = Link (TArr (TVar {contents = AVar 1}, TVar {contents = AVar 2}) }, TArr (TVar {contents = AVar 1}, TVar {contents = AVar 2})) which is OCaml’s way of saying (X → Y ) → (X → Y ) (in OCaml, references are implemented as records with one mutable field labeled contents). Remark 4.4.3.5. In the unification function, when facing an equation X = ? A, it is important to check that X does not occur in A. For instance, let us try to type λx.xx, which is not expected to be typable. The inference will roughly proceed as follows. 1. Since it is an abstraction, the type of λx.xx must be of the form X → A, where A is the type of xx. Let’s find the type of xx assuming x of type X.

CHAPTER 4. SIMPLY TYPED λ-CALCULUS

198

2. The term xx is an application whose function is x of type X and argument is x of type X. We must therefore have X = ? (X → Y ) and the type of xx is Y . With the above implementation, the algorithm will raise an error: the unification of X and X → Y will fail because X ∈ FV(X → Y ). If we forgot to check this, we would generate for x the type X → Y where X is (physically) the type itself. This would intuitively correspond to allowing the infinite type (((. . . → Y ) → Y ) → Y ) → Y which should not be allowed. Typability. The above algorithms can also be used to decide the typability of a term t, i.e. answer the following question: is there a context in which t admits a type? Theorem 4.4.3.6. The typability problem for λ-calculus is decidable. Proof. Suppose given a term t. We write FV(t) = {x1 , . . . , xn } for the set of free variables and define the context Γ = x1 : X1 , . . . , xn : Xn . Using lemma 4.1.5.1, it is not difficult to show that t admits a type if and only if it admits a type in the context Γ, which can be decided as above. 4.4.4 Hindley-Milner type inference. In this section, we go on a small excursion and investigate polymorphic types. We have seen that a Curry-style λ-term usually admits multiple types. However, a given term cannot be used within a same term with two different types. In a real-world programming language this is a problem: for instance, if we define the identity function, we cannot apply it both to integers and strings. If we want to do so, we need to define two identity functions, one for integers and one for strings, with the same definition. One way to overcome this problem is to allow functions to be polymorphic, i.e. to have multiple types. For instance, we will be able to type the identity as ∀X.X → X meaning that it has type X → X for any possible value of the variable X. OCaml features such types: variables beginning by a ’ are implicitly universally quantified, so that the identity has type ’a -> ’a. Type schemes. Formally, a type A is defined as before, and type schemes A are generated by the grammar A ::= A | ∀X.A where X is a type variable and A is a type. In other words, a type scheme is a type with some universally quantified variables at toplevel, i.e. a formula of the form ∀X1 . . . . ∀Xn .A Having such a “type” for a term means that it can have any type in the set JAK = {A[A1 /X1 , . . . , An /Xn ] | A1 , . . . , An types}

CHAPTER 4. SIMPLY TYPED λ-CALCULUS

199

i.e. any type obtained by replacing the universally quantified type variables by some types. As usual, in a type scheme ∀X.A, the variable X is bound in A and could be renamed. The free variables of a type scheme are FV(∀X1 . . . . ∀Xn .A) = FV(A) \ {X1 , . . . , Xn } Given a variable X and a type B, we write A[B/X] for the type scheme A where the variable X has been replaced by B (as usual, one has to properly take care of bound variables): (∀X1 . . . . ∀Xn .A)[B/X] = ∀X1 . . . . ∀Xn .A[B/X] whenever Xi 6∈ FV(B) whenever 1 6 i 6 n. We consider type schemes modulo α-conversion, which can be defined by: ∀X.A = ∀Y.A[Y /X] We write A v B when the set of types described by A is included in the set of types of B, i.e. JAK ⊆ JBK. In this case, we say that the type scheme A is more general than B, and that B is less general or a specialization of A. Lemma 4.4.4.1. We have ∀X1 . . . . ∀Xn .A v ∀Y1 . . . . ∀Yn .B if and only if there are types A1 , . . . , An such that B = A[A1 /X1 , . . . , An /Xn ] and the Yi are variables which are not free in ∀X1 . . . . ∀Xn .A. When we have A v B, which means that B was obtained from A by replacing some universally quantified variables Xi by types Ai , but not only: we can also universally quantify some of the fresh variables introduced by the Ai afterwards. For instance, we have ∀X.X → X v ∀Y.(Y → Y ) → (Y → Y ) v (Z → Z) → (Z → Z) Hindley-Milner typing system. We are now going to give a typing system for a programming language whose terms are t, u ::= x | λx.t | t u | let x = t in u Compared to λ-calculus, the only new construction is let x = t in u, and means that we should declare x to be t in the term u. From an operational point of view, it is thus the same as (λx.u) t. The two constructions however differ from the typing point of view: the type of a variable defined by a let will be generalized, which means that we are going to universally quantify the type variables we can, so that it becomes polymorphic. A variable declared by a let can thus be used with multiple types, which is not the case for an argument of a function. For instance, in OCaml, we can define the identity once with a let, and use it on an integer and on a string: let () = let id = fun x -> x in print_int (id 3); print_string (id "a")

CHAPTER 4. SIMPLY TYPED λ-CALCULUS

200

This is allowed because the type inferred for id is ∀X.X → X, which is polymorphic. On the other hand, the following code is rejected: let () = (fun id -> print_int (id 3); print_string (id "a") ) (fun x -> x) Namely, the type inferred for the argument id of the function is X → X. During the type inference, OCaml sees that it is applied to 3, and therefore replaces X by int, i.e. it guesses that the type of id must be int → int and thus raises a type error when we also apply it to a string. The identity argument is monomorphic: it can be applied to an integer, or to a string, but not both. We now present an algorithm, due to Hindley and Milner [Hin69, Mil78] which infers such types. A context Γ is a list x 1 : A1 , . . . , x n : An consisting of pairs of variables and type schemes. The free variables of such a context are n [ FV(Ai ) FV(Γ) = i=1

We will consider sequents of the form Γ ` t : A where Γ is a context, t a term and A a type: we still infer a type (as opposed to a type scheme) for a term. The rules for our typing system, which assigns type schemes to terms are the following ones: Γ(x) = A AvB (ax) Γ`x:B

Γ`t:A Γ, x : ∀Γ A ` u : B (let) Γ ` let x = t in u : B

Γ`t:A→B Γ`u:A (→E ) Γ ` tu : B

Γ, x : A ` t : B (→I ) Γ ` λx.t : A → B

The rules (→E ) and (→I ) for elimination and introduction of functions are the usual ones. The rule (ax) allows to specialize the type of a variable in the context: if x has type A in the context Γ, then we can assume that it actually has any type B with A v B: with our above example, if id has the type scheme ∀X.X → X in the context, then we can assume that it has type int → int (or string → string) when we use it, and we can make different assumptions at each use. Finally, the rule let states that if we can show that t has type A, then we can assume that it has the more general type scheme ∀Γ A = ∀X1 . . . . ∀Xn .A where FV(A) \ FV(Γ) = {X1 , . . . , Xn }, called the generalization of A with respect to Γ. We thus universally quantify over all type variables which are not already present in the context. Remark 4.4.4.2. In the rule (let), it is important that ∀Γ A does not universally quantify over all the variables in A, but only those in FV(A) \ FV(Γ). Namely,

CHAPTER 4. SIMPLY TYPED λ-CALCULUS

201

suppose that we did not put this restriction and quantify over all the variables of A. We would then have the derivation (ax)

x:X`x:X x : X, y : ∀X.X ` y : Y x : X ` let y = x in y : Y ` λx.let y = x in y : X → Y

(ax) (let) (→I )

This is clearly incorrect since the term λx.let y = x in y is essentially the identity, and thus should have X → X as principal type. The following proposition shows that this typing system amounts to the simple one of section 4.4.1, where we would allow us to infer the type of an expression each time we use it: Proposition 4.4.4.3. The sequent Γ ` let x = t in u : A is derivable if and only if t is typable in the context Γ and Γ ` u[t/x] : A is derivable. Algorithm W. Formulated as above, it is not immediate to write down an algorithm which would infer the most general type of a term. The problem lies in the (ax) rule: given a variable x which has type scheme A in the context, we have to come up with a type B which specializes A, and there is no obvious way of doing so. Instead, we will replace all the universally quantified variables of A by fresh variables, and will gradually compute a substitution which will fill those in. The resulting algorithm is called algorithm W [DM82]. It is very close to the algorithm we have seen in section 4.4.3 excepting that, instead of generating type equations and solving them afterward in order to obtain a substitution, we compute the substitution during the inference. We can express this algorithm using sequents of the form Γ ` t : A|σ where Γ is a context, t a term, A a type and σ a substitution. The rules are the following ones, they should be read as producing A and σ from Γ and t: Γ(x) = A (ax) Γ ` x : !A | id Γ, x : X ` t : B | σ X fresh (→I ) Γ ` λx.t : X[σ] → B | σ Γ ` t : C |σ

Γ ` u : A | σ0 X fresh σ 00 = mgu(A → X, C) (→E ) 00 00 0 Γ ` t u : X[σ ] | σ ◦ σ ◦ σ Γ ` t : A|σ Γ[σ], x : ∀Γ[σ] A ` u : B | σ 0 (let) Γ ` let x = t in u : B | σ 0 ◦ σ

Those can be explained as follows. (ax) Given the type scheme A associated to the variable x in the context, we declare that the type for x is !A, under the identity substitution. Here, !A is a notation for the instantiation of A, by which we mean that we have replaced all the universally quantified variables by fresh ones (i.e. variables

CHAPTER 4. SIMPLY TYPED λ-CALCULUS

202

which do not already occur in Γ). If the type scheme A is ∀X1 . . . . ∀Xn .A, the type !A is thus A[Y1 /X1 , . . . , Yn /Xn ] where the variables Yi are fresh and distinct. (→I ) In order to infer the type of λx.t, we have to guess a type for x and infer the type of t in the context where x has this type. Since we have no idea of what this type should be, we simply infer the type of t in the environment where x has type X, a fresh type variable. This will result in a type B and a substitution σ such that Γ[σ], x : X[σ] ` t : B and therefore we can deduce that λx.t has type X[σ] → B. (→E ) We first infer a type A for u, and the type C for t. In order for t u to be typable C should be of the form A → B. We therefore use the unification procedure described in section 5.4 in order to compute the most general substitution σ 00 such that σ 00 (A → X) = σ(B) for some fresh variable X, and we will have B = X[σ]; this substitution being noted σ 00 = mgu(A → X, C) (here, “mgu” means most general unifier, see section 5.4.2). We deduce that t u has the type B we have computed. (let) There is no real novelty in this rule compared to earlier. In order to infer the type of let x = t in u, we infer a type A for t and then infer a type B for u in the environment where x has the type scheme obtained by generalizing A with respect to Γ. This algorithm generates a valid type according to the previous rules: Theorem 4.4.4.4 (Correctness). If Γ ` t : A | σ is derivable then Γ[σ] ` t : A is derivable. Moreover, it is actually the most general one that could be inferred: Theorem 4.4.4.5 (Principal types). Suppose that Γ ` t : A | σ is derivable. Then, for every substitution τ and type B such that Γ[τ ] ` t : B there exists a substitution τ 0 such that τ = τ 0 ◦ σ and B = A[τ 0 ]. Example 4.4.4.6. Here are some principal types which can be computed with the algorithm: λx.let y = x in y : X → X λx.let y = λz.x in y : X → Y → X λx.let y = λz.x z in y : (X → Y ) → (X → Y ) Implementing algorithm W. Algorithm W can be coded by suitably implementing the above rules. We define the type of terms as type term = | Var of var | App of term * term | Abs of var * term | Let of var * term * term where var is an alias for int for clarity. Then type schemes are encoded as the following type:


    type ty =
      | EVar of int (* non-quantified variable *)
      | UVar of int (* universally quantified variable *)
      | TArr of ty * ty

Here, instead of universally quantifying some variables, we use two constructors: UVar n is a variable which is universally quantified, and EVar n is a variable which is not. The generation of fresh type variables can be achieved with a counter, as usual:

    let fresh =
      let n = ref (-1) in
      fun () -> incr n; EVar !n

Next, the instantiation of a type scheme is performed by replacing each universal variable with a fresh existential one (we use a list tenv, local to each instantiation, in order to remember when a universal variable has already been replaced by some variable, so as to always replace it by the same variable):

    let inst a =
      let tenv = ref [] in
      let rec inst = function
        | UVar x ->
          if not (List.mem_assoc x !tenv) then tenv := (x, fresh ()) :: !tenv;
          List.assoc x !tenv
        | EVar x -> EVar x
        | TArr (a, b) -> TArr (inst a, inst b)
      in
      inst a

The following function checks whether a variable occurs in a type:

    let rec occurs x = function
      | EVar y -> x = y
      | UVar _ -> false
      | TArr (a, b) -> occurs x a || occurs x b
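As a small sanity check (our example, with a hypothetical numbering of fresh variables), instantiating the scheme ∀X.X → X, encoded as TArr (UVar 0, UVar 0), yields an arrow between two copies of the same fresh existential variable, which occurs then detects:

    let () =
      match inst (TArr (UVar 0, UVar 0)) with
      | TArr (EVar n, EVar m) as a ->
        assert (n = m);       (* both occurrences replaced consistently *)
        assert (occurs n a)   (* and the fresh variable occurs in the type *)
      | _ -> assert false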

We can then generalize a type with respect to a given context by changing each variable EVar n which does not occur in the context into the corresponding universal variable UVar n:

    let rec gen env = function
      | EVar x ->
        if List.exists (fun (_, a) -> occurs x a) env then EVar x else UVar x
      | UVar x -> UVar x
      | TArr (a, b) -> TArr (gen env a, gen env b)

We can finally implement the function which infers the type of a term in a given environment and returns it together with the corresponding substitution. The four cases of the match correspond to the four rules above:

    let rec infer env = function
      | Var x ->
        let a = try List.assoc x env with Not_found -> raise Type_error in
        inst a, Subst.id
      | Abs (x, t) ->
        let a = fresh () in
        let b, s = infer ((x, a)::env) t in
        TArr (Subst.app s a, b), s
      | App (t, u) ->
        let a, su = infer env u in
        let b = fresh () in
        let c, st = infer env t in
        let s = unify (TArr (a, b)) c in
        Subst.app s b, Subst.comp s (Subst.comp su st)
      | Let (x, t, u) ->
        let a, st = infer env t in
        let b, su = infer ((x, gen (Subst.app_env st env) a)::env) u in
        b, Subst.comp su st

We have implemented substitutions as functions int -> ty associating a type to a type variable. The functions Subst.id, Subst.comp, Subst.app and Subst.app_env respectively compute the identity substitution, the composite of two substitutions, and the application of a substitution to a type and to an environment. Their implementation is left to the reader (a possible one is sketched below). Finally, the function unify used above implements the unification algorithm described in section 5.4:

    let rec unify l =
      match l with
      | (EVar x, b)::l ->
        if occurs x b then raise Type_error;
        Subst.comp (unify l) (Subst.make [x, b])
      | (a, EVar x)::l -> unify ((EVar x, a)::l)
      | (TArr (a, b), TArr (a', b'))::l -> unify ([a, a'; b, b'] @ l)
      | (UVar _, _)::_ | (_, UVar _)::_ -> assert false
      | [] -> Subst.id

    let unify a b = unify [a, b]
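Here is what a possible implementation of the Subst module could look like (a sketch of ours, the text leaving the details to the reader); it would have to be defined before the functions above:

    module Subst = struct
      (* substitutions as functions from variable numbers to types *)
      type t = int -> ty

      (* the identity substitution *)
      let id : t = fun n -> EVar n

      (* apply a substitution to a type *)
      let rec app (s : t) : ty -> ty = function
        | EVar n -> s n
        | UVar n -> UVar n
        | TArr (a, b) -> TArr (app s a, app s b)

      (* composite of two substitutions: apply s2 first, then s1 *)
      let comp (s1 : t) (s2 : t) : t = fun n -> app s1 (s2 n)

      (* apply a substitution to a typing environment *)
      let app_env (s : t) env = List.map (fun (x, a) -> (x, app s a)) env

      (* the substitution defined by an association list *)
      let make l : t = fun n -> try List.assoc n l with Not_found -> EVar n
    end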

Algorithm J. The previous algorithm is theoretically nice. In particular, it is well adapted to making correctness proofs. However, it is quite inefficient: we have to apply substitutions to many types (including the context) and we have to go through the context to look for type variables which have been used. As in section 4.4.3, a solution consists in modifying type variables in place by using references. The resulting algorithm is sometimes called algorithm J. We thus change the implementation of types to

    type ty =
      | EVar of tvar ref (* non-quantified variable *)
      | UVar of int      (* universally quantified variable *)
      | TArr of ty * ty
    and tvar =
      | Unbd of int (* unbound variable *)
      | Link of ty  (* substituted variable *)

Most functions are straightforwardly adapted. The main novelties are the unification function, which now performs the modification of types in place:

    let rec unify a b =
      match unlink a, unlink b with
      | EVar x, _ -> if occurs x b then raise Type_error else x := Link b
      | _, EVar y -> unify b a
      | TArr (a1, a2), TArr (b1, b2) -> unify a1 b1; unify a2 b2
      | _ -> raise Type_error

and the type inference function, which is simpler to write because it does not need to propagate the substitutions:

    let rec infer env = function
      | Var x ->
        (try inst (List.assoc x env) with Not_found -> raise Type_error)
      | Abs (x, t) ->
        let a = fresh () in
        let b = infer ((x, a)::env) t in
        TArr (a, b)
      | App (t, u) ->
        let a = infer env u in
        let b = fresh () in
        let c = infer env t in
        unify (TArr (a, b)) c;
        b
      | Let (x, t, u) ->
        let a = infer env t in
        infer ((x, gen env a)::env) u

The substitutions are now performed very efficiently because we do not have to go through terms anymore: the references are doing the job for us. There is however one last source of inefficiency in this code: in the function unify, the call occurs x b has to go through the whole type b to see whether the variable x occurs in it or not. There is a very elegant solution to this, due to Rémy [Rém92], that we learned from [Kis13]. To each type variable, we are going to assign an integer called its level, which indicates the depth of let-declarations at the point where it was created. Initially, the level is 0 by convention and, in an expression let x = t in u at level n, the variables created when typing t will have level n + 1, whereas the variables of u will still have level n (it is some sort of de Bruijn index). For instance, in the term

    let a = (let b = λx.x in λy.y) in λz.z

the type variables associated to x, y and z will have level 2, 1 and 0 respectively.


This can be pictured graphically: in the term let a = (let b = λx.x in λy.y) in λz.z, the subterm λx.x (the definition of b) is typed at level 2, the subterm let b = λx.x in λy.y (the definition of a) at level 1, and the rest of the term, in particular λz.z, at level 0.

One can convince oneself that, in a term let x = t in u, the variables which should be generalized in the rule (let) are those which were "locally created" during the inference of t, i.e. those which are at a strictly higher level than the current one. We thus modify our implementation once more. We begin by declaring a global reference, which will record the current level during type inference, along with two functions to increase and decrease the current level:

    let level = ref 0
    let enter_level () = incr level
    let leave_level () = decr level

We also change the representation of type variables: the constructor for unbound variables becomes

    | Unbd of int * int (* unbound variable (name / level) *)

It now takes two integers as arguments: the number of the variable (acting as its name) and its level. In the generalization function, we only generalize variables which are above the current level:

    let rec gen a =
      match a with
      | EVar x -> if tlevel x > !level then UVar (tname x) else a
      | UVar x -> UVar x
      | TArr (a, b) -> TArr (gen a, gen b)

where tname and tlevel respectively return the name and the level of a type variable. Finally, levels get updated in the infer function, whose only change is in the Let case:

    | Let (x, t, u) ->
      enter_level ();
      let a = infer env t in
      leave_level ();
      infer ((x, gen a)::env) u

We increase the current level when typechecking the definition and decrease it afterward.

Example 4.4.4.7. In the function λx.let y = λz.z in y, the type variable Z associated to z has level 1, so that it gets generalized in the type of y, because y is declared at level 0 and 0 < 1: in the environment, y will have the type scheme ∀Z.Z → Z. However, in the function λx.let y = λz.x in y, the type variable X associated to x does not get generalized because it is of level 0, so that y has the type scheme ∀Y.Y → X, and not ∀X.∀Y.Y → X.
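The helper functions unlink, tname and tlevel used above are not spelled out in the text; with the level-carrying representation of variables, they could be implemented as follows (a minimal sketch under that assumption):

    (* follow Link indirections until reaching an unbound variable or an arrow *)
    let rec unlink = function
      | EVar x as a -> (match !x with Link b -> unlink b | Unbd _ -> a)
      | a -> a

    (* name and level of an unbound type variable *)
    let tname x = match !x with Unbd (n, _) -> n | Link _ -> assert false
    let tlevel x = match !x with Unbd (_, l) -> l | Link _ -> assert false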


There is a catch however: it might happen that, during unification (see the function unify above), a variable X with low level ℓ gets substituted with a type A containing variables of higher level. In this case, for every variable in A, the level should be lowered to the minimum of its level and ℓ before performing the substitution: the levels get "contaminated" by the one of the variable they are unified with. However, we can be smart and observe that the function occurs already goes through the whole type just before we substitute, and that this is the only place where it is used, so that we can use it both to check the occurrence and to update the levels. We therefore change it to

    let rec occurs x a =
      match unlink a with
      | EVar y when x = y -> raise Type_error
      | EVar y ->
        let l = tlevel y in
        let l = match !x with Unbd (_, l') -> min l l' | _ -> l in
        y := Unbd (tname y, l)
      | UVar _ -> ()
      | TArr (a, b) -> occurs x a; occurs x b

which changes the level of every variable encountered to the minimum of its old level and the level of x (accordingly, in unify, the test if occurs x b then raise Type_error else x := Link b simply becomes occurs x b; x := Link b). Without this modification of occurs, for the term λx.let y = λz.x z in y we would infer the unsound type (X → Y) → (Z → W) instead of the expected type (X → Y) → (X → Y).

4.4.5 Bidirectional type checking. We present here another approach to type checking, which does not try to come up with new or most general types: this means that we will fail to infer a type for a term when we are not certain about this type (for instance, if the term can have multiple types). However, we will try to exploit as much as possible the type information already known about terms. This is less powerful than the previous methods in the context of the λ-calculus, but it has the advantage of being simple to implement and of generalizing well to richer logics, where principal types do not exist or type inference is undecidable, see chapter 8. When implementing type inference, we can see that two different phases are actually involved:

– type inference: we come up with a type for the term,
– type checking: we make sure that the term has a given type.

For instance, when performing type inference for a term t u, we first infer the type of t, which should be of the form A → B, and then we check that u has type A. Of course, this checking part is usually done by inferring a type for u and comparing it with A, but in some situations we can exploit the fact that we are checking that the term has type A, and that this A does bring us some information. For instance, we can check that the term λx.x has the type X → X, but we cannot unambiguously infer a type for λx.x because it admits multiple types (we have seen that there are canonical choices such as the principal type, see section 4.4.2, but here we do not want to make any choice for the user). This suggests splitting the usual typing judgment Γ ⊢ t : A in two:

– Γ ⊢ t ⇒ A: we infer the type A for the term t in the context Γ,
– Γ ⊢ t ⇐ A: we check that the term t has type A in the context Γ.

We will consider terms of the form

    t, u ::= x | λx.t | t u | (t : A)

The only new construction is the last one, (t : A), which means "check that t has type A". It will come in handy since it allows bringing type information into terms, and it is already present in languages such as OCaml, where we can define the identity function on integers by

    let id = fun x -> (x : int)

The rules for type inference and checking are the following ones:

    ───────────── (ax)
    Γ ⊢ x ⇒ Γ(x)

    Γ ⊢ t ⇒ A → B    Γ ⊢ u ⇐ A
    ─────────────────────────── (→E)
          Γ ⊢ t u ⇒ B

    Γ, x : A ⊢ t ⇐ B
    ───────────────── (→I)
    Γ ⊢ λx.t ⇐ A → B

    Γ ⊢ t ⇐ A
    ──────────────── (cast)
    Γ ⊢ (t : A) ⇒ A

    Γ ⊢ t ⇒ A
    ────────── (sub)
    Γ ⊢ t ⇐ A

They read as follows:

(ax) If we know that x has type A then we can come up with a type for x: namely A.

(→E) If we can infer a type A → B for t and check that u has type A then we can infer the type B for t u.

(→I) In order to check that λx.t has type A → B, we should check that t has type B when x has type A.

(cast) We can infer the type A for (t : A) provided that t actually has type A.

(sub) This subsumption rule states that, as a last resort, if we do not know how to check that a term t has type A, we can go back to the old method of inferring a type for it and ensuring that this type is A.

Note that there is no rule for inferring the type of λx.t, because there is no way to come up with a type for x without type annotations. Again, this means that we cannot infer a type for the identity λx.x, but we can in the presence of type annotations, as in the following derivation, presented as a sequence of judgments:

    x : A ⊢ x ⇒ A                 (ax)
    x : A ⊢ x ⇐ A                 (sub)
    ⊢ λx.x ⇐ A → A                (→I)
    ⊢ (λx.x : A → A) ⇒ A → A      (cast)


An implementation is provided in figure 4.6: the two modes (type inference and checking) are implemented by two mutually recursive functions (infer and check). There are two kinds of errors that can be raised: Type_error means that the term is ill-typed as usual, and Cannot_infer means that the algorithm could not come up with a type, although the term might still be typable.

Example 4.4.5.1. For illustration purposes, we suppose we have access to real (or float) numbers, of type R, with the rule Γ ⊢ r ⇒ R for every real number r. We also suppose that we have access to the usual mathematical functions (addition, multiplication), as well as a function which computes the mean of a function between two points, i.e. Γ contains

    mean : (R → R) → R → R → R

We can then type mean (λx.x × x) 5 7, presenting the derivation as a sequence of judgments:

    Γ ⊢ mean ⇒ (R → R) → R → R → R        (ax)
    Γ, x : R ⊢ x × x ⇒ R                  (by (ax) and the typing of ×)
    Γ, x : R ⊢ x × x ⇐ R                  (sub)
    Γ ⊢ λx.x × x ⇐ R → R                  (→I)
    Γ ⊢ mean (λx.x × x) ⇒ R → R → R       (→E)
    Γ ⊢ 5 ⇒ R,  thus  Γ ⊢ 5 ⇐ R           (sub)
    Γ ⊢ mean (λx.x × x) 5 ⇒ R → R         (→E)
    Γ ⊢ 7 ⇒ R,  thus  Γ ⊢ 7 ⇐ R           (sub)
    Γ ⊢ mean (λx.x × x) 5 7 ⇒ R           (→E)

However, we cannot infer a type for the function λf xy.(f x + f y)/2 which would be the definition of mean. When defining a function we have to give its type and cast it accordingly: we can type

    (λf xy.(f x + f y)/2 : (R → R) → R → R → R)

This is why, in a programming language such as Agda, you have to declare the type of a function when defining it:

    mean : (R → R) → R → R → R
    mean f x y = (f x + f y) / 2

Remark 4.4.5.2. If we omit the rule (cast), it is interesting to note that the terms v such that Γ ⊢ v ⇐ A and the terms n such that Γ ⊢ n ⇒ A is derivable for some context Γ and type A are respectively generated by the grammars

    v ::= λx.t | n        n ::= x | n v

which is precisely the traditional definition of values (also called normal forms) and neutral terms (already encountered in section 3.5.2 for instance).


    (** Types. *)
    type ty =
      | TVar of string
      | TArr of ty * ty

    type var = string

    (** Terms. *)
    type term =
      | Var of var
      | App of term * term
      | Abs of var * term
      | Cast of term * ty

    exception Cannot_infer
    exception Type_error

    (** Type inference. *)
    let rec infer env = function
      | Var x -> (try List.assoc x env with Not_found -> raise Type_error)
      | App (t, u) ->
        (
          match infer env t with
          | TArr (a, b) -> check env u a; b
          | _ -> raise Type_error
        )
      | Abs _ -> raise Cannot_infer
      | Cast (t, a) -> check env t a; a

    (** Type checking. *)
    and check env t a =
      match t, a with
      | Abs (x, t), TArr (a, b) -> check ((x, a)::env) t b
      | _ -> if infer env t <> a then raise Type_error

    Figure 4.6: Bidirectional type checking.
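As a quick sanity check of the functions of figure 4.6 (a hypothetical usage example of ours): we cannot infer a type for the bare identity, but we can check it against a candidate type, and infer one for the annotated version.

    let id = Abs ("x", Var "x")
    let a = TArr (TVar "A", TVar "A")

    let () =
      (* inference fails on the bare abstraction... *)
      assert (try ignore (infer [] id); false with Cannot_infer -> true);
      (* ...but checking against A → A succeeds... *)
      check [] id a;
      (* ...and the annotated term has its type inferred *)
      assert (infer [] (Cast (id, a)) = a)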


4.5 Hilbert calculus and combinators

We have seen in section 3.6.3 that every λ-term can be expressed using application and the two combinators S and K, respectively corresponding to the λ-terms

    S = λxyz.(x z)(y z)        K = λxy.x

We have also seen in example 4.4.2.5 that the principal types of those terms are respectively

    (X → Y → Z) → (X → Y) → X → Z        and        X → Y → X

which means that they respectively have the type

    (A → B → C) → (A → B) → A → C        and        A → B → A

for all types A, B and C. It is thus natural to consider a typed version of combinatory terms (see section 3.6.3) expressed by rules, where types and contexts are defined as above, and sequents are of the form

    Γ ⊢ t : A

where Γ is a context, t is a combinatory term and A is a type. The rules are

    ───────────── (ax)
    Γ ⊢ x : Γ(x)

    ───────────────────────────────────────── (S)
    Γ ⊢ S : (A → B → C) → (A → B) → A → C

    ──────────────────── (K)
    Γ ⊢ K : A → B → A

    Γ ⊢ t : A → B    Γ ⊢ u : A
    ────────────────────────── (→E)
          Γ ⊢ t u : B

where in (ax) we suppose that x ∈ dom(Γ). If we apply an analogue of the term erasing procedure (section 4.1.7), we obtain the following logical system:

    ───────────── (ax)
    Γ, A, Γ' ⊢ A

    ───────────────────────────────────── (S)
    Γ ⊢ (A ⇒ B ⇒ C) ⇒ (A ⇒ B) ⇒ A ⇒ C

    ───────────────── (K)
    Γ ⊢ A ⇒ B ⇒ A

    Γ ⊢ A ⇒ B    Γ ⊢ A
    ────────────────── (⇒E)
          Γ ⊢ B

which is precisely the Hilbert calculus described in section 2.7! In other words, in the same way that natural deduction corresponds, via the Curry-Howard correspondence, to simply typed λ-calculus, Hilbert calculus corresponds to typed combinatory terms. This was first observed by Curry [CF58].


Example 4.5.0.1. We have seen in example 3.6.3.1 that, in combinatory logic, the identity can be expressed as I = S K K. Its typing derivation is, presented as a sequence of judgments:

    ⊢ S : (A → (B → A) → A) → (A → B → A) → A → A      (S)
    ⊢ K : A → (B → A) → A                              (K)
    ⊢ S K : (A → B → A) → A → A                        (→E)
    ⊢ K : A → B → A                                    (K)
    ⊢ S K K : A → A                                    (→E)

from which, by term erasure, we recover the proof of A ⇒ A in Hilbert calculus given in example 2.7.1.1.
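Incidentally, OCaml's type inference computes precisely these principal types; a quick experiment (ours, not from the text):

    (* the S and K combinators as OCaml functions; the inferred types are
       exactly the principal types above, up to renaming *)
    let s x y z = (x z) (y z)   (* ('a -> 'b -> 'c) -> ('a -> 'b) -> 'a -> 'c *)
    let k x _ = x               (* 'a -> 'b -> 'a *)

    (* s k k behaves as the identity and is inferred the type 'a -> 'a *)
    let i x = s k k x
    let () = assert (i 42 = 42)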

4.6 Classical logic

Since classical logic is an extension of intuitionistic logic, in the sense that we have more rules, we can expect that the Curry-Howard correspondence can be extended to classical logic. For various reasons, it was long thought that classical logic had no computational content, one reason being that naively imposing A ⇔ ¬¬A makes all proofs of a given type equal, see section 2.5.4. It was thus somewhat of a surprise when Parigot introduced the λµ-calculus [Par92], which is an extension of the λ-calculus suitable for classical logic. In section 2.5.9, we have analyzed the proof of ¬A ∨ A (or rather its encoding in intuitionistic logic). The main ingredient is the ability to "roll back" to a previous proof goal at any point in the proof: we prove ¬A and at some point we change our mind, go back to proving ¬A ∨ A, and choose to prove A instead. In the λµ-calculus, this is achieved by a sort of "exception" mechanism: instead of going on with the computation, we might raise an exception which is going to be caught and change the control flow. However, in this calculus, the exceptions follow a very particular discipline, making them behave not exactly as in usual languages such as OCaml.

4.6.1 Felleisen's C. Let us naively try to extend the Curry-Howard correspondence to classical logic. For clarity, we write here ⊥ instead of 0 for the type corresponding to falsity. Starting from implicative intuitionistic natural deduction, classical logic can be obtained by adding the rule

    Γ ⊢ ¬¬A
    ──────── (¬¬E)
    Γ ⊢ A

corresponding to double negation elimination. This suggests that we should add a corresponding construction, say C(t), to our term calculus, together with the typing rule

    Γ ⊢ t : ¬¬A
    ─────────────── (¬¬E)
    Γ ⊢ C(t) : A

This calculus allows for a "static" Curry-Howard correspondence in a sense similar to theorem 4.1.7.1: there is a bijective correspondence between typing derivations of λ-terms with C and proofs in natural deduction with double negation elimination.


In order to hopefully extend this to a dynamical correspondence, we need to introduce a notion of reduction, which should correspond to cut elimination. First remark that, contrary to the usual situation, we do not need to add an introduction rule for double negation. Namely, recalling that ¬A = A → ⊥, we can construct a proof of ¬¬A from a proof t of A as follows:

    Γ, k : A → ⊥ ⊢ k : A → ⊥              (ax)
    Γ, k : A → ⊥ ⊢ t : A                  (the given proof of A, weakened)
    Γ, k : A → ⊥ ⊢ k t : ⊥                (→E)
    Γ ⊢ λk^{A→⊥}.k t : (A → ⊥) → ⊥        (→I)

In other words, the introduction rule for double negation should be

    Γ ⊢ t : A
    ─────────────────────── (¬¬I)
    Γ ⊢ λk^{¬A}.k t : ¬¬A

We can therefore "guess" the reduction rule associated to C by observing the corresponding cut elimination: the proof

    π
    Γ ⊢ t : A
    ─────────────────────── (¬¬I)
    Γ ⊢ λk^{¬A}.k t : ¬¬A
    ─────────────────────── (¬¬E)
    Γ ⊢ C(λk^{¬A}.k t) : A

reduces to the proof π of Γ ⊢ t : A. The β-reduction rule should thus be

    C(λk^{¬A}.k t) −→β t

Note that this rule only makes sense when k does not occur in t, otherwise the bound variable k could escape its scope... This is the main reduction rule associated to C, but it turns out that two more reduction rules are required:

    C(λk^{¬A}.k t) −→β t                                             if k ∉ FV(t)
    C(λk^{¬(A→B)}.t) u −→β C(λk'^{¬B}.t[λf^{A→B}.k' (f u)/k])
    C(λk^{¬A}.k C(λk'^{¬A}.t)) −→β C(λk''^{¬A}.t[k''/k, k''/k'])

The second rule somehow states that the application to the argument u goes through under C: if our calculus had products or coproducts, similar rules would have to be added in order to enforce their compatibility with C. Let us try to understand what this could mean. Suppose given a term v of type ¬¬A. Since ¬¬A = (A → ⊥) → ⊥, this means that v must be an abstraction taking an argument k of type A → ⊥ and returning a value of type ⊥, i.e. v will typically reduce to a term of the form λk^{A→⊥}.u. Since there is no introduction rule for ⊥ (there is no way of directly constructing a term of type ⊥), at some point during the evaluation of u, it must apply k to some argument t of type A in order to produce the value of type ⊥, i.e. v will reduce to λk^{A→⊥}.k t. Thus, C(v) will reduce to C(λk^{A→⊥}.k t), which will reduce to t. A typical reduction is thus



    C(v) −→∗β C(λk^{¬A}.u) −→∗β C(λk^{¬A}.k t) −→β t


This means that C(v) waits for v to apply its argument k to some term t of type A, and returns this argument t. The term k can thus be thought of as an analogue of return in languages such as C, or perhaps of the raise operator of OCaml which raises exceptions (more on this later on). However, things are more subtle here because the returned term might itself use some of the terms computed during the evaluation of v. In order to see that in action, let us compute the term associated to the usual proof of ¬A ∨ A, presented as a sequence of judgments:

    k : ¬(¬A ∨ A), a : A ⊢ a : A                              (ax)
    k : ¬(¬A ∨ A), a : A ⊢ ιr(a) : ¬A ∨ A                     (∨rI)
    k : ¬(¬A ∨ A), a : A ⊢ k ιr(a) : ⊥                        (¬E)
    k : ¬(¬A ∨ A) ⊢ λa^A.k ιr(a) : ¬A                         (¬I)
    k : ¬(¬A ∨ A) ⊢ ιl(λa^A.k ιr(a)) : ¬A ∨ A                 (∨lI)
    k : ¬(¬A ∨ A) ⊢ k ιl(λa^A.k ιr(a)) : ⊥                    (¬E)
    ⊢ λk^{¬(¬A∨A)}.k ιl(λa^A.k ιr(a)) : ¬¬(¬A ∨ A)            (¬¬I)
    ⊢ C(λk^{¬(¬A∨A)}.k ιl(λa^A.k ιr(a))) : ¬A ∨ A             (¬¬E)

As indicated above, the term C(λk^{¬(¬A∨A)}.k ιl(λa^A.k ιr(a))) cannot reasonably reduce to t = ιl(λa^A.k ιr(a)) because the variable k occurs in t. The additional rules make it so that it nevertheless acts as t, i.e. it claims to be a proof of ¬A = A → ⊥, albeit being surrounded by C(λk.k ...). If, at some point, we use this proof and apply it to some argument u of type A, the term will thus reduce to C(λk^{¬(¬A∨A)}.k ιr(u)), which in turn will reduce to ιr(u) by the reduction rule associated to C. It thus fakes being a proof of ¬A until we actually use this proof and apply it to some argument u of type A, at which point it changes its mind and declares that it was actually a proof of A, namely u. This is exactly the behavior we were describing in section 2.5.2, when explaining that classical logic allows "resetting" proofs.

Variants of the calculus. The operator C is due to Felleisen [FH92] and the observation that it could be typed by double negation elimination was first made by Griffin [Gri89], see also [SU06, chapter 7]. There are many small possible variants of the calculus. First note that we could add C (as opposed to C(t)) as a constant of the language, which corresponds to adding double negation elimination as an axiom instead of a rule:

    Γ ⊢ C : ¬¬A → A

If we use Clavius' law instead of double negation elimination, see theorem 2.5.1.1, we instead obtain an operator cc called callcc (for call with current continuation):

    Γ ⊢ cc : (¬A → A) → A


This operator is implemented in languages such as Scheme, and C is a generalization of it: we have cc(λk.t) = C(λk.k t). Finally, double negation elimination can also be implemented by the rule

    Γ, ¬A ⊢ ⊥
    ──────────
    Γ ⊢ A

which suggests the following variant of the calculus:

    Γ, x : ¬A ⊢ t : ⊥
    ──────────────────
    Γ ⊢ µx^A.t : A

This means that we now add a construction µx^A.t to our terms, which corresponds to µx^A.t = C(λx^{¬A}.t) in the previous calculus. In the next section, we present an alternative calculus, based on similar ideas, but with much nicer and more intuitive reduction rules.

4.6.2 The λµ-calculus. Let us now introduce the λµ-calculus [Par92]. We suppose fixed two sorts of variables: the term variables x, y, etc., which behave as usual, and the control variables α, β, etc., which can be thought of as variables of negated types. The terms are generated by the grammar

    t, u ::= x | t u | λx.t | µα.t | [α]t

The first constructions are the usual ones from the λ-calculus. A term of the form µα.t should be thought of as a term catching an exception named α, and a term [α]t as raising the exception α with the value t. The reduction will make it so that the place where the exception is caught gets replaced by the raised value. For instance, we will have a reduction

    t (µα.u ([α]v)) −→∗ t v

meaning that during the evaluation of the argument of t, the exception α is raised with value v, which thus replaces the term at the corresponding µα. The constructor µ is considered as a binder and terms are considered modulo α-equivalence: µα.t = µβ.(t[β/α]). Beware of the unfortunate similarity in notation between raising and substitution. The three reduction rules of the calculus are

– the usual β-reduction:

    (λx.t) u −→β t[u/x]

– the following rule commuting applications and µ-abstractions:

    (µα.t) u −→β µβ.t[[β]− u/[α]−]

where the weird notation [β]− u/[α]− in the substitution means that we should replace every subterm of t of the form [α]v by [β]v u,

– the following reduction rule for µ, stating that if we catch exceptions raised on α and immediately re-raise them on β, we might as well raise them directly on β:

    [β](µα.t) −→β t[β/α]


Additionally, we require the following η-reduction rule, stating that if we catch on α and immediately re-raise on α, we might as well do nothing:

    µα.[α]t −→η t

when α ∉ FV(t). It is proved in [Par92] that

Theorem 4.6.2.1. The λµ-calculus is confluent.

Remark 4.6.2.2. The translation between the previous calculus based on C and the λµ-calculus was already hinted at at the end of the previous section: µα.t corresponds to C(λα.t) and [α]t corresponds to applying the argument given by C. More formally, the operators cc and C can be encoded in the λµ-calculus as

    cc = λy.µα.[α](y (λx.µβ.[α]x))
    C = λy.µα.[β](y (λx.µγ.[α]x))

The intuition is thus that µα.t corresponds to some sort of OCaml construction creating an exception and catching it:

    let exception Alpha of 'a in
    try t with Alpha u -> u

and [α]u would correspond to raising the exception:

    raise (Alpha u)

However, there are differences. First, the name of the exception is generated on the fly instead of being hard-coded: we have an α-conversion rule for µ binders. More importantly, the exceptions can never escape their scope in the λµ-calculus, contrary to OCaml. For instance, consider the following program in OCaml:

    let f : int -> int =
      let exception Alpha of (int -> int) in
      try fun n -> raise (Alpha (fun x -> n * x))
      with Alpha g -> g

    let () = print_int (f 3)

Although the exception Alpha seems to be caught (the raise is surrounded by a try ... with), executing the program results in

    Fatal error: exception Alpha(_)

meaning that it was not: when executing f 3, f is replaced by its value and the reduction raises the exception. The analogue of this program in λµ is

    f = µα.[α](λn.[α](λx.n × x))

(we allow ourselves to use integers and multiplication). It does not suffer from this problem, and corresponds to a function which, when applied to an argument n, turns into the function which multiplies by n. When we apply it to 3, it thus turns into the function which multiplies its argument (which is 3) by 3, and the result will actually be 9 as expected:

    f 3 −→ µβ.[β]((λn.[β]((λx.n × x) 3)) 3) −→ µβ.[β][β]((λx.3 × x) 3) −→ µβ.[β][β](3 × 3)


which is η-equivalent to 3 × 3 using the two η-conversion rules. Another possible interpretation is that µα.t stores the current evaluation context and [α]u restores the evaluation context of the corresponding µα before executing u: it is as if the term t had never been executed. In the above example, it is as if f had directly been defined as λx.n × x.

4.6.3 Classical logic as a typing system. In order to type the λµ-calculus, we consider types of the form

    A, B ::= X | A → B | ⊥

We also consider a Church variant of the calculus, where λ- and µ-abstracted variables are decorated with their types. The sequents are of the form

    Γ ⊢ t : A | ∆

with t a term, A a type, and Γ and ∆ contexts of the form

    Γ = x1 : B1, ..., xm : Bm        ∆ = α1 : A1, ..., αn : An

where the variables of Γ are regular ones, whereas those of ∆ are control variables. Namely, Γ provides the types of the free variables of t as usual, whereas ∆ gives the types of the exceptions that might be raised. Finally, A is the type of the result of t, which might never actually be produced if some exception is raised. In particular, a term of type ⊥ is called a command: we know that it will never return a value, and thus necessarily raises some exception. The typing rules for the λµ-calculus are

    ───────────────────────── (ax)
    Γ, x : A, Γ' ⊢ x : A | ∆

    Γ, x : A ⊢ t : B | ∆
    ──────────────────────── (→I)
    Γ ⊢ λx^A.t : A → B | ∆

    Γ ⊢ t : A → B | ∆    Γ ⊢ u : A | ∆
    ────────────────────────────────── (→E)
           Γ ⊢ t u : B | ∆

    Γ ⊢ t : ⊥ | ∆, α : A, ∆'
    ───────────────────────── (⊥E)
    Γ ⊢ µα^A.t : A | ∆, ∆'

    Γ ⊢ t : A | ∆, α : A, ∆'
    ──────────────────────────── (⊥I)
    Γ ⊢ [α]t : ⊥ | ∆, α : A, ∆'

The rule (⊥E) says that a term µα^A.t of type A is a command which raises some value of type A on α, and the rule (⊥I) says that a term [α]t is a command (of type ⊥, not returning anything) and that the type A of the raised term t has to match the one expected for α.

Exercise 4.6.3.1. Show that the Peirce law ((A → B) → A) → A is the type of the following term:

    λx^{(A→B)→A}.µα^A.[α](x (λy^A.µβ^B.[α]y))

It can be shown that this system has the expected properties [Par92, Par97], which were detailed above in the case of the simply typed λ-calculus:


Theorem 4.6.3.2 (Subject reduction). If Γ ⊢ t : A | ∆ is derivable and t −→β t' then Γ ⊢ t' : A | ∆ is also derivable.

Theorem 4.6.3.3 (Strong normalization). Typed λµ-terms are strongly normalizing.

If we erase the terms from the rules, we obtain the following presentation of classical logic:

    ───────────────── (ax)
    Γ, A, Γ' ⊢ A, ∆

    Γ, A ⊢ B, ∆
    ───────────────── (⇒I)
    Γ ⊢ A ⇒ B, ∆

    Γ ⊢ A ⇒ B, ∆    Γ ⊢ A, ∆
    ───────────────────────── (⇒E)
           Γ ⊢ B, ∆

    Γ ⊢ ⊥, ∆, A, ∆'
    ──────────────── (⊥E)
    Γ ⊢ A, ∆, ∆'

    Γ ⊢ A, ∆, A, ∆'
    ──────────────── (⊥I)
    Γ ⊢ ⊥, ∆, A, ∆'

All the rules are the usual ones except for the rule (⊥I), which combines weakening, contraction and exchange:

    Γ ⊢ A, ∆, A, ∆'
    ──────────────── (xch)
    Γ ⊢ ∆, A, A, ∆'
    ──────────────── (contr)
    Γ ⊢ ∆, A, ∆'
    ──────────────── (wk)
    Γ ⊢ ⊥, ∆, A, ∆'

Neither the list of formulas in ∆ nor the one in Γ is supposed to be commutative, and introduction and elimination rules are always performed on the leftmost formula. During proof search, we can however bring another formula of ∆ to the left using the elimination and introduction rules for ⊥, as shown by the first derivation below (the second one gives the corresponding typing derivation):

    Γ ⊢ B, A, ∆, B, ∆'
    ─────────────────── (⊥I)
    Γ ⊢ ⊥, A, ∆, B, ∆'
    ─────────────────── (⊥E)
    Γ ⊢ A, ∆, B, ∆'

    Γ ⊢ t : B | α : A, ∆, β : B, ∆'
    ──────────────────────────────────── (⊥I)
    Γ ⊢ [β]t : ⊥ | α : A, ∆, β : B, ∆'
    ──────────────────────────────────── (⊥E)
    Γ ⊢ µα^A.[β]t : A | ∆, β : B, ∆'

Adding the usual rules for coproducts, we can show the excluded middle as follows in this setting, presented as a sequence of judgments:

    x : A ⊢ x : A | α : ¬A ∨ A                                     (ax)
    x : A ⊢ ιr^{¬A}(x) : ¬A ∨ A | α : ¬A ∨ A                       (∨rI)
    x : A ⊢ [α]ιr^{¬A}(x) : ⊥ | α : ¬A ∨ A                         (⊥I)
    ⊢ λx^A.[α]ιr^{¬A}(x) : ¬A | α : ¬A ∨ A                         (¬I)
    ⊢ ιl^A(λx^A.[α]ιr^{¬A}(x)) : ¬A ∨ A | α : ¬A ∨ A               (∨lI)
    ⊢ [α]ιl^A(λx^A.[α]ιr^{¬A}(x)) : ⊥ | α : ¬A ∨ A                 (⊥I)
    ⊢ µα^{¬A∨A}.[α]ιl^A(λx^A.[α]ιr^{¬A}(x)) : ¬A ∨ A |             (⊥E)

In order to give a more concrete idea of this program, let us try to implement it in OCaml. Remember from section 1.5 that the empty type ⊥ can be implemented as


    type bot

and negation as

    type 'a neg = 'a -> bot

From those, the above term proving the excluded middle can roughly be translated as

    let em () : ('a neg, 'a) sum =
      let exception Alpha of ('a neg, 'a) sum in
      try Left (fun x -> raise (Alpha (Right x)))
      with Alpha x -> x

As explained above, this does not behave exactly as it should in OCaml, because exceptions are not properly scoped there...

4.6.4 A more symmetric calculus. The reduction rule for (µα.t) u in the λµ-calculus involves a slightly awkward substitution. In order to overcome this defect and reveal the symmetry between terms and environments, Curien and Herbelin have introduced a variant of the λµ-calculus called the λµµ̃-calculus [CH00]. In this calculus there are three kinds of "terms":

    terms:           t ::= x | λx.t | µα.c
    environments:    e ::= α | t · e | µ̃x.c
    commands:        c ::= ⟨t | e⟩

with reduction rules

    ⟨λx.t | u · e⟩ −→ ⟨u | µ̃x.⟨t | e⟩⟩
    ⟨µα.c | e⟩ −→ c[e/α]
    ⟨t | µ̃x.c⟩ −→ c[t/x]

The typing judgments are of the three possible forms

    Γ ⊢ t : A | ∆        Γ | e : A ⊢ ∆        c : (Γ ⊢ ∆)

and the rules are

    ───────────────────────── (axL)
    Γ | α : A ⊢ α : A, ∆

    ──────────────────────── (axR)
    Γ, x : A ⊢ x : A | ∆

    Γ ⊢ t : A | ∆    Γ | e : B ⊢ ∆
    ────────────────────────────── (→L)
    Γ | t · e : A → B ⊢ ∆

    Γ, x : A ⊢ t : B | ∆
    ───────────────────────── (→R)
    Γ ⊢ λx.t : A → B | ∆

    c : (Γ, x : A ⊢ ∆)
    ──────────────────── (⊥L)
    Γ | µ̃x.c : A ⊢ ∆

    c : (Γ ⊢ α : A, ∆)
    ──────────────────── (⊥R)
    Γ ⊢ µα.c : A | ∆

    Γ ⊢ t : A | ∆    Γ | e : A ⊢ ∆
    ──────────────────────────────
          ⟨t | e⟩ : (Γ ⊢ ∆)

You are strongly encouraged to observe their beautiful symmetry and find out their meaning by yourselves. In particular, Lafont's critical pair presented in section 2.5.4 corresponds to the fact that the following term can reduce in two different ways, showing that the calculus is not confluent (for good reasons!):

    c[µ̃x.d/α] ←− ⟨µα.c | µ̃x.d⟩ −→ d[µα.c/x]

In particular, if α is not free in c and x is not free in d, c and d are convertible...
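As a sanity check of the reduction rules (a small worked example, not spelled out in the text), an ordinary β-redex (λx.t) u run against an environment e is simulated by

    ⟨λx.t | u · e⟩ −→ ⟨u | µ̃x.⟨t | e⟩⟩ −→ ⟨t[u/x] | e⟩

the first step transferring the argument u to the environment side, and the second substituting it for x (the variable x being fresh for e), so that β-reduction is recovered at the level of commands.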

Chapter 5

First-order logic

First-order logic is an extension of propositional logic where propositions are allowed to depend on terms over some fixed signature, and are then called predicates. For instance, equality can be encoded as a predicate t = u which depends on two terms t and u. There are thus two worlds at stake: the world of logic, where formulas live, and the world of data, where terms live. This logic is the one traditionally considered in mathematics (in particular, we will see that it can be used to formally state the axioms of set theory). Good introductions to the subject include [CK90, CL93]. We define first-order logic in section 5.1, present some well-known first-order theories in section 5.2, and detail the particular case of set theory in section 5.3 (including in the intuitionistic setting). Finally, the first-order unification algorithm is presented in section 5.4.

5.1 Definition

5.1.1 Signature. A signature Σ is a set of function symbols together with a function a : Σ → N associating an arity to each symbol: f can be thought of as a formal operation with a(f) inputs. In particular, symbols of arity 0 are called constants.

5.1.2 Terms. We suppose fixed an infinite countable set X of variables. Given a signature Σ, the set TΣ of terms is the smallest set such that

– every variable is a term: X ⊆ TΣ,
– terms are closed under operations: given f ∈ Σ with a(f) = n and t1, ..., tn ∈ TΣ, we have f(t1, ..., tn) ∈ TΣ.

This can also be stated as the fact that terms are generated by the grammar

    t ::= x | f(t1, ..., tn)

where x is a variable, f is a function symbol of arity n and the ti are terms. We often implicitly suppose fixed a signature and simply write T instead of TΣ.

Example 5.1.2.1. Consider the signature Σ = {+ : 2, 0 : 0}. This notation means that it contains two function symbols + and 0, whose arities are respectively a(+) = 2 and a(0) = 0. Examples of terms over this signature are

    +(x, 0())        +(+(x, x), +(y, 0()))        +(0(), 0())

In the following, we generally omit parentheses for constants, e.g. we write 0 instead of 0().


Given a term t, its set of subterms ST(t) is defined by induction on t by

    ST(x) = {x}
    ST(f(t1, ..., tn)) = {f(t1, ..., tn)} ∪ ST(t1) ∪ ... ∪ ST(tn)

We say that u is a subterm of t when u ∈ ST(t); it is a strict subterm when it is moreover distinct from t.

5.1.3 Substitutions. A substitution is a function σ : X → T such that the set {x ∈ X | σ(x) ≠ x} is finite. Given a term t, we write t[σ] for the term obtained from t by replacing every variable x by σ(x):

    x[σ] = σ(x)
    (f(t1, ..., tn))[σ] = f(t1[σ], ..., tn[σ])
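These definitions translate directly to OCaml; here is a minimal sketch, under an assumed representation of terms (the names var, Fn and subst are ours, for illustration, not the book's):

    type var = string

    type term =
      | Var of var
      | Fn of string * term list  (* function symbol applied to its arguments *)

    (* apply a substitution, represented as an association list: variables
       not listed are left unchanged, so that the substitution is the
       identity almost everywhere *)
    let rec subst (s : (var * term) list) : term -> term = function
      | Var x -> (try List.assoc x s with Not_found -> Var x)
      | Fn (f, args) -> Fn (f, List.map (subst s) args)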

We sometimes write σ = [t1/x1, ..., tn/xn] for the substitution such that σ(xi) = ti for every 1 ≤ i ≤ n and σ(x) = x for x ≠ xi. A renaming is a substitution such that the term σ(x) is a variable, for every variable x.

5.1.4 Formulas. We suppose fixed a set P of predicates (also sometimes called relation symbols) together with a function a : P → N associating an arity to each predicate. A formula A is an expression generated by the grammar

    A ::= P(t1, ..., tn) | A ⇒ B | A ∧ B | ⊤ | A ∨ B | ⊥ | ¬A | ∀x.A | ∃x.A

where P is a predicate of arity n, the ti are terms and x ∈ X is a term variable. Quantifications bind the least tightly, e.g. ∀x.A ∧ B is implicitly bracketed as ∀x.(A ∧ B) and not (∀x.A) ∧ B. Note that the definition of formulas depends both on the considered signature Σ and on the considered set P of predicates: we sometimes say a formula on (Σ, P) to make this precise, although we generally leave it implicit.

Example 5.1.4.1. Consider the signature Σ = {× : 2, 1 : 0}, which means that we have two function symbols "×" and "1", with × of arity 2 and 1 of arity 0. We also suppose that P contains a predicate = of arity 2. We have the formula

    ∀x.∃y.(x × y = 1 ∧ y × x = 1)

which expresses that every element admits an inverse.

Example 5.1.4.2. With a predicate D of arity one, the drinker formula is

    ∃x.(D(x) ⇒ (∀y.D(y)))

The name of this formula comes from the following interpretation. If we see terms as people in a pub and consider that D(t) holds when t drinks, it can be read as: there is someone in the pub such that, if he is drinking, then everyone in the pub is drinking. We will see in example 5.1.7.1 that this formula is classically true, but that it is not intuitionistically so.

First-order logic is an extension of propositional logic in the following sense. Consider the empty signature Σ = ∅ and the set P = X consisting of all propositional variables, seen as predicates of arity 0. Then a propositional formula, such as X ∨ ¬Y, corresponds precisely to a first-order formula, such as X() ∨ ¬Y().


5.1.5 Bound and free variables. In a formula of the form ∀x.A or ∃x.A, the variable x is said to be bound in A. This means that the name of the variable x does not really matter: we could have renamed it to some other variable name without changing the formula. We thus implicitly consider formulas up to proper (or capture-avoiding) renaming of variables (by "proper", we mean here that we should take care not to rename a variable to some already bound variable name). For instance, we consider that the two formulas

    ∀x.∃y.x + y = x        and        ∀z.∃y.z + y = z

are the same (the second is obtained from the first by renaming x to z), but they are different from the formula

    ∀x.∃x.x + x = x

obtained by an "improper" renaming of y into x, which was already bound. Such a mechanism for renaming bound variables is detailed in section 3.1, for the λ-calculus. A variable which is not bound is said to be free and we write FV(A) for the set of free variables of a formula A. This is formally defined by

    FV(P(t1, ..., tn)) = FV(t1) ∪ ... ∪ FV(tn)
    FV(A ⇒ B) = FV(A ∧ B) = FV(A ∨ B) = FV(A) ∪ FV(B)
    FV(⊤) = FV(⊥) = ∅
    FV(¬A) = FV(A)
    FV(∀x.A) = FV(∃x.A) = FV(A) \ {x}

where, given a term t, we write FV(t) for the set of all the variables occurring in t. A formula A is closed when it has no free variables, i.e. FV(A) = ∅. We sometimes write A(x1, ..., xn) for a formula A whose free variables are among x1, ..., xn. In this case, we write A(t1, ..., tn) instead of A[t1/x1, ..., tn/xn]. Given a formula A and a term t, we write A[t/x] for the formula A where all the free occurrences of x have been substituted by t, avoiding captures, i.e. we suppose that all bound variables are different from the variables of t. For instance, with A being

    (∃y.x + x = y) ∨ (∃x.x = y)

we have that A[z + z/x] is

    (∃y.(z + z) + (z + z) = y) ∨ (∃x.x = y)

but in order to compute A[y + y/y], we have to rename the bound variable y (say, to z) and the result will be

    (∃z.(y + y) + (y + y) = z) ∨ (∃x.x = y)

and not

    (∃y.(y + y) + (y + y) = y) ∨ (∃x.x = y)
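Continuing the OCaml sketch of terms above (still an illustration of ours, not the book's code), formulas and their free variables can be represented and computed as follows:

    type formula =
      | Pred of string * term list
      | Imp of formula * formula
      | And of formula * formula
      | Or of formula * formula
      | Top | Bot
      | Not of formula
      | All of var * formula
      | Ex of var * formula

    module VS = Set.Make (String)

    let rec fv_term = function
      | Var x -> VS.singleton x
      | Fn (_, args) ->
        List.fold_left (fun s t -> VS.union s (fv_term t)) VS.empty args

    (* free variables of a formula: quantifiers remove the bound variable *)
    let rec fv = function
      | Pred (_, args) ->
        List.fold_left (fun s t -> VS.union s (fv_term t)) VS.empty args
      | Imp (a, b) | And (a, b) | Or (a, b) -> VS.union (fv a) (fv b)
      | Top | Bot -> VS.empty
      | Not a -> fv a
      | All (x, a) | Ex (x, a) -> VS.remove x (fv a)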


5.1.6 Natural deduction rules. The rules for first-order logic in intuitionistic natural deduction are the usual ones (see figure 2.1) together with the following introduction and elimination rules for universal and existential quantification:

    Γ ⊢ A
    ───────── (∀I)
    Γ ⊢ ∀x.A

    Γ ⊢ ∀x.A
    ──────────── (∀E)
    Γ ⊢ A[t/x]

    Γ ⊢ A[t/x]
    ─────────── (∃I)
    Γ ⊢ ∃x.A

    Γ ⊢ ∃x.A    Γ, A ⊢ B
    ───────────────────── (∃E)
          Γ ⊢ B

These rules are subject to the following (important) side conditions:

– in (∀I), we suppose x ∉ FV(Γ),
– in (∃E), we suppose x ∉ FV(Γ) ∪ FV(B),

where, given a context Γ = x1 : A1, ..., xn : An, we have

    FV(Γ) = FV(A1) ∪ ... ∪ FV(An)

Example 5.1.6.1. We have, presenting the derivation as a sequence of judgments:

    ∀x.¬A, ∃x.A ⊢ ∃x.A                   (ax)
    ∀x.¬A, ∃x.A, A ⊢ ∀x.¬A               (ax)
    ∀x.¬A, ∃x.A, A ⊢ ¬A                  (∀E)
    ∀x.¬A, ∃x.A, A ⊢ A                   (ax)
    ∀x.¬A, ∃x.A, A ⊢ ⊥                   (¬E)
    ∀x.¬A, ∃x.A ⊢ ⊥                      (∃E)
    ∀x.¬A ⊢ ¬(∃x.A)                      (¬I)
    ⊢ (∀x.¬A) ⇒ ¬(∃x.A)                  (⇒I)

Remark 5.1.6.2. The side conditions rule out clearly problematic proofs such as

    A(x) ⊢ A(x)                  (ax)
    A(x) ⊢ ∀x.A(x)               (∀I)
    ⊢ A(x) ⇒ ∀x.A(x)             (⇒I)
    ⊢ ∀x.(A(x) ⇒ ∀x.A(x))        (∀I)
    ⊢ A(t) ⇒ ∀x.A(x)             (∀E)

which can be read as: if the formula A holds for some term t then it holds for any term. The problematic rule is the (∀I) just after the (ax) rule: x is not fresh.

Properties of the calculus. We do not detail this here, but the usual properties of natural deduction generalize to first-order logic. In particular, the structural rules (contraction, exchange, weakening) are admissible, see section 2.2.7. We will also see in section 5.1.9 that cuts can be eliminated.


5.1.7 Classical first-order logic. Following section 2.5, classical first-order logic is the system obtained from the above one by adding one of the following rules:

    ─────────── (lem)
    Γ ⊢ ¬A ∨ A

    Γ ⊢ ¬¬A
    ──────── (¬¬E)
    Γ ⊢ A

    Γ, ¬A ⊢ A
    ────────── (raa)
    Γ ⊢ A

implementing the excluded middle, the elimination of double negation, or Clavius' law (we could also have added any of the axioms of theorem 2.5.1.1).

Example 5.1.7.1. A typical formula which is provable in classical logic (and not in intuitionistic logic) is

    A = ∃x.(D(x) ⇒ (∀y.D(y)))

already presented in example 5.1.4.2. A proof is the following, presented as a sequence of judgments:

    ¬A, D(x), ¬D(y), D(y) ⊢ ¬D(y)                  (ax)
    ¬A, D(x), ¬D(y), D(y) ⊢ D(y)                   (ax)
    ¬A, D(x), ¬D(y), D(y) ⊢ ⊥                      (¬E)
    ¬A, D(x), ¬D(y), D(y) ⊢ ∀y.D(y)                (⊥E)
    ¬A, D(x), ¬D(y) ⊢ D(y) ⇒ (∀y.D(y))             (⇒I)
    ¬A, D(x), ¬D(y) ⊢ ∃x.(D(x) ⇒ (∀y.D(y)))        (∃I)
    ¬A, D(x), ¬D(y) ⊢ ⊥                            (¬E)
    ¬A, D(x) ⊢ ¬¬D(y)                              (¬I)
    ¬A, D(x) ⊢ D(y)                                (¬¬E)
    ¬A, D(x) ⊢ ∀y.D(y)                             (∀I)
    ¬A ⊢ D(x) ⇒ (∀y.D(y))                          (⇒I)
    ¬A ⊢ ∃x.(D(x) ⇒ (∀y.D(y)))                     (∃I)
    ⊢ ∃x.(D(x) ⇒ (∀y.D(y)))                        (raa)

If we interpret x as ranging over the people present in a pub, and the predicate D(x) as "x drinks", this formula states that there is a "universal drinker", i.e. somebody such that if he drinks then everybody drinks. We can imagine why this formula cannot be proved intuitionistically: if it were, we should be able to come up with an explicit name for this person, see theorem 5.1.9.3, which seems impossible in the absence of further information on the pub. We do not actually prove that there exists x such that D(x), which would require us to come up with an explicit witness for x, but only show that it cannot be the case that there is no x satisfying D, which is enough to conclude by double negation elimination.

Example 5.1.7.2. Another formula provable in classical logic is the formula

    ¬(∀x.¬A(x)) ⇒ ∃x.A(x)

which states that if it is not the case that every element x fails to satisfy A(x), then we can actually produce an element which satisfies A(x). It can be proved as follows, again as a sequence of judgments:

    ¬∀x.¬A(x), ¬∃x.A(x), A(x) ⊢ A(x)             (ax)
    ¬∀x.¬A(x), ¬∃x.A(x), A(x) ⊢ ∃x.A(x)          (∃I)
    ¬∀x.¬A(x), ¬∃x.A(x), A(x) ⊢ ¬∃x.A(x)         (ax)
    ¬∀x.¬A(x), ¬∃x.A(x), A(x) ⊢ ⊥                (¬E)
    ¬∀x.¬A(x), ¬∃x.A(x) ⊢ ¬A(x)                  (¬I)
    ¬∀x.¬A(x), ¬∃x.A(x) ⊢ ∀x.¬A(x)               (∀I)
    ¬∀x.¬A(x), ¬∃x.A(x) ⊢ ¬∀x.¬A(x)              (ax)
    ¬∀x.¬A(x), ¬∃x.A(x) ⊢ ⊥                      (¬E)
    ¬∀x.¬A(x) ⊢ ¬¬∃x.A(x)                        (¬I)
    ¬∀x.¬A(x) ⊢ ∃x.A(x)                          (¬¬E)
    ⊢ ¬(∀x.¬A(x)) ⇒ ∃x.A(x)                      (⇒I)

As in example 5.1.7.1, it is enough to show that it is not the case that there is no x satisfying A.

Exercise 5.1.7.3. Another proof of the drinker formula of example 5.1.7.1 is the following. We have two possibilities for the pub:

– either everybody drinks: in this case, we can take anybody as the universal drinker,
– otherwise, there is someone who does not drink: we can take him as the universal drinker.

Formalize this reasoning in natural deduction.

De Morgan laws. In addition to the equivalences already shown in section 2.5.5, the following de Morgan laws hold in classical first-order logic:

    (∀x.A) ∧ B ⇔ ∀x.(A ∧ B)        B ∧ (∀x.A) ⇔ ∀x.(B ∧ A)
    (∀x.A) ∨ B ⇔ ∀x.(A ∨ B)        B ∨ (∀x.A) ⇔ ∀x.(B ∨ A)
    (∀x.A) ⇒ B ⇔ ∃x.(A ⇒ B)        B ⇒ (∀x.A) ⇔ ∀x.(B ⇒ A)
    (∃x.A) ∧ B ⇔ ∃x.(A ∧ B)        B ∧ (∃x.A) ⇔ ∃x.(B ∧ A)
    (∃x.A) ∨ B ⇔ ∃x.(A ∨ B)        B ∨ (∃x.A) ⇔ ∃x.(B ∨ A)
    (∃x.A) ⇒ B ⇔ ∀x.(A ⇒ B)        B ⇒ (∃x.A) ⇔ ∃x.(B ⇒ B)

whenever x ∉ FV(B), as well as

    ¬(∀x.A) ⇔ ∃x.¬A        ¬(∃x.A) ⇔ ∀x.¬A

Prenex form. A formula P is in prenex form when it is of the form

    P ::= ∀x.P | ∃x.P | A

where A is a formula which does not contain any first-order quantification: a formula in prenex form thus consists of a bunch of universal and existential quantifications over a formula without quantifications. By using the above de Morgan laws from left to right, one can show:

Lemma 5.1.7.4. Every formula is equivalent to a formula in prenex form.


Example 5.1.7.5. The formula of example 5.1.7.2 can be put into prenex form as follows:

    ¬(∀x.¬A(x)) ⇒ ∃x.A(x) ⇔ (∃x.¬¬A(x)) ⇒ ∃x.A(x)
                          ⇔ ∀x.(¬¬A(x) ⇒ ∃x.A(x))
                          = ∀x.(¬¬A(x) ⇒ ∃y.A(y))
                          ⇔ ∀x.∃y.(¬¬A(x) ⇒ A(y))

More de Morgan laws. In addition to the above equivalences, we also have

    ∀x.(A ∧ B) ⇔ (∀x.A) ∧ (∀x.B)        ∀x.⊤ ⇔ ⊤
    ∃x.(A ∨ B) ⇔ (∃x.A) ∨ (∃x.B)        ∃x.⊥ ⇔ ⊥

5.1.8 Sequent calculus rules. The rules for first-order quantifiers in classical sequent calculus are

    Γ, ∀x.A, A[t/x] ⊢ ∆
    ──────────────────── (∀L)
    Γ, ∀x.A ⊢ ∆

    Γ ⊢ A, ∆
    ───────────── (∀R)
    Γ ⊢ ∀x.A, ∆

    Γ, A ⊢ ∆
    ───────────── (∃L)
    Γ, ∃x.A ⊢ ∆

    Γ ⊢ A[t/x], ∃x.A, ∆
    ──────────────────── (∃R)
    Γ ⊢ ∃x.A, ∆

with the side condition, for (∀R) and (∃L), that x ∉ FV(Γ) ∪ FV(∆). Intuitionistic rules are obtained, as usual, by restricting to sequents with exactly one formula on the right:

    Γ, ∀x.A, A[t/x] ⊢ B
    ──────────────────── (∀L)
    Γ, ∀x.A ⊢ B

    Γ ⊢ A
    ───────── (∀R)
    Γ ⊢ ∀x.A

    Γ, A ⊢ B
    ───────────── (∃L)
    Γ, ∃x.A ⊢ B

    Γ ⊢ A[t/x]
    ─────────── (∃R)
    Γ ⊢ ∃x.A

with the expected side conditions for (∀R) and (∃L).

Remark 5.1.8.1. In the rules (∀L) and (∃R), we have been careful to keep a copy of the quantified formula: with this formulation, contraction is admissible.

Example 5.1.8.2. The drinker formula from example 5.1.4.2 can be proved classically by the following sequence of judgments:

    D(x), D(y) ⊢ D(y), ∀y.D(y), ∃x.(D(x) ⇒ (∀y.D(y)))         (ax)
    D(x) ⊢ D(y), D(y) ⇒ (∀y.D(y)), ∃x.(D(x) ⇒ (∀y.D(y)))     (⇒R)
    D(x) ⊢ D(y), ∃x.(D(x) ⇒ (∀y.D(y)))                        (∃R)
    D(x) ⊢ ∀y.D(y), ∃x.(D(x) ⇒ (∀y.D(y)))                     (∀R)
    ⊢ D(x) ⇒ (∀y.D(y)), ∃x.(D(x) ⇒ (∀y.D(y)))                 (⇒R)
    ⊢ ∃x.(D(x) ⇒ (∀y.D(y)))                                   (∃R)

As noted in the previous remark, we need to use the proved formula twice, and it is thus crucial that we keep a copy of it in the rule (∃R) at the bottom.


5.1.9 Cut elimination. The properties and proof techniques developed in section 2.3 extend to first-order natural deduction, allowing to prove that it has the cut elimination property:

Theorem 5.1.9.1. A sequent Γ ⊢ A admits a proof if and only if it admits a cut-free proof.

In the cut elimination procedure, there are two new cases, which can be handled as follows. A proof of the form

    π
    Γ ⊢ A(x)
    ──────────── (∀I)
    Γ ⊢ ∀x.A(x)
    ──────────── (∀E)
    Γ ⊢ A(t)

reduces to the proof π[t/x] of Γ ⊢ A(t), and a proof of the form

    π                           π'
    Γ ⊢ A(t)
    ──────────── (∃I)
    Γ ⊢ ∃x.A(x)                 Γ, A(x) ⊢ B
    ──────────────────────────────────────── (∃E)
    Γ ⊢ B

reduces to the proof π'[t/x][π/A] of Γ ⊢ B.

Above, π[t/x] denotes the proof π where all the free occurrences of the variable x have been replaced by the term t (details left to the reader). As in the case of propositional logic, it can be shown that a cut-free proof of a formula in an empty context necessarily ends with an introduction rule (proposition 2.3.3.2), from which we deduce (as in theorem 2.3.4.2):

Theorem 5.1.9.2 (Consistency). First-order (intuitionistic or classical) natural deduction is consistent: there is no proof of ⊢ ⊥.

Another important consequence is that the logic has the existence property: if we can prove that there exists a term satisfying some property, then we can actually construct such a term:

Theorem 5.1.9.3 (Existence property). A formula of the form ∃x.A is provable in intuitionistic first-order natural deduction if and only if there exists a term t such that A[t/x] is provable.

Proof. For the left-to-right implication, if we have a proof of ∃x.A then, by theorem 5.1.9.1, we have a cut-free one which, by proposition 2.3.3.2, ends with an introduction rule, i.e. is of the form

    π
    ⊢ A[t/x]
    ───────── (∃I)
    ⊢ ∃x.A

We therefore have a proof π of A[t/x] for some term t. The right-to-left implication is given by an application of the rule (∃I).

In contrast, we do not expect this property to hold in classical logic. For instance, consider the drinker formula of example 5.1.7.1. We can feel that the proof we have given is not constructive: there is no way of determining who the drinker is in general (i.e. without performing a reasoning specific to the bar in which we currently are).


5.1.10 Eigenvariables. The logic as we have presented it (which is the way it is traditionally presented) suffers from a defect: in a given sequent, we do not know which first-order variables are in use. This is the subtle cause of some surprising proofs. For instance, we can prove that there always exists a term, whereas the empty set would seem to be a perfectly reasonable way of interpreting the logic, in particular when the signature is empty:

    ⊢ ⊤           (⊤I)
    ⊢ ∃x.⊤        (∃I)

Note that in the premise of the (∃I), we use the fact that ⊤ = ⊤[x/x], i.e. we use x as a witness for the existence. A variation on the previous example is the following proof, which expresses the fact that if a property A is satisfied for every term x, then we can exhibit a term satisfying A. Again, we would have expected this to fail when there is no element in the model and, moreover, this does not feel very constructive:

    ∀x.A ⊢ ∀x.A               (ax)
    ∀x.A ⊢ A                  (∀E)
    ∀x.A ⊢ ∃x.A               (∃I)
    ⊢ (∀x.A) ⇒ ∃x.A           (⇒I)

Here also, in the premise of the (∃I) rule, we use the fact that A = A[x/x], i.e. we use x as a witness for the existence. We will see in section 5.2.3 that this is the reason why models are usually supposed to be non-empty, even though there is no good reason to exclude this particular case. In order to fix this, we should keep track of the variables declared in the context, which are sometimes called eigenvariables. This can be done by adding a new context Ξ to our sequents, which is a list of declared first-order variables. We thus consider sequents of the form

    Ξ | Γ ⊢ A

the vertical bar being there to mark the delimitation between the context of eigenvariables and the traditional context. The rules for logical connectives simply "propagate" the new context Ξ, e.g. the rules for conjunction become

    Ξ | Γ ⊢ A ∧ B
    ────────────── (∧lE)
    Ξ | Γ ⊢ A

    Ξ | Γ ⊢ A ∧ B
    ────────────── (∧rE)
    Ξ | Γ ⊢ B

    Ξ | Γ ⊢ A    Ξ | Γ ⊢ B
    ────────────────────── (∧I)
    Ξ | Γ ⊢ A ∧ B

More interestingly, the rules for first-order quantifiers become

    Ξ | Γ ⊢ ∀x.A
    ─────────────── (∀E)
    Ξ | Γ ⊢ A[t/x]

    Ξ, x | Γ ⊢ A
    ─────────────── (∀I)
    Ξ | Γ ⊢ ∀x.A

    Ξ | Γ ⊢ ∃x.A    Ξ, x | Γ, A ⊢ B
    ──────────────────────────────── (∃E)
    Ξ | Γ ⊢ B

    Ξ | Γ ⊢ A[t/x]
    ─────────────── (∃I)
    Ξ | Γ ⊢ ∃x.A

where we suppose


– x ∉ Ξ in (∀I) and (∃E),
– FV(t) ⊆ Ξ in (∀E) and (∃I).

Finally, the axiom rule and the truth introduction rule become

    Ξ | Γ, A, Γ' ⊢
    ──────────────── (ax)
    Ξ | Γ, A, Γ' ⊢ A

    Ξ | Γ ⊢
    ───────── (⊤I)
    Ξ | Γ ⊢ ⊤

where Ξ | Γ ⊢ is a notation meaning that we suppose FV(Γ) ⊆ Ξ. Supposing this for these two rules (which are the only two without premises) is enough to ensure that, whenever we prove a sequent Ξ | Γ ⊢ A, we always have FV(Γ) ∪ FV(A) ⊆ Ξ (it is easy to check that the inference rules preserve this invariant).

Example 5.1.10.1. We can still prove (∀x.A) ⇒ ∀x.A in this new system:

    x | ∀x.A ⊢ ∀x.A             (ax)
    x | ∀x.A ⊢ A[x/x]           (∀E)
    | ∀x.A ⊢ ∀x.A               (∀I)
    | ⊢ (∀x.A) ⇒ ∀x.A           (⇒I)

Example 5.1.10.2. We cannot prove ∃x.⊤ in this system. In particular, the proof

    | ⊢ ⊤[x/x]        (⊤I)
    | ⊢ ∃x.⊤          (∃I)

is not valid because the side condition of the rule (∃I) is not satisfied.

Exercise 5.1.10.3. Show that the formula (∀x.⊥) ⇒ ⊥ is provable with the traditional rules, but not with the rules presented in this section.

5.1.11 Curry-Howard. The Curry-Howard correspondence can be extended to first-order logic, following the intuition that

– a proof of ∀x.A should be a function which, when applied to a term t, returns a proof that A is valid for this term,
– a proof of ∃x.A should be a pair consisting of a term t and a proof that A is valid for this term.

Expressions. We begin with the language for proofs introduced in chapter 4, the simply typed λ-calculus. In this section, we call its terms expressions in order not to confuse them with first-order terms, and write e for an expression. The syntax for expressions is thus

    e, e' ::= λx^A.e | e e' | ...

In order to account for first-order logic, we extend them with the following constructions:

    e ::= ... | λ∀x.e | e t | ⟨t, e⟩ | unpair(e, xy ↦ e')

The newly added constructions are


– λ∀x.e: a function taking a term argument x and returning an expression e,
– e t: the application of an expression (typically a function as above) to a term t,
– ⟨t, e⟩: a pair consisting of a term t and an expression e,
– unpair(e, xy ↦ e'): the extraction of the components x and y of a pair e for use in an expression e', which would be written, in a syntax closer to OCaml,

    let ⟨x, y⟩ = e in e'

We insist on the fact that there are two kinds of abstractions, respectively written λ and λ∀, and two kinds of applications, which are distinct constructions. Similarly, for products, there are two kinds of pairings and of eliminators. Although they behave similarly, they are entirely distinct constructions. However, we will manage to unify those constructions when going to dependent types in chapter 8: there will be one kind of abstraction (resp. pairing) which covers both cases.

Typing rules. The associated typing rules are

    Γ ⊢ e : A
    ────────────────── (∀I)
    Γ ⊢ λ∀x.e : ∀x.A

    Γ ⊢ e : ∀x.A
    ────────────────── (∀E)
    Γ ⊢ e t : A[t/x]

    Γ ⊢ e : A[t/x]
    ─────────────────── (∃I)
    Γ ⊢ ⟨t, e⟩ : ∃x.A

    Γ ⊢ e : ∃x.A    Γ, y : A ⊢ e' : B
    ───────────────────────────────── (∃E)
    Γ ⊢ unpair(e, xy ↦ e') : B

and can be read as follows:

– (∀I): a proof of ∀x.A is a function which takes a term x as argument and returns a proof of A,
– (∀E): using a proof of ∀x.A consists in applying it to a term t,
– (∃I): a proof of ∃x.A(x) is a pair consisting of a term t and a proof that A(t) is satisfied,
– (∃E): we can use a proof of ∃x.A by extracting its components.

Example 5.1.11.1. Consider again the derivation of example 5.1.6.1. It can be decorated with terms as follows, presented as a sequence of judgments:

    f : ∀x.¬A, e : ∃x.A ⊢ e : ∃x.A                                      (ax)
    f : ∀x.¬A, e : ∃x.A, a : A ⊢ f : ∀x.¬A                              (ax)
    f : ∀x.¬A, e : ∃x.A, a : A ⊢ f x : ¬A                               (∀E)
    f : ∀x.¬A, e : ∃x.A, a : A ⊢ a : A                                  (ax)
    f : ∀x.¬A, e : ∃x.A, a : A ⊢ f x a : ⊥                              (¬E)
    f : ∀x.¬A, e : ∃x.A ⊢ unpair(e, xa ↦ f x a) : ⊥                     (∃E)
    f : ∀x.¬A ⊢ λe^{∃x.A}.unpair(e, xa ↦ f x a) : ¬(∃x.A)               (¬I)
    ⊢ λf^{∀x.¬A}.λe^{∃x.A}.unpair(e, xa ↦ f x a) : (∀x.¬A) ⇒ ¬(∃x.A)    (⇒I)

CHAPTER 5. FIRST-ORDER LOGIC

232

The corresponding proof term is thus

    λf^{∀x.¬A}.λe^{∃x.A}.unpair(e, xa ↦ f x a)

This function takes two arguments:

– f of type ∀x.(A → ⊥), and
– e of type ∃x.A

and produces a value of type ⊥ by extracting from e a term x and a proof a of A, and applying f to x and a.

Reduction. As usual, the β-reduction rules correspond to cut-elimination steps: the proof

    π
    Γ ⊢ e : A
    ──────────────────────── (∀I)
    Γ ⊢ λ∀x.e : ∀x.A
    ──────────────────────── (∀E)
    Γ ⊢ (λ∀x.e) t : A[t/x]

reduces to the proof π[t/x] of Γ ⊢ e[t/x] : A[t/x], i.e.

    (λ∀x.e) t −→β e[t/x]

and the proof

    π                            π'
    Γ ⊢ e : A[t/x]
    ─────────────────── (∃I)
    Γ ⊢ ⟨t, e⟩ : ∃x.A            Γ, y : A ⊢ e' : B
    ─────────────────────────────────────────────── (∃E)
    Γ ⊢ unpair(⟨t, e⟩, xy ↦ e') : B

reduces to the proof π'[t/x][π/A] of Γ ⊢ e'[t/x, e/y] : B, i.e.

    unpair(⟨t, e⟩, xy ↦ e') −→β e'[t/x, e/y]

where π[t/x] is the proof obtained from π by replacing all free occurrences of x by t (details left to the reader). Similarly, the η-reduction rules correspond to eliminating duals of cuts: the proof

    π
    Γ ⊢ e : ∀x.A
    ─────────────────── (∀E)
    Γ ⊢ e x : A[x/x]
    ─────────────────── (∀I)
    Γ ⊢ λ∀x.e x : ∀x.A

reduces to the proof π of Γ ⊢ e : ∀x.A, i.e.

    λ∀x.e x −→η e

and the proof

    π
    Γ ⊢ e : ∃x.A                 Γ, y : A ⊢ y : A  (ax)
    ──────────────────────────────────────────────── (∃E)
    Γ ⊢ unpair(e, xy ↦ y) : A
    ──────────────────────────────────── (∃I)
    Γ ⊢ ⟨x, unpair(e, xy ↦ y)⟩ : ∃x.A

reduces to the proof π of Γ ⊢ e : ∃x.A,


i.e.

    ⟨x, unpair(e, xy ↦ y)⟩ −→η e

5.2 Theories

A first-order theory Θ, on a given signature and set of predicates, is a (possibly infinite) set of closed formulas called axioms. A formula A is provable in a theory Θ if there is a finite subset Γ ⊆ Θ such that Γ ⊢ A is provable. Unless otherwise specified, the ambient first-order logic is usually taken to be classical when considering first-order theories.

5.2.1 Equality. We often consider theories with equality. This means that we suppose that we have a predicate "=" of arity 2, together with the axioms

    ∀x.x = x
    ∀x.∀y.x = y ⇒ y = x
    ∀x.∀y.∀z.x = y ⇒ y = z ⇒ x = z

and, for every function symbol f of arity n, an axiom

    ∀x1.∀x'1. ... ∀xn.∀x'n. x1 = x'1 ⇒ ... ⇒ xn = x'n ⇒ f(x1, ..., xn) = f(x'1, ..., x'n)

and, for every predicate P of arity n, an axiom

    ∀x1.∀x'1. ... ∀xn.∀x'n. x1 = x'1 ⇒ ... ⇒ xn = x'n ⇒ P(x1, ..., xn) ⇒ P(x'1, ..., x'n)

These are sometimes called the congruence axioms.

Example 5.2.1.1. The theory of groups is the theory with equality over the signature Σ = {× : 2, 1 : 0} whose axioms are

    ∀x.1 × x = x
    ∀x.x × 1 = x
    ∀x.∀y.∀z.(x × y) × z = x × (y × z)
    ∀x.∃y.y × x = 1
    ∀x.∃y.x × y = 1

together with the axioms for equality

    ∀x.x = x
    ∀x.∀y.x = y ⇒ y = x
    ∀x.∀y.∀z.x = y ⇒ y = z ⇒ x = z
    ∀x.∀x'.∀y.∀y'.x = x' ⇒ y = y' ⇒ x × y = x' × y'
    1 = 1

5.2.2 Properties of theories. A theory is

– consistent when ⊥ is not provable in the theory,
– complete when, for every formula A, either A or ¬A is provable in the theory,
– decidable when there is an algorithm which, given a formula A, decides whether A is provable in the theory or not.


5.2.3 Models. Theories are thought of as denoting structures made of sets and functions satisfying axioms. For instance, the theory of groups of example 5.2.1.1 can be seen as a syntax for groups in the traditional sense. These structures are called models of the theory and we very briefly recall those here. We do not even scratch the surface of model theory, and the reader interested in knowing more is urged to read some standard textbook on the subject, such as [CK90].

Structure. Suppose given a signature Σ and a set P of predicates. A structure M consists of
– a non-empty set M called the domain of the structure,
– a function ⟦f⟧ : Mⁿ → M for every function symbol f ∈ Σ of arity n,
– a relation ⟦P⟧ ⊆ Mⁿ for every predicate symbol P ∈ P of arity n.

Interpretation. Suppose fixed such a structure. Given k ∈ ℕ and a term t whose free variables are among {x₁, ..., xₖ}, we define its interpretation as the function ⟦t⟧ₖ : Mᵏ → M defined by induction: ⟦xᵢ⟧ₖ : Mᵏ → M is the canonical i-th projection and, for every function symbol f of arity n and (m₁, ..., mₖ) ∈ Mᵏ,

    ⟦f(t₁, ..., tₙ)⟧ₖ(m₁, ..., mₖ) = ⟦f⟧(⟦t₁⟧ₖ(m₁, ..., mₖ), ..., ⟦tₙ⟧ₖ(m₁, ..., mₖ))

where ⟦f⟧ is given by the structure and the ⟦tᵢ⟧ₖ are computed inductively. In other words, the interpretation of terms is the only extension of the structure which is compatible with composition.

Given k ∈ ℕ and a formula A whose free variables are among {x₁, ..., xₖ}, we define its interpretation ⟦A⟧ₖ as the subset of Mᵏ defined inductively as follows:

    ⟦⊥⟧ₖ = ∅                    ⟦⊤⟧ₖ = Mᵏ
    ⟦A ∧ B⟧ₖ = ⟦A⟧ₖ ∩ ⟦B⟧ₖ      ⟦A ∨ B⟧ₖ = ⟦A⟧ₖ ∪ ⟦B⟧ₖ
    ⟦¬A⟧ₖ = Mᵏ \ ⟦A⟧ₖ           ⟦A ⇒ B⟧ₖ = ⟦¬A ∨ B⟧ₖ

together with

    ⟦∀xₖ₊₁.A⟧ₖ = ⋂_{m∈M} {(m₁, ..., mₖ) ∈ Mᵏ | (m₁, ..., mₖ, m) ∈ ⟦A⟧ₖ₊₁}

and

    ⟦∃xₖ₊₁.A⟧ₖ = ⋃_{m∈M} {(m₁, ..., mₖ) ∈ Mᵏ | (m₁, ..., mₖ, m) ∈ ⟦A⟧ₖ₊₁}

The interpretation of A is thus intuitively the set of values in M for its free variables making it true.
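To make the interpretation of terms concrete, here is a minimal OCaml sketch (an addition to the text; the term representation anticipates the one of section 5.4.4, and the structure, with domain the integers modulo 5 and the symbols zero, succ and add, is made up for the example):

    (* Terms: variables are numbered 1, ..., k. *)
    type term =
      | Var of int
      | App of string * term list

    (* A structure with domain M = {0, ..., 4}: the integers modulo 5,
       interpreting each function symbol of the signature. *)
    let interp_fun f args =
      match f, args with
      | "zero", [] -> 0
      | "succ", [m] -> (m + 1) mod 5
      | "add", [m; n] -> (m + n) mod 5
      | _ -> failwith "unknown function symbol"

    (* The interpretation ⟦t⟧ₖ : Mᵏ → M, where env lists the values
       m₁, ..., mₖ of the variables x₁, ..., xₖ. *)
    let rec interp env = function
      | Var i -> List.nth env (i - 1)
      | App (f, tt) -> interp_fun f (List.map (interp env) tt)

    (* Interpreting add(x₁, succ(zero)) with x₁ := 3 yields 4. *)
    let () =
      assert (interp [3] (App ("add", [Var 1; App ("succ", [App ("zero", [])])])) = 4)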


Satisfaction for closed formulas. Given a closed formula A, its interpretation ⟦A⟧₀ is a subset of M⁰ = {()}, which is a set with one element, conventionally denoted (). There are therefore two possible values for ⟦A⟧₀: ∅ and {()}. In the second case, we say that the formula A is satisfied in the structure.

Model. A structure is a model of a theory Θ when each formula in Θ is satisfied in the structure.

Example 5.2.3.1. Consider the theory of groups (example 5.2.1.1). A structure consists of
– a set M,
– a function ⟦×⟧ : M² → M,
– a constant ⟦1⟧ : M⁰ → M, i.e. an element of M,
– a relation ⟦=⟧ ⊆ M × M.
We say that such a structure has strict equality when the interpretation of equality is the diagonal relation

    ⟦=⟧ = {(m, m) | m ∈ M}

Such a structure M is a model of the theory of groups, i.e. is a model for all its axioms, precisely if (M, ⟦×⟧, ⟦1⟧) is a group in the traditional sense; conversely, every group gives rise to a model where equality is interpreted in this way: the models with strict equality of the theory of groups are precisely groups.

Remark 5.2.3.2. As can be seen in the previous example, it is often useful to restrict to models with strict equality. Since equality is always a congruence (because of the axioms imposed in section 5.2.1), from any model we can construct a model with strict equality by quotienting the model under the relation interpreting equality, so that this assumption is not very restrictive.

Validity. A sequent y₁ : A₁, ..., yₙ : Aₙ ⊢ A is satisfied in M if, for every k ∈ ℕ such that the free variables of the sequent are in {x₁, ..., xₖ}, we have

    ⟦A₁⟧ₖ ∩ ... ∩ ⟦Aₙ⟧ₖ ⊆ ⟦A⟧ₖ

which is equivalent to requiring that ⟦¬(A₁ ∧ ... ∧ Aₙ ⇒ A)⟧ₖ is empty. A sequent is valid when it is satisfied in every model.

Correctness. We can now formally state the fact that our notion of semantics is compatible with our logical system.

Theorem 5.2.3.3 (Correctness). Every derivable sequent is valid.

Proof. By induction on the derivation of the sequent.

The above theorem has the following important particular case:

CHAPTER 5. FIRST-ORDER LOGIC

236

Corollary 5.2.3.4. For every theory Θ and closed formula A such that Θ ⊢ A is derivable, every model of Θ is also a model of A.

Example 5.2.3.5. In the theory of groups, one can show

    ∀x.∀y.∀y′.x × y = 1 ⇒ y′ × x = 1 ⇒ y = y′

by formalizing the following sequence of implications of equalities:

    x × y = 1
    y′ × (x × y) = y′ × 1
    y′ × (x × y) = y′
    (y′ × x) × y = y′
    1 × y = y′
    y = y′

By correctness, it holds in every group: a left inverse of an element coincides with any right inverse of the same element.

The contrapositive of the above corollary is also quite useful:

Corollary 5.2.3.6. For every theory Θ and closed formula A, if there exists a model of Θ which is not a model of A then Θ ⊢ A is not derivable.

Example 5.2.3.7. In the theory of groups, consider the formula

    ∀x.∀y.x × y = y × x

We know that there exist non-abelian groups, for instance the symmetric group S₃ on 3 elements. Such a non-abelian group being a model of the theory of groups but not of the above formula, we can conclude that this formula cannot be deduced in the theory of groups.

Finally, a major consequence of the theorem is the following. A theory is satisfiable when it admits a model.

Proposition 5.2.3.8. A satisfiable theory is consistent.

Proof. Suppose that Θ is a theory with a model M. If we had Θ ⊢ ⊥ then, by corollary 5.2.3.4, we would have that M is a model of ⊥, which it is not since ⟦⊥⟧ = ∅ by definition.

Remark 5.2.3.9. As explained in section 5.1.10, the handling of first-order variables in traditional first-order logic is not entirely satisfactory: we do not keep track of the free variables we use. This is why we have to have k (the number of first-order variables) as a parameter of the interpretation. This is also why we need to restrict to non-empty domains in structures. Namely, the sequent ⊢ ∃x.⊤ is always derivable and it would not be satisfied in a structure with an empty domain. See section 5.1.10 for a solution to this issue.

Skolemization. Suppose fixed a signature Σ and a set P of predicates. A formula on (Σ, P) of the form

    ∀x.∃y.A(x, y)

states that for every element x there exists a y such that A(x, y) is satisfied. When this formula admits a model, we can construct a function f which to every x associates one of the associated y. It thus implies that the formula

    ∀x.A(x, f(x))

on (Σ′, P) is also satisfiable, where the signature Σ′ is Σ extended with a symbol f of arity one. By a similar reasoning, one shows that the satisfiability of the second formula implies the satisfiability of the first one. The two formulas are thus equisatisfiable: one is satisfiable if and only if the other is. This process of "replacing existential quantifications by function symbols" is due to Skolem: it allows replacing a theory by an equisatisfiable theory whose axioms do not contain existential quantifications, see section 5.4.6. More generally, given a formula on (Σ, P) of the form

    ∀x₁. ... ∀xₙ.∃y.A

a skolemization of it is the formula

    ∀x₁. ... ∀xₙ.A[f(y₁, ..., yₘ)/y]

on the signature (Σ′, P), where FV(∃y.A) = {y₁, ..., yₘ} and Σ′ is Σ extended with a fresh symbol f of arity m.

Proposition 5.2.3.10. A formula of the form ∀x₁. ... ∀xₙ.∃y.A is satisfiable if and only if its skolemization is.

Example 5.2.3.11. In the theory of groups of example 5.2.1.1, we can skolemize the axiom ∀x.∃y.y × x = 1. This forces us to introduce a new unary function symbol i (which will be the function which to an element associates its inverse) and reformulate the axiom as ∀x.i(x) × x = 1.

Remark 5.2.3.12. If we allowed performing this process for any existential quantification in a formula, proposition 5.2.3.10 would not be true. For instance, the formula ¬(∃x.g(x) = x) is satisfiable when g has no fixpoint. If we "skolemize" it, we obtain the formula ¬(g(f()) = f()) where f is a fresh nullary function symbol: this formula is satisfiable when there is an element which is not a fixpoint of g. The two are thus not equisatisfiable.

5.2.4 Presburger arithmetic. Presburger arithmetic axiomatizes addition over the natural numbers. It is the theory with equality over the signature Σ = {0 : 0, S : 1, + : 2} whose axioms are those for equality together with

    ∀x.0 = S(x) ⇒ ⊥
    ∀x.∀y.S(x) = S(y) ⇒ x = y
    ∀x.0 + x = x
    ∀x.∀y.S(x) + y = S(x + y)

together with, for every formula A(x) with one free variable x, an axiom

    A(0) ⇒ (∀x.A(x) ⇒ A(S(x))) ⇒ ∀x.A(x)

Such an infinite family of axioms is sometimes called an axiom scheme; here, it expresses the recurrence principle.


Example 5.2.4.1. For instance, ∀x.x + 0 = x can be proved by recurrence on x. Namely, consider the formula A(x) being x + 0 = x. We have
– A(0): 0 + 0 = 0,
– supposing A(x), we have A(S(x)): namely, S(x) + 0 = S(x + 0) = S(x).

This theory was shown by Presburger to be consistent, complete and decidable [Pre29]. In the worst case, any decision algorithm has complexity O(2^(2^(cn))) with respect to the size n of the formula to decide [FR98], although the procedure remains useful in practice (it is for example implemented in the tactic omega of Coq). The theory is also very weak: for instance, one cannot define the multiplication function in it (if we could, the theory would not be decidable, see the next section).

5.2.5 Peano and Heyting arithmetic. Peano arithmetic, often written PA, extends Presburger arithmetic by also axiomatizing multiplication. It is the theory with equality on the signature Σ = {0 : 0, S : 1, + : 2, × : 2} whose axioms are those of equality, those of Presburger arithmetic, and

    ∀x.0 × x = 0
    ∀x.∀y.S(x) × y = y + (x × y)

This theory is implicitly understood with an ambient classical first-order logic. When the logic is intuitionistic, the theory is called Heyting arithmetic (or HA).

Exercise 5.2.5.1. In HA, prove ∀x.x + 0 = x.

Consistency. The second of Hilbert's list of 23 problems, posed in 1900, consisted in showing that Peano arithmetic is consistent, i.e. cannot be used to prove ⊥, or equivalently that 0 = S(0) cannot be proved. A natural reaction would be to use corollary 5.2.3.4 and build a model for this theory, whose existence would imply its consistency, and there is an obvious model: the set ℕ of natural numbers with the usual zero, successor, addition and multiplication functions. However, the usual construction of this set of natural numbers is itself performed inside (models of) set theory (see section 5.3), which is a much stronger theory. All this would prove is that if set theory is consistent then Peano arithmetic is consistent, which is like proving that if we have a nuclear bomb then we can kill a fly. This is why people first hoped to prove the consistency of Peano arithmetic in theories as weak as possible and, why not, in Peano arithmetic itself. However, in 1931, Gödel showed in his second incompleteness theorem that Peano arithmetic cannot prove its own consistency [Göd31] (unless it is inconsistent). The cut-elimination procedure was then introduced by Gentzen in 1936, precisely in order to show the consistency of Heyting arithmetic, using methods similar to those of theorem 5.1.9.2, although the proof is more involved due to the presence of the axioms of the theory; from this, one can deduce the consistency of Peano arithmetic using double-negation translations of Peano arithmetic into Heyting arithmetic, as in section 2.5.9, see [Gen36].

Induction up to ε₀. Gentzen's proof brings no contradiction with Gödel's theorem, because this proof (or more precisely the proof that the cut-elimination procedure terminates) requires more than the recurrence principle: we need a


    (** Finite rooted trees. *)
    type tree = T of tree list

    (** Lexicographic extension of an order. *)
    let rec lex le l1 l2 =
      match l1, l2 with
      | x::l1, y::l2 -> if x = y then lex le l1 l2 else le x y
      | [], _ -> true
      | _, [] -> false

    (** Order on trees. *)
    let rec le t1 t2 =
      match t1, t2 with
      | T l1, T l2 ->
        (* sort the sons in decreasing order, as required below *)
        let cmp t1 t2 = if t1 = t2 then 0 else if le t1 t2 then 1 else -1 in
        let l1 = List.sort cmp l1 in
        let l2 = List.sort cmp l2 in
        lex le l1 l2

Figure 5.1: ε₀ in OCaml.

transfinite induction up to the ordinal ε₀ (which is ω to the power ω to the power ω and so on, i.e. ε₀ = ω^ε₀). In other words, while recurrence only requires us to believe that the set of natural numbers is well-founded, transfinite induction up to ε₀ requires believing that the following order on trees is well-founded. By a classical result in ordinal arithmetic (which we cannot detail here), any ordinal α < ε₀ can be uniquely written as

    α = ω^β₁ + ... + ω^βₙ

where α > β₁ ⩾ ... ⩾ βₙ are ordinals; this is called the Cantor normal form of α, each of the βᵢ having a similar normal form. Such an ordinal α can thus be represented as a planar rooted tree, with one root and n sons, which are the trees corresponding to the βᵢ. For instance, the ordinals ω^(ω³+1) + 2 and ω^(ω³) · 3 + 2 respectively correspond to the trees

[Figure: the two trees corresponding to ω^(ω³+1) + 2 and ω^(ω³) · 3 + 2, the left one being greater (>) than the right one; the labels x, y and z mark a leaf, its parent and its grandparent, as used in the Hydra game below.]

These trees can be compared by lexicographically comparing the sons of the root (which are supposed to be ordered decreasingly), so that, for instance, the tree on the left above is greater than the one on the right. An implementation of this order is provided in figure 5.1.
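For instance (a small sanity check, added here, using the definitions of figure 5.1), encoding the ordinals 0, 1 = ω⁰, ω = ω^(ω⁰) and ω + 1 as trees, the order behaves as expected:

    let zero = T []                     (* the ordinal 0 *)
    let one = T [zero]                  (* ω⁰ = 1 *)
    let omega = T [one]                 (* ω^(ω⁰) = ω *)
    let omega_plus_one = T [one; zero]  (* ω¹ + ω⁰ *)

    let () =
      assert (le omega omega_plus_one);
      assert (not (le omega_plus_one omega))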


This order can also be interpreted using the following Hydra game on trees [KP82]. This game with two players starts with a tree as above and, at each turn,
– the first player removes a leaf x (a node without sons) of the tree,
– the second player chooses a number n, looks for the parent y of x and the parent z of y (it does nothing if no such parents exist), and adds n copies of the tree rooted at y as new children of z.
The game stops when the tree is reduced to its root. We now see where the game draws its name from: the first player cuts a head of the Hydra, but in response the Hydra grows many new heads! For instance, in the figure above, the tree on the right is obtained from the one on the left after one round. Given trees α and β, it can be shown that α > β if and only if β can be obtained after some finite number of rounds of the game starting from α. Believing that ε₀ is well-founded is thus equivalent to believing that every such game will necessarily end (try it, to convince yourself that it always does!).

Undecidability. Finally, we would like to mention that Peano arithmetic is also undecidable, which was shown by Turing [Tur37]. Namely, a sequence of configurations of a Turing machine can be suitably encoded as an integer, so that one can write a formula expressing that a given natural number encodes such a sequence which ends on an accepting configuration. From there, it is easy to construct a formula expressing the fact that the machine does not halt, and such formulas cannot be decided, otherwise we would decide the halting problem.

5.3 Set theory

Set theory is a first-order theory whose intended models are sets. Everything is a set there; in particular, the elements of sets are themselves sets. This theory was defined at the beginning of the 20th century while looking for axiomatic foundations of mathematics. We only briefly touch on the subject and refer to standard textbooks [Kri98, Deh17] for more details.

5.3.1 Naive set theory. Naive set theory is the theory with a binary predicate "∈" and the following axiom scheme, called unrestricted comprehension:

    ∃y.∀x.x ∈ y ⇔ A

for every formula A with x as only free variable. Informally, this states, for every property A(x), the existence of a set y = {x | A(x)} of elements x satisfying the property A(x). This theory is surprisingly simple and works surprisingly well: we can perform all the usual constructions:
– the empty set is {x | ⊥},
– the union of two sets is x ∪ y = {z | z ∈ x ∨ z ∈ y},
– the intersection of two sets is x ∩ y = {z | z ∈ x ∧ z ∈ y},


– the product of two sets is x × y = {(i, j) | i ∈ x ∧ j ∈ y} with the notation (i, j) = {{i}, {i, j}},
– the inclusion of two sets is x ⊆ y = ∀z.z ∈ x ⇒ z ∈ y,
– the powerset is P(x) = {y | y ⊆ x},
and so on.

Russell's paradox. There is only a "slight" problem with this theory: it is inconsistent, meaning that we can in fact prove any formula, which explains why everything was so simple. This was first formalized by Russell in 1901, using what is nowadays known as the Russell paradox, which goes as follows. Consider the property

    A = ¬(x ∈ x)

The unrestricted comprehension scheme ensures the existence of a set y such that

    ∀x.x ∈ y ⇔ ¬(x ∈ x)

In particular, for x being y, we have

    y ∈ y ⇔ ¬(y ∈ y)

In classical logic, we can easily derive an inconsistency:
– if y ∈ y then ¬(y ∈ y) and therefore we can prove ⊥,
– if ¬(y ∈ y) then y ∈ y and therefore we can prove ⊥.

Russell's paradox in intuitionistic logic. This proof can be thought of as reasoning by case analysis on whether y ∈ y is true or not and, as such, it seems that it is not intuitionistically valid because we are using the excluded middle. However, it can also be considered as a valid intuitionistic proof: namely, the two cases amount to
– proving ¬(y ∈ y), and
– proving ¬¬(y ∈ y),
from which we can deduce ⊥. More generally, for any formula A, one can show intuitionistically that

    (A ⇔ ¬A) ⇒ ⊥

Namely, the equivalent formula

    (A ⇒ ¬A) ⇒ (¬A ⇒ A) ⇒ ⊥

can be proved as follows, listing the sequents from the leaves to the conclusion:

    Γ, ¬A ⊢ ¬A ⇒ A  (ax)      Γ, ¬A ⊢ ¬A  (ax)
    Γ, ¬A ⊢ A  (⇒E)
    Γ, ¬A ⊢ ⊥  (¬E, with Γ, ¬A ⊢ ¬A  (ax))
    Γ ⊢ ¬¬A  (¬I)

    Γ, A ⊢ A ⇒ ¬A  (ax)       Γ, A ⊢ A  (ax)
    Γ, A ⊢ ¬A  (⇒E)
    Γ, A ⊢ ⊥  (¬E, with Γ, A ⊢ A  (ax))
    Γ ⊢ ¬A  (¬I)

    Γ ⊢ ⊥  (¬E, from Γ ⊢ ¬¬A and Γ ⊢ ¬A)
    ⊢ (A ⇒ ¬A) ⇒ (¬A ⇒ A) ⇒ ⊥  (⇒I, twice)


with Γ = A ⇒ ¬A, ¬A ⇒ A, whose corresponding proof term is

    λf^{A⇒¬A}.λg^{¬A⇒A}.(λx^{¬A}.x (g x)) (λa^A.f a a)

Another, more symmetrical, proof term for the same formula is

    λf^{A⇒¬A}.λg^{¬A⇒A}.f (g (λa^A.f a a)) (g (λa^A.f a a))

In both cases, note that we recover the looping term Ω = (λx.x x)(λx.x x) if we apply the term to the identity twice.

The so-called Curry paradox is the following slight generalization of the above formula

    (A ⇔ (A ⇒ B)) ⇒ B

and can be shown using the same λ-terms.

Size issues. The problem with naive set theory is due to size: the collection of all sets is "too big" to actually form a set. Once this issue was identified, subsequent attempts at formalizing set theory have struggled to take it into account. The collection of all sets should not be a set, and therefore we cannot consider the set of all sets satisfying a given property, such as not belonging to themselves...

Other paradoxes. Other paradoxes can be used to show the inconsistency of naive set theory. For instance, an argument based on Cantor's theorem is the following one. Suppose that there exists a set u of all sets. Every subset x of u is a set, and thus an element of u. In this way, we can construct an injection from the powerset of u to u, which is excluded by Cantor's diagonal argument, see appendix A.4. Another classical paradox is the one of Burali-Forti, presented in section 8.2.3.

5.3.2 Zermelo-Fraenkel set theory. The above observations lead to a refined axiomatization of set theory, the most popular one being Zermelo-Fraenkel set theory, or ZF [Zer08]. We make a very brief presentation of it here, mostly discussing the axiom of choice. This is the classical first-order theory with equality, with a binary predicate ∈, whose axioms are the following.

Axiom of extensionality. This axiom states that two sets with the same elements are equal:

    ∀x.∀y.((∀z.z ∈ x ⇔ z ∈ y) ⇒ x = y)

If we introduce the notation x ⊆ y for the formula ∀z.z ∈ x ⇒ z ∈ y, which expresses that the set x is included in the set y, the axiom of extensionality can be rephrased as

    ∀x.∀y.(x ⊆ y ∧ y ⊆ x) ⇒ x = y

i.e. two sets are equal precisely when they have the same elements.


Axiom of union. This axiom states that the union of the elements of a set is still a set:

    ∀x.∃y.∀i.(i ∈ y ⇔ ∃z.(i ∈ z ∧ z ∈ x))

In more usual notation, this states the existence, for every set x, of the set

    y = ⋃x = ⋃_{z∈x} z

In particular, we can construct the union of two sets x and y as

    x ∪ y = ⋃{x, y}

where the set {x, y} is constructed using the axiom schema of replacement, see below.

Axiom of powerset. This axiom states that, given a set x, there is a set whose elements are precisely the subsets of x, usually called the powerset of x and denoted P(x):

    ∀x.∃y.∀z.(z ∈ y ⇔ (∀i.i ∈ z ⇒ i ∈ x))

In more usual notation,

    ∀x.∃y.∀z.(z ∈ y ⇔ z ⊆ x)

i.e. we can construct the set y = P(x) = {z | z ⊆ x}.

Axiom of infinity. The axiom of infinity states the existence of an infinite set:

    ∃x.(∅ ∈ x ∧ ∀y.y ∈ x ⇒ S(y) ∈ x)

where the empty set ∅ is defined using the axiom schema of replacement below and S(y) = y ∪ {y} is the successor of a set. A set is called inductive when it contains the empty set and is closed under successor: the axiom states the existence of an inductive set. In particular, the set ℕ of natural numbers can be constructed as the intersection of all inductive sets. Here, the natural numbers are encoded following the von Neumann convention:

    0 = ∅
    1 = 0 ∪ {0} = {∅}
    2 = 1 ∪ {1} = {∅, {∅}}
    ...

and, more generally, n + 1 = n ∪ {n}. The definition immediately implies the following principle of induction: every inductive subset of the natural numbers is the set of natural numbers.

Axiom schema of replacement. This axiom states that the image of a set under a partial function is a set:

    (∀i.∀j.∀j′.(A ∧ A[j′/j] ⇒ j = j′)) ⇒ ∀x.∃y.∀j.(j ∈ y ⇔ ∃i.(i ∈ x ∧ A))

where A is any formula such that j′ ∉ FV(A) (but A might contain i or j or other free variables). This is thus an axiom schema: it is an infinite family of axioms, one for each such formula A.


For simplicity, we consider the case where the formula contains only i and j as free variables, and is thus denoted A(i, j). In this case the axiom reads as

    (∀i.∀j.∀j′.(A(i, j) ∧ A(i, j′) ⇒ j = j′)) ⇒ ∀x.∃y.∀j.(j ∈ y ⇔ ∃i.(i ∈ x ∧ A(i, j)))

The formula A encodes a relation: a set i is in relation with a set j when A(i, j) is true. In particular, the relation corresponds to a partial function when every element i is in relation with at most one element j, i.e.

    ∀i.∀j.∀j′.(A(i, j) ∧ A(i, j′) ⇒ j = j′)

Namely, such a relation corresponds to the partial function f from sets to sets such that f(i) is the unique j such that A(i, j), should there exist one, and is undefined otherwise. Then, our axiom states that, given a set x, we can construct the set

    y = {j | ∃i ∈ x.A(i, j)}

of its images under f. For instance, the empty set ∅ is defined as the set y obtained in this way from any set x (and there exists one by the axiom of infinity) using the nowhere-defined function, which can be encoded as the relation A = ⊥:

    ∅ = {j | ∃i ∈ x.⊥}

Given two sets x and y, we can construct the set {x, y} as the image of the partial function over the natural numbers sending 0 (i.e. ∅) to x and 1 (i.e. {∅}) to y:

    {x, y} = {j | ∃i ∈ ℕ.(i = 0 ∧ j = x) ∨ (i = 1 ∧ j = y)}

and we can similarly construct a set containing any given finite family of sets. Given two sets x and y, we can construct their intersection as

    x ∩ y = {j | ∃i ∈ x ∪ y.j = i ∧ i ∈ x ∧ i ∈ y}

Given two sets x and y, we can encode a pair of elements i₁ ∈ x and i₂ ∈ y as (i₁, i₂) = {{i₁}, {i₁, i₂}}, which is an element of P(P(x ∪ y)), and thus construct the product of the two sets as

    x × y = {j | ∃i ∈ P(P(x ∪ y)).∃i₁.∃i₂.j = i ∧ i = (i₁, i₂) ∧ i₁ ∈ x ∧ i₂ ∈ y}

More generally, given a predicate B(i) and a set x, we can construct the set of elements of x satisfying B as

    {i ∈ x | B(i)} = {j | ∃i ∈ x.(i = j ∧ B(i))}

This construction corresponds to what is sometimes called the axiom schema of restricted comprehension and can formally be stated as

    ∀x.∃y.∀i.(i ∈ y ⇔ (i ∈ x ∧ A))

where A is a formula such that x, y ∉ FV(A), but A might contain i or other free variables, i.e., in usual notation, for every set x we can construct the set

    y = {i ∈ x | A}

Note that, compared to the unrestricted comprehension scheme, which was at the source of Russell's paradox, we can only construct the set of those elements of some given set which satisfy A.


Axiom of foundation. The axiom of foundation states that every non-empty set contains a member which is disjoint from the whole set:

    ∀x.(∃y.y ∈ x) ⇒ ∃y.(y ∈ x ∧ ¬∃i.(i ∈ y ∧ i ∈ x))

or, in modern notation,

    ∀x.x ≠ ∅ ⇒ ∃y ∈ x.y ∩ x = ∅

One of the main consequences of the axiom of foundation is the following:

Lemma 5.3.2.1. There is no infinite sequence of sets (xᵢ) such that xᵢ₊₁ ∈ xᵢ.

Proof. Suppose the contrary. The sequence of sets can be seen as a function f with ℕ as domain, which to every i associates f(i) = xᵢ. By the axiom schema of replacement, its image x = {xᵢ | i ∈ ℕ} is also a set and, by the axiom of foundation, there exists y ∈ x such that y ∩ x = ∅. By definition of x, there exists some natural number i for which y = f(i) = xᵢ. However, we have xᵢ₊₁ ∈ xᵢ and therefore xᵢ₊₁ ∈ y ∩ x. Contradiction.

In particular, there is no set x such that x ∈ x (otherwise, the constant sequence xᵢ = x would contradict the previous lemma). The axiom of foundation is, in presence of the other axioms, equivalent to the following, better-looking, axiom of induction, which is a variant of transfinite induction (sometimes called ∈-induction):

    (∀x.(∀y.y ∈ x ⇒ A(y)) ⇒ A(x)) ⇒ ∀x.A(x)

for every predicate A with FV(A) ⊆ {x}.

Avoiding Russell's paradox. Intuitively, the way ZF avoids Russell's paradox is by considering that collections such as the collection of all sets are "too big" to be sets themselves: they are sometimes called classes and a set can be considered as a "small class". For instance, we have the restricted comprehension scheme, but not the unrestricted one, which would allow defining the set of all sets as {x | ⊤}: we can only consider the collection of elements satisfying some property within a set, i.e. a subset of a small class is itself small. This is also the reason why, in the axiom schema of replacement, we require A(i, j) to be functional: otherwise, for a given i, the collection {j | A(i, j)} would not be guaranteed to be a set (it could be too big). Also, we have seen that the axiom of foundation ensures that no set contains itself, which prevents classes, such as the collection of all sets, from being sets themselves.

The axiom of choice. The axiom of choice states that, given a collection x of non-empty sets, we can pick an element in each of those sets:

    ∀x.∅ ∉ x ⇒ ∃(f : x → ⋃x).∀y ∈ x.f(y) ∈ y

This states that, given a set x of non-empty sets (i.e. x does not contain ∅), there exists a function f from x to ⋃x (the union of the elements of x, which can be constructed with the axiom of union) which, for every set y ∈ x, picks an


element of y (i.e. f(y) ∈ y): this is called a choice function for x. The careful reader will notice that the existence of a function is not a formal statement of our language, but it can be encoded: the formula ∃(f : x → y).A, asserting the existence of a function f from x to y such that A, is a notation for a formula of the form

    ∃f.f ⊆ x × y ∧ ...

which states (details left to the reader) the existence of a subset f of x × y which, as a relation, encodes a total function such that A is satisfied.

The axiom of choice has a number of classically equivalent formulations, among which:
– every relation defined everywhere contains a function,
– every surjective function admits a section,
– the product of a family of non-empty sets is non-empty,
– every set can be well-ordered,
and so on.

5.3.3 Intuitionistic set theory. Set theory, as any other theory, can also be considered within intuitionistic first-order logic, in which case it is called IZF. The reason for this is the usual one: we want to be able to exhibit explicit witnesses when constructing elements of sets. We will however see that there is a price to pay for this, which is that things behave quite differently from what we are used to: intuitionism is not necessarily intuitive, see [Bau17] for a very good general introduction to the subject.

Equivalent formulations of excluded middle. Most of the proofs related to the excluded middle in set theory use the following simple observation. Given a proposition A, which might contain any free variable except y, consider the set

    x = {y ∈ ℕ | A}

Then, given a natural number y ∈ ℕ, we have y ∈ x if and only if A holds:

    (y ∈ x) ⇔ A                                                  (5.1)

(in practice, we often use y = 0 as arbitrary natural number). In particular, when A = ⊥, the set x is (by definition) the empty set ∅ and we have y ∈ ∅ if and only if ⊥. By the axiom of extensionality, we thus have that x = ∅ is equivalent to ∀y ∈ ℕ.(y ∈ x) ⇔ (y ∈ ∅), which is equivalent to A ⇔ ⊥, which is equivalent to A ⇒ ⊥, i.e. ¬A (since ⊥ ⇒ A always holds): we have shown

    (x = ∅) ⇔ ¬A                                                 (5.2)

For instance, a typical thing we cannot do in IZF is to test whether an element belongs to a given set or not:

Lemma 5.3.3.1. In IZF, the formula ∀y.∀x.(y ∈ x) ∨ (y ∉ x) is satisfied if and only if the law of excluded middle is.


Proof. The right-to-left implication is immediate. For the left-to-right implication, given any formula A, consider the natural number y = 0 and the set x = {y ∈ ℕ | A}: the set x contains the element 0 if and only if A holds. We conclude using (5.1): the above formula would imply

    (0 ∈ {y ∈ ℕ | A}) ∨ (0 ∉ {y ∈ ℕ | A})

which is equivalent to A ∨ ¬A, and we conclude.

The intuition behind this result is the following. In a constructive world, an element of x = {y ∈ ℕ | A} consists of an element of ℕ together with a proof that A holds. Therefore, in order to decide whether 0 belongs to x or not, we have to decide whether A holds or not.

Considering the variant of the excluded middle recalled in lemma 2.3.5.3, we similarly cannot test a set for emptiness either:

Lemma 5.3.3.2. In IZF, the formula ∀x.(x = ∅) ∨ (x ≠ ∅) is satisfied if and only if we can prove ¬A ∨ ¬¬A for every formula A.

Proof. The right-to-left implication is immediate. For the left-to-right implication, given a formula A, consider the set x = {y ∈ ℕ | A}. We conclude using (5.2): x = ∅ is equivalent to ¬A, so that (x = ∅) ∨ (x ≠ ∅) is equivalent to ¬A ∨ ¬¬A.

More generally, we do not expect to be able to decide equality either: the formula ∀x.∀y.(x = y) ∨ (x ≠ y) would imply that we can test for emptiness as a particular case. Of course, this does not imply that we cannot decide the equality of some particular sets. For instance, one can show that 0 = ∅ ≠ {∅} = 1 (because ∅ belongs to 1 but not to 0) and therefore, writing

    B = {0, 1} = {x ∈ ℕ | x = 0 ∨ x = 1}

for the set of booleans, we can decide the equality of booleans. By a similar reasoning, we can decide the equality of natural numbers. Many other properties of IZF which are "unexpected" (compared to the classical case) can be proved along similar lines. For instance, the statement that every subset of a finite set is finite is equivalent to the logic being classical. By a finite set, we mean here a set x for which there is a natural number n and a bijection f : {0, ..., n − 1} → x.

Lemma 5.3.3.3. In IZF, every subset of a finite set is finite if and only if the law of excluded middle is satisfied.


Proof. Suppose that every subset of a finite set is finite. Given a property A, consider the set x = {y ∈ B | A}, which is a subset of the finite set B of booleans. By hypothesis, this set is finite and therefore there exist a natural number n and a bijection f as above. Since we can decide equality of natural numbers, as argued above, we have either n = 0 or n ≠ 0: in the first case x = ∅ and thus ¬A holds; in the second case f(0) ∈ x and thus A holds. We therefore have A ∨ ¬A. Conversely, in classical logic, every subset of a finite set is finite, as everybody knows.

The axiom of choice. Seen from a constructive perspective, the axiom of choice is quite dubious: it allows the construction of an element in each set of a family of non-empty sets, without having to provide any hint at how such an element could be constructed. In particular, given a non-empty set x, the axiom of choice provides a function f : {x} → x, i.e. an element of x (the image of x under f), and allows proving

    x ≠ ∅ ⇒ ∃y.y ∈ x

i.e. we can construct an element of x by only knowing that there exists one. This is precisely the kind of behavior we invoked in section 2.5.2 in order to motivate the fact that double-negation elimination is not constructive. In fact, we will see below that having the axiom of choice implies that the ambient logic is classical. Another reason why the axiom of choice can be questioned is that it allows proving quite counter-intuitive results, the most famous perhaps being the Banach-Tarski theorem recalled below. Two sets A and B of points of ℝ³ are congruent if one can be obtained from the other by an isometry, i.e. by using translations, rotations and reflections.

Theorem 5.3.3.4 (Banach-Tarski). Given two bounded subsets A and B of ℝ³ with non-empty interior, there are partitions

    A = A₁ ⊎ ... ⊎ Aₙ        B = B₁ ⊎ ... ⊎ Bₙ

such that Aᵢ is congruent to Bᵢ for 1 ≤ i ≤ n.

Proof. Using the axiom of choice and other ingredients...

In particular, consider the case where A is a ball in ℝ³ and B is two copies of the ball A. The theorem states that there is a way to partition the ball A and move the subsets of the partition using isometries only, in order to make two balls. If you try this at home, you should convince yourself that there is no easy way to do so. For such reasons, people started to investigate the status of the axiom of choice with respect to ZF. In 1938, Gödel constructed a model of ZFC (i.e. a model of ZF satisfying the axiom of choice) inside an arbitrary model of ZF, thus showing that ZFC is consistent if ZF is [Göd38]. In 1963, Cohen showed that the situation is similar with the negation of the axiom of choice. The axiom of choice is thus independent of ZF: neither this axiom nor its negation is a consequence of the axioms of ZF, and one can add it or its negation without affecting consistency. Constructivists will however reject the axiom of choice, because it implies the excluded middle:


Theorem 5.3.3.5. In IZF with the axiom of choice, the law of elimination of double negation holds.

Proof. Fix a formula A and suppose that ¬¬A holds. The set x = {y ∈ ℕ | A} is not empty. Namely, we have seen in (5.2) that x = ∅ implies ¬A which, together with the hypothesis ¬¬A, implies ⊥. By the axiom of choice, the fact that x ≠ ∅ implies the existence of an element of x, because we have a choice function for {x}, which implies A by (5.1). Therefore ¬¬A ⇒ A.

The above proof is not entirely satisfactory because it uses the following formulation of the axiom of choice: any set y, whose elements x are not empty, admits a choice function. The issue here is that we suppose that an element x of y is not empty, i.e. it is not the case that x does not contain an element. From a constructive point of view, this is not equivalent to supposing that it contains an element (not not containing an element does not mean that it contains an element, because we do not admit double-negation elimination) and the latter is more constructive. A better formulation of the axiom of choice would thus be: any set y, whose elements x contain an element, admits a choice function. This hints at the fact that we should be careful in IZF about what we mean by the axiom of choice: formulations which were equivalent in classical logic are no longer equivalent in intuitionistic logic. This second formulation of the axiom of choice still implies the excluded middle, as first noticed by Diaconescu [Dia75, GM78], but this is much more subtle:

Theorem 5.3.3.6 (Diaconescu). In IZF with the axiom of choice, the law of excluded middle is necessarily satisfied.

Proof. Fix an arbitrary formula A: we are going to show ¬A ∨ A. Consider the sets

    x = {z ∈ B | (z = 0) ∨ A}

and

y = {z ∈ B | (z = 1) ∨ A}

Those sets are not empty, since 0 ∈ x and 1 ∈ y. By the axiom of choice, there is therefore a function f : {x, y} → B such that f(x) ∈ x and f(y) ∈ y. Now, f(x) and f(y) are booleans, for which equality is decidable, so that we can reason by case analysis on them.
– If f(x) = f(y) = 0 then 0 ∈ y, thus (0 = 1) ∨ A holds, thus A holds.
– If f(x) = f(y) = 1 then 1 ∈ x, thus (1 = 0) ∨ A holds, thus A holds.
– If f(x) = 0 ≠ 1 = f(y) then x ≠ y (otherwise, f(x) = f(y) would hold), and we have ¬A: namely, supposing A, we have x = B = y, and thus ⊥ since x ≠ y.
– If f(x) = 1 ≠ 0 = f(y) then we can show both A and ¬A as above (so that this case cannot happen).


Therefore, we have ¬A ∨ A.

This motivates, for the reader convinced of the interest of intuitionistic logic, which we hope you are by now, the exploration of set theory without choice, but the reader should be warned that this theory behaves quite differently from the usual one. For instance, Blass has shown the following result [Bla84]:

Theorem 5.3.3.7. In ZF, the axiom of choice is equivalent to the fact that every vector space has a basis.

In fact, we know models of ZF in which there is a vector space admitting no basis, and one admitting two bases of different cardinalities.

Synthetic differential geometry. Since classical logic is obtained by adding axioms (e.g. excluded middle) to intuitionistic logic, a proof in intuitionistic logic is valid in classical logic (we are simply not using the extra axioms). Therefore, one is tempted to think that intuitionistic logic is less powerful than classical logic, because it can prove less. Well, this is true, but it can also be seen as a strength: it also means that intuitionistic theories have more models than their classical counterparts. We would like to give an illustration of this.

The notion of infinitesimal is notoriously difficult to define in analysis. Intuitively, such a quantity is so small that it should be "almost 0"; in particular, it should be smaller than any usual strictly positive real number. Having such a notion is quite useful. For instance, we expect the derivative of a function f : ℝ → ℝ at x to be defined as f′(x) = (f(x + ε) − f(x))/ε for any non-zero infinitesimal ε. Namely, f′(x) should be the slope of the line tangent to the graph of f at x, i.e.

    f(x + ε) = f(x) + f′(x)ε

More precisely, by "almost 0", we mean here that it should capture first-order variations, i.e. it should be so small that ε² = 0. If we are ready to accept the existence of such entities, we find out that computations which traditionally involve subtle concepts, such as limits, become simple algebraic manipulations. For instance, consider the function f(x) = x². We have

    f(x + ε) = (x + ε)² = x² + 2xε + ε² = x² + 2xε

and therefore we should have f′(x) = 2x, as expected. This suggests that we define the set of infinitesimals as

    D = {ε ∈ ℝ | ε² = 0}

and postulate the following principle of microaffineness:

Axiom 5.3.3.8. Every function f : D → ℝ is of the form f(ε) = a + bε for some unique reals a and b.


Once this axiom is postulated, we necessarily have a = f(0) and we can define f′(x) to be the coefficient b. We have already given an example of such a computation above. We can similarly compute the derivative of a product of two functions by

    (f × g)(x + ε) = f(x + ε) × g(x + ε)
                   = (f(x) + f′(x)ε) × (g(x) + g′(x)ε)
                   = f(x)g(x) + (f′(x)g(x) + f(x)g′(x))ε + f′(x)g′(x)ε²
                   = f(x)g(x) + (f′(x)g(x) + f(x)g′(x))ε

and therefore

    (f × g)′(x) = f′(x)g(x) + f(x)g′(x)

as expected. Similarly, the derivative of the composite of two functions can be computed by

    g(f(x + ε)) = g(f(x) + f′(x)ε) = g(f(x)) + g′(f(x))f′(x)ε

because f′(x)ε is easily shown to be an infinitesimal, and therefore

    (g ∘ f)′(x) = g′(f(x))f′(x)

This is wonderful, except that our microaffineness axiom seems to be clearly wrong. Namely, ε² = 0 implies ε = 0, thus D = {0}, and therefore any coefficient b would suit. However... the above implication uses classical reasoning. Namely: if ε ≠ 0, we have

    ε = ε²/ε = 0/ε = 0

from which we can conclude that ε = 0... in classical logic! In intuitionistic logic, all that we have proved is that

    ¬¬(ε = 0)

This is the sense in which ε is infinitesimal: it is not nonzero. This shows that there is no obvious contradiction in our axiomatization if we work in intuitionistic logic, but it does not prove that there is no contradiction. This can however be done by constructing models. The field of synthetic differential geometry takes this idea of working in intuitionistic logic in order to define infinitesimals as a starting point to study differential geometry [Bel98, Koc06].
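As an aside (not in the original text), the above derivative computations have a direct computational analogue: dual numbers, which implement the law ε² = 0 and underlie forward-mode automatic differentiation. A minimal OCaml sketch:

    (* Dual numbers a + b·ε with ε² = 0. *)
    type dual = { re : float; eps : float }

    let add u v = { re = u.re +. v.re; eps = u.eps +. v.eps }

    (* The ε·ε term is dropped, implementing ε² = 0. *)
    let mul u v = { re = u.re *. v.re; eps = u.re *. v.eps +. u.eps *. v.re }

    (* Derivative of f at x: evaluate f on x + ε and read the ε-coefficient. *)
    let deriv f x = (f { re = x; eps = 1. }).eps

    (* f(x) = x² has derivative 2x: at x = 3 we get 6. *)
    let () = assert (deriv (fun x -> mul x x) 3. = 6.)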

5.4 Unification

Suppose fixed a signature. Given two terms t and u, a very natural question is: is there a way to substitute their variables in order to make them equal? In other words, we are trying to solve the equation

    t = u

One quickly finds out that there is quite often an infinite number of solutions, and we refine the question to: is there a "smallest" way of substituting the


variables of t and u in order to make them equal? Occurrences of this problem have for instance already been encountered in section 4.4. We explain here how to properly formulate the problem, which we have already encountered in section 4.4.2, and exhibit an algorithm to solve it. A detailed introduction to the subject can be found in [BN99].

5.4.1 Equation systems. An equation is a pair of terms (t, u), often written

    t =? u

where t and u are respectively called the left and right members of the equation. A substitution σ (see section 5.1.3) is a solution of the equation when t[σ] = u[σ], in which case we also say that σ is a unifier of t and u. An equation system, or unification problem, E is a finite set of equations. A substitution σ is a solution (or a unifier) of E when it is a solution of every equation in E. We write E[σ] for the equation system obtained by applying the substitution σ to both members of every equation of E: σ is thus a solution of E when all the equations of E[σ] are of the form t =? t.

Example 5.4.1.1. Let us give some examples of unifiers. We suppose that our signature comprises two binary function symbols f and g, and two nullary symbols a and b.
– f(x, b()) =? f(a(), y) has exactly one unifier: [a()/x, b()/y],
– x =? f(y, z) has many unifiers: [f(y, z)/x], [f(a(), z)/x, a()/y], etc.,
– f(x, y) =? g(x, y) has no unifier,
– x =? f(x, y) has no unifier.

Since the solution of an equation system is not unique in general, we can wonder whether, when solutions exist, there is a best one in some sense. We will see that it is indeed the case.

5.4.2 Most general unifier. A preorder ≤ is a reflexive and transitive relation; a partial order is an antisymmetric preorder. We can define a preorder ≤ on substitutions by setting σ ≤ τ whenever there exists a substitution σ′ such that τ = σ′ ∘ σ.

Example 5.4.2.1. With

    σ = [f(y)/x]

    σ′ = [g(x, x)/y]

    τ = [f(g(x, x))/x, g(x, x)/y]

we have σ′ ∘ σ = τ and thus σ ≤ τ.

A renaming is a substitution which replaces variables by variables (as opposed to general terms). The relation ≤ defined above is "almost" a partial order, in the sense that it would be one if we considered substitutions up to renaming:

Lemma 5.4.2.2. Given substitutions σ and τ, we have both σ ≤ τ and τ ≤ σ if and only if there exists a renaming σ′ such that σ′ ∘ σ = τ.
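This composition is easy to check by machine. The following OCaml sketch (an addition to the text: it borrows the term representation and the application function app of section 5.4.4 below, and compose is a hypothetical helper) verifies the example:

    type term =
      | Var of string
      | App of string * term list

    (* Apply a substitution, represented as an association list. *)
    let rec app s = function
      | Var x -> (try List.assoc x s with Not_found -> Var x)
      | App (f, tt) -> App (f, List.map (app s) tt)

    (* Composition σ' ∘ σ: first apply σ, then σ'. *)
    let compose s' s =
      List.map (fun (x, t) -> x, app s' t) s
      @ List.filter (fun (x, _) -> not (List.mem_assoc x s)) s'

    let () =
      let s = ["x", App ("f", [Var "y"])] in
      let s' = ["y", App ("g", [Var "x"; Var "x"])] in
      let t = compose s' s in
      assert (app t (Var "x") = App ("f", [App ("g", [Var "x"; Var "x"])]));
      assert (app t (Var "y") = App ("g", [Var "x"; Var "x"]))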


Suppose fixed an equation system E. It is easy to see that its set of solutions is upward closed:

Lemma 5.4.2.3. Given substitutions σ and τ such that σ ≤ τ, if σ is a solution of E then τ is also a solution of E.

A solution σ of E is a most general unifier when it generates all the solutions by upward closure, i.e. when τ is a solution of E if and only if σ ≤ τ. We will see in the next section that when an equation system admits a unifier, it always admits a most general one, and we have an algorithm to efficiently compute it. We will thus prove, in a constructive way, the following:

Theorem 5.4.2.4. An equation system E has a solution if and only if it has a most general unifier.

5.4.3 The unification algorithm. Suppose given an equation system E, for which we are trying to compute a most general unifier. The idea of the algorithm is to apply a series of transformations to E, which preserve the set of solutions of the system, in order to simplify it and compute a solution. More precisely, our goal is to put the equation system in the following form: an equation system E is in solved form when
– it is of the form E = {x₁ =? t₁, ..., xₙ =? tₙ}, i.e. all its equations have a variable as left member,
– no variable in a left member of an equation occurs in a right member: xᵢ ∉ FV(tⱼ) for all indices i and j,
– the variables in left members are distinct: xᵢ = xⱼ implies i = j.
To every equation system in solved form E as above, one can canonically associate the substitution

    σ_E = [t₁/x₁, ..., tₙ/xₙ]

Lemma 5.4.3.1. Given an equation system in solved form E, the substitution σ_E is a most general unifier of E.

Given an equation system E, the unification algorithm, due to Herbrand [Her30] and Robinson [Rob65], applies the transformations of figure 5.2, in an arbitrary order, and terminates when no transformation applies. We write

    E ⇝ E′

to indicate that the transformation replaces E by E′. At some point of the execution, the algorithm might fail, which we write

    E ⇝ ⊥



Example 5.4.3.2. Suppose that the signature comprises symbols a of arity 0, f of arity 3, and g and h of arity 1. Consider the equation system

    E = {f(a(), g(x), g(x)) =? f(a(), y, g(h(z)))}


Decompose (we propagate equations to subterms):

    {f(t₁, ..., tₙ) =? f(u₁, ..., uₙ)} ∪ E  ⇝  {t₁ =? u₁, ..., tₙ =? uₙ} ∪ E

Clash (different symbols cannot be unified): for f ≠ g,

    {f(t₁, ..., tₙ) =? g(u₁, ..., uₘ)}  ⇝  ⊥

Delete (we remove trivial equations):

    {f(t₁, ..., tₙ) =? f(t₁, ..., tₙ)} ∪ E  ⇝  E

Orient (we want variables as left members):

    {f(t₁, ..., tₙ) =? x} ∪ E  ⇝  {x =? f(t₁, ..., tₙ)} ∪ E

Occurs-check (we eliminate cyclic equations): when x ∈ FV(t₁) ∪ ... ∪ FV(tₙ),

    {x =? f(t₁, ..., tₙ)}  ⇝  ⊥

Propagate (we propagate substitutions): when x ∉ FV(t) and x ∈ FV(E′),

    {x =? t} ⊎ E′  ⇝  {x =? t} ∪ E′[t/x]

Figure 5.2: The unification algorithm.


We have

    E  ⇝  {a() =? a(), g(x) =? y, g(x) =? g(h(z))}    by Decompose,
       ⇝  {g(x) =? y, g(x) =? g(h(z))}                by Delete,
       ⇝  {y =? g(x), g(x) =? g(h(z))}                by Orient,
       ⇝  {y =? g(x), x =? h(z)}                      by Decompose,
       ⇝  {y =? g(h(z)), x =? h(z)}                   by Propagate.

The size |t| of a term is the number of function symbols occurring in it:

    |x| = 0        |f(t₁, ..., tₙ)| = 1 + Σᵢ₌₁ⁿ |tᵢ|
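In OCaml, on the term representation of section 5.4.4 below, this measure reads (a one-liner added here for illustration):

    let rec size = function
      | Var _ -> 0
      | App (_, tt) -> 1 + List.fold_left (fun n t -> n + size t) 0 tt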

Theorem 5.4.3.3. Given any equation system E as input, the unification algorithm always terminates. It fails if and only if E has no solution; otherwise, the equation system E′ at the end of the execution is in solved form and σ_{E′} is a most general unifier of E.

Proof. This is detailed in [BN99, section 4.6]. Termination can be shown by observing that the rules make the size of the equation system E decrease: here, the size is the triple (n₁, n₂, n₃) of natural numbers, ordered lexicographically, where n₁ is the number of unsolved variables (a variable is solved when it occurs exactly once in E, as the left member of an equation), n₂ = Σ_{(t =? u)∈E} (|t| + |u|) is the size of the equation system, and n₃ is the number of equations of the form t =? x in E. The other properties result from the fact that the transformations preserve the set of unifiers (⊥ has no unifier by convention) and that the resulting equation system is in solved form.

Example 5.4.3.4. The most general unifier of example 5.4.3.2 is

    [g(h(z))/y, h(z)/x]

Remark 5.4.3.5. The side conditions of Propagate are quite important (and often forgotten by students when first implementing unification). Without them, unification problems such as {x =? f(x)} would lead to an infinite number of applications of the rules Propagate and Decompose, and thus fail to terminate:

    {x =? f(x)}  ⇝  {f(x) =? f(f(x))}  ⇝  {x =? f(x)}  ⇝  ...

The side condition avoids this and the rule Occurs-check makes the unification fail: the solution would intuitively be the "infinite term" f(f(f(...))), but those are not acceptable here. In the worst case, the algorithm is exponential in time and space (hint: consider {x₁ =? f(x₀, x₀), x₂ =? f(x₁, x₁), ..., xₙ =? f(xₙ₋₁, xₙ₋₁)}), but it performs well in practice.


5.4.4 Implementation. Terms can be implemented by the type

    type term =
      | Var of string
      | App of string * term list

and we can check whether a variable x occurs in a term t, i.e. if x ∈ FV(t), with

    let rec occurs x = function
      | Var y -> x = y
      | App (f, tt) -> List.exists (occurs x) tt

A substitution can be described as a list of pairs consisting of a variable (here, a string) and a term. It can be applied to a term thanks to the following function:

    let rec app s = function
      | Var x -> (try List.assoc x s with Not_found -> Var x)
      | App (f, tt) -> App (f, List.map (app s) tt)

Unification can finally be performed by the following function, which takes as arguments the substitution being constructed (which is initially empty) and the equation system (a list of pairs of terms), and returns the most general unifier:

    exception Not_unifiable

    let rec unify s = function
      | (App (f, tt), App (g, uu))::e ->
        (* clash *)
        if f <> g then raise Not_unifiable
        (* decompose *)
        else unify s (List.map2 (fun t u -> t, u) tt uu @ e)
      | (App (f, tt), Var x)::e ->
        (* orient *)
        unify s ((Var x, App (f, tt))::e)
      | (Var x, Var y)::e when x = y ->
        (* delete *)
        unify s e
      | (Var x, t)::e ->
        (* occurs check *)
        if occurs x t then raise Not_unifiable
        (* propagate: apply the new binding [t/x] to the remaining
           equations and to the substitution constructed so far *)
        else
          let t = app s t in
          let e = List.map (fun (t', u) -> app [x, t] t', app [x, t] u) e in
          let s = List.map (fun (y, u) -> y, app [x, t] u) s in
          unify ((x, t)::s) e
      | [] -> s

    let unify = unify []

This function raises the exception Not_unifiable (declared above) when the system has no solution. The unifier of example 5.4.3.2 can then be computed with


App ("a", []); App ("g", [Var "x"]); App ("g", [Var "x"]) ]) in let u = App ("f", [ App ("a", []); Var "y"; App ("g", [App ("h", [Var "z"])]) ]) in unify [t, u] 5.4.5 Efficient implementation. The major source of inefficiency in the previous algorithm is due to substitutions. In order to apply a substitution to a term, we have to go through the whole term to find variables to substitute, and moreover we have to apply a substitution to all the terms in the Propagate phase. Instead, if we are willing to use mutable structures, we can use the trick we have already encountered in section 4.4.3: we can have variables be references, so that we can modify their contents, and have them point to terms when we want to substitute those. This suggests that we should implement terms as type term = | Var of var ref | App of string * term list and var = | AVar of string | Link of term A variable x is represented as Var r where r is a reference containing AVar "x". If, later on, we want to substitute it with a term t, we can then modify the contents of r to Link t, which means that the variable has been replaced by t. When we do so, the contents of all the occurrences of the variable will thus be replaced at once. While we could implement things in this way (similarly to section 4.4.3), we would like to explain another point and give a variant of this implementation. When encoding variables in this way, it is important that all the occurrences of the variable x contain the same reference, which is error prone: we have to ensure that, for a given variable name, the pointed memory cell is always the same. In most applications, the precise name of variables does not matter, since we are usually considering terms up to α-conversion. We can thus consider that the reference itself is the name of the variable, i.e. the name is the location in memory, which avoids the previous possibility for errors. Since two variables are now the same when their references are physically the same (i.e. when they point to the same memory cell, as opposed to having the same contents) and we should thus compare them using physical equality == instead of the usual extensional equality =. We can thus rather encode terms as type term = | Var of term option ref | App of string * term list


A variable will initially be Var r with the reference r containing None (we do not use a string to indicate the name of the variable since the name is not relevant: only the position in memory where the None is stored is), and substitution with a term t will amount to replacing this value by Some t. We can thus generate a fresh variable with

    let var () = Var (ref None)

and the right notion of equality between terms is given by the following function

    let rec eq t u =
      match t, u with
      | Var {contents = Some t}, u -> eq t u
      | t, Var {contents = Some u} -> eq t u
      | Var x, Var y -> x == y
      | App (f, tt), App (g, uu) -> f = g && List.for_all2 eq tt uu
      | _ -> false

where we use the fact that a reference is implemented in OCaml as a record with contents as its only field, which is mutable. We can check whether a variable occurs in a term with

    let rec occurs x = function
      | Var {contents = Some t} -> occurs x t  (* follow links *)
      | Var y -> x == y
      | App (f, tt) -> List.exists (occurs x) tt

using, as indicated above, physical equality to compare variables (and following the links of already substituted variables), and unification can be performed with

    let rec unify t u =
      match t, u with
      | App (f, tt), App (g, uu) ->
        (* clash *)
        if f <> g then raise Not_unifiable
        (* decompose *)
        else List.iter2 unify tt uu
      (* follow links *)
      | Var {contents = Some t}, u -> unify t u
      | t, Var {contents = Some u} -> unify t u
      (* delete *)
      | Var x, Var y when x == y -> ()
      | Var x, u ->
        (* occurs check *)
        if occurs x u then raise Not_unifiable
        (* propagate *)
        else x := Some u
      | _, Var _ ->
        (* orient *)
        unify u t

The unifier of example 5.4.3.2 can then be computed with


    let () =
      let x = var () in
      let y = var () in
      let z = var () in
      let t =
        App ("f", [App ("a", []);
                   App ("g", [x]);
                   App ("g", [x])])
      in
      let u =
        App ("f", [App ("a", []);
                   y;
                   App ("g", [App ("h", [z])])])
      in
      unify t u

5.4.6 Resolution. A typical use of unification is to generalize the resolution technique of section 2.5.8 to first-order classical logic [Rob65].

Clausal form. In order to do so, we must first generalize the notion of clausal form:
– a literal L is a predicate applied to terms, or the negation of such:

      L ::= P(t₁, ..., tₙ) | ¬P(t₁, ..., tₙ)

  where P is a predicate of arity n and the tᵢ are terms,
– a clause C is a disjunction of literals, i.e. a formula of the form

      C ::= L₁ ∨ L₂ ∨ ... ∨ Lₖ

We recall that a theory Θ on a given signature Σ is a set of closed formulas. Any theory can be put in clausal form in the following sense:

Proposition 5.4.6.1. Given a finite theory Θ on a signature Σ, there is a theory Θ′ on a signature Σ′ such that all the formulas in Θ′ are clauses and the two theories Θ and Θ′ are equisatisfiable.

Proof. The process of constructing Θ′ from Θ is done in six steps.
1. By lemma 5.1.7.4, we can replace any formula in Θ by an equivalent one in prenex form.
2. By iterated use of proposition 5.2.3.10, we can replace any formula of Θ by an equisatisfiable one without existential quantification, on a larger signature Σ′.
3. By proposition 2.5.5.1, we can replace any formula in Θ, which is necessarily of the form ∀x₁. ... ∀xₙ.A where A is an arbitrary formula which does


not contain any first-order quantification, by an equivalent one where A is a conjunction of disjunctions of literals, i.e. of the form

  ∀x₁.….∀xₖ. ⋀_{i=1}^{m} ⋁_{j=1}^{nᵢ} L_{i,j}

4. By repeated use of the equivalence ∀x.(A ∧ B) ⇔ (∀x.A) ∧ (∀x.B), we can replace every formula of Θ by a conjunction of universally quantified clauses

  ⋀_{i=1}^{m} ∀x₁.….∀xₖ. ⋁_{j=1}^{nᵢ} L_{i,j}

5. We can then replace every conjunction of clauses by all its universally quantified clauses

  ∀x₁.….∀xₖ. ⋁_{j=1}^{nᵢ} L_{i,j}

6. Finally, we can remove the universal quantifications in formulas if we suppose that all the universally quantified variables are distinct, see lemma 5.4.6.2 below.

The theory Θ′ obtained in this way is equisatisfiable with Θ.

Lemma 5.4.6.2. Given a formula ∀x.A and a theory Θ, the theories Θ ∪ {∀x.A} and Θ ∪ {A} are equisatisfiable, provided that x ∉ FV(Θ).

The resolution rule. We can assimilate a theory Γ in clausal form to a first-order context. The resolution rule of section 2.5.8 can then be modified as follows in order to account for first-order logic:

  Γ ⊢ C ∨ P(t₁, …, tₙ)    Γ ⊢ ¬P(u₁, …, uₙ) ∨ D
  --------------------------------------------- (res)
               Γ ⊢ (C ∨ D)[σ]

where σ is the most general unifier of the equation system

  {t₁ =? u₁, …, tₙ =? uₙ}

Generalizing lemma 2.5.8.1, this rule is correct:

Lemma 5.4.6.3 (Correctness). If C can be deduced from Γ using the axiom and resolution rules then the sequent Γ ⊢ C is derivable in classical first-order logic.

Example 5.4.6.4. The standard example is the following one. We know that
– all men are mortal, and
– Socrates is a man,
which can be formalized as the theory

  {∀x. man(x) ⇒ mortal(x), man(Socrates)}

in the signature with a constant symbol Socrates and two unary predicates man and mortal. By proposition 5.4.6.1, we can put it in clausal form:

  {¬man(x) ∨ mortal(x), man(Socrates)}

We want to show that this entails that Socrates is mortal. As explained in lemma 2.5.8.8, this can be done using resolution, by showing that adding ¬mortal(Socrates) to the theory makes it inconsistent. And indeed, writing Γ for the resulting theory and shortening mortal as m and Socrates as S, we have the derivation

  1. Γ ⊢ ¬man(x) ∨ m(x)   (ax)
  2. Γ ⊢ man(S)           (ax)
  3. Γ ⊢ m(S)             (res) on 1 and 2
  4. Γ ⊢ ¬m(S)            (ax)
  5. Γ ⊢ ⊥                (res) on 3 and 4

The factoring rule. As is, the resolution rule is not complete (see example 5.4.6.6 below). We can however make the system complete by adding the following factoring rule

  Γ ⊢ C ∨ P(t₁, …, tₙ) ∨ P(u₁, …, uₙ)
  ----------------------------------- (fac)
       Γ ⊢ (C ∨ P(t₁, …, tₙ))[σ]

where σ is the most general unifier of {t₁ =? u₁, …, tₙ =? uₙ}. With this rule, the completeness theorem 2.5.8.7 generalizes as follows:

Theorem 5.4.6.5 (Refutation completeness). A set Γ of clauses is not satisfiable if and only if we can show Γ ⊢ ⊥ using the axiom, resolution and factoring rules only.

Example 5.4.6.6. Given a unary predicate P, consider the theory

  Γ = {P(x) ∨ P(y), ¬P(x) ∨ ¬P(y)}

which is not satisfiable. The resolution rule only allows us to deduce the clauses P(x) ∨ ¬P(x) and P(y) ∨ ¬P(y), from which we cannot deduce any other clause: without factoring, the resolution rule is not complete. With factoring, we can show that Γ is inconsistent by

  1. Γ ⊢ P(x) ∨ P(y)       (ax)
  2. Γ ⊢ P(x)              (fac) on 1
  3. Γ ⊢ ¬P(x) ∨ ¬P(y)     (ax)
  4. Γ ⊢ ¬P(y)             (res) on 2 and 3
  5. Γ ⊢ ⊥                 (res) on 2 and 4

Instead of adding the factoring rule, refutation completeness can also be obtained by modifying the resolution rule so that it unifies multiple literals at once.
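
To make the resolution step concrete, here is a small OCaml sketch of one application of the (res) rule. It reuses the purely functional unification of the previous sections and assumes its interface: a type term with constructors Var and App, a function unify taking a list of equations and returning a most general unifier (raising Not_unifiable on failure), and a function subst applying a unifier to a term; these names and signatures are assumptions about that implementation, not part of it.

(* Literals and clauses for first-order resolution. *)
type literal = Pos of string * term list | Neg of string * term list
type clause = literal list

(* Apply a unifier to a literal. *)
let subst_lit s = function
  | Pos (p, tt) -> Pos (p, List.map (subst s) tt)
  | Neg (p, tt) -> Neg (p, List.map (subst s) tt)

(* One resolution step: find a literal P(t1,...,tn) in c1 and a literal
   ¬P(u1,...,un) in c2 whose arguments unify, and return the resolvent
   (C ∨ D)[σ], i.e. the remaining literals with the unifier applied. *)
let resolve (c1 : clause) (c2 : clause) : clause option =
  let pairs =
    List.concat_map (fun l1 -> List.map (fun l2 -> (l1, l2)) c2) c1
  in
  let rec find = function
    | [] -> None
    | ((Pos (p, tt) as l1), (Neg (q, uu) as l2)) :: rest when p = q ->
      (try
         let s = unify (List.combine tt uu) in
         let c = List.filter (( != ) l1) c1 @ List.filter (( != ) l2) c2 in
         Some (List.map (subst_lit s) c)
       with Not_unifiable -> find rest)
    | _ :: rest -> find rest
  in
  find pairs

A refutation procedure then saturates the set of clauses with such resolvents (together with factoring) until the empty clause, representing ⊥, is derived.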

Chapter 6

Agda

6.1 What is Agda?

Agda is both a programming language and a proof assistant, originally developed by Norell in 2007. On the surface, it resembles a standard functional programming language such as OCaml or Haskell. However, it was designed with the Curry-Howard correspondence in mind, see chapter 4, extended to a much richer logic than propositional or first-order logic: it uses dependent types, which will be the object of chapter 8. This means that types can express pretty much any proposition, and that a program can be considered as a way of proving such a proposition. In this sense the language can also be considered as a proof assistant. We start by writing a type, which can be read as a formula, and gradually construct a program of this type, which can be read as a proof of the formula. The type checking algorithm of Agda will verify that the program actually admits the given type, i.e. that our proof is correct! A first introduction to Agda is given in sections 6.2 and 6.3, inductive types are presented in section 6.4 for data types and section 6.5 for logical connectives, we discuss the formalization of equality in section 6.6, the use of Agda to prove the correctness of programs in section 6.7 and the issues related to termination in section 6.8.

6.1.1 Features of proof assistants. We shall first present some of the general features that Agda has, or has not. There is no room here for a detailed comparison with other provers; the interested reader can find details in [Wie06] for instance. Let us simply mention some differences with the main competitor, which is currently Coq. Other well-known proof assistants include ACL2, HOL Light, Isabelle, Lean, Mizar, PVS, etc.

No type inference. A first difference with functional programming languages (e.g. OCaml) is that the typing is so rich in proof assistants that there are no principal types and typability is undecidable. There is thus no type inference and we have to explicitly provide a type for all functions. The more precise the type of a function, the longer implementing the program will take, but the stronger the guarantees will be. For instance, a sorting algorithm can be given the type List A → List A as usual, but also the type List A → SortedList A, i.e. the type expresses the fact that the output is a sorted list (the type of sorted lists can be defined in the language). The second type is much more precise than


the first one, and it will be more involved to define a function of the second type than of the first (although not considerably so).

Programs vs tactics. The Agda code looks pretty much like a program in a functional programming language. For instance, the proof of A × B → A is, as expected, a program which takes a pair (a, b) and returns a:

open import Data.Product

postulate A B : Set

proj : A × B → A
proj (a , b) = a

which is easily compared with the corresponding definition in the OCaml toplevel:

# let proj (a , b) = a;;
val proj : 'a * 'b -> 'a = <fun>

On the contrary, Coq uses tactics which describe how to progress in the proof. The same proof in Coq would look like this:

Variables A B : Prop.

Theorem proj : (A * B) -> A.
Proof.
  intro p.
  elim p.
  intro a.
  intro b.
  exact a.
Qed.

It is not clear at all that it implements a projection, but the correspondence with the proof in natural deduction is immediate. The tactics precisely correspond to the rules, read from the bottom up: the intro commands correspond to introduction rules for ⇒, elim to a variant of the usual elimination rule for ∧, and exact to the axiom rule:

  (ax)  p : A ∧ B, a : A, b : B ⊢ A
  (⇒I)  p : A ∧ B, a : A ⊢ B ⇒ A
  (⇒I)  p : A ∧ B ⊢ A ⇒ B ⇒ A
  (∧E)  p : A ∧ B ⊢ A
  (⇒I)  ⊢ A ∧ B ⇒ A

The difference between the two is mostly a matter of taste: both are quite convenient to use and have the same expressive power. The reason we chose to use Agda in this course is that it makes the Curry-Howard correspondence clearer, which is one of the main objects of this course.


Automation. There is however one main advantage of using tactics over programs: they allow more easily for automation, i.e. Coq can automatically build parts of the proofs for us. For instance, the previous example can be proved in essentially one line, which will automatically generate all the above steps:

Variables A B : Prop.

Theorem proj : (A * B) -> A.
Proof. tauto. Qed.

As a more convincing example, the following formula over integers

  ∀m ∈ Z.∀n ∈ Z.(1 + 2 × m) ≠ (n + n)

can also be proved in essentially one line:

Require Import Omega.
Open Scope Z_scope.

Theorem thm : forall m n:Z, 1 + 2 * m <> n + n.
Proof. intros; omega. Qed.

whereas if we had to do it by hand, we would have needed many steps, using small intermediate lemmas expressing facts such as n + n = 2 × n, etc. Agda has only very limited support for automation, although it has been progressing recently using reflection.

Program extraction. A major feature of Coq is that the typing system allows one to perform what is called program extraction: once the program is proved correct, one can extract the program (in OCaml) and forget about the parts which are present only to prove its correctness. In contrast, the support for program extraction in Agda is less efficient and more experimental.

Correctness. It might seem obvious, but let us state this anyway: a proof assistant should be correct, in the sense that when it accepts a proof, the proof should actually be correct. Otherwise, it would be very easy to write a proof assistant:

let () =
  while true do
    let _ = read_line () in
    print_endline "Your proof is correct."
  done

We will see that sometimes the logic implemented in proof assistants is not consistent for very subtle reasons (for instance, in section 8.2.2): in this case, the program allows proving ⊥ and thus any formula, and thus essentially amounts


to the above, although this is not obvious at all. For modern and well-developed proof assistants, we however have good reasons to trust that this is not the case, see below.

Small kernel. An important design point for a theorem prover is that it should have a small kernel, whose correctness ensures the correctness of the whole program: this is called the de Bruijn criterion. A proof assistant is made of a large number of lines of code (roughly 100 000 lines of Haskell for Agda and 225 000 lines of OCaml for Coq); those lines are written by humans and there is always the possibility of a bug in the proof assistant. For this reason, it is desirable that the part of the software that we really have to trust, its “kernel”, which mainly consists of the typechecker, be as small as possible and isolated from the rest of the software, so that all the efforts for correctness can be focused on this part. For instance, in Coq, a tactic can produce roughly any proof in order to automate part of the reasoning: this is not really a problem because, in the end, the typechecker will ensure that the proof is correct, so that we do not have to trust the tactic. In Coq, the kernel is roughly 10% of the software; in Agda, the kernel is a bit larger, because it contains more features (dependent pattern matching in particular), which means that programming is easier in some respects, but the trust that we have in the proof checker is a bit smaller.

In order to have a small kernel, it is desirable to reuse existing features as much as possible; this principle is followed by most proof assistants. For instance, in OCaml, there is a builtin type bool of booleans, but it could have been implemented using inductive types by

type bool = False | True

It is reasonable in OCaml to have a dedicated type for performance reasons but, in a proof assistant, this would mean more code to trust, which is a bad thing: if we can encode a feature in some already existing feature, this is good. In Agda, booleans are actually implemented as above:

data Bool : Set where
  false true : Bool

as well as in Coq:

Inductive bool : Set := false : bool | true : bool.

Bootstrapping. A nice idea in order to gain confidence in the proof checker would be to bootstrap and prove its correctness inside itself: OCaml is programmed in OCaml, why couldn’t we prove Agda in Agda? Gödel’s second incompleteness theorem unfortunately shows that this is impossible. However, a fair amount can be done, and has been done in the case of Coq [BW97]: the part which is out of reach is showing the termination of Coq programs inside Coq (we already faced a similar situation, in the simpler case of Peano arithmetic, see section 5.2.5).

Termination. A proof assistant should be able to decide, in a finite amount of time, whether a proof is correct or not. In order to do so, it has to be able to check that a given function will produce a given value. For this reason, all the


functions that you can write in proof assistants such as Agda are total: they always produce a result in a finite amount of time. In order to ensure this, heavy restrictions are imposed on the programs which can be implemented in proof assistants. Firstly, since all functions are total, the language is not Turing-complete: there are some programs that you can write in usual programming languages that you will not be able to write in a proof assistant. Fortunately, those are “rare” and typically arise when trying to bootstrap as explained above. Secondly, since the problem of deciding whether a function terminates or not is undecidable, the proof assistant actually implements conditions which ensure that accepted programs will terminate, but some terminating programs will actually get rejected for “no good reason”. These issues are detailed in section 6.8.

6.1.2 Installation. In order to use Agda you will need two pieces of software: Agda itself and an editor which supports interacting with Agda. The advised editor is Emacs.

Linux. On Ubuntu or Debian, installing Agda and Emacs is achieved by typing

sudo apt-get install agda emacs

Alternatively, install cabal and run

cabal update
cabal install Agda
agda-mode setup

Atom. For people thinking that Emacs looks too old, a more modern-looking editor compatible with Agda is Atom (https://atom.io/), which is available for most platforms. In order to activate Agda support, you should go into the menu Edit > Preferences > Install, search for “agda”, and install both agda-mode and language-agda. In the preferences, under the Editor tab, it is also recommended to activate the Scroll Past End checkbox. If Atom has trouble finding the standard libraries, in the settings of the agda-mode package, change the Agda path to something like

/usr/bin/agda -i /usr/share/agda-stdlib

macOS. If you have brew:

brew install emacs agda
agda-mode setup

Otherwise, the manual way:
– download Haskell (https://www.haskell.org/platform/mac.html)
– run in a terminal:



cabal update
cabal install Agda
agda-mode setup

– install Emacs (https://emacsformacosx.com/) or Atom (see above).

Alternatively, you can try Agda Writer (http://markokoleznik.github.io/agda-writer/), which looks nice and simple.

Windows. Follow these steps:
– download Haskell (https://www.haskell.org/platform/windows.html)
– run the same commands as for macOS (without brew) in a terminal
– install Emacs (http://ftp.gnu.org/gnu/emacs/windows/emacs-26/) or Atom (see above).

Alternatively, there is an installer (http://homepage.divms.uiowa.edu/~astump/agda/): you should choose the latest version. After that, you still need to install the standard library by hand. It can be downloaded from https://wiki.portal.chalmers.se/agda/pmwiki.php/Libraries/StandardLibrary, and you should copy the contents of the src directory of the archive to the library directory of the Agda installed by the previous installer.

6.2 Getting started with Agda

6.2.1 Getting help. The first place to get started with Agda is the online documentation, which is quite well written:

https://agda.readthedocs.io/en/latest/

As usual, you can also search on the web. In particular, there are various forums such as Stackoverflow:

https://stackoverflow.com/questions/tagged/agda

A very good introduction to Agda is [WK19].

6.2.2 Shortcuts. When writing a proof in Agda, we do not have to write the whole program directly: this would be almost impossible in practice. The editor allows us to leave “holes” in proofs (written ?) and provides us with shortcuts which can be used in order to fill those holes and refine programs. Below we provide the most helpful shortcuts, writing C-x for the control key + the x key. They might seem a bit difficult to learn at first, but you will see that they are easy to get along with, and we can live our whole Agda life with only six shortcuts.


Emacs. We should first recall the main Emacs shortcuts:

C-x C-s        save file
C-w            cut
M-w            copy
C-y            paste

Atom uses more standard ones.

Agda. The main shortcuts for Agda that we will need are the following ones; their use is explained below in section 6.2.6.

C-c C-l        typecheck and highlight the current file
C-c C-,        get information about the hole under the cursor
C-c C-space    give a solution
C-c C-c        case analysis on a variable
C-c C-r        refine the hole
C-c C-a        automatic fill
middle click   definition of the term

A complete list can be found in the online documentation. Shortcuts which are also sometimes useful are C-c C-. which is like C-c C-, but also shows the inferred type for the term proposed for a hole, and C-c C-n which normalizes a term (useful to test computations).

Symbols. Agda allows for using fancy UTF-8 symbols: those are entered using \ (backslash) followed by the name of the symbol (many names are shared with LaTeX). Most of them can be found in the documentation. The most useful ones for logic are

∧  \and      ⊤  \top      →  \to       ∀  \all      Π  \Pi       λ  \Gl
∨  \or       ⊥  \bot      ¬  \neg      ∃  \ex       Σ  \Sigma    ≡  \equiv
≤  \le       ∎  \qed

and some other useful ones are

ℕ  \bN       ×  \times    ⊎  \uplus    ∷  \::       ∈  \in

Indices and exponents such as in x₁ and x¹ are respectively typed \_1 and \^1, and similarly for others.

6.2.3 The standard library. The standard library defines most of the expected data types. The default path is /usr/share/agda-stdlib and you are encouraged to have a look in there or in the online documentation. We list below some of the most useful modules.

Data types. The modules for common data types are:

Data.Empty    Empty type (⊥)
Data.Unit     Unit type (⊤)
Data.Bool     Booleans
Data.Nat      Natural numbers (ℕ)
Data.List     Lists
Data.Vec      Vectors (lists of given length)
Data.Fin      Types with a finite number of elements


Other useful ones are: Data.Integer (integers), Data.Float (floating point numbers), Data.Bin (binary natural numbers), Data.Rational (rational numbers), Data.String (strings), Data.Maybe (option types), Data.AVL (balanced binary search trees).

Logic. Not much is defined in the core of the Agda language and most of the type constructors are also defined in the standard library:

Data.Sum                                 Sum types (⊎, ∨)
Data.Product                             Product types (×, ∧, ∃, Σ)
Relation.Nullary                         Negation (¬)
Relation.Binary.PropositionalEquality    Equality (≡)

Algebra. The standard library contains modules for useful algebraic structures in Algebra.*: monoids, rings, groups, lattices, etc.

6.2.4 Hello world. A mandatory example is the “hello world” program, see section 1.1.1. We can of course write it in Agda:

open import IO

main = run (putStrLn "Hello, world!")

We however only give it for fun here: you will very rarely write such a program. A more realistic example is detailed in the next section.

6.2.5 Our first proof. As a first proof, let us show that the propositional formula

  A ∧ B ⇒ B ∧ A

is valid. By the Curry-Howard correspondence, we want a program of the type

  A × B → B × A

showing that × is commutative. In OCaml, we would have typed

# let prod_comm (a , b) = (b , a);;
val prod_comm : 'a * 'b -> 'b * 'a = <fun>

The full proof in Agda goes as follows:

open import Data.Product

-- The product is commutative
×-comm : (A B : Set) → (A × B) → (B × A)
×-comm A B (a , b) = (b , a)

We should now explain the various things we see there.

Importing modules. The programs in Agda, including the standard library, are organized in modules, which are collections of functions dedicated to some feature. Here, we want to use the product, and therefore we have to import the corresponding module, which is called Data.Product in the standard library. In order to do so, we use the open import command, which loads all its functions.


Comments. In Agda, comments start with -- (two dashes), as in the second line.

Declaring functions. We are defining a function named ×-comm. A function declaration always contains (at least) two lines. The first one is of the form

name : type

declaring that the function name will have the type type, and the second one is of the form

name a1 ... an = value

declaring that the function name takes arguments ai and returns a given value.

Types. Let us detail the type

(A B : Set) → (A × B) → (B × A)

we have given to the function. As indicated above, in OCaml the type would have been

'a * 'b -> 'b * 'a

which means that the function can have the above type for any types 'a and 'b. In Agda, there is no such implicit universal quantification over types, which means that we have to do it by ourselves. We can do this because

1. we have the special type Set which is “the type of types” (we have universes),

2. we have the ability to name the arguments in types and use them in further types (we have dependent types).

The type of the function thus reads as: given arguments A and B of type Set (i.e. given types A and B), given a third argument of type A × B, we return a result of type B × A. The fact that the arguments A and B are grouped here is purely a syntactic convenience, and the above type is exactly the same as

(A : Set) → (B : Set) → (A × B) → (B × A)

Function definitions. The definition of the function is then the expected one:

×-comm A B (a , b) = (b , a)

We take three arguments: A, B and a pair (a , b), and return the pair (b , a). Note that we can write (a , b) for the third argument because Agda allows definitions by pattern matching (just as OCaml): here, the product has only one constructor, the pair.

Spaces. A minor point, which is sometimes annoying at first, is that spaces around constructors matter: you have to write (a , b) and not (a, b) or (a,b). This is because the syntax of Agda is really extensible (the notation for pairing is not builtin, for instance: it is defined in Data.Product!), which comes with some induced limitations. A side effect of this convention is that a,b is a perfectly legit variable name (but it is not necessarily a good idea to make heavy use of this opportunity).


Typesetting UTF-8 symbols. Since we want our proofs to look fancy, we have used some nice UTF-8 symbols: for instance “→” and “×”. In the editor, such symbols are typed using commands such as \to or \times, as indicated above in section 6.2.2. There are usually text replacements (e.g. we could have written -> and *), but those are not used much in Agda.

6.2.6 Our first proof, step by step. The above proof is very short, so that we could have typed it at once and then made sure that it typechecks, but even for moderately sized proofs, it is out of the question to write them in one go. Fortunately, we can input those gradually, by leaving “holes” in the proofs which are refined later. Let us detail how one would have done this proof step by step, in order to introduce all the shortcuts. We begin by giving the type of the function and its declaration as

×-comm : (A B : Set) → (A × B) → (B × A)
×-comm A B p = ?

We know that our function takes three arguments (A, B and p), which is obvious from the type, but we have not thought hard enough yet about the result, so we have written ? instead, which can be thought of as a “hole” in the proof. We can then typecheck the proof by typing C-c C-l. Basically, this makes sure that Agda is aware of what is in the editor (and reports errors), so you should use it whenever you have changed something in the file (outside a hole). Once we do that, the file is highlighted and changed to

×-comm : (A B : Set) → (A × B) → (B × A)
×-comm A B p = { }0

The hole has been replaced by { }0, meaning that Agda is waiting for some term here (the 0 is the number of the hole). Now, place your cursor in the hole. We can see the variables at our disposal (i.e. the context) by typing C-c C-,:

Goal: B × A
------------------------------------------------------------
p : A × B
B : Set
A : Set

This is useful to know where we are exactly in the proof: here we want to prove B × A with A, B and p of the given types. Now, we want to reason by case analysis on p. We therefore use the shortcut C-c C-c; Agda then asks for the variable on which we want to perform the case analysis, to which we reply p (and enter). The file is then changed to

×-comm : (A B : Set) → (A × B) → (B × A)
×-comm A B (fst , snd) = { }0

Since the type of p is a product, p must be a pair and therefore Agda changes p to the pattern (fst , snd). Since we do not like the default names given by Agda to the variables, we rename fst to a and snd to b:

×-comm : (A B : Set) → (A × B) → (B × A)
×-comm A B (a , b) = { }0


We should then do C-c C-l so that Agda knows of this change (remember that we have to do it each time we modify something outside a hole). Now, we place our cursor into the hole. By the same reasoning, the hole has a product as its type, so that it must be a pair. We therefore use the command C-c C-r which “refines” the hole, i.e. introduces the constructor if there is only one possible for the given type. The file is then changed to

×-comm : (A B : Set) → (A × B) → (B × A)
×-comm A B (a , b) = { }1 , { }2

The hole was changed into a pair of two holes. In the hole { }1, we know that the value should be b. We can therefore write b inside it and type C-c C-space to indicate that we have given the value to fill the hole:

×-comm : (A B : Set) → (A × B) → (B × A)
×-comm A B (a , b) = b , { }2

We could do the same for the second hole (by giving a), but we get bored: this hole is of type A, so that the only possible value for it was a anyway. Agda is actually able to find it by typing C-c C-a, which is the command for letting the prover try to automatically fill a hole:

×-comm : (A B : Set) → (A × B) → (B × A)
×-comm A B (a , b) = b , a

6.2.7 Our first proof, again. We would like to point out that these steps actually (secretly) correspond to constructing a proof. For simplicity, we suppose that A and B are two fixed types, which can be done by typing

postulate A B : Set

and consider the proof

×-comm : (A × B) → (B × A)
×-comm (a , b) = b , a

which is a small variant of the previous one. We now explain that constructing this proof corresponds to constructing a proof in sequent calculus. As a general rule:
– doing a case split on a variable (C-c C-c) corresponds to performing a left rule (or an elimination rule in natural deduction),
– refining a hole (C-c C-r) corresponds to performing a right rule (or an introduction rule in natural deduction),
– providing a variable term (C-c C-space) corresponds to performing an axiom rule.

In figure 6.1, we have shown how the steps of our proof in Agda translate into the construction of the proof from the bottom up, in sequent calculus. Also note that there is a perfect correspondence with respect to the Curry-Howard


Agda                              Shortcut         Rule
×-comm = { }0                     C-c C-r          (⇒R)
×-comm p = { }0                   C-c C-c p        (∧L)
×-comm (a , b) = { }0             C-c C-r          (∧R)
×-comm (a , b) = { }1 , { }2      b C-c C-space    (ax)
×-comm (a , b) = b , { }2         a C-c C-space    (ax)
×-comm (a , b) = b , a

(each step fills in one more rule of the sequent proof, which is built from the bottom up, the remaining holes being marked ?; the completed proof is

  (ax)  A, B ⊢ B        (ax)  A, B ⊢ A
  (∧R)  A, B ⊢ B ∧ A
  (∧L)  A ∧ B ⊢ B ∧ A
  (⇒R)  ⊢ A ∧ B ⇒ B ∧ A

and the original figure shows the partial proof at each stage)

Figure 6.1: Agda proofs and sequent proofs.


correspondence if we allow ourselves to put patterns instead of variables in the context:

  (ax)  a : A, b : B ⊢ b : B        (ax)  a : A, b : B ⊢ a : A
  (∧R)  a : A, b : B ⊢ (b, a) : B ∧ A
  (∧L)  (a, b) : A ∧ B ⊢ (b, a) : B ∧ A
  (⇒R)  ⊢ λ(a, b).(b, a) : A ∧ B ⇒ B ∧ A

This correspondence has some defects in general [Kri09], which is why we do not detail it further here.

6.3 Basic Agda

In this section we present the main constructions which are present in the core of Agda, with the notable exception of inductive types, which are described in sections 6.4 and 6.5.

6.3.1 The type of types. In Agda, there is by default a type named Set, which can be thought of as the type of all types: an element of type Set is a type.

6.3.2 Arrow types. In Agda, we have the possibility of forming function types: given types A and B, one can form the type A → B of functions taking an argument of type A and returning a value of type B. For instance, the function isEven, which determines whether a natural number is even, will be given the type

isEven : ℕ → Bool

Type constructors. Functions in Agda can operate on types. For instance, the type of lists is a type constructor: it is a function which takes a type A as argument and produces a new type List A, the type of lists whose elements are of type A. We can thus give it the type

List : Set → Set

The type List A can also be seen as a type which is parametrized by another type, just as in OCaml the type 'a list of lists is parametrized by the type 'a.

Named arguments. In Agda, we can give a name to the arguments in types, e.g. we can give the name x to the argument of type A and consider the type

(x : A) → B

For instance, the isEven function could also have been given the type

isEven : (x : ℕ) → Bool


However, the added power comes from the fact that the type B is also allowed to make use of the variable x. For instance, the function which constructs a singleton list over some type can be given the following type (see section 6.3.3 for the full definition of this function):

singleton : (A : Set) → A → List A

Both the second argument and the result use the type A which is given as first argument. Such a type is called a dependent type: it can depend on a value, which is given as an argument.

Universal quantification. Another way to read the type (x : A) → B is as a universal quantification: it corresponds to what we would have previously written

∀x ∈ A.B

For instance, we can define the type of equalities between two elements of a given type A by

eq : (A : Set) → A → A → Set

and a proof that this equality is reflexive is given the type

refl : (A : Set) → (x : A) → eq A x x

which corresponds to the usual formula

∀A.∀x ∈ A.x = x

Implicit arguments. Sometimes, some arguments can be deduced from the type of other arguments. For instance, in the singleton function above, A is the type of the second argument. In this case, we can make the first argument implicit, which means that we will not have to write it and we will let Agda guess it instead. This is done by using curly brackets in the type:

singleton : {A : Set} → A → List A

This allows us to simply write singleton 3, and Agda will be able to find out that A has to be ℕ, since this is the type of 3. In case we want to specify the implicit argument, we have to use the same brackets:

singleton {ℕ} 3

Another way of having Agda make a guess is to use _, which is a placeholder that has to be filled automatically by Agda. For instance, we could let Agda guess the type of A (which is Set) by declaring

singleton : {A : _} → A → List A

which can equivalently be written

singleton : ∀ {A} → A → List A
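
For comparison, here is a minimal OCaml sketch of the same function: the type parameter is universally quantified implicitly by the language, so no type argument, explicit or implicit, is ever passed.

(* The polymorphic type 'a -> 'a list is inferred; contrary to Agda,
   the quantification over 'a is handled entirely by the language. *)
let singleton (x : 'a) : 'a list = [x]

let l = singleton 3   (* l : int list *)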


6.3.3 Functions. As indicated in section 6.2.5, a function definition begins with a line specifying the type of the function, followed by the definition of the function itself. For instance, the singleton function, which takes an element x of some arbitrary type A and returns the list with x as only element, can be defined as

singleton : (A : Set) → A → List A
singleton A x = x ∷ []

or as

singleton : {A : Set} → A → List A
singleton x = x ∷ []

or as

singleton : {A : Set} → A → List A
singleton = λ x → x ∷ []

In the second variant, A is an implicit argument. In the third variant, we illustrate the fact that we can use λ-abstractions to define anonymous functions.

Infix notations. In function names, underscores (_) are handled as places where the arguments should be put, which allows one to easily define infix operators. For instance, we can define the addition with type

_+_ : ℕ → ℕ → ℕ

and then use it as

3 + 2

The prefix notation is still available though, in case it is needed:

_+_ 3 2

The priorities of binary operators can be specified by commands such as

infix 6 _+_

which states that the priority of addition is 6 (the higher the number, the higher the priority). Operations can also be specified to be left (resp. right) associative by replacing infix by infixl (resp. infixr). For instance, addition and multiplication are usually given the priorities

infixl 6 _+_
infixl 7 _*_

so that the expression

2 + 3 + 5 * 2

is implicitly bracketed as

(2 + 3) + (5 * 2)
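
For comparison, OCaml also allows user-defined infix operators, although their associativity and priority are fixed by the first character of the operator name instead of being declared; the operator +? below is made up for illustration:

(* Operators starting with + are left associative, with the priority of
   addition, so 1 +? 2 +? 3 parses as (1 +? 2) +? 3. *)
let ( +? ) m n = m + n

let _ = 3 +? 2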


Auxiliary functions. In the definition of a function, it is possible to use auxiliary function definitions using the where keyword. For instance, we can define the function f which computes the fourth power of a natural number, i.e. f(x) = x⁴, by using the square function as an auxiliary function, i.e. f(x) = (x²)², as follows:

fourth : ℕ → ℕ
fourth n = square (square n)
  where
  square : ℕ → ℕ
  square n = n * n

Here, we define the fourth function in terms of the auxiliary function square, which is defined afterwards, preceded by the where keyword.
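
For comparison, a minimal OCaml counterpart uses a local definition, which is introduced before its use rather than after:

(* The auxiliary function square is local to fourth. *)
let fourth n =
  let square n = n * n in
  square (square n)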

6.3.4 Postulates. It rarely happens that we need to assume the existence of a term of a given type without any hope of proving it: this is typically the case for axioms. This can be achieved by the postulate keyword. For instance, in order to work in classical logic, we can assume the law of excluded middle with

postulate lem : (A : Set) → ¬ A ⊎ A

These should be avoided as much as possible because postulates will not compute: if we apply lem to an actual type A, it will not reduce to either ¬ A or A, as we would expect for a coproduct, see section 6.5.6: how could Agda possibly know which one is the right one?

6.3.5 Records. Records in Agda are pretty similar to those in other languages (e.g. OCaml) and will not be used much here. In order to illustrate the syntax, we provide here an implementation of pairs using records:

record Pair (A B : Set) : Set where
  field
    fst : A
    snd : B

make-pair : {A B : Set} → A → B → Pair A B
make-pair a b = record { fst = a ; snd = b }

proj₁ : {A B : Set} → Pair A B → A
proj₁ p = Pair.fst p
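
As noted, this is close to OCaml records; a minimal sketch of the corresponding OCaml declarations:

(* A polymorphic record of pairs, with a constructor and a projection. *)
type ('a, 'b) pair = { fst : 'a; snd : 'b }

let make_pair a b = { fst = a; snd = b }
let proj1 p = p.fst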

6.3.6 Modules. A module is a collection of functions. It can be declared by putting

module Name where

at the beginning of the file, where Name is the name of the module and should match the name of the file. The functions of another module can be used by issuing the command

open import Name

which will expose all the functions of the module Name. After this command, the modifiers hiding (...) or renaming (... to ...) can be used in order to hide or rename some of the functions.
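
For comparison, opening a module in OCaml plays a similar role; the module and function names in this sketch are made up for illustration:

(* Opening a module exposes its functions in the current scope. *)
module Name = struct
  let f x = x + 1
end

open Name

let y = f 41  (* f comes from Name *)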

6.4 Inductive types: data

Inductive types are the main way of defining new types in Agda. Apart from a few exceptions (such as → and Set mentioned above), all the usual types are defined using this mechanism in the standard library, including usual data types and logical connectives; we first focus on data types in this section. An inductive type T is declared using a statement of the form

data T : A where
  cons₁ : A₁ → ... → Aᵢ → T
  ...
  consₙ : B₁ → ... → Bⱼ → T

which declares that T is an inductive type, whose type is A, with constructors cons₁, ..., consₙ. For each constructor, the line begins with two blank spaces, followed by the name of the constructor, and ended by the type of the constructor. Each constructor takes an arbitrary number of arguments and has T as return type. Since the type T we are defining is itself a type, A is usually Set, although some more general inductive types are supported (for instance, they can depend on some other types, see section 6.4.7).

6.4.1 Natural numbers. As a first example, the natural numbers are defined as the inductive type ℕ in the module Data.Nat by

data ℕ : Set where
  zero : ℕ
  suc  : ℕ → ℕ

The first constructor is zero, which does not take any argument, and the second constructor is suc, which takes a natural number as argument. A value of type ℕ is zero

suc zero

suc (suc zero)

suc (suc (suc zero))

and so on. As a commodity, the usual notation for natural numbers is also supported, and we can write 2 as a shorthand for suc (suc zero).
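
For comparison, the same inductive type can be declared in OCaml; the names nat, Zero and Suc are ours, this type being no part of the OCaml standard library:

(* Unary natural numbers: Suc (Suc Zero) represents 2. *)
type nat = Zero | Suc of nat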

6.4.2 Pattern matching. The way one typically uses elements of an inductive type is by pattern matching: it allows inspecting a value of an inductive type and returning a result depending on the constructor of the value. As explained above, the cases are usually generated by using the C-c C-c shortcut, which instructs the editor to perform case analysis on some variable. For instance, in order to define the predecessor function, we start with

pred : ℕ → ℕ
pred n = ?

then, by C-c C-c, we indicate that we want to reason by case analysis on n, which turns the code into

pred : ℕ → ℕ
pred zero = ?
pred (suc n) = ?

We now have to give the result of the function when the argument is zero (by convention the predecessor of 0 is 0) and when the argument is suc n, where n is a natural number. We can finally fill in the holes in order to define the predecessor:

pred : ℕ → ℕ
pred zero = zero
pred (suc n) = n

Of course, pattern matching also works with multiple arguments and we can define addition by

_+_ : ℕ → ℕ → ℕ
zero + n = n
suc m + n = suc (m + n)

This definition can be tested by defining t = 3 + 2 and using C-c C-n to normalize t (which gives 5 as answer). Subtraction can be defined similarly by

_∸_ : ℕ → ℕ → ℕ
zero ∸ n = zero
suc m ∸ zero = suc m
suc m ∸ suc n = m ∸ n

(by convention m ∸ n = 0 when m < n) and multiplication by

_*_ : ℕ → ℕ → ℕ
zero * n = zero
suc m * n = (m * n) + n

Matching with other values. It is sometimes useful to define a function by case analysis on a value which is not an argument. In this case, we can use the with keyword followed by the value we want to match on. This value can then be matched as an extra argument, which has to be separated from the other arguments by a symbol |. For instance, the modulo function on natural numbers can be defined by induction on the second argument by the following definition:

  m mod n = m                if m < n
  m mod n = (m − n) mod n    otherwise

Here, we do not want to reason directly by induction on n, which would force us to distinguish the case where n is zero or a successor, but rather on whether m < n holds or not. This can be achieved by matching on m