George Tourlakis
Computability
George Tourlakis Electrical Engineering and Computer Science York University Toronto, ON, Canada
ISBN 978-3-030-83201-8 ISBN 978-3-030-83202-5 (eBook) https://doi.org/10.1007/978-3-030-83202-5 © Springer Nature Switzerland AG 2022 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Για τον Νίκο την Ζωή και τον Ιάσονα (For Nikos, Zoe, and Iason)
Preface
This volume is mostly about the theoretical limitations of computing: On one hand, we investigate what we cannot do at all through the application of mechanical (computational) processes —and why. This is the domain of computability theory. On the other hand, we study some problems that admit only extremely inefficient mechanical solutions —and explore why this is so. For example, is it always the enormous size of the output that makes a solution computationally inefficient? We will see (Chap. 14) that this is a deeper phenomenon and will discover that for any amount of “resources” that we might a priori decide to “spend” on any computation, we can build a computable function so that every program we might employ to compute it will require, for all but a finite number of inputs, more resources than we had originally set aside —and this remains true even if we restrict the outputs of our function to be only the numbers 0 or 1. The study of this phenomenon belongs to (computational) complexity theory. Admittedly, anyone who has programmed a computer, or studied some discrete mathematics, or has done both, will normally have no trouble recognising a mechanical (computational) process (or algorithm) when they see one. For example, they ought to readily recognise Euclid’s algorithm for finding the “greatest common divisor”,1 gcd(a, b), as such a process. The same is true of the binary search algorithm used to search for a given value in a sorted (ascending or descending order) finite array of values. Most of us will define a mechanical process as any finitely described, precise, step-by-step process, usually performed in terms of a menu of permissible instructions, which must be carried out in a certain order. Moreover, these instructions must be extremely simple, and such that they can be mechanically carried out unambiguously, say, by pencil and paper.2 Here are two examples of instructions
1 “Greatest common divisor” is established jargon, where “greatest” means “largest”.
2 Recipes describe neither mechanical, nor precise, processes. For example, a bit more or a bit less salt is not critical, unless the end-user is on a salt-free diet. And is it really necessary for all ingredients to be at room temperature in order to make mayonnaise?
that meet our informal requirements stated in this paragraph: (1) “change the ith digit of a (natural) number written in decimal notation by adding 1 to it” (where adding 1 to 9 makes it 0 but changes nothing else); another one: (2) “add 1 to a natural number n”. The simpler these “mechanical” instructions are, the easier it is for reasonable people to agree that they can be carried out mechanically and precisely.3
Hm. If all we want to achieve is to agree with reasonable people on the definition of an “algorithm” or “mechanical process”, then what do we need a theory of computation for? The difficulty with algorithms (programs, as people who program computers call them) is not so much in recognising that a process is an algorithm, or in being able to devise an algorithm to solve a straightforward problem; programmers do this for a living. The difficulty manifests itself as soon as we want to study the inherent mathematical limitations of computing. That is, when we want to prove that a particular problem, or a set of problems, admits no solution via mechanical processes, that is, no solution via a program.
For example, consider the problems of the type “check whether an arbitrary given program P computes the constant function that, for every natural number input, always outputs the same number, 42”. We will see in this volume that such a “checker”, if we insist that it be implemented as a mechanical process, does not exist, regardless of whether the constant output is 42 or some other number. How does one prove such a thing? We need a mathematical theory of computation, whose objects of study are programs (mechanical processes), and problems. Then we can hope to prove a theorem of the type “there is no program that solves this problem”. Usually, we prove this non-existence by contradiction, using tools such as diagonalisation or reductions (of one problem to another).
We will perform this kind of task for many problems in this book and we will develop powerful tools such as Rice’s theorem, which, rather alarmingly, states: We cannot have a mechanical checker that decides whether an arbitrary program X has a given property or not, and this fact is true of all conceivable mathematical properties, except two: (1) The property that no program has, and (2) the property that all programs have.
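The reduction technique mentioned above can be illustrated on the “constant 42” checker itself. The following is a minimal Python sketch, assuming (as is common, though not necessarily the route taken in this volume) that we reduce from the halting problem; the names `build_q`, `p` and `x` are hypothetical. From a program `p` and an input `x` we build a program `q` that outputs 42 on every input exactly when `p` halts on `x`, so a mechanical “constant 42” checker applied to `q` would decide whether `p` halts on `x`.

```python
from typing import Callable


def build_q(p: Callable[[int], int], x: int) -> Callable[[int], int]:
    """Build q from p and x (hypothetical helper for this sketch).

    q ignores its own input n: it first runs p on x and then returns 42.
    If p never halts on x, then q never halts on any input, so q does
    not compute the constant function 42.
    """
    def q(n: int) -> int:
        p(x)          # may run forever; its result is discarded
        return 42
    return q


# Example with a program that clearly halts on every input.
q = build_q(lambda v: v * v, 7)
print(q(0), q(123))
```

Only the construction of `q` is mechanical here; the sketch does not, and cannot, include the checker itself.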
2 (continued) On the other hand, in the mathematical domain, an instruction that is so complex as to require us to prove a theorem in order to be allowed to apply it is certainly not mechanical, because theorem proving in general is not. Nor is it simple.
3 The “machines” of Turing (1936, 1937), the Turing Machines, or TMs of the literature, are used to define mechanical processes on numbers (equivalently, strings of symbols). Their instruction set is so primitive that adding 1 to a number is not allowed (but this can be simulated, as a macro, of course, by using several Turing Machine instructions). Number manipulation at the primitive instruction level is restricted to the digit level on a TM.
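Instruction (1) and footnote 3 together suggest how “add 1 to n” decomposes into digit-level steps. Here is a minimal Python sketch of that simulation; the function names and the least-significant-digit-first list representation are assumptions made for the illustration, not part of the text.

```python
from typing import List


def bump_digit(digits: List[int], i: int) -> List[int]:
    """Instruction (1): add 1 to the i-th digit; a 9 becomes 0; nothing else changes."""
    changed = list(digits)
    changed[i] = (changed[i] + 1) % 10
    return changed


def add_one(digits: List[int]) -> List[int]:
    """Instruction (2), simulated using instruction (1) only.

    `digits` holds the decimal digits least significant first,
    e.g. 199 is represented as [9, 9, 1].
    """
    result = list(digits)
    i = 0
    while True:
        if i == len(result):
            result.append(0)      # make room for a new leading digit
        result = bump_digit(result, i)
        if result[i] != 0:        # the digit did not wrap, so there is no carry
            return result
        i += 1                    # the digit wrapped from 9 to 0: carry to the next one


def to_digits(n: int) -> List[int]:
    return [int(c) for c in reversed(str(n))]


def from_digits(digits: List[int]) -> int:
    return int("".join(str(d) for d in reversed(digits)))


print(from_digits(add_one(to_digits(199))))  # 200
```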
By the way, a comment like this one, which is situated between two -signs is to be considered worthy of attention.4
The programs studied in a mathematical theory of computation cannot be technology-dependent nor can they be restricted by our physical world. In particular, the mathematisation of our “mechanical processes” will allow such processes to “compute” using arbitrary (natural) numbers, of any length and hence size, unlike real computers that have overall memory limitations, but also limitations on the size of numbers that we can store in any program variable. Why do we disallow size limitations? Consider the function which for any natural number x outputs the expression below
$$g(x) = \left.2^{2^{\cdot^{\cdot^{\cdot^{2}}}}}\right\}\ x\ \text{2's}$$
There is a very straightforward way to implement g(x) recursively,5 which I present in two versions, mathematically first, and then programmatically.
$g(0) = 1$
$g(x + 1) = 2^{g(x)}$
or
procedure g(x)
    if x = 0 then return (1)
    else return (2^{g(x−1)})
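As a concrete check of this recursion, here is a minimal Python sketch. Python is used only because its integers are unbounded, which matches the theory's “no size limitation” assumption; the choice of language is ours, not the text's.

```python
def g(x: int) -> int:
    """Tower of x twos: g(0) = 1 and g(x + 1) = 2 ** g(x)."""
    if x == 0:
        return 1
    return 2 ** g(x - 1)


for x in range(5):
    print(x, g(x))   # 1, 2, 4, 16, 65536

# g(5) = 2 ** 65536 still fits in memory, but it is too long to print
# comfortably, so we inspect its size in bits instead.
print(g(5).bit_length())   # 65537
```

Already g(6) is out of reach for any real machine, although the program that defines it is unchanged.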
It would be a shame to exclude this very simple function from our theory (look at the trivial program!) for reasons of (output) size.6 We must disallow any limitations on number size in our theory of computability lest we let our theory become the theory of finite tables (that represent functions of limited input and output size).
Here is a tiny sample of interesting problems, both practically and theoretically speaking, that we will explore in this book to determine whether they are amenable to a programming solution, or not. We will find that they are not.
4 This type of flagging an important passage of text originated in the writings of Bourbaki (1966), and some may find it annoying. To be fair, any exhortation to pay attention is likely to be annoying, no matter what symbols you use to effect it: Boxes, red type, or road signs.
5 Primitive recursively, but I do not want to be too far ahead of myself, so I am using a familiar term from elementary programming, say in the C language, with recursion.
6 $2^{2^{2}} = 16$ but $2^{2^{2^{2}}} = 65{,}536$ while $2^{2^{2^{2^{2}}}}$ is astronomical.
• Do we have a mechanical checker for the program correctness problem? That is the problem “Is this (arbitrary) program (say, written in the C language) correct?”7
• Is there a mechanical process that can check whether a formula of a mathematical theory is a theorem? This is Hilbert’s Entscheidungsproblem, or decision problem. Church proved that for the theory of natural numbers, the so-called Peano Arithmetic (PA), the answer is “no”. Even the simpler theory obtained from PA by removing the induction axiom has a decision problem that is unsolvable by mechanical means.
There is an intimate connection between logic and computing that goes beyond the superficial, and rather obvious, analogy between a proof and a computation (both consist of a finite sequence of very elementary steps). The most famous result on the limitations of computing8 is perhaps the “unsolvability of the halting problem” (Theorem 5.8.4). Even more famous are logic’s results on the limitations of proofs, that is, Gödel’s two incompleteness theorems, the first and the second (Gödel 1931, retold in Hilbert and Bernays 1968, Tourlakis 2003a). We will visit Gödel’s first incompleteness theorem several times in this volume, in Chap. 8 at first, and then we will discuss and prove several versions of it (including Rosser’s) in Chap. 12.9
But this is noteworthy: Our most elementary proof of this result (in Chap. 8) is to show that if we assume that Gödel’s theorem is false, then so is the unsolvability of the halting problem. In other words, the unsolvability of the halting problem implies Gödel’s first incompleteness theorem, supporting our claim that there is an intimate connection between logic and computation.10
There was a lot of interest in logic circles in the early 1930s in the development of a mathematical theory of mechanical processes and the functions that they calculate: the calculable (or computable) functions.
This was sparked by Hilbert’s conjecture that his formalised versions of mathematical theories would have an Entscheidungsproblem (decision problem: “Is this sentence a theorem?”) that would be decidable or solvable via mechanical means. Clearly, logicians were anxious to define (mathematically) what “mechanical processes” or “mechanical
7 A “correct” program produces, for every input, the output that is expected by its specification.
8 A “famous result” is not always hard to prove. For example, Russell’s paradox or antinomy regarding Cantor’s Set Theory is very easy to describe (it involves a proof that some “collection” is technically not a “set”) even to a 1st year class on discrete mathematics. The “halting problem” is perhaps the easiest problem one can show not amenable to a computational solution. Even Gödel’s first incompleteness theorem can be made “easy” to prove and understand after a bit of computability and cutting a few corners in the presentation, that is, omitting the details of arithmetisation (cf. Chap. 8).
9 For the statement and complete proof of the second theorem see Tourlakis (2003a) or the earlier (Hilbert and Bernays 1968).
10 No wonder that computability is considered to be part of logic.
means” are in order to verify, or otherwise, Hilbert’s conjecture! Towards this end, several logicians independently defined mathematisations of mechanical process or algorithm in the early 1930s (Turing 1936, 1937; Church 1936a; Kleene 1936; Post 1936) and there were some notable contributions in the 1960s as well (Markov 1960; Shepherdson and Sturgis 1963). The first set of efforts (1930s) were quickly shown to be equivalent “extensionally”:11 The various very different mechanical processes introduced were, provably, all computing the members of the same set of calculable or computable functions! In fact the 1960s efforts also defined the very same set of functions as the ones produced in the 1930s (these proofs —of the equivalence of all these definitions— are retold in Tourlakis 1984). There are so many mathematical models of computation out there! Which one should we use in this volume to found the theory of computation? The short (and correct) answer is it does not matter; all models are extensionally the same, that is why I emphasised “found” above: The model of computation is only meant to found the theory and lead (quickly, one hopes) to the pillars of the theory, the universal and S-m-n theorems along with the normal form theorems from which all else follows —no need to continue going back to the model after that. Nevertheless, let me offer some comments on my choice of model of computation for the early elementary chapters of the present volume, the ones that deal with basics of computation —without oracles. 
Our choice is the URM (unbounded register machine) of Shepherdson and Sturgis (1963) through which we will define “computable function” and “computation”.12 For our later chapters that are on computation with an oracle, and more generally computation of functionals (meaning functions that admit entire functions on numbers as inputs), we use the modern approach due to Kleene where the indexing (of programs) is built into the model.13 Incidentally, an example of a functional, from calculus, that you may immediately relate to is the function Int(a, b, f) where the inputs a and b are natural numbers and f is a function while the output is expected to be $\int_a^b f(x)\,dx$.
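To make the idea of a functional concrete, here is a minimal Python sketch of Int(a, b, f) as a function that takes another function as an input. The midpoint rule, the name `integrate` and the `steps` parameter are assumptions of this sketch; the text specifies only what the output is expected to be, not how to compute it, and the sketch returns a floating-point approximation rather than an exact value.

```python
from typing import Callable


def integrate(a: int, b: int, f: Callable[[float], float], steps: int = 1000) -> float:
    """Approximate the integral of f over [a, b] with the midpoint rule.

    The function argument f is consulted only at finitely many points,
    much as a computation would consult an oracle finitely often.
    """
    width = (b - a) / steps
    total = 0.0
    for i in range(steps):
        midpoint = a + (i + 0.5) * width
        total += f(midpoint)
    return total * width


print(integrate(0, 1, lambda x: 2 * x))   # close to 1
print(integrate(0, 3, lambda x: x * x))   # close to 9
```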
11 A process is an intentional object. It intends to finitely represent a set, even build it; for example, an enumeration process. The built set is an extensional object. We know it by its extent; what it contains. For example, the axiom of extensionality in set theory, (∀y)(∀z)((∀x)(x ∈ y ≡ x ∈ z) → y = z) says that two sets y and z will be equal if they have the same members, that is, the same extent. The axiom does not care how the sets came about.
12 In some of our earlier works (Tourlakis 1984, 2003a) we have chosen instead the elegant but somewhat less intuitive Kleene approach of “μ-computable functions” (Kleene 1936, 1943), one of the three approaches he developed.
13 One can do computation with oracles using the URM model (e.g., Cutland 1980), as one can do so using the TM model. However the Kleene approach is more elegant and more direct and prepares the reader for further study of computation with even more complex inputs than functions on numbers.
Factors considered in choosing the URM model of computation include:
1. Quick access to the advanced results of the theory. To satisfy this parameter the optimal (in our opinion) approaches to the foundation of computability are
• The completely number-theoretic approaches of Kleene, in particular the one via his μ-recursive functions formalism.14
• The other, even quicker approach, where expressions such as $\{e\}(\vec{x}) \simeq y$, interpreted as “the program (coded as the natural number) e, on input $\vec{x}$, produces output y” (Kleene 1959), are taken as the starting point. Here the index or program code, or just “program”, of each partial recursive function {e} is already built into the inductive definition of $\{e\}(\vec{x}) \simeq y$.15 In all other approaches one does a fair amount of “coding” work to show that computable functions can be so indexed.
• Yet, the URM can reach the universal function, S-m-n and normal form theorems almost as quickly, especially if, as we do in this volume, we bypass arithmetisation in the elementary and foundational Chap. 5, relying instead on “Church’s thesis”.
2. Mathematical rigour. The first two approaches above are the most rigorous mathematically. However, one may want to trade some mathematical sophistication in order to reach a wider readership and still maintain a fairly rapid access to the major results of computability. Towards widening the readership scope authors invariably adopt “programming based” approaches to achieve the foundation of computability. Well-chosen programming foundations of this kind provide fairly quick access to advanced results without unduly inhibiting the readers who are less mathematically advanced.
• The “programming approach” is consistent with the oft-stated belief that a pedagogically sound introduction to the theory of computation should take a student from the concrete to the abstract. This pedagogy supports audiences with diverse backgrounds, from undergraduate university computer science (or mathematics, logic, or philosophy) to graduate.
• The choice of programming formalisms is not unlimited. Two prominent examples to choose from are TM-programming or URM-programming. But apropos pedagogy and the “from the concrete to the abstract” desideratum one must ask: What is the likely percentage of the readership who have programmed in assembly language (“concrete” TM-like programming) vs. those who have programmed in a high-level language like C or Python or JAVA or even FORTRAN (“concrete” URM-like programming)?
14 We derive this as a post-facto characterisation in Sect. 5.6 and Chap. 7 but in Tourlakis (1984) this approach was our starting point; the foundation of computability.
15 There is a tendency in the literature to say “induction” for proofs and say “recursive”, rather than “inductive”, for definitions. We just thought that the close proximity of terminology such as “each partial recursive function” and “recursive definition” might be confusing.
The answer to the question above tips the scale in favour of the register machine. Besides, programs written in the TM language are not only hard for the students to understand, but are also hard for authors of books like this and similar ones to write precisely, in detail, and do so within a rigorous mathematical discourse, without hand-waving! As far as I know, the only books in the entire computability literature that have used the Turing machine mathematically are the monograph of Davis (1958a) and the text by Boolos et al. (2003). Both works, quite appropriately, abandon “programming” (the TM) once they have proved rigorously what I have called above the “pillars of the theory”.
All the other books (on the subject of computation) that I am aware of, and which have adopted the TM, on one hand compromise mathematical rigour by extensively hand-waving in their proofs and constructions, presumably because the TM model is so hard to “program”.16 On the other hand (which is probably worse than hand-waving), these texts continue programming TMs until the very last theorem they present, thus misrepresenting computability as the practice of programming the Turing machine sprinkled with the occasional unsolvability result every now and then.
Computability is not about programming in this or that language, but rather is about its results that detail what computation is, what it does, but, above all, what it cannot do. This is a book on the theory of mechanical processes and computable functions, both treated as the objects of study in a mathematical theory, not as activities. So the amount of “programming” is held to a minimum. Yet, inevitably, whether founded via TMs or URMs, the theory of computation will require a little bit of “programming” initially (in the chosen model), to establish a few foundational results that bootstrap the theory.
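To give a feel for how small such bootstrapping programs can be, here is a minimal Python sketch of a register-machine interpreter. It is an illustration only: the instruction names (`inc`, `dec`, `jz`, `stop`), the tuple encoding, the single input/output register `x`, and the `max_steps` cut-off are all assumptions made for this sketch, not the URM definition given in Chap. 1.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical instruction encoding for this sketch:
#   ("inc", r)           r := r + 1
#   ("dec", r)           r := r - 1, but never below 0
#   ("jz", r, l1, l2)    if r = 0 go to instruction l1, else go to l2
#   ("stop",)            halt
Program = List[Tuple]


def run(program: Program, registers: Dict[str, int],
        max_steps: int = 10_000) -> Optional[Dict[str, int]]:
    """Run the program; return the final registers, or None if the
    step budget runs out (which proves nothing about divergence)."""
    regs = dict(registers)
    pc = 0
    for _ in range(max_steps):
        op = program[pc]
        if op[0] == "stop":
            return regs
        if op[0] == "inc":
            regs[op[1]] = regs.get(op[1], 0) + 1
            pc += 1
        elif op[0] == "dec":
            regs[op[1]] = max(regs.get(op[1], 0) - 1, 0)
            pc += 1
        elif op[0] == "jz":
            pc = op[2] if regs.get(op[1], 0) == 0 else op[3]
        else:
            raise ValueError(f"unknown instruction: {op!r}")
    return None


def compute(program: Program, x: int, max_steps: int = 10_000) -> Optional[int]:
    """One-variable function computed by the program, with input and output in x."""
    final = run(program, {"x": x}, max_steps)
    return None if final is None else final["x"]


def concat(first: Program, second: Program) -> Program:
    """Program for the composition: run `first`, then `second` on its output.
    Assumes the two programs do not share work registers other than x."""
    n = len(first)
    out: Program = []
    for op in first:
        # every stop of `first` becomes a jump to the start of `second`
        out.append(("jz", "x", n, n) if op[0] == "stop" else op)
    for op in second:
        # jump targets inside `second` are shifted by the length of `first`
        out.append(("jz", op[1], op[2] + n, op[3] + n) if op[0] == "jz" else op)
    return out


SUCC: Program = [("inc", "x"), ("stop",)]   # adds 1 to its input
LOOP: Program = [("jz", "x", 0, 0)]         # never stops: no output for any input

print(compute(SUCC, 0))                  # 1
print(compute(concat(SUCC, SUCC), 5))    # 7
print(compute(LOOP, 3, max_steps=1000))  # None (budget exhausted)
```

The step budget is only a convenience for experimenting: running out of steps does not establish that a program diverges.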
For example, we show by brute force programming (in the URM model) that
• The function that adds 1 to its input and returns the obtained value as output is computable.
• The function which, for every input, returns nothing as output is computable.
• If two one-variable functions are computable, then so is their composition.
Most of such programming lemmata are extremely tedious when proved for the TM language (unless one trades proof for hand-waving) but are almost trivial for the URM language. This is another reason for choosing URMs: Use programming as the foundation, but be quick, readable, and easy to verify while going about it!
16 For example, try to program primitive recursion on a TM, or see (Boolos et al. 2003, p.30) where a TM for multiplication of two natural numbers is given.
What is covered in this volume on computability is, partly, what is more or less “normal” to include in such a theory in the context of a reasonably sized book. It
is also, partly, influenced by the author’s preferences (I admit that the first part, “normal”, is influenced by the second).
The chapters, in outline, are as follows:
We start with a “Chapter 0” to provide a one-stop-shopping kind of reference on essential topics from discrete mathematics and elementary logic that are needed to make studying this volume fun.17 A pessimist’s justification for a course in the theory of computation is that it serves to firm up the students’ grasp of (discrete) mathematical techniques and mathematical reasoning that weren’t learnt in the prerequisite discrete math course. However, I am rather optimistic that by the time the readers have studied the subject matter of this volume they will be equipped with substantially more than the mastery of the prerequisite. For this to happen, it cannot be emphasised enough, the student of a theory of computation course must be already equipped with the knowledge expected to be acquired by the successful completion of a one-semester course on discrete mathematics that does not skimp on logic topics.
So why include a “Chap. 0” in this volume? This chapter is like the musical notes you see in front of professional Baroque players who hardly look at their notes while playing; they are just having fun! The notes are there to help only if needed! The same holds with our “Chap. 0” and you: You ought to be familiar with the topics in said chapter, but I include it for your reference. It is to be consulted only if you absolutely need it, that is, and also, partly, to give you a guide as to which topics of discrete math you will have to call upon most frequently.
In order to be sure that readers have all the prerequisite tools available to them, if needed, our “Chap. 0” had to be longer than the norm, especially on the topic of formal logic. This is due to our need of formal logic in order to present several (mathematically complete) proofs of Gödel’s first incompleteness theorem in Chap. 12.
I wanted, above all, to retell two stories in Chap. 0: logic and induction, which I often found are insufficiently present in the student’s “toolbox”, notwithstanding the earlier courses they may have taken. Now, (naïve) set theory topics included in discrete math courses provide not much more than a vocabulary of mathematical notation. I include this topic mainly (1) to share with the readers early on that not all functions of interest are totally defined, and (2) to introduce them to Cantor’s ingenious (and simple) diagonalisation argument that recurs in one or another shape and form, over and over, in the computability and complexity part of the theory of computation.
Enough about discrete math. Some high-level programming is also highly desirable (I should say, a prerequisite) to provide you with a context for all these uncomputability results that are contained in this volume.
17 What is essential is to some extent a subjective assessment.
Chapter 1 begins our story proper. It introduces the mathematisation of “mechanical process” by introducing the URM programs, which express such processes, by definition, in our mathematical foundation of computability. We call these processes computations and the functions that they compute computable. How much mathematical rigour and how much intuition is a good mix? We favour both. The main reason that compels us to teach (meta)theory in a computer science curriculum is not so much to prevent the innocent from trying in vain to program a solution for the halting problem (cf. Theorem 5.8.4), just as we do not teach courses in axiomatic Euclidean geometry in order to discourage “research” on circle squaring and angle trisecting. Rather, we want the student to learn to mathematically analyse computational processes —not simply to use them— and (by doing the former) to become aware that not every problem is computationally solvable, and that, among the solvable ones, there are those that are computationally “intractable” in the sense that their computational solutions require prohibitive amounts of computational resources. It is our expectation that the reader will become proficient in the techniques that establish these limitations-results in the domain of computing. The techniques the readers will learn to use in this volume are a significant extension of the formal mathematical methods they learnt in a discrete mathematics course.18 These techniques, skills and results justify putting “science” in the term computer science. The two premier tools, diagonalisation and reductions, that you will have learnt to use by studying this volume are ubiquitous in computability (undecidability, unprovability) and complexity (computational unfeasibility). Just consider the wide applicability of diagonalisation as evidence of the value and power of this tool: It was devised and used by Cantor to show that the reals, R, is an uncountable set (cf. Sect. 0.5.3). 
It was then used by Russell, to show that Cantor’s unrestricted use of “defining (mathematical) properties”, P(x), in building “sets” such as {x : P(x) is true} leads to a nasty paradox.19 Then Gödel used it in his diagonalisation lemma to prove his incompleteness theorems (for example, cf. Lemma 9.1.1).
18 The term “formal methods”, when describing curriculum, uses the qualifier “formal” loosely to describe courses such as discrete mathematics and logic. It also describes courses that are built upon the former two, where students are applying, rigorously, mathematical techniques in investigations of program specification and correctness (e.g., software engineering) as well as in investigations of the limitations of computing, computational complexity, logic, analysis of algorithms, AI, etc. The qualifier “formal” is, in this context, not that of Hilbert’s; here it simply means “mathematically rigorous”. Hilbert’s “formal” mathematical theories are instead founded axiomatically via purely syntactic definitions, and the mathematical proofs carried out in said theories are also purely syntactic objects, devoid of semantics and based on form only.
19 The word “paradox” is from the Greek παράδοξο, meaning inconsistent with one’s intuition or belief; in short, a contradiction. In this connection we have the well known Russell’s paradox, that the collection {x : x ∉ x} cannot be, technically, a set.
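The common core of these diagonal arguments fits in a few lines. The following is a minimal Python sketch under simplifying assumptions: a finite list `fs` of total functions stands in for an infinite enumeration, and the names `diagonal`, `fs` and `d` are hypothetical. The function d is built to disagree with the i-th listed function at input i, so it cannot occur anywhere in the list.

```python
from typing import Callable, List


def diagonal(fs: List[Callable[[int], int]]) -> Callable[[int], int]:
    """Return d with d(i) = fs[i](i) + 1, so d differs from fs[i] at input i."""
    def d(i: int) -> int:
        return fs[i](i) + 1
    return d


# A finite stand-in for an enumeration f_0, f_1, f_2, ... of total functions.
fs = [lambda n: 0, lambda n: n, lambda n: n * n, lambda n: 2 ** n]
d = diagonal(fs)

for i, f in enumerate(fs):
    print(i, f(i), d(i))   # the last two columns never agree
```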
Then computer scientists used it over and over again, to prove the unsolvability of the halting problem (Theorem 5.8.4), the existence of total computable functions beyond the primitive recursive (Sect. 3.4), the existence of nontotal computable functions that cannot be extended to total computable functions (Example 6.2.1), and many other results in computability. Blum (1967) used diagonalisation (and the recursion theorem, which is a diagonalisation tool) in proving that there are total computable functions that have no best program; every program for them can be sped up (Blum’s speed-up theorem) (Chap. 14).
It is also important to develop and strengthen the students’ intuition that supports and discovers informal proofs! Students need to understand “what is going on”, and we need to help them see the essence of the argument rather than get lost in a maze of symbols. To this end we introduce Church’s thesis in Chap. 5, and start using it in our proofs that this or that construction leads to a computable function, computable by a URM, that is. This “thesis” is a belief of Church (not a theorem or metatheorem), according to which any informal algorithm can be implemented as a Turing machine, or as any of the other equivalent mathematical computation models that were developed in the 1930s. It applies to Markov algorithms and URMs as well, as these two models are equivalent to all those proposed in the 1930s. Application of the “thesis” in proofs will strengthen intuition, an appreciation of “what’s going on”, and will offer quick and user-friendly progress in our study of the theory, especially in its formative steps in Chap. 5.
However, it is also essential for the students to get a generous dose of rigorous mathematical proofs in this study of computability, uncomputability and complexity, and we aim in this volume to also develop in the reader the ability to understand and apply such methods.
Thus, we almost always20 supplement a “proof by Church’s thesis” with a rigorous mathematical proof. There are also instances where you cannot apply Church’s thesis. For example, it is not applicable towards establishing that a function is primitive recursive, so rigorous mathematical methodology will be involved here out of necessity.
20 There are two (notable) instances where we did not so supplement, both in Chap. 5, namely, the proofs of the universal function and S-m-n theorems. This was so because we did not want to do an arithmetisation of URM computations in this volume. However, in the chapter on oracle computation, where we have left the URM and switched to Kleene’s indexing-based formalism, the S-m-n (and universal function) theorems are mathematically derived.
21 T(x, y, z) is true precisely when the URM at address x, when given as input y, has a computation of z steps (instruction-to-instruction steps).
Apropos primitive recursive functions, these are introduced in Sect. 2.1. These functions provide us with a rich toolset, including coding/decoding tools and the ability to simulate URM computations mathematically (Sect. 2.4.4). This result, which uses the elementary tool of simultaneous (primitive) recursion, is at the heart of our proof that the Kleene predicate T(x, y, z)21 is primitive recursive without doing a tedious arithmetisation of URM computations (the reader who absolutely wants to see such an arithmetisation is referred to Tourlakis (2012), or
simply wait until our Chap. 13, but the latter arithmetisation is for a different foundation: of (oracle) computability, not for URM computability). The Kleene predicate is a pivotal tool in the development of computability.
Chapter 3 develops the loop programs of Meyer and Ritchie (1967). These programs are significantly less powerful than the URM programs. They only compute the primitive recursive functions; no more no less. In particular, provably, they compute only total functions, unlike the URMs. We will learn that while these loop programs can only compute a very small subset of “all the computable functions”, nevertheless they are significantly more than adequate for programming solutions of any “practical”, computationally solvable, problem. For example, even restricting the nesting of loop instructions to as low as two, we can compute (in principle) enormously large functions, which, for each input x, can produce astronomical outputs such as
$$\left.2^{2^{\cdot^{\cdot^{\cdot^{2^{x}}}}}}\right\}\ 10^{350{,}000}\ \text{2's} \qquad (1)$$
even with input $x = 0$.22 The qualification above, “in principle”, separates what theory can do from what real machines can do due to physical (as opposed to theoretical) limitations that impede the latter. This chapter ends by showing, informally by diagonalisation, that a total intuitively computable function exists that is not primitive recursive.
Chapter 4 is on the Ackermann function. Using a majorisation argument we prove that this function is not primitive recursive.23 Rather startlingly, we also prove, via a simple arithmetisation of the “pencil-and-paper computation” of the Ackermann function, that the graph of the Ackermann function24 is primitive recursive. An application of unbounded search to the graph shows that the Ackermann function is recursive.
Chapter 5 enlists Church’s thesis (belief) that (essentially) states that an informally described program can be mathematised as a URM program. Here we present our first few unsolvability results, including the well known (at least by name) “halting problem”, and offer a diagonalisation proof as to why it is unsolvable (by “mechanical means”). We also introduce here our first ad hoc reduction argument (these arguments are thoroughly developed and used in the following chapter) to prove that the program
correctness problem is also unsolvable by mechanical means. The chapter includes the definition of the Kleene T-predicate, the proof of its primitive recursiveness, the normal form theorems of Kleene, and a number theoretic (programming-independent) characterisation of the set of partial recursive functions. It also includes the basic foundational tools, the universal and S-m-n theorems (both proved invoking Church’s thesis).

22 This result will have to wait until we study the complexity (definitional and dynamic) of primitive recursive functions, in Chap. 15.
23 The majorisation argument consists in proving that the Ackermann function is too big to be primitive recursive. In a precise sense, it is strictly bigger (its outputs are) than any primitive recursive function.
24 If f is a function of one argument, its graph is the relation y = f(x).

Chapter 6 thoroughly examines the technique of reduction used towards proving unsolvability and non-semi-recursiveness results. The closure properties of the semi-recursive relations are proved, and the recursively (computably) enumerable sets are shown to be just another way of looking at semi-recursive sets. The chapter concludes with Rice’s theorem and the two central Rice-like lemmata (Theorems 6.10.1 and 6.10.3).

In Chap. 7 we derive a characterisation for the set of recursive (and partial recursive) functions that does not require primitive recursion as a given operation. This result, as well as the tool used to obtain it —arithmetising primitive recursive computations using numbers expressed in 2-adic (pronounced “dyadic”) as opposed to binary notation— will be useful in Chap. 12.

Chapter 8 is a first visit to the incompleteness phenomenon in logic, presenting a reduction argument of the form “the unsolvability of the halting problem implies that Peano arithmetic is syntactically incomplete —that is, it cannot syntactically prove all arithmetical truths”. This strongly connects unsolvability with unprovability.

The next chapter (Chap. 9) is on the second recursion Theorem (9.2.1) and a few applications, including another proof of Rice’s theorem, and the technique of showing that recursive definitions have partial recursive solutions (fixed points, or fixpoints). Apropos fixpoints, the preamble to Chap. 9 shows the connection between Kleene’s version of the recursion theorem and the Carnap-Gödel version —the fixpoint or diagonalisation lemma— that was used by Gödel to construct a Peano arithmetic formula that says “I am not a theorem”.

Chapter 10 introduces indices for primitive recursive derivations and produces a recursively defined universal function for PR. This provides a mathematically complete and rigorous version of the hand-waving argument that we produced at the end of the loop-programs chapter, and shows here that the universal function for PR is recursive but not primitive recursive (in Sect. 3.4 we could only show that the universal function was “intuitively computable”). Some of the tools produced in this chapter have independent interest: An S-m-n theorem and a recursion theorem for PR.

Chapter 11 has some advanced topics on the enumerations of recursive and semi-recursive sets while Chap. 12 is on creative and productive sets and
revisits the Gödel incompleteness theorem, and proves Church’s theorem on the unsolvability of Hilbert’s Entscheidungsproblem, and the Rosser sharpening of the first incompleteness theorem. We prove the above two results for Robinson’s arithmetic (that is, Peano arithmetic with the induction schema removed), with no recourse to any semantic help (Gödel’s original theorem includes no semantic ideas in either its statement or its proof, but in Chap. 8 we cheated and presented a semantic statement and proof). A mathematically rigorous, deceptively short proof of Gödel’s second incompleteness theorem —“if Peano arithmetic is consistent, then it cannot syntactically prove this fact”— is also given, but it assumes that the proofs of the so-called “derivability conditions” are provided to us by an oracle.²⁵ This chapter also contains proofs of Tarski’s theorem on the non definability of truth.

Chapter 13 introduces “oracle computations”, otherwise viewed as computing with arguments that are entire functions on the natural numbers —extensionally²⁶ passed as arguments. For example, a numerical integration subroutine Int(a, b, f) has as arguments two numbers, a and b, and an entire function f (a table for f) and approximates the value of $\int_a^b f(x)\,dx$. In this chapter we present increasingly more inclusive formalisms including that of Kleene (1959) and Moschovakis (1969), both using a Kleene-like (Kleene 1959) index-based foundation for partial computable functionals. The function inputs in both versions are allowed to be nontotal. The Kleene version of computing with total function inputs is also detailed, leading to the concept of computing relative to a set, Turing degrees and the priority method of Muchnik (1956), Friedberg (1957) in the modern version devised by Sacks (1963). The two versions of the “first recursion theorem” due to Kleene and Moschovakis are also proved in this chapter. Finally, we also classify here all the arithmetical relations (first encountered in Chap. 8) according to their complexity. We also give a different proof of Tarski’s theorem on the non definability of truth (the first proof was given in Sect. 12.8.1).

In Chaps. 14 and 15 we present some complexity results of all computable functions in the former and of all primitive recursive functions and some interesting subclasses in the latter —for example, the class²⁷ of feasibly computable functions of Cobham, that is, those computable in deterministic polynomial time (measured as a function of input length). This class has a neat number-theoretic, program-independent characterisation.
25 Such “oracles” are Hilbert and Bernays (1968), Tourlakis (2003a). Gödel never published a proof of his Second theorem.
26 Imagine that the function f is passed as a “graph”, not as a “program”, f = {(a, b), (a′, b′), (a″, b″), . . .}. This graph is in general an infinite set.
27 Recursion theorists use the terms “class” and “set” synonymously. Thus, “class” in this volume does not have the same meaning that set theorists assign to the term as a possibly non-set collection.
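The idea of passing a function to an integration routine as a table rather than as a program can be sketched in Python. This is only an illustration under stated assumptions: the name `integrate_table`, the use of a finite list of sample points, and the trapezoidal rule are all choices made here, not the formalism of Chap. 13 (where the graph of f is in general infinite).

```python
from typing import Sequence, Tuple


def integrate_table(a: float, b: float,
                    table: Sequence[Tuple[float, float]]) -> float:
    """Approximate the integral of f over [a, b] by the trapezoidal rule.

    The function f is never called: it is supplied extensionally, as a
    finite table of pairs (x, f(x)). Only table entries with a <= x <= b
    are used.
    """
    points = sorted((x, y) for x, y in table if a <= x <= b)
    total = 0.0
    # Sum the areas of the trapezoids between consecutive sample points.
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        total += (x1 - x0) * (y0 + y1) / 2.0
    return total


# Hypothetical table for f(x) = 2x sampled at three points.
samples = [(0.0, 0.0), (0.5, 1.0), (1.0, 2.0)]
print(integrate_table(0.0, 1.0, samples))
```

The routine sees only input/output pairs, so two different programs computing the same function would be indistinguishable to it; that is the sense of "extensionally" in the passage above.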
Some interesting results of Chap. 15 include that, 1. in general, FORTRAN-like programs —more accurately, loop programs of Chap. 3— that allow nesting of the loop instruction equal to just three have highly impractical run times, certainly as high as

$$\left. 2^{2^{\cdot^{\cdot^{\cdot^{2}}}}} \right\}\ x\ \text{2's}$$
and 2. even if we restrict loop (program) nesting level to just two, the equivalence problem of programs, that is, the program correctness problem, is unsolvable for such loop programs! In fact, it is not even semi-recursive.

We have included in this volume a good amount of complexity theory that will likely be mostly skipped in all but the most advanced courses on computability, as such courses proceed at a much faster pace. There are a few “high-level complexity” results already in Chap. 14 that use diagonalisation for their proof. Later, quite a bit is developed in Chap. 15, including an account of Cobham’s class of feasibly computable functions; and a thorough look at the hierarchy theory of the primitive recursive functions culminating in the rather startling fact that we cannot algorithmically solve the correctness problem of FORTRAN-like programs even if we restrict the nesting of loops to just two levels. FORTRAN-like languages have as an abstract counterpart the loop programs of Meyer and Ritchie (1967) that we study in Chap. 3 and again in Chap. 15.

If I used this book in a 3rd or 4th year undergraduate course on computability I would skim very quickly over the mathematical “review” chapter, and then cover Chaps. 1, 2, 3, 5, parts of Chap. 6 —say, the first three sections— and Chap. 8 (but instructor-simplified further). I would also want to cover the section on the recursion theorem, and also as much complexity theory, as time permits, from Chap. 15. In a graduate level class I would add to the above coverage Chaps. 11, 12 and 13, reducing as needed the amount of material that I cover in Chap. 15.

The reader will forgive, I hope, the many footnotes, which the style police may assess as “bad style”! However, there is always a story within a story, the “. . . and another thing . . . ” of Douglas Adams (or lieutenant Columbo), that is best delegated to footnotes not to disrupt the flow . . .
Incidentally the book by Wilder (1963) on the foundations of mathematics would lose most of its effectiveness if it were robbed of its superbly informative footnotes! The style of exposition that I prefer is informal and conversational and is expected to serve well not only the readers who have the guidance of an instructor, but also those readers who wish to learn computability on their own. I use several devices to promote understanding, such as frequent “pauses” that anticipate questions and encourage the reader to rethink an issue that might be misunderstood if read but not studied and reflected upon. All pauses start with “Pause.” and end with “”
Apropos quotes and punctuation we follow the “logical approach” (as Gries and Schneider 1994 call it) where punctuation is put inside the quotation marks if and only if it is a logical part of the quoted text. So we would never write

The relation “is a member of” is fundamental in set theory. It is denoted by “∈.”

No. “.” is not part of the symbol! We should write instead

The relation “is a member of” is fundamental in set theory. It is denoted by “∈”.

Another feature of the above reference that I have adopted is the logical use of the em-dash “—” as a parenthesis. It should have a left version and a right version to avoid ambiguities. The left version is contiguous with the following word but not with the preceding word. The right version is the reverse of this. For example, “computability is fun —as long as one has the prerequisites— and is definitely useful towards understanding the computing processes and their limitations”.

I have included numerous remarks, examples and embedded exercises (the latter in addition to the end-of-chapter exercises) that reflect on a preceding definition or theorem. Influenced by my teaching —where I love emphasising things— but also by Bourbaki, I use in my books the stylised “winding road ahead” warning, ☡, that I first saw in Bourbaki (1966). It delimits a passage that is too important to skim over. I am also using ☡☡ to delimit passages that I could not resist including. Frankly, the latter can be skipped with no injury to continuity (unless you are curious, or need the information right away). Thus, the entire Chap. 0 ought to be enclosed between ☡☡ signs but I forgot to do so!

There are 256 end-of-chapter exercises and 49 embedded ones in the text. Many have hints and thus I refrained from (subjectively) flagging them for “level of difficulty”. After all, as one of my mentors, Allan Borodin, used to say to us (when I was a graduate student at the University of Toronto), “Attempt all exercises. The ones you can do, don’t do; do the ones you cannot do”.

Acknowledgments

I wish to thank all those who taught me, including my parents, Andreas Katsaros and Yiannis Ioannidis, who taught me geometry; more recently, Allan Borodin, Steve Cook and Dennis Tsichritzis, who taught me computability; and John Lipson, Derek Corneil and John Mylopoulos, who encouraged me to employ piecewise linear topology in computer science.

Toronto, ON, Canada
June 2021
George Tourlakis
Contents

0 Mathematical Background: A Review
  0.1 Induction over N
  0.2 A Crash Course on Formal Logic
    0.2.1 Terms and Formulas
    0.2.2 Substitution
    0.2.3 Axioms, Rules and Proofs
    0.2.4 The Boolean Fragment of First-Order Logic and Its Completeness
    0.2.5 Three Obvious Metatheorems
    0.2.6 Soundness and Completeness of the Boolean Fragment of 1st-Order Logic
    0.2.7 Useful Theorems and Metatheorems of (Pure) First-Order Logic
  0.3 A Bit of Set Theory
  0.4 Relations and Functions
    0.4.1 Equivalence Relations and Partial Order
    0.4.2 Big-O, Small-o, and the “Other” ∼
    0.4.3 Induction Revisited; Inductive Definitions
  0.5 On the Size of Sets
    0.5.1 Finite Sets
    0.5.2 Some Shades of Infinity
    0.5.3 Diagonalisation
  0.6 Inductively and Iteratively Defined Sets
    0.6.1 Induction on Closures
    0.6.2 Induction vs. Iteration
  0.7 A Bit of Topology
  0.8 Exercises

1 A Theory of Computability
  1.1 A Programming Framework for the Computable Functions
  1.2 A Digression Regarding R
  1.3 Exercises

2 Primitive Recursive Functions
  2.1 Definitions and Basic Results
  2.2 Primitive Recursive and Recursive Relations
  2.3 Bounded Search
  2.4 Coding Sequences; Special Recursions
    2.4.1 Concatenating (Coded) Sequences; Stacks
    2.4.2 Course-of-Values Recursion
    2.4.3 Simultaneous Primitive Recursion
    2.4.4 Simulating a URM Computation
    2.4.5 Pairing Functions
  2.5 Exercises

3 Loop Programs
  3.1 Syntax and Semantics of Loop Programs
  3.2 PR ⊆ L
  3.3 L ⊆ PR
  3.4 Incompleteness of PR
  3.5 Exercises

4 The Ackermann Function
  4.1 Growth Properties
  4.2 Majorisation of PR Functions
  4.3 The Graph of the Ackermann Function Is in PR∗
  4.4 Exercises

5 (Un)Computability via Church’s Thesis
  5.1 The “Big Bang” for Logic and Computability
  5.2 A Leap of Faith: Church’s Thesis
  5.3 The Effective List of All URMs
  5.4 The Universal Function Theorem
  5.5 The Kleene T-Predicate and the Normal Form Theorems
  5.6 A Number-Theoretic Definition of P
  5.7 The S-m-n Theorem
  5.8 Unsolvable “Problems”; the Halting Problem
  5.9 Exercises

6 Semi-Recursiveness
  6.1 Semi-Decidable Relations
  6.2 Some More Diagonalisation
  6.3 Unsolvability via Reducibility
  6.4 Recursively Enumerable Sets
  6.5 Some Closure Properties of Decidable and Semi-Decidable Relations
  6.6 Computable Functions and Their Graphs
  6.7 Some Complex Reductions
  6.8 An Application of the Graph Theorem
  6.9 Converting Between the c.e. and Semi-Recursive Views Algorithmically
  6.10 Some General Rice-Like Tools
  6.11 Exercises

7 Yet Another Number-Theoretic Characterisation of P
  7.1 Exercises

8 Gödel’s First Incompleteness Theorem via the Halting Problem
  8.1 Preliminaries
  8.2 The First Incompleteness Theorem
  8.3 φx(x) ↑ Is Expressible in the Language of Arithmetic

9 The Second Recursion Theorem
  9.1 Historical Note
  9.2 Kleene’s Version of the Second Recursion Theorem
  9.3 The Fixed Point Theorem of Computability
  9.4 Some Applications
  9.5 Some Unusual Recursions
  9.6 Exercises

10 A Universal (non-PR) Function for PR
  10.1 PR Derivations and Their Arithmetisation
  10.2 The S-m-n Theorem for PR
  10.3 The Universal Function Eval for PR
  10.4 Exercises

11 Enumerations of Recursive and Semi-Recursive Sets
  11.1 Enumerations of Semi-Recursive Sets
  11.2 Enumerations of Recursive Sets
    11.2.1 Finite Descriptions of Recursive Sets
    11.2.2 Finite Descriptions of Finite Sets
  11.3 Enumerations of Sets of Semi-Recursive Sets
  11.4 Exercises

12 Creative and Productive Sets; Completeness and Incompleteness
  12.1 Definitions
  12.2 Strong Reducibility Revisited
  12.3 Completeness
  12.4 Recursive Isomorphisms
  12.5 A Simple Set
  12.6 Robinson’s Arithmetic
  12.7 Arithmetical Relations and Functions, Revisited
  12.8 Expressibility Revisited
    12.8.1 A Bit of Arithmetisation and Tarski’s Theorem
  12.9 Yet Another Semantic Proof of Gödel’s Incompleteness Theorem
  12.10 Definability
  12.11 Gödel’s First Incompleteness Theorem: Syntactic Versions
    12.11.1 More Arithmetisation
    12.11.2 Gödel’s Formulation of the Incompleteness Theorem
    12.11.3 Gödel’s Incompleteness Theorem: Original Proof
    12.11.4 Rosser’s Incompleteness Theorem: Original Version
    12.11.5 Rosser’s Incompleteness Theorem: Original Proof
  12.12 The Second Incompleteness Theorem: Briefly
  12.13 Exercises

13 Relativised Computability
  13.1 Functionals
  13.2 Partial Recursive and Recursive Functionals Following Kleene
  13.3 Indexing
    13.3.1 Functional Substitution
  13.4 Normal Form Theorems for Computable Functionals
    13.4.1 Arithmetisation Again
  13.5 Partial Recursive and Recursive Functionals: Index Independent Approach
    13.5.1 Effective Operations on P
  13.6 Search Computability of Moschovakis
    13.6.1 The Normal Form Theorem, Again
  13.7 The First Recursion Theorem: Fixpoints of Computable Functionals
    13.7.1 Weakly Computable Functionals Version
    13.7.2 Kleene Version
    13.7.3 Moschovakis Version
    13.7.4 The Second vs. the First Recursion Theorem
    13.7.5 A Connection with Program Semantics
  13.8 Partial Recursive and Recursive Restricted Functionals
    13.8.1 Course-of-Values Recursion
    13.8.2 Normal Form Theorems; One Last Time
  13.9 Computing Relative to a Fixed Total Type-1 Input
  13.10 Turing Degrees and Post’s Problem
    13.10.1 Sacks’ Version of the Finite Injury Priority Method
  13.11 The Arithmetical Hierarchy
  13.12 Exercises

14 Complexity of P Functions
  14.1 Blum’s Axioms
  14.2 The Speed-up Theorem
  14.3 Complexity Classes
  14.4 Exercises

15 Complexity of PR Functions
  15.1 The Axt, Loop-Program and Grzegorczyk Hierarchies
    15.1.1 The Grzegorczyk Hierarchy
  15.2 The Ritchie-Cobham Property and Hierarchy Comparison
  15.3 Cobham’s Class of Feasibly Computable Functions
  15.4 Exercises

References
Notation Index
Index
Chapter 0
Mathematical Background: A Review
Overview

As noted in the Preface, I will be assuming of the reader at least a solid course in discrete math, including simple proof techniques (e.g., by induction, by contradiction) and some elementary logic. This chapter will repeat many of these topics, but I will ask forgiveness if there are no strict linear dependencies between the sections in Chap. 0. For example, in the section below, which is about induction, I talk about sets (but I deal with sets in later sections) and about proof by contradiction (to be redone formally in the section on logic, later).

There are two licensing considerations for this: One, this section is a review of topics (assumed) known to the reader (except, probably, for the topic on topology). It is not a careful construction of a body of mathematics based on previous knowledge. I am only making sure that we start “on the same page” as it were. Two, the concept of natural number is fundamentally grounded in our intuition and it is safe, with some care, to start this review with the topics of induction and the least principle in the set of natural numbers prior to a review of other topics such as sets and logic.

The most important topics in Chap. 0 are the existence and use of nontotal functions —and the realisation that we deal in this volume with both total(ly defined) functions and nontotal functions, the two types collectively put under the category of partial functions. These play a major role in computability. Other important topics in our review here are definitions by induction in a partially ordered set, and Cantor’s diagonal method or diagonalisation, which is an important tool in computability. We will also do a tiny bit of point set topology, only used (sparsely) in Chap. 13.
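The difference between total and nontotal functions can be made concrete with a small Python sketch. It is an illustration only: the names `unbounded_search` and `half`, the choice of unbounded search as the source of nontotality, and the `fuel` bound (added so the sketch can be run safely) are assumptions made here, not definitions taken from this chapter.

```python
from typing import Callable, Optional


def unbounded_search(g: Callable[[int, int], int], x: int,
                     fuel: Optional[int] = None) -> Optional[int]:
    """Return the least y with g(x, y) == 0.

    With fuel=None the loop never ends when no such y exists, which is
    how a nontotal function arises. A finite fuel stops after that many
    candidates and returns None, meaning "no answer found so far".
    """
    y = 0
    while fuel is None or y < fuel:
        if g(x, y) == 0:
            return y
        y += 1
    return None


def half(x: int, fuel: Optional[int] = None) -> Optional[int]:
    """Hypothetical example: the least y with 2*y == x (undefined for odd x)."""
    return unbounded_search(lambda a, y: 0 if 2 * y == a else 1, x, fuel)


print(half(10))            # defined, since 10 is even
print(half(7, fuel=1000))  # no witness among the first 1000 candidates
```

Here `half` is a partial function that happens to be nontotal: it has a value at every even input and none at odd inputs. Note that exhausting the fuel only reports that no value was found within the bound; by itself it does not prove that the function is undefined at that input.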
© Springer Nature Switzerland AG 2022 G. Tourlakis, Computability, https://doi.org/10.1007/978-3-030-83202-5_0
0.1 Induction over N
This section presents the simplest forms of induction in the set of the natural numbers N = {0, 1, 2, . . .}, the mathematical habit being to use the ellipsis “. . .” to say, more or less, “you know how to obtain more elements”. In this case the “implied” (or, as a mathematician would likely say, “understood”) “rule” to get more elements is simply to add 1 to the previous element, and voilà, we get a new element! The induction technique helps in proving statements about (or properties of) the natural numbers, denoted by P (n) for “property of n”, by handing us some extra help: an extra assumption called the “induction hypothesis”. In outline, induction goes like this: Suppose I want to prove P (n), for all n, and to do so from a set of assumptions (axioms, other theorems) A. Then it suffices that I
1. (I.H.) fix n and add the assumption P (n) to what I already know, that is, to A.
2. (I.S.) prove P (n + 1), using A and the I.H., where “n” is still the n I fixed in step 1.
3. (Basis.) prove P (0) (often, but not always, this is trivial).
That is it! If I do 1.–3. then I have proved P (n) from hypotheses A, for all n. We simply say that “we have proved P (n) by induction on n, over N”. Unless there might be confusion as to what set we are doing induction over (or in) we omit the “over N” part. However, we always specify the variable we do induction on (here it is n).
Next to the numbering above, in brackets, we gave the acronyms for the three steps: I.H. stands for induction hypothesis, I.S. stands for induction step, and the Basis step (no acronym) is the easy (usually) proof for the smallest possible value of n —normally 0. One makes sure that the “fixed n” for which we assume P (n) remains unspecified, so the logical steps from P (n) to P (n + 1) (from 1. to 2.) are “general”, that is, valid for any n whatsoever. The induction process above translates to 0.1.1 Remark (Induction Principle) If for a property of natural numbers, Q(m) we have 1. Q(0) is true 2. On the assumption that Q(n) is true (for fixed unspecified n) I can prove the truth of Q(n + 1) —or we can say, equivalently, for all n, Q(n) implies Q(n + 1), then all natural numbers have the property Q(n).
0.1.2 Example. (The Usual “First Example” in Discrete Math Texts) We prove here that
$$\sum_{i=0}^{n} i = \frac{n(n+1)}{2} \qquad (1)$$
Well, (1) states our “P (n)” so, following the process, we list
1. Fix n and take (1) as the I.H.
2. Thus (I.S.),
$$\sum_{i=0}^{n+1} i = (n+1) + \sum_{i=0}^{n} i \qquad \text{(meaning of } \textstyle\sum\text{)}$$
$$= (n+1) + \frac{n(n+1)}{2} \qquad \text{(I.H.)}$$
$$= \frac{(n+1)(n+2)}{2} \qquad \text{(arithmetic)}$$
3. The Basis says $\sum_{i=0}^{0} i = \frac{0(0+1)}{2}$, or 0 = 0, which is true.
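The identity (1) can also be spot-checked numerically. The sketch below is only a sanity check for finitely many n, not a substitute for the induction; the function names `closed_form` and `direct_sum` are illustrative choices, not notation from the text.

```python
def closed_form(n: int) -> int:
    """Right-hand side of (1): n(n + 1)/2."""
    return n * (n + 1) // 2


def direct_sum(n: int) -> int:
    """Left-hand side of (1): 0 + 1 + ... + n."""
    return sum(range(n + 1))


# Basis: P(0).
assert direct_sum(0) == closed_form(0) == 0

# The arithmetic of the induction step, checked for the first few n:
# adding (n + 1) to the closed form for n gives the closed form for n + 1.
for n in range(200):
    assert closed_form(n) + (n + 1) == closed_form(n + 1)
    assert direct_sum(n) == closed_form(n)
```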
Done. 0.1.3 Remark.
1. We can also do induction over the set {k, k + 1, k + 2, k + 3, . . .}. Steps I.H. and I.S. have the same form as above, but we will be sure to be conscious of the fact that in the I.H. we choose and fix an n: n ≥ k (not n ≥ 0). The Basis, of course, will be verified for n = k: P (k). Why would we want to do that? Because some properties of numbers start being true from some k > 0 onwards (see next example).
2. We can also do induction over a finite set of natural numbers, {0, 1, 2, . . . , m} or even {r, r + 1, r + 2, . . . , m}. In this case we will be sure to note that the n we fix for the I.H. is < m so that when we look into the n + 1 case we are still in (over) the finite set. In the second case the n we fix for the I.H. satisfies r ≤ n < m.
0.1.4 Example. We prove that
n + 2 < 2^n, for n ≥ 3 (2)
Here the first inequality in (2) is our “P (n)”, so following the process we list
1. Fix n and take (2) as the I.H.
2. Thus (I.S.), n + 1 + 2 = n + 2 + 1 by I.H.
1 that has just two divisors: 1 and itself. Examples are 2, 3, 7, 11. Note that our Basis need be n = 2 since the property about prime divisors is stated for n > 1. We will use CVI. We will see why this is the right induction to use as we are progressing in this proof.
• (Basis.) For n = 2 the property is true: 2 divides 2, and 2 is prime.
• (I.H.) Fix n and assume the property for all k, such that 2 ≤ k ≤ n.
• (I.S.) How about n + 1? Well, if n + 1 is a prime then we are done, since n + 1 has a prime factor (divisor): itself. If n + 1 is not prime then n + 1 = a · b,2 where neither a nor b equals 1. We would like to apply the induction step, but simple induction would need some acrobatics since we do not expect either of a or b to equal n. So, as we are using CVI, the I.H. assumes the property for all numbers ≤ n (where n > 1). Now each of a and b is < n + 1, hence ≤ n.3 By I.H. each has a prime factor. Concentrate, say, on a. But its prime factor is a factor of n + 1 as well. Done.
1 This is a misnomer: The two types of induction, “simple” and “strong”, are actually equivalent, as we will soon see.
2 “·” denotes multiplication here.
3 As a ≠ 1 ≠ b, we have a > 1 and b > 1, so each of a and b is ≥ 2 as needed (see I.H.).
The least number principle may be more widely known and/or more widely believable as obvious: 0.1.7 Definition. (Least Principle) If S is a set of natural numbers that is not empty (i.e., it contains at least one number), then it contains a least (smallest) number s in the sense that for each x in S we have s ≤ x.
0.1.8 Theorem. The (simple) induction principle (Remark 0.1.1) is equivalent to the least principle (Definition 0.1.7). Proof So, from any one principle we can prove the other. Let us prove this statement. • (Suppose we have the induction principle.) Let then S be a nonempty set of natural numbers. We will show it contains a least number. We will argue by contradiction: So let S have no least element. Let us write S (n) for the property that says “none of the k such that k ≤ n are in S”.
(1)
Now following the steps of Remark 0.1.1 we note
1. S (0), which means “0 is not in S”, is true (for if 0 were in S it would clearly be least, contradicting hypothesis).
2. We take as I.H. S (n) is true for some fixed unspecified n, that is, (1) is true.
3. How about S (n + 1)? First off, this claims that “none of the k such that k ≤ n + 1 are in S”. If this is false, n + 1 is in S since by the I.H. all the smaller numbers are not. But then n + 1 is least in S which contradicts our assumption. Thus we must revert to “n + 1 is not in S” thus proving S (n + 1).
The induction having concluded, we have proved that no n is in S. This contradicts the given fact that S is nonempty.
• (Suppose we have the least principle.) Let then Q(n) be a property for which we verified 1.–2. of Remark 0.1.1. We pretend that we do not know what this process achieves. We will prove using the least principle (Definition 0.1.7) that we have proved that the set of all n making Q(n) true is N, thus establishing the validity of the simple induction principle. For convenience let us use the symbol {n ∈ N : Q(n)} for the described set,4 and let S be all natural numbers outside this set. Suppose {n ∈ N : Q(n)} ≠ N. Then S is nonempty. By the least principle,
4 This notation is learnt in a course on discrete mathematics and we will revisit it in the section on sets.
let s be smallest in S (†)
Now s ≠ 0 since we have verified Q(0) during the process 1. and 2. (Remark 0.1.1). So, s − 1 is a natural number not in S hence Q(s − 1) is true. But the induction we carried out diligently establishes the truth of Q(s) (from that of Q(s − 1)). This contradicts that s is not in {n ∈ N : Q(n)} (by (†)). We have no other exit but to admit we were wrong to assume “the set of numbers, S, outside {n ∈ N : Q(n)} is not empty”. Thus no n ∈ N exists outside {n ∈ N : Q(n)}, that is, all n are in: In short, Q(n) is true, for all n. The induction did work!
0.1.9 Theorem. The CVI principle (Remark 0.1.5) is equivalent to the least principle (Definition 0.1.7). So, once this is proved, we have that from any one principle we can prove the other.
Proof Let us prove this theorem.
• (Suppose the CVI principle works (per Definition 0.1.5).) Let then S be a nonempty set of natural numbers. We will show, arguing by contradiction, that S contains a least number. So let instead
“S have no least element”. (1)
We will show that as a result of (1) S is empty after all, contradicting the italicised hypothesis at the outset of this bulleted paragraph. We write S (n) for the property “n is not in S”. If we show that S (n) is true for all n we have shown that S = ∅ (the symbol for the empty set), thus obtaining our contradiction. Now following the steps of Definition 0.1.5 we note
1. S (0), which means “0 is not in S”, is true (for if 0 were in S it would clearly be least, contradicting (1)).
2. We fix n and take as I.H. that S (k) is true for all k < n, that is, 0, 1, 2, . . . , n − 1 are not in S.
3. How about S (n)? True or false? If false then n ∈ S is least, contradicting (1). But then S (n) is true, proving by CVI that all n are outside S. So S is empty. Contradiction!
• (Suppose we have the least principle.) Let then Q(n) be a property for which we verified 1.–2. of Definition 0.1.5. We show that this process proves the truth of Q(n), for all n. Equivalently, we will use the least principle (Definition 0.1.7) to prove that the set S = {n ∈ N : Q(n)} is all of N. Suppose this is not so. Then there are n in N but not in S. Let then, by the least principle, s be smallest such number.
Now s ≠ 0 since we have verified Q(0). So, s being the smallest not in S, all of 0, 1, 2, . . . , s − 1 are in S. That is, all of Q(0), Q(1), . . . , Q(s − 1) are true. But the CVI, for any n, proves the passage from 1. to 2. in Definition 0.1.5. Therefore, here we have Q(s) is true. That is, s ∈ S; contradiction. Thus the assumption that S ≠ N is untenable: we have S = N.
Therefore, trivially (transitivity of equivalence) we have by Theorems 0.1.8 and 0.1.9:
0.1.10 Corollary. All three principles of proof, simple induction, CVI, and least principle are equivalent.
In axiomatic arithmetic due to Peano (Peano Arithmetic or PA) that we will deal with later on in this volume the simple induction is taken as an axiom,5 the reason being that it cannot be proved6 from more elementary statements that can be taken as axioms (like the axiom n + 1 ≠ 0 of PA). Perhaps the least principle is most readily intuitively acceptable, since one may think of constructing the least element of a nonempty set S by starting from any element a in S and going “down”
. . . < a′′′ < a′′ < a′ < a (1)
all the members of the sequence being in S. Such a sequence must terminate, intuitively, at what is the least element. You see, no subset of N can be “bottomless” since there is an ultimate bottom (the number 0) in N. Why is this not a proof? For one, we do not have sets in PA that would permit us to argue along the lines of the above, and secondly we only used above our intuition of natural numbers and some sleight of hand. For example, the positive rational numbers Q have 0 as ultimate bottom, but one can start at a rational a and have a non-ending descending sequence of positive rationals like (1) since for any rational 0 < b, we have
0 < . . . < b/2^3 < b/2^2 < b/2 < b
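The halving chain of positive rationals can be produced with exact arithmetic. This is a small sketch using Python's standard `fractions` module; the helper name `halving_chain` and the starting value 3/7 are arbitrary choices for illustration.

```python
from fractions import Fraction


def halving_chain(b: Fraction, steps: int) -> list[Fraction]:
    """Return [b, b/2, b/2^2, ..., b/2^steps], computed exactly."""
    return [b / 2**k for k in range(steps + 1)]


chain = halving_chain(Fraction(3, 7), 20)

# Every term is positive and strictly smaller than its predecessor,
# so the descent never reaches a least positive rational.
assert all(q > 0 for q in chain)
assert all(later < earlier for earlier, later in zip(chain, chain[1:]))
```

However many steps are requested, the chain stays strictly above 0, which is exactly what fails in N.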
5 For any formula A (a formal substitute for “property”) the following formula is an induction axiom: A(0) ∧ (∀x)(A(x) → A(x + 1)) → A. Thus we have infinitely many axioms of the same form, one for each formula A. That is, the induction axiom is expressed by a formula form or formula schema.
6 Provably, for example by using models, the induction schema cannot be proved from the other axioms of PA.
0.2 A Crash Course on Formal Logic The exposition of computability in this volume is informal, that is, it is neither axiomatically founded nor reasoned within formal logic. As such, the reader will be sufficiently well equipped, for the type of logic we will mostly need here for page-after-page reasoning, by having been exposed just to the concepts and use of informal logic and proof techniques as these are taught in a solid first year course on discrete mathematics.7 Nevertheless, computability can be used —and we will do so— to give an exposition of one of (if not the) most important metamathematical results of formal logic, namely, Gödel’s (first) Incompleteness theorem. Thus, we need preparation toward reasoning about formal logic, and this section is a crash course on such formal logic: we need to know exactly what formal logic looks like and exactly how it is used in order to be able to reason about it. It has to be a mathematical object with properties that are analysable mathematically. We use the term “logic” in this volume to mean “first-order logic”, that is, the logic we normally use in mathematics to argue about mathematical objects and their properties and prove theorems about said properties. The qualifier “first-order” will be soon defined. It concerns the informal expressions (that are formalised —i.e., accurately codified— in logic) “for some x” and “for all x” and the restriction we put on the nature of the variable x in such expressions, that is, what kind of objects it may denote. We can “do” most of mathematics using first-order logic; after all we can do so for set theory, which most people will accept as the language and foundation of all mathematics. We understand the term formal logic to mean a symbolic mechanism that we may use to verify, and discover, mathematical truths by syntactic manipulation of symbols alone. 
This is what Hilbert had in mind when he conjectured that we can formalise all of mathematics: that we can reason within mathematics syntactically (relying only on the form of statements). Now, if we set this up correctly (I mean, the logic and the basic statements (axioms8) of the mathematics we want to do) then it turns out that all the statements we prove syntactically are true in a certain precise sense. Unfortunately, as Gödel proved with his incompleteness theorems, we can never prove all mathematical truths with syntactic means, a blow to Hilbert’s belief. Yet, in practice, this syntactic approach to proofs works surprisingly well! So, here is our motivation to learn about formal logic in this section: To prove Gödel’s (first) incompleteness theorem eventually (appears first in Chap. 8 and is thoroughly retold in Chap. 12). That is, we will study the what and how of formal logical reasoning, in order to prove, following Gödel, that formal reasoning for theories like Peano arithmetic always tells the truth, but will never manage to tell all the truth.
7 After all, Kunen (1978) in his chapter on “Combinatorics” in the Handbook of Mathematical Logic states (about his topic) that “We assume familiarity with naïve set theory . . . (Halmos (1960)) . . .” and continues “A knowledge of logic is neither necessary, nor even desirable”. Presumably, he meant “formal logic”. But the latter is needed (a stronger qualifier than “desirable”) if you need to prove results such as Gödel’s incompleteness theorems. Hence the need for the section you are just reading.
8 To be defined carefully soon.
In order to do logic syntactically, we need an alphabet of symbols to use in order to build formulas (and terms). Incidentally, computer programming is also a formal endeavour. You solve “real world” problems via programming, but the latter requires a formal approach. At the present state of the art, programming is a precise formal process. It has a syntax and you cannot write a program in any syntax you may please. Since the end-use of logic is to be a tool for reasoning about mathematics, one will use special symbols according to intended (mathematical) use.
0.2.1 Example. Peano arithmetic (PA), the first-order theory of natural numbers, contains these special symbols:
• S that formalises (that is, formally codifies) the function λx.x + 1
We have used here, and will throughout this volume, the so-called Church’s “lambda notation (λ notation)”. The symbols “λ” and the following “.” are to be viewed as an opening and (its matching) closing bracket pair, more precisely, as a “begin” / “end” keywords pair, as in programming. They enclose the names of all variables that we intend to change as we provide to them external values. In short, they enclose the list of input variables. The expression that immediately follows the period (the end of the input list) is the “rule” that describes how we obtain the output. In this case it is the expression x + 1.
• + that formalises the function λxy.x + y
• × that formalises the function λxy.x × y
• < that formalises the relation λxy.x < y
• 0 that formalises the number zero.9
9 At first sight, using “0” both for the syntactic “code” or “symbol” and the “real” zero may be confusing. However, as a rule, “formalists” —the practitioners of formal methods— do not think it is profitable or even desirable to invent obscure codes just for the sake of making them (the symbols!) “look different”. In the end, the context will make clear whether a symbol is “formal” or “the real thing”—the latter of which, come to think of it, is also denoted by a symbol: Nonformalist mathematicians do not write “zero”; they write “0”, even if they are not formalists. This comment applies to the formal symbols +, ×, < as well. Let us not forget that the ancient Greeks were extremely good in geometry but did not shine as much in algebra. You see, for the former branch of mathematics they had a symbolic language —the figures— but they had not invented such a language for the latter.
ZFC 10 Axiomatic Set theory —the first-order theory of set theory à la ZF (with C)— contains just one special symbol: • ∈ that formalises the relation λxy.x ∈ y, read, “x is a member of y”.
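Readers who program may find it helpful to compare the λ notation of Example 0.2.1 with anonymous functions in a programming language. The sketch below uses Python's `lambda`; the names `S`, `plus`, `times`, `less` and `member` are illustrative choices only, and these are ordinary (informal) functions, not the formal symbols themselves.

```python
# Python counterparts of the lambda-terms in Example 0.2.1.
# "lambda x, y:" plays the role of the "λxy." input list;
# the expression after the colon is the "rule" for the output.
S = lambda x: x + 1              # λx.x + 1
plus = lambda x, y: x + y        # λxy.x + y
times = lambda x, y: x * y       # λxy.x × y
less = lambda x, y: x < y        # λxy.x < y  (a relation: True/False)
member = lambda x, y: x in y     # λxy.x ∈ y  (a relation: True/False)

assert S(0) == 1
assert plus(2, 3) == 5 and times(2, 3) == 6
assert less(2, 3) and not less(3, 2)
assert member(1, {1, 2})
```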
A general study of logic must be independent of the theory whose theorems it is called upon to prove. To keep the discussion general, we do not disclose, in the definition below, which exactly are the function, relation and constant symbols, using generic names such as f, p and c respectively, with natural number subscripts.
0.2.2 Tentative Definition. (The Alphabet of First-Order Logic)
Logical symbols
• The symbols in the set {¬, ∨, (, ), =, ∀} are the logical symbols, that is, those that are essential and sufficient to “do just logic” (or pure logic, as we properly say).
• (variables for objects, or object variables): The infinite sequence v0, v1, . . .
Math symbols These are the symbols needed to do mathematics. They are also called nonlogical symbols, to distinguish them from logical.
• Zero or more function symbols: f0, f1, . . .
• Zero or more relation symbols: p0, p1, . . .
• Zero or more constant symbols: c0, c1, . . .
0.2.3 Final Definition. (The Alphabet of First-Order Logic; Final)
Logical symbols
• {#, v, ¬, ∨, (, ), =, ∀}
• The actual ontology of the object variables is that they are the following strings, generated by the symbols “v”, “(”, “)”, and “#”: (v#), (v##), (v###), . . . in short, the strings
(v#^{n+1}), for n = 0, 1, 2, . . . (2)
where #^{n+1} denotes
10 The initials stand for the proposers of this set theory version, Z: Zermelo, F: Fraenkel, while C: stands for “choice”, as in “with the axiom of choice”.
## . . . # (n + 1 times)
Thus, the meta-name (metavariable) v_i of 0.2.2 stands, for each i ≥ 0, for the string (v#^{i+1}). It will be convenient (as we do in algebra, arithmetic, set theory, etc.) to employ additional metavariables to stand for actual variables in our discussions about logic and when using logic in doing mathematics: namely, x, y, z, u, w, with or without subscripts or accents, stand for or name (object) variables.
Math symbols
• We add the set of symbols {f, p, c} to the alphabet. These, along with # and brackets, will generate all the mathematical symbols a (mathematical) theory needs as we explain below.
• The f_i in Tentative Definition 0.2.2 are meta-names for function symbols. The actual ontology of function symbols is that of strings of the actual form “(#^{n+1} f #^{m+1})”, for n ≥ 0 and m ≥ 0. If we arrange all possible11 such symbols in an infinite matrix of entries (n, m) then we obtain the matrix below.
(0, 0)  (0, 1)  (0, 2)  (0, 3)  . . .
(1, 0)  (1, 1)  (1, 2)
(2, 0)  (2, 1)
(3, 0)
. . .
The arrows indicate one way that we may traverse the matrix, if we want to rearrange the symbols in a one-dimensional (potentially) infinite array. If we call the members of this array by the meta-names f_k of bullet two (under the heading Math symbols) in Tentative Definition 0.2.2, then each f_k denotes the entry (#^{n+1} f #^{m+1}) iff 12 the latter is situated at location k of the “linearised” matrix. For example, f_0 denotes (#f #). The n + 1 denotes the required (by the function symbol) number of arguments (its arity as mathematicians call it13) while the m + 1 indicates
11 This is a mind game for the most extreme case, hence we say “all possible” and “infinite”. Recall that, for example, the theory ZFC needs no function symbols a priori, so it has none in its alphabet.
12 if and only if.
13 This term is not proper English. It was introduced as a “property” of n-ary functions: the property to require n inputs; to have “arity” n.
(along with n + 1) the position of the symbol in the matrix: (#^{n+1} f #^{m+1}) is the entry at “coordinates” (n, m). “iff”, an acronym for “if and only if”, is the informal “is equivalent to” relation between statements and we usually employ it conjunctionally, that is, a chain like
A iff B iff C iff D
means
A iff B and B iff C and C iff D
If the chain makes a true statement, then A iff D.
• Analogously with the previous bullet, the meta-symbol p_k, k ≥ 0, denotes the entry (#^{n+1} p #^{m+1}) iff the latter is situated at location k of the “linearised” matrix of all the (#^{n+1} p #^{m+1}) entries (each such entry occupying location (n, m) in the matrix). As before, n + 1 is the arity of the relation symbol (named) p_k.
• The ontology of constant symbols is similar to that of object variables, since arity plays no role. Thus, c_k, for k ≥ 0, denotes the string “(c#^{k+1})”.
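The string “ontology” of Definition 0.2.3 is concrete enough to generate mechanically. The sketch below builds the actual strings and linearises the matrix one anti-diagonal at a time; that particular traversal order is an assumption made here for illustration (the text only says the arrows show one possible way), and all function names are hypothetical.

```python
from itertools import count, islice


def variable(i: int) -> str:
    """The string named by the metavariable v_i: (v#^{i+1})."""
    return "(v" + "#" * (i + 1) + ")"


def constant(k: int) -> str:
    """The string named by c_k: (c#^{k+1})."""
    return "(c" + "#" * (k + 1) + ")"


def symbol(letter: str, n: int, m: int) -> str:
    """Matrix entry (n, m): (#^{n+1} letter #^{m+1}), a symbol of arity n + 1."""
    return "(" + "#" * (n + 1) + letter + "#" * (m + 1) + ")"


def diagonal_positions():
    """Yield (n, m) anti-diagonal by anti-diagonal (assumed traversal order)."""
    for d in count():
        for n in range(d + 1):
            yield (n, d - n)


def f(k: int) -> str:
    """The function symbol named f_k under the assumed linearisation."""
    n, m = next(islice(diagonal_positions(), k, None))
    return symbol("f", n, m)


assert variable(0) == "(v#)" and variable(2) == "(v###)"
assert f(0) == "(#f#)"   # agrees with the text: f_0 denotes (#f#)
```

Whatever traversal is chosen, every pair (n, m) is reached after finitely many steps, which is all the linearisation needs.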
0.2.4 Remark (On Metanotation) We use metanotation to shorten unwieldy mathematical texts so that humans can argue about them —and understand them— without getting attacked by symbols! Thus, (a) In specific theories we will use metanotation borrowed from mathematical practise. For example, in ZFC we will write “∈” rather than (##p#). In PA we will use 2. We will accept also unary relations, when its members are not all pairs. Example, {1, (1, 2), 4}. A relation R that is a set of pairs is called binary. We call it n-ary only if we care to note that its members are n-tuples, n > 2. It is customary to use the notation aRb for (a, b) ∈ R in imitation of a < b, a = b, a ∈ b.
0.4.9 Definition. (Converses and Images) We define a few useful terms for p.m.v functions (relations) and single-valued functions (simply put, functions).
1. The converse of a function or relation, $f^{-1}$ or $R^{-1}$, are given extensionally as
$$R^{-1} \overset{\mathrm{Def}}{=} \{(x, y) : yRx\}$$
and
$$f^{-1} \overset{\mathrm{Def}}{=} \{(x, y) : yfx\}$$
2. If f is f : A → B and X ⊆ A while Y ⊆ B we define the image of X and the inverse image of Y under f by
$$f[X] = \{f(x) : x \in X\} = \{y : (\exists x \in X) f(x) = y\}$$
and
$$f^{-1}[Y] = \{x : (\exists y \in Y) f(x) = y\}$$
0.4.10 Proposition. If f : A → B is a 1-1 correspondence, then $f^{-1}$ : B → A is a 1-1 correspondence.
The above can be expressed as “if A ∼ B, then B ∼ A”.
Proof Exercise 0.8.15. 0.4.11 Example. Let f : {1, {1, 2}, 2} → {a, b, c} given by the table
x        f(x)
1        a
2        c
{1, 2}   b
Then f ({1, 2}) = b but f [{1, 2}] = {a, c}.
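The difference between the value f({1, 2}) and the image f[{1, 2}] is easy to see in code. In this sketch a function is modelled as a Python dictionary (an assumption of the example), with `frozenset` standing in for the set {1, 2} because dictionary keys must be hashable; the helper names are illustrative.

```python
# The function f of Example 0.4.11 as a lookup table.
f = {1: "a", 2: "c", frozenset({1, 2}): "b"}


def image(f: dict, X) -> set:
    """f[X] = {f(x) : x in X}, skipping points where f is undefined."""
    return {f[x] for x in X if x in f}


def inverse_image(f: dict, Y) -> set:
    """f^{-1}[Y] = {x : f(x) in Y}."""
    return {x for x, y in f.items() if y in Y}


value = f[frozenset({1, 2})]   # f({1, 2}): the set is ONE input
img = image(f, {1, 2})         # f[{1, 2}]: the set is a collection of inputs

assert value == "b"
assert img == {"a", "c"}
```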
We can compose relations and functions:
0.4.12 Definition. (Composition of Relations and Functions) Let R : A → B and S : B → C be two relations or functions. We can define the relation R ◦ S by
xR ◦ Sy iff (∃z)(xRz ∧ zSy)
0.4.13 Proposition. If both R and S above are functions (single-valued), then so is R ◦ S (single-valued).
Proof Exercise 0.8.16.
0.4.14 Proposition. Composition of functions and relations is associative: (R ◦S)◦ T = R ◦ (S ◦ T ).
Proof Exercise 0.8.17.
0.4.15 Example. Composition of functions or relations is not commutative: That is, in general, R ◦ S ≠ S ◦ R. Here is a counterexample for the false claim R ◦ S = S ◦ R: Take R = {(1, 2)} and S = {(2, 3)}. Then R ◦ S = {(1, 3)} and S ◦ R = ∅. We say that “S ◦ R is the empty relation (or function)” since it is a relation table (or function table) that contains nothing.
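The counterexample can be replayed with relations represented as Python sets of pairs (a modelling assumption of this sketch). The function `compose` follows the convention of Definition 0.4.12 that in R ◦ S the relation R acts first; the extra relation `T` is made up here only to spot-check associativity.

```python
def compose(R: set, S: set) -> set:
    """R ∘ S = {(x, y) : there is z with x R z and z S y}; R acts first."""
    return {(x, y) for (x, z1) in R for (z2, y) in S if z1 == z2}


R = {(1, 2)}
S = {(2, 3)}

assert compose(R, S) == {(1, 3)}
assert compose(S, R) == set()          # the empty relation

# A spot check of associativity (Proposition 0.4.14) on one instance.
T = {(3, 4), (3, 5)}
assert compose(compose(R, S), T) == compose(R, compose(S, T))
```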
0.4.16 Remark (Notation) Consistent with the “f (x)” notation, we note for two functions
$$A \xrightarrow{f} B \xrightarrow{g} C$$
that (f ◦ g)(a) = b iff a f ◦ g b iff (∃z)(af z ∧ zgb) iff (∃z)(f (a) = z ∧ g(z) = b) iff g(f (a)) = b. In short,
(f ◦ g)(a) = g(f (a)) ↓ if f (a) ↓ 47 and g(f (a)) ↓ 48
(f ◦ g)(a) ↑ otherwise
47 This is the part “(∃z)af z” above.
48 This is the part “(∃z)zgb” above.
Clearly, if f (a) ↑ then g(f (a)) ↑ as well, i.e., (f ◦ g)(a) ↑. Note the order reversal: f ◦ g versus g(f (x)). There is no mystery here. In each case, f takes the input first. In each of the two notations we put f where the input is: xf ◦ gy vs. g(f (x)) = y. For peace of mind we introduce yet a third notation: “gf stands for f ◦ g”. Thus, (gf )(x) = (f ◦ g)(x) = g(f (x)).
0.4.17 Definition. (Inverses) Suppose
$$A \xrightarrow{f} B \xrightarrow{g} A \quad\text{and thus also}\quad B \xrightarrow{g} A \xrightarrow{f} B$$
If f ◦ g = λx.g(f (x)), x ∈ A,49 then we write f ◦ g = gf = 1_A, where 1_A : A → A is λx.x, and say “f is a right inverse of g” and “g is a left inverse of f ”. Thus left or right (inverse) is defined with respect to the “gf ” notation. Similar nomenclature applies if we have g ◦ f = 1_B.
0.4.18 Example. (Non Uniqueness of Left and Right Inverses) Take A = {1, 2, 3, 4} and B = {a, b}. Define the following functions (check that they are functions): f1 = {(1, a), (2, a), (3, b)}, f2 = {(1, b), (2, a), (3, b)}, g1 = {(a, 2), (b, 3)}, and g2 = {(a, 1), (b, 3)}. Note that f1 g1 = f1 g2 = f2 g1 = 1B . This example shows cases where neither the left nor the right inverses are unique.
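The claims of Example 0.4.18 can be checked mechanically. In this sketch partial functions are modelled as dictionaries (an assumption of the example), and the hypothetical helper `apply_after(outer, inner)` implements the “gf ” notation, that is, inner acts first.

```python
A = {1, 2, 3, 4}
B = {"a", "b"}

f1 = {1: "a", 2: "a", 3: "b"}
f2 = {1: "b", 2: "a", 3: "b"}
g1 = {"a": 2, "b": 3}
g2 = {"a": 1, "b": 3}


def apply_after(outer: dict, inner: dict) -> dict:
    """The "gf" notation: (outer inner)(x) = outer(inner(x)), defined only
    where inner(x) and outer(inner(x)) are both defined."""
    return {x: outer[y] for x, y in inner.items() if y in outer}


identity_B = {b: b for b in B}

# f1 g1 = f1 g2 = f2 g1 = 1_B, as stated in the example.
assert apply_after(f1, g1) == identity_B
assert apply_after(f1, g2) == identity_B
assert apply_after(f2, g1) == identity_B

# Not every pairing works: f2 g2 sends a to b.
assert apply_after(f2, g2) != identity_B
```

So f1 has two different right inverses (g1 and g2), and g1 has two different left inverses (f1 and f2).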
What can we say about left and right inverses?
0.4.19 Proposition. Suppose
$$A \xrightarrow{f} B \xrightarrow{g} A$$
and gf = 1_A. Then
(1) f is total and 1-1.
(2) g is onto (A).
49 The easy way to determine where the x comes from is to consider where the inputs of the “inner” function, f , are coming from.
Proof (1) f : 1A is total, that is, 1A (x) ↓ for all x ∈ A. By Remark 0.4.16, f (x) ↓, for all x ∈ A, so f is total. We next check 1-1 ness: Since f is total, the test simplifies to f (a) = f (b) → a = b (see Definition 0.4.6). Thus let f (a) = f (b). Then, applying g, a = g(f (a)) = g(f (b)) = b (2) g: To prove that g is onto we must show that for any a ∈ A we can solve g(x) = a for x. Indeed, f (a) ↓. So take x = f (a). We calculate: g(x) = g(f (a)) = a.
0.4.20 Remark Not every function f : A → B has a left or right inverse. (Why?) Functions that have neither left nor right inverses exist. Exhibit one (Exercise 0.8.18).
0.4.21 Lemma. Let f : A → B be a function. Then f 1A = f and 1B f = f .
Proof Exercise 0.8.19.
0.4.22 Theorem. If f : A → B has both a left and a right inverse, g : B → A and h : B → A, then they are equal and unique.
Proof Say gf = 1_A (hence f is total and 1-1) and f h = 1_B (hence f is onto). Thus f is a 1-1 correspondence. By Proposition 0.4.10, f^{-1} is also a 1-1 correspondence. Now,
h = 1_A h = (gf )h = g(f h) = g 1_B = g (1)
For uniqueness we show g = h = f^{-1}. We use (1). We will show that
f^{-1} f = 1_A (2)
that is, “f^{-1}” plays the role of “g”. Then we will note that by (1), h = f^{-1}. But h = g, so g = f^{-1} as well. It remains to actually obtain (2):
a f^{-1} f b iff (∃z)(a f^{-1} z f b)50 iff (∃z)(zf a ∧ zf b) iff (single-valued-ness) a = b
50 Written conjunctionally.
0.4.1 Equivalence Relations and Partial Order
Two types of binary relations, equivalence relations and partial order relations, play an important role in mathematics. These are equipped with postulated properties.
0.4.23 Definition. A binary relation R on a set A is
1. Reflexive iff aRa holds, for all a ∈ A.
2. Irreflexive iff there is no a satisfying aRa.
3. Symmetric iff, for all a, b, aRb → bRa.
4. Antisymmetric iff, for all a, b, aRb ∧ bRa → a = b.
5. Transitive iff, for all a, b, c, aRb ∧ bRc → aRc.
0.4.24 Example.
1. For any set A ≠ ∅, 1_A is reflexive. How about A = ∅?
2. The relation = on N has all the properties listed in Definition 0.4.23 except irreflexivity.
3. ≤ on N has reflexivity, antisymmetry and transitivity.
4. < on N has irreflexivity and transitivity.
5. For any integer m > 1, the relation of pairs (x, y) on Z (the set of all integers) defined by “m divides x − y” is reflexive, symmetric and transitive. It is called a “congruence modulo m” and it is usually denoted by x ≡ y (m) or x ≡ y mod m.
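On a finite set the properties of Definition 0.4.23 can be tested exhaustively. The sketch below does so for items 3, 4 and 5 of the example, restricted to a finite window of integers (an assumption made so that the relations are finite sets of pairs); the predicate names are illustrative.

```python
from itertools import product


def is_reflexive(R, A):
    return all((a, a) in R for a in A)


def is_irreflexive(R, A):
    return not any((a, a) in R for a in A)


def is_symmetric(R):
    return all((b, a) in R for (a, b) in R)


def is_antisymmetric(R):
    return all(a == b for (a, b) in R if (b, a) in R)


def is_transitive(R):
    return all((a, d) in R for (a, b) in R for (c, d) in R if b == c)


A = range(-6, 7)   # a finite window of Z, chosen only for this check
m = 3
congruent = {(x, y) for x, y in product(A, A) if (x - y) % m == 0}
leq = {(x, y) for x, y in product(A, A) if x <= y}
less = {(x, y) for x, y in product(A, A) if x < y}

assert is_reflexive(congruent, A) and is_symmetric(congruent) and is_transitive(congruent)
assert is_reflexive(leq, A) and is_antisymmetric(leq) and is_transitive(leq)
assert is_irreflexive(less, A) and is_transitive(less)
```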
0.4.25 Definition. (Equivalence Relations) A relation on a set A is an equivalence relation iff it is reflexive, symmetric and transitive. For each x ∈ A and equivalence relation R : A → A we define the set
$$[x]_R \overset{\mathrm{Def}}{=} \{y \in A : yRx\}$$
and call it the equivalence class determined by, or with representative, x. If R is understood we normally simply write [x].
Thus x ≡ y mod m is an equivalence relation. We have an easy but important proposition:
0.4.26 Proposition. The equivalence classes of an equivalence relation R : A → A have the following properties:
(1) [x] = [y] iff xRy.
(2) [x] ∩ [y] ≠ ∅ → [x] = [y].
(3) For all x, [x] ≠ ∅.
(4) $A = \bigcup_{x \in A} [x]$.
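Properties (1)–(4) can be observed on a small instance before attempting the exercise. The sketch below uses congruence modulo 4 on {0, . . . , 11}; both the choice of relation and the helper name `equivalence_class` are assumptions made for illustration.

```python
def equivalence_class(x, A, related):
    """[x] = {y in A : y R x}, returned as a frozenset so classes can be compared."""
    return frozenset(y for y in A if related(y, x))


A = range(12)
mod4 = lambda x, y: (x - y) % 4 == 0
cls = {x: equivalence_class(x, A, mod4) for x in A}

# (1) [x] = [y] iff x R y
assert all((cls[x] == cls[y]) == mod4(x, y) for x in A for y in A)
# (2) classes that meet are equal
assert all(cls[x] == cls[y] for x in A for y in A if cls[x] & cls[y])
# (3) no class is empty (each contains its representative)
assert all(x in cls[x] for x in A)
# (4) the classes cover A
assert set().union(*cls.values()) == set(A)
```

Here there are exactly four distinct classes, one for each remainder modulo 4.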
Proof Exercise 0.8.22.
The other important relation in mathematics is partial order. It can be defined as a strict order, like “ L, hence |f (x)| = f (x) for all x > L. Now, if f (x) ∼ g(x), then f (x) = O(g(x)). Proof The assumption says that
$$\lim_{x \to \infty} \frac{f(x)}{g(x)} = 1$$
From “calculus 101” (1st year differential calculus) we learn that this implies that for some K, x > K entails # # # f (x) # # # L, hence |f (x)| = f (x) for all x > L. Now, if f (x) = o(g(x)), then f (x) = O(g(x)). Proof The assumption says that
$$\lim_{x \to \infty} \frac{f(x)}{g(x)} = 0$$
From “calculus 101” we learn that this implies that for some K, x > K entails $\left| \frac{f(x)}{g(x)} \right| < 1$ hence −1
” by Lemma 4.1.4.
4.1.6 Lemma. $\lambda n.A_n(x + 1) \nearrow$.
Proof $A_{n+1}(x + 1) = A_n(A_{n+1}(x)) > A_n(x + 1)$; the “>” is by Lemmata 4.1.4 (left argument > right argument) and 4.1.5.
The “x + 1” in Lemma 4.1.6 is important since $A_n(0) = 2$ for all n. Thus $\lambda n.A_n(0)$ is increasing but not strictly (constant).
4.1.7 Lemma. $\lambda y.A_n^y(x) \nearrow$.
Proof $A_n^{y+1}(x) = A_n(A_n^y(x)) > A_n^y(x)$; the “>” is by Lemma 4.1.4.
4.1.8 Lemma. $\lambda x.A_n^y(x) \nearrow$.
Proof Induction on y: For y = 0 we want that $\lambda x.A_n^0(x) \nearrow$, that is, $\lambda x.x \nearrow$, which is true. We next take as I.H. that
$$A_n^y(x + 1) > A_n^y(x) \qquad (1)$$
We want
$$A_n^{y+1}(x + 1) > A_n^{y+1}(x) \qquad (2)$$
3 To be precise, the step is to prove (from the basis and I.H.) “$(\forall x) A_{n+1}(x) > x + 1$” for the n that we fixed in the I.H. It turns out that this is best handled by induction on x.
But (2) follows from (1) and Lemma 4.1.5, by applying $A_n$ to both sides of “>”.
4.1.9 Lemma. For all n, x, y, $A_{n+1}^y(x) \ge A_n^y(x)$.
Proof Induction on y: For y = 0 we want that $A_{n+1}^0(x) \ge A_n^0(x)$, that is, x ≥ x, which is true. We now take as I.H. that
$$A_{n+1}^y(x) \ge A_n^y(x)$$
We want
$$A_{n+1}^{y+1}(x) \ge A_n^{y+1}(x)$$
This is true because
$$A_{n+1}^{y+1}(x) = A_{n+1}\big(A_{n+1}^y(x)\big)$$
$$\ge A_n\big(A_{n+1}^y(x)\big) \qquad \text{by Lemma 4.1.6}$$
$$\ge A_n^{y+1}(x) \qquad \text{Lemma 4.1.5 and I.H.}$$
4.1.10 Definition. Given a predicate $P(\vec{x})$, we say that $P(\vec{x})$ is true almost everywhere (in symbols “$P(\vec{x})$ a.e.”) iff the set of (vector) inputs that make the predicate false is finite. That is, the set $\{\vec{x} : \neg P(\vec{x})\}$ is finite. A statement such as “λxy.Q(x, y, z, w) a.e.” can also be stated, less formally, as “Q(x, y, z, w) a.e. with respect to x and y”.
4.1.11 Lemma. $A_{n+1}(x) > x + l$ a.e. with respect to x. Thus, in particular, $A_1(x) > x + 10^{350000}$ a.e.
Proof In view of Lemma 4.1.6 and the note following it, it suffices to prove $A_1(x) > x + l$ a.e. with respect to x. Well, since
$$A_1(x) = A_0^x(2) = \underbrace{(\cdots(((y+2)+2)+2)+\cdots+2)}_{x\ 2\text{'s}}\Big|_{y=2} = 2 + 2x$$
we ask: Is 2 + 2x > x + l a.e. with respect to x? It is so for all x > l − 2 (only x = 0, 1, . . . , l − 2 fail).
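The closed form $A_1(x) = 2 + 2x$ and the finite exception set of Lemma 4.1.11 can be checked numerically for small inputs. The following is only a sketch: it assumes the recursive definition $A_0(x) = x + 2$, $A_n(0) = 2$, $A_n(x) = A_{n-1}(A_n(x-1))$ used in this chapter, and the names `ackermann` and `failures` are illustrative, not the book's.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def ackermann(n, x):
    """A_n(x), assuming A_0(x) = x + 2, A_n(0) = 2, A_n(x) = A_{n-1}(A_n(x - 1))."""
    if n == 0:
        return x + 2
    if x == 0:
        return 2
    return ackermann(n - 1, ackermann(n, x - 1))

def failures(l, bound):
    """All x < bound for which the inequality A_1(x) > x + l fails."""
    return [x for x in range(bound) if not ackermann(1, x) > x + l]

# A_1(x) agrees with 2 + 2x on the inputs tried here.
print([ackermann(1, x) for x in range(6)])
# For l = 5 the only failures below 50 are x = 0, ..., l - 2.
print(failures(5, 50))
```

Only very small arguments are feasible: already $A_3$ grows too fast for this naive recursion to be useful beyond a handful of inputs.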
4.1.12 Lemma. $A_{n+1}(x) > A_n^l(x)$ a.e. with respect to x.
Proof If one (or both) of l and n is 0, then the result is trivial. For example,
4.1 Growth Properties
$$A_0^l(x) = \underbrace{(\cdots(((x+2)+2)+2)+\cdots+2)}_{l\ 2\text{'s}} = x + 2l$$
We are done by Lemma 4.1.11. Let us then assume that l ≥ 1 and n ≥ 1. We note that (straightforwardly, via Definition 4.1.1)
$$A_n^l(x) = A_n\big(A_n^{l-1}(x)\big) = A_{n-1}^{A_n^{l-1}(x)}(2) = A_{n-1}^{A_{n-1}^{A_n^{l-2}(x)}(2)}(2) = A_{n-1}^{A_{n-1}^{A_{n-1}^{A_n^{l-3}(x)}(2)}(2)}(2)$$
The straightforward observation that we have a “ladder” of k $A_{n-1}$’s precisely when the topmost exponent is l − k can be ratified by induction on k (left to the reader). Thus we state
$$A_n^l(x) = \left.A_{n-1}^{A_{n-1}^{\cdot^{\cdot^{\cdot^{A_{n-1}^{A_n^{l-k}(x)}(2)}}}}(2)}(2)\right\}\ k\ A_{n-1}\text{'s}$$
In particular, taking k = l,
$$A_n^l(x) = \left.A_{n-1}^{A_{n-1}^{\cdot^{\cdot^{\cdot^{A_{n-1}^{A_n^{l-l}(x)}(2)}}}}(2)}(2)\right\}\ l\ A_{n-1}\text{'s} = \left.A_{n-1}^{A_{n-1}^{\cdot^{\cdot^{\cdot^{A_{n-1}^{x}(2)}}}}(2)}(2)\right\}\ l\ A_{n-1}\text{'s} \tag{$*$}$$
Let us now take x > l. Thus, by (∗),
$$A_{n+1}(x) = A_n^x(2) = \left.A_{n-1}^{A_{n-1}^{\cdot^{\cdot^{\cdot^{A_{n-1}^{2}(2)}}}}(2)}(2)\right\}\ x\ A_{n-1}\text{'s} \tag{$**$}$$
By comparing (∗) and (∗∗) we see that the first “ladder” is topped (after l $A_{n-1}$ “steps”) by x and the second is topped by
$$\left.A_{n-1}^{A_{n-1}^{\cdot^{\cdot^{\cdot^{A_{n-1}^{2}(2)}}}}(2)}(2)\right\}\ x - l\ A_{n-1}\text{'s}$$
Thus, in view of the fact that $A_n^y(x)$ increases with respect to each of the arguments n, x, y, we conclude by asking . . .
$$\text{“Is } \left.A_{n-1}^{A_{n-1}^{\cdot^{\cdot^{\cdot^{A_{n-1}^{2}(2)}}}}(2)}(2)\right\}\ x - l\ A_{n-1}\text{'s} \; > x \text{ a.e. with respect to } x\text{?”}$$
. . . and answering, “Yes”, because by (∗∗) this is the same question as “is $A_{n+1}(x - l) > x$ a.e. with respect to x?”, which we answered affirmatively in Lemma 4.1.11.
4.1.13 Lemma. For all n, x, y, $A_{n+1}(x+y) > A_n^x(y)$.
Proof
$$\begin{aligned} A_{n+1}(x+y) &= A_n^{x+y}(2) \\ &= A_n^x\big(A_n^y(2)\big) \\ &= A_n^x\big(A_{n+1}(y)\big) \\ &> A_n^x(y) && \text{by Lemmata 4.1.4 and 4.1.8} \end{aligned}$$
4.2 Majorisation of PR Functions
We say that a function f majorizes another function, g, iff $g(\vec{x}) \le f(\vec{x})$ for all $\vec{x}$. The following theorem states precisely in what sense “the Ackermann function majorizes all the functions of PR”.
4.2.1 Theorem. For every function $\lambda\vec{x}.f(\vec{x})$ ∈ PR there are numbers n and k, such that for all $\vec{x}$ we have $f(\vec{x}) \le A_n^k(\max(\vec{x}))$.
Proof The proof is by induction with respect to PR. Throughout I use the abbreviation $|\vec{x}|$ for $\max(\vec{x})$ as this is notationally friendlier.
• Basis. For the basis, f is one of:
Basis 1. λx.0. Then $A_0(x)$ works (n = 0, k = 1).
Basis 2. λx.x + 1. Again $A_0(x)$ works (n = 0, k = 1).
Basis 3. $\lambda\vec{x}.x_i$. Once more $A_0(x)$ works (n = 0, k = 1): $x_i \le |\vec{x}| < A_0(|\vec{x}|)$.
• Propagation with composition. Assume as I.H. that
$$f(\vec{x}_m) \le A_n^k(|\vec{x}_m|) \tag{1}$$
and
$$\text{for } i = 1, \ldots, m,\quad g_i(\vec{y}) \le A_{n_i}^{k_i}(|\vec{y}|) \tag{2}$$
Then
$$\begin{aligned} f(g_1(\vec{y}), \ldots, g_m(\vec{y})) &\le A_n^k(|g_1(\vec{y}), \ldots, g_m(\vec{y})|) && \text{by (1)} \\ &\le A_n^k\big(|A_{n_1}^{k_1}(|\vec{y}|), \ldots, A_{n_m}^{k_m}(|\vec{y}|)|\big) && \text{by Lemma 4.1.8 and (2)} \\ &\le A_n^k\big(|A_{\max n_i}^{\max k_i}(|\vec{y}|)|\big) && \text{by Lemmas 4.1.8 and 4.1.9} \\ &\le A_{\max(n, n_i)}^{k + \max k_i}(|\vec{y}|) && \text{by Lemma 4.1.9} \end{aligned}$$
• Propagation with primitive recursion. Assume as I.H. that
$$h(\vec{y}) \le A_n^k(|\vec{y}|) \tag{3}$$
and
$$g(x, \vec{y}, z) \le A_m^r(|x, \vec{y}, z|) \tag{4}$$
Let f be such that
$$\begin{aligned} f(0, \vec{y}) &= h(\vec{y}) \\ f(x+1, \vec{y}) &= g(x, \vec{y}, f(x, \vec{y})) \end{aligned}$$
I claim that
$$f(x, \vec{y}) \le A_m^{rx}\big(A_n^k(|x, \vec{y}|)\big) \tag{5}$$
I prove (5) by induction on x: For x = 0, I want $f(0, \vec{y}) = h(\vec{y}) \le A_n^k(|0, \vec{y}|)$. This is true by (3) since $|0, \vec{y}| = |\vec{y}|$. As an I.H. assume (5) for fixed x. The case for x + 1:
$$\begin{aligned} f(x+1, \vec{y}) &= g(x, \vec{y}, f(x, \vec{y})) \\ &\le A_m^r(|x, \vec{y}, f(x, \vec{y})|) && \text{by (4)} \\ &\le A_m^r\Big(\big|x, \vec{y}, A_m^{rx}\big(A_n^k(|x, \vec{y}|)\big)\big|\Big) && \text{by the I.H. (5), and Lemma 4.1.8} \\ &= A_m^r\Big(A_m^{rx}\big(A_n^k(|x, \vec{y}|)\big)\Big) && \text{by Lemma 4.1.8 and } A_m^{rx}\big(A_n^k(|x, \vec{y}|)\big) \ge |x, \vec{y}| \\ &= A_m^{r(x+1)}\big(A_n^k(|x, \vec{y}|)\big) \\ &\le A_m^{r(x+1)}\big(A_n^k(|x+1, \vec{y}|)\big) && \text{by Lemma 4.1.8} \end{aligned}$$
With (5) proved, let me set l = max(m, n). By Lemma 4.1.9 I now get
$$f(x, \vec{y}) \le A_l^{rx+k}(|x, \vec{y}|)$$
n + 1 by 4.1.6— which contradicts (1).
4 The function that does the majorising.
4.3 The Graph of the Ackermann Function Is in PR∗
How does one compute a yes/no answer to the question
$$\text{“}A_n(x) = z\text{?”} \tag{1}$$
Thinking “recursively” (in the programming sense of the word), we will look at the question by considering three cases, according to the definition in the Remark 4.1.2: (a) If n = 0, then we will directly check (1) as “is x + 2 = z?”. (b) If x = 0, then we will directly check (1) as “is 2 = z?”. (c) In all other cases, i.e., n > 0 and x > 0, we shall naturally ask two questions (both must be answerable “yes” for (1) to be true):5 “Is there a w such that An−1 (w) = z and also An (x − 1) = w?” Steps (a)–(c) are entirely analogous to steps in a proof. Just as in a proof we verify the truth of a statement via syntactic means, here we are verifying the truth of An (x) = z by such means. Steps (a) and (b) correspond to writing down axioms. Step (c) corresponds to attempting to prove B by applying MP (modus ponens) where we are looking for an A such that we have a proof of both A and A → B. In fact, closer to the situation in (c) above is a proof step where we want to prove X → Y and are looking for a Z such that both X → Z and Z → Y are known to us theorems. Z plays a role entirely analogous to that of w above. Assuming that we want to pursue the process (a)–(c) by pencil and paper or some other equivalent means, it is clear that the pertinent data that we are working with are ordered triples of numbers such as n, x, z, or n − 1, w, z, etc. That is, the letter “A”, the brackets, the equals sign, and the position of the arguments (subscript vs. inside brackets) are just ornamentation, and the string “Ai (j ) = k”, in this section’s context, does not contain any more information than the ordered triple “(i, j, k)”. 
Thus, to “compute” an answer to (1) we need to write down enough triples, in stages (or steps), as needed to justify (1): At each stage we may write a triple (i, j, k) down just in case one of (i)–(iii) holds:
(i) i = 0 and k = j + 2
(ii) j = 0 and k = 2
(iii) i > 0 and j > 0, and for some w, we have already written down the two triples (i − 1, w, k) and (i, j − 1, w).
4.3.1 Remark. Since “(i, j, k)” abbreviates “$A_i(j) = k$”, Lemma 4.1.4 implies that j < k.
5 Note that $A_n(x) = A_{n-1}(A_n(x-1))$.
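Rules (i)–(iii) above can be sketched as a checker for a written-down sequence of triples. This is an illustration only: it works on plain Python tuples rather than on numeric codes, the function names are hypothetical, and the bound w < k in rule (iii) relies on Remark 4.3.1.

```python
def is_valid_computation(triples):
    """Check that every triple (i, j, k) in the sequence is justified by
    rule (i), (ii) or (iii), using only triples written earlier."""
    seen = set()
    for (i, j, k) in triples:
        if i == 0 and k == j + 2:        # rule (i): A_0(j) = j + 2
            ok = True
        elif j == 0 and k == 2:          # rule (ii): A_i(0) = 2
            ok = True
        elif i > 0 and j > 0:            # rule (iii): look for a witness w
            # (i - 1, w, k) stands for A_{i-1}(w) = k, so w < k by Remark 4.3.1.
            ok = any((i - 1, w, k) in seen and (i, j - 1, w) in seen
                     for w in range(k))
        else:
            ok = False
        if not ok:
            return False
        seen.add((i, j, k))
    return True

def verifies(triples, n, x, z):
    """True iff the sequence is a valid computation that ends with (n, x, z)."""
    return bool(triples) and triples[-1] == (n, x, z) and is_valid_computation(triples)

# A computation justifying A_1(2) = 6.
steps = [(1, 0, 2), (0, 2, 4), (1, 1, 4), (0, 4, 6), (1, 2, 6)]
print(verifies(steps, 1, 2, 6))
```

The search for w is bounded, which is the informal reason the check can be expected to stay within primitive recursive means.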
Our theory is more competent with numbers (than with pairs, triples, etc.), preferring to code tuples into single numbers. Thus if we were to carry out the pencil and paper algorithm within our theory, then we should code all these triples, which we write down step by step, by single numbers: We will use our usual prime-power coding, ⟨i, j, k⟩, to do so. The verification process for $A_n(x) = z$, described in (a)–(c), is a sequence of steps of types (a), (b) or (c) that ends with the (coded) triple ⟨n, x, z⟩. We will code such a sequence. We note that our computation is “tree-like”, since a “complicated” triple such as that of case (iii) above requires two similar others to be already written down, each of which in turn will require two earlier similar others, etc., until we reach “leaves” [cases (i) or (ii)] that can be dealt with outright without passing the buck. This “tree”, just like the tree of a mathematical proof, can be “linearised” and thus be arranged in a sequence of coded triples ⟨i, j, k⟩ so that the presence of a “⟨i, j, k⟩” implies that all its dependencies appear earlier (to its left). We will code the entire proof sequence by a single number, u, using prime-power coding. The major result in this subsection is the theorem below, that given any number u, we can primitively recursively check whether or not it is a code of an Ackermann function computation:
4.3.2 Theorem. The predicate Def
Comp(u) = u codes an Ackermann function computation is in PR∗ . Proof The auxiliary predicates λvu.v ∈ u and λvwu.v 0 and 1 ≤ i ≤ n, and is closed under composition, (μy) (μ-recursion) and prim. Let us define then 5.6.1 Definition. (P-Derivations) The set
$$\mathcal{I} = \{S, Z, (U_i^n)_{n \ge i > 0}\}$$
is the set of Initial P-functions.9 A P-derivation is a finite (ordered!) sequence of number-theoretic functions, $f_1, f_2, \ldots, f_i, \ldots, f_n$ where, for each i, one of the following holds
1. $f_i \in \mathcal{I}$.
2. $f_i = prim(f_j, f_k)$ and j < i and k < i; that is, $f_j$, $f_k$ appear to the left of $f_i$.
3. $f_i = \lambda\vec{y}.g(r_1(\vec{y}), r_2(\vec{y}), \ldots, r_m(\vec{y}))$, and all of the $\lambda\vec{y}.r_q(\vec{y})$ and $\lambda\vec{x}_m.g(\vec{x}_m)$ appear to the left of $f_i$ in the sequence.
4. $f_i = \lambda\vec{x}.(\mu y)f_r(y, \vec{x})$, where r < i.
Any $f_i$ in a derivation is called a P-derived function. The symbol $\widetilde{\mathcal{P}}$ stands for the set of P-derived functions. That is,
$$\widetilde{\mathcal{P}} \overset{\mathrm{Def}}{=} \{f : f \text{ is P-derived}\}$$
The aim is to show that $\mathcal{P}$ is the set of all P-derived functions, as the terminology in Definition 5.6.1 ought to clearly betray. Of course, we could also have said that $\widetilde{\mathcal{P}}$ is the closure of $\mathcal{I}$ above, under the operations composition and primitive recursion and unbounded search (cf. Theorem 0.6.10). We will achieve our aim by proving $\mathcal{P} = \widetilde{\mathcal{P}}$. First a lemma:
5.6.2 Lemma. $\mathcal{PR} \subseteq \widetilde{\mathcal{P}}$.
Proof Let f ∈ PR. Then f is PR-derived. But then it is also P-derived, since a P-derivation need not necessarily use the (μy)-step 4 in Definition 5.6.1. So, $f \in \widetilde{\mathcal{P}}$.
5.6.3 Theorem. $\mathcal{P} = \widetilde{\mathcal{P}}$.
9 Same as the set of initial PR-functions of 2.1.1.
5 (Un)Computability via Church’s Thesis
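Proof
Case $\mathcal{P} \supseteq \widetilde{\mathcal{P}}$: This is by an easy induction on the length of derivation of an $f \in \widetilde{\mathcal{P}}$. The basis (length = 1) holds since $\mathcal{I} \subseteq \mathcal{P}$. The induction steps 2–4 (from Definition 5.6.1) follow from the closure properties of $\mathcal{P}$.
Case $\mathcal{P} \subseteq \widetilde{\mathcal{P}}$: Let $\lambda\vec{x}_n.f(\vec{x}_n) \in \mathcal{P}$. By 5.5.3, for some i,
$$f = \lambda\vec{x}_n.out\big((\mu y)T^{(n)}(i, \vec{x}_n, y), i, \vec{x}_n\big) \tag{1}$$
By the lemma, the right hand side of (1) is in $\widetilde{\mathcal{P}}$ (recall also Theorem 2.4.23 and Lemma 5.5.2). So is f, then.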
Among other things, Theorem 5.6.3 allows us to prove properties of P by induction on P-derivation length, and to show that f ∈ P via a way other than URM-programming: Place f in a P-derivation. The number-theoretic characterisation of P given here was one of the foundations of computability proposed in the 1930s, due to Kleene.
5.7 The S-m-n Theorem
A fundamental theorem in computability is the Parametrisation or Iteration or also “S-m-n” theorem of Kleene. In fact, the S-m-n-theorem along with the universal function theorem and a handful of additional initial computable functions are known to be sufficient tools towards founding computability axiomatically, but we will not get into this matter in this volume.
5.7.1 Theorem. (Parametrisation Theorem) For every λxy.g(x, y) ∈ P there is a function λx.f(x) ∈ R such that
$$g(x, y) \simeq \phi_{f(x)}(y), \text{ for all } x, y \tag{1}$$
Preamble. (1) above is based on these observations: Given a program M that computes the function g as $M_z^{uv}$ with u receiving the input value x and v receiving the input value y (each via an “implicit” read statement) we can, for any fixed value x, construct a new program dependent on the value x, which behaves exactly as M does, because it consists of all of M’s instructions, plus one more: The new program N(x) (the notation “(x)” conveying the dependency of N on x) inputs x into u explicitly via an assignment statement added at the very top of M as 1 : u ← x. Of course, if x ≠ x′, the programs N(x) and N(x′) differ in their first instruction, so they are different. Let us denote, for each value x, the position of $N(x)_z^v$ in our standard effective enumeration of all the $N_w^w$ by the expression f(x), to convey the dependency on x.
Clearly the correspondence x ↦ f(x) is single-valued, and moreover, by the last remark in the preceding paragraph (in italics), it is a 1-1 function.
In sum, the new program, N(x), constructed from M and the value x is at location f(x) of the standard listing; in the notation of Corollary 5.3.4, $F(f(x)) = N(x)_z^v$. Thus $N(x)_z^v$ with input y outputs g(x, y) for said x, that is, in the notation introduced in Definition 5.4.2, we have
$$g(x, y) \simeq \phi_{f(x)}(y), \text{ for all } y \text{ and the fixed } x \text{; that is, for all } x \text{ and } y \tag{2}$$
Proof Of the S-m-n theorem. The proof is encapsulated by the preceding figure, and much of the argument was already presented in the Preamble located between the two signs above (in particular, we have shown (2)). Below we just settle the claim that we can compute the address f (x) from x, that is, λx.f (x) ∈ R. So, fix an input x for the variable u of program M. Next, construct N(x). A trivial algorithm exists for the construction:
• Given M and x.
• Modify M into N(x) by adding 1 : u ← x at the top of M as a new “first” instruction. See the above figure.
• Change nothing else in the M-part of N(x), but do renumber all the original instructions of M, from “L : . . .” to “L + 1 : . . .”. Of course, every original M-instruction of the type
    L : if x = 0 goto P else goto R
  must also change “in its action part”, namely, into
    L + 1 : if x = 0 goto P + 1 else goto R + 1
• Now, to compute f(x), go down the effective list of all $N_w^w$ and keep comparing to $N(x)_z^v$, until you find it in the list and return its address. More explicitly,
    proc f(x)
      for z = 0, 1, 2, . . . do
        if F(z) = $N(x)_z^v$ then return z
• The returned value z is equal to f(x).
Note that the if-test in the pseudo code will eventually succeed and terminate the computation, since all $N_x^x$ are in the range of F of Corollary 5.3.4. In particular, this means that f is total. By Church’s thesis the informal algorithm above (described in five bullets) can be realised as a URM. Thus, f ∈ R.
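The program transformation in the second and third bullets can be sketched on a toy register-machine encoding. Everything about the encoding below is an assumption made for illustration (tuples such as `("set", v, c)` and `("if", v, P, R)`, the helper names `run` and `specialise`, and the sample program `ADD`); it is not the book's URM syntax, and the search through the enumeration F is not modelled.

```python
def run(program, inputs, out, max_steps=10_000):
    """Interpret a toy program: a list of instructions with labels 1, 2, ...
    Returns the value of register `out` at `stop`, or None if the step
    budget runs out."""
    regs = dict(inputs)
    pc = 1
    for _ in range(max_steps):
        ins = program[pc - 1]
        op = ins[0]
        if op == "stop":
            return regs.get(out, 0)
        if op == "set":                       # v <- c
            regs[ins[1]] = ins[2]
            pc += 1
        elif op == "inc":                     # v <- v + 1
            regs[ins[1]] = regs.get(ins[1], 0) + 1
            pc += 1
        elif op == "dec":                     # v <- v - 1, never below 0
            regs[ins[1]] = max(regs.get(ins[1], 0) - 1, 0)
            pc += 1
        elif op == "if":                      # if v = 0 goto P else goto R
            pc = ins[2] if regs.get(ins[1], 0) == 0 else ins[3]
    return None

def specialise(program, u, x):
    """Build N(x) from M: prepend `1 : u <- x` and shift every goto target by 1."""
    shifted = [("if", ins[1], ins[2] + 1, ins[3] + 1) if ins[0] == "if" else ins
               for ins in program]
    return [("set", u, x)] + shifted

# M computes g(u, v) = u + v in register z (labels are list positions + 1).
ADD = [
    ("set", "z", 0),       # 1
    ("if", "u", 6, 3),     # 2
    ("dec", "u"),          # 3
    ("inc", "z"),          # 4
    ("if", "u", 2, 2),     # 5: unconditional jump back to 2
    ("if", "v", 10, 7),    # 6
    ("dec", "v"),          # 7
    ("inc", "z"),          # 8
    ("if", "v", 6, 6),     # 9: unconditional jump back to 6
    ("stop",),             # 10
]

N3 = specialise(ADD, "u", 3)        # N(3): u is no longer an input
print(run(N3, {"v": 4}, "z"))       # computes g(3, 4)
```

Different values of x give syntactically different programs (they differ in the first instruction), which is the informal reason the S-m-n function is 1-1.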
Worth Repeating: It must not be lost between the lines what we have already observed: that the S-m-n function f is 1-1.
Two important corollaries suggest themselves:
5.7.2 Corollary. For every $\lambda x\vec{y}_n.g(x, \vec{y}_n) \in \mathcal{P}$ there is a function λx.f(x) ∈ R such that
$$g(x, \vec{y}_n) \simeq \phi_{f(x)}(\vec{y}_n), \text{ for all } x, \vec{y}_n$$
Proof Imitate the proof of Theorem 5.7.1 using the fact that we have an effective enumeration of all n-ary computable partial functions (Corollary 5.3.5).
5.7.3 Corollary. There is a function $S_1^m \in \mathcal{R}$ of 2 variables such that
$$\phi_i^{(m+1)}(x, \vec{y}_m) \simeq \phi^{(m)}_{S_1^m(i, x)}(\vec{y}_m), \text{ for all } i, x, \vec{y}_m$$
Proof The proof is that of Theorem 5.7.1 with a small twist: In the proof of Theorem 5.7.1 we start with a URM M for g. Here instead we have an address i of a URM for $\phi_i^{(m+1)}$, the latter being the counterpart of g in the current case. The program N(x) that we have built in the proof of Theorem 5.7.1 depends on the value x that is inputted via an assignment rather than a read statement. Said program is a trivial modification of the program M for g, where the first input variable u loses its “input status” and participates instead in the very first instruction as “1 : u ← x”. The corresponding program here we will call N(i, x) due to its obvious dependence on i that (indirectly) tells us which program “M” for $\phi_i^{(m+1)}$ we start with. So, the construction of N(i, x) is
1. Fetch the program for $\phi_i^{(m+1)}$ found in location i of the effective listing of all $N_x^{\vec{x}_{m+1}}$. Call it $M_z^{u, \vec{v}_m}$, where we have also indicated its input/output variables.
2. Build N(i, x) by adding 1 : u ← x before the first instruction of M. Shift all labels of M by 1, so that N(i, x) is syntactically correct (cf. Theorem 5.7.1).
3. The N(i, x) program, with its input/output variables indicated, is $N(i, x)_z^{\vec{v}_m}$ and can be located in the effective list of all $N_x^{\vec{x}_m}$ (cf. Corollary 5.3.5).
The argument for the recursiveness of $S_1^m$ has a bit more subtlety than that of f(x) of Theorem 5.7.1 due to the dependency on i. To compute the expression $S_1^m(i, x)$,
• Given i, x.
• Find the program at location i in the effective enumeration of all $N_x^{\vec{x}_{m+1}}$. See step 1 in the construction above.
• Build $N(i, x)_z^{\vec{v}_m}$ as in step 2 above, and locate it in the effective list of all $N_x^{\vec{x}_m}$ (cf. Corollary 5.3.5).
• Return the address you found in the previous step. This is $S_1^m(i, x)$.
By CT and the 1–3 algorithm above, $S_1^m \in \mathcal{R}$.
5.7.4 Corollary. There is a function $S_n^m \in \mathcal{R}$ of n + 1 variables such that
$$\phi_i^{(m+n)}(\vec{x}_n, \vec{y}_m) \simeq \phi^{(m)}_{S_n^m(i, \vec{x}_n)}(\vec{y}_m), \text{ for all } i, \vec{x}_n, \vec{y}_m$$
Proof This is now easy! In step 1. in the previous proof fetch the program for $\phi_i^{(m+n)}$ (instead of that of $\phi_i^{(m+1)}$) found in location i of the effective listing of all $N_x^{\vec{x}_{m+n}}$. Call it $M_z^{\vec{u}_n, \vec{v}_m}$, where we have also indicated its input/output variables. The counterpart of step 2. above is now to place the program segment below before all instructions of M:
    1 : u1 ← x1
    2 : u2 ← x2
    ...
    n : un ← xn
taking all the $u_i$ off input duty. The rest is routine and entirely analogous with the preceding proof, thus is left to the reader.
The notation of the symbol Snm indicates that the first n variables of φi(m+n) are taken off input duty while the last m of the original m + n input variables have still input duty.
5.8 Unsolvable “Problems”; the Halting Problem
Some of the comments below (and Definition 5.8.1) occurred already in earlier sections (Definition 2.2.1). We revisit and introduce some additional terminology (e.g., the term “decidable”). Recall that a number-theoretic relation Q is a subset of $\mathbb{N}^n$, where n ≥ 1. A relation’s outputs are t or f (or “yes” and “no”). However, a number-theoretic relation must have values (“outputs”) also in $\mathbb{N}$. Thus we re-code t and f as 0 and 1 respectively. This convention is preferred by recursion theorists (as people who do research in computability like to call themselves) and is the opposite of the re-coding that, say, the C language employs (0 for f and non-zero for t).
5.8.1 Definition. (Decidable Relations) A relation $Q(\vec{x}_n)$ is computable, or recursive, or decidable, means that the function
$$c_Q = \lambda\vec{x}_n.\begin{cases} 0 & \text{if } Q(\vec{x}_n) \\ 1 & \text{otherwise} \end{cases}$$
is in R. The set of all computable relations we denote by R∗.
By the way, we call the function $\lambda\vec{x}_n.c_Q(\vec{x}_n)$, which does the re-coding of the outputs of the relation, the characteristic function of the relation Q (“c” for “characteristic”).
Thus, “a relation Q( xn ) is computable or decidable” means that some URM computes cQ . But that means that some URM behaves as follows: On input xn , it halts and outputs 0 iff xn satisfies Q (i.e., iff Q( xn )), it halts and outputs 1 iff xn does not satisfy Q (i.e., iff ¬Q( xn )). We say that the relation has a decider, i.e., the URM that decides membership of any tuple xn in the relation. 5.8.2 Definition. (Problems) A “Problem” is a formula of the type “ xn ∈ Q” or, equivalently, “Q( xn )”.
Thus, by definition, a “problem” is a membership question.
5.8.3 Definition. (Unsolvable Problems) A problem “$\vec{x}_n \in Q$” is called any of the following:
1. Undecidable
2. Recursively unsolvable
or just
3. Unsolvable
iff Q ∉ R∗; in words, iff Q is not a computable relation.
Here is the most famous undecidable problem:
$$\phi_x(x)\downarrow \tag{1}$$
A different formulation of problem (1) is
$$x \in K \tag{2}$$
where $K = \{x : \phi_x(x)\downarrow\}$,10 that is, the set of all numbers x, such that machine $M_x$ on input x has a (halting!) computation. K we shall call the “halting set”, and (1) we shall call the “halting problem”.
5.8.4 Theorem. The halting problem is unsolvable.
Proof We show, by contradiction, that K ∉ R∗. Thus we start by assuming the opposite.
$$\text{Let } K \in \mathcal{R}_* \tag{3}$$
that is, we can decide membership in K via a URM, or, what is the same, we can decide truth or falsehood of $\phi_x(x)\downarrow$ for any x: Consider then the infinite matrix below, each row of which denotes a function in P as an array of outputs, the outputs being a natural number, or the special symbol “↑” for any undefined entry $\phi_x(y)$. By Theorem 5.4.1 each one argument function of P sits in some row (as an array of outputs).
10 All three Rogers (1967), Tourlakis (1984), and Tourlakis (2012) use K for this set, but this notation is by no means standard. It is unfortunate that this notation clashes with that for the first projection K of a pairing function J. However the context will manage to fend for itself!
$$\begin{array}{cccccc} \phi_0(0) & \phi_0(1) & \phi_0(2) & \ldots & \phi_0(i) & \ldots \\ \phi_1(0) & \phi_1(1) & \phi_1(2) & \ldots & \phi_1(i) & \ldots \\ \phi_2(0) & \phi_2(1) & \phi_2(2) & \ldots & \phi_2(i) & \ldots \\ \vdots & & & & & \\ \phi_i(0) & \phi_i(1) & \phi_i(2) & \ldots & \phi_i(i) & \ldots \\ \vdots & & & & & \end{array}$$
We will show that under the assumption (3) that we hope to contradict, the flipped diagonal11 represents a partial recursive function as an array of outputs, and hence must fit the matrix along some row i since we have that all $\phi_i$ (as arrays) are rows of the matrix. On the other hand, flipping the diagonal is diagonalising, and thus the diagonal function constructed cannot fit. Contradiction! So, we must blame (3) and thus we have its negation proved: K ∉ R∗.
In more detail, or as most texts present this, we define the flipped diagonal for all x as
$$d(x) \simeq \begin{cases} \downarrow & \text{if } \phi_x(x)\uparrow \\ \uparrow & \text{if } \phi_x(x)\downarrow \end{cases}$$
Strictly speaking, the above does not define d since the “↓” in the top case is not a value; it is ambiguous. Easy to fix: One way to do so is
$$d(x) \simeq \begin{cases} 42 & \text{if } \phi_x(x)\uparrow \\ \uparrow & \text{if } \phi_x(x)\downarrow \end{cases} \tag{4}$$
Here is why the function in (4) is partial computable: Given x, do:
• Use the decider for K (for $\phi_x(x)\downarrow$, that is), assumed to exist by (3), to test which condition obtains in (4); top or bottom.
• If the top condition is true, then we return 42 and stop.
• If the bottom condition holds, then transfer to an infinite loop, for example:
    while 1 = 1 do end
11 Flipping all ↑ red entries to ↓ and vice versa. This flipping is a mechanical procedure by assumption (3).
By CT, the 3-bullet program has a URM realisation, so d is partial computable. Say now
$$d = \phi_i \tag{5}$$
What can we say about $d(i) \simeq \phi_i(i)$? Well, we have two cases:
Case 1. $\phi_i(i)\downarrow$. Then we are in the bottom case of (4). Thus d(i) ↑. But we also have $d(i) \simeq \phi_i(i)$ by (5), thus we have just contradicted the case hypothesis, $\phi_i(i)\downarrow$.
Case 2. $\phi_i(i)\uparrow$. We have d(i) = 42 in this case, thus, d(i) ↓. By (5) $d(i) \simeq \phi_i(i)$, thus again we have contradicted the case hypothesis, $\phi_i(i)\uparrow$.
So we reject (3).
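The two cases above can be illustrated on a finite corner of the matrix. This is a toy sketch only: the table is a made-up finite array in which `None` stands for “↑”, and reading off which entries are undefined is exactly what assumption (3) would have to supply for the real, infinite matrix. The names `UNDEFINED`, `flipped_diagonal` and `fits_row` are hypothetical.

```python
UNDEFINED = None  # stands for the symbol "↑" (no output)

def flipped_diagonal(table):
    """Given a finite square table with table[x][y] = phi_x(y), or UNDEFINED,
    return d as a list: d[x] = 42 if phi_x(x) is undefined, else UNDEFINED."""
    return [42 if table[x][x] is UNDEFINED else UNDEFINED
            for x in range(len(table))]

def fits_row(table, i, d):
    """True iff d agrees with row i on every column of the finite table."""
    return all(table[i][y] == d[y] for y in range(len(d)))

# A made-up 3 x 3 corner of the matrix.
table = [
    [0, 1, UNDEFINED],
    [5, UNDEFINED, 5],
    [42, 42, 42],
]
d = flipped_diagonal(table)
print(d)
print([fits_row(table, i, d) for i in range(len(table))])
```

By construction d disagrees with row i at column i, so it cannot fit any row; this is the finite shadow of Cases 1 and 2.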
In terms of theoretical significance, the above is perhaps the most significant unsolvable problem that enables the process of discovering more! How? As a first example we illustrate the “program correctness problem” (see below). But how does “x ∈ K” help? Through the following technique of reduction: Let P be a new problem for which we want to see whether y ∈ P can be solved by a URM. We build a reduction that goes like this: (1) Suppose that we have a URM M that decides y ∈ P , for all y. (2) Then we show how to use M as a subroutine to also decide x ∈ K, for all x. (3) Since the latter problem is unsolvable, no such URM M exists! In short, P ( y) is unsolvable too. The equivalence problem is Given two programs M and N can we test to see whether they compute the same function?
Of course, “testing” for such a question cannot be done by experiment: We cannot just run M and N for all inputs to see if they get the same output, because, for one thing, “all inputs” are infinitely many, and, for another, there may be inputs that cause one or the other program to run forever (infinite loop). By the way, the equivalence problem is the general case of the “program correctness” problem which asks Given a program P and a program specification S, does the program fit the specification for all inputs?
since we can view a specification as just another formalism to express a function computation. By CT, all such formalisms, programs or specifications, boil down to URMs, and hence the above asks whether two given URMs compute the same function —program equivalence. Let us show now that the program equivalence problem cannot be solved by any URM.
5.8.5 Theorem. (Equivalence Problem) The equivalence problem of URMs is the problem “given i and j; is $\phi_i = \phi_j$?”12 This problem is undecidable.
Proof The proof is by a reduction (see above), hence by contradiction. We will show that if we have a URM that solves it, “yes”/“no”, then we have a URM that solves the halting problem too!
So let the URM E solve the equivalence problem. (*)
Let us use it to answer the question “a ∈ K” (that is, “$\phi_a(a)\downarrow$”) for any a.
So, fix an a that we want to test. (2)
Consider the following two computable functions given by: For all x,
$$Z(x) = 0$$
and
$$\widetilde{Z}(x) \simeq \begin{cases} 0 & \text{if } x = 0 \wedge \phi_a(a)\downarrow \\ 0 & \text{if } x \ne 0 \end{cases}$$
Both functions are intuitively computable: For Z we already have shown a URM M that computes it. For $\widetilde{Z}$ and input x compute as follows:
• Print 0 and stop if x ≠ 0.
• On the other hand, if x = 0 then, using the universal function h, start computing h(a, a), which is the same as $\phi_a(a)$ (cf. Theorem 5.4.1). If this ever halts just print 0 and halt; otherwise let it loop forever.
By CT, $\widetilde{Z}$ is in P, that is, it has a URM program, say $\widetilde{M}$.
We can compute the locations i and j of M and $\widetilde{M}$ respectively by going down the list of all $N_w^w$. Thus $Z = \phi_i$ and $\widetilde{Z} = \phi_j$.
By assumption (∗) above, we proceed to feed i and j to E. This machine will halt and answer “yes” (0) precisely when $\phi_i = \phi_j$; will halt and answer “no” (1) otherwise. But note that $\phi_i = \phi_j$ iff $\phi_a(a)\downarrow$. We have thus solved the halting problem since a is arbitrary! This is a contradiction to the existence of URM E.
12 If we set P = {(i, j) : $\phi_i = \phi_j$}, then this problem is the question “(i, j) ∈ P?” or “P(i, j)?”.
5.9 Exercises
1. Prove that the problem $\phi_x(y)\downarrow$ is unsolvable.
2. Let λxy.⟨x, y⟩ be any primitive recursive pairing function with primitive recursive projections Π1 and Π2; that is, if z = ⟨x, y⟩, then Π1 z = x and Π2 z = y. Prove that the set {⟨x, y⟩ : $\phi_x(y)\downarrow$} is not recursive.
3. Prove that the problem $\lambda xyz.\phi_x(y) \simeq z$ is unsolvable. Hint. If instead it is recursive, then so is $\phi_x(x) = 1$, by Grzegorczyk substitution. Now apply diagonalisation.
4. Prove that the problem $\lambda xyz.\phi_x(y) > z$ is unsolvable.
5. Prove that the problem $\lambda xy.\phi_x(y) \simeq z$ is unsolvable.
6. Prove that the problem $\lambda x.(\exists y)\phi_x(y) \simeq 42$ (in words, “will the URM at address x ever print 42?”) is unsolvable.
Chapter 6
Semi-Recursiveness
Overview
This chapter introduces the semi-recursive relations $Q(\vec{x})$. These play a central role in computability. As the name suggests these are kind of “half” recursive. Indeed, we can program a URM M to verify membership in such a Q, but if an input is not in Q, then our verifier will not tell us so; it will loop forever. That is, M verifies membership but does not decide it in a yes/no manner, that is, by halting and “printing” the appropriate 0 (yes) or 1 (no). Can’t we tweak M into an M′ that is a decider of such a Q? No, not in general! For example, our halting set K has a verifier but (provably) has no decider! This much we know: having a decider for K means K ∈ R∗, and we know that this is not the case. Since the “yes” of a verifier M is signaled by halting but the “no” is signaled by looping forever, the definition below does not require the verifier to print 0 for “yes”. In this case “yes” equals “halting”. The chapter introduces a general reduction technique to prove that relations are not recursive or not semi-recursive according to the case. We prove the closure properties of the set of all semi-recursive relations and also the important graph theorem. We prove the projection theorem and also give a characterisation of semi-recursive sets as images of recursive functions. The startling theorem of Rice is included: all sets of the form A = {x : $\phi_x$ ∈ C ⊆ P} are not recursive, unless A = ∅ or A = $\mathbb{N}$.
6.1 Semi-Decidable Relations
6.1.1 Definition. (Semi-Recursive or Semi-Decidable Sets) A relation $Q(\vec{x}_n)$ is semi-decidable or semi-recursive iff there is a URM, M, which on input $\vec{x}_n$ has a (halting!) computation iff $\vec{x}_n \in Q$. The output of M is unimportant! A more mathematically precise way to say the above is:
© Springer Nature Switzerland AG 2022 G. Tourlakis, Computability, https://doi.org/10.1007/978-3-030-83202-5_6
A relation $Q(\vec{x}_n)$ is semi-decidable or semi-recursive iff there is an f ∈ P such that
$$Q(\vec{x}_n) \equiv f(\vec{x}_n)\downarrow \tag{1}$$
Since an f ∈ P is some $M_y^{\vec{x}_n}$, M is a verifier for Q. The set of all semi-decidable relations we will denote by P∗.1
6.1.2 Remark. Yet another way to say (1) is: A relation $Q(\vec{x}_n)$ is semi-decidable or semi-recursive iff there is an e ∈ $\mathbb{N}$ such that
$$Q(\vec{x}_n) \equiv \phi_e^{(n)}(\vec{x}_n)\downarrow \tag{2}$$
We call the e in (2) a semi-recursive index or semi-index of $Q(\vec{x}_n)$. Of course, every semi-recursive $Q(\vec{x}_n)$ has infinitely many semi-indices since the f in (1) in Definition 6.1.1 is equal to $\phi_e^{(n)}$ for infinitely many e. If n = 1, i.e., Q ⊆ $\mathbb{N}$, then we have the notation (Rogers 1967) $Q = W_e$ for any semi-index e of Q. Thus,
$$x \in W_e \equiv \phi_e(x)\downarrow \tag{$*$}$$
and
$$x \notin W_e \equiv \phi_e(x)\uparrow \tag{$**$}$$
We have at once
6.1.3 Theorem. (Kleene Normal Form for Predicates) A relation $Q(\vec{x}_n)$ is semi-recursive with semi-index i ∈ $\mathbb{N}$ iff
$$Q(\vec{x}_n) \equiv (\exists y)T^{(n)}(i, \vec{x}_n, y)$$
Proof If-part. By Theorem 5.5.3, $\phi_i^{(n)}(\vec{x}_n)\downarrow \equiv (\exists y)T^{(n)}(i, \vec{x}_n, y)$. Now, invoke Definition 6.1.1.
1 This is not a standard notation in the literature. Most of the time the set of all semi-recursive relations has no symbolic name! We are using this symbol in analogy to R∗ —the latter being fairly “standard”.
Only if-part. By Definition 6.1.1, $Q(\vec{x}_n) \equiv \phi_i^{(n)}(\vec{x}_n)\downarrow$. Now, invoke Theorem 5.5.3.
The following figure shows the two modes of handling a query, “ xn ∈ A”, by a URM.
Here is an important semi-decidable set. 6.1.4 Example. K is semi-decidable. We work within Definition 6.1.1. Note that the function λx.φx (x) is in P via the universal function theorem of 5.4.1, namely, λx.φx (x) = λx.h(x, x) and we know h ∈ P. Thus x ∈ K ≡ φx (x) ↓ settles it. By Definition 6.1.1 (1) we are done.
6.1.5 Example. Any recursive relation A is also semi-recursive. That is,
R∗ ⊆ P∗
Indeed, intuitively, all we need to do to convert a decider for $\vec{x}_n \in A$ into a verifier for $\vec{x}_n \in A$ is to “intercept” the “print 1”-step and convert it into an “infinite loop”,
    while(1)² { }
By CT we can certainly do that via a URM implementation. A more explicit way (which still invokes CT) is to say, OK: Since A ∈ R∗, it means that $c_A$, its characteristic function, is in R.
2 “1” in the C language evaluates as “true”. One may also use while(1 = 1).
Define a new function f as follows:
$$f(\vec{x}_n) \simeq \begin{cases} 0 & \text{if } c_A(\vec{x}_n) = 0 \\ \uparrow & \text{if } c_A(\vec{x}_n) = 1 \end{cases}$$
This is intuitively computable since $c_A$ is total computable (the “↑” is implemented by the same while as above). Hence, by CT, f ∈ P. But
$$\vec{x}_n \in A \equiv f(\vec{x}_n)\downarrow$$
because of the way f was defined. Definition 6.1.1 rests the case.
One more way to do this: Totally mathematical (“formal”, as one might say, incorrectly3) this time!
$$f(\vec{x}_n) = \text{if } c_A(\vec{x}_n) = 0 \text{ then } 0 \text{ else } \emptyset(\vec{x}_n)$$
That is, using the sw function that is in PR and hence in P, as in
$$f(\vec{x}_n) = \text{if } \overset{c_A(\vec{x}_n)}{\overset{\downarrow}{z}} = 0 \text{ then } \overset{0}{\overset{\downarrow}{u}} \text{ else } \overset{\emptyset(\vec{x}_n)}{\overset{\downarrow}{w}}$$
∅ is, of course, the empty function which by Grzegorczyk operations can have any number of arguments we please! For example, we may take $\emptyset = \lambda\vec{x}_n.(\mu y)g(y, \vec{x}_n)$ where $g = \lambda y\vec{x}_n.SZ(y) = \lambda y\vec{x}_n.1$.
In what follows we will often present first the informal way (for example, proofs by Church’s Thesis, or just by handwaving) of doing a proof, and then we will present the mathematically rigorous way.
An important observation following from the above examples deserves theorem status:
6.1.6 Theorem. R∗ ⊊ P∗
Proof The ⊆ part of “⊊” is Example 6.1.5 above.
3 “Formal” refers to syntactic proofs based on axioms. Our “mathematical” proofs are mostly semantic, they depend on meaning, not just syntax. That is how it is in the majority of mathematical publications. So one should say “mathematical” rather than “formal” in this context.
The = part is due to K ∈ P∗ (Example 6.1.4) and the fact that the halting problem is unsolvable (K ∈ / R∗ ). So, there are sets in P∗ (e.g., K) that are not in R∗ .
What about K, that is, the complement K = N − K = {x : φx (x) ↑} of K? The following general result helps us handle this question.
6.1.7 Theorem. A relation $Q(\vec{x}_n)$ is recursive if both $Q(\vec{x}_n)$ and $\neg Q(\vec{x}_n)$ are semi-recursive.

Before we proceed with the proof, a remark on notation is in order. In set notation we write the complement of a set, $A$, of $n$-tuples as $\overline{A}$. This means, of course, $\mathbb{N}^n - A$, where

$$\mathbb{N}^n = \underbrace{\mathbb{N} \times \cdots \times \mathbb{N}}_{n \text{ copies of } \mathbb{N}}$$

In relational notation we write the same thing (complement) as $\neg A(\vec{x}_n)$.

Proof We want to prove that some URM, $N$, decides $\vec{x}_n \in Q$. We take two verifiers, $M$ for “$\vec{x}_n \in Q$” and $M'$ for “$\vec{x}_n \in \overline{Q}$”,⁴ and run them, on input $\vec{x}_n$, as “co-routines” (i.e., we crank them simultaneously). If $M$ halts, then we stop everything and print “0” (i.e., “yes”). If $M'$ halts, then we stop everything and print “1” (i.e., “no”). CT tells us that we can put the above, if we want to, into a single URM, $N$.
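The co-routine idea in the informal proof can be sketched in Python. This is only an illustration under stated assumptions: a verifier is modelled as a Python generator that yields once per step and returns when it accepts, and the helper names `search_verifier` and `decide` are hypothetical, not the book's URM construction.

```python
def search_verifier(pred):
    """Build a verifier (a generator function): on input x it halts
    iff pred(x, y) holds for some y = 0, 1, 2, ...; otherwise it runs forever."""
    def run(x):
        y = 0
        while not pred(x, y):
            yield          # one more step taken, no verdict yet
            y += 1
    return run


def decide(m, m_co, x):
    """Crank verifier m (for Q) and verifier m_co (for not-Q) alternately on x.
    Returns 0 ("yes") if m halts, 1 ("no") if m_co halts.
    Exactly one of the two halts, so the loop always terminates."""
    runs = [(0, m(x)), (1, m_co(x))]
    while True:
        for answer, computation in runs:
            try:
                next(computation)      # advance this verifier by one step
            except StopIteration:      # this verifier halted
                return answer


# Illustrative Q(x): "x is a perfect square" (an assumption chosen for the demo).
is_square = search_verifier(lambda x, y: y * y == x)
is_not_square = search_verifier(lambda x, y: y * y < x < (y + 1) * (y + 1))
```

With these definitions, `decide(is_square, is_not_square, 49)` yields the “yes” answer and `decide(is_square, is_not_square, 50)` the “no” answer, although neither verifier alone halts on every input.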
Here is a mathematical proof that imitates what we did above. Let $Q(\vec{x}_n) \equiv (\exists y)T^{(n)}(i, \vec{x}_n, y)$ and $\neg Q(\vec{x}_n) \equiv (\exists y)T^{(n)}(j, \vec{x}_n, y)$ for some $i$ and $j$ (by Theorem 6.1.3). Then computing

$$f(\vec{x}_n) \overset{\text{Def}}{=} (\mu y)\big(T^{(n)}(i, \vec{x}_n, y) \lor T^{(n)}(j, \vec{x}_n, y)\big)$$
4 We can do that, i.e., $M$ and $M'$ exist, since both $Q$ and $\overline{Q}$ are semi-recursive.
runs the verifiers for $Q$ and $\neg Q$, at input $\vec{x}_n$, found at locations $i$ and $j$ respectively. A smallest $y$ will be found as one or the other verifier halts. Thus $f \in \mathcal{R}$. But which verifier halted? Well, $Q(\vec{x}_n) \equiv T^{(n)}(i, \vec{x}_n, f(\vec{x}_n))$ is true iff the verifier $i$ stops. Incidentally, this equivalence, together with the recursiveness of $f$, shows that $Q$ is recursive.
6.1.8 Remark The above is really an “iff”-result, because $\mathcal{R}_*$ is closed under complement as we showed in Corollary 2.2.7. Thus, if $Q$ is in $\mathcal{R}_*$, then so is $\overline{Q}$, by closure under $\neg$. By Theorem 6.1.6, both of $Q$ and $\overline{Q}$ are in $\mathcal{P}_*$.
6.1.9 Example. $\overline{K} \notin \mathcal{P}_*$. Why? Well, if instead it were $\overline{K} \in \mathcal{P}_*$, then combining this with Example 6.1.4 and Theorem 6.1.7 we would get $K \in \mathcal{R}_*$, which we know is not true. Thus, $\overline{K}$ is an extremely unsolvable problem! This problem is so hard it is not even semi-decidable!
6.2 Some More Diagonalisation

6.2.1 Example. We said in the cautionary remark at the end of Example 2.1.16 “That is, there are functions $f \in \mathcal{P}$ that have no recursive extensions. This we will show later.” Now is “later”! We show that $f = \lambda x.\phi_x(x) + 1$ has no total computable extensions. (Thus, in particular, it is not total, but computable it is!)

By way of contradiction, assume that

$$f \subseteq g \in \mathcal{R} \qquad (1)$$

Then

$$g = \phi_i, \text{ for some } i \qquad (2)$$

Let us compute $g(i)$:

$$g(i) = \phi_i(i) \qquad (3)$$

by (2). Now, (1) tells a different story: $f(i) = \phi_i(i) + 1$ (defined, by (3)), hence $= g(i)$ (by (1)). We have from the above and (3): $\phi_i(i) + 1 = g(i) = \phi_i(i)$, a contradiction, since all sides are defined.
6.2.2 Exercise. Prove that $\lambda x.\phi_x(x)$ has no total computable extensions either.

We will come back to diagonalisation yet again later, but let us conclude here with two hidden diagonalisations that prove, yet again, the unsolvability of the halting problem.

6.2.3 Example. In Example 6.1.9 we proved $\overline{K} \notin \mathcal{P}_*$ using $K \notin \mathcal{R}_*$. Let us reverse this sequence of events here, and prove $\overline{K} \notin \mathcal{P}_*$ first, deriving from it the unsolvability of the halting problem (if the latter is solvable, then so is $\overline{K}$ by closure of $\mathcal{R}_*$ under $\neg$). Well, if $\overline{K} \notin \mathcal{P}_*$ is false, then

$$x \in \overline{K} \equiv \phi_i(x)\downarrow \qquad (1)$$

for some $i$, hence, by Theorem 6.1.3 and (1),

$$x \in \overline{K} \equiv (\exists y)T(i, x, y) \qquad (1')$$

But $x \in \overline{K}$ says $\phi_x(x)\uparrow$, so $(1')$ becomes

$$\phi_x(x)\uparrow \; \equiv (\exists y)T(i, x, y) \qquad (2)$$

or (by Theorem 5.5.3(1))

$$\neg(\exists y)T(x, x, y) \equiv (\exists y)T(i, x, y) \qquad (3)$$

Setting the variable $x$ equal to $i$ throughout in (3) we get a contradiction:

$$\neg(\exists y)T(i, i, y) \equiv (\exists y)T(i, i, y)$$
Let us redo Example 6.2.3 using Example 0.5.23, giving Cantor credit for this method!

6.2.4 Example. Again, we show $\overline{K} \notin \mathcal{P}_*$ directly, not via the halting problem. By (∗∗) in Remark 6.1.2, $x \notin W_x \equiv \phi_x(x)\uparrow$, that is,

$$x \in \overline{K} \equiv x \notin W_x \qquad (1)$$

We translate (1) as the equivalent

$$\overline{K} = \{x \in \mathbb{N} : x \notin W_x\} \qquad (2)$$

By (2) and Example 0.5.23, $\overline{K}$ is the “D” for the sequence of sets $W_0, W_1, W_2, \ldots$
Thus there is no $i$ such that $\overline{K} = W_i$, as we saw in Example 0.5.23. By Remark 6.1.2, this is tantamount to $\overline{K}$ not being semi-recursive.
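A finite toy version of the diagonal set may help make the argument concrete. The sketch below assumes a finite list of finite sets standing in for $W_0, W_1, W_2, \ldots$; the name `diagonal_set` and the sample family are illustrative assumptions only.

```python
def diagonal_set(family):
    """Given family[i] playing the role of W_i, return
    D = {x : x is an index of the family and x is not in W_x}."""
    return {x for x in range(len(family)) if x not in family[x]}


# A made-up family of four sets, indexed 0..3.
W = [{0, 1}, {2}, {0, 2, 3}, set()]
D = diagonal_set(W)

# D disagrees with each W_i about the membership of i itself,
# so D cannot equal any W_i.
disagreements = [(i in D) != (i in W[i]) for i in range(len(W))]
```

Every entry of `disagreements` is true by construction, which is exactly why no index $i$ can satisfy $D = W_i$.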
6.3 Unsolvability via Reducibility

We turn our attention now to a methodology towards discovering new undecidable problems, and also new non semi-recursive problems, beyond the ones we learnt about so far, which are just $x \in K$, $\phi_i = \phi_j$ (equivalence problem) and $x \in \overline{K}$. In fact, we will learn shortly that $\phi_i = \phi_j$ is worse than undecidable; just like $\overline{K}$ it is not even semi-decidable. The tool we will use for such discoveries is the concept of reducibility of one set to another:

6.3.1 Definition. (Strong Reducibility) For any two subsets of $\mathbb{N}$, $A$ and $B$, we write $A \leq_m B$,⁵ or more simply

$$A \leq B \qquad (1)$$

pronounced $A$ is strongly reducible or m-reducible to $B$, meaning that there is a (total) recursive function $f$ such that

$$x \in A \equiv f(x) \in B \qquad (2)$$

We say that “the reduction is effected by $f$”. If $f$ is 1-1 we may write $A \leq_1 B$ and say that $A$ is 1-1-reducible or just 1-reducible to $B$.
When (1) —equivalently, (2)— holds, then, intuitively, “A is easier than B to either decide or verify” since the question “x ∈ A” is algorithmically transformable (the algorithm being that for computing f (x)) to the equivalent question “f (x) ∈ B”. If we answer the latter, we have answered the former as well. This observation has a very precise counterpart (Theorem 6.3.3 below). But first, 6.3.2 Lemma. If Q(y, x) ∈ P∗ and λz.f (z) ∈ R, then Q(f (z), x) ∈ P∗ . Proof By Definition 6.1.1 there is a g ∈ P such that
5 The subscript m stands for “many one”, and refers to f . We do not require it to be 1-1, that is; many (inputs) to one (output) will be fine.
$$Q(y, x) \equiv g(y, x)\downarrow \qquad (1)$$

Now, for any $z$, $f(z)$ is some number which if we plug into $y$ in (1) throughout we get an equivalence:

$$Q(f(z), x) \equiv g(f(z), x)\downarrow \qquad (2)$$

But $\lambda zx.g(f(z), x) \in \mathcal{P}$ by Grzegorczyk operations. Thus, (2) and Definition 6.1.1 yield $Q(f(z), x) \in \mathcal{P}_*$.

6.3.3 Theorem. If $A \leq B$ in the sense of Definition 6.3.1, then
(i) if $B \in \mathcal{R}_*$, then also $A \in \mathcal{R}_*$
(ii) if $B \in \mathcal{P}_*$, then also $A \in \mathcal{P}_*$

Proof Let $f \in \mathcal{R}$ effect the reduction.
(i) Let $z \in B$ be in $\mathcal{R}_*$. Then for some $g \in \mathcal{R}$ we have $z \in B \equiv g(z) = 0$ and thus

$$f(x) \in B \equiv g(f(x)) = 0 \qquad (1)$$

But $\lambda x.g(f(x)) \in \mathcal{R}$ by composition, so (1) says that “$f(x) \in B$” is in $\mathcal{R}_*$. This is the same as “$x \in A$”.
(ii) Let $z \in B$ be in $\mathcal{P}_*$. By Lemma 6.3.2, so is $f(x) \in B$. But this says $x \in A$.
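Part (i) is just composition, which a few lines of Python can illustrate. The sets and functions below are assumptions chosen for the demo (A = even numbers, B = multiples of 4, reduction $f(x) = 2x$); `decider_via_reduction` is a hypothetical helper name. Following the text's convention, a decider outputs 0 for “yes” and 1 for “no”.

```python
def decider_via_reduction(f, decide_b):
    """Given a total f with  x in A  iff  f(x) in B,  and a decider for B,
    return the decider  x |-> decide_b(f(x))  for A."""
    return lambda x: decide_b(f(x))


def decide_multiple_of_4(z):
    """Decider g for B = multiples of 4: outputs 0 ("yes") or 1 ("no")."""
    return 0 if z % 4 == 0 else 1


def double(x):
    """The reduction f: x is even  iff  2*x is a multiple of 4."""
    return 2 * x


decide_even = decider_via_reduction(double, decide_multiple_of_4)
```

Here `decide_even(6)` answers “yes” and `decide_even(7)` answers “no”, without any direct test of evenness: the question about A was transformed into a question about B.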
Taking the “contrapositive”, we have at once:

6.3.4 Corollary. If $A \leq B$ in the sense of Definition 6.3.1, then
(i) if $A \notin \mathcal{R}_*$, then also $B \notin \mathcal{R}_*$
(ii) if $A \notin \mathcal{P}_*$, then also $B \notin \mathcal{P}_*$

We can now use $K$ and $\overline{K}$ as “yardsticks” (or reference “problems”) and discover more undecidable and also non semi-decidable problems. The idea of the corollary is applicable to the so-called “complete index sets”.

6.3.5 Definition. (Complete Index Sets) Let $\mathcal{C} \subseteq \mathcal{P}$ and $A = \{x : \phi_x \in \mathcal{C}\}$. $A$ is thus the set of all programs (known by their addresses) $x$ that compute any unary $f \in \mathcal{C}$: Indeed, let $\lambda x.f(x) \in \mathcal{C}$. Thus $f = \phi_i$ for some $i$. Then $i \in A$. But this is true of all $m$ for which $\phi_m = f$. We call $A$ a complete (all) index (programs) set.
6.3.6 Example. The set A = {x : ran(φx ) = ∅} is not semi-recursive.
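Recall that “range” for $\lambda x.f(x)$, denoted by $\operatorname{ran}(f)$, is defined by $\{x : (\exists y)f(y) = x\}$.

We will try to show that

$$\overline{K} \leq A \qquad (1)$$

If we can do that much, then Corollary 6.3.4, part (ii), will do the rest. Well, define

$$\psi(x, y) \simeq \begin{cases} 0 & \text{if } \phi_x(x)\downarrow \\ \uparrow & \text{if } \phi_x(x)\uparrow \end{cases} \qquad (2)$$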
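Here is how to compute $\psi$: Given $x, y$, ignore $y$. Call the universal function $h(x, x)$ ($\simeq \phi_x(x)$). If the call ever halts, then print “0” and halt everything. If it never halts, then you will never return from the call, which is the correct behaviour specified in (2) for $\psi(x, y)$. By CT, $\psi$ is in $\mathcal{P}$, so, by the S-m-n Theorem, there is a recursive $h$ (the S-m-n function, which appears only as a subscript below and is not to be confused with the universal function just called) such that

$$\psi(x, y) \simeq \phi_{h(x)}(y), \text{ for all } x, y$$

(a) Mathematically, without invoking CT, note that $\psi(x, y) \simeq 0 \times h(x, x)$, with $h(x, x)$ the universal function call; thus $\psi$ is in $\mathcal{P}$ and is defined iff $h(x, x)\downarrow$, that is, iff $\phi_x(x)\downarrow$.
(b) You may not use S-m-n until after you have proved that your “$\lambda xy.\psi(x, y)$” is in $\mathcal{P}$.

We can rewrite this as,

$$\phi_{h(x)}(y) \simeq \begin{cases} 0 & \text{if } \phi_x(x)\downarrow \\ \uparrow & \text{if } \phi_x(x)\uparrow \end{cases} \qquad (3)$$

or, rewriting (3) without arguments (as equality of functions, not equality of function calls)

$$\phi_{h(x)} = \begin{cases} \lambda y.0 & \text{if } \phi_x(x)\downarrow \\ \emptyset & \text{if } \phi_x(x)\uparrow \end{cases} \qquad (3')$$

In $(3')$, $\emptyset$ stands for $\lambda y.\uparrow$, the empty function.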
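Thus,

$$h(x) \in A \;\text{ iff }\; \operatorname{ran}(\phi_{h(x)}) = \emptyset \;\overset{\text{bottom case in } (3')}{\text{iff}}\; \phi_x(x)\uparrow$$

The above says $x \in \overline{K} \equiv h(x) \in A$, hence $\overline{K} \leq A$, and thus $A \notin \mathcal{P}_*$ by Corollary 6.3.4, part (ii).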
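The program transformation behind the reduction can be mimicked in Python, purely as an analogy. The sketch assumes that “programs” are generator functions of one argument (one `yield` per step, result delivered by `return`), that a program is passed around directly instead of by its index, and that `run` is a hypothetical step-bounded runner; a result of `None` from `run` only means “has not halted within the budget”, it does not certify divergence.

```python
def run(program, arg, max_steps):
    """Run program(arg) for at most max_steps steps.
    Return its result if it halts in time, else None (no verdict)."""
    computation = program(arg)
    for _ in range(max_steps):
        try:
            next(computation)
        except StopIteration as stop:
            return stop.value
    return None


def h(program):
    """Analogue of the S-m-n function: build a new program that ignores
    its input y, first runs `program` on itself, then outputs 0.
    The new program is everywhere 0 if program(program) halts,
    and nowhere defined otherwise."""
    def transformed(y):
        yield from program(program)   # never returns if program(program) diverges
        return 0
    return transformed


def halts_on_itself(p):
    return 7
    yield   # unreachable; its presence makes this a generator function


def loops_on_itself(p):
    while True:
        yield
```

For instance, `run(h(halts_on_itself), 5, 10)` returns 0 whatever the input 5 is replaced by, while `run(h(loops_on_itself), 5, 1000)` produces no result within the budget, matching the two cases of the definition by cases.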
6.3.7 Example. The set $B = \{x : \phi_x \text{ has finite domain}\}$ is not semi-recursive. This is really easy (once we have done the previous example)! All we have to do is “talk about” our findings, above, differently! We use the same $\psi$ as in the previous example, as well as the same $h$ as above, obtained by S-m-n. Looking at $(3')$ above we see that the top case has infinite domain, while the bottom one has finite domain (indeed, empty). Thus,

$$h(x) \in B \;\text{ iff }\; \phi_{h(x)} \text{ has finite domain} \;\overset{\text{bottom case in } (3')}{\text{iff}}\; \phi_x(x)\uparrow$$

The above says $x \in \overline{K} \equiv h(x) \in B$, hence $B \notin \mathcal{P}_*$ by Corollary 6.3.4, part (ii).
6.3.8 Example. Let us mine $(3')$ twice more to obtain two more important undecidability results.

1. Show that $G = \{x : \phi_x \text{ is a constant function}\}$ is undecidable. We (re-)use $(3')$ of Example 6.3.6. Note that in $(3')$ the top case defines a constant function, but the bottom case defines a non-constant. Thus

$$h(x) \in G \equiv \phi_{h(x)} = \lambda y.0 \equiv x \in K$$

Hence $K \leq G$, therefore $G \notin \mathcal{R}_*$.

2. Show that $I = \{x : \phi_x \in \mathcal{R}\}$ is undecidable. Again, we retell what we can read from $(3')$ in words that are relevant to the set $I$:

$$h(x) \in I \overset{\emptyset \notin \mathcal{R}}{\equiv} \phi_{h(x)} = \lambda y.0 \equiv x \in K$$

Thus $K \leq I$, therefore $I \notin \mathcal{R}_*$.
We soon will sharpen the result 2 of the previous example (see Theorem 6.4.7).

6.3.9 Example. (The Equivalence Problem, Again) We now revisit the equivalence problem and show it is more unsolvable than we originally thought: Earlier, in Theorem 5.8.5, we showed that the relation $\phi_x = \phi_y$ is not decidable. Here we sharpen this to: the relation $\phi_x = \phi_y$ is not semi-decidable. By Lemma 6.3.2, if the 2-variable predicate above is in $\mathcal{P}_*$ then so is $\lambda x.\phi_x = \phi_y$, i.e., the predicate obtained by taking a constant for $y$. Choose then for $y$ a $\phi$-index for the empty function.
So, if λxy.φx = φy is in P∗ then so is φx = ∅ which is equivalent to ran(φx ) = ∅ and thus not in P∗ by Example 6.3.6.
6.3.10 Example. (An Impossibly Hard Problem; Again) Here is a more insightful proof of the non semi-recursiveness of the correctness problem: The functions $\lambda y.1$ and $\lambda y.c_T(x, x, y)$, the latter for $x = 0, 1, 2, \ldots$, are in $\mathcal{PR}$, thus they can be finitely given (defined) by loop programs. Here $c_T$ is the characteristic function of $T(x, y, z)$. Is the problem of determining whether or not

$$\lambda y.1 = \lambda y.c_T(x, x, y) \qquad (1)$$

decidable, or at least verifiable, no matter for which $x$ we ask the question? This asks, in essence, whether we can decide or verify equivalence of two loop programs. Well, no! Why? Because (1) is equivalent to the statement

$$(\forall y)\, c_T(x, x, y) = 1 \qquad (2)$$

same as $(\forall y)\neg T(x, x, y)$, same as

$$\neg(\exists y)T(x, x, y) \qquad (3)$$

But (3) says that $\phi_x(x)\uparrow$, which is not verifiable! Nor is (1)! This argument will recur in Chap. 15 for a tiny subclass of $\mathcal{PR}$.
6.3.11 Example. The set $C = \{x : \operatorname{ran}(\phi_x) \text{ is finite}\}$ is not semi-decidable. Here we cannot reuse $(3')$ above, because both cases in the definition by cases have functions of finite range. We want one case to have a function of finite range, but the other to have infinite range. Aha! This motivates us to choose a different “$\psi$” (hence a different S-m-n function “$k$”), and retrace the steps we took above.
Define

$$g(x, y) \simeq \begin{cases} y & \text{if } \phi_x(x)\downarrow \\ \uparrow & \text{if } \phi_x(x)\uparrow \end{cases} \qquad (ii)$$

Here is an algorithm for $g$: Given $x, y$. Call the universal function $h(x, x)$ (or, just call $\phi_x(x)$). If this ever returns, then print “$y$” and halt everything. If it never returns, then the correct behaviour for $g(x, y)$ is obtained once more: namely, we got $g(x, y)\uparrow$ if $x \in \overline{K}$. By CT, $g$ is partial recursive.

Before we proceed, here is a mathematically precise reason (no CT!) for the partial recursiveness of $g$: $g(x, y) \simeq y + 0 \times \phi_x(x)$, for all $x, y$. Hence $g \in \mathcal{P}$ by substitution. Thus by S-m-n, for some recursive unary $k$ we have

$$g(x, y) \simeq \phi_{k(x)}(y), \text{ for all } x, y$$

Hence, by (ii),

$$\phi_{k(x)} = \begin{cases} \lambda y.y & \text{if } x \in K \\ \emptyset & \text{othw} \end{cases} \qquad (iii)$$

It follows that

$$k(x) \in C \;\text{ iff }\; \phi_{k(x)} \text{ has finite range} \;\overset{\text{bottom case in } (iii)}{\text{iff}}\; x \in \overline{K}$$

That is, $\overline{K} \leq C$ and we are done.
6.3.12 Exercise. Show that D = {x : ran(φx ) is infinite} is undecidable.
6.3.13 Exercise. Show that F = {x : dom(φx ) is infinite} is undecidable.
Enough “negativity”! Here is an important “positive result” that helps to prove that certain relations are semi-decidable: 6.3.14 Theorem. (Projection Theorem) A relation Q( xn ) is semi-recursive iff there is a recursive (decidable) relation S(y, xn ) such that Q( xn ) ≡ (∃y)S(y, xn )
(1)
Q is obtained by “projecting” S along the y-coordinate, hence the name of the theorem.
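Proof If-part. Trivially, $Q(\vec{x}_n) \equiv (\mu y)S(y, \vec{x}_n)\downarrow$. But $\lambda\vec{x}_n.(\mu y)S(y, \vec{x}_n) \in \mathcal{P}$.

Only if-part. Say $Q(\vec{x}_n)$ has semi-index $i \in \mathbb{N}$. Then, we are done by Theorem 6.1.3 since we have

$$Q(\vec{x}_n) \equiv (\exists y)T^{(n)}(i, \vec{x}_n, y)$$

and $T^{(n)}$ is primitive recursive: Take $\lambda y\vec{x}_n.S(y, \vec{x}_n) \overset{\text{Def}}{\equiv} \lambda y\vec{x}_n.T^{(n)}(i, \vec{x}_n, y)$.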
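The If-part turns a decidable $S$ into a verifier for $Q$ by unbounded search. A minimal Python sketch follows; the function name `mu` and the sample relation `has_divisor` are illustrative assumptions, not notation from the text.

```python
from itertools import count


def mu(S, *xs):
    """Unbounded search: return the least y with S(y, *xs).
    If no such y exists the loop never ends, i.e. the result is undefined."""
    for y in count():
        if S(y, *xs):
            return y


def has_divisor(y, x):
    """A decidable S(y, x): y is a divisor of x with 2 <= y < x."""
    return 2 <= y < x and x % y == 0


# Q(x) = (exists y) has_divisor(y, x) says "x is composite";
# mu(has_divisor, x) halts exactly when Q(x) holds.
least_witness_for_15 = mu(has_divisor, 15)
```

Calling `mu(has_divisor, x)` on a prime `x` would search forever, which is the behaviour a verifier is allowed to have on inputs outside $Q$.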
This theorem sometimes has a qualifier: “strong projection theorem”, the weak version not being an iff result: See 3 under Theorem 6.5.2.

6.3.15 Example. The set $A = \{(x, y, z) : \phi_x(y) = z\}$ is semi-recursive. Here is a verifier for the above predicate: Given input $x, y, z$.

Comment. Note that $\phi_x(y) = z$ is true iff two things happen: (1) $\phi_x(y)\downarrow$ and (2) the computed value is $z$.

1. Call the universal function $h$ on input $x, y$.
2. If the call ever returns, then
• If the returned value $h(x, y)$ equals $z$ then halt everything (the “yes” output).
• If the returned value is not equal to $z$, then enter an infinite loop (say “no”, by looping).

By CT the above informal verifier can be formalised as a URM $M$. But is it correct? Does it verify $\phi_x(y) = z$? Yes. See Comment above.
A mathematical proof of the above uses Theorem 5.5.3:

$$\phi_x(y) = z \equiv (\exists w)\big[T(x, y, w) \land out(w, x, y) = z\big]$$

The predicate in the big brackets is in $\mathcal{PR}_*$ by Theorem 2.2.5. Now invoke Theorem 6.3.14. Incidentally, the predicate (in the left hand side of $\equiv$ above) that we have discussed here is not decidable. See Exercise 6.11.1.
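The informal verifier of this example can be sketched in Python under the same kind of modelling assumption used for illustration only: a program is a generator function (one `yield` per step, result via `return`), and it is passed directly instead of by index. The names `verify_output`, `halts_within` and `double` are hypothetical; `halts_within` merely observes halting within a step budget and cannot certify looping.

```python
def verify_output(program, y, z):
    """Verifier for "program(y) = z": halts iff program halts on y
    and the value it returns equals z."""
    result = yield from program(y)   # step 1: this call may never return
    while result != z:               # wrong value: say "no" by looping
        yield


def halts_within(computation, max_steps):
    """True if the computation halts within max_steps steps, else False."""
    for _ in range(max_steps):
        try:
            next(computation)
        except StopIteration:
            return True
    return False


def double(y):
    yield            # one computation step
    return 2 * y
```

With these, `halts_within(verify_output(double, 3, 6), 100)` is true, whereas `verify_output(double, 3, 7)` keeps looping, in line with the Comment in the example.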
6.4 Recursively Enumerable Sets

In this section we explore the rationale behind the alternative name “recursively enumerable” or “computably enumerable” that is used in the literature for the semi-recursive or semi-computable sets/predicates. Short names for this alternative jargon are “r.e.” and “c.e.” respectively. To avoid cumbersome codings (of $n$-tuples, by single numbers) we restrict attention to the one variable case in this section, i.e., predicates that are subsets of $\mathbb{N}^n$ for the case $n = 1$. First we define:

6.4.1 Definition. A set $A \subseteq \mathbb{N}$ is called computably enumerable (c.e.) or recursively enumerable (r.e.) precisely if one of the following cases holds:
• $A = \emptyset$
• $A = \operatorname{ran}(f)$, where $f \in \mathcal{R}$.
Thus, the c.e. or r.e. relations are exactly those we can algorithmically enumerate as the set of outputs of a (total) recursive function:

$$A = \{f(0), f(1), f(2), \ldots, f(x), \ldots\}$$

Hence the use of the term “c.e.” replaces the non technical term “algorithmically” (in “algorithmically” enumerable) by the technical term “computably”. Note that we had to hedge and ask that $A \neq \emptyset$ for any enumeration to take place, because no recursive function can have an empty range. Next we prove:

6.4.2 Theorem. (“c.e.” or “r.e.” vs. Semi-Recursive) Any nonempty semi-recursive relation $A$ ($A \subseteq \mathbb{N}$) is the range of some (emphasis: total) recursive function of one variable. Conversely, every set $A$ such that $A = \operatorname{ran}(f)$, where $\lambda x.f(x)$ is recursive, is semi-recursive (and, trivially, nonempty).

In short, the semi-recursive sets are precisely the c.e. or r.e. sets. For $A \neq \emptyset$ this is the content of Theorem 6.4.2, while $\emptyset$ is r.e. by definition, and known to us to be also semi-recursive (indeed in $\mathcal{PR}_* \subseteq \mathcal{R}_* \subseteq \mathcal{P}_*$). Before we prove the result, here is an example:

6.4.3 Example. The set $\{0\}$ is c.e. Indeed, $f = \lambda x.0$, our familiar function $Z$, effects the enumeration with repetitions (lots of them!)

x    = 0 1 2 3 4 ...
f(x) = 0 0 0 0 0 ...
Proof (I) We prove the first sentence of the theorem. So, let $A \neq \emptyset$ be semi-recursive. By the projection theorem 6.3.14 there is a decidable (recursive) relation $Q(y, x)$ such that
$$x \in A \equiv (\exists y)Q(y, x), \text{ for all } x \qquad (1)$$

Thus, every $x \in A$ will have some associated value $y$ such that $Q(y, x)$ holds. $\qquad (2)$

and conversely, if $Q(y, x)$ holds for some $y, x$ pair, then $x \in A$. $\qquad (2')$
(2) and $(2')$ jointly rephrase (1), but also suggest the idea of how to enumerate all $x \in A$: We should look for all pairs $(y, x)$ in a systematic manner, and for each such pair that is in $Q$ we should just output (enumerate) the $x$-component. But we know how to generate all pairs in a systematic manner, computably:

for $z = 0, 1, 2, 3, \ldots$ do generate the pair $(z)_0, (z)_1$. Here “$y$” is $(z)_0$ and “$x$” is $(z)_1$.

Pause. Will the above generate all pairs? Sure: For any $x$ and $y$, the pair $(y, x)$ is guaranteed to be output when we reach the $z$-value $\langle y, x\rangle$ ($= 2^{y+1}3^{x+1}$).

Thus the enumerating partial recursive function $f$ we want (that has range equal to $A$) is:

$$f(z) \simeq \begin{cases} (z)_1 & \text{if } Q\big((z)_0, (z)_1\big) \\ \uparrow & \text{othw} \end{cases} \qquad (3)$$

This works for any semi-recursive $A$, empty or not, and gives a mathematical proof to Corollary 6.4.4 below. A small tweak takes care of a nonempty $A$ with a total $f$: So let $a \in A$ be any. Modify (3) into (4).

$$g(z) = \begin{cases} (z)_1 & \text{if } Q\big((z)_0, (z)_1\big) \\ a & \text{othw} \end{cases} \qquad (4)$$

$\operatorname{ran}(g) = A$ and $g \in \mathcal{R}$.

(II) Proof of the second sentence of the theorem. So, let $A = \operatorname{ran}(f)$, where $f$ is recursive. Thus,

$$x \in A \equiv (\exists y)f(y) = x$$

By Grzegorczyk operations, the fact that $z = x$ is in $\mathcal{R}_*$ and the assumption $f \in \mathcal{R}$, the relation $f(y) = x$ is recursive. We are done by the projection theorem.
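The enumerator (4) of part (I) can be tried out in Python. The sketch makes explicit assumptions: $(z)_i$ is decoded as the exponent of the $i$-th prime in $z$ minus 1, floored at 0 (so that it inverts $\langle y, x\rangle = 2^{y+1}3^{x+1}$, with 0 used for $z = 0$), and $A$ is taken to be the composite numbers via the decidable relation `Q` below. The names `exponent`, `component`, `Q`, `g` and `is_composite` are illustrative.

```python
def exponent(p, z):
    """Exponent of the prime p in the factorisation of z (z >= 1)."""
    e = 0
    while z % p == 0:
        z //= p
        e += 1
    return e


def component(z, i):
    """Assumed decoding of (z)_i for i in {0, 1}: exponent of the i-th
    prime in z, minus 1, floored at 0; defined as 0 when z == 0."""
    if z == 0:
        return 0
    return max(exponent((2, 3)[i], z) - 1, 0)


def Q(y, x):
    """Decidable relation: y is a divisor of x with 2 <= y < x."""
    return 2 <= y < x and x % y == 0


def g(z, a=4):
    """Total enumerator as in (4): output (z)_1 when Q((z)_0, (z)_1) holds,
    otherwise the fixed element a of A (4 is composite)."""
    y, x = component(z, 0), component(z, 1)
    return x if Q(y, x) else a


def is_composite(n):
    return any(n % d == 0 for d in range(2, n))


outputs = {g(z) for z in range(20000)}
```

Every value in `outputs` is composite, and 6 already appears because the code of the pair $(2, 6)$, namely $2^3 3^7 = 17496$, lies below 20000; larger composites show up only for much larger $z$, which is the price of this particular coding.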
6.4.4 Corollary. If A is semi-recursive, then A = ran(f ) for some f ∈ P.
Do we have a converse? Is the range of any partial recursive function semi-recursive? Yes. Theorem 6.6.2.

6.4.5 Corollary. An $A \subseteq \mathbb{N}$ is semi-recursive iff it is r.e. (c.e.)

Proof For nonempty $A$ this is Theorem 6.4.2. For empty $A$ we note that this is r.e. by Definition 6.4.1 but also semi-recursive by $\emptyset \in \mathcal{PR}_* \subseteq \mathcal{R}_* \subseteq \mathcal{P}_*$.
Corollary 6.4.5 allows us to prove some non-semi-recursiveness results by Cantor diagonalisation. See below.

6.4.6 Remark. In view of the coincidence of semi-recursive and r.e. sets, one will encounter also the term r.e.-index of a semi-recursive or c.e. set $A$ as a synonym for semi-index. Interestingly the term “r.e.-index” still applies to an $x$ such that $A = W_x$ and not to an $x$ such that $A = \operatorname{ran}(\phi_x)$. We will adhere to the term “semi-index” in this volume.
6.4.7 Theorem. The complete index set A = {x : φx ∈ R} is not semi-recursive. This sharpens the undecidability result for A that we established in Example 6.3.8.
Proof By the equivalence of c.e.-ness and semi-recursiveness we prove that $A$ is not c.e. If not, note first that $A \neq \emptyset$ since, e.g., $Z \in \mathcal{R}$ and thus at least one $\phi$-index is in $A$ (a $\phi$-index for $Z$). Thus, Theorem 6.4.2 applies and there is an $f \in \mathcal{R}$ such that $A = \operatorname{ran}(f) = \{y : y = f(x), \text{ for some } x\}$, that is, $y \in A \equiv (\exists x)f(x) = y$. In words,

a $\phi$-index $y$ is in $A$ iff it has the form $f(x)$ for some $x$ $\qquad (1)$

Define

$$d = \lambda x.1 + \phi_{f(x)}(x) \qquad (2)$$

Since $\lambda x.\phi_{f(x)}(x) \in \mathcal{P}$, we obtain $d \in \mathcal{P}$. But $\phi_{f(x)}$ is total since all the $f(x)$ are $\phi$-indices of total functions by (1); hence $d$ is total too, that is, $d \in \mathcal{R}$. By the same comment,

$$d = \phi_{f(i)}, \text{ for some } i \qquad (3)$$

Let us compute $d(i)$: $d(i) \simeq 1 + \phi_{f(i)}(i)$ by (2). Also, $d(i) \simeq \phi_{f(i)}(i)$ by (3), thus

$$1 + \phi_{f(i)}(i) \simeq \phi_{f(i)}(i)$$

which is a contradiction since both sides of “$\simeq$” are defined.
One can take as $d$ different functions; for example, either of $d = \lambda x.42 + \phi_{f(x)}(x)$ or $d = \lambda x.1 \mathbin{\dot{-}} \phi_{f(x)}(x)$ works, and so do infinitely many other choices.
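The diagonal function can be illustrated on a finite scale in Python. The list `listed` below is a made-up stand-in for the enumeration $\phi_{f(0)}, \phi_{f(1)}, \ldots$ of total functions, and the names `d_plus_one` and `d_monus` are hypothetical; the sketch only shows why each choice of $d$ differs from the $i$-th listed function at input $i$.

```python
# A made-up finite list of total functions, standing in for phi_{f(0)}, phi_{f(1)}, ...
listed = [
    lambda x: 0,
    lambda x: x,
    lambda x: x * x,
    lambda x: x + 5,
]


def d_plus_one(x):
    """d = lambda x. 1 + phi_{f(x)}(x), restricted to the listed indices."""
    return 1 + listed[x](x)


def d_monus(x):
    """d = lambda x. 1 -. phi_{f(x)}(x), with -. the proper (truncated) subtraction."""
    return max(1 - listed[x](x), 0)


differs_plus = [d_plus_one(i) != listed[i](i) for i in range(len(listed))]
differs_monus = [d_monus(i) != listed[i](i) for i in range(len(listed))]
```

Both lists contain only true entries: adding 1 always changes the value, and $1 \mathbin{\dot{-}} v$ equals 1 when $v = 0$ and 0 when $v \geq 1$, so it never equals $v$.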
6.5 Some Closure Properties of Decidable and Semi-Decidable Relations We already know (Corollaries 2.2.7 and 2.2.15) that 6.5.1 Theorem. R∗ is closed under all Boolean operations, ¬, ∧, ∨, ⇒, ≡, as well as under (∃y)