THREE VIEWS OF LOGIC
Mathematics, Philosophy, and Computer Science

DONALD W. LOVELAND
RICHARD E. HODEL
S. G. STERRETT

PRINCETON UNIVERSITY PRESS
Princeton and Oxford
Copyright © 2014 by Princeton University Press

Published by Princeton University Press, 41 William Street, Princeton, New Jersey 08540
In the United Kingdom: Princeton University Press, 6 Oxford Street, Woodstock, Oxfordshire OX20 1TW
press.princeton.edu

All Rights Reserved
ISBN 978-0-691-16044-3
Library of Congress Control Number: 2013949122
British Library Cataloging-in-Publication Data is available

This book has been composed in ITC Stone Serif
Printed on acid-free paper ∞
Typeset by S R Nova Pvt Ltd, Bangalore, India
Printed in the United States of America
10 9 8 7 6 5 4 3 2 1
Contents

Preface
Acknowledgments

PART 1. Proof Theory
DONALD W. LOVELAND

1 Propositional Logic
1.1 Propositional Logic Semantics
1.2 Syntax: Deductive Logics
1.3 The Resolution Formal Logic
1.4 Handling Arbitrary Propositional Wffs

2 Predicate Logic
2.1 First-Order Semantics
2.2 Resolution for the Predicate Calculus
2.2.1 Substitution
2.2.2 The Formal System for Predicate Logic
2.2.3 Handling Arbitrary Predicate Wffs

3 An Application: Linear Resolution and Prolog
3.1 OSL-Resolution
3.2 Horn Logic
3.3 Input Resolution and Prolog

Appendix A: The Induction Principle
Appendix B: First-Order Valuation
Appendix C: A Commentary on Prolog
References

PART 2. Computability Theory
RICHARD E. HODEL

4 Overview of Computability
4.1 Decision Problems and Algorithms
4.2 Three Informal Concepts

5 A Machine Model of Computability
5.1 Register Machines and RM-Computable Functions
5.2 Operations with RM-Computable Functions; Church-Turing Thesis; LRM-Computable Functions
5.3 RM-Decidable and RM-Semi-Decidable Relations; the Halting Problem
5.4 Unsolvability of Hilbert's Decision Problem and Thue's Word Problem

6 A Mathematical Model of Computability
6.1 Recursive Functions and the Church-Turing Thesis
6.2 Recursive Relations and RE Relations
6.3 Primitive Recursive Functions and Relations; Coding
6.4 Kleene Computation Relation Tn(e, a1, . . . , an, c)
6.5 Partial Recursive Functions; Enumeration Theorems
6.6 Computability and the Incompleteness Theorem

List of Symbols
References

PART 3. Philosophical Logic
S. G. STERRETT

7 Non-Classical Logics
7.1 Alternatives to Classical Logic vs. Extensions of Classical Logic
7.2 From Classical Logic to Relevance Logic
7.2.1 The (So-Called) "Paradoxes of Implication"
7.2.2 Material Implication and Truth Functional Connectives
7.2.3 Implication and Relevance
7.2.4 Revisiting Classical Propositional Calculus: What to Save, What to Change, What to Add?

8 Natural Deduction: Classical and Non-Classical
8.1 Fitch's Natural Deduction System for Classical Propositional Logic
8.2 Revisiting Fitch's Rules of Natural Deduction to Better Formalize the Notion of Entailment — Necessity
8.3 Revisiting Fitch's Rules of Natural Deduction to Better Formalize the Notion of Entailment — Relevance
8.4 The Rules of System FE (Fitch-Style Formulation of the Logic of Entailment)
8.5 The Connective "Or," Material Implication, and the Disjunctive Syllogism

9 Semantics for Relevance Logic: A Useful Four-Valued Logic
9.1 Interpretations, Valuations, and Many Valued Logics
9.2 Contexts in Which This Four-Valued Logic Is Useful
9.3 The Artificial Reasoner's (Computer's) "State of Knowledge"
9.4 Negation in This Four-Valued Logic
9.5 Lattices: A Brief Tutorial
9.6 Finite Approximation Lattices and Scott's Thesis
9.7 Applying Scott's Thesis to Negation, Conjunction, and Disjunction
9.8 The Logical Lattice L4
9.9 Intuitive Descriptions of the Four-Valued Logic Semantics
9.10 Inferences and Valid Entailments

10 Some Concluding Remarks on the Logic of Entailment

References

Index
Preface

This book is based on an interdisciplinary course on logic offered to upper-level undergraduates at Duke University over a period of more than ten years.

Why an interdisciplinary course on logic? Although logic has been a discipline of study in philosophy since ancient times, in recent decades it has played an important role in other disciplines as well. For example, logic is at the core of two programming languages, is used in program verification, has enriched philosophy (and computer science) with non-classical logics that can deal constructively with contradictions, and has shaken the foundations of mathematics with insight into non-computable functions and non-provability. Several of these ideas are treated in this book. We developed a one-semester course suitable for undergraduates that presents some of these more recent, exciting ideas in logic as well as some of the traditional, core ideas. Undergraduate students generally have limited time to pursue logic courses and we found that the course we offered gave them some understanding of both the breadth of logic and the depth of ideas in logic.

This book addresses select topics drawn from three different areas of logic: proof theory, computability theory, and philosophical logic. A common thread throughout is the application of logic to computers and computation. Part 1 on Proof Theory introduces a deductive system (resolution logic) that comes from an area of research known as automated deduction. Part 2 on Computability Theory explores the limits of computation using an abstract model of computers called register machines. Part 3 on Philosophical Logic presents a certain non-classical logic (relevance logic) and a semantics for it that is useful for automated reasoning systems that must deal with the possibility of inconsistent information.

The book can serve a variety of needs. For the first time there is now available a text for an instructor who would like to offer a course that teaches the role of logic in several disciplines. The book could also be used as a supplementary text for a logic course that emphasizes the more traditional topics of logic but also wishes to include a few special topics. The book is also designed to be a valuable resource for researchers and academics who want an accessible yet substantial introduction to the three topics.
The three areas from which the special topics are drawn — proof theory, computability theory, and philosophical logic — exhibit the different roles that logic plays in three different disciplines: computer science, mathematics, and philosophy. The three parts of the book were written by a computer scientist, a mathematician, and a philosopher, respectively, and each part was reviewed by the other two authors for accessibility to students in their fields.

The three parts of the book are roughly of equal length. The second part, on computability theory, is largely independent of the first, but the third part, on philosophical logic, is best presented after the first two parts. Although it is helpful to have had a previous course in logic, we present the topics in such a way that this is not necessary. However, some mathematical background is useful, especially if no logic background is offered. We had a number of freshmen and sophomores take this course with success, but they had a strong analytic preparation. In particular, prior exposure to proofs by induction is important. (We do offer a summary review of the induction methods employed in an appendix.)

The three topics covered are both timely and important. Although the use of automated theorem proving in artificial intelligence (AI) is often associated with the early decades of AI, it is also of great value in some current AI research programs. Watson, IBM's question-answering computer, made famous in 2011 by an impressive performance on the quiz show Jeopardy!, employed resolution logic (presented in Part 1) through the resolution-based programming language Prolog.

Part 2, on computability theory, presents one of the great success stories of mathematical logic. We now have a methodology for proving that certain problems cannot be solved by an algorithm. The ideas required to reach this goal can be traced back to Gödel and Turing and moreover played an important role in the development of the modern-day computer; it is for this reason that Gödel's and Turing's names are on Time magazine's list of the twenty most influential scientists and thinkers of the twentieth century.

Classical logic was motivated by considerations in mathematics. The important role that logic plays in other disciplines has given rise to logics that extend or differ from classical logic; examples include modal logic, intuitionistic logic, fuzzy logic, and relevance logic. Part 3 explores the ideas of extensions and alternatives to classical logic, with an in-depth treatment of one of these, relevance logic.

The book begins with proof theory for both propositional logic and first-order logic. In each case, there is a quick review of the semantics of that logic. This has the advantage of serving as background for the subsequent parts on computability theory and non-classical logics.
A computer-oriented deductive logic based on the resolution inference rule is employed. At the propositional level all proofs are given, including both soundness and completeness proofs. In resolution logic the completeness theorem proof has an intuitive graphical form that makes the proof easier to comprehend. At the first-order level proofs are deferred to a set of problems to be undertaken by the mathematically oriented students; they cover most of the major results, including the steps to the completeness theorem. Plausibility arguments are used in the text instead. This pedagogical strategy works well without losing the important content because first-order proof theory based on resolution employs lifting proofs almost verbatim from the propositional counterpart proof. The lifting process is discussed in detail. There is an extensive treatment of restrictions of resolution logic based on linear resolution that serves as the basis of Prolog, a computer programming language based on deduction. No programming experience is required.

The second part of the book introduces the student to computability theory, an area of mathematical logic that should be of interest to a broad audience due to its influence on the development of the computer. There are two major goals: clarify the intuitive notion of an algorithm; and develop a methodology for proving that certain problems cannot be solved by an algorithm. Four famous problems whose solution requires an algorithm are emphasized: Hilbert's Decision Problem, Hilbert's Tenth Problem, the Halting Problem, and Thue's Word Problem. A wide range of explicit algorithms are described, after which attention is restricted to the set of natural numbers. In this setting three informal concepts are defined (each in terms of an algorithm): computable function, decidable relation, and semi-decidable relation. The first three problems mentioned above are semi-decidable (in a more general sense), but are they decidable?

Two models of computation are described in considerable detail, each with the motivation of giving a precise counterpart to the three informal concepts. The first model is a machine model, namely the register machine and RM-computable functions. Turing's diagonal argument that the Halting Problem is unsolvable is given, together with an outline of his application of that result, namely that Hilbert's Decision Problem is unsolvable. The Post-Markov result that the Word Problem is unsolvable is also proved. The second model of computation is a mathematical model, the recursive functions. Precise counterparts of the three informal concepts are defined: recursive functions, recursive relations, and recursively enumerable relations. There is a detailed proof that the two models give the same class of functions. The relationship between the informal concepts and their formal counterparts, together with the important role of the Church-Turing Thesis, is emphasized.
The third part of the book consists of topics from philosophical logic, with an emphasis on the propositional calculus of a particular non-classical logic known as relevance logic. We follow Anderson and Belnap's own presentation of it here. The topic is presented by first considering some well-known theorems of classical propositional logic that clash with intuitions about the use of "if . . . then . . . ," which have been known as "paradoxes of implication." The student is invited to reflect on the features of classical logic that give rise to them. This is approached by presenting the rules for a natural deduction system for classical logic and examining which features of these rules permit derivation of the non-intuitive theorems or (so-called) paradoxes. This motivates considering alternative rules for deriving theorems, which is an occasion for a discussion of the analysis of the conditional (if . . . then . . . ) and its relation to deduction and derivation. The non-classical logic known as relevance logic is presented as one such alternative. Both a natural deduction style proof system and a four-valued semantics (told true, told false, told both, told neither) for this logic are given.

This is important, as some philosophers present relevance logic as a paraconsistent logic. The pedagogical approach we take here shows that this is by no means mandatory: by the use of the engaging example of its application in a question-answering computer, we present a practical application in which this non-classical logic accords well with intuitions about what one would want in a logic to deal with situations in which we are faced with conflicting information. This example broadens the student's ideas of the uses and capabilities of logic.

The inferential semantics is presented using a mathematical structure called a lattice. A brief introduction to mathematical lattices is provided. Then, drawing on the points in the classic paper "How a Computer Should Think," it is shown that, in certain contexts in which automated deduction is employed, relevance logic is to be preferred over classical logic. Some connections with the two earlier parts of the course on computer deduction and computability theory are made. Part 3 closes with some remarks on the impact of relevance logic in various disciplines.
Acknowledgments
I am grateful to the many people who inspired and guided my education, of which this book is one consequence. Angelo Margaris and Robert Stoll, Oberlin College, showed me the elegance of mathematics. Paul Gilmore, at IBM, introduced me to the beauty of mathematical logic. Marvin Minsky, MIT, one of the founders of the field of artificial intelligence (AI), kindled my interest in AI which led, ultimately, to the study of logic. It is Martin Davis, at the Courant Institute of NYU and Yeshiva University, to whom I owe the greatest thanks for the knowledge of logic I possess. Books, particularly Davis' Computability and Unsolvability, Kleene's Introduction to Metamathematics, Enderton's A Mathematical Introduction to Logic, and Shoenfield's Mathematical Logic, compensated for my lack of many formal courses in logic. Earlier work for Martin, and my interest in AI, led to my lifelong interest in automated theorem proving. Peter Andrews, at Carnegie Mellon University, also contributed significantly to my logic education. To the many others who go unnamed but contributed to my education, in the field of automated theorem proving in particular, I extend my great thanks.

I dedicate this book to my father, to whom I owe my interest in science and mathematics, to my wife for her patience and support throughout my entire career, and to sons Rob and Doug who so enrich our lives.

Donald W. Loveland
I wish to express my deep appreciation to my Ph.D. thesis advisor at Duke, the topologist J. H. Roberts. Roberts was a charismatic professor whose use of the Moore method inspired me to pursue an academic research career in set-theoretic topology. Although Roberts is my mathematical father, J. R. Shoenfield is surely my mathematical uncle. During the 1980s I greatly enjoyed attending his crystal-clear graduate level lectures at Duke on computability theory, set theory, and model theory.

I also wish to express my thanks to my many other teachers and professors of mathematics and logic throughout the years. These are: High school: E. Whitley; Davidson College: R. Bernard, B. Jackson, J. Kimbrough, W. McGavock; Duke University: L. Carlitz, T. Gallie, S. Warner, N. Wilson. I am also grateful for the opportunity to spend my two years of active military duty (1963–1965) writing machine language programs for the Bendix G-15 computer at the Weapons Analysis Branch of the Army Artillery at Fort Sill, OK.

Finally, I dedicate my portion of this book to my wife Margaret, son Richie, daughter Katie, and my parents whom I miss very much.

Richard E. Hodel
I owe thanks to many for reaching a point in my life at which I could participate in this interdisciplinary book project. Of the many professors from whom I took courses on mathematics and logic, I especially thank those whose lectures revealed something of the grand ideas in the field of mathematical logic: the mathematician Peter B. Andrews and the philosopher Gerald Massey (both of whom, I later learned, studied with Alonzo Church at Princeton) and the late Florencio Asenjo, whose lectures on set theory I especially enjoyed. A seminar in philosophy of mathematics run by Kenneth Manders and Wilfried Sieg showed me how the work of mathematicians from earlier centuries (Descartes, Dedekind, Cantor, Hilbert) could be appreciated both in historical context and from the perspective of twentieth century logic and model theory; I owe a lot to both of them for discussions outside that seminar, too. I feel fortunate to have been able to study with such mathematically informed philosophers in tandem with studying mathematical topics that I then only suspected of utility in philosophy: graph theory, combinatorics, abstract algebra, and foundations of geometry. My interest in logic probably first began, though, with computer science, in awe of the power of formalization while listening to R. W. Conway's lectures on structured programming in PL/1 and PL/C at Cornell University.

In philosophical logic, my debt is almost wholly to Nuel D. Belnap, Jr., for many wonderful seminar meetings, and for his elegant, highly instructive papers. It is Belnap's work in Entailment: The logic of relevance and necessity, of course, that I present here, but with the larger idea of conveying the role of logician as actively working to capture and formalize what we recognize as good reasoning, revising and inventing along the way as needed. It is for that vision of philosophical logic, especially, that I owe much to Nuel.

Both Nuel Belnap and his successor, Anil D. Gupta, reviewed earlier versions of the manuscript of Part 3 and provided useful suggestions; David Zornek did the same specifically for the section on four-valued logic. Jeff Horty provided encouragement, and Bill McDowell provided a rare and valuable test of the accessibility and clarity of the content of Part 3 from the standpoint of a student who had not previously studied relevance logic nor heard our lectures, by reviewing the manuscript and developing problem solutions. Rob Lewis offered his services to convert Part 3 of the manuscript to LaTeX under a demanding time constraint, and Peter Spirtes dropped everything to proofread the resulting formal proofs while publisher deadlines loomed.

S. G. Sterrett
The inspiration for this textbook project owes much to the students in the course on which it is based; their situations varied widely, from students such as Jacqueline Ou, who took the course as a freshman, excelled, and went on to a career as a scientist, to Justin Bledin, who took the course as a senior, excelled, and then decided to switch fields, making logic, methodology and philosophy of science his career. This book project would not have been attempted without examples of such student interest in, and mastery of, the course material, and we thank the many students we have had the pleasure of knowing who similarly inspired us but are not specifically named here.

We thank three anonymous reviewers for their thorough reviews, which contained constructive comments and excellent suggestions that greatly improved all three parts of the book. We thank the Princeton University Press staff for their devotion to producing a premier product, and for their understanding and cooperation when we made special requests. In particular, we thank Vickie Kearn, Executive Editor for mathematics, for her strong belief in the value of our proposed book, and her subsequent support and guidance through the publication process.

Donald Loveland
Richard Hodel
S. G. Sterrett
August 5, 2013
PART 1. Proof Theory
DONALD W. LOVELAND
1 Propositional Logic

There are several reasons that one studies computer-oriented deductive logics. Such logics have played an important role in natural language understanding systems, in intelligent database theory, in expert systems, and in automated reasoning systems, for example. Deductive logics have even been used as the basis of programming languages, a topic we consider in this book. Of course, deductive logics have been important since long before the invention of computers, being key to the foundations of mathematics as well as a guide in the general realm of rational discourse.

What is a deductive logic? It has two major components, a formal language and a notion of deduction. A formal language has a syntax and semantics, just like a natural language. The first formal language we present is the propositional logic. We first give the syntax of the propositional logic.

• Alphabet
  1. Statement Letters: P, Q, R, P1, Q1, R1, . . . .
  2. Logical Connectives: ∧, ∨, ¬, →, ↔.
  3. Punctuation: (, ).

• Well-formed formulas (wffs), or statements
  1. Statement letters are wffs (atomic wffs or atoms).
  2. If A and B are wffs, then so are (¬A), (A ∨ B), (A ∧ B), (A → B), and (A ↔ B).
Convention. For any inductively defined class of entities we will assume a closure clause, meaning that only those items created by (often repeated use of) the defining rules are included in the defined class. We will denote statement letters with the letters P, Q, R, possibly with subscripts, and wffs by letters from the first part of the alphabet unless explicitly noted otherwise. Statement letters are considered to be non-logical symbols. We list the logical connectives with their English labels. (The negation symbol doesn’t connect anything; it only has one argument. However, for convenience we ignore that detail and call all the listed logical symbols “connectives.”)
¬   negation
∧   and
∨   or
→   implies
↔   equivalence
Example. (((¬P) ∧ Q) → (P → Q)).

A formal language, like a natural language, has an alphabet and a notion of grammar. Here the grammar is simple; the wffs (statements) are our sentences. As even our simple example shows, the parentheses make statements very messy. We usually simplify matters by forgoing technically correct statements in favor of sensible abbreviations. We suppress some parentheses by agreeing on a hierarchy of connectives. However, it is always correct to retain parentheses to aid quick readability even if rules permit further elimination of parentheses. We also will use brackets or braces on occasion to enhance readability. They are to be regarded as parentheses formally.

We give the hierarchy with the tightest binding connective on top:

¬
∨, ∧
→, ↔

We adopt association-to-the-right: A → B → C is (A → (B → C)).

Place the parentheses as tightly as possible consistent with the existing bindings, beginning at the top of the hierarchy.

Example. The expression ¬P ∧ Q → P → Q represents the wff (((¬P) ∧ Q) → (P → Q)). A more readable expression, also called a wff for convenience, is (¬P ∧ Q) → (P → Q).
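To make the abbreviation rules concrete, here is a minimal Python sketch (ours, not the book's; the nested-tuple encoding of wffs is an illustrative choice) that renders a wff with its full, technically correct parenthesization:

```python
# A sketch (not from the book): propositional wffs as nested tuples.
# ('atom', name), ('not', A), ('and', A, B), ('or', A, B),
# ('imp', A, B), ('iff', A, B)

def show(wff):
    """Render a wff with full (technically correct) parenthesization."""
    op = wff[0]
    if op == 'atom':
        return wff[1]
    if op == 'not':
        return '(¬' + show(wff[1]) + ')'
    symbol = {'and': ' ∧ ', 'or': ' ∨ ', 'imp': ' → ', 'iff': ' ↔ '}[op]
    return '(' + show(wff[1]) + symbol + show(wff[2]) + ')'

P = ('atom', 'P')
Q = ('atom', 'Q')

# The abbreviation ¬P ∧ Q → P → Q: ¬ binds tightest, then ∧,
# then →, and → associates to the right.
example = ('imp', ('and', ('not', P), Q), ('imp', P, Q))
print(show(example))   # (((¬P) ∧ Q) → (P → Q))
```

Running the sketch prints the fully parenthesized wff of the example above.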
1.1 Propositional Logic Semantics

We first consider the semantics of wffs by seeing their source as natural language statements.

English to formal notation.

Example 1. An ongoing hurricane implies no class meeting, so if no hurricane is ongoing then there is a class meeting.

Use:
H: there is an ongoing hurricane
C: there is a class meeting

Formal representation.
(1) A
(2) (H → ¬C) → (¬H → C)
(3) (H → ¬C) ∧ [(H → ¬C) → (¬H → C)]

The first representation is technically correct (ignoring the "use" instruction), but useless. The idea is to formalize a sentence in as fine-grained an encoding as is possible with the logic at hand. The second and third representations do this.

Notice that there are three different English expressions encoded by the implies symbol. "If . . . then" and "implies" directly translate to the formal implies connective. "So" is trickier. If "so" implies the truth of the first clause, then wff (3) is preferred as the clause is asserted and connected by the logical "and" symbol to the implication. The second interpretation simply reflects that the antecedent of "so" (supposedly) implies the consequent. This illustrates that formalizing natural language statements is often an exercise in guessing the intent of the natural language statement.

The task of formalizing informal statements is often best approached by creating a candidate wff and then directly replacing the statement letters by the assigned English meanings and judging how successfully the wff captures the intent of the informal statement. One does this by asserting the truth of the wff and then considering the truth status of the component parts. For example, asserting the truth of wff (3) forces the truth of the two components of the logical "and" by the meaning of logical "and." We consider the full "meaning" of implication shortly. One should try alternate wffs if there is any question that the wff first tried is not clearly satisfactory. Doing this in this case yields representations (2) and (3). Which is better depends on one's view of the use of "so" in this sentence.
Aside: For those who suspect that this is not a logically true formula, you are correct. We deal with this aspect later.

Example 2. Either I study logic now or I go to the game. If I study logic now I will pass the exam but if I go to the game I will fail the exam. Therefore, I will go to the game and drop the course.

Use:
L: I study logic now
G: I go to the game
P: I pass the exam
D: I drop the course

Formal representation.
(1) (L ∨ G) ∧ [((L → P) ∧ (G → ¬P)) → G ∧ D]
or
(2) ((L ∨ G) ∧ (L → P) ∧ (G → ¬P)) ∧ [((L → P) ∧ (G → ¬P)) → G ∧ D].

"Therefore" is another English trigger for the logical "implies" connective in the wff that represents the English sentences. Again, it is a question of whether the sentences preceding the "therefore" are intended as facts or only as part of the conditional statement. Two possibilities are given here. As before, the logical "and" forces the assertion of truth of its two components when the full statement is asserted to be true. Note that the association-to-the-right is used here to avoid one pair of added parentheses in representation (2). We do group the first three subformulas (conjuncts) together reflecting the English sense.

As illustrated above, logic studies the truth value of natural language sentences and their associated wffs. The truth values are limited to true (T) and false (F) in classical logic. All interpreted wffs have truth values. To determine the truth values of interpreted wffs we need to define the meaning of the logical connectives. This is done by a truth table, which defines the logical symbols as functions from truth values to truth values. (We often refer to these logical functions as the logical connectives of propositional logic.)

Definition. The following truth tables are called the defining truth tables for the given functions. Although there appears to be only one "table" displayed here, it is because we have combined the defining truth table for each connective into one compound truth table.
A  B  |  ¬A  A ∧ B  A ∨ B  A → B  A ↔ B
T  T  |  F   T      T      T      T
T  F  |  F   F      T      F      F
F  T  |  T   F      F      T      F
F  F  |  T   F      F      T      T

To illustrate the use of the truth tables, consider the "and" function. Suppose we have a wff (possibly a subformula) of form A ∧ B where we know that A and B both have truth value F. The first two columns give the arguments of the logical functions so we choose the fourth row. A ∧ B has value F in that row so we know that A ∧ B has value F when each of A and B has truth value F. All the logical functions except "implies" are as you have usually understood the meaning of those functions. Notice that the "or" function is the inclusive or. We discuss the function "implies" shortly, but a key concept is that this definition forces the falsity of A → B to mean both that A is true and B is false.
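The defining truth table can be read as five truth functions. The following Python sketch (ours, not part of the text) encodes each connective as a Boolean function and reprints the compound table above:

```python
# A sketch of the five connectives as truth functions (T/F as Python booleans).
def NOT(a):    return not a
def AND(a, b): return a and b
def OR(a, b):  return a or b          # inclusive or
def IMP(a, b): return (not a) or b    # material implication: F only when a=T, b=F
def IFF(a, b): return a == b

# Reproduce the compound defining truth table, row by row.
for a in (True, False):
    for b in (True, False):
        row = [NOT(a), AND(a, b), OR(a, b), IMP(a, b), IFF(a, b)]
        print(('T' if a else 'F'), ('T' if b else 'F'),
              *('T' if v else 'F' for v in row))
```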
The idea of logic semantics is that the user specifies the truth value of the statement letters, or atoms, and the logical structure of the wff as determined by the connectives supplies the truth value of the entire wff. We capture this "evaluation" of the wffs by defining a valuation function. We will need a number of definitions that deal with propositional logic semantics and we give them together for easier reference later, followed by an illustration of their use. Here and hereafter we use "iff" for "if and only if."

Definition. An interpretation associates a T or F with each statement letter.

Definition. A valuation function V is a mapping from wffs and interpretations to T and F, determined by the defining truth tables of the logical connectives. We write V_I[A] for the value of the valuation of wff A under interpretation I of A.

Definition. A wff A is a tautology (valid wff) iff V_I[A] = T for all interpretations I.

Example. The wff P ∨ ¬P is a tautology.

We now consider further the choice we made in the defining truth table for implication. We have chosen a truth table for implication that
yields the powerful result that the theorems of the deductive logic we study in this part of the book are precisely the valid wffs. That is, we are able to prove precisely the set of wffs that are true in all interpretations. The implication function chosen here is called material implication. Other meanings (i.e., semantics) for implication are considered in Part 3 of this book.

Definition. A wff A is satisfiable iff there exists an interpretation I such that V_I[A] = T. We also say "I makes A true" and "I satisfies A."

Definition. A wff A is unsatisfiable iff the wff is not satisfiable; i.e., there does NOT exist an interpretation I such that V_I[A] = T.

Definition. Interpretation I is a model of wff A iff V_I[A] = T. I is a model of a set S of wffs iff I is a model of each wff of S. If A is satisfiable, then A has a model.

Definition. A is a logical consequence of a set S of wffs iff every model of S is a model of A.

Notation. We will have occasion to explicitly represent finite sets. One way this is done is by listing the members. Thus, if A1, A2, and A3 are the (distinct) members of a set then we can present the set as {A1, A2, A3}. {Ai | 1 ≤ i ≤ n} represents a set with n members A1, A2, ..., An. {A} represents a set with one member, A.

Notation. S |= A denotes the relation that A is a logical consequence of the set S of wffs. We write |= A iff A is a tautology. We write ⊭ A for the negation of |= A. We sometimes write P |= Q for {P} |= Q where P and Q are wffs.

Definition. A and B are logically equivalent wffs iff A and B have exactly the same models.

We illustrate the valuation function.

Example. Determine the truth value of the following sentence under the given interpretation. An ongoing hurricane implies no class meeting, so if no hurricane is ongoing then there is a class meeting.
Use:
H: there is an ongoing hurricane
C: there is a class meeting

Interpretation I is as follows: I(H) = T, I(C) = T.

Then V_I[(H → ¬C) → (¬H → C)] = ?? We determine V_I. We omit the subscript on V when it is understood.

I(H)  I(C)  V[H → ¬C]  V[¬H → C]  V[(H → ¬C) → (¬H → C)]
T     T     F          T          T
The above gives the truth value of the statement under the chosen interpretation. That is, the meaning and truth value of the statement is determined by the interpretation and the correct row in the truth table that corresponds to the intended interpretation. Thus, V_I[(H → ¬C) → (¬H → C)] = T.

Now let us check as to whether the statement is a tautology. Following the tradition of presentation of truth tables, we omit reference to the interpretation and valuation function in the column headings, even though the entries T, F are determined by the interpretation and valuation function in the rows as above. Of course, the interpretation changes with each row. We see that the statement A is not a tautology because V_I[A] = F under I(H) = F and I(C) = F.

H  C  |  H → ¬C  ¬H → C  (H → ¬C) → (¬H → C)
T  T  |  F       T       T
T  F  |  T       T       T
F  T  |  T       T       T
F  F  |  T       F       F
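The valuation function V lends itself directly to code. Below is a small Python sketch of ours (reusing the tuple encoding of wffs from the earlier sketch) that computes V_I and brute-forces the tautology test over all interpretations; it reproduces both results of this example:

```python
from itertools import product

# A sketch of the valuation function V_I; an interpretation I is a dict
# mapping statement letters to Python booleans (T/F).
def V(wff, I):
    op = wff[0]
    if op == 'atom': return I[wff[1]]
    if op == 'not':  return not V(wff[1], I)
    a, b = V(wff[1], I), V(wff[2], I)
    return {'and': a and b, 'or': a or b,
            'imp': (not a) or b, 'iff': a == b}[op]

def is_tautology(wff, letters):
    """True iff V_I[wff] = T for every interpretation I of the letters."""
    return all(V(wff, dict(zip(letters, vals)))
               for vals in product([True, False], repeat=len(letters)))

H, C = ('atom', 'H'), ('atom', 'C')
A = ('imp', ('imp', H, ('not', C)), ('imp', ('not', H), C))
print(V(A, {'H': True, 'C': True}))   # True: V_I[A] = T under I(H) = I(C) = T
print(is_tautology(A, ['H', 'C']))    # False: A fails when I(H) = I(C) = F
```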
We now establish two facts that reveal important properties of some of the terms we have introduced. The first property is so basic that we do not even honor it by calling it a theorem. Although people often find it more comfortable to skip proofs on the first reading of a mathematical text, we suggest that reading proofs
often reinforces the basic concepts of the discipline. We particularly urge the reader to thoroughly understand the following two proofs as they utilize the basic definitions just given and help clarify the concepts involved.

Fact. Wff A is a tautology iff ¬A is unsatisfiable.

Proof. We handle the two implications separately.

Note: The splitting of the iff (the biconditional, or equivalence) to two implications is justified by the tautology (A ↔ B) ↔ ((A → B) ∧ (B → A)). Likewise, the use of the contrapositive below is justified by a tautology, namely (A → B) ↔ (¬B → ¬A). The promotion of a tautology to permit the substitution of one proof approach by another warrants a comment as this illustrates a key use of logic. Consider the contrapositive. We wish to prove an implication (A → B). It is more convenient to prove the contrapositive (¬B → ¬A) in some cases as in the case that follows this note. The second tautology above and the truth table for equivalence ↔ says that (¬B → ¬A) is true whenever (A → B) is true and only then. Thus the contrapositive proof establishes the original implication.

=⇒: Use the contrapositive. Assume NOT(¬A is unsatisfiable), i.e., ¬A is satisfiable. Thus, we know a model I exists for ¬A, that is, V_I[¬A] = T. But then, V_I[A] = F. Therefore, NOT(A is a tautology).

⇐=: See Exercise 10.

The following theorem is very important as it goes to the heart of logic. A major reason to study logic is to understand the notion of logical consequence, namely, what follows logically from a given set of statements. This theorem states that given a (finite) set of statements, treated as facts, and a statement B that follows logically from those facts, then that entailment is captured by the logical implies symbol. Thus, to test for the logical consequence of a set of wffs, we need only test whether the related wff that "replaces" the logical consequence by the implies sign is a tautology.

Theorem 1. If S = {Ai | 1 ≤ i ≤ n} then the following equivalence holds: S |= B iff |= A1 ∧ A2 ∧ · · · ∧ An → B.
Proof. =⇒: We prove the contrapositive. If ⊭ A1 ∧ · · · ∧ An → B then there exists some interpretation I such that V_I[A1 ∧ · · · ∧ An → B] = F, which forces V_I[A1 ∧ · · · ∧ An] = T and V_I[B] = F. But then V_I[Ai] = T for all i, by repeated use of the truth table for ∧. So I is a model of the set S. Yet V_I[B] = F, so S ⊭ B. The contrapositive is shown.

⇐=: Again, we prove the contrapositive. If not S |= B then there exists an interpretation I such that for all i, 1 ≤ i ≤ n, V_I[Ai] = T, and V_I[B] = F. Then V_I[A1 ∧ · · · ∧ An] = T. Thus, V_I[A1 ∧ · · · ∧ An → B] = F, so A1 ∧ · · · ∧ An → B is not a tautology.

We next review the notion of replacement of one subformula by another, passing the validity of a wff to its offspring when appropriate. This is a powerful tool that we will use frequently.

Theorem 2 (Replacement Theorem). Let wff F contain subformula B and F1 be the result of replacing one or more occurrences of B by wff C in F. Then |= B ↔ C implies

(*)  |= F ↔ F1.

Note that (*) implies that for all interpretations I, V_I[F] = T iff V_I[F1] = T.

Proof. The proof is by induction on the number of connective occurrences in F. (We provide a review of induction techniques in an appendix.) We present a full proof below but first we present the key idea of the proof, which is quite intuitive. Recall that V_I[F] is computed from atomic formulas outward. Only the truth value of B, not the content of B, influences V_I[E] for any subformula E of F containing B. Thus, replacement of B by C where |= B ↔ C will mean V_I[F] = V_I[F1], any I.

Example.
F: P ∨ (P → Q)
B: P → Q
C: ¬P ∨ Q
(Note: |= (P → Q) ↔ (¬P ∨ Q).)
F1: P ∨ (¬P ∨ Q)
|= F ↔ F1 by the Replacement Theorem
Aside: |= P ∨ (¬P ∨ Q) ↔ (P ∨ ¬P) ∨ Q, by associativity of ∨. But (P ∨ ¬P) ∨ Q is a tautology. Therefore, P ∨ (P → Q) is a tautology by reference to the example above.

We now give the formal proof. We use a form of induction proof often called complete induction that is frequently used in proofs in mathematical logic.

P(n). The theorem holds when the wff F contains n occurrences of logical connectives.

P(0). F is an atomic wff so the substitution of one or more occurrences of B by C means that F is B and so F1 is C. Since we are given that |= B ↔ C we immediately have |= F ↔ F1.

We now show the induction step. For each n ≥ 0 we show: Assume P(k) for all 0 ≤ k ≤ n, to show P(n + 1).

Case (a). F is completely replaced; i.e., F is B. This is the case P(0) already treated.

Case (b). F is of form ¬G and one or more occurrences of B are replaced in G by C to create G1. By induction hypothesis |= G ↔ G1 because G has fewer occurrences of connectives than F. Thus, for any arbitrary interpretation I, V_I[G] = V_I[G1], so V_I[¬G] = V_I[¬G1]. Thus, |= ¬G ↔ ¬G1 by the truth table for equivalence. That is, |= F ↔ F1.

Case (c). F is of form G ∨ H and one or more occurrences of B are replaced in either or both G and H by C to create G1 and H1, respectively. By induction hypothesis |= G ↔ G1 and |= H ↔ H1. (Here we need complete induction as each of G and H may have considerably fewer connective occurrences than F.) We must show |= G ↔ G1 and |= H ↔ H1 imply |= F ↔ F1. Let I be an arbitrary interpretation for G, G1, H, and H1. By |= G ↔ G1 and |= H ↔ H1 we know that V_I[G] = V_I[G1] and V_I[H] = V_I[H1]. Therefore, V_I[G ∨ H] = V_I[G1 ∨ H1] because the first (resp., second) argument of the or logical function has the same truth value on both sides of the equation. To repeat, the preceding equation holds because the arguments of the or function "are the same" on both sides of the equation, not because of the truth table for or. Since this proof argument holds for any interpretation I of G, G1, H, and H1 we have |= (G ∨ H) ↔ (G1 ∨ H1), or |= F ↔ F1.

Case (d), Case (e), Case (f). F is of form G ∧ H, F is of form G → H, and F is of the form G ↔ H, respectively. These cases were actually argued in Case (c), as the proof argument for each case is independent of the truth table for the connective, as noted in Case (c).
This establishes that for all n, n ≥ 0, P(n) holds from which the theorem immediately follows.
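Theorem 1 gives an effective test for logical consequence: reduce S |= B to a single tautology question. A short Python sketch of ours (reusing V and is_tautology from the earlier sketch; the helper name entails is our own) illustrates:

```python
from functools import reduce

# A sketch: test S |= B via Theorem 1 by checking that the single wff
# (A1 ∧ ... ∧ An) → B is a tautology.
def entails(S, B, letters):
    conj = reduce(lambda x, y: ('and', x, y), S)
    return is_tautology(('imp', conj, B), letters)

P, Q = ('atom', 'P'), ('atom', 'Q')
print(entails([('imp', P, Q), P], Q, ['P', 'Q']))  # True: modus ponens
print(entails([('or', P, Q)], P, ['P', 'Q']))      # False: P ∨ Q does not entail P
```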
1.2 Syntax: Deductive Logics

Definition (informal). A formal (deductive) logic consists of
• a formal language,
• axioms, and
• rules of inference.

Definition (informal). An (affirmative) deduction of wff A from axioms A and hypotheses H is a sequence of wffs of which each is either an
• axiom,
• hypothesis, or
• inferred wff from previous steps by a rule of inference
and with A as the last wff in the sequence.

Definition (informal). A hypothesis is an assumption (wff) that is special to the deduction at hand.

Definition (informal). A theorem is the last entry in an affirmative deduction with no hypotheses.

Definition (informal). A propositional logic that has tautologies as theorems is called a propositional calculus.

The above definitions are informal in that they are not fully specified; indeed, they can take different meanings depending on the setting. For example, a theorem is a wff dependent on the grammar of the logic under consideration. We will be developing a refutation-based deductive logic for the propositional calculus, one that is not in the traditional format for a deductive logic. It will not have a theorem (i.e., a tautology) as the last entry in the deduction as does an affirmative logic. Almost all deductive logics appearing in logic texts are affirmative and so the word is rarely used, which is our reason for placing the term in parentheses. However, the logic we introduce here will be able to give a proof of any tautology, just not directly. We instead refute by deduction the negation of any tautology. The only adjustment to the notion of deduction given above is that we do not derive the theorem as the last wff but rather start with the negated theorem at the beginning and derive a success symbol at the end.
We highlight a key point of the above paragraph. We can show that a wff is a tautology by refuting the negation of that wff. Why should we consider deductive logics? Although for propositional logic we can sometimes determine tautologyhood faster by truth tables than by a deductive logic, there is no counterpart to truth tables at the first-order logic level. The set of valid first-order theorems for almost all logics is not decidable; in particular, a non-valid wff cannot always be shown to be non-valid. However, the set of valid wffs is semidecidable. (The notion of decidability is studied in Part 2 of this book.) In fact, the standard method of showing semi-decidability is by use of deductive logics. We study a deductive logic first at the propositional level that people often find more convenient (and sometimes faster) to apply to establish tautologyhood than by using truth tables. Also, the first-order deductive logic is a natural extension of the propositional logic, so our efforts here will be doubly utilized.
1.3 The Resolution Formal Logic

Resolution deductive logic is a refutation-based logic. As such, resolution has properties not shared with affirmative deductive logics. We list some of the differences here.

• We determine unsatisfiability, not tautologyhood (validity). We emphasize that resolution is used to establish tautologyhood by showing the unsatisfiability of the tautology's negation.
• The final wff of the deduction is not the wff in question. Instead, the final wff is a fixed empty wff which is an artifact of convenience.
• There are no axioms, and only one rule of inference (for the propositional case).

The resolution deductive logic has the very important properties that it is sound and complete. That means that it only indicates unsatisfiability when the wff is indeed unsatisfiable (soundness) and, moreover, it always indicates that an unsatisfiable wff is unsatisfiable (completeness).

Frequently, we are interested in whether a set of assumptions logically implies a stated conclusion. For the resolution system we pose this as a conjunction of assumptions and the negation of the conclusion. We then ask whether this wff is unsatisfiable. This will be illustrated after we introduce the resolution formal system.

• Alphabet – As before, with logical connectives restricted to ¬, ∧, and ∨.
• Well-formed formulas
  1. Statement letters are wffs (atomic wffs, atoms).
  2. If A is an atom, then A and ¬A are literals, and wffs.
  3. If L1, L2, ..., Ln are distinct literals then L1 ∨ L2 ∨ ... ∨ Ln is a clause, and wff, n ≥ 1.
  4. If C1, C2, ..., Cm are distinct clauses then C1 ∧ C2 ∧ ... ∧ Cm is a wff, m ≥ 1.

  Examples of wffs.
  1. ¬P is a literal, a clause, and a wff.
  2. P ∨ Q is a clause and a wff.
  3. P ∧ ¬Q is a wff consisting of two clauses.
  4. (¬P ∨ Q) ∧ (P ∨ ¬Q ∨ ¬R) is a two-clause wff.
  5. P ∧ ¬P is a two-clause wff but P ∧ P is not a wff because of the repeated clauses.
• Axioms: None.
• Rule of inference
  Let Li, . . . , Lj denote literals. Let L and Lᶜ denote two literals that are complementary (i.e., one is an atom and the other is its negation). The resolution rule has as input two clauses, the parent clauses, and produces one clause, the resolvent clause. We present the inference rule by giving the two parents above a horizontal line and the resolvent below the line. Notice that one literal from each parent clause disappears from the resolvent but all other literals appear in the resolvent.

  L1 ∨ L2 ∨ . . . ∨ Ln ∨ L      Lᶜ ∨ L′1 ∨ L′2 ∨ . . . ∨ L′m
  ─────────────────────────────────────────────────────────
          L1 ∨ L2 ∨ . . . ∨ Ln ∨ L′1 ∨ L′2 ∨ . . . ∨ L′m

  Literal order is unimportant. Clauses, conjunctions of clauses, and resolvents almost always do not permit duplication and we may assume that condition to hold unless explicitly revoked.
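The resolution rule is easy to state over clauses viewed as sets of literals. The following Python sketch (ours; the string encoding of literals is an illustrative choice) computes all resolvents of two clauses; note that duplicate literals collapse automatically, matching the no-duplication convention:

```python
# A sketch of the propositional resolution rule, with clauses as
# frozensets of literals; a literal is a string such as 'P' or '¬P'.
def complement(lit):
    return lit[1:] if lit.startswith('¬') else '¬' + lit

def resolvents(c1, c2):
    """All clauses obtainable by resolving c1 and c2 on a complementary pair."""
    out = set()
    for lit in c1:
        if complement(lit) in c2:
            out.add((c1 - {lit}) | (c2 - {complement(lit)}))
    return out

c1 = frozenset({'¬A', 'B'})
c2 = frozenset({'A'})
print(resolvents(c1, c2))   # {frozenset({'B'})}
```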
We now can formally introduce the empty clause, denoted by □, as the resolvent of two one-literal clauses:

P      ¬P
──────────
    □

Notation. A one-literal clause is also called a unit clause.

Definition. V_I[□] = F, for all I.
The easiest way to see that the above definition is natural is to consider that for any interpretation I a clause C satisfies V_I[C] = T iff V_I[L] = T for some literal L in C. Since □ contains no literals it cannot satisfy this condition, so it is consistent and reasonable to let such a clause C satisfy V_I[□] = F, for all I.

Definition. The clauses taken as the hypotheses of a resolution deduction are called given clauses.

Resolution deductions are standard logic deductions tailored to resolution; they are sequences of wffs that are either given clauses or derived from earlier clauses by use of the resolution inference rule. (In the first-order case given later we do add a second inference rule.) Though not required in general, in this book we will follow the custom of listing all the given clauses at the beginning of the deduction. This allows the viewer to see all the assumptions in one place.

Definition. When all given clauses are listed before any derived clause then the sequence of given clauses is called the prefix of the deduction.

The goal of resolution deductions we pursue is to deduce two one-literal clauses, one clause containing an atom and the other clause containing its complement. We will see that this emanates from an unsatisfiable wff. Rather than referring to complementary literals, whose form depends on the alphabet of the wff, we apply the resolution inference rule one step further and obtain the empty clause. Thus, we can speak definitively of the last entry of a resolution refutation demonstration as the empty clause. Before giving a sample resolution derivation we introduce terminology to simplify our discussion of resolution deductions.

Definition. A resolution refutation is a resolution deduction of the empty clause.

We will hereafter drop the reference to resolution in this chapter when discussing deductions and refutations as the inference system is understood to be resolution. A clause can be regarded as a set of literals since order within a clause is unimportant. We will present example deductions with clauses displaying the "or" connective but on occasion will refer to a clause as a set of literals.

Definition. A resolving literal is a literal that disappears from the resolvent but is present in a parent clause.
We now present two resolution refutations.

Example 1. We give a refutation of the wff (¬A ∨ B) ∧ (¬B ∨ C) ∧ A ∧ ¬C:

1. ¬A ∨ B    given clause
2. ¬B ∨ C    given clause
3. A         given clause
4. ¬C        given clause
5. B         resolvent of clauses 1 and 3 on literals ¬A and A
6. C         resolvent of clauses 2 and 5 on literals ¬B and B
7. □         resolvent of clauses 4 and 6 on literals ¬C and C
Henceforth we will indicate the clauses used in the resolution inference rule application by number. Also, we use lowercase letters, starting with a, to indicate the literal position in the clause. Thus, the justification of step 5 above is given as

5. B    resolvent of 1a and 3.
As an aside, we note that the above refutation demonstrates that the wff (¬A ∨ B) ∧ (¬B ∨ C) ∧ A ∧ ¬C is unsatisfiable. This assertion follows from a soundness theorem that we prove next. Note that this unsatisfiability is equivalent to stating that the first three clauses of the wff imply C. Using the tautology (¬A ∨ B) ↔ (A → B) and the Replacement Theorem we see that the above wff is equivalent to (A ∧ (A → B) ∧ (B → C)) |= C. This is seen as correct by using modus ponens twice to yield a traditional proof that C follows from A ∧ (A → B) ∧ (B → C) (and uses the soundness of your favorite affirmative logic).

One might notice that the inference (A ∧ (A → B) ∧ (B → C)) |= C is obtained directly in the resolution deduction above; that is, C appears as a derived clause. So one may wonder if this direct inference approach is always possible. If so, then the refutation construct would be unnecessarily complex. However, notice that P |= P ∨ Q yet the singleton clause P cannot directly infer clause P ∨ Q via the resolution rule. But P ∧ ¬(P ∨ Q) can be shown unsatisfiable via a resolution refutation. The reader can verify this by noting that ¬(P ∨ Q) is logically equivalent to ¬P ∧ ¬Q so that P ∧ ¬(P ∨ Q) is unsatisfiable iff P ∧ ¬P ∧ ¬Q is unsatisfiable. (Later we show a general method for finding wffs within resolution logic that are unsatisfiable iff the original wff is.) The property we have shown is sometimes characterized by the statement that resolution logic does not contain a complete inference system (as opposed to a complete deductive system).
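A naive proof search simply resolves all pairs of known clauses until the empty clause appears or no new clauses can be generated. This Python sketch (ours, reusing resolvents from the earlier sketch; it makes no attempt at efficiency) confirms the refutation of Example 1:

```python
# A sketch: naive saturation search for the empty clause. It terminates at
# the propositional level because only finitely many clauses can be formed
# from the given atoms.
def refutable(clauses):
    known = set(clauses)
    while True:
        new = set()
        for c1 in known:
            for c2 in known:
                for r in resolvents(c1, c2):
                    if not r:                # empty clause (frozenset()) derived
                        return True
                    if r not in known:
                        new.add(r)
        if not new:
            return False                     # saturated without deriving □
        known |= new

# Example 1: (¬A ∨ B) ∧ (¬B ∨ C) ∧ A ∧ ¬C is unsatisfiable.
S = [frozenset({'¬A', 'B'}), frozenset({'¬B', 'C'}),
     frozenset({'A'}), frozenset({'¬C'})]
print(refutable(S))   # True
```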
Example 2. We give a refutation of the following clause set {A ∨ B ∨ C, ¬A ∨ ¬B, B ∨ ¬C, ¬B ∨ ¬C, ¬A ∨ C, ¬B ∨ C}:

1.  A ∨ B ∨ C    given clause
2.  ¬A ∨ ¬B      given clause
3.  B ∨ ¬C       given clause
4.  ¬B ∨ ¬C      given clause
5.  ¬A ∨ C       given clause
6.  ¬B ∨ C       given clause
7.  ¬C           resolvent of 3a and 4a
8.  ¬B           resolvent of 6b and 7
9.  A ∨ B        resolvent of 1c and 7
10. A            resolvent of 8 and 9b
11. C            resolvent of 5a and 10
12. □            resolvent of 7 and 11
Notice that clause 2 was not used. Nothing requires all clauses to be used. Also note that line 7 used the fact that ¬C ∨ ¬C is logically equivalent to ¬C. After the Soundness Theorem is considered we give some aids to help find resolution proofs.

To show the Soundness Theorem we first show that the resolution rule preserves truth. This provides the induction step in an induction proof of the Soundness Theorem.

Lemma (Soundness Step). Given interpretation I, if
(1) V_I[L1 ∨ L2 ∨ . . . ∨ Ln] = T and
(2) V_I[L′1 ∨ L′2 ∨ . . . ∨ L′m] = T and L′1 = Lᶜ1
then the resolvent L2 ∨ . . . ∨ Ln ∨ L′2 ∨ . . . ∨ L′m satisfies
(3) V_I[L2 ∨ . . . ∨ Ln ∨ L′2 ∨ . . . ∨ L′m] = T.

Proof. Expression (1) implies V_I[Li] = T for some i, 1 ≤ i ≤ n. Likewise, expression (2) implies V_I[L′j] = T for some j, 1 ≤ j ≤ m. Since Lᶜ1 = L′1, one of V_I[L1] = F or V_I[L′1] = F. Thus one of the subclauses L2 ∨ . . . ∨ Ln and L′2 ∨ . . . ∨ L′m must retain a literal with valuation T. Hence (3) holds.

Theorem 3 (Soundness Theorem for Propositional Resolution). If there is a refutation of a given clause set S, then S is an unsatisfiable clause set.

Proof. We prove the contrapositive. To do this we use the following lemma. Recall that wff □ has no model.

Lemma. If I is a model of clause set S then I is a model of every clause of any resolution deduction from S.

Proof. By complete induction on the length of the resolution deduction.

Induction predicate. P(n): If A is the nth clause in a resolution deduction from S and if I is a model of S then V_I[A] = T.

We show P(1). The first clause D1 of a deduction must be a given clause. Thus V_I[D1] = T.

For each n ≥ 1, we show: Assume P(k) holds for all k, 1 ≤ k ≤ n, to show P(n + 1).

Case 1. Dn+1 is a given clause. Then V_I[Dn+1] = T.

Case 2. Dn+1 is a resolvent clause. Then the parent clauses appear earlier in the deduction, say at steps k1 and k2, and we know by the induction hypothesis that V_I[Dk1] = V_I[Dk2] = T. But then V_I[Dn+1] = T by the Soundness Step.

We have shown P(n + 1) under the induction conditions so we can conclude that ∀n P(n), from which the lemma follows.

We now can conclude the proof of the theorem. Suppose S is satisfiable. Let I be a model of S. Then I is a model of every clause of the refutation, including □. But V_I[□] = F. Contradiction. Thus no such model I of S can exist and the theorem is proved.

The last two sentences of the proof may seem a cheat as we defined □ to be false for all interpretations and then used that fact decisively to finish the proof. That is a fair criticism and so we give a more comfortable argument for the unsatisfiability of □. Recall that □ is created in a deduction only when for some atom Q we have already derived clause {Q} and also clause {¬Q}. But for no interpretation I can both V_I[Q] = T and V_I[¬Q] = T; i.e., one derived clause is
false in interpretation I. So, by the above lemma, when □ is derived we know that we already have demonstrated the unsatisfiability of the given clause set.

Although we will not focus on proof search techniques for resolution, several search aids are so useful that they should be presented. We do not prove here that these devices are safe but a later exercise will cover this.

Definition. A clause C θ-subsumes clause D iff D contains all the literals of C. Clause D is the θ-subsumed clause and clause C is the θ-subsuming clause.

We use the term θ-subsumes rather than subsumes as the latter term has a different meaning within resolution logic. The reference to θ will be clear when we treat first-order (logic) resolution. The definition above is modified for the propositional case; we give the full and correct definition when we treat first-order resolution.

We use the term resolution restriction to refer to a resolution logic whose deductions are a proper subset of general resolution. (Recall that a proper subset of a set is any subset not the full set itself.) The first two search aids result in resolution restrictions in that some deductions are disallowed. Neither rule endangers the existence of a refutation if the original clause set is unsatisfiable.

Aid 1. A clause D that is θ-subsumed by clause C can be neglected in the presence of C in the search for a refutation.

Plausibility argument. The resolution inference rule eliminates only one literal from any parent clause. Thus the literals in D in some sense are eliminated one literal at a time. Those parts of the proof that eliminate the literals in D cover all the literals of C so a refutation involving D also would be a refutation involving C (and usually longer). (See Exercise 4, Chapter 3, for a suggested way to prove the safety of this search restriction.)

The plausibility argument suggests that the refutation using the θ-subsuming clause is shorter than the refutation using the θ-subsumed clause. This is almost always the case. We demonstrate the relationship between refutations with an example.

Example. We give a refutation of the following clause set: {A ∨ B ∨ C, ¬A ∨ ¬B, B ∨ ¬C, ¬B ∨ ¬C, ¬A ∨ C, C}. This example is an alteration of Example 2 on page 18 where clause 6 is here replaced by clause C, which θ-subsumes the original clause ¬B ∨ C.
Note that the sequence of resolvents given in lines 8–10 of the original refutation, needed to derive clause C at line 11, is absent here:

1. A ∨ B ∨ C    given clause
2. ¬A ∨ ¬B      given clause
3. B ∨ ¬C       given clause
4. ¬B ∨ ¬C      given clause
5. ¬A ∨ C       given clause
6. C            given clause
7. ¬C           resolvent of 3a and 4a
8. □            resolvent of 6 and 7
Note that clause C θ-subsumes clauses 1 and 5 so those clauses should not be used in the search for a refutation (and they were not). As another example of θ-subsumption of a clause, again using Example 2 on page 18, consider replacing clause 1 by the clause A ∨ B. Notice that the saving is much less, though present.

Aid 2. Ignore tautologies.

Plausibility argument. Consider an example inference using a tautology:

P ∨ ¬P ∨ Q      ¬P ∨ R
───────────────────────
      ¬P ∨ Q ∨ R

By Aid 1, ignore the resolvent as the right parent is better.

Aid 3. Favor use of shorter clauses, one-literal clauses in particular.

Plausibility argument. Aid 1 gives a general reason. The use of one-literal clauses is particularly good for that reason, but also because the resolvent θ-subsumes the other parent and rules out any need to view that clause further. In general, recall that we seek two complementary one-literal clauses to conclude the refutation, so we want to keep generating shorter clauses to move towards generating one-literal clauses. However, it is necessary to use even the longest clause of the deduction on occasion. This is particularly true of long given clauses but also includes inferred clauses that are longer than any given clause. We therefore stress the word "favor" in Aid 3.
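Aids 1 and 2 are purely structural tests on clauses, so they are easy to mechanize. A Python sketch of ours (reusing complement from the earlier sketch):

```python
# A sketch of the structural search aids: θ-subsumption (Aid 1)
# and tautology elimination (Aid 2), for clauses as frozensets.
def theta_subsumes(c, d):
    """C θ-subsumes D iff D contains all the literals of C."""
    return c <= d

def is_tautologous(c):
    """A clause containing a complementary pair of literals."""
    return any(complement(lit) in c for lit in c)

def prune(clauses):
    """Drop tautologies, then clauses strictly θ-subsumed by another clause."""
    kept = [c for c in clauses if not is_tautologous(c)]
    return {c for c in kept if not any(d < c for d in kept)}

S = {frozenset({'A', 'B', 'C'}), frozenset({'C'}),
     frozenset({'P', '¬P', 'Q'})}
print(prune(S))   # {frozenset({'C'})}: the tautology is dropped and
                  # A ∨ B ∨ C is θ-subsumed by the unit clause C
```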
search can stop. (It is possible to generate all possible resolvents in the propositional case, although that may be a large number of clauses. This property does not of itself guarantee completeness, of course.)
As for almost all logics, the Completeness Theorem is much more difficult to prove than the Soundness Theorem. A nice property of resolution logic is that the essence of the propositional Completeness Theorem can be presented quite pictorially. However, we do need to define some terms first. We list the definitions together but suggest that they just be scanned now and addressed seriously only when they are encountered in the proof.
In the proof that follows we make use of a data structure called a binary tree, a special form of another data structure called a graph. A (data structure) graph is a structure with arcs, or line segments, and nodes, which occur at the end points of arcs. We use a directed graph, where the arcs are often represented by arrows to indicate direction. We omit the arrowheads as the direction is always downward in the trees we picture. A binary tree is defined in the first definition below. For an example of a binary tree see Figure 1.1 given in the proof of the Completeness Theorem.
In the following definitions S is a set of (propositional) clauses and 𝒜 is the set of atoms named in S. (Recall: A ∈ 𝒜 means A is a member of the set 𝒜.)
Definition. A (labeled) binary tree is a directed graph where every node except the single root node and the leaf nodes has one incoming arc and two outgoing arcs. The root node has no incoming arcs and the leaf nodes have no outgoing arcs. Nodes and arcs may have labels.
Definition. A branch is a connected set of arcs and associated nodes between the root and a leaf node.
Definition. An ancestor of (node, arc, label) B is a (node, arc, label) closer to the root than B along the branch containing B. The root node is ancestor to all nodes except itself.
Definition. A successor (node, arc, label) to (node, arc, label) B is a (node, arc, label) for which B is an ancestor. An immediate successor is an adjacent successor.
Definition. The level of a node, and its outgoing arcs and their labels, is the number of ancestor nodes from the root to this node on this branch. Thus the root node is level zero.
[Figure 1.1 Semantic tree for S = {¬P, P∨Q, ¬Q∨R, ¬R}. The complete semantic tree branches on P at level one, Q at level two, and R at level three. Failure nodes, marked with ∗: (1) failure node for ¬P, (2) failure node for P∨Q, (3) failure node for ¬Q∨R, (4) failure node for ¬R. Node (5) is not a failure node; node (6), marked o, is an inference node for S.]
Definition. A semantic tree for S is a labeled binary tree where the two outgoing arcs of a node are labeled A and ¬A, respectively, for some A ∈ 𝒜 and all labels at the same level are from the same atom.
Definition. Given node N, the initial list I(N) for node N is the list of ancestor labels to node N. This is a list of the literals that appear above node N as we draw our binary trees.
Definition. A semantic tree for S is complete iff for every branch and every atom A ∈ 𝒜 either A or ¬A is a label on that branch.
Definition. A failure node for clause C is a node N of a semantic tree for S with C ∈ S such that I(N) falsifies C but for no ancestor node N′ of N does I(N′) falsify some clause in S. Note that failure node N lies immediately "below" the last label needed to falsify clause C.
Definition. An inference node is a node where both immediate successor nodes are failure nodes.
Definition. A closed semantic tree has a failure node on every branch.
Definition. The expression size (closed tree) denotes the number of nodes above and including each failure node.
We now state and prove the propositional resolution completeness theorem. The resolution logic was defined and shown sound and complete by J. A. Robinson. The truly innovative part of resolution logic occurs in the first-order logic case; the propositional form had precursors that were close to this form.
Theorem 4 (Completeness Theorem: Robinson). If a clause set S is unsatisfiable then there exists a refutation of S.
Proof. We give the proof preceded by an example to motivate the proof idea and to introduce the definitions. Figure 1.1 contains the labeled binary tree of our illustration. (It is traditional to picture mathematical trees upside down.) The reader should consult the definitions as we progress through the example.
Let our clause set S be given by S = {¬P, P ∨ Q, ¬Q ∨ R, ¬R}. A key property of the complete semantic tree for S is that each branch defines an interpretation of S by assigning T to each literal named on the branch. Moreover, each possible interpretation is represented by a single branch. Consider the interpretation where each atom is assigned T. This is represented by the leftmost branch of the tree in Figure 1.1. This branch is not a model of S because the clause ¬P is false in this interpretation. This fact is represented by the failure node for ¬P denoted by (1) that labels the node at level 1 on this branch. The number next to a failure node identifies the failed clause by position in the clause listing of clause set S. Failure nodes have an ∗ through the node. Node (5) is not a failure node only because there is a failure node
above it on the same branch; a failure node must be as close to the root as possible. (The reader might place failure nodes on the complete semantic tree for the set S = {P ∨ R, Q ∨ R, ¬P ∨ R, ¬Q ∨ R, ¬R} to further his/her understanding of these definitions. Here the associated complete semantic tree has more failure nodes than clauses and the same node can be a failure node for more than one clause. We assume that the semantic tree is drawn as for the example presented here, with the atom R associated with level three.)
The node (6) is an inference node because its two immediate successor nodes are failure nodes. An inference node is always associated with a resolvent clause whose parent clauses are the clauses associated with the failure nodes that immediately succeed it. Here the clauses that cause nodes (4) and (3) to be failure nodes are ¬R and ¬Q ∨ R, respectively. The resolvent of these two clauses is ¬Q, which is falsified by the ancestor list for node (6), i.e., I((6)). While here the node (6) is the failure node for the new clause, that is not always the case. The failure node of the resolvent could be an ancestor node of the inference node if the literal just above the inference node does not appear (complemented) in the resolvent. The reader should understand the definitions and the example before proceeding from here.
Lemma. If a clause set is unsatisfiable, then the associated complete semantic tree is closed.
Proof. We prove the contrapositive. If some branch does not contain a failure node, then the literals labeling the branch define a model for that clause set.
Lemma. Every closed semantic tree has an inference node.
Proof. Suppose that there exists a closed semantic tree without an inference node. This means that every node has at least one immediate successor node that is not a failure node. If we begin at the root and always choose (one of) the immediate successor node(s) that is not a failure node then we reach a leaf node without encountering a failure node. This violates the closed semantic tree condition as we have found an "open" branch.
We now give the completeness argument. Given a (finite) propositional unsatisfiable clause set S:
(1) We have a closed complete semantic tree for S by the first lemma.
(2) Thus, we have an inference node in the tree for S by the second lemma.
(3) The inference node N_inf has two immediate successor nodes that are failure nodes by definition of inference node. Let A be an atom, and let C_P and C_N be clauses such that the failure node clauses are represented by C_P ∨ A and C_N ∨ ¬A. That A and ¬A appear in their respective clauses is assured by the fact that the failure nodes are immediate successors of the inference node and that the literal of the label immediately above the failure node must be complemented in the clause by definition of failure node.
(4) The resolution rule applies to C_P ∨ A and C_N ∨ ¬A. The resolvent is C_Res = C_P ∨ C_N.
(5) A failure node exists for resolvent C_Res at N_inf or an ancestor node of N_inf because all literals of C_Res are complements of literals in I(N_inf).
(6) Size (closed tree for S ∪ {C_Res}) < size (closed tree for S). This holds as at least the failure nodes below N_inf are missing from the closed tree for S ∪ {C_Res}.
The steps (1) through (6) are now to be repeated with {C_Res} ∪ S as the new clause set. Each time we perform steps (1)–(6) with the new resolvent added the resulting tree is smaller. The argument is valid for any closed semantic tree for clause set S such that size (closed tree for S) > 1. Size one fails because one needs failure nodes below the root. The argument thus ends when a tree of size one is reached. The reader might consider the case S = {P, ¬P} to better understand the "end game" of the reduction argument. This has a tree of size three associated with it and leads to a tree of size one in one round. A size one tree is just the root node and has the empty clause associated with it. Thus one has derived the empty clause. The proof is complete.
One can formalize the reduction argument by use of the least number principle if a more formal treatment is desired. That is left to the reader.
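At the propositional level the theorem can also be demonstrated operationally: generate resolvents until either the empty clause appears or no new clause can be formed. The following is a minimal sketch of such a saturation loop, in a representation of ours (clauses as frozensets of literal strings, "~" for negation); it illustrates the theorem on the clause set of Figure 1.1 and is not intended as an efficient procedure.

    # Exhaustive propositional resolution: the empty clause is derivable
    # iff the clause set is unsatisfiable (soundness and completeness).

    def negate(lit):
        return lit[1:] if lit.startswith("~") else "~" + lit

    def resolvents(c1, c2):
        """All resolvents of two clauses, one complementary pair at a time."""
        return {(c1 - {l}) | (c2 - {negate(l)})
                for l in c1 if negate(l) in c2}

    def has_refutation(clauses):
        clauses = set(clauses)
        while True:
            new = set()
            for c1 in clauses:
                for c2 in clauses:
                    for r in resolvents(c1, c2):
                        if not r:              # the empty clause
                            return True
                        if r not in clauses:
                            new.add(r)
            if not new:                        # saturated: satisfiable
                return False
            clauses |= new

    S = [frozenset({"~P"}), frozenset({"P", "Q"}),
         frozenset({"~Q", "R"}), frozenset({"~R"})]
    print(has_refutation(S))                   # True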
1.4 Handling Arbitrary Propositional Wffs

Now that we have established that the resolution logic is truthful (soundness) and can always confirm that a clause set is unsatisfiable (completeness), we need to broaden its usefulness. We show that any wff of propositional logic can be tested for unsatisfiability. This is done by giving an algorithm (i.e., a mechanical procedure) which associates
with any propositional wff a surrogate wff within the resolution logic that is unsatisfiable iff the original wff is unsatisfiable.
Definition. A wff is in conjunctive normal form (CNF) iff the wff is a conjunction of clauses (i.e., if the wff is a wff of the resolution formal system).
Theorem 5. Given an arbitrary propositional wff F we can find a wff F1 such that F1 is in conjunctive normal form and F is unsatisfiable iff F1 is unsatisfiable.
We actually will prove a stronger theorem, although Theorem 5 is all we need to make F1 a usable surrogate for F. The stronger assertion we show is that F and F1 have the same models.
Proof. We use the Replacement Theorem repeatedly. It is left to the reader to see that all replacements come from logical equivalences. We will illustrate the sequential uses of the Replacement Theorem by an example. However, the prescription for the transformation from the original wff to the surrogate wff applies generally; the example is for illustration purposes only.
We test the following wff for tautologyhood.
[(P ∨ (¬Q ∧ R)) ∧ (R → Q) ∧ (Q ∨ R ∨ ¬P)] → (P ∧ Q).
Step 0. Negate the wff to obtain a wff to be tested for unsatisfiability.
New wff. ¬{[(P ∨ (¬Q ∧ R)) ∧ (R → Q) ∧ (Q ∨ R ∨ ¬P)] → (P ∧ Q)}.
Step 1. Eliminate → and ↔. (Replacement Theorem direction: −→) Use
(A → B) −→ ¬A ∨ B, (A ↔ B) −→ (A ∨ ¬B) ∧ (¬A ∨ B), ¬(A → B) −→ A ∧ ¬B.
New wff. (P ∨ (¬Q ∧ R)) ∧ (¬R ∨ Q) ∧ (Q ∨ R ∨ ¬P ) ∧ ¬(P ∧ Q). Step 2. Move ¬ inward as far as possible.
Use
¬(A ∧ B) −→ ¬A ∨ ¬B, ¬(A ∨ B) −→ ¬A ∧ ¬B, ¬¬A −→ A.
New wff. (P ∨ (¬Q ∧ R)) ∧ (¬R ∨ Q) ∧ (Q ∨ R ∨ ¬P ) ∧ (¬P ∨ ¬Q). Step 3. Place the wff in conjunctive normal form (CNF). Use
A ∨ (B ∧ C) −→ (A ∨ B) ∧ (A ∨ C), (B ∧ C) ∨ A −→ (B ∨ A) ∧ (C ∨ A), (A ∧ B) ∨ (A ∧ C) −→ A ∧ (B ∨ C), (B ∧ A) ∨ (C ∧ A) −→ (B ∨ C) ∧ A.
New wff. (P ∨ ¬Q) ∧ (P ∨ R) ∧ (¬R ∨ Q) ∧ (Q ∨ R ∨ ¬P) ∧ (¬P ∨ ¬Q).
We now have the new wff with the same models as the given wff and the new wff is in conjunctive normal form (CNF). It is common for users of resolution logic to be imprecise and say "the given wff is now in conjunctive normal form." The theorem is proven.
We list the clauses that come from the CNF wff generated in the theorem.
Clauses:
1. P ∨ ¬Q
2. P ∨ R
3. ¬R ∨ Q
4. Q ∨ R ∨ ¬P
5. ¬P ∨ ¬Q
We leave to the reader the exercise of determining if this clause set, and hence the given wff, is unsatisfiable. Finally, we might mention that resolution is not used in the fastest algorithms for testing if a propositional wff is unsatisfiable. Some of the fastest algorithms for unsatisfiability testing are based on the DPLL (Davis-Putnam-Logemann-Loveland) procedure. These algorithms are used in many applications, such as testing “correctness” of computer chips. An interested reader can find more on this procedure and its applications by entering “DPLL” in an Internet search engine. Note that
DPLL is usable only at the propositional level, whereas resolution is applicable also at the first-order logic level.
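As a concrete rendering of Steps 1–3, here is a small sketch in Python over a formula representation of ours (nested tuples); it covers →, ¬, ∧, and ∨ (↔ is omitted for brevity) and applies exactly the replacement rules used in the proof of Theorem 5.

    # to_cnf applies Step 1 (eliminate ->), Step 2 (push negations inward),
    # and Step 3 (distribute "or" over "and"). Formulas are nested tuples:
    # ("atom", name), ("not", f), ("and", f, g), ("or", f, g), ("imp", f, g).

    def elim_imp(f):
        op = f[0]
        if op == "atom":
            return f
        if op == "not":
            return ("not", elim_imp(f[1]))
        if op == "imp":                       # (A -> B) becomes (~A v B)
            return ("or", ("not", elim_imp(f[1])), elim_imp(f[2]))
        return (op, elim_imp(f[1]), elim_imp(f[2]))

    def push_not(f, neg=False):
        op = f[0]
        if op == "atom":
            return ("not", f) if neg else f
        if op == "not":                       # ~~A becomes A
            return push_not(f[1], not neg)
        dual = {"and": "or", "or": "and"}     # De Morgan under negation
        return (dual[op] if neg else op,
                push_not(f[1], neg), push_not(f[2], neg))

    def distribute(f):
        op = f[0]
        if op in ("atom", "not"):
            return f
        a, b = distribute(f[1]), distribute(f[2])
        if op == "or" and a[0] == "and":      # (B ^ C) v A
            return distribute(("and", ("or", a[1], b), ("or", a[2], b)))
        if op == "or" and b[0] == "and":      # A v (B ^ C)
            return distribute(("and", ("or", a, b[1]), ("or", a, b[2])))
        return (op, a, b)

    def to_cnf(f):
        return distribute(push_not(elim_imp(f)))

    P, Q, R = ("atom", "P"), ("atom", "Q"), ("atom", "R")
    print(to_cnf(("or", P, ("and", ("not", Q), R))))
    # yields (P v ~Q) ^ (P v R), matching Step 3 of the worked example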
Exercises

1. Give a formal representation of the following sentence using as few statement letters as possible. Give the intended meaning of each statement letter.
If John owes one dollar to Steve and Steve owes one dollar to Tom then if Tom owes one dollar to John we have that John doesn't owe a dollar to Steve, Steve doesn't owe a dollar to Tom nor does Tom owe a dollar to John.
2. Give a formal representation of the following sentence using the statement letters given.
If it doesn't rain, I swim and so get wet, but if it rains, I get wet so in any event I get wet.
Use:
R: it rains
S: I swim
W: I get wet

3. Give a resolution refutation of the following clause set.
¬A, B ∨ C, ¬B ∨ C, B ∨ ¬C, C ∨ D, ¬B ∨ ¬C ∨ D, A ∨ ¬B ∨ ¬C ∨ ¬D.
4. Give a resolution refutation of the following clause set.
P ∨ Q, P ∨ ¬Q, R ∨ S, ¬R ∨ ¬S, ¬P ∨ R ∨ ¬S, ¬P ∨ ¬R ∨ S.
5. Give a resolution refutation of the following clause set. It is not necessary that all clauses be used in a refutation.
M ∨ P, P ∨ Q, ¬P ∨ R ∨ S, ¬M ∨ Q ∨ ¬S, ¬M ∨ R, ¬Q ∨ R ∨ S, ¬M ∨ ¬S, ¬Q ∨ ¬R, ¬P ∨ ¬R, M ∨ ¬R, M ∨ ¬P ∨ R ∨ ¬S.
6. The following clause set is satisfiable. Find a model of the clause set. How can you use the lemma in the Soundness Theorem to help locate models of the clause set?
P ∨ Q ∨ R, ¬Q ∨ R, ¬P ∨ ¬R, ¬P ∨ Q, P ∨ ¬Q ∨ ¬R.
7. Show that the following wff is a tautology using resolution.
(P ∨ Q) → (¬P → (¬Q → P)).
8. Show that the following wff is a tautology using resolution.
(P ∨ Q) ∧ (P → R) → (¬Q → R).
9. Determine if the following wff is a tautology. If it is a tautology give a resolution proof of its negation. If it is not a tautology show that its negation is satisfiable.
(S ∨ ¬(P → R)) → ((S ∧ (R → P)) ∨ ¬(S ∨ ¬P)).
10. Provide the proof for the "if" case for the FACT given in Section 1.1. Argue by the contrapositive.
11. Consider an unsatisfiable clause set with a unit (one-literal) clause. Consider a new clause set obtained from the old set by resolving the unit clause against every possible clause and then removing all the θ-subsumed clauses and the unit clause. Prove that the resulting clause set is still unsatisfiable. Hint: Suppose that the new clause set is satisfiable.
12. (Generalization of Exercise 11) Consider an unsatisfiable clause set with an arbitrary clause C. Choose an arbitrary literal L in C and create a new clause set obtained from the old set by resolving L against every occurrence of L^c in the clause set and then removing all the θ-subsumed clauses and the clause C. Prove that the resulting clause set is still unsatisfiable. Hint: Solve Exercise 11 first.
13. (Harder) The unit resolution restriction uses the resolution inference rule that requires a unit clause as one parent clause. This is not a complete resolution deductive system over all of propositional logic. (Why?) However, consider an unsatisfiable clause set for which no clause has more than one positive literal. Show that this clause set has a unit resolution refutation. Hint: First prove the preceding exercise.
2 Predicate Logic

First-order logic is an enrichment of propositional logic that allows for added capabilities of expression and representation of logical notions. The terms predicate logic and predicate calculus are also used for this logic although the latter term can represent slightly different notions of logic. We will use the terms interchangeably.
The reader is already familiar with the notion of a predicate, as it is used to represent the induction statement in the employment of induction in proofs of the preceding section. Technically, it is a mapping from a set of objects within an interpretation to true or false. Thus its syntactic expression consists of a predicate letter followed by its arguments. The term logical function is also used for this notion of predicate. To illustrate, consider the set of all readers in a given classroom. The predicate B(x) can be defined to be true if x is instantiated to a boy and false otherwise. Also, this predicate can be identified with a set: the subset of readers in the classroom that are boys.
The notion of logical function extends to functions of multiple arguments, also called relations. The notions above extend to this case in a natural way. A well-known example of a relation is that of equality. The label predicate is often extended to include relations to avoid having to use both labels when the number of arguments is immaterial. We use the expressions predicate logic and predicate calculus in this fuller sense also.
We begin our presentation of the predicate logic with the syntax of the language, followed by an informal treatment of the semantics.
The syntax for the predicate logic formal language:
• Alphabet
1. Variables: xi, yi, zi, i ≥ 1.
2. Constant symbols: ai, bi, ci, i ≥ 1.
3. Function symbols: fi^n, gi^n, hi^n, i ≥ 1, n ≥ 1 (the subscript i indexes the symbol; the superscript n gives the number of arguments). The n is usually dropped.
4. Predicate symbols: Pi^n, Qi^n, Ri^n, i ≥ 1, n ≥ 0. n = 0 denotes statement letters.
5. Logical symbols: ∀, ∃, ∧, ∨, ¬, →, ↔.
6. Punctuation: (, ), comma.
Note: f^3(x, y, z) differs from f^2(x, y). The phrase nonlogical symbols denotes variable, constant, function, and predicate symbols.
• Terms
1. Variables and constants are terms.
2. If t1, . . . , tn are terms then fi^n(t1, . . . , tn) is a term. Likewise for gi^n and hi^n.
• Well-formed formulas (wffs)
1. If t1, . . . , tn are terms then Pi^n(t1, . . . , tn) is an atomic wff, n ≥ 0. Likewise for Qi^n and Ri^n.
2. If A and B are wffs then so are (¬A), (A ∨ B), (A ∧ B), (A → B), and (A ↔ B).
3. If A is a wff then so are (∀xi A) and (∃xi A). All occurrences of xi are bound in (∀xi A) and (∃xi A). The variable occurrences of xi in A are said to be in the scope of the quantifier ∀xi (a universal quantifier) or ∃xi (an existential quantifier). Likewise for yi and zi. An occurrence of a variable is free if it is not bound.
A variable is bound (free) in a wff if it has a bound (free) occurrence in the wff.
A hierarchy for the logical connectives is:
¬, ∀, ∃
∨, ∧
→, ↔
Associate to the right.
Example. ∀x A(x) ∧ B(x) → A(x). The actual wff, without use of the hierarchy, is (((∀x A(x)) ∧ B(x)) → A(x)). The variable x occurs both bound and free here.
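For readers who like to see the syntax as data, here is one possible concrete rendering of terms and atomic wffs in Python; the class names and representation are ours, chosen for illustration, and are not part of the formal language.

    # Terms are built exactly as in the inductive definition above:
    # variables and constants are leaves, and applying a function symbol
    # to terms yields a term. An Atom applies a predicate symbol to terms.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Var:
        name: str              # a variable, e.g. "x1"

    @dataclass(frozen=True)
    class Const:
        name: str              # a constant symbol, e.g. "a1"

    @dataclass(frozen=True)
    class Fn:
        symbol: str            # a function symbol applied to arguments
        args: tuple

    @dataclass(frozen=True)
    class Atom:
        predicate: str         # an atomic wff P^n(t1, ..., tn)
        args: tuple

    # The term f(x, g(a)) and the atomic wff P(f(x, g(a)), y):
    t = Fn("f", (Var("x"), Fn("g", (Const("a"),))))
    w = Atom("P", (t, Var("y")))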
2.1 First-Order Semantics

Again, we introduce the semantics for our logic by example. This time, we will not give a formal treatment of the semantics but refer to standard logic texts for an in-depth treatment. We do give a formal definition of the valuation function in an appendix.
The following defines an interpretation.
1. A nonempty domain D of discourse over which variables range.
2. For each individual constant letter a selected element of D.
3. For each function letter f^n a selected function f̂^n : D^n −→ D.
4. For each predicate (relation) letter R^n a selected subset of D^n. (The subset represents the n-tuples that are true for that predicate or relation.)
D^n denotes the set of n-tuples (d1, d2, . . . , dn). In particular, D^2 (also written D × D) is the set of pairs with elements from D. We say that the predicate letter R^n, or function letter f^n, with n arguments is of arity n.
The above specifies an interpretation for the given wff. If a variable is free it needs to be interpreted also, and we choose (as many logicians do) to have a separate entity do that. An assignment assigns an element of D to each variable of the wff. Only the assignment to the free variables determines the valuation of a wff; the assignment of an element of D to every variable of the wff aids the recursive computation of the valuation of the wff.
Modern logic sometimes separates the two aspects of an interpretation by defining a structure to be the real-world components to which the formal symbols are mapped. The interpretation is then only the mapping, i.e., the identification of the individuals, functions, and sets with the formal symbols. Thus the nonempty domain would be part of the definition of the structure. We have chosen the more traditional view of an interpretation.
As done in the propositional case, we introduce the semantics of the logic through examples of formalizing English sentences. We state the English sentence and one or more wffs that are to encode its meaning. The first-order language and semantics allow us to capture in finer detail the meaning of these sentences.
We first define the notion of an embellished signature. A traditional signature is purely syntactic; it states the nonlogical symbols with their arities. For the purpose of formalizing English sentences here we embellish the signature with semantic items. An embellished signature includes a nonempty domain and the nonlogical symbols: the predicate, function, and constant letters. To each predicate and function letter we append an n-tuple of variables and an English phrase that states our intended meaning for that symbol. Given an English sentence, we seek a wff over the embellished signature that forces all of the models of the wff to have each nonlogical symbol reflect our intention for that symbol. All of these interpretations are to have as their domains the domain of the embellished signature.
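As an illustration of the definition, a finite interpretation can be coded directly: the domain is a set, each predicate letter denotes a set of tuples, and the quantifiers are evaluated by iterating over the domain. The representation below is a hypothetical sketch of ours, not part of the formal development.

    # A tiny interpretation: domain {1, 2, 3}, with the predicate letter
    # Less^2 interpreted as the usual order on that domain.

    D = {1, 2, 3}
    preds = {"Less": {(1, 2), (1, 3), (2, 3)}}

    def holds(wff, env):
        """Recursive valuation; env assigns domain elements to variables."""
        op = wff[0]
        if op == "pred":                      # ("pred", letter, v1, v2, ...)
            return tuple(env[v] for v in wff[2:]) in preds[wff[1]]
        if op == "not":
            return not holds(wff[1], env)
        if op == "and":
            return holds(wff[1], env) and holds(wff[2], env)
        if op == "forall":                    # ("forall", var, body)
            return all(holds(wff[2], {**env, wff[1]: d}) for d in D)
        if op == "exists":
            return any(holds(wff[2], {**env, wff[1]: d}) for d in D)

    # "for every x there is a y with Less(x, y)" -- false, since 3 is maximal
    print(holds(("forall", "x", ("exists", "y", ("pred", "Less", "x", "y"))), {}))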
The equality relation will have its usual meaning as the identity relation and be written in infix form, as x = y. As an aside, we mention that when equality is added as a non-logical symbol and axioms written for it, then models may interpret equality as different from identity (but as an equivalence relation plus more). This issue is often addressed by making equality a logical relation symbol and defining the valuation function so that equality means the identity relation.
We give several English sentences, embellished signatures, and encoding wffs and then consider in detail how those examples are formulated.
Example 1. (a) Al likes Bob but Bob likes everyone.
D: all people
L(x, y): x likes y
x = y: x and y are the same person
a: Al
b: Bob
Corresponding wff: a ≠ b ∧ L(a, b) ∧ ∀xL(b, x).
The entry "L(x, y): x likes y" does not specify an interpretation of L(x, y) by stating "x likes y." It simply says that we want the formal relation L(x, y) to represent the relationship of "likes" in the wff we create. However, from the entry "L(x, y): x likes y" alone we can determine that x and y should range only over people. We say that the sort of the variables x and y is "people." We will later consider wffs with domains containing multiple sorts. We will forego the formal language alphabet, such as ai for constant letters, in favor of more suggestive symbols.
(b) Bob does not like everyone.
D: all people
L(x, y): x likes y
b: Bob
Corresponding wff: ¬∀xL(b, x).
(c) Al likes only Bob. The embellished signature is as for part (a).
Corresponding wff: a ≠ b ∧ ∀x(L(a, x) → (x = b)) ∧ L(a, b).
Now we consider how to arrive at a wff that most closely represents a given English sentence by having all its models meet our intended meaning of the English sentence. When we use the phrase "all models,"
or a similar phrase, we shall mean all models over the stated domain. Interpretations over other domains need not concern us.
First, we define a suitable embellished signature for the English sentence of concern. The domain is the union of all sets of objects that cover the sorts of objects referenced in the English sentence. One then chooses suitable predicate, function, and individual constant letters with their arguments, and adds the English phrases to indicate the intended meanings.
One effective way to obtain the desired wff is to construct a first attempt at the formalization and then precisely verbalize the candidate wff using the assigned meanings of the non-logical symbols. This is done by asserting the truth of the wff and following the consequences of that down through the component subwffs. Then compare the result with the original English sentence. If the wff captures the intent of the English sentence then the wff is the desired formal encoding. Otherwise, adjust the candidate wff until the wff verbalization is the desired expression of the English sentence.
For simplicity, in the paragraphs that follow, we will write "L(a, b) is true" and understand that "I(L(a, b)) is true" holds for a suitable interpretation I.
To have wff ∀xW be true the wff W must be true when x is uniformly replaced by every domain element. For Example 1(a) given earlier, L(b, x) must be true for each person for ∀xL(b, x) to be true. For Example 1(b), ∀xL(b, x) is read as "for each person, Bob likes that person." That negated says what we desire. For Example 1(c), the ∀x binds two occurrences of x so each person c in the domain must make L(a, c) → (c = b) true for ∀x(L(a, x) → (x = b)) to be true. Note that this forces any model for this wff to have L(a, c) be false unless c is Bob. Although this means only L(a, b) can be true in any model it does not force L(a, b) to be true, hence the conjoining of L(a, b) to the wff. We even have to worry about interpretations where Al and Bob are names for the same person. Since we surely intend that Al and Bob be different people we have to explicitly force that fact by adding a ≠ b to the wff in Examples 1(a) and 1(c). Now no models can have Al and Bob label the same person.
To have wff ∃xW be true the wff W must be true when x is uniformly replaced by at least one domain element. Thus, ∃xL(a, x) is true if L(a, b) is true.
We note that the interpretations are not fully defined in the above examples. In Example 1(a) we see that the truth value of L(a, b) is determined and likewise for all d ∈ D, L(b, d), but L(a, d) is not determined. An interpretation requires full specification so one can choose any collection of pairs (a, d) ∈ D^2 one wishes to complete an interpretation
(providing L(a, b) holds). We rarely need to complete the interpretation but it is important to note that there are usually many models of a wff.
We now give an example that illustrates the use of functions.
Example 2. It happens that the best friend of my best friend is someone I dislike but that is not always true.
D: all people
L(x, y): x likes y
f(x): the best friend of x
i: me
Corresponding wff: ¬L(i, f(f(i))) ∧ ¬∀x¬L(x, f(f(x))).
Alternate wff: ¬L(i, f(f(i))) ∧ ∃xL(x, f(f(x))).
The English sentence is ambiguous and many readers may prefer that the sentence be formalized as ¬L(i, f(f(i))) ∧ ∃xL(i, x). We explore further the versions given with the example. Some readers may recognize that the alternate wff in Example 2 is logically equivalent to the chosen corresponding wff. However, the chosen wff is closer to the English statement in that it reflects the words "not always." Translating back from the wff to the English sentence one arrives closer to the given English sentence from the chosen wff. A natural way to read the formal expression ¬∀x¬L(x, f(f(x))) is "it is not true that for each person x, x dislikes the best friend of the best friend of x." The phrase "it happens that" is not naturally captured by a formal encoding.
Our next example introduces a more complex domain where we have different populations of elements. This is handled by a standard technique called relativization, a technique that in effect creates different sets of homogeneous elements. (This can be handled in a simpler way by sorted logics but this special logic is not required.)
Example 3. Al always likes someone.
This sentence is ambiguous. There are two interpretations; we give each interpretation.
(a) Al likes the same person always.
(b) Al may like different people at different times.
We use the following embellished signature for both cases.
D: {all people} ∪ {all time instants}
L(x, y, t): x likes y at time t
T(t): t is a time instant
a: Al
Here we have a domain with two sorts of objects, people and time instants. For the first time in our examples the sort for some variables does have to be determined by the English phrase with the predicate. Pursuing Case (a), the first attempt at encoding the English sentence might be ∃y∀tL(a, y, t). However, this is an inappropriate wff for reasons we discuss after presenting the correct wffs.
Case (a). Corresponding wff: ∃y∀t(T(t) → L(a, y, t)).
Case (b). Corresponding wff: ∀t∃y(T(t) → L(a, y, t)).
The wff ∃y∀tL(a, y, t) is seen as inappropriate when one recalls that the quantifier ∀ demands a valuation of T for the subwff within its scope for all elements of the domain. Here t can instantiate to a, for example, so L(a, y, a) would have to be true. But Al cannot like y at time "Al"; L(a, y, a) has valuation F. In general, a predicate instance is true precisely when all its arguments identify with objects of the correct sort and the predicate instance agrees with the intention of the wff.
The wff ∃y∀t(T(t) → L(a, y, t)) exploits the fact that T(a) is false so the implication is true even though L(a, y, a) is false. Of course, this holds for any non-time element of D. Thus, the subformula in the scope of the ∀t will be true for any instance for t providing y has a suitable instantiation (e.g., y is instantiated to Carl and Carl is liked by Al at every time instant). We see that the predicate L is forced to represent Likes in the intended manner. The use of the implication connective to relativize the domain to the desired set of instances (here "time") for ∀ is uniformly applicable. In Example 5, we consider the encoding device for relativizing domains for the existential quantifier.
Case (a) and Case (b) differ only in quantifier order. In Case (a), asserting that ∃yW is true says that there exists an object in D such that the object makes the universal subwff W true. To assert that the Case (b) wff ∀t∃yW is true states that for each object for t there exists a possibly different object for y such that W.
For our next example, we consider another negative sentence. Negative sentences are tricky in general but one class of sentences has a simple approach, namely, to encode the corresponding positive statement first, and then negate the wff.
Example 4. Nobody likes Al all the time.
We use the embellished signature of Example 3.
Corresponding wff: ¬∃x∀t(T(t) → L(x, a, t)).
The corresponding positive English sentence for “Nobody likes Al all the time” is “Somebody likes Al all the time.” We interpret that formally as ∃x∀t(T(t) → L(x, a, t)) which, when negated, gives us the wff of Example 4. Why is it not necessary to relativize the ∃ quantifier in Example 4? To make a wff ∃xW true it only takes one element of the interpretation domain that makes W true. Of course, that will be a person so the presence of time elements in the domain of the interpretation creates no problem. For our last example we use a wff logically equivalent to the preceding wff to consider the relativization of the ∃ quantifier. Example 5. Everyone dislikes Al at some time. We use the embellished signature of Example 3. Corresponding wff: ∀x∃t(T(t) ∧ ¬L(x, a, t)). To understand this encoding we need to see why relativization of the ∃ quantifier is needed here, why the ∧ is used with the ∃ quantifier when needed, and why the ∀ quantifier needs no relativization. The troublemaker in this case is the negation sign inside the quantifier occurrences. Consider the wff ∃t(¬L(x, a, t)). The wff L(x, a, a) is false for all x, again because the third argument must be a time instant to have a chance at being true. Thus, ¬L(x, a, a) is true for all x, so ∃t(¬L(x, a, t)) is true for all x independent of whether or not anyone dislikes Al. The wff T(t) ∧ ¬L(x, a, t) is true only when t is instantiated to a time instant and L(x, a, t) is false. Now L(x, a, t) must be false for the right reason, namely, that x really dislikes Al at time t. No relativization is needed for the ∀ quantifier because of the negated L predicate occurrence. Here L(x, a, t) must be false in the interpretation for every assignment of an element of D to x for the wff to be true. That occurs by default when x is assigned a time element. When x is assigned any person element then the requirement that L(x, a, t) be false to make the wff true forces the restriction on the interpretation that we desire. In contrast, note that the sentence “Everyone likes Al at some time” can be encoded as ∀x∃t(P (x) → L(x, a, t)), where P (x) is interpreted as “x is a person.” The subtlety of encoding some of the above English sentences shows the need for care in the encoding process. Again, an effective way to check the encoding is to reconstruct an English sentence by translating the wff back into English and comparing the resulting sentence with the original English sentence.
Since our focus is on proof systems we will let the above set of examples serve to review the semantics of first-order logic. We do include the formal definition of valuation for first-order logic in an appendix for those wishing to consider how this definition is formalized. The definitions of validity and satisfiability change from the propositional case only in the need to accommodate the possibility of free variables. Of course, the notion of interpretation is much richer here than for the propositional case. The definitions of unsatisfiability, model, logical consequence, and logical equivalence read exactly as for the propositional case and we do not repeat those definitions. Definition. A wff A is valid iff for all interpretations and all assignments to the free variables the valuation of A is T. Definition. A wff A is satisfiable iff for some interpretation and some assignment to the free variables the valuation of A is T. Definition. A wff is closed iff the wff has no free variable occurrences. A closed wff has every variable occurrence preceded by a quantifier of the same name. Closed wffs with all quantifiers to the left have a special name. Definition. A closed wff is said to be in prenex normal form iff any nonempty subexpression to the left of any quantifier contains only quantifiers. The subformula to the right of the rightmost quantifier is called the quantifier-free subformula. Given wff A, the (universal) closure of A is the wff formed from A by preceding A by a universal quantifier for each free variable of A. Aside: There is a notion of existential closure but that will not be used in this book. Therefore, we will use the term closure and mean universal closure. Example. ∀x∃y∃z(P (x, y) ∨ Q(y, z)) is in prenex normal form. We close this section with the statement of the Replacement Theorem for first-order logic. As before, it establishes that validity is preserved when a new wff is obtained by replacing a subformula by a logically equivalent formula. The following facts are important to recall when using the Replacement Theorem. Facts about the closure of wffs: Let x1 , . . . , xn be the free variables in arbitrary wffs A and B.
Then |= A ↔ B iff |= ∀x1, . . . , ∀xn(A ↔ B). Also note that sometimes ⊭ (A ↔ B) ↔ (∀x1, . . . , ∀xn(A ↔ B)). For example, ⊭ (x < 10) ↔ ∀x(x < 10). The above facts hold for arbitrary wff C in place of A ↔ B (except if C is closed).
We state the Replacement Theorem without proof as it follows the form of the proof of the propositional Replacement Theorem with minor complications. Note that the theorem statement wording is identical to the propositional case but is reinterpreted in the first-order logic setting.
Theorem 2′ (Replacement Theorem for First-Order Logic). Let wff F contain subformula B and let F1 be the result of replacing one or more occurrences of B by wff C in F. Then |= B ↔ C implies (*)
|= F ↔ F1 .
Here (*) means that for all interpretations I and all assignments ϕ to the free variables of F and F1, V_{I,ϕ}[F] = T iff V_{I,ϕ}[F1] = T, where V is the valuation function defined in an appendix. The valuation function plays the same role as in the propositional case of mapping wffs to truth values when the non-logical symbols are interpreted. That F and F1 have precisely the same "meaning" follows from the semantics of equivalence and the definition of validity for first-order logic.
2.2 Resolution for the Predicate Calculus Resolution for first-order logic is essentially propositional resolution with a substitution mechanism for free variables. Because many formal proofs focus primarily on substitution arguments and not on matters of general logic interest we will omit proofs of the theorems of this chapter. However, proofs of the key Lifting Lemma, effectiveness of the most general unifier algorithm, and the Completeness Theorem for First-Order Resolution can be realized by completing the final four problems of this chapter. We do give a substantial discussion of the structure of the formal proofs. Full treatment of first-order resolution
with proofs of theorems that are stated here can be found in logic books treating resolution. (See any of [2], [4], [5], [6], [7], and [8].) Resolution logic underlies proof procedures based on this logic. A procedure is a series of actions in a prescribed order; a proof procedure is a procedure that seeks to find proofs (or in the resolution case, refutations). We do not examine proof procedures in this chapter although we make an occasional reference to such.
2.2.1 Substitution

The act of substitution involves replacement of free variables by arbitrary (first-order) terms.
Notation. We represent an arbitrary substitution θ as θ = {t1/x1, . . . , tn/xn}, where each xi is replaced by term ti with simultaneous replacement.
Example. Let θ = {y/x, 2/y}. Then (x + y)θ = y + 2. Here we see that the x in x + y is replaced by y, but only the original y is replaced by 2. This is because the y named in θ as the term replacing x is not eligible for replacement. This is the meaning of simultaneous replacement.
Definition. A substitution instance is the expression that results from the application of a substitution to a given expression.
A composite of substitutions, written as θ1θ2 . . . θn, is applied to an expression E as ((. . . ((Eθ1)θ2)θ3) . . .)θn. It can be viewed as one substitution; there is a formula for creating the single composite substitution but that will not be used here. For the record, we present the composition rule without further elaboration.
Definition. If θ = {t1/x1, . . . , tn/xn} and η = {t′k/xk, . . . , t′n/xn, t″1/y1, . . . , t″p/yp} then the composition of substitutions θ and η is given by

θη = {t1η/x1, . . . , tnη/xn, t″1/y1, . . . , t″p/yp}.

Replacement is simultaneous. The variables xk, . . . , xn are the only variables that overlap in θ and η as variables for replacement. The substitution θη is called a composite substitution.
Definition. Substitution θ unifies two expressions A and B iff Aθ and Bθ are identical. Such a θ is a unifier of A and B.
Example. P(x, x) and P(f(y), z) are unified by θ = {f(b)/x, b/y, f(b)/z} because P(x, x)θ = P(f(b), f(b)) = P(f(y), z)θ.
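As an illustration, the following sketch implements substitution application and the composition rule just stated; the representation (variables as plain strings, compound terms as tuples headed by their symbol) is ours and hypothetical.

    # apply_subst performs simultaneous replacement: a replacing term is
    # never itself re-examined, matching the example theta = {y/x, 2/y}.

    def apply_subst(term, theta):
        if isinstance(term, str):                 # a variable
            return theta.get(term, term)
        return (term[0],) + tuple(apply_subst(t, theta) for t in term[1:])

    def compose(theta, eta):
        """theta followed by eta as one substitution: apply eta to each
        term of theta, then add eta's bindings on variables theta misses."""
        out = {x: apply_subst(t, eta) for x, t in theta.items()}
        out.update({y: t for y, t in eta.items() if y not in theta})
        return out

    theta = {"x": "y", "y": ("2",)}               # {y/x, 2/y}; ("2",) is a constant
    print(apply_subst(("+", "x", "y"), theta))    # ('+', 'y', ('2',)), i.e. y + 2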
Definition. Substitution σ is a most general unifier (mgu) of expressions A and B iff σ unifies A and B and for every other unifying substitution θ for A and B there is a substitution λ such that Aθ = (Aσ)λ = (Bσ)λ = Bθ.
Example (continued). The substitution σ = {f(y)/x, f(y)/z} is a (candidate for) mgu for P(x, x) and P(f(y), z) because
P(x, x)σ = P(f(y), f(y)) = P(f(y), z)σ     (σ is a unifier)
and P(x, x)θ = (P(x, x)σ)λ = P(f(b), f(b)) where λ = {b/y} (σ is more general than θ of the previous example).
It can be shown that σ of the above example is in fact a mgu. That certainly seems likely by inspection. We give an algorithm for finding a mgu shortly.
The role of mgus in first-order resolution is the most innovative part of resolution logic and the most important reason for the success of resolution procedures based on the logic. The proof of the significance of mgus is due to J. A. Robinson.
Theorem 6 (Robinson). If expressions A and B are unifiable then there exists a "unique" mgu (unique except for variable renaming) for A and B.
This key theorem is the Unification Lemma of Exercise 11. Also see Exercise 10.
Expressions may not be unifiable.
Examples.
(1) P(a, x) and P(b, y) (recall that a and b are constants)
(2) P(x, x) and P(z, f(z))
For the first example, there is no substitution that can unify a and b as only variables can be replaced by substitutions. It is more difficult to see the problem with the second example. Note that P(x, x) requires the two arguments be the same, whatever the terms are. However, there is no substitution for z in P(z, f(z)) that will make the two arguments the same. There will always be that extra f out front in the second argument. There is no way that any substitution can remove that f
because substitutions can remove only variables, replacing them with terms.
We now give the Most General Unifier algorithm. Given two expressions:
1. Set a pointer at the leftmost symbol of each expression.
2. If the designated symbols agree then
(a) if both pointers are at the final symbols then go to step 4,
(b) else move both pointers one symbol to the right and repeat step 2.
3. Else (the designated symbols differ)
(a) if both designated symbols are non-variables then halt and fail,
(b) else substitute for one variable xi the term ti whose first symbol is the other designated symbol, providing xi does not occur in ti. Substitute at all occurrences in both expressions, then move both pointers just to the right of term ti and go to step 2. If xi does occur in ti then halt and fail.
4. The two expressions are identical. Halt and accept.
Notation. The action in 3(b) of the mgu algorithm of checking that the variable xi does not occur in the term ti that it faces (by pointer location) is called the occurs check.
We have given a process for unifying two expressions in general, although it will generally be applied to well-formed terms or formulas. That is certainly true in resolution applications if the original formulas are well-formed. Note that we have not reported the substitution that does the most general unification. That process requires mechanisms we have not developed and we do not need to know the substitution created. We give some examples of most general unification.
Example. P(x, x) and P(f(y), f(b)).
The pointers begin at the leftmost symbol P which is the same in both expressions so the pointer moves right, past the left parenthesis (agreement), and encounters x and f, respectively, as so:

P(x, x)  and  P(f(y), f(b)).
  ↑             ↑
At this first point of disagreement, we see that one pointer faces a variable and so the substitution σ1 = {f(y)/x} is possible and brings the symbols at the corresponding pointers into agreement. We now have

P(f(y), f(y))  and  P(f(y), f(b)).
  ↑                   ↑

We move the pointers along in parallel until the next symbols of disagreement, which are the y and b, respectively:

P(f(y), f(y))  and  P(f(y), f(b)).
      ↑                     ↑

We now can use the substitution σ2 = {b/y} to bring these symbol positions into agreement. This yields

P(f(b), f(b))  and  P(f(b), f(b)).
      ↑                     ↑

It is important to note that the substitutions are made at all occurrences of y in both expressions. Following this requirement will result in the expressions to the left of the pointers always being in full agreement. We now can advance the pointers simultaneously to the end of the expressions and terminate successfully. The substitution that has provided this most general unification is σ = {f(b)/x, b/y}. We report the final substitution for general interest only; resolution deductions need not name the substitutions involved. We do note that the final substitution does not just take the union of the components of the intermediate substitutions. (End of example.)
We now give an example where the unification fails. Note that it is not obvious at first. We omit the component and final substitutions.
Example. P(f(x, y), g(x), g(y)) and P(f(u, v), g(v), u).
We apply the mgu algorithm to the task of unifying these expressions. We move the pointers in parallel to the first point of disagreement.

P(f(x, y), g(x), g(y))  and  P(f(u, v), g(v), u).
    ↑                            ↑
We can replace either x by u or u by x. We choose to replace u by x. We then move the pointers to the next point of disagreement, which is y versus v.

P(f(x, y), g(x), g(y))  and  P(f(x, v), g(v), x).
       ↑                            ↑

Again, we have an option on substitution and we choose to replace v by y everywhere. We again move the pointers to the next point of disagreement.

P(f(x, y), g(x), g(y))  and  P(f(x, y), g(y), x).
             ↑                             ↑

We replace x by y and move to the next disagreement point.

P(f(y, y), g(y), g(y))  and  P(f(y, y), g(y), y).
                  ↑                           ↑

But here we see that the occurs check fails because y is embedded in the term that the second pointer aligns with y. Thus the unification fails. (End of example.)
Note that the two expressions originally have no variables in common. This will be the case for many of our uses of the mgu algorithm. However, as we just saw, this does not remove the possibility that the occurs check kills the unification process. We see in the next example that a slight variation in one of the two expressions of the previous example allows the unification to exist. We leave the execution of the mgu algorithm to the reader.
Example. P(f(x, y), g(x), g(y)) and P(f(u, v), g(u), u).
The common substitution instance formed by the mgu is P(f(g(y), y), g(g(y)), g(y)).
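The mgu algorithm translates directly into code. The sketch below uses an illustrative representation of ours (variables as plain strings, compound terms as tuples headed by their symbol) and returns bindings in "triangle" form, so the mgu of the first worked example appears as {'x': ('f', 'y'), 'y': ('b',)}; resolving the bindings gives {f(b)/x, b/y} as computed above.

    # Unification with the occurs check. Returns a dict of bindings on
    # success, or None when unification fails.

    def walk(term, theta):
        """Follow variable bindings to the current representative."""
        while isinstance(term, str) and term in theta:
            term = theta[term]
        return term

    def occurs(x, term, theta):
        term = walk(term, theta)
        if term == x:
            return True
        return isinstance(term, tuple) and any(occurs(x, t, theta) for t in term[1:])

    def unify(s, t, theta=None):
        theta = {} if theta is None else theta
        s, t = walk(s, theta), walk(t, theta)
        if s == t:
            return theta
        if isinstance(s, str):                  # s is an unbound variable
            return None if occurs(s, t, theta) else {**theta, s: t}
        if isinstance(t, str):
            return None if occurs(t, s, theta) else {**theta, t: s}
        if s[0] != t[0] or len(s) != len(t):    # clash of symbols or arity
            return None
        for a, b in zip(s[1:], t[1:]):
            theta = unify(a, b, theta)
            if theta is None:
                return None
        return theta

    # P(x, x) vs P(f(y), f(b)): succeeds with {f(b)/x, b/y} in triangle form.
    print(unify(("P", "x", "x"), ("P", ("f", "y"), ("f", ("b",)))))
    # The failing example above: the occurs check rejects it.
    print(unify(("P", ("f", "x", "y"), ("g", "x"), ("g", "y")),
                ("P", ("f", "u", "v"), ("g", "v"), "u")))        # None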
2.2.2 The Formal System for Predicate Logic

The wffs for first-order resolution (also known as resolution for predicate logic or resolution for the predicate calculus) form a subset of the quantifier-free wffs of predicate logic. This is really a syntactic convenience as the wffs of resolution deductions are always to be viewed as the universal closures of the wffs written.
The formal system of first-order resolution:
• Well-formed formulas
1. Atomic formulas: If t1, . . . , tn are terms then Pi^n(t1, . . . , tn) is an atomic wff, n ≥ 0.
2. Literals: An atom or its negation.
3. Clauses: If Li are literals then L1 ∨ L2 ∨ . . . ∨ Ln is a clause, n ≥ 0.
4. Well-formed formulas (wffs): If Ci are clauses then C1 ∧ C2 ∧ . . . ∧ Cm is a wff, m ≥ 1. If Ci is the empty clause, then i = 1 and m = 1.
Examples of wffs. P(f(x), x), P(a) ∨ ¬R(y, b), (P(x) ∨ Q(y)) ∧ ¬R(z, z).
• Axioms: None.
• Rules of inference
Note. The term variable disjoint in reference to two clauses applies when the two clauses share no common variable. In a resolution deduction one can always rename the variables because every clause is implicitly universally quantified. The justification of this uses generalizations of the wff ∀x P(x) ↔ ∀y P(y) with the Replacement Theorem.
1. Resolution rule. Let L1, . . . , Ln, L′1, . . . , L′m be literals where the Li and the L′j share no variables. (The clauses are variable disjoint.) Let there exist a substitution θ such that L1θ and L′1θ are complementary literals. Let σ be a mgu of L1^c and L′1. Then the resolution inference rule is given by

    L1 ∨ L2 ∨ . . . ∨ Ln        L′1 ∨ L′2 ∨ . . . ∨ L′m
    ---------------------------------------------------
            (L2 ∨ . . . ∨ Ln ∨ L′2 ∨ . . . ∨ L′m)σ

Literal order is unimportant. The resolving literals need not be leftmost.
2. Factorization rule. Let L1, . . . , Ln be literals and let θ be a substitution such that L1θ = L2θ. Let σ be a mgu for L1 and L2. Then the factorization rule is given by

    L1 ∨ L2 ∨ L3 ∨ . . . ∨ Ln
    -------------------------
     (L2 ∨ L3 ∨ . . . ∨ Ln)σ

Again, literal order is unimportant.
Definition. Variable-free terms or wffs are often called ground terms or wffs. A literal whose terms are ground terms is a ground literal. Any clause (clause set) whose literals are all ground literals is called a ground clause (ground clause set, respectively).
Note that any ground clause set is a propositional clause set as no substitution can alter the form of any literal. All literals could be replaced by statement letters.
We see that there are two inference rules for predicate logic. This goes to the heart of what is occurring for first-order resolution. As mentioned earlier, first-order resolution is essentially propositional resolution with a substitution mechanism. More precisely, each first-order resolution deduction has a class of underlying propositional resolution deductions (or ground deductions) associated with it, each such deduction obtained by applying appropriate substitutions to the wffs of the first-order resolution deduction in a manner as to obtain a legal propositional deduction. We will call such associated underlying ground deductions grounded deductions. For each first-order resolution refutation, each grounded deduction is also a refutation. Whereas each grounded refutation clearly needs no factoring, there are first-order clause sets with no resolution refutation if only the resolution rule is applicable. We give an example of this after a simpler example to introduce first-order resolution.
Example. Show that the wff ∀x P(x) ∧ ∀x¬P(f(x)) is unsatisfiable.
Note that the resolution rule requires the parent clauses to be variable disjoint. When two clauses are to be parent clauses of a resolution inference rule application, then the clauses are made variable disjoint by renaming variables of one clause. This is done as part of the rule application.
1. P(x)        given clause
2. ¬P(f(x))    given clause
3. □           resolvent of P(x) and ¬P(f(y))
The resolution inference at step 3 above uses the following rule instance:

    P(x)    ¬P(f(y))
    ----------------
           □

with mgu σ = {f(y)/x}.
The following example shows the need for factoring in that no refutation is achievable by the resolution rule alone. We show a few steps of
a deduction to illustrate the situation. We adopt the convention of the previous chapter to indicate the clauses used in the resolution operation and the resolving literals.
Example.
1. P(x) ∨ P(y)       given clause
2. ¬P(x) ∨ ¬P(y)     given clause
3. P(x) ∨ ¬P(y)      resolvent of 1b and 2a
       by   P(x) ∨ P(z)    ¬P(w) ∨ ¬P(y)    with mgu σ = {z/w}
4. P(x) ∨ P(u)       resolvent of 1b and 3b
       by   P(x) ∨ P(y)    P(u) ∨ ¬P(v)     with mgu σ = {y/v}
5. etc.
It is clear that the deduction cannot terminate because every deduced clause has two literals. That this is true follows from the fact that no merging of literals of a deduced clause is possible as the two literals always have different variables. In order to achieve a refutation one must deduce one-literal clauses (complementary literals, in fact) to then deduce the empty clause. This "merging" of literals is exactly what factoring accomplishes. We now reconsider the same clause set but use factoring along with the resolution rule.
Example.
1. P(x) ∨ P(y)       given clause
2. ¬P(x) ∨ ¬P(y)     given clause
3. P(x)              factor of clause 1
       by   P(x) ∨ P(y)    with mgu σ = {x/y}
4. ¬P(x)             factor of clause 2
       by   ¬P(x) ∨ ¬P(y)  with mgu σ = {x/y}
5. □                 resolvent of 3 and 4
Alternately, one can get a refutation by using a resolution application instead of the second factorization, as the reader can check.
Hereafter we omit the details of the resolution and factorization rule applications.
Example.
1. P(x, y) ∨ Q(y, x)         given clause
2. P(x, y) ∨ ¬Q(z, z)        given clause
3. ¬P(x, x) ∨ Q(x, f(x))     given clause
4. ¬P(x, f(y)) ∨ ¬Q(y, x)    given clause
5. P(x, x) ∨ P(u, v)         resolvent of 1b, 2b
6. P(x, x)                   factor of 5
7. Q(x, f(x))                resolvent of 3a, 6
8. ¬P(f(y), f(y))            resolvent of 4b, 7
9. □                         resolvent of 6, 8
Our next example shows that factoring is not always the correct option when applicable.
Example.
1.  ¬P(a, b)            given clause
2.  ¬P(b, a)            given clause
3.  P(x, y) ∨ P(y, x)   given clause
4.  P(x, x)             factor of 3
    Now stuck (if this clause is to be used)
——— Alternate deduction ———
4′. P(b, a)             resolvent of 1, 3a
5′. □                   resolvent of 2, 4′
The search aids that are useful at the propositional level also apply at the first-order level although the first search aid has a richer meaning at this level. We repeat the three aids here, with comment. Aid 1. A clause D that is θ -subsumed by clause C can be neglected in the presence of C in a refutation. Aid 2. Ignore tautologies. Aid 3. Favor use of shorter clauses, one-literal clauses in particular. The first aid requires elaboration as we must now define and clarify the full meaning of θ -subsumption.
Definition. A clause C θ-subsumes clause D iff there exists a substitution θ such that D contains all the literals of Cθ and C has no more literals than D.
The requirement that D contain the literals of Cθ is meant precisely; the variable names must agree. We require that C have no more literals than D because otherwise one could θ-subsume a factor, and we know that use of factors is necessary for completeness. Note that P(x) ∨ P(y) would θ-subsume P(x) without the literal count restriction. One can ignore θ-subsumed clauses in a resolution refutation attempt because for any refutation using the θ-subsumed clause there is a resolution refutation without use of the θ-subsumed clause by using the θ-subsuming clause. A plausibility argument given for the propositional case serves here also, although one has the added concern of factors. (Also see Exercise 4, Chapter 3.)
We now address Aid 2. First, the notion of tautology carries over from the propositional case in the strongest way. The variables must agree. Thus, P(x) ∨ ¬P(x) is a tautology but P(x) ∨ ¬P(y) is not. The justification for ignoring tautologies when pursuing resolution refutations is almost that of the propositional case but has a technical adjustment. As before, the resolvent of a tautology and another clause can be ignored, but now it may be because a factor of the non-tautologous clause θ-subsumes the resolvent. Consider the following inference.

    P(x, x) ∨ ¬P(x, x)    P(x, y) ∨ P(y, x)
    ----------------------------------------
                  P(x, x)

The resolvent is θ-subsumed by the factor of the right parent so it need not be retained.
Aid 3 is not a restriction rule but a preference rule. It is also valid in the first-order resolution case with the repeated admonishment that long clauses need to be considered but just not favored.
We now consider the soundness and completeness of first-order resolution. First, the following definition is useful.
Definition. A first-order clause set is a set of clauses (some of) which contain variables.
The term "first-order clause set" is used to clarify that we are considering more than propositional clause sets. First-order clause sets come from first-order wffs.
Theorem 7 (Soundness and Completeness Theorem). S is an unsatisfiable first-order clause set iff there exists a resolution refutation of S.
The completeness property may be proven by completing Exercises 11 through 14. For soundness, the proof essentially duplicates the argument in the propositional case: one shows that if I is a model of clause set S then I is a model of every clause of any resolution deduction from S. For explicit proofs of soundness and completeness at the first-order level see any of [2], [4], [5], [6], [7], and [8]. Here we give a substantial discussion of the structure of the formal proofs. To show the completeness property, we use the property that for every unsatisfiable first-order clause set there is a closely related finite unsatisfiable propositional clause set. This fact, mentioned earlier, is the key element in the logic that makes first-order resolution possible. We state this key property formally as a theorem. The theorem we quote is named for the discoverer, Jacques Herbrand (1908–1931). We also add the name of Kurt Gödel to the version we state because Gödel provided the link between refutable wffs and unsatisfiable wffs. Herbrand stated and proved his theorem entirely syntactically; Gödel’s result was still in the future. (Herbrand did make a significant error in his difficult proof that was caught by P. B. Andrews. B. Dreben, P. B. Andrews, and S. O. Aanderaa corrected the error.) The name Skolem, for Norwegian logician Thoralf Skolem, is often added as his results in the 1920s came close to establishing the completeness of first-order logic. Skolem functions, that we use later, are so named to honor Thoralf Skolem. Theorem 8 (Herbrand-Gödel Theorem) . If S is an unsatisfiable clause set then there exists a finite unsatisfiable ground set of clauses such that each clause is a substitution instance of a clause in S. This finite unsatisfiable propositional set of clauses can be searched for directly by first performing substitutions on the clauses of S and then applying the propositional resolution rules. In fact this was done by the earliest automated theorem provers using refutation methods other than ground resolution. Thanks to the notion of most general unifier the first-order logic of resolution can integrate the substitution steps into the inference steps. We now consider in more detail the nature of that related propositional clause set. Definition. Given a first-order clause set S the Herbrand universe H(S) of S is defined inductively as follows: (1) any constant symbol of any clause of S is a member of H(S) (if none occur in S then H(S) contains the constant symbol a);
(2) if f is a function symbol of S of n arguments and t1, . . . , tn are in H(S), then f(t1, . . . , tn) is in H(S).

Definition. Given a first-order clause set S, a member of the Herbrand universe H(S) is called a Herbrand term of S.

Note that a Herbrand term is a ground term.

Example. If S = {P(x), ¬P(y) ∨ Q(x)} then H(S) = {a}. If S = {P(f(a)), Q(b, g(x, y))} then H(S) = {a, b, f(a), f(b), g(a, a), g(a, b), g(b, a), g(b, b), f(f(a)), f(f(b)), f(g(a, a)), . . .}. If S = {P, Q} then H(S) = {a}. Note that the Herbrand universe can be finite or infinite and will always be infinite if there is a function symbol in the given clause set.

We will illustrate the Herbrand-Gödel Theorem by giving a first-order clause set and a resolution refutation of that clause set. By the soundness property the refutation establishes the unsatisfiability of the given clause set. Then we show an underlying unsatisfiable ground clause set that can be obtained by replacing variables uniformly by Herbrand terms. The given first-order clause set is given by clauses 1–3.

1. P(f(x), x)                    given clause
2. Q(x, y, y) ∨ ¬P(f(x), a)      given clause
3. ¬Q(x, b, y) ∨ ¬P(z, y)        given clause
4. Q(a, y, y)                    resolvent of 1, 2b
5. ¬P(z, b)                      resolvent of 3a, 4
6. □                             resolvent of 1, 5
We now present a ground unsatisfiable clause set that is the given clause set with the variables uniformly replaced by Herbrand terms. We need to use two copies of the first given clause; it is sometimes the case that two (or more) ground clauses come from the same given clause. To keep the correspondence between given and related ground clause clear we use the numbering of the original clause set.
1. P(f(a), a)                    corresponding clause ground instance
1. P(f(b), b)                    corresponding clause ground instance
2. Q(a, b, b) ∨ ¬P(f(a), a)      corresponding clause ground instance
3. ¬Q(a, b, b) ∨ ¬P(f(b), b)     corresponding clause ground instance
By following the justification of the first-order refutation the reader can immediately write the resolution refutation of the ground clause set that mimics the first-order refutation that precedes it. We emphasize
that here each ground clause of the refutation is a substitution instance of the corresponding first-order clause. The proof of completeness shows this relationship to always be obtainable, with adjustment for some needed factoring. The key tool for this is called the Lifting Lemma, first proved by J. A. Robinson.

Lemma (Lifting Lemma: Robinson). Let A and B be variable-disjoint clauses, and let θ be a substitution such that Aθ and Bθ are ground clauses with C as the resolvent of Aθ and Bθ. Then there exist factors of clauses A and B whose resolvent has C as an instance.

We can diagram the lemma. The upper level represents the sequence of steps at the first-order level that corresponds to one ground resolution step. Factoring (denoted by F) is needed because only one resolution operation is permitted and there can be multiple literals that, under instantiation, would merge. Without factoring not all of the pertinent literals would be removed.

               F and R
     A, B  ------------->  D
       |                   |
    instance            instance
       |                   |
       v                   v
    Aθ, Bθ ------------->  C
                  R
Exercises 11 and 12 allow a reader to provide a proof of the Lifting Lemma. The proof structure for a completeness result is now clear. Given an unsatisfiable first-order clause set S, the Herbrand-Gödel Theorem states that there exists a finite unsatisfiable ground clause set S′θ of ground clause instances from S. (The variant S′ of S is needed because of the multiple copies of some clauses, variable disjoint, to be instantiated to different ground clauses.) The propositional completeness theorem for resolution guarantees the existence of a resolution refutation of the clause set S′θ. Then by repeated use of the Lifting Lemma, for each deduced clause C of the ground refutation there is a first-order deduced clause D and a substitution γ such that Dγ is C, for some γ usually different from θ. Moreover, D has a valid derivation from S just as C has
a valid derivation from S′θ. Since the only “lifting” of the propositional empty clause is the first-order empty clause, we see that a first-order refutation exists for clause set S. This proof can be formally pursued by undertaking Exercises 13 and 14 at the chapter's end.
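Since the Herbrand universe drives this whole construction, it may also help to see it generated mechanically. Below is a minimal SWI-Prolog sketch; the predicate name and the explicit depth bound are our illustration choices, not the book's.

    % herbrand_term(+Consts, +Funs, +Depth, -T): T is a Herbrand term
    % built from the constants in Consts and the function symbols in
    % Funs (Name/Arity pairs), nested at most Depth deep.
    herbrand_term(Consts, _, _, T) :-
        member(T, Consts).
    herbrand_term(Consts, Funs, D, T) :-
        D > 0,
        D1 is D - 1,
        member(F/N, Funs),
        length(Args, N),
        maplist(herbrand_term(Consts, Funs, D1), Args),
        T =.. [F|Args].

The query findall(T, herbrand_term([a, b], [f/1, g/2], 1, T), Ts) returns a, b, f(a), f(b), g(a, a), g(a, b), g(b, a), g(b, b) — the depth-one prefix of the second example above; raising the bound enumerates more of the (here infinite) universe.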
2.2.3 Handling Arbitrary Predicate Wffs

As for the propositional case, we want to establish that an arbitrary first-order wff can be tested for validity or unsatisfiability. Using a similar framework to the propositional case, we give an algorithm such that any first-order wff has a surrogate wff within resolution logic that is unsatisfiable iff the original wff is unsatisfiable. Again, a wff to be tested for validity is to be negated and tested for unsatisfiability.

Definition. A first-order wff in the resolution formal system is called a Skolem conjunctive normal form (SCF) wff.

Theorem 9. Given an arbitrary first-order wff F we can find a wff F1 such that F1 is an SCF wff and F is unsatisfiable iff F1 is unsatisfiable.

It is no longer true that F and F1 have the same models, a result that held in the propositional case. For all but one step we can use the Replacement Theorem to get new wffs that have the same models. For that one step we preserve only satisfiability and unsatisfiability, but that suffices.

Proof outline. We use the Replacement Theorem repeatedly. It is left to the reader to see that all replacements come from logical equivalences. We will illustrate the steps of the process using two examples. However, the prescription for the transformation from the original wff to the surrogate wff applies generally; the examples are for illustrative purposes only:

(A) ∀x∃y∃z(¬P(x, y) ∧ (Q(x, z) ∨ R(x, y, z)))
(B) ¬∃z(∃x∀y P(x, y, y) → ∀y∃x(P(x, y, x) ∨ P(z, y, x)))

Step 1. Eliminate → and ↔. (Replacement Theorem direction: −→) Use:
(A → B) −→ ¬A ∨ B
(A ↔ B) −→ (A ∨ ¬B) ∧ (¬A ∨ B)
¬(A → B) −→ A ∧ ¬B

(A) ∀x∃y∃z(¬P(x, y) ∧ (Q(x, z) ∨ R(x, y, z)))
(B) ¬∃z(¬∃x∀y P(x, y, y) ∨ ∀y∃x(P(x, y, x) ∨ P(z, y, x)))
Note that in Example B we cannot use the replacement rule for the negated implication, as the negation sign is not directly before the implication subformula.

Step 2. Move ¬ inward as far as possible. Use:

¬∀x A −→ ∃x ¬A          ¬∃x A −→ ∀x ¬A
¬(A ∧ B) −→ ¬A ∨ ¬B     ¬(A ∨ B) −→ ¬A ∧ ¬B
¬¬A −→ A
(A) ∀x∃y∃z(¬P(x, y) ∧ (Q(x, z) ∨ R(x, y, z)))
(B) ∀z(∃x∀y P(x, y, y) ∧ ∃y∀x(¬P(x, y, x) ∧ ¬P(z, y, x)))

Step 3. Move ∀ and ∃ inward as far as possible. (Optional.) Use:

∀x(A♦B) −→ (∀x A)♦B      ∀x(B♦A) −→ B♦∀x A
∃x(A♦B) −→ (∃x A)♦B      ∃x(B♦A) −→ B♦∃x A
∀x B −→ B                ∃x B −→ B

where x is not free in B. The ♦ is to be replaced consistently with either ∨ or ∧. Use the following replacements if x is free in A and B:

∀x(A ∧ B) −→ ∀x A ∧ ∀x B
∃x(A ∨ B) −→ ∃x A ∨ ∃x B

Also, ∀x∀y A −→ ∀y∀x A and ∃x∃y A −→ ∃y∃x A are useful if x but not y can be moved inward.
(A) ∀x∃y(¬P(x, y) ∧ (∃z Q(x, z) ∨ ∃z R(x, y, z)))
(B) ∃x∀y P(x, y, y) ∧ ∀z∃y(∀x ¬P(x, y, x) ∧ ∀x ¬P(z, y, x))

Step 4. Rename variables so all quantifiers have distinct variable names. (Done so that Step 6 can be executed.) Use:

∀x A −→ ∀y A[y/x],     ∃x A −→ ∃y A[y/x],

where y does not occur in A (free or bound). Here [t/x] means that the variable x is replaced by term t at free occurrences of x in A.

(A) ∀x∃y(¬P(x, y) ∧ (∃z Q(x, z) ∨ ∃w R(x, y, w)))
(B) ∃x∀y P(x, y, y) ∧ ∀z∃w(∀u ¬P(u, w, u) ∧ ∀v ¬P(z, w, v))

Step 5. Remove ∃ by use of Skolem functions. This step does not use the Replacement Theorem and does not preserve models.
Rule of removal of existentially quantified variables: Remove each occurrence of ∃x (here x represents any variable) and replace every occurrence of x in the scope of this quantifier occurrence uniformly by a function letter new to the wff, with arguments consisting of all universally quantified variables having the ∃x occurrence in their scope. (If one moves universal quantifiers inward at Step 3, this may reduce the number of arguments in the new functions and often greatly reduce the size of the refutation search.) A function with no arguments is represented by a new constant letter. Functions that serve the role of existential quantifiers often are called Skolem functions. (This step will be justified after the surrogate wff is found.)

(A) ∀x(¬P(x, f(x)) ∧ (Q(x, g(x)) ∨ R(x, f(x), h(x))))
(B) ∀y P(a, y, y) ∧ ∀z(∀u ¬P(u, f(z), u) ∧ ∀v ¬P(z, f(z), v))

Step 6. Move the universal quantifiers left to the front of the formula. This step is not fully necessary but makes the process easier to specify. The wff is now in prenex normal form. Use the replacement rules of Step 3 in the opposite direction.

(A) ∀x(¬P(x, f(x)) ∧ (Q(x, g(x)) ∨ R(x, f(x), h(x))))
(B) ∀u∀v∀y∀z(P(a, y, y) ∧ (¬P(u, f(z), u) ∧ ¬P(z, f(z), v)))

Step 7. Place the matrix (quantifier-free part) in conjunctive normal form. Use:
A ∨ (B ∧ C) −→ (A ∨ B) ∧ (A ∨ C)
(B ∧ C) ∨ A −→ (B ∨ A) ∧ (C ∨ A)
(A ∧ B) ∨ (A ∧ C) −→ A ∧ (B ∨ C)
(B ∧ A) ∨ (C ∧ A) −→ (B ∨ C) ∧ A
(A) ∀x(¬P(x, f(x)) ∧ (Q(x, g(x)) ∨ R(x, f(x), h(x))))
(B) ∀u∀v∀y∀z(P(a, y, y) ∧ (¬P(u, f(z), u) ∧ ¬P(z, f(z), v)))

The formula is now in Skolem conjunctive normal form. Remove the quantifiers and write the matrix as a set of clauses. Technically, one would move the relevant quantifiers right again to immediately before each clause, as each clause in the clause set is regarded as a universally closed wff. This again uses the replacement
rules of Step 3 and Step 6. This action is implicitly acknowledged whenever we rename variables in a clause. The clause sets:

(A)
1. ¬P(x, f(x))                      given clause
2. Q(x, g(x)) ∨ R(x, f(x), h(x))    given clause

(B)
1. P(a, y, y)                       given clause
2. ¬P(x, f(y), x)                   given clause
3. ¬P(x, f(x), y)                   given clause
We leave it to the reader to determine if the clause sets are unsatisfiable.

We use an example to illustrate the justification for the modification to wffs performed in Step 5.

Step 5 justification. Let ∀x∃y P(x, y) be the wff under consideration. The modification at Step 5 would yield the new wff ∀x P(x, f(x)). We show that the first wff is satisfiable iff the second wff is satisfiable.

=⇒ We assume that ∀x∃y P(x, y) is satisfiable. Let M denote some model of ∀x∃y P(x, y) and D(M) denote the domain of M. Then V_{M,ϕ}[∃y P(x, y)] = T regardless of the domain element that ϕ assigns to free variable x, because of the valuation definition for ∀. Consider an arbitrary element d̂ ∈ D(M) and the set of all assignments ϕ such that ϕ(x) = d̂. By the valuation definition for ∃ we know that there is at least one assignment ϕ0 in this set of assignments such that ϕ0(x) = d̂, ϕ0(y) = â_d̂, and V_{M,ϕ0}[P(x, y)] = T. The subscript d̂ in â_d̂ emphasizes the dependence of â_d̂ on d̂.

Now we define a new model M′ for the wff ∀x P(x, f(x)) that agrees everywhere with model M but adds the function letter f to the alphabet. Let f̂ denote the function from D(M′) to D(M′) such that f̂(d̂) = â_d̂ for all d̂ in D(M′), where â_d̂ is defined as in the previous paragraph. Thus, V_{M′}[∀x P(x, f(x))] = T. We have shown that ∀x P(x, f(x)) is satisfiable.

⇐= We leave this to the reader. The argument is the reverse direction of the preceding argument.

This completes the proof outline of Theorem 9.

We now combine Theorem 7 and Theorem 9 to provide a comprehensive statement about arbitrary first-order wffs.

Theorem 10 (Soundness and Completeness of First-Order Resolution). If A is a first-order wff then A is valid iff the SCF(¬A) has a first-order resolution refutation.
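Step 5 is also the one step of the transformation that rewards a mechanical rendering. The sketch below, in SWI-Prolog, applies Skolemization to a formula already processed by Steps 1–4 (negation normal form, distinct quantified variables). The representation all/2, ex/2, and/2, or/2, the use of Prolog variables for object variables, and the generated names sk1, sk2, . . . are our choices, not the book's.

    % skolemize(+F, +Univ, -G): remove existential quantifiers from F,
    % where Univ lists the universally quantified variables whose scope
    % we are currently inside.
    skolemize(all(X, F), Univ, all(X, G)) :- !,
        skolemize(F, [X|Univ], G).
    skolemize(ex(X, F), Univ, G) :- !,
        gensym(sk, Name),
        Sk =.. [Name|Univ],   % Skolem function of the enclosing universals
        X = Sk,               % binding X rewrites every occurrence at once
        skolemize(F, Univ, G).
    skolemize(and(A, B), Univ, and(A1, B1)) :- !,
        skolemize(A, Univ, A1),
        skolemize(B, Univ, B1).
    skolemize(or(A, B), Univ, or(A1, B1)) :- !,
        skolemize(A, Univ, A1),
        skolemize(B, Univ, B1).
    skolemize(Lit, _, Lit).   % literals are left unchanged

For example, skolemize(all(X, ex(Y, p(X, Y))), [], G) yields G = all(X, p(X, sk1(X))), mirroring the ∀x∃y P(x, y) to ∀x P(x, f(x)) modification justified above. Note that, in accord with Step 5, an ∃ outside the scope of every universal quantifier produces a Skolem constant (a name with no arguments).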
Exercises

1. Provide a formal sentence (wff) that most closely captures the sense of the English sentences given below. Specify the intended meaning of the non-logical symbols that you use. Do not use any subwff in your wff that is not needed for correctness of meaning.
(a) Bob likes everyone Al likes and some others as well.
(b) Al never likes Bob and Carol at the same time.
2. Provide a formal sentence that most closely captures the sense of the English sentence given below. Choose from the following predicates: O(x, y, t) {person x owes money to person y at time t}, M(t) {t is this moment in time}, T(t) {t is a time instant}, and P(x) {x is a person}. Do not use any subwff in your wff that is not needed for correctness of meaning.

Everyone is owed money sometimes but not everyone owes money at this moment.
3. Provide a formal sentence that most closely captures the sense of the English sentence given. Use F(x, t) {You can fool person x at time t}, P(x) {x is a person}, T(t) {t is a time instant}, and any other non-logical symbols that you want to introduce. However, do not use any unnecessary subwff or non-logical symbols in your wff.

You can fool some of the people all of the time and all of the people some of the time but not all the people all the time.
4. Are the two expressions below unifiable? If so, give the common unified expression determined by the most general unifier. If not, indicate precisely the reason for failure. Show the conflict explicitly and say why this is a conflict. Letters u, x, y, z are variables.

P(z, g(x), z, f(x)) and P(u, g(u), h(y), f(y)).
5. Show that the following clause set is unsatisfiable by finding a resolution refutation of the clause set.
1. P(a) ∨ P(b)
2. ¬P(a) ∨ ¬Q(f(x)) ∨ P(y)
3. ¬P(a) ∨ Q(x)
4. ¬P(x) ∨ P(a)
5. ¬P(x) ∨ R(x)
6. ¬P(b) ∨ ¬R(a)
6. Find a resolution refutation of the following clause set.
1. Q(a)
2. Q(b) ∨ Q(x)
3. ¬P(x) ∨ R(y, x) ∨ R(y, a)
4. P(f(x)) ∨ ¬Q(x)
5. ¬Q(x) ∨ ¬Q(y) ∨ ¬R(x, x)
6. ¬P(f(a)) ∨ ¬R(b, f(a))
7. ¬P(f(a)) ∨ ¬R(a, f(b))

7. Find the Skolem Conjunctive Form for the following wffs. The wffs are not necessarily unsatisfiable.
(a) ∀x∃y∃z(¬P(x, y) ∧ (Q(x, z) ∨ R(x, y, z)))
(b) ¬∃z(∃x∀y P(x, y, y) → ∀y∃x(P(x, y, x) ∨ P(z, y, x)))
8. Write out the proof for the “if” direction of the justification of Step 5 in the definition of a surrogate wff for the given wff. Be sure your proof is complete and understandable without the reader needing to consult the proof of the “only if” case. See page 57.
9. The following set of clauses encodes the negation of the theorem that any integer greater than 1 has a prime divisor. The intended interpretation of the non-logical symbols is as follows: P(x) {x is a prime}, D(x, y) {x divides y}, L(x, y) {x < y}, g(x) {a non-trivial divisor of x if x is composite, else arbitrary}, f(x) {a prime divisor of x, 1 < x < a, else arbitrary}, a {the least positive integer counterexample to the theorem}, 1 {the integer 1}. The intended domain is the set of positive integers. Give a resolution refutation that establishes the theorem. This is a difficult problem without the hints given here. You are on the right track when you derive the literal D(f(g(a)), g(a)). Also, clause 2 is needed only once as a parent clause in a refutation. Incidentally, this problem is hard only for humans; state-of-the-art automated deduction systems solve this almost instantaneously. That said, automated systems cannot solve most deep mathematics theorems. The clause set is:
1. D(x, x)
2. ¬D(x, y) ∨ ¬D(y, z) ∨ D(x, z)
3. P(x) ∨ D(g(x), x)
4. P(x) ∨ L(1, g(x))
5. P(x) ∨ L(g(x), x)
6. L(1, a)
7. ¬P(x) ∨ ¬D(x, a)
8. ¬L(1, x) ∨ ¬L(x, a) ∨ P(f(x))
9. ¬L(1, x) ∨ ¬L(x, a) ∨ D(f(x), x)
10. Prove that any two most general unifiers for expressions E1 and E2 are identical up to a variable renaming.
11. (Harder) (Unification Lemma: Robinson) If first-order expressions E1 and E2, not necessarily variable disjoint, are unified by substitution θ then the mgu algorithm of this chapter produces an mgu σ for E1 and E2. Prove this lemma by induction on the number of substitution components σi in the expression E1σ1σ2 . . . σnλn (= E1θ), where each substitution component is created at a point of disagreement and removes that disagreement. Show that σn = {t/v} for the term t and variable v facing the pointers of the disagreement point, and that there exists a t′ such that {t′/v} is the (sole) component removed from λn−1 to make λn. The mgu is σ = σ1σ2 . . . σm such that E1σ1σ2 . . . σm = E2σ1σ2 . . . σm. Of course, λm need not be empty; σ need not be θ.
12. Prove the Lifting Lemma using multiple applications of the Unification Lemma. The proof should be done using induction but since there are usually very few factors that need to be created, it is permissible for this exercise to use intervening dots.
13. Using the Lifting Lemma give an induction proof of the Lifting Theorem for Resolution, whose statement follows: If S is a finite set of clauses, Sgr is a finite set of ground substitution instances of S, and C1, . . . , Ci is a deduction from clauses of Sgr, then there exists a deduction D1, . . . , Dj from clauses of S and a sequence of substitutions θ1, . . . , θm such that D1θ1, . . . , Dmθm is the deduction C1, . . . , Ci, possibly with repetitions of clauses. Consider that the given clauses are part of the deduction. Explain how the index m may differ from either i or j.
14. Prove the Completeness Theorem for First-Order Resolution, the “only if” direction of Theorem 10. Use the Lifting Theorem of the previous exercise (and other theorems). If A is a first-order wff then A is valid only if the SCF(¬A) has a first-order resolution refutation.
3 An Application: Linear Resolution and Prolog
As an application of the resolution proof system we present a restriction of the proof system closely tied to the programming language Prolog (for PROgramming in LOGic). With our focus on logic in this book we do not pursue a general introduction to Prolog, but only suggest the correspondence between the resolution restriction and the core structure of Prolog. The primary point of this chapter is to illustrate that resolution logic has a concrete application outside of logic itself. We do give two illustrations of simple Prolog programs in an appendix for those interested in a further sense of the programming language.

Interested readers can pursue programming in Prolog on many college campuses using the C-Prolog system. C-Prolog is not freely available to the general public but other versions are available, including systems that can be downloaded to PCs or Apple computers. SWI-Prolog is currently available on the widest range of platforms. Internet search engines will provide a pointer to this and other systems. Books [1] and [3] provide an introduction to the Prolog language.

Two researchers, Colmerauer and Roussel, created Prolog from a simplification of the SL-resolution procedure of Kowalski and Kuehner, itself a product of work by one of this book's authors and others. Colmerauer and Roussel were interested in natural language processing and developed Prolog to help in language parsing and inference. Kowalski and others provided key ideas to define the logical foundations of Prolog. Colmerauer and Kowalski are considered the fathers of the field of logic programming, of which Prolog is the centerpiece. The language would never have been practical without the abstract machine architecture of David H. D. Warren. See the Wikipedia entry for Prolog for more information and history.

The design ideal of Prolog is to have a programming language/system where the user describes the problem and the system solves the problem. That is, the user does not have to be concerned about the control aspect of programming. Declarative statements about the problem environment (axioms) define the problem and the query (theorem conjecture) presents the specific problem to be solved.
This ideal is not achieved; some control considerations are needed in the presentation of the program. (An example of this is that clause order in a program can be the key to a successful execution, both for correctness and termination/speed.) We do not address the reasons for clause ordering sensitivity in Prolog or other control issues. We do present a resolution restriction that underlies the core of the Prolog system, along with the subset of logic in which this resolution system is embedded. The subset of full first-order logic we present is called Horn logic; the resolution restriction we give is complete for Horn logic. (We capitalize “Horn” to honor the logician Alfred Horn who first noted its usefulness.) We first present a restriction of the general resolution system of the preceding chapter that remains complete for all of first-order logic, and then further restrict this system to yield the refutation system which underlies the core of Prolog. In the preceding chapters we have been concerned with the design, soundness, and completeness of the resolution refutation logic. Little attention was paid to the issue of searching for a refutation. With minor exceptions every clause was a potential parent clause and each literal was a candidate for the resolving literal. The resolution restriction we present greatly limits the options for the next derived clause of a refutation. This is a convenience for humans but of great importance to automated proof search. The search for a refutation can be significantly aided by use of a resolution restriction although it also can lengthen the resulting refutation. The resolution restrictions we consider are proper deductive systems with more complex rules of inference than in the last chapter. These restrictions also can be easily translated into proof procedures, although choosing a good search algorithm to accompany a proof system is not always easy. Sometimes the language use is relaxed and the proof systems are themselves called proof procedures.
3.1 OSL-Resolution We define a restricted class of resolution deductions called ordered s-linear resolution (OSL-resolution) deductions. OSL-resolution and the resolution restriction that underlies the Prolog language are examples of the class of linear resolution deduction systems. Linear resolution systems have the property that one of the parent clauses of a resolvent in a deduction is the immediately preceding clause of the deduction. This allows the view that the deduction is constantly moving forward with the other parent clause modifying the currently
last clause of the developing deduction. We consider the term s-linear later. To understand the OSL-resolution format we first present two examples in that format. Before that, we need a few key definitions. We give a formal definition of OSL-resolution later.

We will need to employ ordered clauses, which we will label O-clauses. Order is determined by giving an ordering rule O for ordering the literals of a clause. Almost any ordering rule is acceptable, although it must accommodate the conditions of the application involved. (Unnatural orderings can give trouble; we address this later in the chapter.) Just as clauses can be viewed as sets of literals, ordered clauses can be viewed as sequences of literals. We view the given clauses as unordered and the derived clauses as ordered, that is, as O-clauses. All operations with O-clauses involve the leftmost literal of the O-clause.

We call the parent O-clause that immediately precedes a resolvent the near parent O-clause of the resolvent. The other clause or O-clause is the far parent clause or O-clause, and may be a given clause or an earlier derived O-clause. To simplify notation we use only “far parent clause” (and sometimes, “near parent clause”) hereafter, understanding that the clause is ordered when it is a derived O-clause. The resolvent has a simple but definite structure. Any surviving literals from the near parent O-clause are placed on the right and must be in the same order as the literals from which they derive. (Again, throughout our remaining treatment we will speak of descendent literals as if they are the same literals, ignoring that descendents may be further refined by a substitution.) Literals from the far parent clause appear to the left of the near parent literals and are ordered by the ordering O. Informally, this means that one can order these literals as one pleases. O-factoring must involve the leftmost literal and the O-factor retains the order of the parent clause for the remaining literals.

We now present our example, first in standard format and then in an embellished refutation. We list given clauses first, as usual, throughout this chapter. The last listed given clause becomes the first near parent O-clause, called the top O-clause of the refutation. If one wishes to use a given clause not listed last as the top O-clause it must be repeated, with the literals ordered as desired, as done here. As a notational change, in the justification that appears on the right of a derived O-clause entry we list only the far parent resolving literal, as the other resolving literal is always leftmost in the preceding O-clause.
1.  P(f(x))                                    given clause
2.  ¬Q(x)                                      given clause
3.  Q(x) ∨ ¬R(a, x)                            given clause
4.  ¬P(a) ∨ ¬R(b, f(x))                        given clause
5.  P(x) ∨ ¬R(b, f(y)) ∨ R(a, y)               given clause
6.  ¬P(f(a)) ∨ Q(x) ∨ R(a, y) ∨ R(b, f(a))     given clause
7.  P(f(x))                                    top O-clause, clause 1 repeated
8.  Q(x) ∨ R(a, y) ∨ R(b, f(a))                resolvent, used 6a
9.  R(a, y) ∨ R(b, f(a))                       resolvent, used 2
10. Q(y) ∨ R(b, f(a))                          resolvent, used 3b
11. R(b, f(a))                                 resolvent, used 2
12. ¬P(a)                                      resolvent, used 4b
13. ¬R(b, f(y)) ∨ R(a, y)                      resolvent, used 5a
14. R(a, a)                                    resolvent, used 11
15. Q(a)                                       resolvent, used 3b
16. □                                          resolvent, used 2
Note that in the deduction above, there was one use of a derived O-clause as a far parent O-clause, O-clause 11. As here, given clauses are the usual far parent O-clauses; use of derived O-clauses as far parents is relatively rare.

OSL-resolution deductions, and in particular refutations, are best understood in terms of refutation segments. Let C denote an O-clause of a deduction with leftmost literal L and remaining sub-O-clause C_R. A refutation segment with head L is the contiguous sequence of sub-O-clauses in the deduction, each to the left of the sub-O-clause C_R, beginning with literal L and terminating when C_R is the entire derived O-clause. Alternatively, a refutation segment with head L is also the sequence of consecutive O-clauses that begins with the clause C with leftmost literal occurrence L and terminates when the set of sub-O-clauses introduced by L results in the empty clause. Note that an O-factor operation may, but need not, end a refutation segment. (As always, we have used the term C_R without acknowledging that it undergoes refinements in subsequent O-clauses.)

We now give the same refutation but embellished to display the refutation segments. For this embellished refutation each refutation segment concludes with an annotated empty clause, the number indicating the first line of the segment. We also use offsets to distinguish major segments. We can view the refutation as a removal in turn of each of the literals of line 8. Each literal there eventually heads a refutation segment whose completion removes the head
literal. When R(b, f(a)) is removed the refutation is successful. Every line initiates a refutation segment in a successful refutation, but some are more interesting than others. Notice the nesting of resolution segments.

1.  P(f(x))                                         given clause
2.  ¬Q(x)                                           given clause
3.  Q(x) ∨ ¬R(a, x)                                 given clause
4.  ¬P(a) ∨ ¬R(b, f(x))                             given clause
5.  P(x) ∨ ¬R(b, f(y)) ∨ R(a, y)                    given clause
6.  ¬P(f(a)) ∨ Q(x) ∨ R(a, y) ∨ R(b, f(a))          given clause
7.  P(f(x))                                         top O-clause, clause 1 repeated
8.    Q(x) ∨ R(a, y) ∨ R(b, f(a))       □7          resolvent, used 6a
9.      R(a, y) ∨ R(b, f(a))            □8          resolvent, used 2
10.     Q(y) ∨ R(b, f(a))                           resolvent, used 3b
11.       R(b, f(a))                    □9, □10     resolvent, used 2
12.       ¬P(a)                                     resolvent, used 4b
13.       ¬R(b, f(y)) ∨ R(a, y)                     resolvent, used 5a
14.       R(a, a)                       □13         resolvent, used 11
15.       Q(a)                                      resolvent, used 3b
16.       □                             □11, □12, □14, □15    resolvent, used 2
We now give another example (without embellishment) that has different features. The major new feature is the s-resolution operation. S-resolution applies when the far parent is also a derived O-clause. In this case, normal ordered resolution would bring the possibly substantial disjunction of surviving literals from the far parent to the resolvent. Often these literals can all be removed by immediate O-factoring in subsequent steps. The condition that guarantees that such removal is acceptable, although not necessarily by O-factoring, is that the near parent clause be in the refutation segment headed by the far parent clause leftmost literal. Under this condition we may delete all the far parent surviving literals. Moreover, this is the only resolution operation needed using two derived O-clauses! S-linear resolution derives its name from the s-resolution operation.

The s-resolution operation was originally defined using θ-subsumption. Here we use the refutation segment concept. We give an example using s-resolution. For this example the ordering we choose requires that the Q literals precede the P literals. Otherwise, the order is as the given clause is listed. Recall that the ordering affects only the far parent residual in the resolvent and also the top O-clause. Also recall that the given clauses are not ordered.
Example.

1.  P(x) ∨ Q(x)                     given clause
2.  ¬P(x) ∨ Q(x) ∨ R(y)             given clause
3.  P(f(x)) ∨ ¬Q(x)                 given clause
4.  ¬P(f(a)) ∨ ¬Q(x)                given clause
5.  ¬Q(x) ∨ ¬R(y)                   given clause
6.  Q(x) ∨ ¬R(x) ∨ ¬P(x)            given clause and top O-clause
7.  ¬R(y) ∨ ¬R(x) ∨ ¬P(x)           resolvent, used 5a
8.  ¬R(x) ∨ ¬P(x)                   O-factor
9.  Q(z) ∨ ¬P(z) ∨ ¬P(x)            resolvent, used 2c
10. P(f(z)) ∨ ¬P(z) ∨ ¬P(x)         resolvent, used 3b
11. ¬Q(y) ∨ ¬P(a) ∨ ¬P(x)           resolvent, used 4a
12. ¬P(a) ∨ ¬P(x)                   s-resolvent, used 9a
13. ¬P(a)                           O-factor
14. Q(a)                            resolvent, used 1a
15. P(f(a))                         resolvent, used 3b
16. ¬Q(x)                           resolvent, used 4a
17. □                               s-resolvent, used 14
An interesting s-resolvent occurs at line 12 above. A standard ordered resolvent would inherit two literals from the far parent clause, resulting in the resolvent ¬P(z) ∨ ¬P(y) ∨ ¬P(a) ∨ ¬P(x). But the s-resolution operation simulates the O-factoring that removes the subclause ¬P(z) ∨ ¬P(y). The removal of the far parent surviving literals is justified because at the ground level the surviving literals of the far parent clause have also been passed down as near parent literals of near parent clauses to the present O-clause. Then merging occurs. Special care in the lifting argument justifies this at the first-order level even though occasionally O-factoring is not justified. Again, the condition that assures the safety of s-resolution is that the near parent clause be in the refutation segment headed by the leftmost literal of the far parent clause. Also note that O-factoring is an optional operation so that, absent s-resolution, the unfactored alternative must remain. S-resolution removes the possibility that alternate deduction paths need be explored.

Note that line 11 is in the refutation segment beginning at line 9 but not in the refutation segment beginning at line 6. (The latter is an example of a refutation segment terminating with an O-factoring.) An s-resolvent occurs at line 17 that is also a standard ordered resolution operation. We label it as an s-resolvent simply because the deduction format we define requires s-resolution to be used when the
far parent is a derived O-clause. The resolvent at line 14 of the earlier example is also an s-resolvent.

We now give the formal definitions of O-clause and OSL-resolution.

Definition. An O-clause of clause A is a sequence of the literals of clause A using a user-defined ordering O to the extent permitted by other order constraints on the clause. An O-clause is an O-clause of clause B for some B.

Given clauses are unordered and, as such, every literal of every clause may be a resolving literal. The last given clause is also the top O-clause. It begins the deduction and, as such, is an O-clause; its leftmost literal is the resolving literal for the first resolution operation of the deduction. After that it is an unordered clause free to be a far parent clause. Because a linear deduction cannot be interrupted to derive a factor of a given clause, we permit a far parent clause to be a factor of a given clause.

Definition. An ordered s-linear resolution (OSL-resolution) deduction of clause A from given clause set S consists of a sequence of clauses and O-clauses such that

1. all given clauses are listed at the beginning of the deduction;
2. all O-clauses are ordered by a user-given ordering O subject to the constraints listed below;
3. one parent O-clause of a resolution inference is the immediately preceding O-clause (the near parent);
4. the other parent O-clause (the far parent) is either a given clause or its factor, or an earlier derived O-clause;
5. a resolving literal is the leftmost literal of a parent O-clause, or any literal of a given clause or its factor;
6. the surviving literals of the near parent O-clause appear rightmost in the resolvent O-clause in the order determined by the near parent O-clause, while the surviving literals of the far parent O-clause appear in the resolvent O-clause and are ordered by the ordering O;
7. the top O-clause is a given clause that begins the deduction as the first near parent O-clause, ordered by O;
8. any resolution inference between two derived O-clauses must be an s-resolution operation, which is permitted only if the near parent clause is in the refutation segment headed by the resolving literal of the far parent clause;
9. if the s-resolution condition is met then the s-resolvent is the standard resolvent of the two O-clauses but with no far parent literals retained;
10. the top O-clause may be used in an s-resolution if the s-resolution condition is met;
11. O-factoring and merging must involve the leftmost literal, and the O-factor retains the order of the parent clause for the remaining literals; and
12. the last O-clause is an O-clause of the clause A.

Definition. An OSL-refutation is an OSL-resolution deduction of □.

Note that tautologies are permitted here. OSL-resolution is technically not a resolution restriction as the s-resolution operation is not within the resolution operation set. However, it is considered within the resolution realm as resolution tools are used, notwithstanding that s-resolution requires special attention when lifting the ground case. OSL-resolution is a variant of s-linear resolution, a complete resolution refutation procedure defined in [6]. OSL-resolution is almost the resolution counterpart of the Model Elimination proof procedure, which is also technically not a resolution procedure, for somewhat more substantial reasons. The Model Elimination procedure is treated in [5] and [6].

We introduce a concept needed for the statement of the Completeness and Soundness Theorem, and one we need in the next section on Horn logic.

Definition. Given an unsatisfiable clause set S, a minimally unsatisfiable clause set Smin is an unsatisfiable subset of S with every proper subset of Smin satisfiable.

Fact. Every (finite) unsatisfiable set has (usually many) minimally unsatisfiable subsets.

Proof. Suppose not. Then every unsatisfiable subset of the given set has at least one proper subset that is also unsatisfiable. Starting at the given set, choose a proper subset that is itself unsatisfiable, and repeat. Because the given set is finite the number of selections of subsets is finite. Every one-element set is satisfiable, so the chain of subsets must end in a set of cardinality at least two. Thus the last set in the chain of sets has nonempty subsets that are all satisfiable. This contradicts the assumption and the fact holds.
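For ground (propositional) clause sets the Fact can even be demonstrated by brute force. The following SWI-Prolog sketch is our illustration, not the book's: clauses are lists of literals, a literal being an atom p or n(p) for ¬p, and an unsatisfiable set is shrunk to a minimally unsatisfiable subset by discarding clauses while unsatisfiability is preserved.

    % sat(+Atoms, +S): ground clause set S is satisfiable over Atoms.
    sat(Atoms, S) :-
        true_set(Atoms, True),
        forall(member(C, S), satisfied(C, True)).

    true_set([], []).
    true_set([A|As], Ts) :-
        true_set(As, Ts0),
        ( Ts = [A|Ts0] ; Ts = Ts0 ).

    satisfied(C, True) :-
        member(L, C),
        (  L = n(A) -> \+ memberchk(A, True)
        ;  memberchk(L, True)
        ), !.

    % min_unsat(+Atoms, +S, -M): M is a minimally unsatisfiable subset
    % of the unsatisfiable set S.
    min_unsat(Atoms, S, M) :-
        select(C, S, S1),
        \+ sat(Atoms, S1),
        !,
        min_unsat(Atoms, S1, M).
    min_unsat(_, S, S).

For example, min_unsat([p, q], [[p], [n(p)], [q]], M) gives M = [[p], [n(p)]]. Different deletion orders can yield different minimally unsatisfiable subsets, matching the “usually many” in the Fact.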
Theorem 11 (Soundness and Completeness of OSL-resolution). If A is a first-order wff then A is valid iff (the clause set of) SCF(¬A) has an OSL-resolution refutation for a suitable ordering O. Moreover, if S_A is a minimally unsatisfiable subset of SCF(¬A) then a refutation exists using any O-clause of S_A as top O-clause.

By a suitable ordering we mean to exclude certain unnatural orderings that remain vague at this point. Maintaining clause order, shifting order to keep certain predicates leftmost, or keeping positive literals leftmost are all examples of good orderings. A bad ordering is one that changes clause ordering based on rolling dice, for example. The proof is omitted as our concern is with a modification of this restriction that is closely associated with the programming language Prolog. We will present a proof for that variant. Soundness is immediate at the ground level as s-resolution can be replaced by standard resolution and merging. The first-order level involves care with the s-resolution operation but soundness does hold.
3.2 Horn Logic

Definition. A Horn clause is a (disjunctive) clause with at most one positive literal.

Definition. A Horn clause with a positive literal is called a definite (Horn) clause.

Definition. Horn logic is the subset of first-order logic that has as its wffs the universal closure of conjunctions of Horn clauses. The wffs of a Horn logic are called Horn wffs.

Clearly the conversion of a Horn wff to its related clause set is immediate. We will hereafter treat Horn wffs and Horn clause sets as synonymous. Recall the relationship between a clause and its implicative associated clause. The symbol ⊥ denotes an always-false proposition, such as P ∧ ¬P.

Horn clause                   Implicative format
P(x) ∨ ¬Q(y) ∨ ¬R(z)          ∀x∀y∀z(Q(y) ∧ R(z) → P(x))
¬P(x) ∨ ¬Q(y)                 ∀x∀y(P(x) ∧ Q(y) → ⊥)
Definition. An ordered (linear) input resolution (OI-resolution) deduction of clause A from clause set S is an OSL-resolution deduction of an
O-clause of A from given set S where all far parent O-clauses are given O-clauses or their factors and neither O-factoring nor merging is permitted. The s-resolution operation using the top O-clause also is not permitted.

Definition. An OI-resolution refutation is an OI-resolution deduction of □.

Because no two derived O-clauses can resolve with each other, the s-resolution operation is never used in an OI-resolution refutation. We note in passing that any resolution restriction with the property that one parent of every resolvent is a given clause is an input resolution restriction. Just as for OSL-resolution there is effectively no restriction on the ordering O to be used in an OI-resolution deduction. Of course, there are restrictions on the clauses of the deduction by the definition of OSL-resolution. We continue to allow factors of given clauses as given clauses.

A simple propositional example shows that OI-resolution is not complete for full (propositional or) first-order logic.

Example.

1. A ∨ B        given clause
2. ¬A ∨ B       given clause
3. A ∨ ¬B       given clause
4. ¬A ∨ ¬B      given clause and top O-clause
5. ¬B ∨ ¬B      resolvent, used 3a
6. A ∨ ¬B       resolvent, used 1b
7. B ∨ ¬B       resolvent, used 2a
8. — etc.       continues forever without deducing □
One would love to s-resolve O-clauses 4 and 6 or 5 and 7, but top O-clause s-resolution is not permitted and resolution with a given clause always replaces the resolving literal of the near parent with the far parent surviving literal. Merging is excluded by the conditions of OI-resolution, but one might observe that if merging were permitted a refutation still would not be obtainable.

We now give an example using OI-resolution on a Horn given clause set.

1.  P                      given clause
2.  Q                      given clause
3.  R                      given clause
4.  S ∨ ¬Q ∨ ¬R            given clause
5.  T ∨ ¬Q ∨ ¬S            given clause
6.  U ∨ ¬P ∨ ¬R ∨ ¬T       given clause
7.  ¬T ∨ ¬U                top O-clause
8.  ¬Q ∨ ¬S ∨ ¬U           resolvent, used 5a
9.  ¬S ∨ ¬U                resolvent, used 2
10. ¬Q ∨ ¬R ∨ ¬U           resolvent, used 4a
11. ¬R ∨ ¬U                resolvent, used 2
12. ¬U                     resolvent, used 3
13. ¬P ∨ ¬R ∨ ¬T           resolvent, used 6a
14. ¬R ∨ ¬T                resolvent, used 1
15. ¬T                     resolvent, used 3
16. ¬Q ∨ ¬S                resolvent, used 5a
17. ¬S                     resolvent, used 2
18. ¬Q ∨ ¬R                resolvent, used 4a
19. ¬R                     resolvent, used 2
20. □                      resolvent, used 3
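Anticipating the Prolog notation of Section 3.3, this Horn clause set is in essence already a Prolog program: the definite clauses become facts and rules, and the top O-clause ¬T ∨ ¬U becomes the query. A concrete rendering (atom names lowercased, as Prolog requires):

    p.
    q.
    r.
    s :- q, r.
    t :- q, s.
    u :- p, r, t.

The query ?- t, u. then succeeds, and Prolog's leftmost-goal, first-clause execution retraces the OI-refutation above step by step.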
It is interesting to follow the refutation segments from the top O-clause. We see that the refutation segment for ¬T is embedded in the refutation segment for ¬U. That is a sign that a regular (nonlinear) proof would be shorter. There is a nonlinear restriction of resolution, called unit resolution, that is also complete over Horn logic and often quite efficient at finding refutations. See Exercise 13.

OI-resolution is complete for Horn logic. We later prove the completeness of OI-resolution for ground Horn clause sets for the special case that the top O-clause is restricted to a negative clause. First, we establish some properties of unsatisfiable clause sets.

Definition. A clause with only negative (positive) literals is called a negative (positive) clause.

Definition. A literal L is pure in a clause set if there is no occurrence of a literal L1 in the clause set such that L and the complement of L1 can unify.

Theorem 12. Let S be an arbitrary first-order minimally unsatisfiable clause set.
(i) Every unsatisfiable clause set has at least one positive and one negative clause.
(ii) No clause in S is θ-subsumed by another clause of S.
(iii) Every clause in S must be used in a refutation of S.
(iv) No clause containing a pure literal is in S.
(v) If a multiliteral clause C is in S, C− is a subclause of C, and clause (C − C−) replaces C in S to form clause set S−, then S− is unsatisfiable and clause (C − C−) is in any minimally unsatisfiable subset of S−.

Proof. We only prove the results for ground clause sets although the results hold for first-order clause sets as well. Again, let S be an arbitrary minimally unsatisfiable clause set.

(i) Suppose there is no negative clause in some unsatisfiable clause set. Then every clause contains a positive literal. Consider the interpretation I where all positive literals are assigned T. I is then a model for the clause set. Contradiction. The positive clause case is similar.

(ii) Let C and D be distinct clauses of S such that C θ-subsumes D. Because S is unsatisfiable every interpretation falsifies at least one clause. Any interpretation that falsifies clause D must also falsify clause C because every literal of a clause must be false in that interpretation to falsify the clause. If clause D is removed from S then still no interpretation satisfies all clauses of S − {D}, so S − {D} is unsatisfiable. But then S is not minimally unsatisfiable. Thus, D could not be θ-subsumed by C.

(iii) Given a refutation of S, if a clause C is omitted from the refutation then S − {C} is unsatisfiable as the refutation shows. Thus S is not minimally unsatisfiable.

(iv) (Proof theoretic version) No resolution refutation of S can include a pure literal clause C, for otherwise a pure literal L of C remains in the deduction, as L can resolve with no literal of any clause of S. Therefore, C is not in S.

(iv′) (Model theoretic version) No interpretation I of S has a pure literal clause C as the only false clause. Otherwise, one can define a new interpretation I′ as for I except that a pure literal L of C is assigned truth value T. This creates no new false clauses since L is pure, so I′ is a model of S. Therefore, all interpretations have other false clauses and C is not in S. (The reader should fill in the details of the two proofs for assertion (iv). See Exercise 6.)

(v) Suppose S− is satisfiable; let M be a model of S−. Then M also is a model of S because adding literals onto a disjunctive clause does not alter the truth assignment to that clause. This contradicts the unsatisfiability of S, so S− is unsatisfiable. If clause (C − C−) is not in a minimally unsatisfiable subset S−− of S−, then S−− is unsatisfiable and a proper subset of S, contradicting the minimal unsatisfiability of S.
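Property (iv) is easy to check mechanically in the ground case, where “L and the complement of L1 can unify” degenerates to the complement of L occurring somewhere in the set. A small sketch in the clause representation used earlier (again ours, for illustration):

    % pure(+L, +S): literal L is pure in the ground clause set S,
    % i.e., its complement occurs in no clause of S.
    pure(L, S) :-
        complement(L, LC),
        \+ ( member(C, S), member(LC, C) ).

    complement(n(A), A) :- !.
    complement(A, n(A)).

Thus pure(q, [[p, q], [n(p), q]]) succeeds while pure(p, [[p, q], [n(p), q]]) fails; by Theorem 12(iv) a set containing a clause with a pure literal cannot be minimally unsatisfiable.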
We prove our completeness result by complete induction on the number of excess literal occurrences in a clause set (defined below). This proof technique provides a uniform approach to proving completeness of most resolution variations. We could have used it to show that the original unrestricted propositional resolution logic is complete, but it imparts little understanding of the reason resolution works. Here the proof is more complex and it is also instructive to understand this approach, as well as to have more exposure to induction proofs.

Definition. If S is a clause set then the number of excess literal occurrences for S is the number of literal occurrences in S minus the number of clauses of S.

Example. S = {P ∨ Q, ¬P ∨ Q, ¬Q} has 5 − 3 = 2 excess literal occurrences.

We now comment on suitable orderings. A suitable ordering is any ordering that allows the induction proof to go through. We remark more inside the proof of Theorem 13. Theorem 13 provides a proof of ground completeness for a negative O-clause as top O-clause. Exercise 10 offers a generalization of this result to all top O-clauses.

Theorem 13 (Restricted Completeness Theorem). If S is a minimally unsatisfiable set of ground Horn clauses and C is a negative clause in S then there exists an OI-resolution refutation of S with a top O-clause of C for any suitable ordering O.

Proof. The proof is by induction. The induction predicate is straightforward.

P(n): The theorem holds for any clause set S with n excess literal occurrences.

P(0): The clause set must be {{A}, {¬A}} for some atom A. The OI-refutation is:

1. A        given clause
2. ¬A       top O-clause
3. □        resolvent, used 1
For each n ≥ 0 we show: Assume P(k) for all 0 ≤ k ≤ n, to show P(n + 1). Now the minimally unsatisfiable set S has at least one excess literal occurrence. The proof is just several instances of the following scheme.
Remove one or more literals from a clause of S, find a minimally unsatisfiable subset of the altered set, obtain a refutation by induction hypothesis, and patch this refutation to obtain a deduction/refutation of S.

The proof breaks into two cases.

Case 1. We consider first the case that C is a one-literal clause. We list the terms that we use in the proof below for consultation as they are used in the proof. One can see in the terms the framework for the scheme stated above. We use set notation for clauses for convenience and conciseness.

S: a minimally unsatisfiable set of ground Horn clauses
C: a one-literal clause {¬A} in S
D: a multiliteral Horn clause in S of form {A} ∪ D−
D−: an ordered negative subclause of D
S−: the clause set S with D− replacing D; i.e., (S − {D}) ∪ {D−}
S−_min: a minimally unsatisfiable clause subset of S− containing D−

First, why do we know that D exists in S? If there isn't a multiliteral clause with A as one literal then either ¬A is a pure literal in S or literal A is in a one-literal clause, so S is {{A}, {¬A}} and has no excess literal occurrences. Neither of these is possible. We then know that S− is unsatisfiable and contains the O-clause D− by Theorem 12. Then there exists the subset S−_min containing D−, also by Theorem 12. Note that the set S−_min might contain the clause {¬A}; we do not know otherwise.

We can invoke the induction hypothesis regarding set S−_min as S−_min has fewer excess literal occurrences than S. Choose an arbitrary suitable ordering O. The induction hypothesis asserts that we can use the O-clause of any negative clause of S−_min as a top clause. We clearly can invoke the induction hypothesis using the top O-clause D− as we are sure that D− is in S−_min. Doing so, we have an OI-resolution refutation of S−_min that holds for the ordering O. Note that all the derived O-clauses of the refutation are negative O-clauses, shown by an induction argument. Argued roughly, the top O-clause is negative so any given clause resolving with it must have its sole positive literal as the resolving literal, so the resolvent is a negative derived O-clause. The argument continues in the same manner for all derived O-clauses. Of course, this should be formalized by a proper induction proof.

We must now show that the original set S has an OI-resolution refutation starting with {¬A}. We display the form of the refutation below. We replace D− by {A} ∪ D− everywhere as required, but the literal A is resolved away at line m + 2. Note that D− is never used as a far parent in the induction hypothesis refutation over S−_min, as all derived O-clauses are negative O-clauses. This fact is important for two reasons. First, the
deduction below line m + 2 has no replacement of D− by {A} ∪ D−, so the induction hypothesis refutation below line m + 2 is still a valid refutation. Thus, we have a valid resolution refutation of S. Second, D− is no longer a given clause in the refutation of S, but it is not used as a far parent. Therefore, all far parent clauses are given clauses of S, so we have a valid OI-resolution refutation of S. (In the general case treated in Exercise 10 the clause in the position of D− may be used again, but the one-literal clause is available to help.)

1.      ⋮              given clauses of S
m.      {A} ∪ D−       given clause
m + 1.  {¬A}           top O-clause
m + 2.  D−             resolvent, used m(a)
        continuation of the induction hypothesis refutation with top O-clause D−.
The above argument holds for any suitable ordering O. Note that a possible non-suitable ordering might be one that adjusts the ordering based on the distance of the clause from the top O-clause. The ordering would change as one added the step m + 1. However, one could adjust for this ordering by choosing an ordering that added 1 to the distance count when invoking the induction hypothesis. In short, for many unnatural orderings one can change the ordering rule when invoking the induction hypothesis. This completes Case 1.

Case 2. C is a multiliteral clause. As before, we list the terms used in the proof of this case.

S: a minimally unsatisfiable set of ground Horn clauses
C: a multiliteral clause C− ∪ {¬A} in S, also the top O-clause of the desired refutation
C−: an ordered negative subclause of C
S−: the clause set (S − {C}) ∪ {C−}
S−_min: a minimally unsatisfiable subset of S−
S−_A: the clause set (S − {C}) ∪ {¬A}
S−_A,min: a minimally unsatisfiable subset of S−_A
Fix an ordering O. Given the desired O-clause for C, we split the O-clause to separate the rightmost literal, labeled ¬A, from the remaining sequence of literals C−. As before, we know that S− is unsatisfiable and that C− is in S−, and also in S−_min, by Theorem 12. Also S− has fewer excess literal occurrences than S. We invoke the induction hypothesis on the set S−_min with top O-clause C− and get a refutation of the set S−_min with top O-clause C−. We then construct a deduction of {¬A} from S with top O-clause C by placing ¬A back as the rightmost literal in the top O-clause C− ∪ {¬A}, and also as the rightmost literal in every derived O-clause. The empty clause is replaced by {¬A}. Note that C is not called as a far parent clause, so the replaced ¬A does not disrupt the deduction. Again, this is because all derived O-clauses are negative clauses. Therefore, we have a valid resolution deduction of {¬A} from S. We see that it is a valid OI-resolution deduction by noting that ¬A appears rightmost in all derived O-clauses as required by the resolvent ordering rule for OI-resolution, and that only given clauses of S are far parent clauses.

Next, we consider S−_A, where {¬A} replaces C in S. S−_A is unsatisfiable and contains the clause {¬A}. Also, S−_A has fewer excess literal occurrences than S. We again invoke the induction hypothesis to get an OI-refutation of S−_A,min from top O-clause {¬A}. We append this refutation to the deduction of {¬A} to get a resolution refutation of S. Again, ¬A is not used as a far parent in the refutation of {¬A}, so only clauses from S are far parent clauses. Thus, this portion of the refutation, and therefore the entire refutation, is an OI-refutation. This result holds for all suitable orderings O.

Theorem 14 (Soundness Theorem). If S is a first-order set of Horn clauses and if there exists an OI-resolution refutation of S then S is unsatisfiable.

Proof. OI-resolution is a restriction of (general) resolution without factoring, as the s-resolution operation is not a part of OI-resolution. A general resolution proof without factoring is sound, so OI-resolution is sound. Note that the incompleteness of general resolution without factoring does not concern us here. We are concerned only with soundness.

Theorem 15 (Completeness Theorem). If S is an unsatisfiable set of Horn clauses then there is an OI-resolution refutation of S using any O-clause as top O-clause, for any ordering O, provided that the O-clause is in a minimally unsatisfiable subset of S.
Proof comment. We do not require S to be minimally unsatisfiable. A single negative clause in S is clearly in some minimally unsatisfiable subset of S even if we only know S to be unsatisfiable. Theorem 13 provides the prototype for this theorem, but one needs both a generalization to handle all top O-clauses and a lifting argument to establish the first-order level.
3.3 Input Resolution and Prolog

We now show by a primitive example the relationship between OI-resolution and Prolog. An appendix includes some more examples of Prolog programs with limited commentary. We remark that the OI-resolution format on Horn clauses satisfies a standard Artificial Intelligence (AI) reasoning paradigm that can be stated as follows: One begins the reasoning process in a certain world state (here the top O-clause) and applies given operators (here the given clauses) until one achieves the desired final world state (here the empty clause). This paradigm about linear reasoning is an important aspect of Prolog. One might also reread the beginning paragraphs of this chapter that discuss Prolog. There we stated the design ideal of Prolog, namely, to have a declarative presentation of problems. Together the stated paradigm and the design ideal capture much of what Prolog sought.

We need to introduce some Prolog notation. A fact, which is atomic as Horn logic does not permit disjunctive facts, is written as “A.” for fact A. The implication A ∧ B → C is written in Prolog as “C :- A, B.” and is logically equivalent to the clause C ∨ ¬A ∨ ¬B. A Prolog program is a conjunction of Horn clauses, the definite Horn clauses written as Prolog clauses as shown above. Prolog permits the single negative clause (see Exercise 9) to be a disjunctive clause, but most frequently it is a single literal. A one-literal negative clause ¬C is encoded as “?- C.” This expression is called the query, as it is the hypothesized fact that one hopes the Prolog program logically implies. We finish with the promised example.

(Abstract) Prolog program + query      Resolution counterpart
?- Q.  (query)                         ¬Q  (neg. clause)
Q :- A, B.                             Q ∨ ¬A ∨ ¬B
A :- B, D.                             A ∨ ¬B ∨ ¬D
B :- C.                                B ∨ ¬C
C.                                     C
D.                                     D
Prolog execution        OI-resolution refutation
?- Q.                   1. ¬Q
:- A, B.                2. ¬A ∨ ¬B
:- B, D, B.             3. ¬B ∨ ¬D ∨ ¬B
:- C, D, B.             4. ¬C ∨ ¬D ∨ ¬B
:- D, B.                5. ¬D ∨ ¬B
:- B.                   6. ¬B
:- C.                   7. ¬C
Yes                     8. □
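The abstract program is only a renaming away from something one can actually run. As a concrete (SWI-)Prolog file — atom names lowercased, since Prolog requires predicate names to begin with a lowercase letter:

    q :- a, b.
    a :- b, d.
    b :- c.
    c.
    d.

Loading this file and posing the query ?- q. yields true (the “Yes” above), and tracing the execution reproduces the goal stack shown in the left column.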
Exercises

1. Find an OSL-resolution refutation of the following clause set.
1. P(a) ∨ P(b)
2. ¬P(a) ∨ ¬Q(f(x)) ∨ P(y)
3. ¬P(a) ∨ Q(x)
4. ¬P(x) ∨ P(a)
5. ¬P(x) ∨ R(x)
6. ¬P(b) ∨ ¬R(a)
2. Find an OSL-resolution refutation of the following clause set.
1. Q(a)
2. Q(b) ∨ Q(x)
3. P(f(x)) ∨ ¬Q(x)
4. ¬P(x) ∨ R(y, x) ∨ R(y, a)
5. ¬Q(a) ∨ ¬Q(y) ∨ ¬R(x, y)
6. ¬P(f(a)) ∨ ¬R(b, f(a))
7. ¬P(f(a)) ∨ ¬R(a, f(b))
3. From the preceding clause set find two minimally unsatisfiable ground clause sets of different cardinality. Proceed by finding ground instances, chosen from the O-clauses of the preceding exercise, that form the minimally unsatisfiable clause sets, then prove the minimality of one of the clause sets.
4. Show that Search Aid 1 (see page 20 and page 49) is safe for the resolution proof system under discussion to use over any ground unsatisfiable set, in that its application preserves refutability. Hint: Use the notion of minimally unsatisfiable sets.
5. The least number principle states that every nonempty subset of the positive (nonnegative) integers has a least member. Use this principle to give a detailed alternate proof of the fact stated in Section 3.1 that every finite unsatisfiable set has minimally unsatisfiable subsets.
6. Provide complete details of the two proofs of Theorem 12, assertion (iv). Complete these explanations as if you were teaching a new reader these proofs, where the reader knew the first two chapters of the book and could look up and understand any theorem or definition you use. In writing the explanations, every time you make an assertion, ask yourself “why is this true?” and provide the answer.
7.
In Appendix C the first example program is a toy database. Using the mapping between the Abstract Prolog program plus query and the Resolution counterpart given on page 77, write the resolution counterpart to the given set of Prolog clauses and the query, plus axioms suggested below, and then find the refutation with the query counterpart as top O-clause. Use abbreviations for predicate names and constants for convenience but provide a translation table. Use xn, yn and zn, where n is any integer of your choice, for your variable names. Represent \== by the predicate name neq in prefix form. You need to add axioms for neq; these axioms will be very problem specific. Note that your axioms are feasible only because this is a very small database. Why? Discuss. Note: Prolog has a built-in procedure for evaluating the truth of this predicate.
8.
Use Theorem 11 to show that every unsatisfiable ground Horn set S has a refutation where at least one of the parent clauses of each resolution is in S. This is a weak form of OI-resolution.
9.
Prove that every minimally unsatisfiable Horn clause set has only one negative clause.
10.
(Harder) Generalize Theorem 13. Prove that the statement of Theorem 13 holds for any top O-clause of any clause in a minimally unsatisfiable ground Horn clause set. Prove this for any suitable ordering rule. It may help to first consider the case for orderings that have the positive literal rightmost. You may use Theorem 13 without reproving it, but if you use any property from within the proof, clearly state that property.
11.
(Harder) Show that there exists a first-order application of s-resolution that cannot be simulated by factoring and a resolution application. Hint: Use shared variable clauses that pass bindings along.
80
Part 1: Proof Theory
12.
(Harder) Show that there is a clause ordering O for an unsatisfiable clause set that when applied to given and derived clauses yields O-clauses for which no regular resolution ground refutation is possible. That is, not every clause ordering yields a complete ordered clause resolution system. This assumes that we retain the restriction that all resolution and factoring involves only the leftmost literals of an O-clause. (Note that here you can choose the permitted ordering for given clauses as well as for derived clauses.)
13.
(Harder) Show the same result as for the previous exercise when the ordering O is special for derived clauses only. That is, each literal of each given clause is leftmost in some O-clause of the ordering you define. Hint: Solve the previous exercise first.
APPENDIX A: The Induction Principle
We will summarize two induction principles, mathematical induction and complete induction. These principles provide proof techniques for proving statements about an infinite number of objects, in our case the positive integers and the nonnegative integers. These are labeled as principles as one or the other is traditionally taken as an axiom in the formal definition of arithmetic. (The mathematical induction principle follows very closely the usual definition of the integers.) As such, one cannot prove them except from equally strong assumptions. We comment further on this after presentation of the induction principles. The induction principles. P(n): A predicate that one conjectures is true for all positive (nonnegative) integers. (1) Prove P (1) [ P (0)] . (2) Prove that for all n, ranging over the positive (nonnegative) integers, the following holds. Mathematical induction: Under the assumption that P (n) is true prove that P (n + 1) is true. Complete induction: Under the assumption that, for all k ≤ n, P (k) is true prove that P (n + 1) is true. One can then assert that P (n) is true for the set of positive (nonnegative) integers. Many readers find more comfortable the fact that the induction principles can be proven from the least number principle. The least number principle states that every nonempty subset of the positive (nonnegative) integers has a least member. This proof is given in many undergraduate texts on the foundations of mathematics, but the reader likely can outline the proof him/herself. Given the induction predicate P (n), assume that the induction principle of your choice does not hold for this predicate. Consider the set of all integers for which P (n) is false and apply the least number principle.
APPENDIX B: First-Order Valuation
The valuation function V defines formally the meaning of a wff A. V : terms ∪ wffs −→ D ∪ {T, F }. We define V relative to an interpretation and assignment. Assume wff A, interpretation I A, and assignment ϕ A are given. ϕ A is defined over all the variables of A here. We write the valuation function V I A,ϕ for A as V I when the chosen interpretation and assignment are understood. V I is defined recursively on subparts of wff A. Let B and C be subformulas of A. 1. V I [xi ] = ϕ A(xi ). 2. V I [ai ] = I A(ai ). 3. If t1 , . . . , tn are terms, then V I [ f (t1 , . . . , tn)] = I A( f )(V I [t1 ], . . . , V I [tn]). 4. If P is a propositional symbol, then V I [P ] = I A(P ). 5. If t1 , . . . , tn are terms, then V I [P (t1 , . . . , tn)] = I A(P )(V I [t1 ], . . . , V I [tn]). 6. V I [B ◦ C] = V I [B] ◦ V I [C] for the Boolean connectives ∧, ∨, →, ↔. 7. V I [¬B] = ¬V I [B]. 8. V I [∀xB] = T iff for all d ∈D we have V Id/x [B] = T, = F otherwise. d/x
Here V I means compute V I as usual except ϕ A(x) = d. 9. V I [∃xB] = T iff there exists d ∈D such that V Id/x [B] = T, = F otherwise. We illustrate the use of the valuation function here to determine validity. We require every interpretation considered below to have, for each d∈D, a constant ci of the formal language such that I B (ci ) = d∈D. Wff B: ∀x∃y A(x, y) → ∃x A(x, b). Wff B is not valid if (a) V I [∀x∃y A(x, y)] = T
Appendix B: First-Order Valuation
83
and (b) V I [∃x A(x, b)] = F for some I . (a) V I [∀x∃y A(x, y)] = T iff V I [∃y A(ci , y)] = T, all I B (ci ) ∈D I , iff V I [A(ci , c j )] = T, for some I B (c j ) ∈D I , dependent on I B (ci ); this for all I B (ci ) ∈ D I . (b) V I [∃x A(x, b)] = F iff V I [A(ci , b)] = F , all I B (ci ) ∈D I . Does there exist an I B , ϕ B satisfying (a) and (b)? Consider D: {1,2}. I B (A): AI B (1,1) = AI B (2,1) = T, else F. I B (b): 2. The predicate A(x, y) satisfies the subformula of part (a) with ϕ B (y) = 1 for each ϕ B (x). The subformula in part (b) is seen to be falsified. Interpretation I B satisfies both (a) and (b), so wff B is not valid.
APPENDIX C: A Commentary on Prolog
It is not possible to provide a substantial treatment or overview of the programming language Prolog in the limited space of this appendix, but we can suggest some of the nature of Prolog by selected examples. We give a database example that uses only the structure of Prolog inherited from OI-resolution as treated in the text. We then give two examples that employ recursion and the list data structure. Recursion is a basic technique for Prolog but whose semantics lies beyond the first-order formulation we have studied. The list is the basic data structure that couples with recursion to give Prolog the computation power of a universal language. For lack of space our comments on the programs are admittedly inadequate; for those seeking further understanding of Prolog we refer to the Prolog books in the bibliography. Before proceeding we mention that Prolog has generated interest in the computing community not only for its elegance but also for its usefulness in areas such as natural language processing (parsing applications in particular), database systems, and expert systems. It has proved to be a useful prototyping language; for example, the first expert systems used Prolog, but later systems used other languages that cost much more development time but provided customized tools and data structures. Prolog has been used for quick analysis of proposed chemical plants and other seemingly remote applications simply because of the ability to easily convert concepts into code. Part of the attractiveness of Prolog is the use of logic as a language, with its declarative semantics. The ideal implementation would be free of control concerns for the programmer; this has not been realized for many practical reasons but the concept does play an important role in the usefulness of the language. Taking this declarative viewpoint we see Prolog as writing down the constraints (axioms) of the problem and then asking if these constraints force the desired conclusion (theorem conjecture). The imperative viewpoint (or reading) is as important as the declarative reading of Prolog. Here we comment on some places where control issues are important. One important example is that clause order can matter. For example, clause order is often exploited to realize economy of writing; use of clause order often allows some clauses to be
Appendix C: A Commentary on Prolog
85
shorter because earlier clauses block cases that would otherwise require literals that condition the use of the clause. Control issues enter also in side-effect routines such as input and output routines. Another issue that falls outside classical logic is “negation as failure,” a non-monotonic inference rule that is based on a more general principle called the Closed World Assumption. The predicate not(x) exists in Prolog but is built-in (i.e., the truth value is determined by the Prolog program) and defined to be true iff the argument x cannot be proven from the given clause set. Hence the term “negation as failure.” We later use the inequality predicate \==, another built-in predicate that uses the negation-as-failure rule to determine its truth value. We defer further discussion of this issue to the books on Prolog. Extensions to Prolog exist that provide classical negation (and encompass all of first-order logic in effect), but these extensions are not in common use. Negationas-failure also occurs in the extension as it behaves very differently from classical negation. Another alteration from its logic base is that Prolog can return incorrect answers because the Prolog unification algorithm omits the occurs check. This is done for execution speed and does not cause difficulty on the type of problem to which Prolog is applied. Before giving our examples we note some notational Prolog conventions. Variable names begin with a capital letter, constants have lowercase first letters. Predicate, variable, and constant names must be single words which may include numbers and underscores interior to the name. The first example is a toy database that uses only the Prolog portion that was treated in the text, that is, the Prolog core that arises from OIresolution. We give the Prolog program followed by the execution trace. The reader may wish to reread Section 3.3 to recall the relation between OI-resolution and a Prolog program. We repeat here the notation that encodes an implication such as A ∧ B → C as C :- A, B, and presents the query Q to be instantiated with the answer (or just verified) as ? − Q. Implications are called rules and atomic wffs are called facts; both end with a period. A clause refers to either a rule or a fact, as one would expect. This toy database allows determination of whether someone is qualified to take a particular course by meeting all the prerequisites. We list the predicates used in the program along with the meaning of each. cs_id(course_id, course_name) links course identifier with course name prereq1(course, prevcourse) asserts that the named course has prevcourse as a single prerequisite
86
Part 1: Proof Theory
prereq2(course, prevcourse) asserts that the named course has prevcourse as one of two prerequisite courses has_taken(student, course_id) asserts that the student named has taken that course can_take(person, course_name) asserts that the student named can take the course named The program contains two rules, both with conclusion “can_take(X,Y).” The first rule asserts that person P can take the course named C if course C has the course identifier Ci, Ci has the single prerequisite Cj, and P has taken Cj already. The second rule is similar to the first rule but deals with two prerequisites. As mentioned earlier, the symbol \== encodes =. Prolog allows certain built-in symbols to be written in the usual infix notation, such as =, \==, and +. (Infix notation places the predicate between the two arguments instead of before the arguments as for prefix notation.) The program is given in Figure C.1 along with the query “?- can_take(X, artificial_intelligence).” In Figure C.2 we give a trace of the execution of the program on the query. (The program is constructed for this example query to limit its size and performs incorrectly on many other queries.) To clarify the execution trace we need some terminology, terminology that reflects the procedural view of Prolog. The atomic wff of a fact and the lefthand atomic wff of a rule are called the clause head. Rules have a nonempty body, consisting of one or more goals. The procedural view of the program considers each clause as a subroutine (subprogram) with the head as the entry/exit point and each goal as a call to another subroutine. The call is activated if the goal and some clause head match (through unification). If that call succeeds, such as by matching with a fact, then the calling goal succeeds. If all goals of a clause succeed then the head exits successfully and its calling goal succeeds in turn. Variable bindings are passed through these calls just as in resolution deductions. Goals can fail by exhausting all the clauses whose heads match with the goal without ever succeeding. On occasion goals can also go into limbo; that is not for discussion here. The reader is advised to jump between reading this paragraph and viewing Figure C.2. The trace reports on the status of successive goals. (Note: Prologs have different trace formats; this is the format used by C-Prolog.) The first action by a new goal is a CALL; the first call is by the query goal. (The terms of form “_n” where n is a number are unbound variables.) When a CALL succeeds then this is noted by an EXIT label
Appendix C: A Commentary on Prolog
87
cs_id(cs_8, programming_design_1). cs_id(cs_100, programming_design_2). cs_id(cs_108, program_methodology). cs_id(cs_115, artificial_intelligence). prereq1(cs_100, cs_8). prereq2(cs_115, cs_8). prereq2(cs_115, cs_100). has_taken(tom, cs_8). has_taken(jane, cs_8). has_taken(jane, cs_100). can_take(Person, Course_name) :cs_id(Course, Course_name), prereq1(Course, Prevcourse), has_taken(Person, Prevcourse). can_take(Person, Course_name) :cs_id(Course, Course_name), prereq2(Course, Prevcourse1), prereq2(Course, Prevcourse2), Prevcourse1 \== Prevcourse2, has_taken(Person, Prevcourse1), has_taken(Person, Prevcourse2). ?- can_take(X, artificial_intelligence). Figure C.1 A database program and sample query.
with the (perhaps instantiated) goal repeated. The leftmost number tags a goal call so one can see the results of that call by reading the lines tagged with the number that labels that call. A call may FAIL which causes the process to backtrack, to retry the previous successful goal to seek a different instantiation that may allow the failed goal to succeed on a new attempt. The label BACK TO names the goal to be retried and includes a new call of that goal. The second number on a line is less easy to define but here can be associated with rule applications. The number 2 is assigned to the first “can_take” rule. The last line with the label “2” indicates the failure of the first goal “cps_id(_65637, artificial_intelligence),” which leads to the failure of the rule. The first goal fails as no more options for success exist to lead to retries of later goals. This causes a backtrack and retry of the calling goal, the query goal (1). With the success of the new call
88
Part 1: Proof Theory
?- can_take(X, artificial_intelligence). (1) 1 Call: can_take(_0,artificial_intelligence) ? (2) 2 Call: cps_id(_65637,artificial_intelligence) ? (2) 2 Exit: cps_id(cps_115,artificial_intelligence) (3) 2 Call: prereq1(cps_115,_65638) ? (3) 2 Fail: prereq1(cps_115,_65638) (2) 2 Back to: cps_id(_65637,artificial_intelligence) ? (2) 2 Fail: cps_id(_65637,artificial_intelligence) (1) 1 Back to: can_take(_0,artificial_intelligence) ? (4) 3 Call: cps_id(_65637,artificial_intelligence) ? (4) 3 Exit: cps_id(cps_115,artificial_intelligence) (5) 3 Call: prereq2(cps_115,_65638) ? (5) 3 Exit: prereq2(cps_115,cps_8) (6) 3 Call: prereq2(cps_115,_65639) ? (6) 3 Exit: prereq2(cps_115,cps_8) (7) 3 Call: cps_8 \== cps_8 ? (7) 3 Fail: cps_8 \== cps_8 (6) 3 Back to: prereq2(cps_115,_65639) ? (6) 3 Exit: prereq2(cps_115,cps_100) (8) 3 Call: cps_8 \== cps_100 ? (8) 3 Exit: cps_8 \== cps_100 (9) 3 Call: has_taken(_0,cps_8) ? (9) 3 Exit: has_taken(tom,cps_8) (10) 3 Call: has_taken(tom,cps_100) ? (10) 3 Fail: has_taken(tom,cps_100) (9) 3 Back to: has_taken(_0,cps_8) ? (9) 3 Exit: has_taken(jane,cps_8) (11) 3 Call: has_taken(jane,cps_100) ? (11) 3 Exit: has_taken(jane,cps_100) (1) 1 Exit: can_take(jane,artificial_intelligence) X = jane ; Figure C.2 An execution trace.
to the second “can_take” rule (labeled “3”), indicated by success of the final goal (11) of that rule, we have success of the query goal. A person new to Prolog is usually startled to see the variables in goal calls (5) and (6) both instantiated to “cps_8,” leading to the clearly false inequality of call (7). But Prolog works methodically (and dumbly) topdown to the first correct match. After goal call (7) fails then goal call (6) is retried; the search continues from that previous match and finds a new match where the variable is now instantiated to “cps_100.” The new call to the inequality now succeeds.
Appendix C: A Commentary on Prolog
89
We now turn to other examples. These are presented only to further enlighten the reader to a few additional basic ideas underlying Prolog. The discussion regarding these examples is very limited. Again, we refer the reader to a Prolog book for coverage of the ideas we introduce. The data structure for Prolog is the list. Processing a list usually involves recursive programs, a format that is often confusing for the beginning Prolog programmer. It has much in common with proof by math induction; indeed, math induction is usually the manner in which one justifies the recursive program. We illustrate this program design first on a simple example, adding the elements of a list together. A fixed length list is written [1,2,3] for the list 1,2,3. The internal Prolog structure for the list [1,2,3] is of form [1,[2, [3, [] ]]] ]. (Here [] is the list of no elements, the empty list.) That is, every list is a pair of objects (x, y) where x is a list element and y is a list. The second argument, being a list, also has the structure of a pair consisting of an element and a list, as is seen above. We can have variables in lists. [X, Y, Z] will match with any three element list. There is a list variable that matches any list, written [H|T] where H (traditionally called the head) matches with the first element of the list and T matches with the tail of the list, which is always a list, as noted above. We give a program to sum a list of numbers. To understand the program one should recall mathematical induction. One establishes the base case P(0), say, and then processes the general case P(n) by assuming P(n − 1) holds and showing that P(n) must then hold. The program sum follows this structure. The first clause is the base case and the second clause the general induction (recursion) step. sum([], 0). sum([H|T], Sum) :- sum(T, Psum), Sum is Psum + H. Suppose we wish to add the elements of list [1,2,3]. We would enter the query “ ?- sum([1,2,3], Answer).” The goal “sum([1,2,3], Answer)” tries to match with sum([], 0) but cannot since our input list is not empty. The second clause is then called. The first goal is a recursive call to the predicate “sum” again with the tail of the given list as first argument. As for math induction, assume that the call is satisfied with the second argument instantiated as the program definition states. Thus we have Psum instantiated to 5. The second goal uses the built-in predicate “is” that is capable of executing arithmetic expressions in its second argument. (The predicate “equals” will not do this.) Sum is instantiated to 6 at all occurrences of the variable in the clause, in particular in the head of the clause, and hence to the query variable Answer.
90
Part 1: Proof Theory
Our final program shows how to build a list. Once one can build lists as well as modify them, many tasks can be undertaken with Prolog. (Actually, Prolog can simulate a Universal Turing Machine and so is a universal programming language.) We present a program to build a list of n Fibonacci numbers given the number n. A Fibonacci sequence is a sequence where each number is the sum of the previous two numbers. Traditionally, the first two numbers are 0 and 1. We actually present two programs, the first written less concisely to illustrate the form of many simple programs that build lists. The second program is written as a Prolog programmer might address this particular task. There are two predicates for the first program, “fib” and “addToList.” A call to create a four element Fibonacci sequence is “?- fib(4, FibSeq),” where the variable FibSeq will be instantiated with the answer. fib(2, [1,0]) :- !. fib(N, FibSeq) :- N1 is N − 1, fib(N1, Pfib), addToList(Pfib, FibSeq). addToList([H1,H2|T], FibS) :- H is H1 + H2, FibS = [H, H1, H2|T]. Again, the program is understood by reference to the induction principle. The second rule for “fib” first prepares for the recursive call to “fib” by computing the number N − 1 (here, 3). Recall that there is no reassignment of values to any variable in a declarative setting. Thus the new variable N1 carries the value N − 1. The recursive call “fib(N1, Pfib)” by induction hypothesis returns the three element list [1, 1, 0]. We grow the list to the left; a six element list would be computed as [5, 3, 2, 1, 1, 0]. (If we want the usual list order another program can be added to reverse the finished list.) The variable Pfib holds this partial Fibonacci list upon success of the recursive call. The last goal calls the rule for the predicate “addToList.” The goal “addToList(Pfib, FibSeq)” matches “addToList([H1,H2|T], FibS),” which causes the variables H1 and H2 to bind to the first two elements of the list Pfib. The variable T gets the rest of the list. The first goal sums H1 and H2 for new value H. The second goal creates a new list consisting of [H1,H2|T] with the new variable H added as the first element. This is the simplest, but powerful, form of list creation. It should now be clear that the list grows “to the left” because this is the convention for the display of the list variable. Had we used the convention [T|H], where again T is the list tail (a list) and H is the first element, then we would grow lists to the right. The base case is given by fact “fib(2, [1,0]) :- !.” Because we grow the list to the left the 0 element is to the right in our two element starter list. The goal “!” is a built-in predicate with side effects. It is not needed in
Appendix C: A Commentary on Prolog
91
the computation of the Fibonacci sequence but does prevent an error if someone mistakenly seeks an alternate answer. We can ignore its existence here. Our final program is the more concise program for the same task. Note that the second argument [H, H1, H2|Pfib] of the head is a collection of uninstantiated variables when the call to that rule is made. (However, it will only match with a list of at least three elements. A free variable matching with it takes on the same constraint.) This ability to employ terms that are only later fully defined gives Prolog and like languages much expressive and computational power. fib(2, [1,0]) :- !. fib(N, [H, H1, H2|Pfib]) :- N1 is N − 1, fib(N1, [H1, H2|Pfib]), H is H1 + H2.
References [1] Bratko, I. Prolog Programming for Artificial Intelligence. Pearson Education Ltd., Essex, 2001. [2] Chang, C. L., and Lee, R.T.C. Symbolic Logic and Mechanical Theorem Proving. Academic Press, New York, 1973 [3] Clocksin, W. F., and Mellish, C. S. Programming in Prolog: Using the ISO Standard. Springer-Verlag, Berlin, 2003. [4] Fitting, M. First-order Logic and Automated Theorem Proving. Springer-Verlag, New York, 1990. [5] Harrison, J. Handbook of Practical Logic and Automated Reasoning. Cambridge University Press, Cambridge, 2009. [6] Loveland, D. W. Automated Theorem Proving: a Logical Basis. North-Holland Publ. Co., Amsterdam, 1978. [7] Nerode, A., and Shore, R. A. Logic for Applications. Springer-Verlag, New York, 1993. [8] Schöning, U. Logic for Computer Scientists. Birkhäuser, Boston, 2008.
PART 2. Computability Theory RICHARD E. HODEL
4 Overview of Computability The second part of this book is on computability theory, a fundamental branch of mathematical logic. Other branches include proof theory, discussed by Loveland in Part 1 of this book, nonclassical logic, discussed by Sterrett in Part 3, model theory, and set theory. Computability theory has two major goals: clarify the notion of an algorithm; and develop a methodology for proving that certain problems are not solvable by an algorithm. There are two problems from Hilbert’s Program and his related research that explicitly ask for an algorithmic solution to the problem; these are Hilbert’s Decision Problem and Hilbert’s Tenth Problem. In addition, there is a rather natural problem that comes from computer science whose solution requires an algorithm, namely the Halting Problem. These three problems, together with the Word Problem, are discussed below in considerable detail. Our goal is to develop the ideas necessary to prove the nonexistence of an algorithm to solve such problems. In broad outline, we will discuss the following topics: algorithms and decision problems; the informal concepts of a computable function, a decidable relation, and a semi-decidable relation; a machine model of computation; a mathematical model of computation; and the relationship between the informal approach to computability, the machine model, and the mathematical model. It is a challenge to cover all of the material in Part 2 in an allotted time of approximately one-third of a semester. Here is a condensed version: Chapter 4, Chapter 5(1, 2, 3), and Chapter 6(1, 2, 3, 4) (or just Chapter 6(1, 2)). The discussion of LRM-computable functions and their equivalence with primitive recursive functions can also be omitted.
4.1 Decision Problems and Algorithms Throughout we use N to denote the set {0, 1, 2, . . . , n, . . .} of natural numbers and Z to denote the set of integers {0, ±1, ±2, . . . , ±n, . . .}. In addition, we use ⇔ as an abbreviation for if and only if as used in mathematical statements and proofs whereas the symbol ↔ is reserved for use in a formal language. Similar remarks apply to the symbols ⇒ and →. For example: A B ⇔ A → B.
96
Part 2: Computability Theory
Four Famous Decision Problems We begin with a discussion of four famous decision problems. Each of these problems asks for an algorithm that accepts a single input (two for the Word Problem) and eventually halts with a YES or NO answer; for each of these decision problems, the number of possible inputs is infinite. The first and oldest decision problem we discuss is one of the 23 problems raised by Hilbert in his talk at the International Congress of Mathematicians in 1900. Hilbert’s Tenth Problem. Find an algorithm that, given an arbitrary polynomial p(x1 , . . . , xn ) with integer coefficients, decides (YES or NO) whether there exist natural numbers a1 , . . . , an such that p(a1 , . . . , an ) = 0. Here are some examples of the type of polynomial that Hilbert had in mind: p(x) = 2x2 − 13x + 6; • p(x, y, z) = 4xyz2 − 20yz − 17; • p(x) = an xn + . . . + a1 x + a0 (a0 , . . . , an ∈ Z); • p(x, y, z) = (x + 1)2 + (y + 1)2 − (z + 1)2 ; • p(x, y) = x2 + y2 − 14; • p(x, y, z) = (x + 1)3 + (y + 1)3 − (z + 1)3 . √ On the other hand, p(x, y) = 2x3 + y2 and p(x, y, z) = xy + 4z are not allowed. Hilbert’s Tenth Problem was not solved until 1970, when Matiyasevich (Russian), Robinson, Davis, and Putnam (Americans) proved that there is no such algorithm; unfortunately, a complete proof of this result is beyond the scope of this text. (However, see [4, Chapter 10] or [5].) •
Thue’s Word Problem (1915). Here is a simple example that illustrates the idea of a word problem (or a “substitution puzzle”). Let be the finite set of symbols {A, B, C, . . . , Z} and let CS → C MAR → L MAT → G THE → RO be a finite list of “substitution rules.” These rules are used as follows: Suppose we have a word (= finite sequence of symbols of ) in which THE occurs; we may then replace this occurrence of THE with RO to obtain a new word. Now choose two words, say MATHEMATICS and
97
Chapter 4: Overview of Computability
LOGIC. Can we find a finite sequence of substitutions to obtain LOGIC from MATHEMATICS? MATHEMATICS ⇒ MATHEMATIC MATHEMATIC ⇒ MAROMATIC MAROMATIC ⇒ LOMATIC LOMATIC ⇒ LOGIC Although this is a rather frivolous example, it turns out that many important ideas in mathematics and computer science can be expressed in terms of substitution puzzles. Such puzzles give rise to a decision problem as follows: Given a finite alphabet and a finite set of substitution rules, is there an algorithm that, given two words W and V, decides (YES or NO) whether V can be obtained from W by a finite sequence of substitutions? The first mathematician to study word problems was the Norwegian Axel Thue (pronounced “too-ay”). In Chapter 5 we will give an example of a finite alphabet, a finite set of substitution rules, a word V, and an infinite list of words {Wa : a ∈ N} for which there is no algorithm that, given Wa , decides (YES or NO) whether V can be obtained from Wa by a finite number of applications of the substitution rules. This result is due independently to Post and Markov in 1948 and has historical interest: It is the first decision problem to be solved whose origin is outside of mathematical logic. Hilbert’s Decision Problem (1928). Let L be a first-order language. Find an algorithm that, given an arbitrary formula A of L, decides (YES or NO) whether A is logically valid (that is, true in every interpretation of L). By Gödel’s Completeness Theorem, this is equivalent to asking if A is a theorem of first-order logic. This is certainly a natural question to ask; after all, truth tables give us an algorithm that, given an arbitrary formula A of propositional logic, decides whether or not A is a tautology. Suppose that L is the first-order language with constant symbol c and 2-ary relation symbol R. Here are some relatively simple formulas of L on which the algorithm must decide: • ∀xR(x, c) → R(y, c); • ∃x∀yR(x, y) → ∀y∃xR(x, y);
• ∀y∃xR(x, y) → ∃x∀yR(x, y); • ¬∃y∀x[R(y, x) ↔ ¬R(x, x)].
The existence of an algorithm that decides logical validity would have far-reaching consequences for mathematics. To see why this is so, consider a mathematical theory T that has just a finite number of axioms (for example, linear algebra). Introduce an appropriate firstorder language L and let A1 , . . . , An be sentences of L that are the
98
Part 2: Computability Theory
translations of the axioms of T. Then for any sentence B of L: (A1 ∧ . . . ∧ An ) → B is logically valid ⇔ {A1 , . . . , An } B. (This is a consequence of the Completeness Theorem for First-Order Logic and the Deduction Theorem.) From this it follows that an algorithm for logical validity gives us a purely mechanical procedure for deciding whether or not there is a formal proof of the sentence B in firstorder logic using the non-logical axioms A1 , . . . , An ! During the early 1930s, most mathematicians were skeptical that such an algorithm exists, but confessed that they had no proof. Here are comments by two prominent mathematicians of the period. •
von Neumann (On Hilbert’s proof theory, 1927) “It appears thus that there is no way of finding the general criterion for deciding whether or not a well-formed formula A is provable. (We cannot, however, at the moment demonstrate this. Indeed, we have no clue as to how such a proof of undecidability would go). . . . The very day on which the undecidability would cease to exist, so would mathematics as we now understand it; it would be replaced by an absolutely mechanical prescription, by means of which anyone could decide the provability or unprovability of any given sentence. . . . Thus we have to take the position: it is generally undecidable, whether a given well-formed formula is provable or not. The only thing we can do is . . . to construct an arbitrary number of provable formulae. In this way, we can establish for many well-formed formulae that they are provable. But in this way we never succeed to establish that a well-formed formula is not provable.” (Note: Here von Neumann is referring to the fact that there is an algorithm for listing all of the logically valid formulas of L.) • Hardy (Mathematical Proof, Mind 38, 1929, p. 16) “Suppose, for example, that we could find a finite system of rules which enabled us to say whether any given formula was demonstrable or not. This system would embody a theorem of metamathematics. There is of course no such theorem, and this is very fortunate, since if there were we should have a mechanical set of rules for the solution of all mathematical problems, and our activities as mathematicians would come to an end.” In 1936 Church and Turing independently solved Hilbert’s Decision Problem by proving there is a first-order language L for which the
99
Chapter 4: Overview of Computability
decision problem is unsolvable (in other words, no algorithm exists). In Chapter 5 we will give Turing’s proof of the non-existence of the required algorithm. The languages for Church’s proof and for Turing’s proof are complicated in the sense that the number of non-logical symbols is fairly large. We now know that no algorithm exists even for such simple languages as these: the language has just one 2-ary relation symbol (say R); the language has just one 2-ary function symbol (say F); and the language has just two 1-ary function symbols (say F and G). Our final example of an unsolvable problem is rather natural, given the ubiquity of computers in modern society. The problem was actually raised by Turing as a first step in his solution of Hilbert’s Decision Problem. Turing’s Halting Problem (modern version). Fix a computer programming language, for example, BASIC. Let P be the collection of all BASIC programs with the following property: the input for each program is an arbitrary natural number. Find an algorithm that, given any program P in P and any natural number a, decides (YES or NO) whether program P with input a halts or continues forever. In Chapter 5 we will prove the following result due to Turing (1936): There is no such algorithm. What constitutes a decision problem? The general setting is as follows. We have a countably infinite set of inputs, say X = {x0 , x1 , x2 , . . .} = {xn : n ∈ N} and a subset A of X (i.e., A ⊆ X).
xn •
X
Input: xn ∈ X.
A
Question: Is xn ∈ A?
100
Part 2: Computability Theory
The decision problem A, X asks: Is there an algorithm that, given an arbitrary element xn of X, decides (YES or NO) whether xn ∈ A? If there is such an algorithm, we say that the decision problem A, X is solvable; on the other hand, if no such algorithm exists, we say that the decision problem A, X is unsolvable. We can view a decision problem as follows, we want a finite list of instructions that will answer an infinite number of YES/NO questions: Is x0 ∈ A? Is x1 ∈ A? Is x2 ∈ A? . . ., and so on. Each of the above (Hilbert’s Tenth Problem, certain Word Problems, Hilbert’s Decision Problem, the Halting Problem) is a decision problem that is unsolvable. On the other hand, let FOR = all formulas of propositional logic and TAUT = all tautologies of propositional logic. The decision problem TAUT, FOR is solvable (use truth tables).
Algorithms Algorithms play a fundamental role in mathematics, computer science, and modern linguistics. They also occupy a central position in mathematical logic. Indeed, one of the great achievements of mathematical logic during the twentieth century was the clarification of the notion of an algorithm. Incidentally, the word algorithm derives (by way of Latin) from Al-Khowârizmi, the name of a ninth century Arab mathematician who wrote a book on the step-by-step rules for addition, subtraction, multiplication, and division. Let us begin with an informal definition of an algorithm. Definition 1 (informal). An algorithm consists of a finite list of instructions. These instructions give a purely mechanical, step-by-step procedure for solving a problem (for example, performing calculations or answering YES-NO questions). We will use and to denote algorithms. We emphasize that the construction of an algorithm is a creative process and often requires considerable ingenuity; on the other hand, the implementation of an algorithm is purely mechanical. A carefully written computer program satisfies all of our requirements for an algorithm. Examples of well-known algorithms from number theory include the Euclidean algorithm and the division algorithm. Below are the details of several examples. Warning: These algorithms are not designed to be efficient; rather, the emphasis is on getting the job done. Example 1 (an algorithm for detecting palindromes). Let be a nonempty but finite set of symbols. Recall that a palindrome on is an expression on that is the same, symbol by symbol, when read backward or forward. As an example, abcba and cccc are palindromes
Chapter 4: Overview of Computability
101
on = {a, b, c}. The following algorithm decides (YES or NO in a finite number of steps) if an arbitrary expression on is a palindrome. (1) (2) (3) (4) (5) (6) (7)
Input an expression E of . Is the number of symbols in E ≤ 1? If so, print YES and halt. Are the first and last symbols of E the same? If not, print NO and halt. Erase the first and last symbols of E. Go to instruction 2.
Example 2 (an algorithm for tautologies). The following algorithm decides (YES or NO) if an arbitrary formula A of propositional logic is a tautology. (1) (2) (3) (4)
Input a formula A of propositional logic. Write out the complete truth table for A. Does the last column of this table have all T’s? If so, print YES and halt; otherwise, print NO and halt.
Example 3 (an algorithm for primes). We write an algorithm that, given a natural number c, decides (YES or NO in a finite number of steps) whether c is prime. Recall the following: a positive integer c is a prime if and only if c > 1 and c cannot be written as the product of two numbers, both greater than 1. (1) Input a natural number c. (2) If c = 0 or c = 1, print NO and halt. (3) Calculate all products i × j, where i and j are integers with 1 < i < c and 1 < j < c (the number of such products is finite). Is there a pair i, j such that i × j = c? (4) If so, print NO and halt; otherwise print YES and halt. There are several tacit assumptions in instruction 3: we have a mechanical procedure for listing all integers i and j such that 1 < i, j < c; we have an algorithm for calculating i × j; and we have an algorithm for deciding if i × j = c. Example 4. Euclid proved that the number of primes is infinite. Let π : N → N be the function that lists the primes in increasing order. Thus π (0) = 2, π (1) = 3, π (2) = 5, and so on. We write an algorithm that computes π . Let be an algorithm that, given a number c ∈ N, decides (YES or NO) whether c is a prime (see Example 3 above). The variables k and x are used as follows: 0 ≤ k ≤ n and x = π (k).
102
Part 2: Computability Theory
Input n ∈ N. Set k = 0 and x = 2. Is k = n? If so, print x and halt. Add 1 to k. Use the algorithm to find the smallest prime c such that x < c. (6) Let x = c. (7) Go to instruction 3.
(1) (2) (3) (4) (5)
Note: Since the number of primes is infinite, instruction (5) always terminates.
This algorithm illustrates the important idea that a known algorithm can be used to construct a new algorithm. The counterpart of this idea in programming is the use of subroutines. Ideally, an algorithm should halt for every input. However, this may not be so easy to decide! Consider, for example, the following algorithm that implements the 3x + 1 problem (also known as the Collatz conjecture). Example 5 (the 3x + 1 problem, or the Collatz conjecture). (1) Input an integer x ≥ 1. (2) Is x = 1? If so, print YES and halt. (3) If x is even, replace x with x/2; if x is odd, replace x with (3x + 1)/2. (4) Go to instruction 2. The Collatz conjecture states that for every x > 1, a finite number of applications of instruction 3 reduce the value of x to 1; in other words, the algorithm halts for every input x ≥ 1. For example, for the inputs x = 5 and x = 11 we have: x = 5, 8, 4, 2, 1; x = 11, 17, 26, 13, 20, 10, 5, 8, 4, 2, 1. Despite considerable effort, the 3x + 1 problem is still unsolved, and therefore we cannot be certain that the algorithm halts for every input. Although the notion of an algorithm is not precisely defined, we can make a list of some of the properties of an algorithm. •
Finite list of instructions An algorithm consists of a finite list of instructions, say I1 , . . . , It . We require that each instruction
Chapter 4: Overview of Computability
•
• •
•
103
be stated in a clear, precise manner so that a machine can execute the instruction. In particular, an instruction should require no ingenuity or judgment on the part of the agent that executes the instruction. Step-by-step execution An algorithm proceeds step-by-step according to the instructions. Thus there is a first step, a second step, and so on. Inputs For most of the algorithms that we study, a typical input will be a natural number or a finite list of natural numbers. Finiteness For each set of legal inputs, an algorithm halts after a finite number of steps and gives an output. This output may be the answer to a YES/NO question or the result of a computation. (Later we will relax the requirement that an algorithm must halt for every input.) Deterministic Each step of an algorithm is uniquely determined. Thus, if an algorithm is executed more than once with the same input(s), the output will be the same.
Disclaimer: In the examples that we have given thus far, we have not always strictly adhered to the requirement that an instruction can be carried out by a machine; in many cases more detail would be required. However, in these examples we are primarily interested in capturing the spirit of an algorithm. Example 6. We will also consider algorithms that list, or print out, all of the elements (and only the elements) of some infinite set. Let be the algorithm in Example 3 that decides whether or not an input c ∈ N is prime. We use to construct another algorithm that lists all the primes in increasing order; since there are infinitely many primes, this algorithm continues forever. (1) (2) (3) (4) (5)
Set c = 2. Use the algorithm to decide if c is prime. If c is prime, print c. Add 1 to c. Go to instruction 2.
Let us summarize the situation with respect to algorithms. The notion of an algorithm is informal in the sense that it is not precisely defined. This is not a serious drawback if we want to show that a certain decision problem is solvable; in this case we describe an algorithm and then show that it works. The difficulty arises when we want to prove that a decision problem is unsolvable, for in this case we must prove the non-existence of an algorithm, and to do this we need a precise
104
Part 2: Computability Theory
definition. So we have isolated the following fundamental problem: give a precise definition of an algorithm.
Decision Problems in the Framework of N; Gödel Numbering Recall the idea of the decision problem A ⊆ N: Find an algorithm that, given an arbitrary natural number c, decides (YES or NO in a finite number of steps) whether c ∈ A. Thus, for every subset A of N, we have the decision problem A, N. For example, we have: A = set of even numbers = {c : c ∈ N and c is even}; A = set of prime numbers = {c : c ∈ N and c is prime}; • A = {c : c ∈ N, c > 0, and there exist a, b > 0 such that c3 = a3 + b3 }. • •
Each of these three decision problems is solvable. On the other hand, a Cantor diagonal argument proves the existence of a subset A of N for which the decision problem is unsolvable. In Chapter 5 we will construct an explicit example of such a set. Theorem 1 (existence of an unsolvable decision problem). There exists B ⊆ N such that the decision problem B, N is unsolvable. Proof. Let us assume that all algorithms are written in the English language. Thus, the number of possible algorithms is countably infinite. It follows that the number of sets A for which the decision problem A, N is solvable is countable, say A0 , A1 , . . ., An , . . . . We now construct a set B ⊆ N that is not in this list. Here is the basic idea. Let n ∈ N. To ensure that B = An , do the following: if n is in An , do not put n in B; if n is not in An , put n in B. More precisely, let B = {n ∈ N : n ∈ / An }
By a procedure called Gödel numbering, decision problems of the form A, X can be reduced to decision problems of the form B, N. A Gödel numbering for X is obtained by proving: Theorem 2 (Gödel numbering of X). Let X be a countably infinite set. Then there is a one-to-one function # : X → N such that the following two conditions hold. Gö1. There is an algorithm that, given a ∈ X, computes #(a); the number #(a) is called the code of a.
Chapter 4: Overview of Computability
105
Gö2. There is an algorithm that, given e ∈ N, decides (YES or NO) if e is the code of some element of X. If so, the algorithm finds a ∈ X such that #(a) = e. Once a Gödel numbering is available, we then have the following reduction procedure: Theorem 3 (reduction). Let A, X be a decision problem and let # : X → N be a Gödel numbering of X with the two algorithms Gö1 and Gö2. The following are equivalent: (1) the decision problem A, X is solvable; (2) the decision problem B, N is solvable, where B = #(A) = {#(a) : a ∈ A}. See Section 5.3 on the Halting Problem for a further discussion of these ideas. To summarize, Gödel numbering allows us to focus on the following problem: give a precise definition of an algorithm within the framework of the natural number system N.
Exercises on 4.1 1.
Implement the algorithm for detecting palindromes on each of the following inputs (ignore blank spaces). (a) abbcabba (b) was it a rat i saw (c) live not on evil, madam, live not on evil
2.
For each of the following formulas, use truth tables to decide (YES or NO) whether the formula is a tautology. (a) [(A → B) → A] → A; (b) [(A → (B → C)] → [(A ∧ B) → C]; (c) [(A → B) → C] → [(A ∧ B) → C].
3.
For each of the following polynomials p(x1 , . . . , xn ), decide (YES or NO) if there are natural numbers a1 , . . . , an such that p(a1 , . . . , an ) = 0. (a) (b) (c) (d)
p(x) = 2x2 − 13x + 6; p(x, y) = x2 + y2 − 14; p(x, y, z) = 4xyz2 − 20yz − 17; p(x, y, z) = (x + 1)2 + (y + 1)2 − (z + 1)2 .
106
Part 2: Computability Theory
For more examples, see p. 790 of Wolfram’s A New Kind of Science, Wolfram Media, Inc., Champaign, IL, 2002. 4.
Let p(x) = an xn + . . . + a1 x + a0 , where a0 , . . . , an ∈ Z. Write an algorithm that, given integers a0 , . . . , an , decides (YES or NO) whether there exists c ∈ N such that p(c) = 0. Hint: If c ∈ N satisfies an cn + · · · + a1 c + a0 = 0, then c is a divisor of a0 .
5.
For each of the following formulas, decide (YES or NO) if the formula is logically valid. (a) ∀xR(x, c) → R(y, c); (b) ∃x∀yR(x, y) → ∀y∃xR(x, y); (c) ∀y∃xR(x, y) → ∃x∀yR(x, y); (d) ¬∃y∀x[R(y, x) ↔ ¬R(x, x)].
6.
Implement the algorithm for the 3x + 1 problem with input x = 39. List all calculations.
7.
In the Tower of Hanoi puzzle, we are given three pegs A, B, C and n rings R1 , . . . , Rn of increasing size (say ring Rk has diameter k, 1 ≤ k ≤ n). The n rings are originally on peg A. There are two rules: a larger ring can never be on top of a smaller ring; and rings are moved one at a time from one peg to another peg (pictures available on the Internet). (a) Suppose n = 3. Write an algorithm that moves the three rings from peg A to peg C. The algorithm should have 7 instructions, each of the form: move ring ___to peg ___. (b) Repeat part (a) with n = 4 (use 15 instructions). (c) Prove that for all n ≥ 1, there is an algorithm with 2n − 1 instructions (or steps) that moves n rings from one peg to another peg. Hint: Induction.
8.
Write an algorithm for the following decision problem: The algorithm accepts as inputs integers a, b, c (assume a = 0) and decides (YES or NO) whether or not the quadratic equation ax2 + bx + c = 0 has two distinct real solutions.
9.
The division algorithm states that given a, b ∈ N with b > 0, there exist unique q, r ∈ N such that a = bq + r and 0 ≤ r < b. The proof chooses the smallest x ≥ 1 such that a < bx. Then q = x − 1 and r = a − bq.
107
Chapter 4: Overview of Computability
Write an algorithm that accepts as inputs a, b ∈ N with b > 0 and outputs q and r. 10.
Criticize the following definition: Let X be countably infinite. The decision problem A, X is solvable if for all x ∈ X, there exists an algorithm that decides (YES or NO) if x ∈ A.
11.
(coding). We give a simple example of coding, or Gödel numbering. Let E be the collection of all expressions on the symbol set {a, b}. Assign symbol numbers to a and b as follows: SN(a) = 1, SN(b) = 2. Now define a function # : E → N as follows: if s1 . . . sn is an expression on {a, b}, then SN(s1 )
#(s1 . . . sn ) = p1
× . . . × pnSN(sn ) ,
where p1 , . . . , pn are the first n primes in increasing order. For example, #(abb) = 21 32 52 . The number #(s1 . . . sn ) is called the code of the expression s1 . . . sn . (a) Find # (baab). (b) Is there an expression s1 . . . sn such that #(s1 . . . sn ) = 100? If so, find the expression. (c) Is there an expression s1 . . . sn such that #(s1 . . . sn ) = 1260? If so, find the expression. (d) Use the Unique Factorization Theorem to show that the function # is one-to-one. (e) Write an algorithm that, given an arbitrary expression E on {a, b}, calculates #(E). (f) Write an algorithm that, given an arbitrary integer a ∈ N, decides (YES or NO) whether there is an expression E such that #(E) = a. Moreover, if YES, the algorithm finds the expression E such that #(E) = a. 12.
Prove Theorem 3.
4.2 Three Informal Concepts We have isolated the following fundamental problem: give a precise definition of an algorithm within the framework of N, the set of natural numbers. We begin by introducing three informal concepts: computable function, decidable relation, and semi-decidable relation. Each is defined in terms of an algorithm with inputs from N.
108
Part 2: Computability Theory
Computable Functions Recall that N = {0, 1, 2, . . .} ; for each n ≥ 1 let Nn = {a1 , . . . , an : ak ∈ N for 1 ≤ k ≤ n}. We will often use a to denote the element a1 , . . . , an of Nn ; thus, a = a1 , . . . , an . Definition 1. Let n ≥ 1. An n-ary function is a function F: Nn → N. We emphasize that the domain of F is Nn and that the values of F are in N. As a general rule, we use F, G, and H to denote n-ary functions. Most of the functions we consider are either 1-ary (F: N → N) or 2-ary (F: N×N → N). Here are some important examples. The successor function S: N → N, defined by S(a) = a + 1. The decrease function D: N → N, defined by a − 1 if a > 0; D(a) = 0 if a = 0. • The two decision functions sg: N → N and csg: N → N, defined by 1 if a = 0; 0 if a = 0; and csg(a) = sg(a) = 0 if a = 0. 1 if a = 0. • •
Addition and multiplication are denoted by + : N × N → N and × : N × N → N. However, we usually write a + b and a × b instead of the proper + (a, b) and × (a, b). a − b if a ≥ b; . • Proper subtraction, defined by a −b = 0 if a < b. • The n-ary projection functions Un,k (n ≥ 1, l ≤ k ≤ n), defined by Un,k (a1 , . . . , an ) = ak . •
Definition 2 (informal). An n-ary function F: Nn → N is computable, or effectively calculable, if there is an algorithm that accepts as inputs any sequence a1 , . . . , an of n natural numbers and then halts after a finite number of steps with output F(a1 , . . . , an ). More succinctly, given a = a1 , . . . , an in Nn , the algorithm calculates the number F(a) in a finite number of steps. We have labeled this an informal definition since it depends on the notion of an algorithm, itself not precisely defined. A main goal of computability theory is to give a precise meaning to Definition 2; in this way, we indirectly make the concept of an algorithm precise. Addition and multiplication, as well as the successor and the decrease functions, are all examples of computable functions. Indeed, it is not
Chapter 4: Overview of Computability
109
unreasonable to claim that every n-ary function ordinarily encountered in mathematics is computable; this claim is made on the following basis: a description of the function suggests an algorithm for computing the function. One way to prove that a given n-ary function F is computable is to write a computer program with the following property: the program accepts as inputs a sequence a1 , . . . , an of natural numbers and then halts after a finite number of steps with output F(a1 , . . . , an ). Example 1. Let FT: N → N be the factorial function, defined by FT(n) = n!. The algorithm below shows that F is computable. The key to the algorithm is: 0! = 1; (n + 1)! = (n + 1) × n!. The variables k and x are used as follows: 0 ≤ k ≤ n, x = k!. (1) (2) (3) (4) (5)
Input n ∈ N. Set k = 0 and x = 1. If k = n, print x and halt. Add 1 to k and then replace x with k × x. Go to instruction 3.
Example 2. The function π : N → N that lists the primes in increasing order (π (0) = 2, π (1) = 3, π (2) = 5, and so on) is computable; see Example 4 in 4.1 for the required algorithm. Although it is difficult to give an explicit example of a noncomputable function, we can use a Cantor diagonal argument to prove the existence of such functions. Theorem 1 (existence of a non-computable function). There is a 1-ary function that is not computable. Proof. We assume that all algorithms are written in the English language and thus the number of possible algorithms is countably infinite. It follows that the number of 1-ary computable functions is countable and therefore can be listed as F0 , F1 , . . . , Fn , . . . . We now define a 1-ary function G that is not in this list; it follows that G is not computable as required. Let G(n) = Fn (n) + 1.
Operations with Computable Functions In this section we will discuss operations on n-ary functions that preserve computability. These ideas will be important later when we discuss the class of RM-computable functions (a machine model of computability) and the class of recursive functions (a mathematical model of
110
Part 2: Computability Theory
computability). The operations are: composition, primitive recursion, and minimalization (of a regular function). Composition. Let G be a k-ary function and let H1 , . . . , Hk be n-ary functions. The composition of G, H1 , . . ., Hk is the n-ary function F defined by F(a1 , . . . , an ) = G(H1 (a1 , . . . , an ), . . . , Hk (a1 , . . . , an )) or F(a) = G(H1 (a), . . . , Hk (a)), where a = a1 , . . . , an . For example, let G(a, b) = ab , H1 (a, b, c) = a + b + c + 1, and H2 (a, b, c) = a×b×c. The composition of G, H1 , H2 is the 3-ary function F(a, b, c) = (a + b + c + 1)a×b×c . Theorem 2. Let F be the composition of G, H1 , . . . , Hk . If each of G, H1 , . . . , Hk is computable, then F is also computable. Proof. The basic idea is this: if we have algorithms (or subroutines) for computing each of G and H1 , . . . , Hk , then we can write an algorithm (or a program) that computes F. Let 1 , . . . , k be algorithms that compute H1 , . . . , Hk , respectively, and let ψ be an algorithm that computes G; the following is an algorithm that computes F. (1) (2) (3) (4)
Input a1 , . . . , an ∈ N and let a = a1 , . . . , an . For 1 ≤ j ≤ k, use the algorithm j to compute H j (a). Use the algorithm ψ to compute G(H1 (a), . . . , Hk (a)). Print G(H1 (a), . . . , Hk (a)) and halt.
Before defining the operation of primitive recursion, let us illustrate the idea with a familiar function that is defined recursively: F(a, b) = ab , defined by a0 = 1 and ab+1 = ab × a. Primitive Recursion. Given an n-ary function G and an n+2-ary function H, one can prove that there is a unique n+1-ary function F such that for all a1 , . . . , an , b ∈ N: F(a1 , . . . , an , 0) = G(a1 , . . . , an ) F(a1 , . . . , an , b + 1) = H(a1 , . . . , an , b, F(a1 , . . . , an , b)).
More succinctly:

F(a, 0) = G(a);
F(a, b + 1) = H(a, b, F(a, b)),

where a = a1, . . . , an. We say that F is obtained from G and H by the operation of primitive recursion. As an example, suppose that the 2-ary function F is obtained from the 1-ary function G and the 3-ary function H by primitive recursion. Then F(a, b) is calculated as follows.

F(a, 0) = G(a)
F(a, 1) = H(a, 0, F(a, 0))
F(a, 2) = H(a, 1, F(a, 1))
F(a, 3) = H(a, 2, F(a, 2)),

and so on. An important special case of primitive recursion is the following:

F(0) = k (a constant);
F(n + 1) = H(n, F(n)).

Theorem 3. Let F be obtained from the n-ary function G and the n+2-ary function H by the operation of primitive recursion. If both G and H are computable, then F is computable.

Proof. Let φ be an algorithm that computes G and let ψ be an algorithm that computes H. We write an algorithm that computes F; the variables k and x are used as follows: 0 ≤ k ≤ b, x = F(a1, . . . , an, k).

(1) Input a1, . . . , an, b ∈ N and let a = a1, . . . , an.
(2) Use the algorithm φ to compute F(a, 0) = G(a).
(3) Set k = 0, x = F(a, 0).
(4) If k = b, print x and halt.
(5) Use the algorithm ψ to compute F(a, k + 1) = H(a, k, x).
(6) Let x = F(a, k + 1) and add 1 to k.
(7) Go to instruction 4.
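The same seven steps can be phrased as a Python sketch that manufactures F from G and H (the names below are ours):

def primitive_recursion(G, H):
    """Return F with F(a, 0) = G(a) and F(a, b + 1) = H(a, b, F(a, b))."""
    def F(*args):
        *a, b = args
        x = G(*a)                 # step (2): F(a, 0) = G(a)
        for k in range(b):        # steps (4)-(7): climb from F(a, k) ...
            x = H(*a, k, x)       # ... to F(a, k + 1) = H(a, k, F(a, k))
        return x
    return F

# Exponentiation obtained from G(a) = 1 and H(a, b, x) = x * a.
power = primitive_recursion(lambda a: 1, lambda a, b, x: x * a)
assert power(2, 10) == 1024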
Example 3. Let L be a 2-ary computable function and let F be the 2-ary function defined by F(a, b) = L(a, 0) + 2L(a, 1) + . . . + (b + 1)L(a, b).
It is not difficult to write an algorithm that computes F. However, we can also show that F is computable by finding computable functions G and H such that F is obtained from G and H by primitive recursion. Let G(a) = L(a, 0); clearly G is computable. It remains to find H. We have:

F(a, b + 1) = L(a, 0) + 2L(a, 1) + . . . + (b + 1)L(a, b) + (b + 2)L(a, b + 1)
            = F(a, b) + (b + 2)L(a, b + 1).

On the other hand, H satisfies F(a, b + 1) = H(a, b, F(a, b)). Thus, H(a, b, x) = x + (b + 2)L(a, b + 1) is the required 3-ary computable function (verify by writing an algorithm).

Minimalization (of a regular function). An n+1-ary function G is regular provided the following condition is satisfied: given a1, . . . , an ∈ N, there exists b ∈ N such that G(a1, . . . , an, b) = 0. The minimalization of an n+1-ary regular function G is the n-ary function F defined by F(a1, . . . , an) = µb[G(a1, . . . , an, b) = 0] or F(a) = µb[G(a, b) = 0], where µb[. . .] means the smallest b ∈ N such that [. . .] holds. Here are some examples.

• + is not regular;
• ∸ (proper subtraction) is regular, and moreover F(a) = µb[a ∸ b = 0] is the identity function F(a) = a;
• × is regular, and moreover F(a) = µb[a × b = 0] is the zero function F(a) = 0.
Theorem 4. Let F be obtained by minimalization from the n+1-ary regular function G. If G is computable, then F is computable.

Proof. Let φ be an algorithm that computes G. The following algorithm computes the function F defined by F(a) = µb[G(a, b) = 0].
(1) Input a1, . . . , an ∈ N and let a = a1, . . . , an.
(2) Set b = 0.
(3) Use the algorithm φ to compute G(a, b).
(4) Is G(a, b) = 0? If so, print b and halt.
(5) Add 1 to b.
(6) Go to instruction 3.

This algorithm systematically searches for the smallest (= first) b such that G(a, b) = 0; since G is regular, b exists and therefore the algorithm halts.
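In Python the search reads as follows; termination is exactly the regularity hypothesis, so this sketch loops forever if G is not regular.

def minimalize(G):
    """mu-operator: F(a) = the least b with G(a, b) = 0 (G assumed regular)."""
    def F(*a):
        b = 0                     # step (2)
        while G(*a, b) != 0:      # steps (3)-(4)
            b += 1                # steps (5)-(6)
        return b
    return F

# The second bullet above: minimalizing proper subtraction gives the identity.
monus = lambda a, b: a - b if a >= b else 0
identity = minimalize(monus)
assert identity(7) == 7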
Decidable Relations

Definition 3. An n-ary relation R is a subset of N^n. Two special cases are: R ⊆ N (1-ary relation); R ⊆ N × N (2-ary relation). We will use the following notation. For an n-ary relation R and a1, . . . , an ∈ N:

R(a1, . . . , an) for (a1, . . . , an) ∈ R (we say: a1, . . . , an satisfies the relation R);
¬R(a1, . . . , an) for (a1, . . . , an) ∉ R (we say: a1, . . . , an does not satisfy the relation R).

In particular, for a 1-ary relation Q and a 2-ary relation R we have:

Q(a) for a ∈ Q;  ¬Q(a) for a ∉ Q;
R(a, b) for (a, b) ∈ R;  ¬R(a, b) for (a, b) ∉ R.
Here are some examples that also illustrate a convenient notation for introducing relations.

• PR(a) ⇔ a is prime;
• ODD(a) ⇔ a is odd;
• PT(a, b, c) ⇔ a² + b² = c²;
• TP(a, b) ⇔ both a and b are primes and b = a + 2 (a and b are twin primes).
PR and ODD are 1-ary relations; PR is the subset of N consisting of the prime numbers and ODD is the subset of N consisting of the odd numbers. We have: PR(5), ¬PR(6), ODD(17), ¬ODD(200). The relation PT is a 3-ary relation, and a 3-tuple that satisfies PT is called a Pythagorean triple; for example, PT(3, 4, 5), PT(5, 12, 13). However, ¬PT(2, 7, 8). The relation TP is a 2-ary relation on N; note that TP(3, 5) and TP(11, 13) but ¬TP(7, 9). It is not known if there are infinitely many pairs a, b such that TP(a, b). In the case of a 2-ary relation R (also
called a binary relation), we usually write a R b rather than R(a, b). For example, = and < are 2-ary relations, and we write 3 < 5 rather than <(3, 5) and 4 ≠ 7 rather than ¬=(4, 7).

Definition 4 (informal). An n-ary relation R is decidable if there is an algorithm with the following property: it accepts as inputs any sequence a1, . . . , an of n natural numbers and after a finite number of steps halts with a YES or NO answer as follows: YES if R(a1, . . . , an); NO if ¬R(a1, . . . , an). Again we have labeled this as an informal definition since it depends on the notion of an algorithm, itself not precisely defined.

The relations PR, ODD, PT, TP are all decidable, as are the 2-ary relations =, ≤, and ≥. On the other hand, by Theorem 1 in 4.1, there exist 1-ary relations that are not decidable.

Theorem 5 (algebra of decidable relations). Let Q and R be n-ary decidable relations. Then each of the following n-ary relations is also decidable:

(1) ¬Q, defined by ¬Q(a1, . . . , an) ⇔ not Q(a1, . . . , an);
(2) Q ∨ R, defined by (Q ∨ R)(a1, . . . , an) ⇔ Q(a1, . . . , an) or R(a1, . . . , an);
(3) Q ∧ R, defined by (Q ∧ R)(a1, . . . , an) ⇔ Q(a1, . . . , an) and R(a1, . . . , an).

Proof of (3). Let φ and ψ be algorithms that decide Q and R, respectively. The following algorithm decides Q ∧ R:

(1) Input a = a1, . . . , an ∈ N^n.
(2) Use φ to decide if Q(a).
(3) If not, print NO and halt.
(4) Use ψ to decide if R(a).
(5) If not, print NO and halt.
(6) Print YES and halt.

To illustrate, we can use the algebra of decidable relations to show that TP is decidable as follows: TP(a, b) ⇔ PR(a) ∧ PR(b) ∧ (b = a + 2).
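The TP example can be checked mechanically. Below is a small Python sketch; is_prime is a hypothetical stand-in for a decider of PR (any correct primality test would do).

def is_prime(a: int) -> bool:
    # trial division up to sqrt(a); a decider for PR
    return a >= 2 and all(a % d != 0 for d in range(2, int(a ** 0.5) + 1))

def twin_primes(a: int, b: int) -> bool:
    # TP(a, b) <=> PR(a) and PR(b) and b = a + 2, as in the text
    return is_prime(a) and is_prime(b) and b == a + 2

assert twin_primes(11, 13) and not twin_primes(7, 9)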
Definition 5 (from relation to function). Let R be an n-ary relation. The characteristic function of R is the n-ary function KR defined by
KR(a1, . . . , an) = 0 if R(a1, . . . , an);
KR(a1, . . . , an) = 1 if ¬R(a1, . . . , an).
Idea: 0 = YES, 1 = NO.

Theorem 6. Let R be an n-ary relation. Then R is decidable if and only if KR is computable.

Proof of ⇒. Let φ be an algorithm that decides R; the following algorithm computes KR.

(1) Input a = a1, . . . , an ∈ N^n.
(2) Use φ to decide R(a).
(3) If YES, print 0; if NO, print 1. Then halt.

Theorem 7. Every finite subset of N is decidable.

Proof. Let A = {a1, . . . , an} be a finite subset of N. The following algorithm decides A.

(1) Input a ∈ N.
(2) Is there some k, 1 ≤ k ≤ n, such that a = ak?
(3) If so, print YES; otherwise, print NO. Then halt.

Since A is finite, this algorithm always halts with a YES or NO answer.

There is a subtle point here. How can we implement this algorithm if we do not know the elements of A? The answer is that the above proof actually describes an infinite number of algorithms, one for each nonempty finite subset of N. For any given A, one of these algorithms works; however, if we do not know A, then we do not know which algorithm to choose. To illustrate this point, let A = {a : 0 ≤ a ≤ 9 and a occurs infinitely often in the decimal expansion of π}. Clearly A is a nonempty set and A ⊆ {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}. However, at our present state of knowledge of the decimal expansion of π, we do not know the set A. Nevertheless there is some algorithm that decides A.
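The ⇒ direction of Theorem 6 is essentially one line of code: wrap the decider and emit 0 or 1. A Python sketch (names ours):

def characteristic(decider):
    def K(*a):
        return 0 if decider(*a) else 1   # step (3): 0 = YES, 1 = NO
    return K

K_odd = characteristic(lambda a: a % 2 == 1)   # the characteristic function of ODD
assert K_odd(17) == 0 and K_odd(200) == 1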
Semi-Decidable Relations

Definition 6. An n-ary relation R is semi-decidable if there is an algorithm that accepts as input any sequence a1, . . . , an of n natural numbers and has outcome as follows: (1) if R(a1, . . . , an), then the algorithm prints YES and halts; (2) if ¬R(a1, . . . , an), then the algorithm continues forever.

Why the interest in semi-decidability? First of all, it can be regarded as a consolation prize for decidability: If we are unable to show that a given relation is decidable, perhaps we can at least show that it is semi-decidable. Moreover, there is an interesting interplay between decidability and semi-decidability. For the moment let us step away from the setting of N and consider decision problems in general. It turns out that many decision problems that are unsolvable are easily seen to be "semi-solvable." Here are three examples.

Theorem 8 (Hilbert's Tenth Problem). There is an algorithm that, given an arbitrary polynomial p(x1, . . . , xn) with integer coefficients: (1) if p(x1, . . . , xn) = 0 has a solution in natural numbers, then the algorithm prints YES and halts; (2) if p(x1, . . . , xn) = 0 has no solution in natural numbers, then the algorithm continues forever.

Proof. The required algorithm is described informally as follows: Given p(x1, . . . , xn), systematically search for a1, . . . , an ∈ N such that p(a1, . . . , an) = 0. Here is the precise description.

(1) Input the polynomial p(x1, . . . , xn) and set c = 0.
(2) List all possible n-tuples a1, . . . , an of natural numbers such that a1 + · · · + an = c (there are only finitely many).
(3) Is there one such n-tuple a1, . . . , an such that p(a1, . . . , an) = 0?
(4) If so, p(x1, . . . , xn) = 0 has a solution in natural numbers; print YES and halt.
(5) If not, add 1 to c.
(6) Go to instruction 2.

Theorem 9 (Halting Problem). There is an algorithm that, given an arbitrary program P and a natural number a: (1) if P with input a halts, then the algorithm prints YES and halts;
(2) if P with input a does not halt, then the algorithm continues forever.

Proof. Given P and a, run the program P with input a; if and when it halts, print YES and halt.

Theorem 10 (Hilbert's Decision Problem). Let L be a first-order language. There is an algorithm that, given an arbitrary formula A of L: (1) if A is logically valid, then the algorithm prints YES and halts; (2) if A is not logically valid, then the algorithm continues forever.

Proof. We give the basic idea and ignore technical details (see p. 205 in [4]). By Gödel's Completeness Theorem, A is logically valid if and only if A is a theorem of first-order logic.

(1) Input a formula A of L.
(2) Let n = 1.
(3) Print out all n-line proofs. Is A the last formula of one of those proofs?
(4) If so, print YES and halt.
(5) Add 1 to n.
(6) Go to instruction 3.
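The search in Theorem 8 is easy to make concrete. The Python sketch below semi-decides whether a polynomial (passed as a callable p, our convention) has a root in the natural numbers; on a rootless polynomial it runs forever, exactly as the theorem allows.

from itertools import product

def semi_decide_root(p, n):
    c = 0
    while True:
        # step (2): the finitely many n-tuples with a1 + ... + an = c
        for t in product(range(c + 1), repeat=n):
            if sum(t) == c and p(*t) == 0:   # step (3)
                return t                     # step (4): print YES and halt
        c += 1                               # steps (5)-(6)

assert semi_decide_root(lambda x: x * x - 9, 1) == (3,)   # x^2 - 9 = 0 at x = 3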
The next result reinforces the importance of semi-decidable relations.

Theorem 11. Let R be an n-ary relation. The following are equivalent: (1) R is decidable; (2) both R and ¬R are semi-decidable.

Proof. To prove (1) ⇒ (2), let φ be an algorithm that decides R, and let us write an algorithm that semi-decides ¬R.

(1) Input a = a1, . . . , an ∈ N^n.
(2) Use φ to decide R(a).
(3) If ¬R(a), print YES and halt.
(4) Go to instruction 4.

(Note that if R(a), the algorithm loops forever at instruction 4, as a semi-decider for ¬R must.)
To prove (2) ⇒ (1), let φ and ψ be algorithms that semi-decide R and ¬R, respectively. We use a technique called dove-tailing to write an
algorithm that decides R. The variable k is used to count the number of steps in the execution of an algorithm.

(1) Input a = a1, . . . , an ∈ N^n.
(2) Set k = 1.
(3) Does the algorithm φ with input a halt in k steps? If so, print YES and halt.
(4) Does the algorithm ψ with input a halt in k steps? If so, print NO and halt.
(5) Add 1 to k.
(6) Go to instruction 3.

For any a ∈ N^n, either R(a) or ¬R(a); thus one of the algorithms φ or ψ with input a halts after k steps for some k and therefore this algorithm always halts.

Corollary 1. There is a 1-ary relation that is not semi-decidable.

Proof. By Theorem 1 in 4.1, there is a 1-ary relation R that is not decidable. By Theorem 11, one of R, ¬R is not semi-decidable.

Here is another important relationship between semi-decidability and decidability.

Theorem 12. Let R be an n-ary relation. The following are equivalent: (1) R is semi-decidable; (2) there is an n+1-ary decidable relation Q such that for all a1, . . . , an ∈ N:

R(a1, . . . , an) ⇔ ∃bQ(a1, . . . , an, b)

(here ∃b means: there exists b ∈ N).

Proof. To prove (1) ⇒ (2), let φ be an algorithm that semi-decides R. Define an n+1-ary relation Q by Q(a, b) ⇔ the algorithm φ with input a = a1, . . . , an halts in ≤ b steps (and prints YES). We leave it to the reader to check that Q is a decidable relation and that R(a) ⇔ ∃bQ(a, b) for all a ∈ N^n. To prove (2) ⇒ (1), assume that there is an n+1-ary decidable relation Q such that R(a) ⇔ ∃bQ(a, b) for all a ∈ N^n. Let φ be an algorithm that decides Q; we write an algorithm that semi-decides R.
(1) Input a = a1, . . . , an ∈ N^n.
(2) Set b = 0.
(3) Does the algorithm φ with inputs a, b halt with output YES?
(4) If so, print YES and halt.
(5) Add 1 to b.
(6) Go to instruction 3.
Note that this algorithm systematically searches for the smallest b for which Q(a, b) holds. If there is no such b, then the algorithm continues forever.

Theorem 13 (algebra of semi-decidable relations). Let Q and R be n-ary semi-decidable relations. Then both Q ∨ R and Q ∧ R are semi-decidable.

Definition 7. Let F be an n-ary function. The graph of F is the n+1-ary relation GF defined by GF(a1, . . . , an, b) ⇔ F(a1, . . . , an) = b. We will use this definition to establish a relationship between computability and semi-decidability.

Theorem 14 (from function to relation). Let F be an n-ary function. The following are equivalent: (1) F is computable; (2) GF is decidable; (3) GF is semi-decidable.

Proof. We prove (3) ⇒ (1) and leave (1) ⇒ (2) and (2) ⇒ (3) to the reader. Let φ be an algorithm that semi-decides GF, and let us write an algorithm that computes F. The required algorithm is motivated as follows. Since F is a function, given a = a1, . . . , an ∈ N^n, there is a unique b such that F(a) = b, in which case GF(a, b). In addition, the algorithm φ with inputs a, b halts in k steps for some k ≥ 1. So the strategy is to systematically search for the smallest x ≥ 1 such that b ≤ x and also k ≤ x.

(1) Input a = a1, . . . , an ∈ N^n.
(2) Set x = 1.
(3) Is there some b ≤ x such that φ with inputs a, b halts in ≤ x steps with output YES?
(4) If so, print b and halt.
(5) Add 1 to x.
(6) Go to instruction 3.
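Here is a Python sketch of this dove-tailed search. Since an informal "algorithm that may not halt" has no direct Python analogue, we model φ by a hypothetical predicate accepts_within(a..., b, x), meaning "φ with inputs a, b halts with YES in at most x steps" — precisely the decidable relation used in Theorem 12.

def compute_from_graph(accepts_within, *a):
    x = 1                                    # step (2)
    while True:
        for b in range(x + 1):               # step (3): some b <= x ...
            if accepts_within(*a, b, x):     # ... accepted within x steps
                return b                     # step (4)
        x += 1                               # steps (5)-(6)

# Toy graph of F(a) = 2a, pretending every YES takes exactly 3 steps.
toy = lambda a, b, x: b == 2 * a and x >= 3
assert compute_from_graph(toy, 5) == 10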
For 1-ary relations, semi-decidability can be characterized in terms of the following property.

Definition 8. A 1-ary relation R is listable if there is an algorithm that lists, or prints out, all elements of R (and only elements of R). Assuming R infinite, this algorithm continues forever. For example, the set PR of prime numbers is listable (see Example 6 in 4.1).

Theorem 15. Let R be a 1-ary relation on N with R infinite. Then the following are equivalent: (1) R is semi-decidable; (2) R is listable; (3) there is a computable function F : N → N such that R = {F(n) : n ∈ N} (F enumerates R).

Proof that (1) ⇒ (2). Assume that R is semi-decidable. Then there is a 2-ary decidable relation Q such that for all a ∈ N: R(a) ⇔ ∃bQ(a, b) ⇔ ∃b[KQ(a, b) = 0]. The following algorithm lists all elements of R and only elements of R.

(1) Let k = 1.
(2) Calculate KQ(a, b) for all a, b ≤ k; whenever KQ(a, b) = 0, print a.
(3) Add 1 to k.
(4) Go to instruction 2.

Proof that (2) ⇒ (3). Let φ be an algorithm that lists the elements of R in the order a0, a1, . . . , an, . . . . Define F : N → N by F(n) = an. Clearly R = {F(n) : n ∈ N}, and the following algorithm shows that F is computable.

(1) Input n ∈ N.
(2) Run algorithm φ until a0, . . . , an are listed.
(3) Print an and halt.
Proof that (3) ⇒ (1). Let F: N → N be a computable function such that R = {F(n): n ∈ N}. Then R(a) ⇔ ∃n[F(n) = a] and therefore R is semi-decidable.
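The (1) ⇒ (2) listing algorithm is easy to simulate once the forever-loop is truncated; the rounds parameter below is our addition for that purpose. Repeats in the output are harmless for listability.

def list_R(Q, rounds):
    listed = []
    for k in range(1, rounds + 1):       # step (1) and the outer loop
        for a in range(k + 1):           # step (2): all a, b <= k
            for b in range(k + 1):
                if Q(a, b):
                    listed.append(a)     # "print a"
    return listed

# R = the even numbers, with decidable witness relation Q(a, b) <=> a = 2b.
evens = list_R(lambda a, b: a == 2 * b, rounds=6)
assert set(evens) == {0, 2, 4, 6}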
Exercises on 4.2

1. The Fibonacci sequence is defined inductively as follows: F0 = F1 = 1; Fn+1 = Fn + Fn−1 for n ≥ 1. Thus we have: 1, 1, 2, 3, 5, 8, 13, . . . . Show that F(n) = Fn is a computable function. Hint: Use variables as follows: 2 ≤ k ≤ n, a = Fk−2, b = Fk−1, c = Fk.
2. Let F(a, b, c) = H1(a, b, c) × H2(a, b, c), where H1 and H2 are 3-ary computable functions.
(a) Show that F is computable by writing an algorithm.
(b) Use the operation of composition to show that F is computable.
3. Let F : N → N be defined inductively by F(0) = 2, F(a + 1) = 2^F(a). Thus F(1) = 2^2, F(2) = 2^(2^2), and so on. Assume that the function L(a) = 2^a is computable.
(a) Show that F is computable by writing an algorithm.
(b) Use the operation of primitive recursion to show that F is computable (identify the constant k and the 2-ary function H).
4. Let F be the 1-ary function defined by

F(n) = 0 if there are at least n consecutive 3's in the decimal expansion of π;
F(n) = 1 otherwise.

(a) Let F(k) = 1. Show that F(k + 1) = 1.
(b) Let F(k) = 0. Show that F(j) = 0 for all j < k.
(c) Show that F is computable.

5. Let R ⊆ N be finite. Show that Rc = {x : x ∈ N and x ∉ R} is decidable.
6. Let F : N → N be a 1-ary computable function, let Q be a 1-ary decidable relation, and define R by R(a) ⇔ ¬Q(F(a)). Let φ be an algorithm that computes F and let ψ be an algorithm that decides Q. Use φ and ψ to write an algorithm that decides R.
7. (Goldbach's conjecture) Let R = {a ∈ N : a even, a ≥ 4, a is the sum of two primes}. Show that R is decidable.
8. Let Q and R be 1-ary relations. Prove the following:
(a) KQ∨R = KQ × KR;
(b) KQ∧R = KQ + KR − (KQ × KR);
(c) K¬Q = csg ∘ KQ.
Recall that KP∧Q(a) = 0 if (P ∧ Q)(a) and KP∧Q(a) = 1 otherwise, and that (csg ∘ KQ)(a) = csg(KQ(a)).
9. Let H be a 2-ary function such that, given any 1-ary computable function F, there exists e ∈ N such that for all a ∈ N, F(a) = H(e, a). Show that H is not computable. Hint: Assume that H is computable; define a 1-ary function F by F(a) = H(a, a) + 1.
10. Let R be a 2-ary relation such that, given any 1-ary decidable relation Q, there exists e ∈ N such that for all a ∈ N, Q(a) ⇔ R(e, a). Show that R is not decidable. Hint: Assume that R is decidable; define a 1-ary relation Q by Q(a) ⇔ ¬R(a, a).
11. Let G : N → N be a function such that for every 1-ary computable function F, there exists a ∈ N such that G(a) = F(a). Let R be the 1-ary relation defined by R(a) ⇔ G(a) = 1. Show that R is not a decidable relation.
12. Prove Theorem 13.
13. Complete the proof of Theorem 14.
14. Let R be a 3-ary decidable relation and let Q be defined by Q(a) ⇔ ∃b∃cR(a, b, c). Show that Q is semi-decidable.
15. Let R be a 1-ary relation with the following property: there is an algorithm that lists all elements of R (and only elements of R) in increasing order (that is, a0 < a1 < · · · < an < · · ·). Show that R is decidable.
16. Let Q and R be 1-ary relations on N (that is, Q, R ⊆ N) such that Q ∩ R = ∅, Q ∪ R is decidable, and both Q and R are semi-decidable. Show that Q is decidable.
5 A Machine Model of Computability

By definition, an n-ary function F : N^n → N is computable if there is an algorithm that calculates F. However, this is not a precisely defined concept since it depends on the informal notion of an algorithm. This chapter is our first attempt to give a precise definition of computability that captures the above informal approach. The model of computability is a machine model and is described in terms of an ideal computer called a register machine. Given an n-ary function F, we say that F is RM-computable if there is a program for the register machine with the following property: Given a1, . . . , an ∈ N, the program with inputs a1, . . . , an halts with output F(a1, . . . , an). This situation certainly satisfies our intuitive notion of a computable function, and therefore it is natural to develop a model of computation based on computers and programs. The relationship between the formal and informal approach, which will be discussed a little later in considerable detail, is summarized as follows:

Theorem. Every RM-computable function is computable.
Church-Turing Thesis. Every computable function is RM-computable.
5.1 Register Machines and RM-Computable Functions

The computer we describe is called a register machine (more precisely, an unlimited register machine) and is due to Shepherdson and Sturgis, 1963. A register machine is quite primitive in terms of the number and power of its instructions. Modern-day programming languages have many more instructions with apparently much greater computational power. On the other hand, a register machine differs from a real computer in that it has an unlimited memory. There are good reasons for not putting a bound on the amount of memory. In the first place, it is difficult to put a precise bound on the memory of any computer, since it is always possible to add more tapes, disks, drives, and so on. In addition, we are interested in the theoretical limits of computation. What can a computer do in theory, given no limitations on time or memory? We now describe the basic components of a register machine: registers, instructions, and programs.
Registers. There are an infinite number of registers, denoted R1, R2, . . . , Rn, . . . . Each register Rn can store any natural number, regardless of size. However, at any given step in a computation, 0 is in all but a finite number of the registers. Registers are used for storing input, output, and intermediate calculations. In more detail: Let a1, . . . , an ∈ N be inputs for a computation; these inputs are stored in registers R1, . . . , Rn, respectively; 0 is in all other registers; the output is stored in register R1; and the remaining registers are used for storing intermediate calculations. We emphasize that register R1 is used for both the input a1 and the final output. We often use # Rk to denote the number currently in register Rk. We can visualize the registers with a1, . . . , an in R1, . . . , Rn, respectively, as follows:

R1   R2   · · ·   Rn   · · ·
a1   a2   · · ·   an   · · ·
Instructions and Programs. There are four types of instructions. Three of these change the number in a register; the fourth controls the sequence of steps in a computation by comparing the number in a register with 0 and acting accordingly. Instructions are described below. A program for the register machine is a finite sequence of instructions. Thus I1, . . . , It denotes a program with t instructions; instructions in a program are always numbered consecutively beginning with 1.

• Increase instructions S(k), k = 1, 2, . . . . The instruction S(k) adds 1 to the current number in register Rk; the numbers in all other registers remain unchanged.
• Decrease instructions D(k), k = 1, 2, . . . . The instruction D(k) subtracts 1 from the current number in register Rk; the numbers in all other registers remain unchanged. If the number in Rk is 0, the decrease instruction D(k) does nothing.
• Zero instructions Z(k), k = 1, 2, . . . . The instruction Z(k) puts the number 0 in register Rk; the numbers in all other registers remain unchanged.
• Branch instructions B(k, q), k = 1, 2, . . . and q = 1, 2, . . . . The instruction B(k, q) compares the number in register Rk with 0; execution of the program then proceeds as follows: if # Rk = 0, branch to instruction Iq; if there is no instruction Iq, the program halts; if # Rk ≠ 0, continue to the instruction that follows B(k, q) in the program; if there is no next instruction, the program halts.
Note that each instruction references exactly one register. Here is a simple example of these ideas. Assume that 3, 2, 1 are in registers R1 , R2 , and R3 , respectively; we execute in order the instructions Z(2), S(3), D(1), D(2).
                           R1   R2   R3   · · ·
Initial inputs              3    2    1   · · ·
After instruction Z(2)      3    0    1   · · ·
After instruction S(3)      3    0    2   · · ·
After instruction D(1)      2    0    2   · · ·
After instruction D(2)      2    0    2   · · ·
Definition 1 (informal). Let P be a program with instructions I1, . . . , It; assume that the natural numbers a1, . . . , an are in registers R1, . . . , Rn, respectively, and that 0 is in all other registers. The P-computation with inputs a1, . . . , an begins with instruction I1 and then continues in sequence I2, I3, . . . unless a branch instruction is encountered. The P-computation with inputs a1, . . . , an halts whenever one of the following conditions is satisfied:

• The computation reaches the last instruction It and it is not a branch instruction.
• The computation reaches the last instruction It, it is a branch instruction B(k,q), and # Rk ≠ 0; the machine wants to execute instruction It+1 next but there is no such instruction.
• The computation reaches a branch instruction B(k,q) with # Rk = 0 and q > t; the machine wants to execute instruction Iq next but there is no such instruction.

If a P-computation with inputs a1, . . . , an halts, the output is the number in register R1. To illustrate these ideas, the following program never halts, regardless of input:

I1   Z(1)
I2   S(2)
I3   B(1,2)
Definition 2. An n-ary function F : Nn → N is RM-computable (or computable by a register machine) if there is a program P such that for all a1 , . . . , an ∈ N, the P-computation with inputs a1 , . . . , an halts with output F(a1 , . . . , an ). We also say: program P computes the function F.
We emphasize that the notion of a program for the register machine is a precisely defined concept. We know exactly what instructions are allowed, we know the procedure of inputs and outputs, and we understand how a computation proceeds. This is in contrast with the notion of an algorithm, where instructions and execution are not so clearly spelled out.

Theorem 1. Every RM-computable function is computable.

Proof. A program for the register machine certainly satisfies our intuitive notion of an algorithm, and therefore every RM-computable function is computable (in the intuitive sense).

Before giving examples of RM-computable functions, let us summarize the discussion above. Suppose that P is a program with t instructions that computes the n-ary function F. Let a1, . . . , an ∈ N. The computation of F(a1, . . . , an) proceeds as follows.

• Put a1, . . . , an in registers R1, . . . , Rn, respectively; for k > n, # Rk = 0.
• The P-computation with inputs a1, . . . , an halts with output F(a1, . . . , an) in register R1.
• All other registers are available to store intermediate calculations.
• The number t + n is a bound on the total number of registers used in any P-computation with n inputs.
Theorem 2. The 2-ary function + is RM-computable.

Proof. We are required to write a program P such that the P-computation with inputs a, b in registers R1 and R2, respectively (and 0 in all other registers), halts with the number a + b in register R1. The required program, with comments, is as follows:

I1   B(2,5)    If # R2 = 0, halt; a + b is in R1.
I2   S(1)      Add 1 to # R1.
I3   D(2)      Subtract 1 from # R2.
I4   B(3,1)    Go to instruction I1.
The last instruction of this program, B(3,1), acts like an unconditional GO TO instruction. We know that 0 is in register R3 and therefore B(3,1) tells the program to branch to instruction I1 next. To improve readability of programs, this use of the branch instruction B(k,q) will
often be written as GO TO Iq. Moreover, we will not attempt to keep track of the register Rk that is used; however, it will always be understood that the register Rk is chosen so that # Rk = 0 and that Rk is not used by any other instruction in the program.

Later, when we prove the unsolvability of certain decision problems, and also when we code computations, it will be necessary to have a precise definition of the P-computation with inputs a1, . . . , an. In anticipation of this precise definition, let us describe, for the program P in Theorem 2, the P-computation with inputs a = 1 and b = 2. For this program, a step in a P-computation is denoted by s = j; r1, r2, r3. This 4-tuple tells us that the numbers r1, r2, r3 are in registers R1, R2, and R3, respectively (these are the only registers used by the program) and that the next instruction is Ij. The following sequence s1, . . . , s10 is the P-computation with inputs 1, 2:

s1 = 1; 1, 2, 0      # R1 = 1, # R2 = 2, # R3 = 0, next (= first) instruction is I1.
s2 = 2; 1, 2, 0      # R1 = 1, # R2 = 2, # R3 = 0, next instruction is I2.
s3 = 3; 2, 2, 0      # R1 = 2, # R2 = 2, # R3 = 0, next instruction is I3.
s4 = 4; 2, 1, 0      # R1 = 2, # R2 = 1, # R3 = 0, next instruction is I4.
s5 = 1; 2, 1, 0      # R1 = 2, # R2 = 1, # R3 = 0, next instruction is I1.
s6 = 2; 2, 1, 0      # R1 = 2, # R2 = 1, # R3 = 0, next instruction is I2.
s7 = 3; 3, 1, 0      # R1 = 3, # R2 = 1, # R3 = 0, next instruction is I3.
s8 = 4; 3, 0, 0      # R1 = 3, # R2 = 0, # R3 = 0, next instruction is I4.
s9 = 1; 3, 0, 0      # R1 = 3, # R2 = 0, # R3 = 0, next instruction is I1.
s10 = 5; 3, 0, 0     # R1 = 3, # R2 = 0, # R3 = 0, next instruction is I5 (does not exist).
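Because the register machine is so simple, a complete interpreter fits in a few lines. The following Python sketch (our code, not the text's) implements the four instruction types exactly as defined above; programs are lists of tuples, registers and instructions are 1-indexed, and a program that never halts makes the interpreter loop forever, as it must.

from collections import defaultdict

def run(program, *inputs):
    R = defaultdict(int)                      # every register starts at 0
    for i, a in enumerate(inputs, start=1):
        R[i] = a                              # inputs go in R1, ..., Rn
    j = 1                                     # index of the next instruction I_j
    while 1 <= j <= len(program):
        op, k, *rest = program[j - 1]
        if op == "S":   R[k] += 1             # increase
        elif op == "D": R[k] = max(0, R[k] - 1)   # decrease (0 stays 0)
        elif op == "Z": R[k] = 0              # zero
        elif op == "B" and R[k] == 0:         # branch on # Rk = 0
            j = rest[0]
            continue
        j += 1                                # otherwise fall through
    return R[1]                               # output is # R1

# The addition program of Theorem 2, replaying the trace above.
ADD = [("B", 2, 5), ("S", 1), ("D", 2), ("B", 3, 1)]
assert run(ADD, 1, 2) == 3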
Theorem 3. The 2-ary function ∸ (proper subtraction) is RM-computable.

Proof. We are required to write a program P such that the P-computation with inputs a, b in registers R1 and R2, respectively, halts with the number a ∸ b in register R1. Here is the required program with comments.

I1   B(2,6)      If # R2 = 0, halt (a ∸ b is in R1).
I2   B(1,6)      If # R1 = 0, halt (a ∸ b is in R1).
I3   D(1)        Subtract 1 from # R1.
I4   D(2)        Subtract 1 from # R2.
I5   GO TO I1    Go to instruction I1.
The Busy Beaver Function

The busy beaver function, denoted by Σ and constructed by Rado in 1962, is an explicit, easily described example of a 1-ary function that is not RM-computable. The basis for the construction of Σ is the following fact: Every nonempty finite subset of N has a largest element. The function Σ is best motivated by introducing a puzzle: Write a program P with exactly 12 instructions so that the P-computation with initial configuration (0 in all registers)

R1   R2   R3   · · ·
0    0    0    · · ·

halts with output as large as possible. A first attempt might produce the obvious program S(1), . . . , S(1) (twelve times) to produce an output of 12. However, we can obviously do better. For example, the following program produces an output of 25:

I1    S(2)
I2    S(2)
I3    S(2)
I4    S(2)
I5    S(1)
I6    S(1)
I7    S(1)
I8    S(1)
I9    S(1)
I10   B(2,13)
I11   D(2)
I12   B(3,5)
Note the "trick" used in writing this program: We put the decrease instruction after the branch instruction B(2,13) rather than before. This gives one additional iteration of the loop I5–I9. With these comments in mind, we now turn to the definition of the busy beaver function.

Definition 3. Let k ≥ 1. A program P for the register machine is k-special if it satisfies these two conditions:

(1) P has exactly k instructions, chosen from among the following finite list: S(j): 1 ≤ j ≤ k; D(j): 1 ≤ j ≤ k; Z(j): 1 ≤ j ≤ k; B(j,q): 1 ≤ j ≤ k, 1 ≤ q ≤ k + 1.
(2) The P-computation with initial configuration of 0 in all registers halts. If the output of this computation is c, we say: P returns c. Note the following consequences of condition (1): the only registers used by a k-special program are among R1 , . . . , Rk ; the number of k-special programs is finite; and q ≤ k + 1 for each branch instruction B(k,q) of a k-special program. Each instruction for the register machine refers to exactly one register, and this fact allows us to prove the following. Lemma 1. Let P be a program with t instructions that computes a 1-ary function F. Then P can be replaced with a t-special program that also computes F. Outline of proof. Let I1 , . . . , It be the instructions of program P. We may assume that q = t + 1 for each branch instruction B(k,q) of P with q > t. Let Rj1 , . . . , Rjs , where s ≤ t and j1 < . . . < js , be the registers used by P. We now construct a t-special program Q that also computes F but uses registers R1 , . . . , Rs . Replace each instruction I of P with the instruction I* obtained as follows: If I refers to register Rjk , then I* refers to register Rk . Lemma 2. For each j ≥ 2, there is a j + 4-special program Pj that returns 2j and leaves 0 in all other registers. Proof. Here is the required program with comments.
I1     S(2)
  :      :
Ij−1   S(2)          # R2 = j − 1.
Ij     S(1)          Add 1 to # R1.
Ij+1   S(1)          Add 1 to # R1.
Ij+2   B(2, j + 5)   If # R2 = 0, halt.
Ij+3   D(2)          Subtract 1 from # R2.
Ij+4   B(3, j)       GO TO Ij.
Definition 4. The busy beaver function is the function Σ : N → N defined as follows: Σ(0) = 0; for k ≥ 1, Σ(k) = max{c : c ∈ N and c is returned by some k-special program}.
Note the following:

• Σ is well-defined (the number of k-special programs is finite and each must halt with input 0);
• Σ is strictly increasing (i.e., for all k, Σ(k) < Σ(k + 1));
• for all j ≥ 2, Σ(j + 4) ≥ 2j (see Lemma 2).
Theorem 4. Σ is not RM-computable.

Proof. Suppose by way of contradiction that there is a program P with instructions I1, . . . , It that computes Σ. We may assume that this program is t-special (see Lemma 1). We now obtain a contradiction by using P to construct a k-special program that returns a number greater than Σ(k).

The program Qj. For each j ≥ 2, we construct a j + 4 + t-special program Qj that returns Σ(2j). The required program is obtained as follows. Let J1, . . . , Jj+4 be a j + 4-special program that returns 2j and leaves 0 in all other registers (see Lemma 2). Then Qj is the following program (the join of the two programs J1, . . . , Jj+4 and P, a concept to be discussed in detail later in this section):

J1
 :
Jj+4
I1     In the instructions I1, . . . , It, replace each branch
 :     instruction B(i, q) with B(i, q + j + 4).
It

Clearly Qj returns Σ(2j), and this completes the construction of the program Qj. We now claim that for all j ≥ 2, Σ(2j) ≤ Σ(j + 4 + t). To see this, note the following:

• Σ(2j) is the number returned by the j + 4 + t-special program Qj;
• Σ(j + 4 + t) is by definition the largest number returned by any j + 4 + t-special program whatsoever.
Now choose j so that j + 4 + t < 2j (for example, j = 5 + t). Since Σ is increasing, Σ(j + 4 + t) < Σ(2j), a contradiction of the above inequality.

Corollary 1. Assume the Church-Turing Thesis. (1) The function Σ is not computable. (2) The Halting Problem for Register Machines is unsolvable.

Proof. By Theorem 4, Σ is not RM-computable. The Church-Turing Thesis then implies that Σ is not computable. To prove (2), suppose that there is an algorithm φ that, given a program P and an input a, decides (YES or NO) whether the P-computation with input a halts or continues forever. We now contradict (1) by showing that Σ is computable. Here is an algorithm that computes Σ(k) for all k ≥ 1 (recall that Σ(0) = 0).

(1) Input k ≥ 1.
(2) List all programs P that have exactly k instructions and whose instructions are among S(j), D(j), Z(j), and B(j,q), where 1 ≤ j ≤ k and 1 ≤ q ≤ k + 1. The number of such programs is finite (but some may not halt with input 0).
(3) For each such program P, use the algorithm φ to decide if the P-computation with input 0 halts. Keep those programs P that do halt, say P1, . . . , Pn.
(4) Print c = max{cj : 1 ≤ j ≤ n and cj is returned by Pj}.
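Although Σ itself is not computable, small values can be bounded from below by brute force: enumerate the (finitely many) k-special candidates and run each from all-zero registers under a step budget, keeping the best output among those that halt in time. A Python sketch of this idea (the budget and all names are ours; the result is only a lower bound, since a slow-halting program may be cut off):

from itertools import product
from collections import defaultdict

def sigma_lower_bound(k, budget=500):
    ops = ([(op, j) for op in "SDZ" for j in range(1, k + 1)]
           + [("B", j, q) for j in range(1, k + 1) for q in range(1, k + 2)])
    best = 0
    for prog in product(ops, repeat=k):        # every k-special candidate
        R, j, steps = defaultdict(int), 1, 0
        while 1 <= j <= k and steps < budget:
            op, r, *rest = prog[j - 1]
            steps += 1
            if op == "S":   R[r] += 1
            elif op == "D": R[r] = max(0, R[r] - 1)
            elif op == "Z": R[r] = 0
            elif op == "B" and R[r] == 0:
                j = rest[0]
                continue
            j += 1
        if not 1 <= j <= k:                    # halted rather than timed out
            best = max(best, R[1])
    return best

assert sigma_lower_bound(2) == 2               # S(1), S(1) is already optimal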
Subroutines

It is convenient to use subroutines in programs for the register machine, and we now discuss the procedure for doing this. Two useful subroutines COPY(j,k) and MOVE(j,k) will be emphasized.

COPY(j,k). Let j and k be positive integers. We write a program COPY(j,k) such that if # Rj = a and # Rk = b, then the program halts with a in both of the registers Rj and Rk. The program uses an auxiliary register, say Rm, as a temporary storage for a. In addition, it also uses register Rm+1 to implement an unconditional GO TO command. We thus assume that Rm is chosen so that # Rm = # Rm+1 = 0 and that these two registers are not used by the program in which COPY(j,k) appears as a subroutine. Since the number of registers is infinite, it is always possible to choose two such registers. Here is a picture: If we
start with

Rj   · · ·   Rk   · · ·   Rm   · · ·
a    · · ·   b    · · ·   0    · · ·

then COPY(j,k) halts with configuration

Rj   · · ·   Rk   · · ·   Rm   · · ·
a    · · ·   a    · · ·   0    · · ·

Here is the required program with comments.

I1    Z(k)         # Rk = 0.
I2    B(j,7)       If # Rj = 0, go to I7.
I3    S(k)         Add 1 to # Rk.
I4    S(m)         Add 1 to # Rm.
I5    D(j)         Subtract 1 from # Rj.
I6    GO TO I2     # Rm+1 = 0.
I7    B(m, 11)     If # Rm = 0, halt.
I8    S(j)         Add 1 to # Rj.
I9    D(m)         Subtract 1 from # Rm.
I10   GO TO I7     # Rm+1 = 0.
The subroutine COPY(j,k) should perhaps be denoted COPY(j,k;m) as a reminder that the two registers Rm and Rm+1 are also used by the program and are chosen so that they differ from all other registers used by the program in which the subroutine is used. However, this is a bookkeeping detail and so we will not attempt to keep track of these two registers. MOVE(j,k). This program is similar to COPY(j,k) except that upon halting the number in register Rj is 0. For future reference, this program has five instructions and does not need an auxiliary register for temporary storage but does use an unconditional GO TO instruction. Theorem 5. Let k and n be positive integers with 1 ≤ k ≤ n. The n-ary function Un,k , defined by Un,k (a1 , . . . , an ) = ak , is RM-computable. Proof. The program MOVE(k,1) shows that Un,k is RM-computable. The next two definitions serve as background for incorporating subroutines into a larger program.
Definition 5. A program I1, . . . , It is in standard form if q ≤ t + 1 for every branch instruction B(k,q). It is clear that if P is a program that computes an n-ary function F, then there is a program P* in standard form that also computes F.

Definition 6. Let P and Q be programs, both in standard form; assume that P has s instructions and Q has t instructions. The join of P and Q, written PQ, is the program I1, . . . , Is, Is+1, . . . , Is+t, where I1, . . . , Is is the program P and Is+1, . . . , Is+t is the program Q except that each branch instruction B(k,q) of Q is replaced with the branch instruction B(k, q + s). Idea: In a computation, program P is executed first, followed by program Q. Note that program PQ is in standard form.

Lemma 3. The join operation is associative. More precisely, if P, Q, and R are programs in standard form, then (PQ)R and P(QR) are the same program (and therefore parentheses may be omitted).

Proof. Assume P has s instructions and Q has t instructions. We obtain the program (PQ)R as follows: First obtain PQ by replacing each branch instruction B(k,q) of Q with B(k, q + s); then replace each branch instruction B(k,q) of R with B(k, q + s + t). On the other hand, program P(QR) is obtained as follows: First obtain QR by replacing each branch instruction B(k,q) of R with B(k, q + t); then replace each branch instruction B(k,q) of QR with B(k, q + s). This amounts to replacing each branch instruction B(k,q) of Q with B(k, q + s) and each branch instruction B(k,q) of R with B(k, q + s + t); this is the same as the program (PQ)R.

Theorem 6. The 2-ary function × is RM-computable.

Proof. We are required to write a program P such that the P-computation with inputs a, b in registers R1 and R2, respectively, halts with the number a × b in register R1. We write two versions; in the first version we treat a subroutine as a single instruction; in the second version, we show the bookkeeping changes that are required to make the first version technically correct. Idea of the program: a × b = a + . . . + a (b times).

Version One.

I1    B(1,11)      If a = 0, then a × b = 0; halt.
I2    MOVE(1,3)    # R3 = a, # R1 = 0.
I3    B(2,11)      If # R2 = 0, then # R1 = a × b; halt.
I4    COPY(3,4)    # R3 = a, # R4 = a.
I5    S(1)         Add 1 to # R1.
I6    D(4)         Subtract 1 from # R4.
I7    B(4, 9)      If # R4 = 0, go to I9.
I8    GO TO I5
I9    D(2)         Subtract 1 from # R2.
I10   GO TO I3.
Version Two.

I1       B(1,24)      If a = 0, then a × b = 0; halt.
I2–I6    MOVE(1,3)    # R3 = a, # R1 = 0. (*)
I7       B(2,24)      If # R2 = 0, then # R1 = a × b; halt.
I8–I17   COPY(3,4)    # R3 = a, # R4 = a. (**)
I18      S(1)         Add 1 to # R1.
I19      D(4)         Subtract 1 from # R4.
I20      B(4, 22)     If # R4 = 0, go to I22.
I21      GO TO I18
I22      D(2)         Subtract 1 from # R2.
I23      GO TO I7.
In Version Two, branch instructions are modified to account for the fact that the subroutines COPY(j,k) and MOVE(j,k) have 10 and 5 instructions, respectively. In addition, the join operation requires that branch instructions in these two subroutines be modified as follows: (*) each branch instruction B(j,q) of MOVE(1,3) is replaced by B(j,q + 1); (**) each branch instruction B(j,q) of COPY(3,4) is replaced by B(j,q + 7). Finally, register R5 is available as the auxiliary register for COPY(3,4). Convention on writing programs: Henceforth, we will write all programs in the style of Version One. Although Version Two is the technically correct program, the difference amounts to bookkeeping details. The more transparent Version One is the preferred choice.
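Definition 6 is mechanical enough to code. In the list-of-tuples representation used in the earlier interpreter sketch, the join shifts each branch target of Q by the length of P; Lemma 3 can then be spot-checked on examples.

def join(P, Q):
    s = len(P)
    shifted = []
    for instr in Q:
        if instr[0] == "B":                  # branch targets shift by len(P)
            _, k, q = instr
            shifted.append(("B", k, q + s))
        else:
            shifted.append(instr)
    return P + shifted

# Spot-check of Lemma 3 (associativity) on small standard-form programs.
A = [("S", 1), ("B", 1, 3)]
B = [("B", 2, 2), ("D", 1)]
C = [("B", 1, 1)]
assert join(join(A, B), C) == join(A, join(B, C))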
Exercises on 5.1

1. Show that the 3-ary function F(a, b, c) = 2a + b + c is RM-computable by writing a program P that computes F.
2. Write the subroutine MOVE(j,k).
3. Show that the following 2-ary function is RM-computable: K=(a, b) = 0 if a = b and K=(a, b) = 1 otherwise. (This is the characteristic function for =.)
4. Write a program P such that given input a ∈ N: If a is even, the P-computation with input a halts with output a/2; if a is odd, the P-computation with input a does not halt.
5. (Fibonacci numbers) Let F : N → N be defined recursively as follows: F(0) = F(1) = 1; for n ≥ 1, F(n + 1) = F(n) + F(n − 1). Show that F is RM-computable.
6. Write a subroutine JUMP(j,k; q) that does the following: If # Rj = # Rk, branch to instruction q; if # Rj ≠ # Rk, the program halts (more precisely, goes to the instruction that follows the subroutine); the numbers in Rj and Rk remain unchanged. Hint: use the subroutine COPY(j,k).
7. (Zero instruction superfluous) Without using zero instructions, write a program ZERO(k) that puts 0 in register Rk and leaves all other registers unchanged. Assume that 0 is in register Rm and is therefore available for an unconditional GO TO instruction.
8. The following is a 20-special program P that returns c. (Note that the program only uses 4 registers, and thus registers Rk, 5 ≤ k ≤ 20 are all available for unconditional GO TO instructions.)

I1    S(4)
I2    S(4)
I3    S(4)
I4    S(4)
I5    S(2)
I6    S(2)
I7    B(1,11)
I8    D(1)
I9    GO TO I5
I10   S(3)
I11   S(1)
I12   S(1)
I13   B(2,17)
I14   D(2)
I15   B(4,11)
I16   GO TO I10
I17   B(3,21)
I18   D(3)
I19   D(4)
I20   GO TO I5.
(a) Under what condition does the P-computation with input 0 in all registers halt?
(b) Assume that # R1 = 2k − 2 and # R2 = 0. Find # R1 and # R2 after executing the instructions I5–I9.
(c) Assume # R1 = 2k − 2, # R2 = 0, # R3 = x, and # R4 = y. Find # R1, # R2, # R3, and # R4 after executing the instructions I5–I19. Consider two cases: y = 0, y ≠ 0.
(d) Find the number c returned by the 20-special program P.
For a further discussion of the busy beaver function and examples such as the one above, see Sweet Reason by Tymoczko and Henle.

9. Let F : N → N be RM-computable.
(a) Show that there exists a ∈ N such that F(a) < Σ(a).
(b) Use (a) to show that Σ is not RM-computable.
5.2 Operations with RM-Computable Functions; Church-Turing Thesis; LRM-Computable Functions

In Section 4.2 we discussed three important operations with n-ary functions: composition, primitive recursion, and minimalization (of a regular function). Each of these operations preserves computability. Our goal is to show that each of these operations also preserves RM-computability. In this section we consider composition and primitive recursion; minimalization will follow from a general result in 5.3. The following two subroutines will be useful.

Program ZERO[j, w]. Let j < w. This program puts 0 in the registers Rj, . . . , Rw.

Program P[k, w, z]. Let P be a program with t instructions I1, . . . , It that computes a k-ary function H and let w ∈ N be such that w > k and such that the registers used by P are among R1, . . . , Rw. Finally, let z ∉ {w + 1, . . . , w + k}. We write a program (or subroutine), denoted P[k, w, z], that, with a1, . . . , ak in the registers Rw+1, . . . , Rw+k, respectively, proceeds as follows:

• copies a1, . . . , ak to registers R1, . . . , Rk, respectively;
• puts 0 in registers Rk+1, . . . , Rw;
• calculates H(a1, . . . , ak);
• halts with # Rz = H(a1, . . . , ak) and a1, . . . , ak in registers Rw+1, . . . , Rw+k, respectively.
The program P[k,w,z] is the join of the following k + 3 programs:

COPY(w + 1, 1)
  :
COPY(w + k, k)
ZERO[k + 1, w]
I1, . . . , It (the program P)
MOVE(1, z).

Theorem 1 (composition). Let F be an n-ary function defined by F(a1, . . . , an) = G(H1(a1, . . . , an), . . . , Hk(a1, . . . , an)). If each of G, H1, . . . , Hk is RM-computable, then F is also RM-computable.

Proof. For 1 ≤ j ≤ k let Pj be a program that computes Hj and let Q be a program that computes G. Choose w > n + k so that the registers used by all of the programs P1, . . . , Pk and Q are among R1, . . . , Rw. A program that computes F is the join of the following programs:

MOVE(1, w + 1)
  :
MOVE(n, w + n)
P1[n, w, w + n + 1]
  :
Pk[n, w, w + n + k]
Q[k, w + n + 1, 1].

This program with inputs a1, . . . , an proceeds as follows:

• moves a1, . . . , an to registers Rw+1, . . . , Rw+n, respectively;
• for 1 ≤ j ≤ k, computes bj = Hj(a1, . . . , an) and stores bj in register Rw+n+j;
Theorem 2 (primitive recursion). Let F be an n+1-ary function defined by primitive recursion as follows: F(a1 , . . . , an , 0) = G(a1 , . . . , an ); F(a1 , . . . , an , b + 1) = H(a1 , . . . , an , b, F(a1 , . . . , an , b)). If G and H are both RM-computable, then F is also RM-computable.
Proof. To simplify notation, we consider the following special case: F(a, 0) = G(a); F(a, b + 1) = H(a, b, F(a, b)). Let I1, . . . , Is be a program that computes G, let J1, . . . , Jt be a program that computes H, and let w be sufficiently large so that the registers used by these two programs are among R1, . . . , Rw. Consider the following:

      COPY(1, w + 1)
      MOVE(2, w + 3)
      I1, . . . , Is
Ip    B(w + 3, q + 1)
      MOVE(1,3)
      COPY(w + 1, 1)
      COPY(w + 2, 2)
      ZERO[4, w]
      J1, . . . , Jt
      D(w + 3)
      S(w + 2)
Iq    GO TO Ip.

The above is essentially the join of several programs. The exceptions are the two branching instructions labeled Ip and Iq; we have not calculated the precise values of p and q. This program with inputs a, b proceeds as follows:

• Copies a to register Rw+1 and moves b to register Rw+3. Let x = # Rw+2. Initially x = 0 but increases up to b as the computation proceeds. The program is designed to compute F(a, x), 0 ≤ x ≤ b.
• The program uses I1, . . . , Is to compute F(a, 0). If b = 0, the program halts with G(a) in register R1.
• If b > 0, the program uses J1, . . . , Jt to compute F(a, 1), . . . , F(a, b) and then halts with F(a, b) in register R1.

By Theorems 1 and 2, the class of RM-computable functions is closed under the two operations of composition and primitive recursion. This gives us a powerful technique for showing a function is RM-computable without actually writing the required program.

Example 1. The 2-ary function L defined by L(a, b) = a^b is RM-computable. Informally, L is defined recursively from multiplication by
L(a, 0) = 1 and L(a, b + 1) = L(a, b) × a. Formally, let G and H be defined by G(a) = 1 and H(a, b, x) = x × a; since G and H are RM-computable (verify) and L is obtained from G and H by primitive recursion, L is RM-computable as required.

Example 2. We justify the following (somewhat vague) assertion: There is a short program that, given input 0, returns a very large number. By Example 1, the function F0(a) = 2^a is RM-computable. Let

F1(a) = F0(F0(a)) = 2^(2^a),
F2(a) = F1(F0(a)) = 2^(2^(2^a)),

and so on. Each of F1 and F2 is RM-computable by composition. Continuing this process, the function F20 is also RM-computable. By Problem 9 in 5.1, there exists k ∈ N such that F20(k) < Σ(k). Now let P be a k-special program that returns Σ(k). Then P is a program with k instructions and the P-computation with input 0 returns a number > F20(k).

There is another operation that preserves the property of being RM-computable, namely minimalization. We state the result now but postpone the proof until the next section.

Theorem 3 (minimalization of a regular function). Let G be an n+1-ary regular function and let F be the n-ary function defined by F(a1, . . . , an) = µx[G(a1, . . . , an, x) = 0]. If G is RM-computable, then F is also RM-computable.
Church-Turing Thesis

At the beginning of this chapter we summarized the relationship between computable functions and RM-computable functions as follows:

Theorem. Every RM-computable function is computable.
Church-Turing Thesis. Every computable function is RM-computable.

The proof of the theorem follows immediately from the fact that RM-programs satisfy our requirement for an algorithm. But can an algorithm that calculates a given n-ary function F always be converted
into a program for the register machine that computes F? The Church-Turing Thesis is the assertion that the answer is "yes" and that the register machine does indeed capture the informal notion of an n-ary computable function. However, this claim is not subject to proof. To see why, consider the contrapositive of the Church-Turing Thesis: not RM-computable ⇒ not computable. Suppose we have an n-ary function F that is not RM-computable (for example, the busy beaver function). In order to show that F is not computable, we must show that there is no algorithm that calculates F. Since the notion of an algorithm is not precisely defined, we cannot hope to prove its nonexistence. Nevertheless, there is strong evidence that the Church-Turing Thesis does indeed hold (more on this in Chapter 6). Accepting the Church-Turing Thesis in the form given above amounts to accepting the following two claims:

(1) Let F be an arbitrary n-ary computable function. Then there is a program, written in some programming language, for some computer, that computes F. In other words, any algorithm (in the informal sense) that computes F can be converted into a computer program that implements the algorithm.
(2) The computational power of the register machine is as powerful as any combination of programming language and computer that now exists or ever will exist.

We already have examples and theorems (composition, primitive recursion, minimalization) to support claim (2). Moreover, throughout this chapter and the next we will continue to collect evidence to support both (1) and (2). Could the Church-Turing Thesis be false? Consider the following scenario: A bright mathematician describes, to a panel of experts on computability, a function F and an algorithm for computing F. The experts all agree that the proposed algorithm does indeed satisfy their intuitive understanding of an algorithm and moreover that it computes the given function. The clever mathematician then proceeds to give a precise proof that F is not RM-computable. In other words, a counterexample to the Church-Turing Thesis is theoretically possible but highly unlikely.

The Church-Turing Thesis plays an essential role in proofs of the unsolvability of decision problems. Suppose that we want to prove that the decision problem for a relation R on N is unsolvable. First of all, it suffices to show that the characteristic function KR is not computable (see Section 4.2). Proceed as follows: Prove that KR is not RM-computable; by the Church-Turing Thesis (contrapositive form), KR is not computable. This is an essential use of the Church-Turing Thesis (more on this later).
LRM-Computable Functions

The branch instruction for the register machine is quite powerful. For example, it allows loops whose length depends on the result of a calculation that takes place within the loop itself. In addition, careless use of a branch instruction can lead to a computation that does not halt. In this section we modify the instructions of the register machine to obtain the limited register machine (LRM). The basic idea is to omit the branch instructions and replace them with pairs of instructions that allow loops that are executed a fixed number of times, where this number is specified in advance of the loop. In more detail, the instructions for the limited register machine are S(k) and Z(k) for all k ≥ 1 together with the new instructions FOR(k), k = 1, 2, . . . ; END. The new instructions FOR(k) and END are always used in pairs (similar to left and right parentheses). We illustrate the use of these new instructions in a hypothetical program. Assume that # Rk = n. We use FOR(k) and END to execute the sequence of instructions Iq+1, . . . , It a total of n times.

I1
 :
Iq−1
Iq     FOR(k)
Iq+1
 :       (the sequence of instructions Iq+1, . . . , It is executed n times, where n = # Rk)
It
It+1   END
 :
Iz
The pair of instructions FOR(k) and END tell the program to loop n times, where n = # Rk. The instruction END marks the end of the loop. In addition:

(1) If the number in register Rk is changed by the instructions Iq+1, . . . , It, the loop is still executed n times, where n is the number in register Rk immediately before entering the loop.
(2) If # Rk = 0, the entire loop is skipped and the instruction executed after FOR(k) is the instruction that follows the
corresponding END (if there is such an instruction; otherwise, the program halts). For example, the following LRM-program computes the function F(a) = 2 × a.

I1   FOR(1)
I2   S(1)
I3   END
Definition. The collection of (legal) programs for the limited register machine is defined inductively as follows.

(1) Each increase instruction S(k) and each zero instruction Z(k) is a program.
(2) If P and Q are programs, then PQ (the instructions of P followed by the instructions of Q) is a program.
(3) If P is a program and k ≥ 1, then FOR(k) P END is a program.
(4) Every program is obtained by a finite number of applications of (1), (2), (3).

Now let P be a program for the limited register machine with instructions I1, . . . , It. The P-computation with inputs a1, . . . , an (in registers R1, . . . , Rn, respectively) begins with instruction I1 and then continues in sequence I2, I3, . . . . If a FOR(k) instruction is encountered, the program has a loop of the form FOR(k) Q END that is executed a fixed number of times, after which the computation again continues in order. One can prove that the program always halts. An n-ary function F is LRM-computable if there is a program for the limited register machine that computes F.

Theorem 4. The following functions are LRM-computable: +, ×, D(a) = a ∸ 1, and Un,k.

Proof. We prove the result for × and D and leave + and Un,k to the reader.
×   (Key idea: a × b = a + . . . + a (b times).)

I1   FOR(2)
I2   FOR(1)
I3   S(3)
I4   END
I5   END
I6   Z(1)
I7   FOR(3)
I8   S(1)
I9   END
D

I1   FOR(1)
I2   Z(1)
I3   FOR(2)
I4   S(1)
I5   END
I6   S(2)
I7   END
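A bounded-loop interpreter makes the halting claim vivid. In the Python sketch below we represent LRM programs by the inductive definition directly — a program is a list of items ("S", k), ("Z", k), or ("FOR", k, body), with body again a program — so termination is plain structural recursion. The nested representation (rather than flat FOR/END pairs) is our design choice.

from collections import defaultdict

def run_lrm(program, *inputs):
    R = defaultdict(int)
    for i, a in enumerate(inputs, start=1):
        R[i] = a
    def execute(prog):
        for instr in prog:
            if instr[0] == "S":   R[instr[1]] += 1
            elif instr[0] == "Z": R[instr[1]] = 0
            else:                               # ("FOR", k, body)
                for _ in range(R[instr[1]]):    # count fixed on entry, per (1)
                    execute(instr[2])
    execute(program)
    return R[1]

# The multiplication program of Theorem 4, in nested form.
TIMES = [("FOR", 2, [("FOR", 1, [("S", 3)])]),   # R3 = a * b
         ("Z", 1),
         ("FOR", 3, [("S", 1)])]                 # move R3 into R1
assert run_lrm(TIMES, 4, 6) == 24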
The LRM-computable functions are closed under the operations of composition and primitive recursion (see the exercises) but not under minimalization. These ideas will be discussed in more detail in Chapter 6, where we will compare the recursive functions with the primitive recursive functions. For now the situation can be summarized as follows:

• Every LRM-computable function is RM-computable.
• Ackermann's function, described in 6.3, is an example of an RM-computable function that is not LRM-computable. It follows that the Church-Turing Thesis does not hold for LRM-computable functions; in other words, the LRM-computable functions are not sufficient to capture our intuitive notion of an algorithm.
Exercises on 5.2

1. Let F(a, b) = a^b + a × b. Show that F is RM-computable without writing an explicit program that computes F.
2. Prove that the following functions are LRM-computable.
(a) a + b;
(b) sg(a) = 0 if a = 0 and sg(a) = 1 otherwise;
(c) csg(a) = 1 if a = 0 and csg(a) = 0 otherwise;
(d) a ∸ b.

3. Let P be the LRM program that computes a × b.
(a) Show that P is a legal LRM-program.
(b) Execute P with inputs a = 2, b = 3.
4.
. Let P be the LRM program that computes D(a) = a−1. (a) Show that P is a legal LRM-program. (b) Execute P with input a = 3.
144
Part 2: Computability Theory
5.
Write an LRM-program COPY(j,k) that copies the number in register j to register k and leaves all other registers unchanged. This program does not need an auxiliary register.
6.
Let G and H be 1-ary functions that are LRM-computable and let F(a) = G(H(a)). Show that F is LRM-computable.
7.
Let k ∈ N and let H be a 2-ary LRM-computable function. Define a 1-ary function F by recursion as follows: F(0) = k; F(n + 1) = H(n, F(n)). Show F is LRM-computable.
8.
Let G be a 1-ary LRM-computable function, let H be a 3-ary LRM-computable function, and let F be defined recursively by F(a,0) = G(a), F(a, b + 1) = H(a, b, F(a,b)). Show F is LRM-computable.
9.
Let P be an LRM-program. Show that the P-computation with any inputs a1 , . . . , an halts. Hint: Use the inductive definition of a legal LRM-program.
5.3 RM-Decidable and RM-Semi-Decidable Relations; the Halting Problem Let F be an n-ary function. By definition, F is computable if there is an algorithm that calculates F; on the other hand, F is RM-computable if there is a program P that calculates F. In this way an informal concept (algorithm) is replaced with a precise definition (program). In this section we will use the register machine to give precise counterparts to the informal notions of a decidable relation and a semi-decidable relation. An obvious way to give a precise definition of decidability is to use the result from Section 4.2 that a relation R is decidable if and only if the characteristic function of R is computable. Definition 1. An n-ary relation R is RM-decidable if there is a program P for the register machine such that for all a1 , . . . , an ∈ N: if R(a1 , . . . , an ), then the P-computation with inputs a1 , . . . , an halts with output 0(= YES); • if ¬ R(a1 , . . . , an ), then the P-computation with inputs a1 , . . . , an halts with output 1(= NO). •
In other words, R is RM-decidable if and only if KR is RM-computable. We also say the program P decides R. Example 1. The 2-ary relation < is RM-decidable.
Chapter 5: Machine Model of Computability
145
Proof. We are required to write a program P such that the P-computation with inputs a,b in registers R1 and R2 , respectively, (and 0 in all other registers) halts with 0 in register R1 if a < b and 1 otherwise. Here is the required program. I1 I2 I3 I4 I5 I6 I7
B(2,6) B(1,8) D(1) D(2) GO TO I1 Z(1) S(1)
If # R2 = 0, then # R1 ≥ # R2 ; put 1 in R1 and halt. If # R1 = 0 and # R2 > 0, then # R1 < # R2 ; halt. Decrease # R1 by 1. Decrease # R2 by 1.
# R1 = 1
The relationship between the informal concept and its formal counterpart is summarized in the next theorem (proof left to the reader). Theorem 1. The following hold: (1) Every RM-decidable relation is decidable. (2) Assuming the Church-Turing Thesis, every decidable relation is RM-decidable.
The Halting Problem We now give an explicit, natural example of a relation that is not RMdecidable. Consider the following decision problem. Halting Problem (for register machines). Find an algorithm that, given the two inputs program P, natural number a, decides (YES or NO in a finite number of steps) whether the P-computation with input a halts or continues forever. Eventually we will show that the Halting Problem is unsolvable (assuming the Church-Turing Thesis). The first step is to transfer the problem to a decision problem about a 2-ary relation HLT on N. In other words, we replace the input P (a program) with a number e ∈ N called the code of P; this technique is known as Gödel numbering. Let P be the collection of all register machine programs. Theorem 2 (Gödel numbering of programs). There is a one-to-one function # : P → N such that the following two conditions hold.
146
Part 2: Computability Theory
Gö1. There is an algorithm that, given a program P for the register machine, computes #(P); the number #(P) is called the code of P. Gö2. There is an algorithm that, given e ∈ N, decides (YES or NO) if e is the code of some program. If so, the algorithm finds the program P such that #(P) = e. Outline of proof. First assign to each instruction I an instruction number IN(I) as follows: IN(S(k)) = 3k ;
IN(Z(k)) = 7k ;
IN(D(k)) = 5k ; IN(B(k, q)) = 11k × 13q . Note that 3, 5, 7, 11, 13 are odd primes. Moreover, there is an algorithm that, given a ∈ N, decides (YES or NO) if a is an instruction number; if so, the algorithm also finds the instruction I such that IN(I) = a. The Gödel coding function #: P → N is defined as follows. For a program P with instructions I1 , . . . , It : #(P) = π (0)IN(I1 ) × . . . × π (t − 1)IN(It )
(π lists the primes in increasing order: π (0) = 2, π (1) = 3, and so on).
By the Unique Factorization Theorem, the function # is one-to-one. Before writing algorithms for Gö1 and Gö2, consider the following examples. •
Find #(P) for the program P with instructions Z(2), S(3), D(1), 2 3 1 2 D(2). Answer: #(P) = 27 × 33 × 55 × 75 . • Let e = 27 × 39 × 51859 . Is e the code of a program? If so, find the program. Hint: 1859 = 11 × 132 . Answer: YES; the program is Z(1), S(2), B(1,2). We now write the required algorithm for Gö2 and leave Gö1 to the reader. Recall from Example 4 in 4.1 that the function π is computable. (1) Input e ∈ N; if e ≤ 1, go to instruction 7. (2) Factor e = pe11 × . . . × penn (p1 < . . . < pn primes and ek ≥ 1 for 1 ≤ k ≤ n). (3) Is pk = π (k − 1) for l ≤ k ≤ n? If not, go to instruction 7. (4) If so, p1 , . . . , pn are the first n primes in increasing order. For 1 ≤ k ≤ n, is ek an instruction number? If some ek is not an instruction number, go to instruction 7. (5) For 1 ≤ k ≤ n let Ik be the instruction such that ek = IN(Ik ).
Chapter 5: Machine Model of Computability
147
(6) Print YES, print I1 , . . . , In , halt. (7) Print NO and halt. Now we can define HLT. Definition 2. HLT is the 2-ary relation on N defined by HLT(e,a) ⇔ e is the code of a program P and the P-computation with input a halts. Solvability of the Halting Problem and decidability of HLT are related as follows: Theorem 3. The following are equivalent: (1) the Halting Problem for register machines is solvable; (2) the 2-ary relation HLT is decidable. Proof that (1) ⇒ (2). Let be an algorithm that, given a program P and a ∈ N, decides (YES or NO) if the P-computation with input a halts. The following algorithm shows that HLT is decidable. (1) Input e, a. (2) Use Gö2 to decide if e is the code of a program. If not, print NO and halt. (3) Use Gö2 to find the program P such that #(P) = e. (4) Use to decide if the P-computation with input a halts. If not, print NO; otherwise, print YES. (5) Halt. We are going to prove that HLT is not RM-decidable; we then have: • •
HLT is not decidable (Church-Turing Thesis and Theorem 1); the Halting Problem for register machines is unsolvable (Theorem 3).
To simplify the proof, we replace HLT with a special case that is a 1-ary relation and is therefore easier to handle. Definition 3 (diagonalization of HLT). Let K be the 1-ary relation defined by K(e) ⇔ HLT(e, e) ⇔ e is the code of a program P and the P-computation with input e halts.
148
Part 2: Computability Theory
Note that if HLT is RM-decidable, then K is also RM-decidable (exercise for the reader). Therefore our goal is to prove K is not RM-decidable. Theorem 4 (Turing). The 1-ary relation K is not RM-decidable. Proof. Suppose by way of contradiction that there is a program Q with instructions I1 , . . . , It such that for all e with e = #(P): (a) if the P-computation with input e halts, then the Q-computation with input e halts with output 0; (b) if the P-computation with input e does not halt, then the Q-computation with input e halts with output 1. We will use Q to construct a program Q∗ such that Q∗ = P for every program P; in particular, Q∗ = Q∗ , a contradiction. This is a classical Cantor diagonal argument. Let P be a program and let e =#(P). To ensure that Q∗ = P, we construct Q∗ so that (c) if the P-computation with input e halts, then the Q∗ -computation with input e does not halt; (d) if the P-computation with input e does not halt, then the Q∗ -computation with input e halts. By (a) and (b), we can obtain (c) and (d) by writing Q∗ so that (e) if the Q-computation with input e halts with output 0, then the Q∗ -computation with input e does not halt; (f) if the Q-computation with input e halts with output 1, then the Q∗ -computation with input e halts. The required program Q∗ is easily obtained from Q as follows: I1 , . . . , It , B(1, t + 1). These results can be summarized as follows: Theorem 5 (Turing). Assume the Church-Turing Thesis. (1) K is not decidable (but is semi-decidable). (2) HLT is not decidable (but is semi-decidable). (3) The Halting Problem for register machines is unsolvable.
RM-Semi-Decidable Relations By a diagonal argument (see above), the relation K is not RM-decidable; from this it follows that HLT is not RM-decidable. On the other hand, both HLT and K are RM-semi-decidable (precise definition given below). This is fairly difficult to prove and will be postponed until later (see 6.4). For now, we prove the much easier result that HLT is semi-decidable.
Chapter 5: Machine Model of Computability
149
Theorem 6. The relations HLT and K are semi-decidable. Outline of Proof. We write the required algorithm for HLT. (1) Input e, a ∈ N and set k = 1. (2) Use Gö2 to decide if e is the code of a program. If not, go to instruction 2. (3) If so, use Gö2 to find the program P whose code is e. (4) Does the P-computation with input a halt in k steps? If so, print YES and halt. (5) Add 1 to k. (6) Go to instruction 4. Note that the above algorithm continues forever in either of the following situations: e is not the code of a program; e is the code of a program P but the P-computation with input a does not halt. (A precise definition of a step of a computation will be given in the next section.) Definition 4. An n-ary relation R is RM-semi-decidable if there is a program P for the register machine such that for all a1 , . . . , an ∈ N: if R(a1 , . . . , an ), then the P-computation with inputs a1 , . . . , an halts; • if ¬R(a1 , . . . , an ), then the P-computation with inputs a1 , . . . , an continues forever. •
In other words, R(a1 , . . . , an ) ⇔ the P-computation with inputs a1 , . . . , an halts. We also say P semi-decides R. Every RM-decidable relation is RM-semi-decidable (exercise for the reader). Moreover, as the next theorem shows, every n + 1-ary RMdecidable relation R gives rise to an n-ary RM-semi-decidable relation Q as follows: Q(a) ⇔ ∃bR(a, b). Theorem 7. Let R be an n + 1-ary RM-decidable relation. Then there is a program P such that for all a1 , . . . , an ∈ N: (1) if there is some b ∈ N such that R(a1 , . . . , an , b), then the P-computation with inputs a1 , . . . , an halts with output the smallest such b; (2) if there is no such b ∈ N, then the P-computation with inputs a1 , . . . , an continues forever. In particular, the n-ary relation Q(a) ⇔ ∃bR(a, b) is RM-semi-decidable.
150
Part 2: Computability Theory
Proof. To simplify the proof, we assume that n = 1. Let I1 , . . . , It be a program that decides the 2-ary relation R. The program P given below systematically searches for the first (= smallest) b for which R(a,b). The input a and the current value of b are stored in registers Rw+1 and Rw+2 , respectively; initially b = 0. Choose w so that the registers used by I1 , . . . , It are among R1 , . . . , Rw . MOVE(1, w + 1) Ip
COPY(w + 1, 1) COPY(w + 2, 2) ZERO[3, w] I1 , . . . , It
B(1, q)
Iq
S(w + 2) GO TO Ip MOVE(w + 2, 1)
Store the input a in register Rw+1 . Copy a to register R1 . Copy b to register R2 (initially 0). Put 0 in registers R3 , . . . , Rw . Run the program that decides R (if the output is 0, R(a,b)). If # R1 = 0, branch to instruction Iq . Increment b by 1. New P-computation. Put b in register R1 and halt.
Corollary 1 (minimalization of a relation). Let R be an n + 1-ary RMdecidable relation that is also regular (for all a1 , . . . , an in N, there exists b ∈ N such that R(a1 , . . . , an ,b)). Then the n-ary function F(a1 , . . . , an ) = µb[R(a1 , . . . , an , b)] is RM-computable. Proof. The program P of Theorem 7 computes F. Corollary 2 (minimalization of a function). Let G be an n + 1-ary RMcomputable function that is also regular. Then the n-ary function F(a1 , . . . , an ) = µb[G(a1 , . . . , an , b) = 0] is RM-computable.
151
Chapter 5: Machine Model of Computability
Proof. Let R be the n + 1-ary relation defined by R(a1 , . . . , an , b) ⇔ G(a1 , . . . , an , b) = 0. The relation R is regular and RM-decidable (details left to the reader) and therefore by Corollary 1, F is RM-computable as required. Corollary 3. Assume the Church-Turing Thesis. Then every semidecidable relation is RM-semi-decidable. Proof. Let Q be an n-ary semi-decidable relation. Then there is an n + 1ary decidable relation R such that for all a = a1 , . . . , an ∈ Nn , Q(a) ⇔ ∃bR(a, b) (see Theorem 12 in 4.2). By the Church-Turing Thesis, R is RM-decidable. By Theorem 7, Q is RM-semi-decidable. Here is a summary of the relationship between the three informal notions of Chapter 4.2 and the ideas defined in this chapter. Informal concept F is computable R is decidable R is semi-decidable
Formal counterpart (machine model) ⇐ ⇐ ⇐
F is RM-computable R is RM-decidable R is RM-semi-decidable
Assuming the Church-Turing Thesis, each occurrence of ⇐ can be replaced with ⇔. Under the assumption of the Church-Turing Thesis, the Halting Problem is unsolvable. We now sharpen this result by proving that there is a fixed program P for which the Halting Problem is unsolvable; in other words, we show the existence of a fixed program P for which there is no algorithm that, given a ∈ N, decides if the P-computation with input a halts. This important result will be used in the next section to prove the unsolvability of Hilbert’s Decision Problem and Thue’s Word Problem. Theorem 8. Assume the Church-Turing Thesis. Then there is a fixed program P for the register machine such that the Halting Problem for P is unsolvable. Proof. Let R be any 1-ary relation that is semi-decidable but not decidable (for example, K). By Corollary 3, there is a program P such that for all a ∈ N: R(a) ⇔ the P-computation with input a halts.
152
Part 2: Computability Theory
But R is not decidable, and therefore the Halting Problem for P is unsolvable.
Exercises on 5.3 1.
Let Q and R be 1-ary RM-decidable relations. Write programs to prove the following: (a) ¬Q is RM-decidable; (b) Q ∨ R is RM-decidable.
2.
Let EVEN be the set of even natural numbers. Write a program to show that EVEN is RM-decidable.
3.
Prove Theorem 1.
4.
Find the code (in factored form!) of the program in Example 1.
5.
Is 211 ×13 × 33 × 55 × 711 program.
6.
Write the algorithm for Gö1 in Theorem 2.
7.
Complete the proof of Theorem 3 by proving: if HLT is decidable, then the Halting Problem for register machines is solvable.
8.
Prove: If HLT is RM-decidable, then K is RM-decidable.
9.
Let G be an n + 1-ary RM-computable function that is also regular. Show that the following n + 1-ary relation R is RM-decidable:
2
5
1
2
3
×13 1
the code of a program? If so, find the
R(a1 , . . . , an , b) ⇔ G(a1 , . . . , an , b) = 0. 10.
Recall that the graph GF of an n-ary function F is defined by GF (a, b) ⇔ F(a) = b. Use Corollary 1 to prove: if GF is RM-decidable, then F is RM-computable.
11.
Assume that every decidable relation is RM-decidable. Prove the Church-Turing Thesis. Hint: Use the previous exercise.
12.
Let Q be a 1-ary RM-decidable relation. Write a program to prove: (a) Q is RM-semi-decidable; (b) ¬ Q is RM-semi-decidable.
Chapter 5: Machine Model of Computability
13.
153
Let R be an n + 1-ary RM-decidable relation and let Q be the n-ary relation defined by Q(a1 , . . . , an ) ⇔ ∃bR(a1 , . . . , an , b). Use Theorem 7 to prove that Q is RM-semi-decidable.
14.
This problem outlines another proof that the Halting Problem is unsolvable. Define two 1-ary relations K0 and K1 on N as follows (note that K0 ∩ K1 = Ø): • K0 (e) ⇔ e = #(P) and the P-computation with input e halts with output 0; • K1 (e) ⇔ e = #(P) and the P-computation with input e halts with output 1. (a) Prove: If K is decidable, then K0 is decidable. Hint: Let be an algorithm that shows K decidable; write an algorithm that shows K0 decidable. (b) Let P be a program with instructions I1 , . . . , It such that for all a ∈ N, the P-computation with input a halts with output either 0 or 1. Show that there is a program P∗ that “reverses” P. In other words, for all a ∈ N: •
if the P-computation with input a halts with output 0, then the P∗ -computation with input a halts with output 1; • if the P-computation with input a halts with output 1, then the P∗ -computation with input a halts with output 0. (c) Let R be a 1-ary relation such that K0 ⊆ R and R ∩ K1 = Ø; in other words, for all e ∈ N: if K0 (e), then R(e); if K1 (e), then ¬R(e). Prove that R is not RM-decidable. Hint: Suppose P is a program that decides R. Let e = #(P∗ ), where P∗ is the program in (b). (d) Show that K0 is not RM-decidable. (e) Use the Church-Turing Thesis to prove that K is not decidable. 15.
Show that HLT enumerates all 1-ary RM-semi-decidable relations. In other words, show that given any 1-ary RM-semi-decidable relation R, there exists e ∈ N such that for all a ∈ N, HLT(e, a) ⇔ R(a).
16.
This problem outlines another proof of the unsolvability of the Halting Problem. (a) Let R be a 2-ary RM-decidable relation. Define a 1-ary relation Q by Q(a) ⇔ ¬R(a, a). Show that Q is RM-decidable.
154
Part 2: Computability Theory
(b) Let R be a 2-ary relation that enumerates all 1-ary RM-semi-decidable relations; that is, given any 1-ary RM-semi-decidable relation Q, there exists e ∈ N such that for all a ∈ N, R(e,a) ⇔ Q(a). Prove that R is not RM-decidable. Hint: Suppose R is RM-decidable. Consider the 1-ary relation Q defined as follows: for all a ∈ N, Q(a) ⇔ ¬R(a, a). (c) Use (b) and the previous exercise to show that HLT is not RM-decidable.
5.4 Unsolvability of Hilbert’s Decision Problem and Thue’s Word Problem By 1928, Hilbert’s famous Decision Problem had been clearly formulated: Let L be a first-order language; find an algorithm that, given an arbitrary formula A of L , decides (YES or NO) whether A is logically valid. In 1936, and using completely different methods, Church and Turing independently solved the Decision Problem by proving that there are languages for which no such algorithm exists. In this section we outline Turing’s approach to the problem. Interestingly enough, he invented Turing machines (closely related to register machines) as a key step in his solution. We will also give the Post-Markov proof that Thue’s Word Problem is unsolvable. Both of these proofs make use of the fact that there is a fixed program P for which the Halting Problem is unsolvable (see Theorem 8 in 5.3). Up to this point our definition of a computation has been somewhat informal, but now a precise definition is required. The basic idea is this: Let P be a program and let a1 , . . . , an ∈ N; choose w ∈ N so large that n < w and also the registers used by P are among R1 , . . . , Rw . A Pcomputation with inputs a1 , . . . , an is a sequence s1 , s2 , . . . of w-steps; these w-steps are unique and are related in a special way by the program P. Informally, each w-step is a record of the numbers in registers R1 , . . . , Rw and the next instruction. We begin with a precise description of a w-step. Definition 1. Let w ≥ 1. A w-step is a w + 1-tuple of natural numbers with first entry ≥ 1. A w-step s is written in the form s = j; r1 , . . . , rw . The idea is that the numbers r1 , . . . , rw are in registers R1 , . . . , Rw , respectively, and that Ij is the next instruction. Observe that s tells us all of the essential information of one step in a P-computation. Of special
Chapter 5: Machine Model of Computability
155
interest is the w-step s1 = 1; a1 , . . . , an , 0, . . . , 0, called an initial w-step. Informally, s1 describes the situation at the beginning of a P-computation: The inputs a1 , . . . , an are in registers R1 , . . . , Rn , respectively, 0 is in registers Rn+1 , . . . , Rw , and the next (= first) instruction is I1 . Definition 2. Let P be a program in standard form with instructions I1 , . . . , It , let w be such that the registers used by P are among R1 , . . . , Rw , and let s = j; r1 , . . . , rk , . . . , rw be a w-step with j ≤ t. The P-successor of s is the unique w-step s obtained from s according to one of four cases as follows: Ij is S(k). s is obtained from s by replacing the first entry with j + 1 and replacing rk with rk + 1; in other words, s = j + 1; r1 , . . . , rk + 1, . . . , rw ; Ij is Z(k). s is obtained from s by replacing the first entry with j + 1 and replacing rk with 0; in other words, s = j + 1; r1 , . . . , 0, . . . , rw ; Ij is D(k). s is obtained from s by replacing the first entry with j + 1 . and replacing rk with rk −1; in other words, s = j + 1; r1 , . . . , . rk −1, . . . , rw ; Ij is B(k, q). s is obtained from s by replacing the first entry with j + 1 if rk = 0 and with q otherwise; thus s = j + 1; r1 , . . . , rk , . . . , rw or s = q; r1 , . . . , rk , . . . , rw . If s = t + 1; r1 , . . . , rk , . . . , rw , we say that s is P-terminal. Definition 3. Let P be a program in standard form with instructions I1 , . . . , It , let a1 , . . . , an be inputs, and let w be such that w ≥ n and the registers used by P are among R1 , . . . , Rw . Given the initial w-step s1 = 1; a1 , . . . , an , 0, . . . , 0, exactly one of the following holds: (1) There is a finite sequence s1 , . . . , sm of w-steps such that for 1 ≤ i ≤ m − 1, si+1 is the P-successor of si and sm is P-terminal. (2) There is an infinite sequence s1 , s2 , . . . of w-steps such that for all i ≥ 1, si+1 is the P-successor of si .
156
Part 2: Computability Theory
In either case, the sequence s1 , s2 , . . . of w-steps is unique and is called the P-computation with inputs a1 , . . . , an . The P-computation with inputs a1 , . . . , an halts if (1) holds. In this case, the output is b, where sm = t+1; b, . . . , rw . For an example of a P-computation, see 5.1. Theorem 1 (simulation of a program by a first-order language). Let P be a program for the register machine in standard form. Then there is a first-order language L , and for each a ∈ N there is a sentence Ca of L , such that for all a ∈ N: Ca is logically valid ⇔ the P-computation with input a halts. Outline of proof. Let I1 , . . . , It be the instructions of P, and assume that the registers used by P are among R1 , . . . , Rw . The language L has non-logical symbols with intended interpretation as follows: N is the domain of the interpretation; R(w + 1-ary relation symbol, interpreted as the set of steps in a computation); D, S (1-ary function symbols, interpreted as the decrease and successor functions); 0 (constant symbol, interpreted as zero). The following notation is used: 00 = 0, 0j+1 = S(0j ). Thus 01 = S(0), 02 = S(S(0)), and so on; intuitively, 0j is interpreted as the natural number j. Let A0 be the conjunction of these three sentences: D(0) = 0, ∀x1 ¬[S(x1 ) = 0], ∀x1 [D(S(x1 )) = x1 ]. For 1 ≤ j ≤ t let Aj be the sentence of L determined by one of four cases as follows: Ij is S(k) Aj is ∀ x1 · · · ∀xw [R(0j , x1 , . . . , xk , . . . , xw ) → R(0j+1 , x1 , . . . , S(xk ), . . . , xw )]; Ij is D(k) Aj is ∀ x1 · · · ∀ xw [R(0j , x1 , . . . , xk , . . . , xw ) → R(0j+1 , x1 , . . . , D(xk ), . . . , xw )]; Ij is Z(k) Aj is ∀ x1 · · · ∀ xw [R(0j , x1 , . . . , xk , . . . , xw ) → R(0j+1 , x1 , . . . , 0, . . . , xw )]; Ij is B(k, q) Aj is ∀ x1 · · · ∀ xw [R(0j , x1 , . . . , xk , . . . , xw ) → [(¬(xk = 0) → R(0j+1 , x1 , . . . , xk , . . . , xw )) ∧ ((xk = 0) → R(0q , x1 , . . . , xk , . . . , xw ))]].
Chapter 5: Machine Model of Computability
157
Finally, for each a ∈ N let Ca be the sentence [A0 ∧ A1 ∧ · · · ∧ At ∧ R(01 , 0a , 0, . . . , 0)] → ∃x1 · · · ∃xw R(0t+1 , x1 , . . . , xw ). Let a ∈ N, and let us prove that Ca is logically valid ⇔ the P-computation with input a halts. First assume the P-computation with input a halts. Then there is a finite sequence s1 , . . . , sm of w-steps with s1 = 1; a, 0, . . . , 0 and sm = t + 1; r1 , . . . , rw . These w-steps provide us with instructions on how to use the sentences = {D(0) = 0, ∀ x1 ¬[S(x1 ) = 0], ∀ x1 [D(S(x1 )) = x1 ], A1 , . . . , At , R(01 , 0a , 0, . . . , 0)} to write a formal proof of the sentence ∃x1 · · · ∃xw R(0t+1 , x1 , . . . , xw ). In particular, the w-step sk = j; r1 , . . . , rw tells us to use axiom Aj and to replace the variables x1 , . . . , xw with the terms 0r1 , . . . , 0rw , respectively. One eventually obtains R(0t+1 ; 0r1 , . . . , 0rw ), and existential generalization then gives ∃x1 · · · ∃xw R(0t+1 , x1 , . . . , xw ). From this it follows that Ca is a theorem of first-order logic (Deduction Theorem used here) and therefore is logically valid as required. Now assume that Ca is logically valid. Construct an interpretation I of L as follows: The domain of I is N; 0 is interpreted as the number zero; S and D are interpreted as the successor function and the decrease function, respectively; and R is interpreted as the set of w + 1-tuples of natural numbers defined inductively as follows: (1) 1; a, 0, . . . , 0 is in R; (2) if s is in R and s is the P-successor of s, then s is in R; (3) every element of R is obtained by a finite number of applications of (1) and (2). All of the sentences D(0) = 0, ∀x1 ¬[S(x1 ) = 0], ∀x1 [D(S(x1 )) = x1 ], A1 , . . . , At , and R(01 , 0a , . . . , 0) are true in I. Moreover, Ca is logically valid and therefore is true in I. It follows that ∃x1 · · · ∃xw R(0t+1 , x1 , . . . , xw ) is true in I. Informally, this sentence states that there exists r1 , . . . , rw ∈ N such that t + 1; r1 , . . . , rw ∈ R. Now suppose the P-computation with input a does not halt. Then R = {s1 , s2 , . . .}, where s1 = 1; a, 0, . . . , 0 and for all k > 0, sk+1 is the P-successor of sk . Since each sk has a P-successor, but t + 1; r1 , . . . , rw is P-terminal, it follows that t + 1; / R, a contradiction. r1 , . . . , rw ∈
158
Part 2: Computability Theory
Theorem 2 (Turing). Assume the Church-Turing Thesis. Then there is a first-order language L for which Hilbert’s Decision Problem is unsolvable. Proof. By the Church-Turing Thesis, there is a fixed program P for which there is no algorithm that decides, for all a ∈ N, if the P-computation with input a halts. By Theorem 1, there is a first-order language L , and for each a ∈ N there is a sentence Ca of L , such that for all a ∈ N: Ca is logically valid ⇔ the P-computation with input a halts. This shows that no algorithm exists that decides the logical validity of formulas of L . We now turn to the unsolvability of the Word Problem. We begin by introducing the notion of a string rewrite system, a variation of a formal system that is an algorithm for generating an infinite list of expressions. String rewrite systems have applications in biology, complexity theory, computer science, and linguistics. A common property of these systems is the idea of substitution: Replace an occurrence of an expression with another expression. Thus another description of a string rewrite system is a substitution system. Every string rewrite system has two components: an alphabet and a finite list of productions; a third possible component is an axiom. The alphabet is a finite set of symbols, and a word is the concatenation of a finite number of symbols. For example, if = {a, b}, then b, abba, and bbb are all examples of words. The empty word is allowed and is denoted by ; for any word W, W = W = W. Let A and W be words; then A occurs in W if there exist words W1 and W2 (one or both may be the empty word) such that W = W1 AW2 . For example, aa occurs once in baab and three times in baabbaabaa. A production has the form A → B, where A and B are words. This production is applied as follows: Let W be a word with at least one occurrence of A, say W = UAX, and let V be a word obtained from W by replacing exactly one of the occurrences of A with B, say V = UBX. This operation is denoted W ⇒ V, and we say that word V is directly derivable from W (by the production A → B). For example, for the production a → aa we have aba ⇒ aaba and also aba ⇒ abaa. Given a string rewrite system (alphabet and productions Ak → Bk , 1 ≤ k ≤ n), the notion of a proof is now defined as follows. Let W and V be words. A proof of V from W is a finite sequence W1 , . . . , Wk of words such that W = W1 , V = Wk , and for 1 ≤ j ≤ k − 1, Wj+1 is directly
159
Chapter 5: Machine Model of Computability
derivable from Wj (by some production). We use the notation W V to denote the fact that there is a proof of V from W. Example 1. The following is a description of a string rewrite system (more precisely, a formal grammar) that generates all of the formulas of propositional logic with connectives ¬ ,∨. The alphabet is = {S, ¬, ∨, (, ), p, }; here S is an axiom, or start word. There are four productions: S → ¬S S → (S ∨ S) S → S S → p. We show S ¬(p ∨ ¬p ):
S ⇒ ¬S ⇒ ¬(S ∨ S) ⇒ ¬(S ∨ ¬S) ⇒ ¬(p ∨ ¬S) ⇒ ¬(p ∨ ¬S ) ⇒ ¬(p ∨ ¬p ).
Theorem 3 (simulation of a program by a rewrite system). Let P be a program for the register machine. Then there is a rewrite system with words V and {Ua : a ∈ N} such that for all a ∈ N: the P-computation with input a halts ⇔ Ua V. Proof. Let I1 , . . . , It be the instructions of P (put in standard form) and assume the registers used by P are among R1 , . . . , Rw . Alphabet. The symbols of are b, x b1 , . . . , bw , bw+1 c1 , . . . , ct , ct+1 d1 , . . . , dt e1 , . . . , e t . The choice of these symbols is motivated as follows. A step s = j; r1 , . . . , rw in a P-computation is described by the word Ws = b cj b1 xr1 b2 xr2 · · · bw xrw bw+1 , where symbols are used as follows: b marks the leftmost position; cj tells us that the next instruction is Ij (in particular, ct+1 is for halt); b1 , . . . , bw , bw+1 separate the registers R1 , . . . , Rw ; xri states that the number ri is in register Ri , 1 ≤ i ≤ w. (Note: xa denotes a occurrences of the symbol
160
Part 2: Computability Theory
x; in particular, x0 denotes no occurrence of x.) Later we will see that the d’s tell us to execute the next instruction in sequence whereas the e’s tell us to branch to a different instruction. The words Ua and V are Ua = b c1 b1 xa b2 · · · bw bw+1 and V = bct+1 . The word Ua describes the initial configuration of a computation: The input a is in register 1, 0 is in all other registers, and the computation begins with instruction I1 . Productions. Suppose the P-computation with input a halts with ri = # Ri for 1 ≤ i ≤ w. The following productions suffice to derive b ct+1 (= V) from b ct+1 b1 xr1 b2 xr2 · · · bw xrw bw+1 : ct+1 bi → ct+1 for 1 ≤ i ≤ w + 1 (erase b1 , . . . , bw+1 ); ct+1 x → ct+1 (erase all x’s). The remaining productions depend on the instructions of P. Consider Ij , where 1 ≤ j ≤ t, and assume that Ij refers to register k, where 1 ≤ k ≤ w. There are four cases; in each case, we state the goal and then list the required productions. Let W = b cj b1 xr1 · · · bk−1 xrk−1 bk xrk bk+1 · · · bw+1 . First of all, the following productions are common to all cases: cj bi → bi cj for 1 ≤ i ≤ k − 1 and cj x → x cj (move cj to right to obtain cj bk ); bi dj → dj bi for 1 ≤ i ≤ k − 1 and x dj → dj x (move dj to left to obtain b dj ); b dj → b cj+1 (change dj to cj+1 ). Additional production(s) are as follows. Ij is S(k). The goal is W b cj+1 b1 xr1 · · · bk−1 xrk−1 bk xrk +1 bk+1 · · · bw+1 . cj bk → dj bk x. Ij is Z(k). The goal is W b cj+1 b1 xr1 · · · bk−1 xrk−1 bk x0 bk+1 · · · bw+1 . c j b k x → cj b k ; cj bk bk+1 → dj bk bk+1 .
Chapter 5: Machine Model of Computability
161
. Ij is D(k). The goal is W b cj+1 b1 xr1 · · · bk−1 xrk−1 bk xrk −1 bk+1 · · · bw+1 . cj b k x → d j b k ; cj bk bk+1 → dj bk bk+1 . Ij is B(k, q). For rk > 0 the goal is W b cj+1 b1 xr1 · · · bk−1 xrk−1 bk xrk bk+1 · · · bw+1 and for rk = 0 the goal is W b cq b1 xr1 · · · bk−1 xrk−1 bk xrk bk+1 · · · bw+1 . cj bk x → dj bk x; cj bk bk+1 → ej bk bk+1 ; bi ej → ej bi for 1 ≤ i ≤ k − 1 and x ej → ej x (move ej to left to obtain b ej ); b ej → b c q . We leave it to the reader to check the following: (a) If s = j; r1 , . . . , rw is a w-step with j ≤ t and s is the P-successor of s, then Ws Ws ; (b) if s = t + 1; r1 , . . . , rw , then Ws V. Now let a ∈ N; we show the following equivalent: (1) the P-computation with input a halts; (2) Ua V. (1) ⇒ (2). Assume the P-computation with input a halts and let the steps of the computation be s1 , . . . , sm , where s1 = 1; a, 0, . . . , 0, si+1 is the P-successor of si for 1 ≤ i ≤ m − 1, and sm = t + 1; r1 , r2 , . . . , rw . By (a) and (b), Ws1 Ws2 , . . . , Wsm−1 Wsm , and Wsm V, therefore Ws1 V. But Ws1 = Ua and thus Ua b ct+1 as required. (2) ⇒ (1). Assume that Ua V; that is, b c1 b1 xa b2 · · · bw bw+1 V. Let W be the set of all words W with exactly one occurrence of exactly one symbol from the list c1 , . . . , ct , ct+1 , d1 , . . . , dt , e1 , . . . , et ; note that Ua ∈ W. We ask the reader to check the following: If W ⇒ W and W ∈ W, then W ∈ W (closure); • if W ∈ W and if W ⇒ V and W ⇒ V , then V = V (deterministic); • if W1 , . . . , Wk and V1 , . . . , Vk are words such that Wi ⇒ Wi+1 and Vi ⇒ Vi+1 for 1 ≤ i ≤ k − 1, W1 = V1 , and W1 ∈ W, then Wk = Vk . •
Suppose by way of contradiction that the P-computation with input a does not halt. Then there is an infinite sequence s1 , s2 , . . . of w-steps with s1 = 1; a, 0, . . . , 0 and such that si+1 is the P-successor of si for all i ≥ 1. By (a), we have Ws1 Ws2 , Ws2 Ws3 , and so on. It follows that there is an infinite sequence of words W1 , W2 , . . . with W1 = Ua and
162
Part 2: Computability Theory
such that Wi ⇒ Wi+1 for i = 1, 2, . . . . On the other hand, Ua V and therefore there is a finite sequence of words V1 , . . . , Vk with V1 = Ua , Vk = V, and Vi ⇒ Vi+1 for 1 ≤ i ≤ k – 1. Since W1 = V1 and W1 ∈ W, it follows that Wk = Vk . We now have a contradiction: Vk = b ct+1 and no production of applies to b ct+1 ; on the other hand, Wk ⇒ Wk+1 . Acknowledgment: The proof above is from Shoenfield, Recursion Theory, 1993. Theorem 4 (unsolvability of the word problem for rewrite systems; Post, Markov). Assume the Church-Turing Thesis. Then there is a rewrite system for which there is no algorithm that, given two words W and V, decides (YES or NO) if W V. Proof. There is a program P for which there is no algorithm that, given a ∈ N, decides if the P-computation with input a halts. Let be a rewrite system with words {Ua : a ∈ N} and V such that for all a ∈ N: the P-computation with input a halts ⇔ Ua V. An algorithm that decides W V would imply the existence of an algorithm that decides if the P-computation with input a halts, a contradiction.
Exercises on 5.4 1.
Let P be the program S(2), D(2), D(2), B(1,1). (a) Write the sentences A1 , A2 , A3 , A4 of Theorem 1 for this program (w = 2 suffices). (b) Write the steps s1 , . . . , s5 of the P-computation with input 2; s1 = 1; 2, 0. (c) Let = {D(0) = 0, ∀x1 ¬[S(x1 ) = 0], ∀x1 [D(S(x1 )) = x1 ], A1 , A2 , A3 , A4 , R(01 , 02 , 0)}. Write a formal proof of ∃x1 ∃x2 R(05 , x1 , x2 ) using s1 , . . . , s5 as a guide. (d) The P-computation with input 0 does not halt. Write the sequence s1 , s2 , s3 , s4 , . . . , where s1 = 1; 0, 0.
163
Chapter 5: Machine Model of Computability
(e) Consider the following interpretation of L : domain of the interpretation is N, 0 is interpreted as zero, S as the successor function, D as the decrease function, R = {1, 0, 0, 2, 0, 1, 3, 0, 0, 4, 0, 0}. Show that the sentences D(0) = 0, ∀x1 ¬(S(x1 ) = 0), ∀x1 (D(S(x1 )) = x1 ), A1 , A2 , A3 , A4 , R(01 , 0, 0) are true in this interpretation but ∃x1 ∃x2 R(05 , x1 , x2 ) is false. 2.
Let be the string rewrite system with alphabet = {a, b, c} and productions: ab → ba ba → ab
ac → ca ca → ac
bc → cb cb → bc.
(a) Let W and V be words on . Prove that W V if and only if W and V have the same number of a’s, b’s, and c’s. (b) Show that there is an algorithm that, given words W and V on , decides if W V. 3.
Let be a string rewrite system with alphabet . Prove that for all words U, V, W: (a) W W (reflexive); (b) if W U and U V, then W V; (c) if W V, then WU VU and UW UV.
4.
Refer to the string rewrite system in the proof of Theorem 3. Let W be the set of all words W with exactly one occurrence of exactly one symbol from the list c1 , . . . , ct , ct+1 , d1 , . . . , dt , e1 , . . . , et . Prove the following: (a) if W ⇒ W and W ∈ W, then W ∈ W (closure); (b) if W ∈ W and if W ⇒ V and W ⇒ V , then V = V (deterministic); (c) if W1 , . . . , Wk and V1 , . . . , Vk are words such that Wi ⇒ Wi+1 and Vi ⇒ Vi+1 for 1 ≤ i ≤ k − 1, W1 = V1 , and W1 ∈ W, then Wk = Vk .
164
5.
Part 2: Computability Theory
Let P be the following program: I1 I2 I3 I4 I5
S(2) B(2,6) S(1) D(2) B(3,2)
The goal of this exercise is to write the productions for a rewrite system that simulates P. The symbols are b, x, b1 , b2 , b3 , b4 c1 , c2 , c3 , c4 , c5 , c6 d1 , d2 , d3 , d4 , d5 e1 , e2 , e3 , e4 , e5 Below are the 3-steps s1 , . . . , s7 of the P-computation with input 2. s1 s2 s3 s4
= 1; 2, 0, 0 = 2; 2, 1, 0 = 3; 2, 1, 0 = 4; 3, 1, 0
s5 = 5; 3, 0, 0 s6 = 2; 3, 0, 0 s7 = 6; 3, 0, 0
(a) Write the corresponding words Ws1 , . . . , Ws7 . (b) Write productions so that b c6 b1 xr1 b2 xr2 b3 xr3 b4 b c6 . Then check that Ws7 b c6 . (c) Write the productions for I1 = S(2). Then check that Ws1 Ws2 . (d) Write the productions for I2 = B(2, 6). Then check that Ws2 Ws3 and Ws6 Ws7 . (e) Write the productions for I3 = S(1). Then check that Ws3 Ws4 . (f) Write the productions for I4 = D(2). Then check that Ws4 Ws5 . (g) Write the productions for I5 = B(3, 2). Then check that Ws5 Ws6 .
6 A Mathematical Model of Computability The importance of the technical concept recursive function derives from the overwhelming evidence that it is coextensive with the intuitive concept effectively calculable function. E. Post, 1944
In this chapter we describe a model of computability that will give us a precise mathematical counterpart to the informal notions of a computable function, a decidable relation, and a semi-decidable relation. These new concepts are: recursive function, recursive relation, and RE (recursively enumerable) relation. A major result states that these three classes coincide respectively with the RM-computable functions, the RM-decidable relations, and the RM-semi-decidable relations. In summary: Mathematical model Machine model Informal concept recursive function ⇔ RM-computable function ⇒ computable function recursive relation ⇔ RM-decidable relation ⇒ decidable relation RE relation ⇔ RM-semi-decidable relation ⇒ semi-decidable relation
Moreover, by the Church-Turing Thesis, we can replace each occurrence of ⇒ with ⇔.
6.1 Recursive Functions and the Church-Turing Thesis In the definition of a recursive function there are starting functions (“axioms”) and operations on functions (“rules of inference”). The three operations are by now familiar: composition, primitive recursion, and minimalization of a regular function (see Section 4.2: Operations with Computable Functions). The starting functions are: S (the 1-ary successor function defined by S(a) = a + 1); • Z (the 1-ary zero function defined by Z(a) = 0); • for n ≥ 1 and 1 ≤ k ≤ n, the n-ary projection functions Un,k defined by •
Un,k (a1 , . . . , an ) = ak . The epigraph to this chapter is drawn from Davis, The Undecidable, p. 306.
166
Part 2: Computability Theory
Note that U1,1 (a) = a and therefore the 1-ary identity function is a starting function. Definition 1. An n-ary function F is recursive if there is a finite sequence F1 , . . . , Fn of functions with Fn = F and such that for 1 ≤ k ≤ n, one of the following holds: (1) Fk is a starting function; (2) k >1 and Fk is obtained from previous functions (that is, from F1 , . . . , Fk−1 ) by the operation of composition, primitive recursion, or minimalization of a regular function. According to this definition, proving that a given function F is recursive is similar to writing a proof in a formal system: Starting functions take the place of axioms, and the three operations are rules of inference. As a consequence, we have the following method for showing that all recursive functions have some property Q. Theorem 1 (induction on recursive functions). Let Q be a property of n-ary functions. To prove that every recursive function has property Q, it suffices to prove the following: (1) each starting function has property Q; (2) if F is the composition of functions, each of which has property Q, then F has property Q; (3) if F is the minimalization of a regular function with property Q, then F has property Q; (4) if F is obtained from G and H by primitive recursion, and both G and H have property Q, then F also has property Q. Proof. We give an informal argument; a precise proof can be formulated in terms of complete induction. Let F be a recursive function. Then there is a finite sequence F1 , . . . , Fn of functions with Fn = F and such that for 1 ≤ k ≤ n, one of the following holds: Fk is a starting function; k >1 and Fk is obtained from previous functions F1 , . . . , Fk−1 by one of the three operations. We now argue that each of the functions F1 , . . . , Fn has property Q. First of all, F1 is a starting function and therefore F1 has property Q by (1). Now consider F2 ; if F2 is also a starting function, then F2 has property Q by the same reason. Otherwise, F2 is obtained from F1 by one of the three operations. By (2), (3), or (4), F2 has property Q. Continuing in this way, we see that each of the functions F1 , . . . , Fn has property Q; in particular, Fn has property Q as required.
Chapter 6: Mathematical Model of Computability
167
By Theorem 1 we have: Theorem 2. Every recursive function is RM-computable. Proof. We use induction on recursive functions with Q the property of being RM-computable. First of all, the starting functions are all RMcomputable; this is obvious for S and Z, and also holds for each Un,k by Theorem 5 in Section 5.1. To complete the proof, recall that in Section 5.2 (Theorems 1 and 2) and in Section 5.3 (Corollary 2) we proved that if F is obtained from RM-computable functions by composition, primitive recursion, or minimalization (of a regular function), then F is also RM-computable. Corollary 1. The following 1-ary functions are not recursive: (1) the busy beaver function ; (2) the characteristic function of K, where K is defined by K(e) ⇔ HLT(e,e). Corollary 2. Every recursive function is computable. Proof. This can be proved by induction on recursive functions (use Theorem 1 above and Theorems 2, 3, and 4 from Section 4.2). It is also a direct consequence of Theorem 2, since every RM-computable function is computable. Recursive functions have been advertised as a new model of computation based on mathematical methods. But by Theorem 2, the recursive functions also give us a new strategy for proving a function RM-computable: Given F, we need not write a program to compute F; instead, it suffices to show that F is recursive. This idea was discussed earlier in Section 5.2. According to Definition 1, a proof that F is recursive requires us to write a sequence F1 , . . . , Fn of functions satisfying certain conditions and such that Fn = F. But this is much too cumbersome to carry out in practice; instead, we use the following more efficient method. Theorem 3. To show that the n-ary function F is recursive, it suffices to show that F satisfies one of the following four conditions: (1) (2) (3) (4)
F is a starting function; F is the composition of recursive functions; F is obtained by primitive recursion from recursive functions; F is the minimalization of a regular recursive function.
168
Part 2: Computability Theory
Proof. If F is a starting function, then F is certainly recursive. Suppose F is the composition of functions already known to be recursive. To simplify notation, let F(a1 , . . . , an ) = G(H(a1 , . . . , an ), J(a1 , . . . , an )), where G, H, and J are recursive. Since G is recursive, there is a finite sequence G1 , . . . , Gk of functions with Gk = G and such that for 1 ≤ j ≤ k, Gj is a starting function or j >1 and Gj is obtained by composition, primitive recursion, or minimalization of a regular function from among G1 , . . . , Gj−1 . Likewise for H and J, we have sequences H1 , . . . , Hr with Hr = H and J1 , . . ., Js with Js = J and such that the same conditions hold. The sequence of functions G1 , . . . , Gk , H1 , . . . , Hr , J1 , . . . , Js , F (where Gk = G, Hr = H, and Js = J) shows that F is recursive as follows: F is the last function in the list and each function in the list is either a starting function or is obtained by composition, primitive recursion, or minimalization of previous functions in the list; in particular, F is obtained by composition from Gk , Hr , and Js . There are two other cases: F is obtained by primitive recursion from recursive functions; F is the minimalization of a regular recursive function. We leave these two cases to the reader. Henceforth, Theorem 3 will be our basic strategy for proving that a given function is recursive and will be used without explicit mention. We now begin a systematic study of recursive functions. The results obtained fall into one of two categories: proofs that specific functions are recursive (“theorems”); or new operations for constructing recursive functions from known recursive functions (“derived rules of inference”). For each such result, the reader should verify the correctness of the statement when “recursive” is replaced by “computable.” Theorem 4. For all k ∈ N, the n-ary constant function Cn,k : Nn → N defined by Cn,k (a1 , . . . , an ) = k is recursive (Cn,k is called the n-ary function with constant value k).
Chapter 6: Mathematical Model of Computability
169
Proof. The proof is by induction on k. To see that Cn,0 is recursive, we have: Cn,0 (a1 , . . . , an ) = Z(Un,1 (a1 , . . . , an )). Aside: Compare this with a proof as required by the official definition of a recursive function, namely: (1) Un,1 (a1 , . . . , an ) (2) Z (3) Z(Un,1 (a1 , . . . , an ))
starting function; starting function; composition of previous functions Un,1 and Z.
Now assume that Cn,k is recursive. Then Cn,k+1 is recursive by composition of known recursive functions as follows: Cn,k+1 (a1 , . . . , an ) = S(Cn,k (a1 , . . . , an )). Theorem 5. Each of the following functions is recursive: (1) (2) (3) (4) (5)
addition +; multiplication × ; exponentiation ab ; . decrease function D(a) = a − 1; . proper subtraction −.
Proof. In each case we use the operation of primitive recursion. •
Addition. Informally we have: a + 0 = a; a + (b + 1) = (a + b) + 1. Formally, we must find recursive functions G and H such that a + 0 = G(a); a + (b + 1) = H(a, b, a + b) or (a + b) + 1 = H(a, b, a + b).
•
Let G(a) = a and H(a, b, x) = x + 1 = S(U3,3 (a, b, x)). Multiplication and exponentiation. Informal proofs are given below; formal proofs (find the recursive functions G and H) are left to the reader. a × 0 = 0 and a × (b + 1) = (a × b) + 1; a0 = 1 and ab+1 = ab × a.
•
Decrease function. Informally, D(0) = 0 and D(a + 1) = a. Proceeding formally, the required recursive function H satisfies D(a + 1) = H(a, D(a)) or a = H(a, D(a)); take H(a, x) = U2,1 (a, x).
170
Part 2: Computability Theory
•
. . . Proper subtraction. First of all, a − (b+1) = (a − b) −1 (verify!) and . . . so informally we have: a − 0 = a, a − (b+1) = D(a − b). Formally, . . the required recursive function H satisfies a − (b+1) = H(a,b, a − . . b) or D(a − b) = H(a,b, a − b); take H(a,b,x) = D(x).
Example 1. The 3-ary function F(a,b,c) = (2 × c) + (a × (b + 7)) is recursive. To prove this, we begin by introducing two 3-ary auxiliary functions H1 and H2 defined by H1 (a,b,c) = 2 × c and H2 (a,b,c) = a × (b+7). Assuming that H1 and H2 are both recursive, F is recursive by composition: F(a, b, c) = +(H1 (a, b, c), H2 (a, b, c)). The function H1 (a,b,c) = 2 × c is recursive by H1 (a, b, c) = ×(C3,2 (a, b, c), U3,3 (a, b, c)). For H2 (a,b,c) = a × (b+7) we introduce yet another auxiliary function H3 (a,b,c) = (b+7), which is recursive by H3 (a,b,c) = +(U3,2 (a,b,c), C3,7 (a,b,c)). Finally, H2 (a, b, c) = ×(U3,1 (a, b, c), H3 (a, b, c)). As the previous example shows, a proper application of composition requires a considerable amount of fussing around with constant and projection functions. It seems clear, however, that in applying the composition rule to F(a1 , . . . , an ), the variables a1 , . . . , an can be rearranged, omitted, repeated, or even replaced by constants. We now show that this is indeed the case. Theorem 6 (simplification of composition). Let G be a k-ary recursive function, and define an n-ary function F by F(a1 , . . . , an ) = G(b1 , . . . , bk ), where for 1 ≤ i ≤ k, bi is one of the variables a1 , . . . , an or bi ∈ N. Then F is recursive. Proof. For 1 ≤ i ≤ k, let Hi be the n-ary recursive function determined as follows: •
If bi = aj , then Hi is the n-ary projection function Un,j (and so Hi (a1 , . . . , an ) = bi ).
171
Chapter 6: Mathematical Model of Computability
•
If bi ∈ N, then Hi is the n-ary constant function Cn,bi (and so Hi (a1 , . . . , an ) = bi ).
We now have Hi (a1 , . . . , an ) = bi for 1 ≤ i ≤ k and so F can be written as F(a1 , . . . , an ) = G(H1 (a1 , . . . , an ), . . . , Hk (a1 , . . . , an )). Thus F is recursive by composition as required. To illustrate, suppose we have a 2-ary function G and a 4-ary function H, both of which are recursive. By Theorem 6, each of the following functions is recursive: F1 (a, b, c) = 3 × b; F2 (a, b, c) = G(a, c); F3 (a, b, c) = H(c, c, a, 6). Henceforth, Theorem 6 and variations thereof (see Example 1) will be used without explicit mention, thereby avoiding tedious details involving projection and constant functions that are technically required for a precise proof. Theorem 7. The following 1-ary decision functions are recursive.
sg(a) =
0 if a = 0; 1 if a = 0,
csg(a) =
1 if a = 0; 0 if a = 0.
(sg(a) is the sign of a, and csg(a) is the co-sign or complement of sg.) . Proof. We have: csg(a) = 1 − a and sg(a) = csg(csg(a)). Theorem 8 (bounded sum, bounded product rules). Let G be an n+1-ary recursive function and let L be an n-ary recursive function. Then each of the following functions is recursive, where a = a1 , . . . , an ∈ Nn : (1) (2) (3) (4)
F1 (a, b) = G(a, 0) + . . . + G(a, b) = k≤b G(a, k); F2 (a) = G(a, 0) + . . . + G(a, L(a)) = k≤L(a) G(a, k); F3 (a, b) = G(a, 0) × . . . × G(a, b) = k≤b G(a, k); F4 (a) = G(a, 0) × . . . × G(a, L(a)) = k≤L(a) G(a, k).
172
Part 2: Computability Theory
Proof. For F1 we use primitive recursion: F1 (a, 0) = G(a, 0); F1 (a, b + 1) = F1 (a, b) + G(a, b + 1) (take H(a, b, x) = x + G(a, b + 1)). For F2 we have F2 (a) = F1 (a, L(a)). The proofs for F3 and F4 are left to the reader.
Church-Turing Thesis By Corollary 2, every recursive function is computable. The converse, an idea already discussed in Chapter 5 in connection with the RMcomputable functions, states: Church-Turing Thesis. Every computable function is recursive. Here are some comments on the history and the origins of the Church-Turing Thesis. •
The statement “every computable function is recursive” was first proposed by Church in 1936. (Strictly speaking, Church worked with the λ-calculus, another mathematical model of computability similar in spirit to the recursive functions.) However, Gödel was not immediately convinced that the recursive functions completely captured the class of n-ary functions computable by an algorithm. • At about the same time Turing invented Turing machines and argued that any n-ary function computable by an algorithm is also computable by a Turing machine. In addition, he essentially proved: F is recursive ⇔ F is computable by a Turing machine. Gödel found Turing’s machine model of computability convincing. In light of these two comments, the Church-Turing Thesis could be divided into two parts: Church’s Thesis: Every recursive function is computable. Turing’s Thesis: Every RM-computable function is computable. However, we will continue our use of the terminology Church-Turing Thesis. The Church-Turing Thesis is used in two different ways. On the one hand, we can argue that a given function F is computable (in the intuitive sense) and then call on the Church-Turing Thesis to claim that
Chapter 6: Mathematical Model of Computability
173
F is actually recursive. This is a non-essential use of the Church-Turing Thesis; with additional hard work, we could give a detailed proof that F is recursive. On the other hand, consider the contrapositive form: If F is not recursive, then F is not computable. Suppose that we want to prove that a certain function F is not computable in the intuitive sense. Instead we give a precise mathematical proof that F is not recursive and then call on the Church-Turing Thesis to claim that F is not computable. This is an essential use of the Church-Turing Thesis. We have already used the ChurchTuring Thesis in an essential way as follows: the busy beaver function is not computable; Hilbert’s Decision Problem, the Halting Problem for register machines, and the Word Problem for rewrite systems are all unsolvable. Another essential use of the thesis, to be discussed later in this chapter, is the negative solution of Hilbert’s Tenth Problem. The fact that the collection of all recursive functions is the same as the collection of all RM-computable functions is evidence in favor of the Church-Turing Thesis. This is called the “agreement argument.” In fact, there have been many different attempts to capture the intuitive concept of a computable function: •
recursive function; RM-computable function; • computable by a Turing machine; • computable by a Markov algorithm; • derivable in the λ-calculus. •
However, each choice leads to the same collection of functions. In a 1946 address (see Davis, The Undecidable, p. 84), Gödel states, “With this concept [recursive function or Turing computability] one has for the first time succeeded in giving an absolute definition of an interesting epistemological notion, i.e., one not depending on the formalism chosen.” As additional evidence in support of the thesis, we note that from 1936 on, no one has yet found an example of a computable function that could not be proved recursive. Up to now we have had a limited number of specific recursive functions and operations for constructing new recursive functions from old. In the next section we will expand the list of specific recursive functions and also develop additional operations for constructing new recursive functions. For each such result, there is a corresponding statement that one obtains by systematically replacing “recursive” by “computable”; this new statement should be relatively easy to prove using algorithms. To test the power of these expanded tools, let us set the following goal: prove that the function π : N → N that gives the nth prime (π (0) = 2, π (1), = 3, π (2) = 5, and so on) is recursive.
174
Part 2: Computability Theory
Exercises on 6.1 1.
Prove Theorem 3(3): If F is obtained by primitive recursion from recursive functions G and H, then F is recursive.
2.
Prove Theorem 3(4): If F is obtained by minimalization of a regular recursive function G, then F is recursive.
3.
In this exercise, use composition, projection functions Un,k , constant functions Cn,k , known recursive functions, and various auxiliary functions to prove that F is recursive. Give complete details. Do not use Theorem 6; instead, use the methods of Example 1. (a) (b) (c) (d) (e)
F(a,b,c) = c × b; F(a,b,c) = (5 × b) + (2 × a); F(a,b,c) = a + b + c + 4; F(a,b,c) = (a + b + c + 4) c×b . F(a,b,c) = G(H(b,7), G(a+c,3 × b)), where G and H are 2-ary recursive functions.
4.
Give details that multiplication and exponentiation are recursive.
5.
. . . Verify that a − (b + 1) = (a − b) − 1 for all a, b ∈ N.
6.
Show that the factorial function FT(n) = n! is recursive.
7.
For each n ≥ 1 let An be the n-ary function defined by An (a1 , . . . , an ) = a1 + . . . + an . Show that each An is recursive.
8.
Let G1 , . . . , Gk be n-ary recursive functions and let F be the n-ary function defined by F(a) = G1 (a) + . . . + Gk (a). Show that F is recursive.
9.
Let G be a 2-ary recursive function. Show that the following 2-ary function is recursive: F(a, b) = G(a, 0) + 2G(a, 1) + . . . + (b + 1)G(a, b).
10.
Let G be a 1-ary recursive function. Prove that F is recursive, where F is the 1-ary function defined by F(a) = 4a × G(3) × G(4) × . . . × G(a + 4).
11.
Complete the proof of Theorem 8 by showing that F3 and F4 are recursive.
12.
Assume that G and H are 1-ary recursive functions. Show that F(a) = 2^G(a) × 3^H(a) is recursive.
13.
Let H be a 2-ary function such that, given any 1-ary recursive function F, there exists e ∈ N such that for all a ∈ N, F(a) = H(e, a). Show that H is not recursive. Hint: H(a, a) + 1.
6.2 Recursive Relations and RE Relations
Many important n-ary functions are defined in terms of relations on N, and thus it is natural to introduce recursive relations as a new tool. Recall the following notation:
R(a1, . . . , an) for ⟨a1, . . . , an⟩ ∈ R (we say: a1, . . . , an satisfies the relation R);
¬R(a1, . . . , an) for ⟨a1, . . . , an⟩ ∉ R (we say: a1, . . . , an does not satisfy the relation R).
Definition 1. An n-ary relation R is recursive if its characteristic function KR is recursive. Thus, R is recursive if the function KR(a1, . . . , an) = 0 if R(a1, . . . , an), and KR(a1, . . . , an) = 1 if ¬R(a1, . . . , an),
is recursive. This definition is based on Theorem 6 in 4.2: R is decidable if and only if KR is computable. Theorem 1. The 2-ary relation ≤ is recursive. Proof. We are required to prove that the 2-ary function K≤ is recursive. The key is to compare K≤ with proper subtraction:
K≤(a, b) = 0 if a ≤ b, and K≤(a, b) = 1 if a > b; whereas a ∸ b = 0 if a ≤ b, and a ∸ b = a − b if a > b. Thus K≤(a, b) = sg(a ∸ b).
Theorem 2. The following hold: (1) Every recursive relation is decidable; (2) assuming the Church-Turing Thesis, every decidable relation is recursive.
Proof of (1): R recursive ⇒ KR recursive (definition of R recursive) ⇒ KR computable (Corollary 2 in 6.1) ⇒ R decidable (Theorem 6 in 4.2). Proof of (2): R decidable ⇒ KR computable (Theorem 6 in 4.2) ⇒ KR recursive (Church-Turing Thesis) ⇒ R recursive (definition of R recursive). Theorem 2(2) plays a key role in computability theory in the following way. Let R ⊆ N; to prove that the decision problem ⟨R, N⟩ is unsolvable, it suffices to show that R is not a recursive relation. This classifies as an essential use of the Church-Turing Thesis.
Theorem 3 (logical operations). Let Q and R be n-ary recursive relations. Then ¬Q, Q ∨ R, and Q ∧ R are recursive relations. Proof. By hypothesis, KQ and KR are recursive functions. For a = ⟨a1, . . . , an⟩ ∈ Nn:
• K¬Q(a) = csg(KQ(a)) = 1 ∸ KQ(a);
• KQ∨R(a) = KQ(a) × KR(a);
• KQ∧R(a) = sg(KQ(a) + KR(a)).
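These three clauses translate directly into arithmetic on 0/1-valued characteristic functions; a minimal Python sketch (the helper names are ours):

```python
def sg(a):  return 0 if a == 0 else 1    # sign function
def csg(a): return 1 if a == 0 else 0    # co-sign: csg(a) = 1 -. a

# a characteristic function returns 0 when the relation holds, 1 when it fails
def K_not(KQ):     return lambda *a: csg(KQ(*a))
def K_or(KQ, KR):  return lambda *a: KQ(*a) * KR(*a)
def K_and(KQ, KR): return lambda *a: sg(KQ(*a) + KR(*a))
```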
Theorem 4 (composition). Let Q be a k-ary recursive relation and let H1, . . . , Hk be n-ary recursive functions. Then the n-ary relation R is recursive, where R(a1, . . . , an) ⇔ Q(H1(a1, . . . , an), . . . , Hk(a1, . . . , an)). Proof. It suffices to show that KR is recursive. Let a = ⟨a1, . . . , an⟩. Then KR(a) = KQ(H1(a), . . . , Hk(a)). Now KQ and H1, . . . , Hk are recursive functions and therefore KR is recursive by composition. Corollary 1. Each of the following 2-ary relations is recursive: ≤, =, <, >, ≥, ≠.
Proof. We already know that ≤ is recursive. For the others: a = b ⇔ (a ≤ b) ∧ (b ≤ a); a < b ⇔ (a ≤ b) ∧ ¬(a = b); a > b ⇔ b < a; a ≥ b ⇔ b ≤ a; a ≠ b ⇔ ¬(a = b). We note that Theorem 4 is used in the above proof; for example, a = b ⇔ (a ≤ b) ∧ (U2,2(a, b) ≤ U2,1(a, b)).
Theorem 5. Every finite subset of N is recursive. Proof. Let R ⊆ N be finite. If R = Ø, then R(a) ⇔ ¬(a = a) shows that R is recursive. Otherwise, R = {c1, . . . , cn} is recursive by R(a) ⇔ (a = c1) ∨ · · · ∨ (a = cn). The use of · · · is allowed in this case since n is fixed and does not depend on a.
Theorem 6 (definition by cases). Let G1, . . . , Gk be n-ary recursive functions and let R1, . . . , Rk be n-ary recursive relations such that given a = ⟨a1, . . . , an⟩ in Nn, exactly one of R1(a), . . . , Rk(a) holds. Then F is recursive, where F(a) = G1(a) if R1(a); . . . ; F(a) = Gk(a) if Rk(a).
Proof. Let Kj be the characteristic function of Rj for 1 ≤ j ≤ k. Then F is recursive as follows (see Exercise 8 in 6.1): F(a) = [G1(a) × csg(K1(a))] + . . . + [Gk(a) × csg(Kk(a))].
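In code the proof is a single sum: the factor csg(Kj(a)) keeps exactly the one summand Gj(a) whose case applies. A sketch (our names, reusing csg from the sketch above):

```python
def by_cases(Gs, Ks):
    # Gs: the functions G_1, ..., G_k; Ks: the characteristic functions
    # K_1, ..., K_k, with exactly one K(a) = 0 for each argument tuple a
    def F(*a):
        return sum(G(*a) * csg(K(*a)) for G, K in zip(Gs, Ks))
    return F
```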
The class of recursive relations is closed under the logical operations of ¬ , ∨, and ∧. But what about ∃? More precisely, given a 2-ary recursive
relation R, is the 1-ary relation Q(a) ⇔ ∃xR(a,x) recursive? Not necessarily (more on this later). However, we do have:
Theorem 7 (bounded quantifier rules). Let R be an n+1-ary recursive relation and let H be an n-ary recursive function. Then P and Q are recursive, where P(a1, . . . , an) ⇔ ∃x ≤ H(a1, . . . , an)[R(a1, . . . , an, x)] and Q(a1, . . . , an) ⇔ ∀x ≤ H(a1, . . . , an)[R(a1, . . . , an, x)]. Proof. Let a = ⟨a1, . . . , an⟩; by hypothesis, KR is recursive. We have (recall Theorem 8 in 6.1):
P(a) ⇔ R(a, 0) ∨ · · · ∨ R(a, H(a)) ⇔ [KR(a, 0) = 0] ∨ · · · ∨ [KR(a, H(a)) = 0] ⇔ ∏k≤H(a) KR(a, k) = 0
and
Q(a) ⇔ R(a, 0) ∧ · · · ∧ R(a, H(a)) ⇔ Σk≤H(a) KR(a, k) = 0.
Here is an important application of this new operation. Theorem 8. The following relations are recursive: (1) DIV(a,b) ⇔ a divides b; (2) COMP(c) ⇔ c is composite; (3) PR(c) ⇔ c is prime. Proof. We have:
• DIV(a,b) ⇔ ¬(a = 0) ∧ ∃c ≤ b[b = a × c] (here H(a,b) = b and R(a,b,c) ⇔ b = a × c);
• COMP(c) ⇔ (c > 1) ∧ ∃x ≤ c[(x > 1) ∧ (x < c) ∧ DIV(x,c)];
• PR(c) ⇔ (c > 1) ∧ ¬COMP(c).
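Because the quantifiers are bounded, these definitions run as written; a Python transcription (ours), with each search bounded exactly as in the proof:

```python
def DIV(a, b):
    # a divides b: a != 0 and some c <= b has b = a * c
    return a != 0 and any(b == a * c for c in range(b + 1))

def COMP(c):
    # c is composite: some x <= c with 1 < x < c divides c
    return c > 1 and any(1 < x < c and DIV(x, c) for x in range(c + 1))

def PR(c):
    # c is prime
    return c > 1 and not COMP(c)
```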
We have almost reached our short-term goal of proving that π is recursive. But first we need to extend the operation of minimalization to relations.
Definition 2. An n+1-ary relation R is regular if for all a1, . . . , an ∈ N, there exists c ∈ N such that R(a1, . . . , an, c). Moreover, µc[R(a1, . . . , an, c)] denotes the smallest such c. Theorem 9 (minimalization of a regular recursive relation). Let R be an n+1-ary relation that is both recursive and regular. Then F is recursive, where F is the n-ary function defined by F(a1, . . . , an) = µc[R(a1, . . . , an, c)]. Proof. F(a1, . . . , an) = µc[KR(a1, . . . , an, c) = 0]. Theorem 10. The 1-ary function π is recursive. Proof. By primitive recursion: π(0) = 2; π(n + 1) = µc[(π(n) < c) ∧ PR(c)]. In this case H is defined by H(n,x) = µc[(x < c) ∧ PR(c)]; the relation R(x,c) ⇔ (x < c) ∧ PR(c) is regular by Euclid's proof of the infinitude of the primes. The operation of minimalization has been used for the first time; up to now, the only operations that we have used are composition and primitive recursion. It turns out that, with a little more effort, we can prove π recursive without the use of minimalization. To do this we need a weaker version of minimalization that can be justified on the basis of primitive recursion. Theorem 11 (bounded minimalization, first version). Let R be an n+1-ary recursive relation. Then F is recursive, where
F(a1, . . . , an, b) = µx ≤ b[R(a1, . . . , an, x)] if ∃x ≤ b[R(a1, . . . , an, x)], and F(a1, . . . , an, b) = b + 1 otherwise.
Proof. The proof uses the operation of primitive recursion. Let a = ⟨a1, . . . , an⟩ ∈ Nn. Then F(a,0) = 0 or F(a,0) = 1, depending on whether R(a,0)
or ¬R(a,0). Define G by G(a) = 0 if R(a, 0), and G(a) = 1 if ¬R(a, 0). Then F(a,0) = G(a), where the function G is recursive by definition by cases. Now let us calculate F(a, b+1) on the basis of the value of F(a, b), which is ≤ b + 1. We have:
F(a, b + 1) = F(a, b) if F(a, b) ≤ b; b + 1 if F(a, b) = b + 1 and R(a, b + 1); b + 2 if F(a, b) = b + 1 and ¬R(a, b + 1).
To obtain H such that F(a, b+1) = H(a, b, F(a, b)), define H(a, b, x) by
H(a, b, x) = x if x ≤ b; b + 1 if (x > b) ∧ R(a, b + 1); b + 2 if (x > b) ∧ ¬R(a, b + 1).
We then have F(a, b + 1) = H(a, b, F(a, b)), where H is recursive by cases.
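The recursion in this proof is easy to run directly: step b up from 0, updating the tentative answer exactly as H prescribes, never searching beyond the bound. A sketch (our names; R is any decidable relation standing in for the recursive relation):

```python
def bounded_mu(R, a, b):
    # returns the least x <= b with R(a, x), and b + 1 if there is none
    f = 0 if R(a, 0) else 1          # the base case G(a)
    for k in range(b):               # compute F(a, 1), ..., F(a, b)
        if f <= k:
            pass                     # a witness was already found
        elif R(a, k + 1):
            f = k + 1                # k + 1 is the first witness
        else:
            f = k + 2                # still no witness at or below k + 1
    return f
```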
Corollary 2 (bounded minimalization, second version). Let R be an n+1-ary recursive relation and let H be an n-ary recursive function such that for all a ∈ Nn, there exists x ≤ H(a) such that R(a,x). Then F is recursive, where F(a) = µx ≤ H(a)[R(a, x)]. Proof. By Theorem 11, the function G(a, b) = µx ≤ b[R(a, x)] if ∃x ≤ b[R(a, x)], and G(a, b) = b + 1 otherwise,
is recursive and, moreover, G(a,H(a)) = µx ≤ H(a)[R(a,x)]. Thus we can write F(a) = G(a,H(a)) and therefore F is recursive as required. We now use bounded minimalization (second version) to show that π is recursive; for this we need a bound on the first prime that follows n. Lemma. For all n ∈ N, there is a prime p such that n < p ≤ n! + 1.
Proof. If n! + 1 is a prime, there is nothing to prove. Otherwise, let p be a prime such that p | n! + 1. Clearly p cannot divide n! and therefore n < p as required. The new proof that π is recursive now proceeds as follows. By Corollary 2, the function H(n) = smallest prime p such that n < p is recursive by H(n) = µx ≤ (n! + 1)[(n < x) ∧ PR(x)]. By primitive recursion: π(0) = 2; π(n + 1) = H(π(n)).
Definition 3. Let EXP(c, k) be the 2-ary function defined as follows. If c = 0 or c = 1, EXP(c,k) = 0; for c ≥ 2, EXP(c,k) = the unique number j such that π(k)^j | c but π(k)^(j+1) does not divide c. In other words, EXP(c,k) is the exponent of the kth prime π(k) in the prime power decomposition of c. For example:
• if c = 1001, then EXP(c,0) = 0 since π(0) = 2 and 1001 is odd;
• if c = 1250, then EXP(c,2) = 4 since π(2) = 5 and 1250 = 2 × 5^4.
Note that for c ≥ 2, EXP(c, k) < c. Theorem 12. The 2-ary function EXP is recursive. Proof. We use definition by cases and bounded minimalization:
EXP(c, k) = µj ≤ c[¬DIV(π(k)^(j+1), c)] if c ≥ 2, and EXP(c, k) = 0 if (c = 0) ∨ (c = 1).
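This definition is likewise executable as written; a sketch (ours), assuming the nth_prime helper from the sketch in 6.1:

```python
def EXP(c, k):
    # exponent of the k-th prime pi(k) in the prime power decomposition of c
    if c < 2:
        return 0
    p, j = nth_prime(k), 0
    while c % p ** (j + 1) == 0:   # mu j [ not DIV(pi(k)^(j+1), c) ]
        j += 1
    return j

# EXP(1250, 2) == 4, since 1250 = 2 * 5**4
```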
We have now completed our study of basic facts about recursive functions and recursive relations. Here is a summary of these results.
Functions.
• S, Z, D, sg, csg, and π are 1-ary recursive functions;
• addition +, multiplication ×, proper subtraction ∸, exponentiation a^b, and EXP are 2-ary recursive functions;
• for each n ≥ 1, Un,k (1 ≤ k ≤ n) and Cn,k (k ∈ N) are n-ary recursive functions;
• bounded sum and bounded product rules;
• definition by cases;
• bounded minimalization.
Relations.
• ≤, =, <, >, ≥, ≠, and DIV are 2-ary recursive relations;
• PR and COMP are 1-ary recursive relations;
• every finite subset of N is recursive;
• negation / disjunction / conjunction rules;
• bounded ∃ and bounded ∀ rules.
We emphasize that none of the above results (specific functions and relations, new operations) require minimalization. But minimalization cannot be eliminated altogether. There is a well-known 2-ary recursive function A, called Ackermann’s function, that cannot be proved recursive on the basis of composition and primitive recursion alone. These ideas will be discussed in more detail in the next section.
Recursively Enumerable Relations (RE Relations)
Theorem 12 in Section 4.2 is the basis of the following definition, which is the formal counterpart of the intuitive notion of a semi-decidable relation. Definition 4. An n-ary relation R is RE (recursively enumerable) if there is an n+1-ary recursive relation Q such that for all a1, . . . , an ∈ N: R(a1, . . . , an) ⇔ ∃bQ(a1, . . . , an, b). The precise relationship between recursive, RE, and semi-decidable relations is summarized as follows. Theorem 13. The following hold: (1) every recursive relation is RE; (2) every RE relation is semi-decidable; (3) assuming the Church-Turing Thesis, every semi-decidable relation is RE. Proof. To prove (1), let R be an n-ary recursive relation. Define an n+1-ary relation Q by Q(a,b) ⇔ R(a); Q is an n+1-ary recursive relation such
that for all a ∈ Nn, R(a) ⇔ ∃bQ(a,b). To prove (2), use the fact that every recursive relation is decidable and Theorem 12 in 4.2. For (3), use the consequence of the Church-Turing Thesis that every decidable relation is recursive.
Theorem 14. Let R be an n-ary relation such that both R and ¬R are RE. Then R is recursive. Outline of proof: Let P and Q be n+1-ary recursive relations such that R(a) ⇔ ∃bP(a,b) and ¬R(a) ⇔ ∃bQ(a,b) for all a = ⟨a1, . . . , an⟩ ∈ Nn. The function F(a) = µb[P(a, b) ∨ Q(a, b)] is recursive (check that the relation P(a,b) ∨ Q(a,b) is regular) and moreover R(a) ⇔ P(a, F(a)).
Theorem 15. The following are equivalent: (1) Church-Turing Thesis; (2) every decidable relation is recursive; (3) Post Thesis: every semi-decidable relation is RE (see p. 201 of [8]). Proof. For (1) ⇒ (2), see Theorem 2(2) of this section. For (2) ⇒ (3), let R be an n-ary semi-decidable relation. There is an n+1-ary decidable relation Q such that for all a ∈ Nn, R(a) ⇔ ∃bQ(a,b). By (2), Q is recursive, and therefore R is RE. For (3) ⇒ (2), let R be a decidable relation. Then both R and ¬R are semi-decidable, and by (3), both are RE. Theorem 14 now applies and R is recursive. Finally, for (2) ⇒ (1), let F be computable. Then GF is decidable (see Theorem 14 in 4.2), and by (2), GF is recursive. Moreover, since F is a function, GF is regular. Finally, F(a) = µbGF(a,b).
We now consider the following very fundamental question: Is there an example of an RE relation that is not recursive? We already know that K is not recursive, where K is defined by K(e) ⇔ e is the code of a program P and the P-computation with input e halts. Moreover, K is semi-decidable, and therefore is RE by the Church-Turing Thesis. However, this is a non-essential use of the thesis. It is possible to prove directly that K is RE, and this will be done in 6.4. For now, let us record this fundamental result and show the important role it plays
in the negative solution of Hilbert's Tenth Problem (HX). Also recall Turing's proof that Hilbert's Decision Problem is unsolvable and the Post-Markov proof that the Word Problem is unsolvable. Theorem 16 (a fundamental theorem of computability theory). There is a 1-ary relation that is RE but not recursive. Definition 5 below plays a key role in HX. In classical number theory, we often begin with a polynomial p(x1, . . . , xn) and ask for the set R ⊆ Nn of solutions of p(x1, . . . , xn) = 0. For example, given p(x1, x2, x3) = x1^2 + x2^2 − x3^2, the solutions of p(x1, x2, x3) = 0 with non-zero values are the Pythagorean triples ⟨3, 4, 5⟩, ⟨5, 12, 13⟩, and so on. The following definition reverses this procedure: Given a set R of solutions, find the corresponding polynomial equation. Definition 5. An n-ary relation R is Diophantine if there is a polynomial p(y1, . . . , yn, x1, . . . , xk) with n + k variables y1, . . . , yn, x1, . . . , xk (k = 0 allowed) and integer coefficients such that for all a1, . . . , an ∈ N: R(a1, . . . , an) ⇔ p(a1, . . . , an, x1, . . . , xk) = 0 has a solution in natural numbers ⇔ ∃b1 · · · ∃bk[p(a1, . . . , an, b1, . . . , bk) = 0]. We emphasize that b1, . . . , bk ∈ N. In the polynomial p(y1, . . . , yn, x1, . . . , xk), y1, . . . , yn are called parameters and x1, . . . , xk unknowns; for many Diophantine relations, unknowns x1, . . . , xk are not needed. Example 1. The relations ≤, <, >, ≥, and COMP are all Diophantine. We will give the details for COMP: COMP(a) ⇔ a is a product of two numbers greater than 1 ⇔ ∃x1∃x2[a = (x1 + 2) × (x2 + 2)] ⇔ ∃x1∃x2[a − (x1 + 2) × (x2 + 2) = 0];
the required polynomial is p(y, x1 , x2 ) = y − (x1 + 2) × (x2 + 2). Theorem 17 (Matiyasevich, Robinson, Davis, Putnam). Every RE relation is Diophantine.
This deep theorem is beyond the scope of this book; for a complete proof, see Chapter 10 in [4] or Matiyasevich [5]. As we now show, the MRDP Theorem immediately leads to a negative solution to HX. Theorem 18 (HX is unsolvable). Assume the Church-Turing Thesis. Then Hilbert’s Tenth Problem is unsolvable. Proof. Let R be a 1-ary relation that is RE but not recursive. By the Church-Turing Thesis, R is not decidable. By the MRDP Theorem, there is a polynomial p(y, x1 , . . . , xk ) with integer coefficients such that for all a ∈ N: R(a) ⇔ p(a, x1 , . . . , xk ) = 0 has a solution in natural numbers. Now suppose there is an algorithm that decides whether a polynomial equation has a solution in N. It follows that R is a decidable relation, a contradiction of the choice of R as a non-recursive relation.
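Before turning to the exercises, note that Definition 4 and Theorem 14 earlier in this section are both algorithms in disguise: an RE relation carries a witness search that halts exactly on the positive instances, and when R and ¬R are both RE the two searches interleave into a genuine decision procedure. A Python sketch (our names; P and Q stand for the recursive witness relations):

```python
from itertools import count

def semi_decide(Q, *a):
    # halts (returning a witness) iff R(a), i.e. iff some b has Q(a, b)
    for b in count():
        if Q(*a, b):
            return b

def decide(P, Q, *a):
    # P witnesses R, Q witnesses not-R; every a has a witness for one of
    # them, so the loop always terminates -- this is the F of Theorem 14
    for b in count():
        if P(*a, b):
            return True     # R(a) holds
        if Q(*a, b):
            return False    # R(a) fails
```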
Exercises on 6.2
1.
Let F be the 1-ary function defined by: F(n) = 0 if there are at least n consecutive 3's in the decimal expansion of π, and F(n) = 1 otherwise.
(a) Let F(k) = 1. Show that F(k + 1) = 1.
(b) Let F(k) = 0. Show that F(j) = 0 for all j < k.
(c) Show that F is recursive. Note: Compare with Exercise 4 in 4.2.
2.
Let QT and RM be 2-ary functions defined as follows: QT(a,b) = q, where q is the quotient when a is divided by b ≠ 0 (QT(a,0) = 0 by convention). RM(a,b) = r, where r is the remainder when a is divided by b ≠ 0 (RM(a,0) = 0 by convention). The division algorithm states that given a, b ∈ N with b ≠ 0, there exist unique q, r such that a = bq + r, 0 ≤ r < b; note that r = a − bq and
QT(a,b) = q, RM(a,b) = r. In the proof of the division algorithm, q satisfies b × q ≤ a < b × (q + 1). Show that QT and RM are recursive. 3.
Let R be an n+1-ary recursive relation. Prove the following versions of the bounded quantifier rule by proving that P and Q are recursive: (a) P(a1 , . . . , an ,b) ⇔ ∃x ≤ b[R(a1 , . . . , an , x)], (b) Q(a1 , . . . , an ,b) ⇔ ∀x ≤ b [R(a1 , . . . , an , x)].
4.
Let G be a 1-ary function. Given a ∈ N, the iterates of a by G are a, G(a), G(G(a)), and so on; more precisely, G^0(a) = a, G^(n+1)(a) = G(G^n(a)). Define a 2-ary function F by F(a,0) = a and F(a, n + 1) = G(F(a, n)). Prove the following. (a) If G is recursive, then F is recursive. (b) F(a, n) = G^n(a).
5.
Refer to the proof of Theorem 14 and verify: R(a) ⇔ P(a, F(a)).
6.
Let F and G be 1-ary recursive functions. Prove that H is recursive, where H is defined by H(n) = 2^F(n) × 3^G(n).
7.
Let F be a 2-ary function and let G be defined by G(c) = F(EXP(c, 0), EXP(c, 1)). Prove the following: (a) F(a, b) = G(2^a × 3^b); (b) F is recursive if and only if G is recursive.
8.
Let R be a 2-ary relation and let Q be defined by Q(c) ⇔ R(EXP(c, 0), EXP(c, 1)). Prove the following: (a) R(a, b) ⇔ Q(2^a × 3^b); (b) R is recursive if and only if Q is recursive.
9.
(pairing functions and coding) Define a 2-ary function J and 1-ary functions K and L as follows: J(a,b) = 2^a × 3^b (coding function); K(c) = EXP(c,0) (decoding function); L(c) = EXP(c,1) (decoding function). Prove the following: (a) J, K, and L are recursive functions; (b) K(J(a, b)) = a and L(J(a, b)) = b; (c) K(c) ≤ c and L(c) ≤ c for all c ∈ N.
10.
Let R be an n+2-ary recursive relation and let Q(a) ⇔ ∃x∃yR(a, x, y), where a = a1, . . . , an. Show that Q is RE by showing that ∃x∃yR(a, x, y) ⇔ ∃c[R(a, K(c), L(c))].
11.
Let P and Q be n-ary RE relations. Prove the following: (a) P ∨ Q is RE; (b) P ∧ Q is RE.
12.
Use the definition to show that the following relations are Diophantine:
(a) a ≤ b, a ≥ b, a > b;
(b) E(a) ⇔ a is even;
(c) R(a,b,c) ⇔ a + 2c ≤ 5;
(d) R(a,b,c) ⇔ (a + c = 5) ∨ (b × c ≥ 8).
13.
Let Q and R be 1-ary Diophantine relations. Prove the following: (a) Q ∨ R is Diophantine (hint: a × b = 0 ⇔ a = 0 or b = 0); (b) Q ∧ R is Diophantine (hint: a^2 + b^2 = 0 ⇔ a = 0 and b = 0).
14.
Show that the following two functions are recursive:
(a) F(n) = the first n + 1 digits in the decimal expansion of √2 = 1.414213 . . . ;
(b) G(n) = the (n + 1)st digit in the decimal expansion of √2.
For example, F(0) = 1, F(1) = 14, F(2) = 141, and so on; G(0) = 1, G(1) = 4, G(2) = 1, and so on.
6.3 Primitive Recursive Functions and Relations; Coding
The operation of minimalization has two conspicuous features. First of all, the given function G must be regular; for example, if G is a 2-ary function, we require that for all a ∈ N, there exists b ∈ N such that G(a, b) = 0, in which case we search for the smallest such b. However, this search need not be bounded: there may be no x ∈ N such that for all a ∈ N, there exists b ≤ x such that G(a,b) = 0. These two features of minimalization make it an especially powerful operation. Yet up to now, all of the specific recursive functions and recursive relations, and all of the derived operations on recursive functions and/or relations, have been obtained without the use of minimalization. Indeed, it seems that bounded minimalization, derived from primitive recursion, is sufficient in most cases. But minimalization cannot be eliminated entirely.
There is a well-known 2-ary recursive function A, called Ackermann's function, that cannot be proved recursive on the basis of composition and primitive recursion alone. This function is defined as follows: A(0, b) = b + 1; A(a + 1, 0) = A(a, 1); A(a + 1, b + 1) = A(a, A(a + 1, b)) (an executable sketch of A is given after the summary lists below). In spite of the existence of such functions, it is reasonable to claim that most of the functions ordinarily encountered in mathematics can be proved recursive without the use of minimalization. This observation suggests the following more restrictive class of functions and relations.
Definition 1. An n-ary function F is primitive recursive if there is a finite sequence F1, . . . , Fn of functions with Fn = F and such that for 1 ≤ k ≤ n, one of the following holds: (1) Fk is a starting function (that is, S, Z, or Un,k); (2) k > 1 and Fk is obtained from previous functions (that is, from F1, . . . , Fk−1) by the operation of composition or primitive recursion. An n-ary relation R is primitive recursive if its characteristic function is primitive recursive. The efficient way to show that a given function F is primitive recursive is to show that it satisfies one of the following conditions:
• F is a starting function;
• F is the composition of primitive recursive functions;
• F is obtained by primitive recursion from primitive recursive functions.
Every primitive recursive function is recursive and therefore is computable. However, as Ackermann's function shows, the class of primitive recursive functions is a proper subset of the class of recursive functions. On the other hand, all of the functions (for example, D, sg, π) and all of the relations (for example, ≤) proved recursive in 6.1 and 6.2 are actually primitive recursive; in addition, all of the derived operations in 6.1 and 6.2 preserve the property of being primitive recursive. Here is a summary of most of those results. The following are primitive recursive:
• the 1-ary functions S, Z, D, sg, csg, and π;
• the 2-ary functions +, ×, ∸, a^b, and EXP;
• for each n ≥ 1, the n-ary functions Un,k (1 ≤ k ≤ n) and Cn,k (k ∈ N);
• the 2-ary relations ≤, <, >, ≥, =, ≠, and DIV.
The following operations, when applied to primitive recursive functions and/or relations, give a primitive recursive function:
• bounded sum and bounded product rules;
• definition by cases;
• bounded minimalization.
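As promised above, Ackermann's three defining equations translate directly into a doubly recursive program; a memoized Python sketch (ours) — even with memoization, the values explode so quickly that only tiny arguments are feasible:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def A(a, b):
    # Ackermann's function: recursion on (a, b) in lexicographic order
    if a == 0:
        return b + 1
    if b == 0:
        return A(a - 1, 1)
    return A(a - 1, A(a, b - 1))

# A(2, 1) == 5 and A(3, 3) == 61; A(4, 2) already has 19729 decimal digits
```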
As in the case of the recursive functions, we have the following method for showing that every primitive recursive function has some property Q. Theorem 1 (induction on primitive recursive functions). Let Q be a property of n-ary functions. To prove that every primitive recursive function has property Q, it suffices to prove the following: (1) each starting function has property Q; (2) if F is the composition of functions, each of which has property Q, then F has property Q; (3) if F is obtained from G and H by primitive recursion, and both G and H have property Q, then F also has property Q. An immediate application of Theorem 1 is the following: Theorem 2. Every primitive recursive function is LRM-computable. Proof. We use induction on primitive recursive functions with Q the property of being LRM-computable. It is obvious that the starting functions S, Z, and Un,k are LRM-computable. Moreover, by modifying the proofs of Theorem 1 and Theorem 2 in Section 5.2, we can show that if F is obtained from LRM-computable functions by composition or primitive recursion, then F is also LRM-computable. (Also see Exercises 6,7, and 8 in 5.2.) The proof of the converse of Theorem 2, that every LRM-computable function is primitive recursive, is based on the inductive definition of a legal LRM-program. However, to execute this approach, we formulate the following property, which easily implies the property of being primitive recursive but has the advantage of giving a stronger induction hypothesis. Definition 2. Let P be a program for the limited register machine with registers among R1 , . . . , Rm . We say that P is PR-representable (primitive
recursive representable) if there exist m-ary primitive recursive functions G1, . . . , Gm (one for each of the registers R1, . . . , Rm) such that for all a1, . . . , am ∈ N, the P-computation with inputs a1, . . . , am halts with #Rk = Gk(a1, . . . , am) for 1 ≤ k ≤ m. For example, each of the following 1-line programs is PR-representable: S(k); Z(k). Let P be a PR-representable program with registers among R1, . . . , Rm and primitive recursive functions G1, . . . , Gm as described above. If P computes the n-ary function F, then F is primitive recursive as follows: F(a1, . . . , an) = #R1 = G1(a1, . . . , an, 0, . . . , 0). Theorem 3. Every LRM-computable function is primitive recursive. Proof. We prove that every LRM-computable function is PR-representable. By the definition of a legal LRM-computable function (see the discussion in Section 5.2 on limited register machines), it suffices to show: (1) Each of the 1-line programs S(k) and Z(k), k ≥ 1, is PR-representable. (2) If P and Q are programs for the limited register machine and are PR-representable, then the join PQ is also PR-representable. (3) If P is a program for the limited register machine and is PR-representable, then so is the program FOR(k) P END. We prove (3) and leave (1) and (2) to the reader. Let Q be the program FOR(k) P END, and assume that the registers in Q are among R1, . . . , Rm. Since the program P is PR-representable, there are primitive recursive functions G1, . . . , Gm such that for all a = ⟨a1, . . . , am⟩ ∈ Nm, the P-computation with inputs a1, . . . , am halts with #Rj = Gj(a1, . . . , am) for 1 ≤ j ≤ m. Define m+1-ary functions F1, . . . , Fm inductively as follows:
F1(a, 0) = Um,1(a) = a1; . . . ; Fm(a, 0) = Um,m(a) = am
and
F1(a, b + 1) = G1(F1(a, b), . . . , Fm(a, b)); . . . ; Fm(a, b + 1) = Gm(F1(a, b), . . . , Fm(a, b)).
Since G1, . . . , Gm are primitive recursive, the functions F1, . . . , Fm are also primitive recursive (to be proved later in this section). Finally, the functions F1, . . . , Fm show that the program Q is PR-representable. To see this, note that in a Q-computation with input a = ⟨a1, . . . , am⟩ ∈ Nm, ak gives the number of times the loop body P is executed. Therefore the Q-computation with input a halts with #Rj = Fj(a, ak) for 1 ≤ j ≤ m.
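The heart of this proof is the simultaneous recursion defining F1, . . . , Fm: starting from the register vector a, apply the update functions G1, . . . , Gm in lockstep b times. In code (a sketch with our names):

```python
def iterate_registers(Gs, regs, b):
    # Gs[j] maps the full register vector to the new contents of register
    # j + 1; the result after b passes is (F_1(a, b), ..., F_m(a, b))
    for _ in range(b):
        regs = tuple(G(*regs) for G in Gs)
    return regs
```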
Primitive Recursive Coding Systems
We now prove the existence of certain primitive recursive functions and relations for coding and decoding. These functions and relations are fundamental in computability theory and will allow us to complete the proof of Theorem 3 above, prove that Ackermann's function A is recursive, and, most importantly, prove that every RM-computable function is recursive. Definition 3. A primitive recursive coding system consists of the following:
• for each n ≥ 1, an n-ary primitive recursive function ⟨· · ·⟩n (coding functions);
• a 1-ary primitive recursive function lh and a 2-ary primitive recursive function β (decoding functions);
• a 1-ary primitive recursive relation CD;
• a 2-ary primitive recursive function ∗.
If c = ⟨a0, . . . , an−1⟩n, then c is called the code of the n-tuple (a0, . . . , an−1). The functions ⟨· · ·⟩n, lh, β, ∗ and the relation CD have the following properties:
(1) if ⟨a0, . . . , an−1⟩n = ⟨b0, . . . , bm−1⟩m, then n = m and ak = bk for 0 ≤ k ≤ n−1;
(2) if c = ⟨a0, . . . , an−1⟩n, then
• n < c, and ak < c for 0 ≤ k ≤ n−1;
• lh(c) = n;
• β(c,k) = ak for 0 ≤ k ≤ n−1;
• β(c,k) = 0 for k ≥ n.
(3) β(0, k) = 0 for all k ∈ N;
(4) CD(c) ⇔ c is the code of some n-tuple (a0, . . . , an−1);
(5) if c = ⟨a0, . . . , am−1⟩m and d = ⟨b0, . . . , bn−1⟩n, then c ∗ d = ⟨a0, . . . , am−1, b0, . . . , bn−1⟩m+n.
Our immediate goal is to prove the existence of a primitive recursive coding system. Once the proof is complete, we can forget the details of the construction and all subsequent work with coding and decoding will be based on properties (1)–(5). Theorem 4. There is a primitive recursive coding system. Proof. We are required to construct ⟨· · ·⟩n, lh, β, CD, ∗ and prove (1)–(5). We will use the primitive recursive functions π and EXP and the primitive recursive relation DIV (see 6.2). For each n ≥ 1 let ⟨· · ·⟩n be the n-ary function defined by ⟨a0, . . . , an−1⟩n = π(0)^(a0+1) × . . . × π(n − 1)^(an−1+1). Note that the primes π(0), . . . , π(n−1) all occur in the factorization of ⟨a0, . . . , an−1⟩n (this is the reason for the +1 in each exponent) and moreover are the only primes in this factorization. Thus property (1) is a consequence of the Unique Factorization Theorem. The proof that each of the functions ⟨· · ·⟩n is primitive recursive is by induction on n. For the beginning case we have ⟨a⟩1 = 2^(a+1), which is certainly primitive recursive. Now assume that ⟨· · ·⟩n is primitive recursive. Then ⟨a0, . . . , an⟩n+1 = π(0)^(a0+1) × . . . × π(n − 1)^(an−1+1) × π(n)^(an+1) = ⟨a0, . . . , an−1⟩n × π(n)^(an+1). Thus ⟨· · ·⟩n+1 is the product of two primitive recursive functions and is therefore primitive recursive as required. To prove (2), let c = ⟨a0, . . . , an−1⟩n = π(0)^(a0+1) × . . . × π(n − 1)^(an−1+1). Clearly n < c and ak < c for 0 ≤ k ≤ n−1 (use the fact that a < 2^a for all a ∈ N). We now construct the two functions lh (= length) and β. From c = π(0)^(a0+1) × . . . × π(n − 1)^(an−1+1) it follows that EXP(c,0) = a0 + 1, . . . , EXP(c,n−1) = an−1 + 1 and EXP(c, k) = 0 for k ≥ n. Let lh(c) = µk ≤ c[EXP(c, k) = 0], β(c, k) = EXP(c, k) ∸ 1. Both lh and β are primitive recursive; moreover, it is easy to see that lh(c) = n, β(c,k) = ak for 0 ≤ k ≤ n − 1, β(c,k) = 0 for k ≥ n, and β(0,k) = 0 for all k ∈ N.
Let c ∈ N. Is c the code of some n-tuple? If so, we can use lh to find n, and then we can use β to find a0, . . . , an−1 such that c = ⟨a0, . . . , an−1⟩n. What we need is a primitive recursive relation that decides if c is a code in the first place; this is the role of CD, defined by CD(c) ⇔ c is the code of some n-tuple ⇔ there exist n ≥ 1 and a0, . . . , an−1 ∈ N such that c = ⟨a0, . . . , an−1⟩n. We need to prove that CD is primitive recursive. Suppose CD(c), say c = π(0)^(a0+1) × . . . × π(n − 1)^(an−1+1). Then c > 1, lh(c) ≥ 1, π(k) divides c for all k < lh(c), and π(k) does not divide c for all k ≥ lh(c). These conditions actually characterize those numbers c that are codes, and thus CD is primitive recursive as follows (recall that ↔ can be expressed in terms of ¬, ∨, ∧): CD(c) ⇔ (c > 1) ∧ (lh(c) ≥ 1) ∧ ∀k ≤ c[DIV(π(k), c) ↔ (k < lh(c))]. Finally, we use definition by cases to construct the primitive recursive function ∗. If ¬CD(c) or ¬CD(d), then c ∗ d = 0. Otherwise, suppose c = ⟨a0, . . . , am−1⟩m and d = ⟨b0, . . . , bn−1⟩n. An upper bound on ⟨a0, . . . , am−1, b0, . . . , bn−1⟩m+n is H(c, d) = [π(c + d)^(c+d)]^(c+d). We now have c ∗ d = µe ≤ H(c, d)[CD(e) ∧ (lh(e) = lh(c) + lh(d)) ∧ ∀j < lh(c)[β(e, j) = β(c, j)] ∧ ∀j < lh(d)[β(e, lh(c) + j) = β(d, j)]]. For an application of coding we have: Theorem 5. Let G1, . . . , Gm be m-ary primitive recursive functions and let F1, . . . , Fm be m+1-ary functions defined by induction as follows: for a = ⟨a1, . . . , am⟩ ∈ Nm, Fk(a, 0) = Gk(a), 1 ≤ k ≤ m; Fk(a, b + 1) = Gk(F1(a, b), . . . , Fm(a, b)), 1 ≤ k ≤ m. Then F1, . . . , Fm are primitive recursive.
Proof. Let L(a,b) = ⟨F1(a,b), . . . , Fm(a,b)⟩m, and note that Fk(a,b) = β(L(a,b), k − 1) for 1 ≤ k ≤ m. Thus, it suffices to show that L is primitive recursive. We have:
• L(a,0) = ⟨F1(a,0), . . . , Fm(a,0)⟩m = ⟨G1(a), . . . , Gm(a)⟩m;
• L(a,b + 1) = ⟨F1(a,b + 1), . . . , Fm(a,b + 1)⟩m = ⟨G1(F1(a,b), . . . , Fm(a,b)), . . . , Gm(F1(a,b), . . . , Fm(a,b))⟩m = ⟨G1(β(L(a,b), 0), . . . , β(L(a,b), m−1)), . . . , Gm(β(L(a,b), 0), . . . , β(L(a,b), m−1))⟩m.
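The whole coding apparatus fits in a few lines of Python; a sketch (ours), assuming the nth_prime and EXP helpers from the earlier sketches:

```python
def code(*a):
    # <a_0, ..., a_{n-1}>_n = pi(0)^(a_0+1) * ... * pi(n-1)^(a_{n-1}+1)
    c = 1
    for k, ak in enumerate(a):
        c *= nth_prime(k) ** (ak + 1)
    return c

def lh(c):
    # length: the least k with EXP(c, k) = 0
    k = 0
    while EXP(c, k) != 0:
        k += 1
    return k

def beta(c, k):
    # k-th entry: EXP(c, k) -. 1 (proper subtraction)
    return max(EXP(c, k) - 1, 0)

# for example, code(3, 0) = 2**4 * 3**1 = 48; lh(48) = 2; beta(48, 0) = 3
```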
Exercises on 6.3
1.
Prove the following: (a) the programs S(k) and Z(k) are PR-representable; (b) if P and Q are PR-representable, then the join PQ is also PR-representable.
2.
Assume that F is a 2-ary recursive function that enumerates all 1-ary primitive recursive functions; in other words, given a 1-ary primitive recursive function G, there exists e ∈ N such that for all a ∈ N, G(a) = F(e, a). Use F to construct a 1-ary function that is recursive but not primitive recursive. Let A be Ackermann’s function, the 2-ary function defined by A(0, b) = b + 1; A(a + 1, 0) = A(a, 1); A(a + 1, b + 1) = A(a, A(a + 1, b)).
3.
(rapid growth of A) Prove the following by induction.
(a) A(1,b) = b + 2;
(b) A(2,b) = 2b + 3;
(c) A(3,b) = 2^(b+3) − 3;
(d) A(4,b) = ?
A(1,b) = b + 2; A(2,b) = 2b + 3; A(3,b) = 2b+3 − 3; A(4,b) = ?
For each k ∈ N let Fk be the 1-ary function defined by Fk (b) = A(k,b). Show that each Fk is primitive recursive. The next four problems outline a proof that A is not primitive recursive.
5.
Let a, b ∈ N. Prove the following basic properties of Ackermann’s function A. (a) b < A(a, b) (prove ∀b[b < A(a, b)] by induction on a); (b) A(a, b) < A(a, b + 1) (two cases: a = 0; for a > 0, use part (a));
(c) A(a,b + 1) ≤ A(a + 1,b) (induction on b and the inequality b + 2 ≤ A(a, b + 1)); (d) A(a, b) < A(a + 1, b); (e) A(a, 2b) < A(a + 2, b) (induction on b and the inequality 2b + 2 ≤ A(a, 2b) + 1). 6.
An n-ary function F is within level r if for all a1, . . . , an ∈ N, F(a1, . . . , an) ≤ A(r, x), where x = max{a1, . . . , an}. Note that if F is within level r and r ≤ s, then F is within level s. Prove the following. (a) The starting functions Z, S, and Un,k are within level 0. (b) Let F(a1, . . . , an) = G(H1(a1, . . . , an), . . . , Hk(a1, . . . , an)). If each of G, H1, . . . , Hk is within level r, then F is within level r + 2. Hint: 5(c). (c) Let G and H be within level r and let F be defined by F(a1, . . . , an, 0) = G(a1, . . . , an); F(a1, . . . , an, b + 1) = H(a1, . . . , an, b, F(a1, . . . , an, b)). Then
• F(a1, . . . , an, b) ≤ A(r + 1, x + b), where x = max{a1, . . . , an};
• F is within level r + 3. Hint: x = max{a1, . . . , an}, y = max{a1, . . . , an, b}; note that x + b ≤ 2y; use 5(e).
7.
Prove that every primitive recursive function is within level r for some r ≥ 0.
8.
Prove that A is not primitive recursive. Hint: suppose it is; let F(a) = A(a, a) + 1.
The next four problems outline a proof that A is computable; with more effort, these ideas can be turned into a proof that A is recursive. To motivate the approach, let us look at the computation A(2,1) = 5, where the strategy is to choose the right-most occurrence of A(a, b):
A(2,1) = A(1, A(2,0)) = A(1, A(1,1)) = A(1, A(0, A(1,0))) = A(1, A(0, A(0,1))) = A(1, A(0,2)) = A(1,3) = A(0, A(1,2)) = A(0, A(0, A(1,1))) = A(0, A(0, A(0, A(1,0)))) = A(0, A(0, A(0, A(0,1)))) = A(0, A(0, A(0,2))) = A(0, A(0,3)) = A(0,4) = 5.
We can implement this strategy and at the same time omit occurrences of A by introducing an operator Ω that is defined on all finite sequences of N (that is, N ∪ N^2 ∪ N^3 ∪ . . . ) as follows: Ω(a) = a; for n ≥ 2,
Ω(a1, . . . , an−2, an−1, an) = (a1, . . . , an−2, an + 1) if an−1 = 0; (a1, . . . , an−2, an−1 − 1, 1) if an−1 > 0 and an = 0; (a1, . . . , an−2, an−1 − 1, an−1, an − 1) if an−1, an > 0.
9.
Calculate Ω(2,1), Ω^2(2,1), and so on to obtain Ω^r(2,1) = 5; compare with the calculation of A(2,1) given above.
10.
Show that Ω has the following properties:
(a) Ω(0, b) = b + 1;
(b) Ω(a + 1, 0) = (a, 1);
(c) Ω(a + 1, b + 1) = (a, a + 1, b);
(d) Ω(a1, . . . , an−2, an−1, an) = (a1, . . . , an−2, Ω(an−1, an)) (there are three cases);
(e) Ω^r(a1, . . . , an−2, an−1, an) = (a1, . . . , an−2, Ω^r(an−1, an)) (hint: use induction on r; let Ω^r(an−1, an) = (b1, . . . , bk−2, bk−1, bk)).
11.
Show that for all a, b ∈ N, there exists r ∈ N such that Ω^r(a, b) = A(a,b). Hint: show that ∀b∃r[Ω^r(a,b) = A(a,b)] by induction on a; use the previous exercise.
12.
Write an algorithm to show that Ackermann's function A is computable. Hint: Use the previous exercise; assume there is an algorithm that computes the operator Ω.
13.
In this problem we outline a proof that A is actually recursive. The proof uses results from the previous four problems. Define a 3-ary function H and a 2-ary function G by H(a, b, r) = code of Ω^r(a, b); G(a, b) = µr[lh(H(a, b, r)) = 1]. By a previous exercise, for all a, b ∈ N, there exists r ∈ N such that Ω^r(a,b) = A(a,b). It follows that the relation lh(H(a,b,r)) = 1 is regular and therefore G is recursive provided H is recursive. Moreover, A(a,b) = β(H(a,b, G(a,b)), 0) and thus A is recursive as required. It remains to prove that H is recursive.
(a) Define a 1-ary function L: N → N by cases as follows. Let c ∈ N; if ¬CD(c), L(c) = 0; if [CD(c) ∧ (lh(c) = 1)], L(c) = c. For the case c = ⟨a1, . . . , an−2, an−1, an⟩n, L(c) = code of Ω((a1, . . . , an−2, an−1, an)); in other words,
L(c) = ⟨a1, . . . , an−2, an + 1⟩n−1 if an−1 = 0; ⟨a1, . . . , an−2, an−1 ∸ 1, 1⟩n if an−1 > 0 and an = 0; ⟨a1, . . . , an−2, an−1 ∸ 1, an−1, an ∸ 1⟩n+1 if an−1 > 0 and an > 0.
Show that L is recursive. Hint: by cases; for the case c = ⟨a1, . . . , an−2, an−1, an⟩n let d = ⟨a1, . . . , an−2⟩n−2, an−1 = β(c, n − 2), an = β(c, n − 1), and use the function ∗.
(b) Show: code of Ω^(r+1)(a,b) = L(code of Ω^r(a,b)). Hint: two cases: Ω^r(a,b) = c ∈ N and Ω^r(a,b) = (a1, . . . , an−2, an−1, an).
(c) Use primitive recursion to show that H is recursive.
6.4 Kleene Computation Relation Tn(e, a1, . . . , an, c)
The goal of this section is to prove the following fundamental result. Main Theorem. (Kleene) For each n ≥ 1, the n+2-ary relation Tn is primitive recursive, where Tn(e, a1, . . . , an, c) ⇔ e is the code of a program P, the P-computation with inputs a1, . . . , an halts, and c is the code of that computation. Assume, for a moment, that the Kleene computation relation Tn is primitive recursive, and let us use this fact to prove two important results. Theorem 1 (a fundamental theorem of computability theory). There is a 1-ary relation that is RE but not recursive. Proof. Consider K, defined by K(e) ⇔ e is the code of a program P and the P-computation with input e halts. By a Cantor diagonal argument we already know that K is not RM-decidable and therefore K is not recursive. On the other hand, K(e) ⇔ PGM(e) ∧ ∃cT1(e, e, c); this shows that K is RE.
Theorem 2. Every RM-computable function is recursive. Proof. Let F be an n-ary function that is RM-computable and let e be the code of a program P that computes F. Since P computes F, for all a1 , . . . , an ∈ N, there exists c ∈ N such that Tn (e, a1 , . . . , an , c); in other words, the relation R(a1 , . . . , an ,c) ⇔ Tn (e, a1 , . . . , an , c) is regular. A little later we will show that there is a 1-ary primitive recursive function U such that if c is the code of a P-computation that halts, then U(c) is the output. Assuming this result, we have: F(a1 , . . . , an ) = U(µcTn (e, a1 , . . . , an , c)). Therefore F is recursive as required. The use of minimalization in the above proof cannot be avoided. However, it is the only use of minimalization in the entire proof that RM-computable functions are recursive.
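Granting the Main Theorem, the proof of Theorem 2 is a two-line program: search computation codes c until Tn(e, a1, . . . , an, c) holds, then decode the output with U. A sketch in which T and U are passed in as parameters (both are constructed later in this section):

```python
from itertools import count

def run_coded_program(T, U, e, *a):
    # F(a) = U(mu c T(e, a, c)); the loop terminates because the coded
    # program halts on these inputs, so a computation code c exists
    for c in count():
        if T(e, *a, c):
            return U(c)
```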
Coding Programs and Computations
The first step in proving Kleene's Theorem is to code programs and computations that halt. We will use the primitive recursive functions and relations ⟨· · ·⟩n, lh, β, and CD (see 6.3). Instructions. To each instruction I we assign a natural number #I, the code of I, as follows:
#S(k) = ⟨k, 0⟩2;  #D(k) = ⟨k, 1⟩2;  #Z(k) = ⟨k, 2⟩2;  #B(k, q) = ⟨k, q, 0⟩3.
For each instruction, this coding gives rise to an associated recursive relation. Theorem 3. The following relations are primitive recursive:
• INC(a) ⇔ a is the code of an increase instruction;
• DEC(a) ⇔ a is the code of a decrease instruction;
• ZERO(a) ⇔ a is the code of a zero instruction;
• BRANCH(a) ⇔ a is the code of a branch instruction;
• INST(a) ⇔ a is the code of an instruction.
Proof. For example: BRANCH(a) ⇔ CD(a) ∧ (lh(a) = 3) ∧ (β(a, 0) ≥ 1) ∧ (β(a, 1) ≥ 1) ∧(β(a, 2) = 0); INST(a) ⇔ INC(a) ∨ DEC(a) ∨ ZERO(a) ∨ BRANCH(a).
Programs. Assign to each program P with instructions I1, . . . , It a code #P as follows: #P = ⟨#I1, . . . , #It⟩t. Theorem 4. The following 1-ary relation is primitive recursive: PGM(e) ⇔ e is the code of a program. Proof. We have: PGM(e) ⇔ CD(e) ∧ ∀j < lh(e)[INST(β(e, j))]. Let P be a program with instructions I1, . . . , It and let e = #P = ⟨#I1, . . . , #Ij, . . . , #It⟩t. From e we can recover the code of each instruction and also the register of that instruction. For the instruction Ij we have: #Ij = β(e, j ∸ 1); register of Ij = β(#Ij, 0) = β(β(e, j ∸ 1), 0). Computations. Let P be a program with code e and let n ≥ 1. We want to assign a code c to each P-computation with inputs a1, . . . , an that halts. But first we need to assign a code to each step of this computation. Now the registers Rk used by P satisfy k < e and so we will use e+n-steps to describe P-computations with n inputs that halt. The code of the e+n-step s = (j; r1, . . . , re+n) is #s = ⟨j, r1, . . . , re+n⟩e+n+1. Recall that the P-successor s′ of s = (j; r1, . . . , rk, . . . , re+n) is defined by cases as follows:
Ij = S(k): s′ = (j + 1; r1, . . . , rk + 1, . . . , re+n);
Ij = D(k): s′ = (j + 1; r1, . . . , rk ∸ 1, . . . , re+n);
Ij = Z(k): s′ = (j + 1; r1, . . . , 0, . . . , re+n);
Ij = B(k, q): s′ = (j + 1; r1, . . . , rk, . . . , re+n) if rk = 0, and s′ = (q; r1, . . . , rk, . . . , re+n) if rk ≠ 0.
Moreover, a P-computation with inputs a1, . . . , an that halts is a finite sequence s1, . . . , sm of e+n-steps such that (1) s1 = (1; a1, . . . , an, 0, . . . , 0); (2) si+1 is the P-successor of si, 1 ≤ i ≤ m − 1; (3) sm = (q; r1, . . . , re+n), where q > t (t is the number of instructions in P). Finally, the code of this computation is c = ⟨#s1, . . . , #sm⟩m. This completes the coding of programs and computations. Now let n ≥ 1, let e be the code of a program P, and suppose that the P-computation with inputs a1, . . . , an halts with output b. Let s1, . . . , sm be the e+n-steps of this computation and let c be the code of the computation. We have codes as follows:
e = ⟨#I1, . . . , #It⟩t;
#s1 = ⟨1, a1, . . . , an, 0, . . . , 0⟩e+n+1; . . . ; #sm = ⟨q, b, r2, . . . , re+n⟩e+n+1 (where q > lh(e));
c = ⟨#s1, . . . , #sm⟩m.
Note that we can recursively recover b and q from c: #sm = β(c, lh(c) ∸ 1); b = β(#sm, 1); q = β(#sm, 0).
This observation is the key to proving the following result (used earlier in the proof of Theorem 2): Lemma 1. There is a 1-ary primitive recursive function U such that if c is the code of a P-computation that halts, then U(c) is the output of that computation.
Proof. We have: U(c) = β(#sm, 1) = β(β(c, lh(c) ∸ 1), 1). We are almost ready to prove that the Kleene computation relation Tn is primitive recursive. But first we need a primitive recursive function that computes the code of s′ from the code of s. Lemma 2. For each n ≥ 1, there is a 2-ary primitive recursive function SUCCn(e, a) such that if e = code of a program with instructions I1, . . . , It, and a = code of an e+n-step s = (j; r1, . . . , rk, . . . , re+n) with 1 ≤ j ≤ t, then SUCCn(e, a) = code of the P-successor s′ of s. Proof. The definition of SUCCn is by cases. First of all, we may assume that e, a satisfy PGM(e) ∧ CD(a) ∧ (lh(a) = e + n + 1) ∧ (1 ≤ β(a, 0) ≤ lh(e)) (otherwise set SUCCn(e, a) = 0). In this case we have codes as follows: e = ⟨#I1, . . . , #Ij, . . . , #It⟩t; a = ⟨j, r1, . . . , rk, . . . , re+n⟩e+n+1. There are now four subcases.
Ij = D(k). First note that we can recursively recover j, #Ij, and k from e and a as follows: j = β(a, 0); #Ij = β(e, j ∸ 1); k = β(#Ij, 0). For the case DEC(#Ij) we have
SUCCn(e, a) = ⟨j + 1, r1, . . . , rk ∸ 1, . . . , re+n⟩e+n+1 = ⟨β(a, 0) + 1, β(a, 1), . . . , β(a, k) ∸ 1, . . . , β(a, e + n)⟩e+n+1.
Ij = B(k, q). As before, j = β(a, 0); #Ij = β(e, j ∸ 1); k = β(#Ij, 0); q = β(#Ij, 1).
For the case BRANCH(#Ij) we have:
SUCCn(e, a) = ⟨j + 1, β(a, 1), . . . , β(a, k), . . . , β(a, e + n)⟩e+n+1 if β(a, k) = 0, and SUCCn(e, a) = ⟨q, β(a, 1), . . . , β(a, k), . . . , β(a, e + n)⟩e+n+1 if β(a, k) ≠ 0.
We leave the two remaining cases INC(#Ij) and ZERO(#Ij) to the reader.
Main Theorem. (Kleene) For each n ≥ 1 the n+2-ary relation Tn is primitive recursive. Proof. By definition we have Tn(e, a1, . . . , an, c) ⇔ e is the code of a program P, the P-computation with inputs a1, . . . , an halts, and c is the code of that computation. Let P be a program with code e, let s1, . . . , sm be the steps of a P-computation with inputs a1, . . . , an that halts, and let c be the code of this computation. We have codes and calculations as follows:
c = ⟨#s1, . . . , #sm⟩m;
#s1 = ⟨1, a1, . . . , an, 0, . . . , 0⟩e+n+1;
#si+1 = SUCCn(e, #si), 1 ≤ i ≤ m − 1;
#sm = ⟨q, b, r2, . . . , re+n⟩e+n+1 with q > lh(e) and q = β(β(c, lh(c) ∸ 1), 0).
We now have:
Tn(e, a1, . . . , an, c) ⇔ PGM(e) ∧ CD(c) ∧ (lh(c) ≥ 2) ∧ (β(c, 0) = ⟨1, a1, . . . , an, 0, . . . , 0⟩e+n+1) ∧ ∀i < lh(c) ∸ 1[β(c, i + 1) = SUCCn(e, β(c, i))] ∧ (q > lh(e)).
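Stripped of the coding, SUCCn is nothing more than the one-step transition function of the register machine. A sketch on plain (uncoded) steps — the instruction format ('S', k), ('D', k), ('Z', k), ('B', k, q) is our own:

```python
def successor(prog, step):
    # step = (j, regs): the number of the next instruction (1-based)
    # together with the current register contents; assumes 1 <= j <= len(prog)
    j, regs = step
    instr = prog[j - 1]
    op, k = instr[0], instr[1]
    r = list(regs)
    if op == 'S':
        r[k - 1] += 1
    elif op == 'D':
        r[k - 1] = max(r[k - 1] - 1, 0)   # proper subtraction
    elif op == 'Z':
        r[k - 1] = 0
    elif op == 'B' and r[k - 1] != 0:
        return (instr[2], tuple(r))       # branch to q when R_k != 0
    return (j + 1, tuple(r))
```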
Exercises on 6.4
1.
Complete the proof of Lemma 2 by considering the following cases: (a) INC(# Ij ); (b) ZERO(# Ij ).
2.
Let K0 be the 1-ary relation defined by K0(e) ⇔ e is the code of a program P and the P-computation with input e halts with output 0. Show that K0 is RE.
3.
Show that HLT is RE.
4.
Two sets A and B ⊆ N are recursively inseparable if A ∩ B = ∅ and there is no recursive set R with A ⊆ R and B ⊆ Rᶜ = {a : a ∈ N and a ∉ R}. Let A = {e : U(µcT1(e,e,c)) = 0} and B = {e : U(µcT1(e,e,c)) = 1}. Show that A and B are recursively inseparable. Hint: Suppose R is recursive and A ⊆ R, B ⊆ Rᶜ, and let F be the characteristic function of Rᶜ. Let e be such that F(a) = U(µcT1(e,a,c)).
6.5 Partial Recursive Functions; Enumeration Theorems
There are algorithms that, for some inputs, do not halt. Consider, for example, the following:
(1) Input a ∈ N and set c = 0.
(2) Is a = 2c?
(3) If so, print c and halt.
(4) Otherwise, replace c with c + 1 and go to instruction 2.
If the input a is even, the algorithm eventually halts with output a/2; but if a is odd, the algorithm continues forever. This algorithm “computes” the function
F(a) = a/2 if a is even; undefined if a is odd.
The function F is an example of a 1-ary partial function. In this section we extend the concepts of computability (in the informal sense), RM-computability, and recursiveness to n-ary partial functions. It turns out that we obtain a much richer theory by widening the class of functions under consideration from the n-ary functions to the n-ary partial functions. For example, there is an enumeration theorem for partial recursive functions that does not hold for the recursive functions. In addition, this new theory leads to a better understanding of the RE relations. Definition 1. Let n ≥ 1. An n-ary partial function is a function whose domain is a subset of Nn and whose co-domain is N. In other words we
have F: A → N, where A ⊆ Nn. By a partial function we mean a function that is an n-ary partial function for some n ≥ 1. The domain of an n-ary partial function F may be all of Nn; in this case we say that F is total. An interesting special case of an n-ary partial function is the empty function Øn. This is the n-ary partial function whose domain is Ø; in other words, the partial function that is not defined for any a1, . . . , an in N. The partial function Øn arises rather naturally in computability theory in connection with the existence of algorithms that do not halt for any inputs a1, . . . , an ∈ N. The notion of computability extends to partial functions as follows.
Definition 2 (informal). An n-ary partial function F is computable if there is an algorithm such that for all a1, . . . , an ∈ N:
• if ⟨a1, . . . , an⟩ ∈ dom F, then the algorithm with inputs a1, . . . , an halts with output F(a1, . . . , an);
• if ⟨a1, . . . , an⟩ ∉ dom F, then the algorithm continues forever.
The function F described at the beginning of this section is a 1-ary partial function that is computable. Both subtraction and proper subtraction are 2-ary partial functions that are computable (proper subtraction is total but subtraction is not). The empty function Øn is an n-ary partial function that is computable. The required algorithm is any algorithm that, for any inputs a1, . . . , an ∈ N, goes into an infinite loop. Many computer programs are unintentionally written to compute Øn. Here is another example of a 1-ary partial function that is computable. Let R be a 2-ary decidable relation and let F be defined by
F(a) = µbR(a, b) if ∃bR(a, b); undefined if ¬∃bR(a, b).
The required algorithm is as follows:
(1) Input a ∈ N and set b = 0.
(2) Use the algorithm for R to decide if R(a, b).
(3) If so, F(a) = b; print b and halt.
(4) Otherwise, replace b with b + 1 and go to instruction 2.
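This loop, specialized to the example that opened the section, is worth seeing in code; a sketch (ours) that is a perfectly good program even though it fails to halt on odd inputs:

```python
from itertools import count

def half(a):
    # F(a) = mu c [2 * c = a]: returns a // 2 when a is even,
    # and loops forever when a is odd -- F is undefined there
    for c in count():
        if 2 * c == a:
            return c
```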
RM-computability extends to partial functions in a natural way as follows.
Definition 3. An n-ary partial function F is RM-computable if there is a program P such that for all a1, . . . , an ∈ N:
• if ⟨a1, . . . , an⟩ ∈ dom F, then the P-computation with inputs a1, . . . , an halts with output F(a1, . . . , an);
• if ⟨a1, . . . , an⟩ ∉ dom F, then the P-computation with inputs a1, . . . , an continues forever.
At this point we have: C = collection of all partial functions that are computable (in the intuitive sense); RM = collection of all partial functions that are RM-computable. Moreover, RM ⊆ C, since a program for the register machine satisfies our intuitive notion of an algorithm. Eventually we will use the Church-Turing Thesis to show that RM = C. But first let us introduce yet another class of functions, namely R = collection of all partial functions that are partial recursive. This class of functions is obtained by modifying the operations of composition and minimalization. To do this, we need the following idea.
The binary relation ≃. Let s and t be expressions that, if defined, give a natural number. By s ≃ t we mean: if s is defined, then t is also defined and s = t; if t is defined, then s is also defined and s = t. Informally speaking, ≃ extends the usual equality relation = and is reflexive, transitive, and symmetric. Here are two examples to illustrate the use of ≃. First, let F be a 1-ary partial function and let b ∈ N; the correct reading of F(a) ≃ b is that a ∈ dom F and F(a) = b. Reason: The "expression" b is obviously defined, and therefore F(a) is defined and equal to b. Next, suppose we write F(a) ≃ µb[2 × b = a]. We are defining a 1-ary partial function F. If a is even, the expression µb[2 × b = a] is defined and has value a/2; thus F is defined at a and F(a) = a/2. If a is odd, the expression µb[2 × b = a] is not defined; F is not defined at a. Now we can give the promised modifications of composition and minimalization. Composition (of partial functions). Let H1, . . . , Hk be n-ary partial functions and let G be a k-ary partial function. The composition of G,
H1, . . . , Hk is the n-ary partial function F defined by F(a1, . . . , an) ≃ G(H1(a1, . . . , an), . . . , Hk(a1, . . . , an)). Note that if H1, . . . , Hk are all defined at a = ⟨a1, . . . , an⟩, and if G is defined at H1(a), . . . , Hk(a), then F is defined at a and its value is G(H1(a), . . . , Hk(a)). But if there is some j (1 ≤ j ≤ k) for which Hj(a) is not defined, or if G(H1(a), . . . , Hk(a)) is not defined, then F is not defined at a. We leave the proof of the following lemma to the reader. Lemma 1. If F is the composition of the partial functions G, H1, . . . , Hk, and G, H1, . . . , Hk are computable, then F is computable. In other words, if G, H1, . . . , Hk ∈ C, then F ∈ C. Minimalization (of a partial function). Let G be an n+1-ary partial function. The minimalization of G is the n-ary partial function F defined by F(a1, . . . , an) ≃ µb[G(a1, . . . , an, b) = 0]. We elaborate. Let a = ⟨a1, . . . , an⟩; then F(a) = b provided G(a,b) = 0 and for all k < b, G(a,k) is defined and G(a, k) > 0; otherwise F is undefined at a. Thus F is undefined at a if one of the following holds: G(a,b) is defined for all b ∈ N and G(a, b) > 0; there exists b ∈ N such that G(a,b) is not defined and for all k < b, G(a, k) > 0. Note that in either of these cases, a systematic search for the smallest b such that G(a,b) = 0 continues forever. For this new operation of minimalization, the function G need not be total, and even if G is total, it need not be regular. We leave the proof of the following to the reader. Lemma 2. If F is the minimalization of an n+1-ary partial function G that is computable, then F is computable. In other words, if G ∈ C, then F ∈ C. We now define the collection R of partial recursive functions. Definition 4. A partial function F is partial recursive if there is a finite sequence F1, . . . , Fn of partial functions with Fn = F and such that for 1 ≤ k ≤ n, one of the following holds:
• Fk is a starting function (S, Z, or Un,k);
• k > 1 and Fk is obtained by composition of partial functions from among F1, . . . , Fk−1;
• k > 1 and Fk is obtained by minimalization of a partial function from among F1, . . . , Fk−1;
• k > 1 and Fk is obtained by primitive recursion applied to total functions from among F1, . . . , Fk−1.
Before beginning a systematic study of the partial recursive functions, let us quickly mention one of the reasons for extending the theory of recursive functions to partial functions. Consider the following. Theorem 1. There is no 2-ary recursive function that enumerates all 1-ary recursive functions. In other words, there is no 2-ary recursive function H such that, given any 1-ary recursive function F, there exists e ∈ N such that for all a ∈ N, F(a) = H(e,a). Proof. Suppose H exists; we will use a diagonalization argument to obtain a contradiction. Let F be defined by F(a) = H(a,a) + 1. Since H is recursive, F is recursive. Since H enumerates, there exists e ∈ N such that for all a ∈ N, F(a) = H(e,a). In particular, for a = e we have F(e) = H(e,e). On the other hand, from the definition of F we have F(e) = H(e,e) + 1. Thus H(e,e) = H(e,e) + 1, a contradiction.
In our study of partial recursive functions we will prove: There is a 2-ary partial recursive function that enumerates all 1-ary partial recursive functions. In addition, we will show that an n-ary relation R is RE if and only if it is the domain of a partial recursive function. Thus we see that the extension of the theory of recursive functions to partial functions allows an enumeration theorem for partial recursive functions and gives yet another characterization of the RE relations. By Theorem 1, there is no enumeration theorem for the recursive functions. Why does the contradiction in the proof of Theorem 1 not apply in the case of the partial recursive functions? The proof applied to partial functions yields H(e,e) ≃ H(e,e) + 1, and this does not automatically give a contradiction due to the possibility that both sides may be undefined! We can summarize the situation as follows: We can diagonalize out of the class of recursive functions but not the class of partial recursive functions. Now let us turn to a systematic study of the partial recursive functions. Theorem 2. Let F be a total function (that is, F: Nn → N for some n ≥ 1). If F is recursive, then F is partial recursive; in other words, {recursive functions} ⊆ R. Proof. The conditions for a function to be recursive are stated in 6.1. These conditions are more restrictive than those given in Definition 4, and therefore a recursive function is automatically partial recursive. By Theorem 2, familiar recursive functions such as +, ∸, a^b, D, Cn,k, sg, csg, and so on are partial recursive. Later we will prove the converse
of Theorem 2, namely that if F is a total function and is also partial recursive, then F is recursive. In addition, we will give an example of a 1-ary partial recursive function that cannot be extended to a total recursive function. As in the case of recursive functions, the standard method of proving a function partial recursive is as follows.

Lemma 3. Let F be an n-ary partial function. To prove that F is partial recursive, it suffices to show that F satisfies one of the following conditions:
(1) F is a starting function;
(2) F is the composition of partial recursive functions;
(3) F is the minimalization of a partial recursive function;
(4) F is obtained by primitive recursion from total partial recursive functions.
The next theorem illustrates a typical use of Lemma 3.

Theorem 3. Let Q be an n+1-ary recursive relation. Then F is partial recursive, where F(a1, . . . , an) ≃ µbQ(a1, . . . , an, b).

Proof. Let KQ be the characteristic function of Q. For a ∈ Nn, F(a) ≃ µb[KQ(a,b) = 0], and therefore F is partial recursive by minimalization of a total (but not necessarily regular) recursive function.

Example. The relation Q(a,b) ⇔ (a = 2 × b) is recursive, and therefore by Theorem 3 the function F(a) ≃ µb[a = 2 × b] is a partial recursive function. This is the function discussed at the beginning of this section.

Induction on partial recursive functions proceeds as before.

Theorem 4. Let Q be a property of partial functions. To prove that every partial recursive function has property Q, it suffices to show the following:
• each starting function has property Q;
• if F is the composition of partial functions G, H1, . . . , Hk, each of which has property Q, then F has property Q;
• if F is obtained by primitive recursion from total functions G and H that have property Q, then F has property Q;
• if F is the minimalization of a partial function G that has property Q, then F has property Q.

Theorem 5. Every partial recursive function is RM-computable. In other words, R ⊆ RM.

Proof. We use induction on partial recursive functions. We already know that the starting functions S, Z, and Un,k are RM-computable. Consider composition; suppose F is defined by F(a1, . . . , an) ≃ G(H1(a1, . . . , an), . . . , Hk(a1, . . . , an)), where G, H1, . . . , Hk are partial functions that are RM-computable. The program written in the proof of Theorem 1 in 5.2, call it P, has the following property (and therefore shows that F is RM-computable):
• if H1, . . . , Hk are all defined at a = a1, . . . , an, and if G is defined at H1(a), . . . , Hk(a), then the P-computation with inputs a1, . . . , an halts with output G(H1(a), . . . , Hk(a));
• if one or more of the functions H1, . . . , Hk is not defined at a, or if G is not defined at H1(a), . . . , Hk(a), then the P-computation with inputs a1, . . . , an does not halt.
Next, suppose that F is obtained by primitive recursion from total functions G and H, both RM-computable. Then F is RM-computable by Theorem 2 in 5.2. Finally, suppose F is defined by F(a1, . . . , an) ≃ µb[G(a1, . . . , an, b) = 0], where G is an n+1-ary partial function that is RM-computable. The proof of Theorem 7 in 5.3 shows how to write a program P such that for all a = a1, . . . , an ∈ Nn:
• if there exists b ∈ N such that G(a,b) = 0 and G(a,k) > 0 for k < b, then the P-computation with inputs a1, . . . , an halts and the output is b;
• otherwise, the P-computation with inputs a1, . . . , an does not halt.
Corollary 1. Every partial recursive function is computable; in other words, R ⊆ C.
Corollary 2. A total function is recursive if and only if it is partial recursive.

Proof. It suffices to show that every partial recursive function that is total is recursive. Let F: Nn → N be such a function. By Theorem 5, F is RM-computable. Since F is total, Theorem 2 in 6.4 applies and F is recursive.

We now use the results in 6.4 on the primitive recursive relation
Tn(e, a1, . . . , an, c) ⇔ e is the code of a program P, the P-computation with inputs a1, . . . , an halts, and c is the code of that computation,
and the primitive recursive function U(c) = output to obtain the following fundamental result.

Theorem 6 (Kleene Normal Form Theorem). Let e ∈ N be the code of a program P that computes an n-ary partial function F. Then for all a ∈ Nn:
(1) a ∈ dom F ⇔ ∃cTn(e,a,c);
(2) F(a) ≃ U(µcTn(e,a,c)) (and therefore F is partial recursive);
(3) a ∈ dom F and F(a) = b ⇔ ∃c[Tn(e,a,c) ∧ (U(c) = b)].

Proof. Let a ∈ Nn; we consider two cases.
• a ∈ dom F: In this case the P-computation with input a halts with output F(a). Let c be the code of this computation. Then ∃cTn(e,a,c) and moreover F(a) = U(µcTn(e,a,c)) by Lemma 1 in 6.4.
• a ∉ dom F: In this case the P-computation with input a continues forever. There is no c such that Tn(e,a,c), and both F(a) and U(µcTn(e,a,c)) are undefined.
We now have (1), (2), and (3) as required. Corollary 3. Every partial function that is RM-computable is partial recursive. In other words, RM ⊆ R. In summary, for the class of partial functions we have R = RM ⊆ C. The Kleene Normal Form Theorem has another important application, namely that for each n ≥ 1, there is a fixed program that computes all n-ary partial recursive functions. Here is the required definition.
Definition 5. Let n ≥ 1. The universal function for n-ary partial recursive functions is the n+1-ary partial function Un defined by Un(e, a1, . . . , an) ≃ U(µcTn(e, a1, . . . , an, c)).

Theorem 7 (enumeration of n-ary partial recursive functions). For each n ≥ 1, Un is an n+1-ary partial recursive function that enumerates all n-ary partial recursive functions. Thus, given any n-ary partial recursive function F, there exists e ∈ N such that F(a) ≃ Un(e,a) for all a ∈ Nn.

Proof. The function Un is partial recursive by the way in which it is defined; Un enumerates by the Kleene Normal Form Theorem (Theorem 6(2) above) and the fact that every partial recursive function is RM-computable.

Corollary 4. There is a 1-ary partial recursive function that is not total and, moreover, does not have an extension to a total recursive function.

Proof. The function F(a) ≃ U1(a,a) + 1 is partial recursive. Suppose there is a total recursive function G: N → N such that F(a) = G(a) for all a ∈ dom F. Since U1 enumerates, there exists e ∈ N such that for all a ∈ N, G(a) = U1(e,a). In particular, for a = e we have G(e) = U1(e,e). Since U1(e,e) is defined, F(e) = U1(e,e) + 1 and e ∈ dom F. But F(a) = G(a) for all a ∈ dom F; therefore G(e) = U1(e,e) + 1. This contradicts the earlier calculation G(e) = U1(e,e).

By Corollary 4, we see that the class of partial recursive functions is more extensive than just the restriction of the recursive functions to subsets of Nn. Theorem 7 states that for each n ≥ 1, there is a universal function Un for the n-ary partial recursive functions. The next result states that, in a certain sense, U1 by itself is universal.

Theorem 8 (universal function). The 2-ary partial function U1 is universal in the following sense: If F is any n-ary partial recursive function, then there exists e ∈ N such that for all a1, . . . , an ∈ N,
F(a1, . . . , an) ≃ U1(e, ⟨a1, . . . , an⟩n).   (∗)
Proof. Let F be an n-ary partial recursive function. Define a 1-ary partial recursive function F* by F*(a) ≃ F(β(a,0), . . . , β(a, n − 1)). Since U1 enumerates the 1-ary partial recursive functions, there exists e ∈ N such
that for all a ∈ N, F*(a) ≃ U1(e,a). To see that (∗) holds, let a1, . . . , an ∈ N, let a = ⟨a1, . . . , an⟩n, and use F*(a) ≃ U1(e,a).

Theorem 8 suggests the existence of a universal RM-program P*; such a program has the following property. Given a program P with e = #P and a1, . . . , an ∈ N:
• if the P-computation with inputs a1, . . . , an halts, then the P*-computation with inputs e and ⟨a1, . . . , an⟩n halts with the same output;
• if the P-computation with inputs a1, . . . , an does not halt, then the P*-computation with inputs e and ⟨a1, . . . , an⟩n does not halt.

The first such universal program was constructed by Turing (for Turing machines) and is referred to as a universal Turing machine. Informally speaking, and assuming the Church-Turing Thesis, a universal Turing machine is a list of instructions for calculating all computable functions. These ideas anticipate the existence of modern-day computers that can be programmed to perform a wide range of applications.
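To make the idea concrete, here is a minimal interpreter sketch in Python: one fixed procedure that runs any register-machine program handed to it as data. This is hypothetical illustration code, not the book's construction; in particular, programs are encoded as Python lists rather than as code numbers, and the convention assumed here for B(k,q) (jump to instruction q when register k is nonzero, fall through otherwise) only approximates the instruction set of Chapter 5.

```python
def run_rm(program, inputs, max_steps=None):
    """Run an RM program (a list of tuples such as ("S", 1) or ("B", 2, 4))
    on the given inputs, which are loaded into registers 1, 2, ....
    Returns the contents of register 1 on halting, or None if the optional
    step bound is exhausted first."""
    regs = {k: v for k, v in enumerate(inputs, start=1)}
    pc, steps = 0, 0
    while 0 <= pc < len(program):        # halting: control leaves the program
        if max_steps is not None and steps >= max_steps:
            return None                  # "has not halted yet"
        instr = program[pc]
        steps += 1
        if instr[0] == "S":              # S(k): add 1 to register k
            regs[instr[1]] = regs.get(instr[1], 0) + 1
        elif instr[0] == "D":            # D(k): subtract 1, floor at 0
            regs[instr[1]] = max(regs.get(instr[1], 0) - 1, 0)
        elif instr[0] == "Z":            # Z(k): set register k to 0
            regs[instr[1]] = 0
        elif instr[0] == "B":            # B(k,q): assumed branch convention
            if regs.get(instr[1], 0) != 0:
                pc = instr[2] - 1        # instructions numbered from 1
                continue
        pc += 1
    return regs.get(1, 0)                # convention here: output in register 1

# Addition: with a in register 1 and b in register 2, move b into register 1.
# Register 3 is set to 1 so that B(3, q) acts as an unconditional jump.
add = [("S", 3), ("B", 2, 4), ("B", 3, 7), ("D", 2), ("S", 1), ("B", 3, 2)]
print(run_rm(add, [2, 3]))               # 5
```

The step bound plays no role in universality itself; it is included because bounded runs are exactly what the dovetailing arguments of this chapter need (for instance, the algorithm in Lemma 4 below).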
RE Relations

We now study the interplay between RE relations and partial recursive functions. A major application is another proof of the fundamental result that there is an RE relation that is not recursive.

Theorem 9 (characterization of RE relations). Let R be an n-ary relation. Then R is RE if and only if R is the domain of a partial recursive function.

Proof. First assume R is RE. Then there is a recursive relation Q such that for all a1, . . . , an ∈ N, R(a1, . . . , an) ⇔ ∃bQ(a1, . . . , an, b). The required partial recursive function is F(a1, . . . , an) ≃ µbQ(a1, . . . , an, b) (see Theorem 3). Now assume that R is the domain of an n-ary partial recursive function F. In this case we have R(a) ⇔ a ∈ dom F for all a ∈ Nn. Let e be the code of a program that computes F. By the Kleene Normal Form Theorem, a ∈ dom F ⇔ ∃cTn(e,a,c) for all a ∈ Nn. Thus R(a) ⇔ ∃cTn(e,a,c) and so R is RE as required.

Theorem 10 (enumeration of RE relations). For each n ≥ 1, ∃cTn(e, a1, . . . , an, c) is an n+1-ary RE relation that enumerates all n-ary RE
relations. Thus, given any n-ary RE relation R, there exists e ∈ N such that for all a1, . . . , an ∈ N,
R(a1, . . . , an) ⇔ ∃cTn(e, a1, . . . , an, c).   (∗)

Proof. It is obvious that ∃cTn(e,a,c) is RE. To prove enumeration, let R be an n-ary RE relation. By Theorem 9, R is the domain of an n-ary partial recursive function F and thus R(a) ⇔ a ∈ dom F. Let e be the code of a program that computes F. By the Kleene Normal Form Theorem, a ∈ dom F ⇔ ∃cTn(e,a,c) and therefore (∗) holds as required.

Theorem 11 (a fundamental theorem of computability theory). There is an RE relation that is not recursive.

Proof. By the previous theorem, ∃cT1(e,a,c) enumerates all 1-ary RE relations. Diagonalize this relation as follows: K(a) ⇔ ∃cT1(a,a,c). Clearly K is RE. But K cannot be recursive. For if K is recursive, then ¬K is also recursive, and therefore by enumeration there exists e ∈ N such that for all a ∈ N, ¬K(a) ⇔ ∃cT1(e,a,c). For a = e we obtain ¬K(e) ⇔ ∃cT1(e,e,c), and this contradicts K(e) ⇔ ∃cT1(e,e,c).

Definition 6. Let F be an n-ary partial function. The graph of F is the n+1-ary relation GF defined as follows: for a ∈ Nn and b ∈ N, GF(a, b) ⇔ F(a) ≃ b ⇔ a ∈ dom F and F(a) = b.

Theorem 12 (graph). Let F be an n-ary partial function. Then F is partial recursive if and only if the graph GF of F is RE.

Proof. First assume that F is partial recursive and let e be the code of a program that computes F. By the Kleene Normal Form Theorem (Theorem 6(3)) we have for all a ∈ Nn and b ∈ N: a ∈ dom F and F(a) = b ⇔ ∃c[Tn(e, a, c) ∧ (U(c) = b)]. Therefore GF(a,b) ⇔ ∃c[Tn(e,a,c) ∧ (U(c) = b)] and GF is RE as required. Now assume that GF is RE. For all a ∈ Nn and b ∈ N:
GF(a, b) ⇔ a ∈ dom F and F(a) = b;
GF(a, b) ⇔ ∃cR(a, b, c), where R is an n+2-ary recursive relation.
From these two results we have:
(1) a ∈ dom F ⇔ ∃dR(a, β(d,0), β(d,1));
(2) if R(a, β(d,0), β(d,1)), then F(a) = β(d,0).
It follows that F(a) ≃ β(µd[R(a, β(d,0), β(d,1))], 0), and therefore F is partial recursive as required.

Lemma 4. If F is computable, then GF is semi-decidable.

Proof. Assume F is an n-ary partial function that is computable in the intuitive sense. The following algorithm shows that GF is semi-decidable.
(1) Input a1, . . . , an, b and set x = 1.
(2) Does the algorithm for F applied to a1, . . . , an halt in x steps with output b? If so, print YES and halt.
(3) Add 1 to x and go to instruction 2.
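In code, Lemma 4's algorithm is a one-variable search. The sketch below reuses the hypothetical run_rm interpreter from earlier in this section as a stand-in for "the algorithm for F"; the point is only the shape of the search, which answers YES exactly when the bounded simulation eventually halts with output b.

```python
def semi_decide_graph(program, inputs, b):
    """Semi-decide GF(a, b): print YES and halt iff the program computes
    output b on these inputs.  If it halts with a different output, or
    never halts, the search below runs forever, as a semi-decision may."""
    x = 1
    while True:
        if run_rm(program, inputs, max_steps=x) == b:  # halts in x steps with output b?
            print("YES")
            return
        x += 1
```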
Theorem 13. Assume the Church-Turing Thesis. Then C = RM = R.

Proof. We already know that RM = R and RM ⊆ C. So the proof is complete if we can show that every partial function F that is computable (in the intuitive sense) is in fact partial recursive. We have: GF is semi-decidable (Lemma 4); GF is RE (Church-Turing Thesis); F is partial recursive (Theorem 12).
Exercises on 6.5

1. Prove the following: (a) Lemma 1; (b) Lemma 2; (c) Lemma 3; (d) Theorem 4.
2. Use Lemmas 1, 2, and 3 and Theorem 4 to give a direct proof of Corollary 1.
3. Prove the following properties of ≃: (a) s ≃ s; (b) if s ≃ t, then t ≃ s; (c) if s ≃ t and t ≃ u, then s ≃ u.
4. Let R be a 1-ary RE relation. Show that F is partial recursive, where F is defined by F(a) ≃ 0 if R(a); undefined if ¬R(a). Hint: R is the domain of a 1-ary partial recursive function.
5. Let R be a 1-ary RE relation and let G be a 1-ary partial recursive function. Show that F is partial recursive, where F is defined by F(a) ≃ G(a) if R(a); undefined if ¬R(a). Note: a ∈ dom F if and only if a ∈ dom G and R(a).
6. Let R be a 1-ary semi-decidable relation and let G be a 1-ary partial function that is computable. Write an algorithm to show that F is computable, where F(a) ≃ G(a) if R(a); undefined if ¬R(a).
7. Let F be an n-ary partial function that is computable. Prove that the domain of F is a semi-decidable relation (write an algorithm).
8. Let R be an n-ary relation. Show that R is semi-decidable if and only if R is the domain of an n-ary partial function that is computable (write algorithms).
9. Let F be an n-ary function such that GF is semi-decidable. Show that F is computable (write an algorithm).
10. (enumeration, negation, and diagonalization). Let Γ be a class of relations on N (e.g., Γ = all recursive relations). We say that Γ admits enumeration if there is a 2-ary relation R ∈ Γ such that, given any 1-ary relation Q ∈ Γ, there exists e ∈ N such that for all a ∈ N, Q(a) ⇔ R(e,a). Let Γ satisfy these two properties:
• negation: if Q is a 1-ary relation with Q ∈ Γ, then ¬Q ∈ Γ;
• diagonalization: if Q is a 2-ary relation with Q ∈ Γ, then R ∈ Γ, where R(a) ⇔ Q(a,a).
Show that Γ does not admit 1-ary enumeration.
11. Refer to the previous exercise. For each Γ below, state YES or NO if Γ admits 1-ary enumeration. If NO, justify your answer.
(1) Γ = primitive recursive relations;
(2) Γ = recursive relations;
(3) Γ = decidable relations;
(4) Γ = RE relations.
12. Let F be a 1-ary partial recursive function. A recursive extension of F is a recursive function G: N → N such that F(a) = G(a) for all a ∈ dom F. Prove the following.
(a) If F is a 1-ary partial recursive function whose domain is a recursive set, then F has a recursive extension.
(b) Let F(a) ≃ µcT1(a,a,c) × 0. Show that the domain of F is not recursive but that nevertheless F has a recursive extension.
6.6 Computability and the Incompleteness Theorem

In this last section we briefly discuss the connection between Gödel’s Incompleteness Theorem and the subsequent development of computability theory by Church, Turing, Kleene, and Post. The key link is the fundamental assertion that there is an RE relation that is not recursive. Indeed, this important result is now often regarded as the abstract version of Gödel’s Theorem. Moreover, the first example of an RE relation not recursive comes from Gödel’s Incompleteness Theorem and an Undecidability Theorem of Church. Before proceeding further with these ideas, let us outline the setting of Gödel’s Theorem. Let L be the first-order language with non-logical symbols 0 (constant symbol), S (1-ary function symbol),
+ and × (2-ary function symbols), < (2-ary relation symbol).
The standard interpretation of this language is as follows: the domain is N, 0 is the number zero, S is the successor function, + and × are addition and multiplication, respectively, and < is the less than relation on N. Now consider the following nine sentences of L; these will be the non-logical axioms for a formal system for arithmetic called recursive arithmetic (RA).
RA1 ∀x1 ¬(Sx1 = 0);
RA2 ∀x1 ∀x2 [(Sx1 = Sx2) → (x1 = x2)];
RA3 ∀x1 (x1 + 0 = x1);
RA4 ∀x1 ∀x2 [x1 + Sx2 = S(x1 + x2)];
RA5 ∀x1 (x1 × 0 = 0);
RA6 ∀x1 ∀x2 [x1 × Sx2 = (x1 × x2) + x1];
RA7 ∀x1 ¬(x1 < 0);
RA8 ∀x1 ∀x2 [(x1 < Sx2) → ((x1 < x2) ∨ (x1 = x2))];
RA9 ∀x1 ∀x2 [(x1 < x2) ∨ (x1 = x2) ∨ (x2 < x1)].
We assume that these nine sentences are true in the standard interpretation and therefore RA is consistent. The following are special cases of results due to Gödel and Church respectively:
• The following 2-ary relation is recursive: PRF(a,b) ⇔ a is the code of a formula A and b is the code of a finite sequence of formulas that is a proof of A in RA.
• The following 1-ary relation is not recursive: THM(a) ⇔ a is the code of a formula A and A is a theorem of RA.

Since THM(a) ⇔ ∃bPRF(a,b), we see that THM(a) is an RE relation that is not recursive. In summary, Gödel’s Incompleteness Theorem and Church’s Undecidability Theorem give rise to a natural relation that is RE but not recursive. We now show how the existence of an RE relation not recursive can be used to prove a weak version of Gödel’s Theorem. We will freely use the Church-Turing Thesis throughout. It is well known that RA is not complete; that is, there are sentences A of L such that neither A nor ¬A is a theorem of RA (for example, ∀x(0 × x = 0)). This leads to the following idea. A set Σ of sentences of L is an axiomatized extension of RA if it satisfies these properties:
• {RA1, . . . , RA9} ⊆ Σ;
• there is an algorithm that, given a sentence A of L, decides, YES or NO, if A ∈ Σ;
• all sentences in Σ are true in the standard interpretation.
As a consequence of the third requirement, Σ is consistent. In fact, under this assumption Σ is ω-consistent, a somewhat stronger property than consistency that is defined as follows: if A is a formula with exactly one free variable x such that Ax[0a] is a theorem of Σ for all a ∈ N, then ¬∀xA is not a theorem of Σ. The following lemma is a special case of a more general result due to Gödel. In the proof we will assume the following important theorem of Gödel: every recursive relation is expressible in recursive arithmetic (definition given below).
Lemma. Let Σ be an axiomatized extension of RA and let R be a 1-ary RE relation. Then there is a formula A with exactly one free variable x such that for all a ∈ N, R(a) ⇔ Σ ⊢ Ax[0a].

Outline of Proof. Since R is RE, there is a 2-ary recursive relation Q such that R(a) ⇔ ∃bQ(a,b) for all a ∈ N. Since Q is recursive, Q is expressible (due to Gödel), and therefore there is a formula B with exactly two free variables x and y such that for all a, b ∈ N,
(1) if Q(a,b), then ⊢RA Bx,y[0a, 0b] (and therefore Σ ⊢ Bx,y[0a, 0b]);
(2) if ¬Q(a,b), then ⊢RA ¬Bx,y[0a, 0b] (and therefore Σ ⊢ ¬Bx,y[0a, 0b]).
The proof is complete if we can show that for all a ∈ N, R(a) ⇔ Σ ⊢ ∃yBx[0a]. The proof in the direction ⇒ follows from (1) and ∃-generalization. For the direction ⇐, assume Σ ⊢ ∃yBx[0a], which can be written as Σ ⊢ ¬∀y¬Bx[0a]. It follows from ω-consistency that for some b ∈ N, ¬Bx,y[0a, 0b] is not a theorem of Σ. By (2), ¬Q(a,b) fails, and therefore Q(a,b). Thus R(a) holds as required.
Theorem (Gödel). Let Σ be an axiomatized extension of recursive arithmetic. Then Σ is not complete (there is a sentence A such that Σ ⊬ A and Σ ⊬ ¬A).

Proof. Assume by way of contradiction that Σ is complete. Let R be a 1-ary relation that is RE but not recursive. By the lemma, there is a formula A with exactly one free variable x such that for all a ∈ N, R(a) ⇔ Σ ⊢ Ax[0a]. From this it follows that ¬R(a) ⇔ Σ ⊢ ¬Ax[0a]. To see this, use the completeness of Σ for the direction ⇒ and the consistency of Σ for the direction ⇐. Finally, since Σ is axiomatized, there is an algorithm for identifying the axioms of Σ. It follows that we can list all proofs using Σ, and therefore ¬R is semi-decidable. By the Church-Turing Thesis, ¬R is RE. We now have that both R and ¬R are RE and so R is recursive, a contradiction of the choice of R.

Note: See [4] for a much deeper discussion of the Incompleteness Theorem.
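The last step of the proof can be pictured as a procedure. In the hypothetical sketch below, none of the helpers is implemented: proofs is assumed to enumerate all proofs from Σ (possible because Σ is axiomatized), proves to check whether a given proof establishes a given sentence, sentence(a) to build Ax[0a], and negate to form a negation. If Σ were complete, the loop would terminate on every input and so decide R, contradicting the choice of R as RE but not recursive.

```python
def decide_R(a, proofs, proves, sentence, negate):
    """Hypothetical decision procedure for R that would exist if the
    axiomatized extension were complete -- which is the contradiction."""
    target = sentence(a)                 # the sentence A_x[0^a]
    for p in proofs():                   # enumerate all proofs from the axioms
        if proves(p, target):
            return True                  # the sentence is proved, so R(a) holds
        if proves(p, negate(target)):
            return False                 # its negation is proved, so R(a) fails
    # Under completeness one of the two branches is always reached,
    # so this line is never executed.
```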
List of Symbols

N = {0, 1, 2, . . . , n, . . .}
Nn = {(a1, . . . , an) : a1, . . . , an ∈ N}; a = a1, . . . , an, a ∈ Nn
Z = {0, ±1, ±2, . . . , ±n, . . .}
A, B: algorithms
L : first-order language
F, G, H: n-ary functions on N
Q, R: n-ary relations on N
KR : Nn → N, characteristic function of the n-ary relation R
GF : graph of F, the n+1-ary relation defined by GF(a, b) ⇔ F(a) ≃ b
Un,k(a1, . . . , an) = ak : n-ary projection functions, 1 ≤ k ≤ n
Cn,k(a1, . . . , an) = k : n-ary functions with constant value k
sg : N → N, defined by sg(0) = 0 and sg(a) = 1 for a ≠ 0
csg : N → N, defined by csg(0) = 1 and csg(a) = 0 for a ≠ 0
D : N → N, decrease function defined by D(a) = a − 1 if a > 0 and D(0) = 0
composition: F(a) = G(H1(a), . . . , Hn(a)), where a = a1, . . . , an
primitive recursion: F(a, 0) = G(a); F(a, b + 1) = H(a, b, F(a, b)), where a = a1, . . . , an
minimalization: F(a) = µb[G(a, b) = 0], where µb[. . .] = the smallest b ∈ N such that [. . .]
I1, . . . , It : RM program with t instructions
S(k): increase instruction
D(k): decrease instruction
Z(k): zero instruction
B(k,q): branch instruction
FOR(k) . . . END: loop instructions for LRM
HLT(e, a) ⇔ e is the code of a program P and the P-computation with input a halts
K(e) ⇔ e is the code of a program P and the P-computation with input e halts
Tn(e, a1, . . . , an, c) ⇔ e is the code of a program P and c is the code of a P-computation with inputs a1, . . . , an that halts
s ≃ t: if s is defined, then t is defined and s = t; if t is defined, then s is defined and s = t
References

[1] Cutland, N. J. Computability, Cambridge University Press, Cambridge, 1980.
[2] Davis, M. The Undecidable, Dover Publications, Inc., New York, 1993.
[3] Hardy, G. H. Mathematical Proof, Mind 38 (1929), 1–25. Collected works, volume 7.
[4] Hodel, R. An Introduction to Mathematical Logic, Dover Publications, Inc., New York, 2013.
[5] Matiyasevich, Y. Hilbert’s Tenth Problem, MIT Press, Cambridge, MA, 1993.
[6] Meyer, A. R., and Ritchie, D. M. The complexity of loop programs, Proc. Ass. Comp. Mach. Conf. 22 (1967), 465–469.
[7] Odifreddi, P. Classical Recursion Theory, North Holland, Amsterdam, 1989.
[8] Post, E. Formal reductions of the general combinatorial decision problem, Amer. J. Math. 65 (1943), 197–215.
[9] Rado, T. On a simple source for non-computable functions, Proceedings of the Symposium on Mathematical Theory of Automata, Polytechnic Press, Brooklyn, 1963.
[10] Shepherdson, J. C., and Sturgis, H. E. Computability of recursive functions, J. Assoc. Comp. Mach. 10 (1963), 217–255.
[11] Shoenfield, J. R. Recursion Theory, Springer-Verlag, Berlin, 1993.
[12] von Neumann, J. Zur Hilbertschen Beweistheorie, Math. Z. 26 (1927), 1–46.
Note: The book by Davis is a source for important papers by Church, Gödel, Kleene, Post, and Turing.
PART 3. Philosophical Logic

S. G. STERRETT
7 Non-Classical Logics

7.1 Alternatives to Classical Logic vs. Extensions of Classical Logic

So far, we’ve studied some of the powerful possibilities — and some of the surprising limitations — of formal systems of logic. We’ve seen that it’s possible to automate some deductions, so that some proofs can be carried out by computer programs. In Part 1, we looked at the system of logical deduction behind the algorithms used in a core part of the Artificial Intelligence language PROLOG; that system of logical deduction is known as resolution logic. Algorithms based on resolution logic have been devised, implemented, and used to carry out proofs. Some automated question-answering systems make use of resolution logic, too. Resolution logic, though a deductive system for classical logic, is unlike many other deductive systems, in two ways: (i) it has no axioms, and (ii) the last line of a deduction is not a theorem. We also saw in Part 2 that, assuming a widely accepted, highly plausible, unprovable but (as yet) unrefuted hypothesis about algorithms (the Church-Turing Thesis), it has been shown that there are theoretical limits to what can be proven within a sufficiently strong formal system, and that there are theoretical limits to the capability of any algorithm to decide the truth of a statement in such a system. The result applies widely, to many different kinds of formal systems. These investigations into the possibilities and limitations of formal systems are investigations in metalogic, for the theorems and hypotheses are theorems and hypotheses about formal logical systems.

In this part of the book, Part 3, after we present the alternative system of propositional logic referred to above (relevance logic), we will present a semantics for it that has four truth values (rather than two truth values, which is far more common among semantics for propositional logics). We will see that this logic lends itself to automated reasoning, too, and we’ll describe the kinds of situations and contexts in which it is especially useful. Those who invented it have argued that, in certain kinds of situations, reasoning using this kind of logic is “how a computer should think,” i.e., is the appropriate logic on which to base automated reasoning employed in such situations.

Before introducing this alternative to classical propositional logic, let’s pause to reflect a bit on how the investigations into the capabilities
and limitations of logical systems were carried out.1 How are such decisive and far-reaching conclusions about logical systems obtained? The insights drawn upon to discover the possibilities and limitations of formal systems were insights about any formal logical system having certain specified characteristics. For example, one important feature appealed to in some of the proofs about decidability presented earlier (in Part 2 of this book) was that the theorems of the system be recursively enumerable. Another significant step in obtaining the far-reaching conclusions presented in Part 2 was the clarification of the notions of proof and algorithm, which was done with the help of concepts from recursion theory. In Part 1, some of the proofs of theorems of metalogic connected the syntactic features of a formal system (its rules for the formation of sentences and the derivations of sentences from other sentences) with the semantics of the system (interpretations of its sentences and truth valuations of sentences). In Part 1, we also said that a formal system consists of a formal language defined by formation rules for its formulas, its axioms (if any), and its rules of inference. There are various formulations of classical logic that differ from each other in that they use different axioms or different rules of inference. The different formulations of classical logic are nevertheless regarded as equivalent because they yield the same theorems, i.e., whatever is derivable in one of them is derivable in the others. Though the propositional portion of the system used as the basis for automated theorem-proving presented in Part 1 of this book differs from other systems of classical propositional logic, it is nonetheless possible to establish within it any of the theorems derivable in other formulations of classical propositional logic.

One thing that was not considered in Part 1, though, was whether there were any alternatives to classical logic: genuine alternatives to classical logic, rather than merely different but equivalent formulations of classical logic. In this section of the book, we are going to do exactly that; we are going to consider one particular alternative to classical propositional logic. The reason that this particular logic is considered an alternative to classical logic is that not all the theorems of classical
1 The presentation here loosely follows Anderson and Belnap, Entailment, Vol. 1, especially Sections 1 through 5, 16, 22, and 23 and Vol. 2, especially Sections 80, 81, and 83, and my personal class notes from a graduate seminar in relevance logic taught by Professor Belnap at the University of Pittsburgh. Credit for most of the original content is due to them; any errors or shortcomings in presenting it for the intended audience of this book are, of course, my own. I have at some points summarized parts of their commentary and at other points added commentary of my own. In some cases, I revised the notation, as the notation for lattices varied slightly among different sections of their book.
logic will turn out to be theorems of it. Alternatives to classical logic are called non-classical logics. There are many of them. To help in understanding the difference between extensions of classical logic and alternatives to classical logic, recall how classical propositional logic was extended to classical first-order logic. Classical propositional logic had certain rules for well-formed formulas, axioms, and rules of inference. In order to capture the argument structure of some arguments whose structure could not be captured in propositional logic, it was necessary to add symbols for the universal quantifier (“for all”), the existential quantifier (“there exists”), and symbols for individuals, relations, and functions. Then, it was necessary to add criteria for well-formed formulas and rules of inference that included these additional symbols, as well. In the system amended to include quantifiers, all the wffs of classical propositional logic were still wffs of the amended logic. Further, it turned out that all the theorems of classical propositional logic were theorems in this extended classical system as well. To express it in the terminology of extensions and alternatives: we say that the resulting logic, with which readers of Part 1 will already be familiar—classical first-order logic—is an extension of classical propositional logic, rather than an alternative to it. There are other extensions of classical logic. For instance, there are modal logics, which are attempts to formalize and incorporate into classical logic such locutions as “it is necessary that . . .” and “it is possible that . . ..” Such extensions of classical logic add new formulas to the system, that is, formulas that aren’t formulas of classical non-modal logic. Yet, usually, when classical logic is extended to include additional formulas to come up with a modal logic, all the formulas of classical non-modal logic are also part of the modal logic, and every formula that is a theorem of classical logic is a theorem in the resulting modal logic as well. That’s why most modal systems of logic are regarded as extensions of classical logic. Another example of an extension of classical logic is deontic logic. Deontic logics introduce new symbols for concepts such as obligation and permission. So new kinds of sentences can be formed in the new (i.e., extended) formal system (e.g., “it is permissible for x to do —,” “it is obligatory upon y to do —”). Yet, as with modal logics, all the sentences of classical logic are usually also part of the logic, i.e., deontic logics are often extensions of classical logic. There are subtleties involved as one gets into more advanced logic, but we need not go into those here to understand one of the basic facts about extensions: in an extension of classical logic, all the statements of classical logic that are theorems are still theorems, whereas in an alternative, or rival, system this need not — in fact will not — be the case. Thus, the notion of an alternative to
classical logic is more radical than the notion of an extension of classical logic. The alternative to classical logic that we will be looking at is known as the logic of entailment or, sometimes, as relevance logic.2 Our presentation will be quite restricted, though: not only will we be looking at only the propositional calculus of relevance logic, as already mentioned, but we will look at only one formulation of its syntax, and at only one of a wide variety of semantics for it. Just as the propositional calculus of classical logic can be — and has been — extended to include quantifiers (symbols for “for all” and “there exists”), so the propositional calculus of relevance logic can be — and has been — extended to include quantifiers, too. Our focus in this part of the book will be on the fundamental differences in approach between relevance logic and classical logic, though. The differences between relevance logic and classical logic are not superficial. Nor are they of the sort everyone finds it easy to be neutral about; the differences have to do with the foundations of these two rival systems of propositional logic.3 Relevance logic is an attempt to do a better job than classical logic has done of formalizing the central notion in classical logic: the connective between two propositions expressed in English by the locution “if . . . then —.” It is meant as a genuine rival to classical logic in some respects.4 This alternative logic was developed by two philosophers, Alan Ross Anderson and Nuel D. Belnap, Jr.5 Table 7.1 displays the relationship between (i) classical propositional logic, (ii) the resolution logic formulation of classical propositional logic (presented in Part 1), (iii) modal logic, which is an extension of

2 Other names are occasionally used. In Australia, the terms “relevant logic” and “relevant logics” are in use. Anderson and Belnap, in Entailment, present a number of different systems, including the system R (logic of relevance) and the system E (logic of entailment, which includes both relevance and necessity).
3 Susan Haack considers Anderson and Belnap’s logic to involve all three strategies of “challenge to classical metaconcepts,” “extension of the classical apparatus,” and “restriction [of the classical apparatus]” (Haack, Philosophy of Logics, p. 201). Her view puts things differently from the view presented here, but she does appreciate that in some sense relevance logic is meant as a rival to classical logic. Here, relevance logic is described as a genuine alternative. That it is appropriate to regard relevance logic as a genuine alternative follows from holding the view that Belnap endorses: i.e., that there is a very close connection between implication, entailment, and derivability. Thus, if one provides an alternative analysis of the notion of implication, one is providing an alternative to the notions of entailment and derivability as well.
4 It has become common more recently among some philosophers to call relevance logic a “substructural logic” (Restall 2008, “Substructural Logics,” Stanford Encyclopedia of Philosophy). What Restall means in calling a logic a substructural logic, in his words, is that it is non-classical but “weaker” than classical logic.
5 The propositional portion of it is presented and discussed in great detail in their book Entailment, vol. 1.
Table 7.1 Classical Propositional Logic, Extensions of Classical Propositional Logic, and Alternatives to Classical Propositional Logic.

Language (alphabet, connectives, and rules for well-formed formulas (wffs)):
– Classical propositional logic: Alphabet: letters and subscripted letters of the alphabet. Connectives for negation, conjunction, disjunction, equivalence, and implication. Rules for wffs.
– Resolution logic, propositional part (equivalent to classical propositional logic): Alphabet: letters and subscripted letters of the alphabet. Connectives for negation, conjunction, and disjunction. Rules for wffs.
– Modal logics (extensions of classical logic): Same as for Classical Propositional Logic plus (i) additional symbols for “it is possible that” and “it is necessary that,” and (ii) rules for forming wffs that also cover wffs consisting of these additional symbols and the rest of the alphabet.
– Relevance logic (alternative to classical logic): Alphabet, connectives, and rules for wffs are the same as for Classical Logic.

Axioms:
– Classical propositional logic: Various axiomatic and non-axiomatic formulations exist. (The Fitch formulation has no axioms.)
– Resolution logic: No axioms.
– Modal logics: Various axiomatic and non-axiomatic formulations exist. In formulations in which all the axioms of Classical Logic are retained, there will be additional axioms.
– Relevance logic: Both axiomatic and non-axiomatic formulations exist. (In this book, only a Fitch-style formulation is presented.)

Inference rules (deduction rules):
– Classical propositional logic: Various formulations exist. (The Fitch formulation is one example; it has 13 rules.)
– Resolution logic: One rule.
– Modal logics: Various formulations exist. In formulations in which all the rules of Classical Logic are retained, there will be additional rules regarding added symbols.
– Relevance logic: Different rules; not all the deductive rules of Classical Logic are rules of relevance logic.

Theorems:
– Classical propositional logic: All wffs of the language that can be produced by the rules of deduction.
– Resolution logic: Same as for Classical Propositional Logic.
– Modal logics: All the theorems of Classical Logic are theorems. In addition, all the wffs of the extended language that can be produced by the rules of deduction are theorems.
– Relevance logic: Not all the theorems of Classical Logic are theorems.

Truth values:
– Classical propositional logic: Two values: T, F.
– Resolution logic: Two values: T, F.
– Modal logics: Many different possibilities.
– Relevance logic: Many different possibilities. (Four-valued logic is one example: T, F, Both, Neither.)
classical propositional logic, and (iv) relevance logic, which is a genuine alternative to classical propositional logic. Why would anyone want to develop an alternative to classical logic, when classical logic is so widely accepted and employed? Certainly there are people who accept and teach classical propositional logic without any qualms. Many of them admit there are some oddities in classical logic, but consider them an inevitable feature of formal logic, and so not really shortcomings of classical logic. However, there are those who consider the oddities of classical logic to be shortcomings, and think it worthwhile to develop a logical system that does not have such oddities. We turn now to consider these opposing viewpoints on classical logic.
7.2 From Classical Logic to Relevance Logic

7.2.1 The (So-Called) “Paradoxes of Implication”6

Some of the oddities referred to above have to do with theorems of classical logic with which you should by now be quite familiar. Using the arrow (→) connective between two propositions to indicate the propositional connective that, translated into English, would be read “if . . . then —,” the following formulas are theorems of classical logic (where A, B are wffs):
(i) A → (B → A),
(ii) A → (B → B),
(iii) (A ∧ ¬A) → B.
Using the classical two-valued truth tables, and treating the arrow connective per the truth table in Part 1 on page 7, these formulas come out true for every combination of truth values that can be assigned to A and B; hence (by the definition of validity of a formula in classical propositional calculus) they are valid. By the completeness theorem for the propositional calculus, we know that they are thus theorems in classical propositional logic. The propositional calculus for classical logic was designed to have just that feature: that every theorem be a valid formula and that every valid formula be a theorem. So, if we accept the rules for how the arrow connective works in propositional logic, that the three statements (i), (ii), and (iii) above are theorems seems perfectly in order. They are, after all, valid formulas. What’s odd about the fact that they are theorems?

6 The discussion in this section is based in large part upon the presentation in volume 1 of Anderson and Belnap’s Entailment.
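For readers who want to check the claim mechanically, here is a small, hypothetical Python script (not part of the text) that verifies all three formulas by brute force over the four assignments of T and F to A and B, reading → as the material conditional.

```python
from itertools import product

def implies(p, q):
    # the classical truth table for the arrow: false only when p is true and q is false
    return (not p) or q

paradoxes = {
    "A -> (B -> A)":  lambda A, B: implies(A, implies(B, A)),
    "A -> (B -> B)":  lambda A, B: implies(A, implies(B, B)),
    "(A & ~A) -> B":  lambda A, B: implies(A and not A, B),
}

for name, formula in paradoxes.items():
    assert all(formula(A, B) for A, B in product([True, False], repeat=2))
    print(name, "is a tautology")
```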
The oddity of these statements shows up when we think about what they mean when translated from the formal calculus into our natural language. The oddities concern the features of relevance and/or necessity, both of which are associated with the notion of implication — at least they are in most people’s views about what the notion of implication includes. The first one, A → (B → A), would be translated into English as “if A, then, if B then A” or, in terms of the English locution “follows from,” as “if A, then A follows from B.” Since this wff is a theorem, it holds for any statements A and B, however unrelated they may be. Thus, this theorem says that if a given statement A is true then A follows from anything at all, i.e., “if A, then A follows from B” for any B at all. It is an odd sense of “follows from” that is in play in this theorem,7 yet there is no dispute that this is a theorem of classical logic. Looking at such theorems of classical propositional logic in terms of what instantiations of them mean after translation into our natural language, they look like oddities of logic rather than logical truths. From such a perspective, one might begin to wonder whether they should be theorems. The oddity of saying that a statement A is implied by, or follows from, any other statement, no matter how unrelated it may be to A, simply by virtue of the fact that A is true, now starts to look more like a symptom indicating that something is wrong with classical propositional logic. The second formula listed above, A → (B → B), is also easily seen to be valid using the method of truth tables, and so likewise is a theorem of classical propositional logic. Translated into English, it says that any statement whatsoever implies that B implies B. Or, put another way, that “If B, then B” is implied by any statement A. We can see the general reason why the formula A → (B → B) will always come out true using the method of truth tables in classical logic with which you are familiar: the consequent of the main connective here is (B → B), and (B → B) will come out true no matter what truth values are assigned to A and B. What seems odd is the statement A → (B → B), after it has been translated into English (or some other natural language), for it seems
7 The reader is encouraged to substitute a variety of related and unrelated sentence combinations for A and B to appreciate the nature of the objections to the theorem “if A, then A follows from B.” Common examples found in textbooks use totally unrelated sentences such as: “Grass is green” for A, and “The moon is made of cheese” for B. On this instantiation of the theorem, no one denies A, but few would find it natural to assert that it follows from the truth of A that B implies A. Other troubling examples of a slightly different nature may be constructed for sentences that are thematically related but between which one might not wish to assert dependence. For example, substituting “There is hunger in the world” for A and “You buy organic produce” for B into A → (B → A), we would get, “If there is hunger in the world, then the fact that you buy organic produce has the consequence that there is hunger in the world.”
odd that the logical truth (B → B) would be a logical consequence of any statement whatsoever — or does it? When the antecedent of a conditional statement is not relevant to its consequent, some feel that there is something wrong, that there is a failure of relevance. When, in spite of the antecedent being relevant to the consequent, the consequent need not follow from the antecedent, some feel that there is something wrong there, too: that there is a failure of necessity. These failures are taken to indicate that something is amiss with classical logic. There is a rather standard reply to such observations about these oddities that instead explains away their oddness as a matter of a shortcoming on the part of the person who finds it odd to regard (i), (ii), and (iii) above as theorems. Anderson and Belnap call this “the Official view” (to indicate it is the answer often given by authority figures such as logic instructors in answer to naive questions), and summarize the view as follows:

The two-valued propositional calculus sanctions as valid many of the obvious and satisfactory principles which we recognize intuitively as valid . . . ; it consequently suggests itself as a candidate for a formal analysis of “if . . . then —.” To be sure, there are certain odd theorems such as A → (B → A) and A → (B → B) which might offend the naïve, and indeed these have been referred to in the literature as “paradoxes of implication.” But this terminology reflects a misunderstanding. “If A, then if B then A” really means no more than “Either not-A, or else not-B or A,” and the latter is clearly a logical truth; hence so is the former. Properly understood there are no “paradoxes” of implication. . . . One may for certain purposes be interested in a stronger sense of the word [implication]. We find a formalization of a stronger sense in semantics, where “A implies B” means that there is no assignment of values to variables which makes A true and B false, . . . some rather odd things happen here too. But again nothing “paradoxical” is going on; the matter just needs to be understood properly — that’s all.8

Some philosophers accept this kind of explanation, or ones akin to it. Another view somewhat similar to what Anderson and Belnap call “the Official view” that likewise urges acceptance of these oddities sees
8 Anderson and Belnap, Entailment, Vol. 1, pp. 3–4. I have included here only the parts of the quote that relate to material implication. Anderson and Belnap also mention strict necessity and modal logic in discussing the so-called paradoxes of implication there.
them as “degenerate cases” (of valid formulas) that we are obliged to accept, since they result from the definition of validity of a formula in the classical propositional calculus.9 The view has been spelled out by John Burgess, a philosopher of mathematics critical of relevance logic, and he employs an analogy with geometry to illustrate what is meant by “degenerate case”: the example of the conic section formed by a plane intersecting a double cone at a single point. Since the resulting point fits the definition of conic section, we are obliged to regard it as a conic section, albeit a degenerate case of a conic section. Analogously for entailment (i.e., logical implication), writes Burgess:

Classic logic defines entailment to hold between a premise (or set of premises) and a conclusion if and only if their logical form guarantees that either the premise (or at least one element of the set of premises) is false, or the conclusion is true. The definition obliges the logician to recognize certain degenerate entailments. (Burgess 2005, p. 727)

Burgess’s explanation of why he accepts the oddities discussed above in spite of acknowledging them as oddities helps us see one way in which philosophers can disagree about logic. Anderson and Belnap, of course, do not agree with the definition of entailment (i.e., implication) that Burgess provides. Thus, the disagreement is about how to formalize a certain informal notion (i.e., the informal notion of logical implication). When more than one definition is proposed to try to capture, or formalize, an informal notion, people can argue as to whether one definition is better than another at capturing an informal notion already in use. In Part 2 of this book, we looked at different ways of attempting to capture the informal notion of “computable function”: a machine model and a mathematical model. These different ways of formalizing the informal notion of “computable function” were seen as (thought to be) equivalent. In contrast, with the different attempts that classical logic and relevance logic make to capture the notion of implication (entailment), the two different ways of formalizing the informal notion are not equivalent. We know this because, as shown in Table 7.1, they do not yield the same theorems. How do people decide which of the two, if either, is right, then? Burgess appeals to the practices of mathematicians — but so do Anderson and Belnap! What Burgess argues based on mathematical practices is that, in their proofs, mathematicians appeal to some of the principles of classical logic that do not exist in relevance logic;
9 Burgess, “No Requirement.”
he especially cites the logical principle called the disjunctive syllogism, which, as we shall see, is not a principle of relevance logic.10 Mathematicians need this principle, says Burgess; others echo the concern.11 Now, there is no question that mathematicians do draw inferences that can be formalized as valid by virtue of being an instance of the disjunctive syllogism, but that does not settle the question as to whether the disjunctive syllogism is required in order to draw those inferences in every case. We shall see in Section 8.5 that the situation in relevance logic regarding inferences that are established by appeal to the disjunctive syllogism in classical logic is rather more complicated and the points more subtle than Burgess’s discussion lets on. What Anderson and Belnap point out about the practices of mathematicians, in arguing against “the Official view,” is that mathematicians actually do not accept reasoning about the logical consequences of theorems that does not require relevance of the consequent to the antecedent. They argue that mathematicians would not agree with the reasoning that, since a theorem is true, it is a logical consequence of any given proposition. Mathematicians are definitely interested in which mathematical statements and theorems logically imply others in a sense of “imply” that respects relevance. This indicates that there is a notion of logical consequence that mathematicians use (at least when discussing the logical consequences of mathematical conjectures and proofs) that does not warrant endorsing the oddities of classical logic mentioned above. There are some important results that address the concern Burgess raised about relevance logic and the disjunctive syllogism: researchers in relevance logic established results that yield a sort of variant of the disjunctive syllogism that is endorsed by relevance logicians. We shall see exactly what these results are when we discuss the disjunctive syllogism in Section 8.5. Some careful distinctions that are too subtle to summarize at this point of the book need to be made to state key points about them precisely; the important point for the discussion here about comparisons of relevance logic and classical logic is that the reasoning used by Burgess in criticizing relevance logic ignores the fact that these results about variants and/or analogues of the disjunctive syllogism in relevance logic exist.12 The reasoning used to dismiss relevance logic as a valuable alternative to classical logic based upon the fact that relevance logic does not include the disjunctive syllogism is faulty: just because
10 Burgess, “No Requirement,” p. 740.
11 Sanford, If P, then Q.
12 Neither Burgess 2005 nor Burgess 2009 mentions that the formula (¬A ∨ B) ∧ A →E (A ∧ ¬A) ∨ B is a theorem of relevance logic that could be invoked in lieu of the disjunctive syllogism in many cases.
the disjunctive syllogism was invoked in a certain proof constructed for it in classical logic does not mean that there is no proof of it, or some desirable variant of it, in relevance logic. Relevance logic does require rejecting the oddities of classical logic mentioned earlier, but it doesn’t require that one reject the conclusion of a given proof in classical logic just because there is a proof of it in classical logic that uses a principle or theorem that is not valid in relevance logic.13 Another philosophical view is known as logical pluralism; in the words of the authors of a recent book on the subject: “Crudely put, a [logical] pluralist maintains that there is more than one relation of logical consequence.”14 Taking this approach to the notion of implication (or entailment), a pluralist’s reaction to the debate might be to give up the idea that rival logical systems are competing to capture the (i.e., a single) informal notion of implication and instead say that relevance logic and classical logic are indeed alternatives, but that they formalize different informal notions of implication.
Exercises on 7.2.1

1. Show that each of the three statements referred to as paradoxes of implication, A → (B → A), A → (B → B), and (A ∧ ¬A) → B, is a tautology.15 What can be concluded from the fact that they are tautologies?
2. Do you have any intuitions about whether any of the following tautologies are paradoxical?
A → (B → A)
A → (B → B)
(A ∧ ¬A) → B
If so, write them down. Do you feel you can argue for those views? If so, explain how you would do so. Note: this exercise is suggested due to the value of recording your intuitions and views at this stage. It will be
13 David Sanford also makes this complaint. See Sanford, If P, then Q. The quote in Exercise 8.5 includes a quote from his book.
14 Beall and Restall, Logical Pluralism, p. 25.
15 Refer to Part 1 for the definition of a tautology.
instructive to compare them with your views after you reach the end of Part 3.

3. (a) Construct interpretations of A → (B → A) and A → (B → B) to illustrate that interpretations of these wffs can exhibit failures of relevance.
(b) Do you think it is possible to construct interpretations of A → (B → A) and A → (B → B) that do not exhibit failures of relevance in the natural language? If so, give examples and discuss how these interpretations differ from those in your answer in 3(a).
7.2.2 Material Implication and Truth Functional Connectives

“The Official view” described (though, of course, not endorsed) by Anderson and Belnap in the previous section defends the analysis of “if A, then B” as logically equivalent to “not-A or B,” the analysis known as material implication. Bertrand Russell argued for the appropriateness of formalizing “if . . . then —” in this way, although he explained that material implication, which is captured by this analysis, is often confused with formal implication, which is not.16 Many introductory logic courses present material implication as the logical analysis of the connective expressed in English by “if . . . then —.” One justification of using material implication as an analysis of “if . . . then —” that is given in the face of the oddity of some of the resulting theorems is that there is no choice but to put up with such oddities if our connectives are to be truth-functional. What is meant in saying that we “have no choice” in how to analyze the connective for “if . . . then —” is that, if we are to restrict ourselves to a two-valued semantics for a propositional calculus, there are only a few possible choices of truth table for the → connective. To refresh your memory, the truth table for the → connective prescribes the truth value of p → q for each combination of truth values assigned to p and q. So that the reader might understand why many have found an analysis of the arrow connective as material implication compelling, or at least acceptable, in spite of the oddities, we offer here one example of an account of how the reasoning based on an appeal to truth tables goes. The truth table for any two-place connective specifies the value of the formula consisting of two propositional variables joined by that connective, for every possible assignment of truth values to the propositional variables connected by it. If there are only two truth values, then the truth table for a two-place propositional connective has four rows, each row corresponding to one of the four possible assignments of truth values to the pair of propositional variables, as is illustrated in Figure 7.1 below:
16 Russell, Principles of Mathematics, Section 15.
values, then the truth table for a two-place propositional connective has four rows, each row corresponding to one of the four possible assignments of truth values to the pair of propositional variables, as is illustrated in Figure 7.1 below: Figure 7.1 Truth table for any two-place connective * in two-valued semantics — no rows filled in.
p
q
p∗q
T
T
?
T
F
?
F
T
?
F
F
?
There are 16 different ways that a truth value can be assigned to a two-place connective (i.e., there are sixteen different ways to fill out the last column using T’s and F’s). The value of the formula p ∗ q where ∗ is the arrow connective is clearly determined for the first two rows of the truth table: p implies q means that whenever p is true, then q is true. In the truth table, then, given that we must choose between T or F: when p has the truth value T, p → q should be assigned the value for true when q is true, and the value for false when q is false. Thus, only the last two rows are still left unspecified:

Figure 7.2 Truth table for arrow connective in two-valued logic — first two rows filled in.

p   q   p → q
T   T   T
T   F   F
F   T   ?
F   F   ?
There is another consideration, which provides some guidance as to how to assign truth values to the formula p → q. Consider a specific wff of the form p → q where antecedent and consequent are the same, e.g., (A → A) or (B → B). We would want this formula to have the value T no matter what the values of A or B. Given that we are restricted to truth values of either T or F, this consideration determines that the value of the last row of the above truth table should be T.

If we restrict ourselves to a two-valued logic, there are only two possibilities left open after the above considerations are taken into account; call them choice 1 and choice 2:

Figure 7.3 Truth table for arrow connective in two-valued logic — the two possible ways to fill in the last two rows.

p   q   Choice 1   Choice 2
T   T      T          T
T   F      F          F
F   T      T          F
F   F      T          T
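To see this counting argument in miniature: the short script below is our illustration, not part of the text, and encoding each candidate connective as its four-entry truth-table column is our own device. It enumerates all 16 two-place truth functions, keeps those meeting the constraints just discussed, and confirms that exactly the two columns above survive, with choice 1 agreeing with ¬p ∨ q.

```python
from itertools import product

ROWS = list(product([True, False], repeat=2))  # (p, q) in the book's row order

# A candidate connective is just a way of filling in the last column: 16 in all.
candidates = list(product([True, False], repeat=4))

def meets_constraints(col):
    table = dict(zip(ROWS, col))
    return (table[(True, True)]            # row 1 must be T: q holds whenever p does
            and not table[(True, False)]   # row 2 must be F: q fails although p holds
            and table[(False, False)])     # row 4 must be T, so that A -> A comes out true

survivors = [col for col in candidates if meets_constraints(col)]
print(survivors)  # [(True, False, True, True), (True, False, False, True)]: choices 1 and 2

material = tuple((not p) or q for p, q in ROWS)  # the column for "not-p or q"
print(material == survivors[0])                  # True: choice 1 is exactly the column for ¬p ∨ q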
The choice between the two columns boils down to the question of which of the two truth values to assign to p → q when p is false and q is true. One consideration that provides guidance in this choice is the principle that the consequent of an entailment is contained in its antecedent. The material conditional does not fully respect this principle, of course, but if we would like to ensure that wffs of the form (A ∧ B) → B are deemed true when B is true, no matter what the truth value of A, then we want to at least allow that p → q can have the truth value of T when p has the value F and q has the value T. Only choice 1 allows that.17 Thus, the convention arose that the formula p → q is false only when the antecedent is true and the consequent false, and — on the assumption that there are only two truth values — it is true otherwise. Note that this is exactly the same way the column would be filled in for the wff ¬p ∨ q.

17 The consideration some logicians cite to argue for choice 1 over choice 2 (instead of the reasoning given here) is that choice 2 would result in a system in which p → q would always have the same truth value as q → p. Thanks to Anil Gupta for this point.

Another argument given for using the material conditional is this: all one can, or at least should, ask of logic is that it never allow one to infer a false statement from true premises. Using the truth table for ¬p ∨ q to give the semantics of the arrow connective, and the rule of inference that one can infer q from the two premises p → q and p, one will never infer a false statement from true premises. Thus, this argument goes, taking p → q to be equivalent to ¬p ∨ q achieves all one has a right to ask in terms of formalizing →. This is taken as sufficient justification, in spite of the recognition that taking p → q to be equivalent to ¬p ∨ q does not capture relevance and necessity, which we associate with the notion of entailment. This seems to be the kind of consideration that motivated Russell to analyze the arrow connective as the material conditional, which he defined by stating that p → q is equivalent to ¬p ∨ q. That the truth assignments for the column titled Choice 1 in the preceding table, which are determined by the considerations we reflected upon, are also the truth assignments for the wff ¬p ∨ q is then seen by such proponents as confirmation of the appropriateness of using the material conditional.

Often, in the context of justifying use of the material conditional to formalize what is meant in English by “if . . . then —,” it is said that a connective that takes relevance between the antecedent and the consequent into account can’t be truth-functional. Actually, this is not so! This last fact — that a connective that takes relevance between the antecedent and consequent into account can be truth-functional — is surprising to a lot of people. It is not even very well known among philosophers who are not logicians. You will find the (incorrect) statement that the only truth-functional conditional is the material conditional in many an introductory logic lecture, textbook, and reference article.18 As mentioned earlier, we will see that the proof system that Anderson and Belnap came up with for relevant implication lends itself to an interpretation using a four-valued logic that is truth-functional. That is, the truth value assigned to a formula in which the main connective is the arrow is determined by the truth values of the two formulas it connects. (The four truth values can be thought of as “told true,” “told false,” “told both true and false,” and “told neither true nor false.”) The reason that it at first sight appears to some thinkers that they are more constrained than they actually are in their choice of how to analyze the → connective so that it is truth-functional is that they have restricted themselves to two truth values. There is no reason to restrict oneself to two truth values — nor to require any particular number of truth values, in fact. The four-valued logic that will be presented later in this part of the book is not dictated by the proof theory of relevance logic; there are many other semantics for relevance logic. Anderson and Belnap did not start with a semantics; the system of proof came first, and was developed based on capturing the most important features of logical implication.

18 Graham Priest’s A Very Short Introduction to Logic is one exception to such introductory treatments, in that it is a book intended to be accessible to the non-specialist that recognizes that formalizing relevance need not mean abandoning truth-functionality of the conditional.
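To preview how a connective can take more than two values and still be truth-functional, here is a small sketch. It is ours, anticipating the semantics developed in Chapter 9 of this part; the representation of the four values as sets of “reports” and the particular clauses for negation and conjunction follow Belnap’s standard definitions rather than anything established at this point in the text.

```python
# The four values, modeled as which reports we have received about a statement.
NONE  = frozenset()            # told neither true nor false
TRUE  = frozenset({'T'})       # told true
FALSE = frozenset({'F'})       # told false
BOTH  = frozenset({'T', 'F'})  # told both true and false

def neg(v):
    # Negation swaps the reports: "told true" becomes "told false" and vice
    # versa; "both" and "neither" are fixed points.
    out = set()
    if 'T' in v: out.add('F')
    if 'F' in v: out.add('T')
    return frozenset(out)

def conj(a, b):
    # Told true only if both conjuncts were told true; told false as soon as
    # either conjunct was told false (Belnap's clauses).
    out = set()
    if 'T' in a and 'T' in b: out.add('T')
    if 'F' in a or 'F' in b: out.add('F')
    return frozenset(out)

assert neg(TRUE) == FALSE and neg(BOTH) == BOTH
assert conj(TRUE, NONE) == NONE and conj(BOTH, FALSE) == FALSE
```

Both connectives here are truth-functional in the straightforward sense: the output value depends only on the input values.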
238
Part 3: Philosophical Logic
Exercise on 7.2.2

1. Compare the truth table definition of the arrow connective with the evaluation of “if . . . then” in natural language (e.g., English). Discuss your observations.
7.2.3 Implication and Relevance

As a preliminary to developing a new proof calculus that better captures the connective “if . . . then —,” Anderson and Belnap begin by examining what is wrong with using material implication as the logical connective for implication. Their criticism of the use of material implication to capture the concept of “if . . . then” is rather pointed; to quote them verbatim: “Material implication is not a ‘kind’ of implication.” The contrast between the views can be put succinctly as follows: Defenders of the use of material implication as the connective for logical implication between propositions think that the paradoxes of material implication are only so-called paradoxes because they are no longer paradoxes when logical implication is properly understood (i.e., understood as material implication). In contrast, in Anderson and Belnap’s view, the paradoxes of material implication are only so-called paradoxes of implication because material implication is not, as they put it, any kind of implication at all.

So, Anderson and Belnap take another approach. They identify what features the arrow connective (→) used for the notion of implication ought to have, and then identify a set of inference rules that results in the arrow connective that has those features. The features that the arrow connective ought to have, they say, should reflect the requirements of relevance (of the antecedent to the consequent), and of necessity of the “if . . . then —.” In their view, it is the failure of classical logic to honor the features of relevance and necessity in its formalization of implication that is responsible for the oddities that arise when the three formal statements of classical logic (i), (ii), and (iii) given earlier are translated into natural language.

Anderson and Belnap developed relevance logic in the second half of the twentieth century. They were not the first to come up with an alternative to classical propositional logic in hopes of doing a better job of capturing implication, though. Much earlier, just a few years after Bertrand Russell had published Principia Mathematica, in which he presented material implication (that p → q is logically equivalent to ¬p ∨ q) as the logical analysis of implication, C. I. Lewis criticized material implication, but on different grounds. It was not relevance that concerned Lewis; he was more concerned with what he saw as a shortcoming of the theorem in (i) above (A → (B → A)): a lack of necessity. The concern is the case in which the statement A is true contingently, as a matter of how things happen to be, and the statement B is a necessary truth. Lewis thought that reflection on such a case revealed that something was wrong with a system in which A → (B → A) was a theorem; for, if A were true, it would imply that B, a necessary truth, implied A, a contingent one. He developed a notion he called “strict implication,” and he and others eventually developed a number of formal systems for it that have become known as S1, S2, S3, S4, and S5. To define it, though, new symbols are introduced to denote possibility and necessity; these can be applied to propositions, including conditional propositions. For possibility, a diamond symbol is often used, which, put before a proposition p, is read in English as “it is possible that p”; for necessity, a small square, or box, is often used, which is read “it is necessary that p.” Thus, Lewis and those who developed his ideas amended classical logic to include an additional connective, a connective for strict implication. This led to the development of modal logics which, as explained earlier, are extensions of classical logic.

The approach that Anderson and Belnap took is more radical in some ways. They set out to come up with a logical calculus in which the connective for implication would have both the feature of necessity and the feature of relevance. They weren’t adding a new connective to classical logic; we might think of what they were doing as correcting some of the missteps that had been taken in formalizing the notion of “if . . . then —” in classical logic. As we will see, among the resources they drew upon in coming up with the rules for the → connective so that it included both relevance and necessity were some that, in a way, were already at hand in ideas used in classical logic. Those who had judged such an endeavor to be hopeless evidently hadn’t realized that the resources to include relevance and necessity in classical logic were already at hand.

The last of the three theorems listed above is (A ∧ ¬A) → B. This theorem holds a special place in logic. What it says is often stated informally as: “anything follows from a contradiction.” We say that there is a failure of relevance here, because the theorem says that B is a consequent of an antecedent upon whose truth value B’s truth value does not depend. That is, the theorem’s antecedent is not relevant to its consequent. Yet, if the arrow connective between propositions is not to be analyzed as the material conditional on the basis of such considerations, what other kind of analysis is possible? This was the challenge that Anderson and Belnap took on.19

19 Anderson and Belnap credit Wilhelm Ackermann’s 1956 paper on “strict implication” with providing them with the necessary inspiration and insights to take this challenge on.

Exercise on 7.2.3

1. Can you think of a case in which a statement that is a translation into natural language of the form (A ∧ ¬A) → B would be a reasonable thing to say? If so, give an example of an interpretation of (A ∧ ¬A) → B and the context in which it would make sense to say it.
7.2.4 Revisiting Classical Propositional Calculus: What to Save, What to Change, What to Add?

We now turn to presenting the alternative to classical logic known as relevance logic, or the logic of entailment. The discussion here loosely follows Anderson and Belnap’s published work and Belnap’s talks and graduate seminar. The intent is to include their concerns and their approach to dealing with those concerns as faithfully as is possible within the limited scope of this text. In this text, we do not presume that the reader is familiar with the major works and debates in philosophical logic that existed at the time when Anderson and Belnap first presented their work in Entailment: The Logic of Relevance and Necessity. The goal here is to present the motivation for, and the general lines of, their project to readers who may not have that specialized knowledge. As a result, however, some of the dry wit and almost all of the discussion of their contemporaries has been left aside. The reader who wishes to read the original works from which the discussion here is derived is referred to the very extensive treatment of the topic found in Entailment: The Logic of Relevance and Necessity, volumes 1 and 2. For briefer treatments of some of the key philosophical topics discussed in this section, two classic papers are Anderson and Belnap’s 1962 “The Pure Calculus of Entailment”20 and a famous paper by Belnap about the relation between deducibility and the implication connective memorably entitled “Tonk, Plonk, and Plink.”21

20 Belnap and Anderson, “The Pure Calculus of Entailment.”
21 Belnap, “Tonk, Plonk, and Plink,” pp. 130–134.

Though the paradoxes discussed above appeal to intuitions about the “if . . . then” locution in English, Anderson and Belnap were concerned with the notion of implication in formal logic. Implication means entailment, and so means that there is a necessary connection between antecedent and consequent. Implication, they point out, is “the heart of logic,” and it is intimately related to deducibility: if A entails B, then B is deducible from A. As pointed out earlier in this chapter, different formulations of classical propositional logic can result in different proof systems that yield the same theorems. Some formulations are axiomatic, some are in terms of axioms plus a rule or rules of inference, and some are entirely rule-based. For the purpose of examining classical logic to see what we want to keep, what we want to discard, and what we want to revise in order that it might formalize the notion of implication appropriately, it turns out that the natural deduction rule-based system as presented by Fitch22 is especially appropriate. One reason it is especially appropriate for this purpose is that, for each connective, there is a “rule of introduction” and a “rule of elimination.” A “rule of introduction” for a connective is a rule for introducing that connective on a new line of the proof, based on lines already occurring in the proof. A “rule of elimination” for a connective is a rule for eliminating that connective from a formula on one of the lines in the proof in this sense: the line whose introduction is warranted by the rule has eliminated an instance of that connective found in a formula on an earlier line of the proof.

22 Fitch, Symbolic Logic. Belnap notes that natural deduction was originally due to Gentzen 1934 and Jaskowski 1934. I follow Belnap in using Fitch’s formulation; Belnap remarked that he used Fitch’s formulation because it is “an especially perspicuous variant” of natural deduction (1975, p. 5).

A “connective” in propositional logic can be used to unite two propositions into a new proposition; connectives in classical propositional logic are conjunction (“and”), disjunction (“or”), and implication (“if . . . then —”). Sometimes negation (“not”) is also referred to as a connective, even though it does not unite two propositions into a new proposition, but instead can be used along with a proposition to form a new (i.e., different) proposition. The rules are given in the form of diagrams. Each diagram shows the lines of a proof needed to justify introducing the new line warranted by that rule. (The proofs are sometimes called intelim proofs because the rules have the form of warranting the introduction or the elimination of a connective.) There are also a few other rules regarding subproofs, such as introduction of a hypothesis, and rules about repeating or reiterating lines from outside a subproof into the subproof. Since we want to focus our examination on the rules of inference related to the arrow connective used for implication, we want first to examine the introduction and elimination rules for the arrow connective, and how the arrow connective is involved in the rules about hypothesis and subproofs. We therefore begin by presenting the fragment of classical propositional logic that involves the implication connective in Fitch’s system of natural deduction. After a critique of its treatment of implication, we will modify it to come up with a new system that addresses the concerns about relevance and necessity mentioned above.

Exercise on 7.2.4

1. Reflect on how we apply “and,” “or,” “not,” and “if . . . then” to statements in natural language. For instance, we would use “and” between two propositions only when we mean that each of the two propositions conjoined by it is true. What would you say for “if . . . then”? That is, when is the connective for implication between two statements meant to be used? (Note: There is no particular answer that would be considered correct here. The point of this exercise is to encourage you to reflect on the question, and to record your reflections at this point in your progression through Part 3.)
8 Natural Deduction: Classical and Non-Classical
8.1 Fitch’s Natural Deduction System for Classical Propositional Logic

In Part 1, it was stated that a deductive logic has two components: a formal language and a notion of deduction. The language of sentence logic, or propositional logic, which consisted of an alphabet and rules for forming well-formed formulas, was presented there, as follows. The alphabet of the formal language for propositional logic consisted of

1. statement letters (for which we use uppercase letters, which may be subscripted): P, Q, R, P1, Q1, R1, . . .;
2. logical connectives: ∧, ∨, ¬, →, ↔; and
3. punctuation signs, which were the signs for parentheses, i.e., ) and (.

The rules for well-formed formulas (wffs) were as follows:

1. Statement letters are wffs;
2. if A and B are wffs, then so are (¬A), (A ∨ B), (A ∧ B), (A → B), and (A ↔ B); and
3. the above include all the wffs; nothing else is a wff.

The English labels for the connectives in this classical propositional logic are as follows: ¬ (negation), ∧ (and, conjunction), ∨ (or, disjunction), → (implication), and ↔ (equivalence, if and only if, iff), and the rules for closure are those given in Part 1. In this part of the book, we may sometimes refer to statement letters as propositional variables, and we use lowercase letters such as p, q, r, s as variables ranging over statement letters.

In this starting segment of Part 3, we are concerned with appropriate rules for the implication connective. We will eventually want a system that contains other connectives, too, but we begin by narrowing our focus to how the connective for implication is treated, because it is the paradoxes of implication that spurred the inquiry into alternatives to classical propositional logic. Thus, for the moment we leave aside the rules for the other propositional connectives. We will return to consider them after we have developed an alternative treatment of the
connective for implication that addresses relevance and necessity. At that time we will also raise and address questions about the interaction of the rules for the new, alternative, implication connective and the rules for the other connectives.

We start the search for an alternative connective for implication by examining and comparing some existing systems of deduction called “pure implicational systems”; these are systems of propositional logic in which the only connective is the connective meant to capture the notion of implication. We designate the connective for implication in a system by the arrow symbol. Thus the arrow symbol in one system will not necessarily follow the same rules as the arrow symbol in another system. The wffs in pure implicational systems of propositional logic are defined more restrictively than they are for propositional logics containing other connectives in addition to the arrow. In a pure implicational system, a wff is defined as follows:

1. All propositional variables (statement letters) are formulas; and
2. if A and B are wffs, then so is (A → B); and
3. nothing else is a wff.

Fitch’s definition of a formal proof fits the definition of an (affirmative) deduction as given in Part 1 (in Section 1.2), for a Fitch-style proof is “a finite sequence of items (usually written as a vertical column or list) such that the items which are the hypotheses precede all the others, and such that each item satisfies at least one of the following three conditions: (1) It is an axiom of the system, or (2) It is a direct consequence of preceding items of the sequence, or (3) It is a hypothesis of the sequence.” The “items” are the wffs of the system. Notice that, in this definition of a proof, it is not necessary that a proof make use of any axioms. Notice, too, that the second condition does not specify how one is to determine what counts as a direct consequence of the preceding items in a proof. In Fitch’s system, as in many other natural deduction systems, there are rules determining which wffs can be introduced next into the proof sequence as a direct consequence of preceding wffs of the sequence. The rules also permit introducing a hypothesis into the proof sequence; it is crucial that it be permissible to introduce hypotheses. There are rules governing exactly how this may be done, which will be described later; for now we note only that each hypothesis must be listed and designated as a hypothesis, and that there are means permitting
the discharging of a hypothesis. If a proof contains no undischarged hypotheses, it is called a categorical proof. If a wff has a categorical proof, then it is a theorem.

Fitch’s system also includes a proof procedure called “the method of subordinate proofs,” which allows for “writing one proof as part of another,” as he put it.1 This consists in drawing a vertical line on the left-hand side of the list of lines of a proof to indicate the extent, or scope, of each hypothesis. A subordinate proof is also referred to as a subproof. We shall see how this method is used in the course of discussing the rules for → Introduction (arrow introduction or entailment introduction), → Elimination (arrow elimination or entailment elimination), and the use of Hypothesis.

1 Fitch, Symbolic Logic, p. 20.

The rule of → Introduction in Fitch’s system for propositional logic employs the definition of a hypothetical proof. The rule says: Suppose we have in hand a hypothetical proof, i.e., a valid deduction of the form:

| A       Hypothesis
| ...
| B       (justified line of proof),

where “|A,” “|B,” and “| . . .,” along with the vertical line indicating they are within the scope of the hypothesis A, indicate lines of the sequence forming a formal proof. Then we may assert A → B as a consequence of this sequence of lines in a proof. We indicate this rule as follows:

| A       Hypothesis
| ...
| B       (justified line of proof)
A → B     → Introduction (entailment introduction).

That is, this rule permits us to assert, as the next line of a proof, “A → B,” without the vertical bar associated with the hypothesis A.
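As a side illustration (ours, not Fitch’s or the authors’), the pure implicational language described above is small enough to render directly; representing atoms as Python strings and implications as tagged tuples is our own encoding.

```python
# Atoms are strings ("A", "B", ...); the implication (A -> B) is the tuple
# ('->', antecedent, consequent). The two clauses mirror the wff grammar above.

def is_wff(x):
    if isinstance(x, str):                       # clause 1: statement letters
        return True
    return (isinstance(x, tuple) and len(x) == 3 and x[0] == '->'
            and is_wff(x[1]) and is_wff(x[2]))   # clause 2: (A -> B)

def show(w):
    return w if isinstance(w, str) else f"({show(w[1])} \u2192 {show(w[2])})"

wff = ('->', 'A', ('->', 'B', 'A'))
print(is_wff(wff), show(wff))   # True (A → (B → A))
```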
It is important that the line introduced using the rule (“A → B”) does not include the vertical bar associated with the hypothesis A. That it does not include the vertical bar associated with hypothesis A expresses that A → B is not within the scope of the hypothesis A. A → B holds independently of whether or not hypothesis A holds. The entailment A → B does not depend on the truth value of A. This is consistent with the meaning of entailment. This rule captures the idea that we ought to be able to assert (A → B) whenever B is deducible from A. That there is a valid deduction of B from A is just what the arrow connective should indicate: that is, that B follows from A, i.e., that A entails B. The rule of → Elimination in Fitch’s system can be expressed as follows: A→ B
(justified line of proof)
...
(justified lines of proof)
A
(justified lines of proof)
... B
→ Elimination (entailment elimination)
where “A → B,” “B,” and “. . .” are lines for which a justification has been provided. The vertical line shown to the left of the wffs “A → B,” “B,” and “. . .” in the above proof indicates that in the proof, the wff is to the right of the vertical line associated with the hypothesis (if any) in the scope of which they were derived. The lines indicated by “| . . .” are included above only to show that the lines “|A → B” and “|B” need not be consecutive in the proof sequence in order to use this rule as a justification to enter the wff B as a new line of the proof. In fact, it is not necessary that the two lines “|A → B” and “|A” appear in any particular order. The justification for the new line, the wff B, is the existence of the two lines “|A → B” and “|A” appearing earlier in the same proof or subproof, along with an appeal to the rule of entailment elimination. The → elimination rule employs a notion of implication that involves deduction, too; it treats implication as the reverse of deducibility. It is the most basic rule in logic, known as modus ponens: if we know that A implies B, then we can deduce B from A. We have not yet said what counts as a valid deduction for this pure implicational logic. So far, we have only said how the rule of hypothesis, the rule of → introduction, and the rule of → elimination work. In order to use these rules to derive wffs containing →, we need to say
what kinds of steps are permitted in a proof, i.e., the lines indicated by “| . . .” in the rules above. Even in a logic with wffs restricted to only the → connective, there are some other rules besides the introduction and elimination rules for the arrow that are appropriate for a deductive system. One of these is the rule of repetition.

Repetition. The rule of repetition may be stated very simply: “Under the hypothesis A, we may assert A.” This is among the simplest of things that the notion of a hypothesis is meant to capture. It may seem too obvious to be worth stating. But we want to be explicit about all the rules we use in introducing lines in a proof. This “obvious” rule, along with the rule of entailment introduction, enables us to construct a valid proof of the so-called “law of identity”:

1   | A      Hypothesis
2   | A      Repetition
3   A → A    1–2, Entailment introduction.
The theorem A → A may also seem too obvious to state; the point here is that any implicational system ought to permit constructing a valid proof of it. We see that, using only the rules stated so far, we are able to do so.

Hypothesis. The rule called hypothesis applies when you wish to introduce a new hypothesis (under which other wffs may be derived) within the scope of a hypothesis that has already been introduced. It helps make the structure of a proof clear when reasoning with suppositions. The rule is this: “In the course of a deduction, under the supposition that A (say), we may begin a new deduction, with a new hypothesis”:

| A        Hypothesis
| ...
| | B      Hypothesis
| | ...

Thus one proof is nested inside another, or, in terms of the terminology used earlier, is considered “subordinate” to the other. The graphical representation of subordinate proofs, or “subproofs” for short, is very helpful in cognizing proof structure when invoking the rules of → introduction and → elimination.
Reiteration. The use of subproofs raises a new question: What if we wish to assert a statement we have derived under the hypothesis A inside a subproof that is subordinate to hypothesis A? Fitch’s system contains a rule specifically providing for this, the rule of reiteration, which says that a statement derived under the hypothesis A in the outer proof may be reiterated into the subproof. Illustrating the use of the rule graphically:

| A        Hypothesis
| ...
| C
| ...
| | B      Hypothesis
| | ...
| | C      Reiteration
| | ...
So far, we have rules for:

• the introduction of the arrow or implication connective (→ Introduction),
• the elimination of the arrow or implication connective (→ Elimination, which is the rule of modus ponens),
• the introduction of a new hypothesis (Hypothesis),
• Repetition, and
• Reiteration.
These rules can be used to derive wffs in which the main connective is the implication connective.2 One such wff was derived above: A → A. This is a necessary truth, if anything is; that it be derivable is a desirable feature of any logical system of implication. We now ask: What about the so-called paradoxes of implication? Can they be derived in this system? Given that our project is the development of a system in which these paradoxes of implication do not arise, if they are theorems of this system, then we would not regard these rules as properly formalizing implication as entailment. So let’s see how far attempts to derive them can get. Even if we find that these paradoxes do arise in this system of pure implication, going through the proofs in detail is helpful in that it provides some guidance as to what changes we might need to make to the rules to obtain a system in which they do not arise.

2 Belnap shows that any proof in Fitch’s natural deduction system using only these five rules can be translated into a proof in the pure implication fragment of Hilbert’s system of intuitionistic logic. Because so much has been shown about Hilbert’s system, this fact can be used to determine certain things about Fitch’s natural deduction system. We simply mention this and refer the interested reader to Anderson and Belnap, Entailment, Vol. 1, Section 1.2.

We now revisit those statements we consider problematic; they are:

(i) A → (B → A),
(ii) A → (B → B), and
(iii) (A ∧ ¬A) → B.

Let us see how things go if we try to derive the first of these, i.e., A → (B → A), using these rules. We proceed as follows. We first notice that the main connective connects an antecedent A to a consequent (B → A). The proof will need to use the rule of → introduction (for there is no way in which a wff with → as its main connective can be derived via a categorical proof other than by using → introduction). That rule says that if we wish to introduce the → connective between A and (B → A), we need a valid derivation of (B → A) under the hypothesis A. Thus, we need to introduce hypothesis A, with the goal of deriving (B → A) under it. We begin by writing the lines for the wff we wish to derive (“A → (B → A)”), and the lines we would need to justify asserting it (“A” and “(B → A)”) using the rule of → introduction, as follows:

| A               Hyp.
| ...             ?
| (B → A)         ?
A → (B → A)       → Introduction.
This is not yet a proof, because the lines whose justification is indicated with a question mark are not justified steps in a proof unless and until we have provided a justification for asserting or inferring them. Our goal now is to derive the line (B → A) under the hypothesis A. How should we proceed? Again, we plan a strategy that involves employing the rule of → introduction. We introduce the hypothesis B and the wff we wish to derive from it, “A,” so that our attempt to construct a proof now looks like this:

1   | A                Hyp.
2   | | B              Hyp.
3   | | A              ?
4   | (B → A)          → Introduction, lines 2–3
5   A → (B → A)        → Introduction, lines 1–4.
The task of constructing a proof has now been reduced to the problem of finding a justification for the wff “A” in the third line. Such a justification is provided by the rule of reiteration. Replacing the question mark on the third line by “Reiteration” results in a proof of the wff “A → (B → A).” This is a proof of the first so-called paradox of implication on our list, from no premises at all, using the rules for Fitch’s system of natural deduction for classical propositional logic. The second so-called paradox of implication on our list is “A → (B → B).” We proceed almost exactly as we did for the first one on our list, and find that A → (B → B) can also be derived from no premises using the rules for Fitch’s system of natural deduction for classical propositional logic. This is left as an exercise for the reader; we note that, as before, the proof employs the rule of → introduction and the rule of repetition. Thus the first two paradoxes of implication on our list can be derived in this system, whereas we aim to develop an alternative to classical propositional logic in which they are not derivable.
Exercises on 8.1

Prove the following using the Fitch-style rules for natural deduction for classical logic given in this section. (Note that we will be revising these rules to obtain an alternative logical system for proving theorems; this exercise is to help in familiarizing you with the Fitch-style proof system for classical logic in order for you to appreciate and understand the revisions that will be made.)

1. A → (B → B) (This is one of the paradoxes of implication.)
2. (A → B) → ((B → C) → (A → C)) (This is sometimes called the law of transitivity of classical logic.)
3. (A → (B → C)) → ((A → B) → (A → C)) (This is sometimes called the self-distributive law of classical logic.)
8.2 Revisiting Fitch’s Rules of Natural Deduction to Better Formalize the Notion of Entailment — Necessity

We now consider how we might revise the rules in (the pure implicational fragment of) Fitch’s system so that the requirement that the antecedent is actually used in deriving the consequent of the implication is made a part of the rule for introducing the connective for entailment, i.e., for introducing the → connective. Moreover, we wish to revise the rules in Fitch’s system in a principled manner (rather than an ad hoc way), so that the nonderivability of the paradoxes of implication arises from having selected appropriately intuitive rules. We observe that the rule of reiteration was invoked in both of the proofs above. Since the proofs of both of these objectionable theorems of classical logic used the rule of reiteration, this rule calls out for examination as a good candidate for one of the rules of classical logic that ought to be revised. Anderson and Belnap look to proofs in mathematics for clues to finding a principled manner of revising the rule of reiteration:

    As a start, picture an (outermost) subproof as exhibiting a mathematical argument of some kind, and reflect that in our usual mathematical or logical proofs, we demand that all the conditions required for the conclusion be stated in the hypothesis of a theorem. After the word “PROOF:” in a mathematical treatise, mathematical writers seem to feel that no more hypotheses may be introduced; and it is regarded as a criticism of a proof if not all the required hypotheses are stated explicitly at the outset.3

What about lemmas in mathematics, though? Mathematicians do sometimes invoke lemmas that are not stated at the outset of the proof as hypotheses, and they may invoke them at almost any point of a proof, including within subproofs. Mathematical lemmas4 are independently proven theorems, usually established for a specific purpose in a specific proof. Were a Fitch-style proof constructed to exhibit the use of a lemma, the lemma would be justified by using the rule of reiteration, since it would be a case of a line proven previously in a proof being used again, perhaps in a subordinate proof. Here is the crux of the issue, though: in appealing to a lemma, we are appealing to a wff that is derivable from no premises, i.e., to a wff for which there is a categorical proof. Thus lemmas hold as a matter of logical necessity. This seems to be the rationale behind the common practice in mathematics of reiterating lemmas. Further, it is a rationale that does not extend to reiterating statements that are not matters of logical necessity, as reflected by the fact that there is no common practice of doing so. It is essentially this point that Anderson and Belnap take up in the following passage:

    Of course additional machinery may be invoked in the proof, but this must be of a logical character, i.e., in addition to the hypotheses, we may use in the argument only propositions tantamount to statements of logical necessity. These considerations suggest that we should be allowed to import into a deduction (i.e., into a subproof by [the rule of reiteration]) only propositions which, if true at all, are necessarily true: i.e., we should reiterate only entailments.5

This point leads to revising Fitch’s Rule of Reiteration so that it states instead that we may reiterate only entailments into a subproof, i.e., wffs of the form (C → D) (where C and D are wffs). If we try to derive the wff A → (B → A) in such a system of pure implication, i.e., one in which Fitch’s Rule of Reiteration is replaced by this revised version of the rule of reiteration, we find that a person would meet with frustration during attempts to derive this wff for arbitrary A and B. One cannot derive A → (B → A) if the rule of reiteration is restricted to reiterating only entailments. Thus, the wff corresponding to the statement that “seems to say that anything whatever has A as a logical consequence, provided only that A is true” (p. 12) is not a theorem of the system of pure implication we have developed so far.

We have made some progress in developing a system in which the paradoxes of implication do not arise, in that the wff A → (B → A) is not derivable in the revised system. There is an analogous theorem we can derive in the system we have developed so far using the revised rule of reiteration, though; we can derive a wff of the form A → (B → A) for the special case in which A is an entailment, e.g., we can derive the wff: (C → D) → (B → (C → D)). Notice, however, that this wff does not say that anything at all has a particular wff as a logical consequence simply by virtue of that wff being true.

3 Anderson and Belnap, Entailment, Vol. 1, p. 15.
4 A mathematician would refer to a lemma as simply a “lemma” rather than as a “mathematical lemma.” We use the term mathematical lemma here only because the word lemma may be used to mean something else in other disciplines, and we aim to be unambiguous to readers from many different disciplines.
5 Anderson and Belnap, Entailment, Vol. 1, p. 15.
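Stated in terms of the toy tuple encoding of wffs sketched earlier (our illustration, not notation from the book), the revised rule of reiteration amounts to a one-line test:

```python
def may_reiterate(wff):
    # Revised reiteration: only entailments, i.e., wffs of the form (C -> D)
    # (here, tuples headed by '->'), may be carried into a subproof.
    return isinstance(wff, tuple) and wff[0] == '->'

print(may_reiterate('A'))               # False: a bare statement letter is blocked
print(may_reiterate(('->', 'C', 'D')))  # True: the entailment (C -> D) may pass
```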
Exercises on 8.2

1. Derive the wff (C → D) → (B → (C → D)) using Fitch’s system of pure implication with the rule of reiteration replaced by the revised rule of reiteration.

2. Attempt to derive A → (B → A) in the revised system and explain where the difficulty in doing so arises and what it is due to.
8.3 Revisiting Fitch’s Rules of Natural Deduction to Better Formalize the Notion of Entailment — Relevance

What about relevance of the antecedent to the consequent? Though we have dealt with a problem of classical logic regarding necessity, we have not dealt with the problem of being able to derive wffs in which the main connective is an implication that lacks the feature of relevance. This is shown by the fact that, even with the revisions to classical logic that we have made so far, we can derive the wff (C → D) → (B → (C → D)). Thus, there is still more work to do: what we want is a system in which it is possible to derive a wff saying that A entails B only if A is relevant to B. Another consideration in coming up with an alternative system is that it is part of the meaning of “entail” that if A entails B, there exists a proof of B from A.

In order to illustrate the point that any notion of implication that does not include relevance would not be recognizable as implication, Anderson and Belnap tell an imaginary story of a mathematician who uses “if . . . then” in the sense of material implication in a communication with the editor of a mathematical journal. (The story is meant to be totally imaginary, and the whole point of coming up with such a story is to make the point that it would be bizarre if any mathematician did such a thing in all seriousness. The point of imagining how the back and forth between editor and mathematician would go is to show that such behavior is not in line with the practices of mathematicians.) The mathematician proposes a conjecture, and then goes on to discuss the implications of the conjecture in the paper he is submitting to the journal:

    [This mathematician] writes . . . “this conjecture has connections with other parts of mathematics which might not immediately occur to the reader. For example, if the conjecture is true, then the first order functional calculus is complete; whereas if it is false, then it implies that Fermat’s last conjecture is correct.” The editor replies that . . . he can see no connection whatever between the conjecture and the “other parts of mathematics.” . . . So the mathematician replies,
    “Well, I was using ‘if . . . then’ and ‘implies’ in the way that logicians have claimed I was: the first order functional calculus is complete, and necessarily so, so anything implies that fact — and if the conjecture is false it is presumably impossible, and hence implies anything. And if you object to this usage, it is simply because you have not understood the technical sense of ‘if . . . then —’ worked out so nicely for us by logicians.”6

At the time the above was written, it had been proven that the first-order functional calculus was complete, but it had not been proven that Fermat’s Last Conjecture was correct. That a certain conjecture, true or not, has as a consequence that Fermat’s Last Conjecture is correct would have been of some interest to the mathematical community, because there were many looking for clues as to what sort of proof Fermat’s Last Conjecture might be susceptible to. The editor is put in the position of responding to what we have called “the Official view,” i.e., his objections to treating implication as material implication are being treated as naivete in this imagined scenario. Reflecting on the standards to which the editor feels bound by duty reveals at least some intuitions about how implication ought to be formalized. If such a situation really were to happen in real life, how could an editor respond? Anderson and Belnap continue the imagined scenario:

    To this the editor counters: “I understand the technical bit all right, but it is simply not correct. In spite of what most logicians say about us, the standards maintained by this journal require that the antecedent of an ‘if . . . then —’ statement must be relevant to the conclusion drawn.”7

Their point in discussing such a scenario is to point out that a journal editor’s response in this situation would, of course, be emphatic resistance to the idea that the truth or falsity of one mathematical statement implies the truth or falsity of another on the basis that implication is material implication. Examining how people use implication when reasoning about mathematical conjectures reveals that it is fundamental to any mathematical practice that “to argue from the necessary truth of A to if B then A is simply to commit a fallacy of relevance.”8 In their proofs, mathematicians use implication in such a way that they pay attention to which hypotheses are used in deriving a conclusion. For the same reasons that it is important to mathematical practice, a logical system that aims to capture the notion of implication should pay attention to this, too.

6 Anderson and Belnap, Entailment, Vol. 1, p. 17.
7 Anderson and Belnap, Entailment, Vol. 1, pp. 17–18.
8 Ibid., p. 17.
In revising the Fitch system to obtain such a logical system, then, we ought to keep the requirement (of paying attention to which hypotheses are appealed to) in mind when revising rules about how wffs involving implication are justified. The rule by which wffs whose main connective is the connective for implication are justified is the rule of → introduction. The revisions that need to be made to the rule of → introduction in Fitch’s system in order to capture the feature of relevance (of the antecedent to the consequent) are motivated by the insight exemplified in the editor’s reaction to the mathematician in the fictitious story above. What we need to do is to revise the rule for → introduction so that we allow an implication (i.e., an entailment) to be justified only when the antecedent is relevant to the consequent. Guided by the insight exemplified in the story above, the sense of relevant in which we are interested here can be made more precise and a formal account of it can be given. We can make the notion of relevant implication precise by pointing out that the sense of an antecedent being relevant to a consequent important to understanding the connective for relevant implication is captured in the following requirement: in order to derive a wff in which the main connective is the arrow, there must be a proof of the consequent in which the antecedent was actually used to prove the consequent. Historically, many have found the problem of saying when one wff is relevant to another a daunting, even hopelessly unattainable, task. This is somewhat of a mystery, for the additions to Fitch’s system that need to be put into place in order to trace the use of one wff in justifying another merely make use of the information already stated in the justifications that are part of a proof in a natural deduction system. That is, in Fitch’s natural deduction system, each new line of the proof must be justified by citing a rule along with the line numbers of the wffs on previous lines fulfilling the conditions of applicability of the rule. An example may make this clearer: to justify a wff using the rule of → introduction, one must cite a hypothesis (which must be the same wff as the antecedent of the wff being justified), along with the lines of the subordinate proof that show that the wff that is the consequent of the wff being justified occurs within it, i.e., within the scope of that hypothesis. Now, we wish to revise the rule of → introduction to make it more restrictive, such that a wff can be justified by the rule of → introduction only if the antecedent of the wff being justified by the rule was actually used in deriving the consequent. This is more restrictive than the requirement for justifying an entailment in the Fitch system for classical logic, but it includes all of the restrictions for the rule in Fitch’s system. Recall that there the antecedent of the wff being justified by the rule of → introduction must be the wff that is the hypothesis
of the subordinate proof cited in the justification for applying the rule. The additional work we need to do for arrow introduction is to check whether the wffs cited to justify the consequent actually include the hypothesis among them. This is easily checked, since, in Fitch’s natural deduction system, the justifications for each line of the subordinate proof that is cited in justifying the applicability of the rule of → introduction include this information. For example, consider the following attempt at a proof in the system as we have revised it so far, i.e., Fitch’s natural deduction system, amended to use the restricted version of the rule of reiteration:

1   | (C → D)                    Hyp.
2   | | A                        Hyp.
3   | | (C → D)                  Reit., line 1
4   | A → (C → D)                → Introduction, lines 2–3
5   (C → D) → (A → (C → D))      → Introduction, lines 1–4.
The rule of → introduction is used on line 4 to justify the wff A → (C → D). The antecedent A was not used to derive the consequent C → D, so we wish to block such a step. The wff C → D does occur within the scope of the hypothesis A, however, so we cannot use being in the scope of a certain hypothesis as a criterion for whether or not a wff was used in proving another wff. How do we distinguish between the uses in which the consequent is actually derived from the antecedent, which we wish to allow, and the uses we do not wish to allow? It sounds more difficult than it is: actually, in line 4, the fact that the antecedent was not used in deriving the consequent is reflected in the fact that the justification of the consequent (wff C → D) on line 3 does not cite the antecedent (wff A) on line 2. More generally, if we were to restrict Fitch’s rule of → introduction such that the arrow can be introduced only if the justification of the consequent includes the line number of the antecedent, this use would be blocked, and line 4 could not be justified using the rule of → introduction. That the use of the rule on line 4 ought to be forbidden because of these relevance considerations undermines the later steps of the proof that refer to it, but, otherwise, the use of → introduction on line 5 would be unobjectionable from the standpoint of relevance considerations, because the justification of the consequent does cite the line number of the antecedent. This example illustrates the claim made earlier: that the information needed to justify saying that a wff that is the antecedent of an implication being justified by the rule of → introduction is relevant to the
wff that is the consequent of that implication is already available in the Fitch-style proof in which the rule of → introduction has been invoked. All that is needed now is a system of notation that keeps track of which lines of a proof were cited in proving a particular wff, and incorporation of the notation into the rules for natural deduction. The notation provides a means of expressing the restrictions we wish to put on the Fitch formulation of the rules of natural deduction. We need to devise some way of keeping track, for each wff at each step of a proof, of all the other wffs that were used in deriving that line, prior to invoking the rule of entailment introduction. Thus, we will provide a means of tagging each hypothesis used in each step, and we will need to pay attention to these tags in all five of our rules. Because this revision involves a notation that tags each wff as it is introduced as a line of the proof, this revision of Fitch’s system will involve reformulating all of the rules, not just the rule of → introduction.

If we are going to revise the rule for → introduction in this way, however, certain theorems that can be derived in Fitch’s system won’t be derivable in the revised Fitch-style system we end up with. To distinguish the two, we hereby acknowledge that we are explaining the construction of the system of the logic of entailment E, and that we will refer to a Fitch-style formulation of E as FE. The implicational part of FE will be designated by subscripting the FE with an arrow: FE→, and the fragment of FE in which negation is added to the implication part of FE will be subscripted with both a negation sign and the arrow: FE¬→. We will use classes of numerals for the tags on wffs; sometimes a wff will be tagged with a singleton class, sometimes with a class with more than one numeral. We will frequently refer to them as relevance indices. These tags, or classes of relevance indices, will be associated with wffs, and will be shown as subscripts of the wffs. The natural deduction proofs in relevance logic will thus contain wffs with subscripts that are classes of numerals, unlike those of Fitch’s natural deduction system for classical logic, for which the wffs are not so subscripted. Thus, the rules for natural deduction in relevance logic must address how the relevance indices by which the wffs are subscripted are to be used.

The notation that Anderson and Belnap developed for tagging wffs with a subscript of a class of numerals is shown below, along with the corresponding revised rule associated with the notation, for each of the five rules in FE→. It is crucial to note that what is subscripted is an entire wff, not a part of a wff.

1. When a hypothesis is introduced, it is tagged with a unique numeral: the hypothesis is subscripted with a unit class (singleton) containing just that numeral.
The revised rule of hypothesis is thus: “One may introduce a new hypothesis A{k}, where k should be different from all subscripts on hypotheses of proofs to which the new proof is subordinate.”

2. When the rule of → elimination (modus ponens) is used, the subscript of the wff justified by the rule should include all the numerals that appear in the subscripts of the wffs that were used to justify it.

The revised rule of → elimination is thus: “From Aa and A → Bb we may infer Ba∪b, where ‘a ∪ b’ indicates a union of the two sets a and b.”

3. When the rule of → introduction is used, the antecedent of the wff to be justified must be the hypothesis of the subordinate proof cited. (Per 1. above, the hypothesis will have as its subscript a unit class, e.g., {k} where k is a numeral.) The subordinate proof cited must contain a proof of the consequent of the wff from the hypothesis, which will be indicated by the fact that the subscript of the consequent of the wff to be justified is a class containing k among its members.

The revised rule of → introduction is thus: “From a proof of Ba from the hypothesis A{k}, we may infer A → Ba−{k}, provided the numeral k is in the set a.”

4. When the rules of reiteration and repetition are used, the subscripts are retained. Thus these two rules are revised by adding the qualification “retaining the subscript a” where a indicates a class of numerals.

A comparison of the rules for the pure implicational fragment of classical logic in Fitch’s system of natural deduction and the rules in the revised system for the pure implicational fragment of relevance logic (the system FE→) is shown in Table 8.1.

Table 8.1 Comparison of Natural Deduction Rules for the Pure Implication Portions of Classical Propositional Logic (F→) and the Logic of Entailment (FE→).

Notation: We designate the revised system by FE→; F indicates a Fitch-style system of natural deduction for classical propositional logic; FE indicates a revision of F to create an alternative logical system that incorporates both necessity and relevance into the → connective, as befits the notion of entailment. The arrow subscript indicates the pure implicational fragment of FE, in which the only connective is the arrow.

→ I (Entailment Introduction)
    F→ : From a proof of B in the scope of hypothesis A, to infer A → B.
    FE→ : From a proof of Ba on hypothesis A{k}, to infer (A → B)a−{k}, provided {k} is in a.

→ E (Entailment Elimination)
    F→ : From A and (A → B), to infer B.
    FE→ : From Aa and (A → B)b, to infer Ba∪b.

Rep. (Repetition)
    F→ : A may be repeated.
    FE→ : Aa may be repeated, retaining the relevance indices a.

Hyp. (Hypothesis)
    F→ : A step may be introduced as the hypothesis of a new subproof.
    FE→ : A step may be introduced as the hypothesis of a new subproof, and each new hypothesis receives a unit class {k} of numerical subscripts, where k is new.

Reit. (Reiteration)
    F→ : A may be reiterated.
    FE→ : (A → B)a may be reiterated, retaining a.

In Table 8.1, the arrow symbol is used in both columns, even though the rules for its use are different in the two columns. This is consistent with the usage in the book so far, but will not be continued in the remainder of the book. It is done up to this point of the book explicitly and deliberately, and for good reason: so far we have been exploring the question of how the notion associated with implication or entailment ought to be formalized, and have used the arrow to indicate whatever connective of the system is intended to formalize that notion, i.e., the notion associated with the English locution “if . . . then,” “implies,” “therefore,” “consequently,” or “follows from.” It should now be clear to the reader that the notion that the arrow formalizes in one column of Table 8.1 is not the same as the notion that the arrow formalizes in the other column. We will be explicit about such differences when we introduce all the rules of the natural deduction system of propositional logic that constitutes a genuine alternative to classical propositional logic (the system FE) in the next section. The only connective of the system we have introduced so far is the → connective, and there are only five natural deduction rules in all in the system we have discussed
to this point, i.e., in FE→ , which is only the pure implicational fragment of FE. At this point, the student is advised to study the comparisons laid out in Table 8.1, noting the differences between the natural deduction rules for classical logic and those for the logic of entailment, and to reflect on the motivation for and significance of those differences. In the next section, we shall discuss in more detail how to construct proofs using not only the five rules of FE→ , but all the rules of the system FE, which include connectives for conjunction of propositions and disjunction of propositions as well as for implication.
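Because each revised rule manipulates the subscripts in a purely mechanical way, the bookkeeping is easy to simulate. The sketch below is our illustration rather than anything from the book: it reuses the toy tuple encoding of wffs introduced earlier, represents a proof step as a (wff, indices) pair, and implements only the two arrow rules.

```python
def arrow_elim(minor, major):
    # ->E: from A with indices a and (A -> B) with indices b, infer B with a ∪ b.
    (wff_a, a), (wff_ab, b) = minor, major
    assert wff_ab[0] == '->' and wff_ab[1] == wff_a, "major premise must be A -> B"
    return (wff_ab[2], a | b)

def arrow_intro(hyp, k, conclusion):
    # ->I: from a proof of B with indices a on hypothesis A subscripted {k},
    # infer (A -> B) with indices a - {k}, provided k is in a.
    wff_b, a = conclusion
    assert k in a, "relevance check failed: the hypothesis was never used"
    return (('->', hyp, wff_b), a - {k})

step1 = ('A', frozenset({1}))       # 1 | A   Hyp., subscript {1}
step2 = step1                       # 2 | A   Rep., retaining {1}
print(arrow_intro('A', 1, step2))   # (('->', 'A', 'A'), frozenset()): a theorem

# In the blocked proof of A -> (B -> A): under hypothesis B (subscript {2}),
# the only candidate line A carries subscript {1}, so
# arrow_intro('B', 2, ('A', frozenset({1}))) raises: 2 is not in {1}.
```

The assert in arrow_intro is exactly the relevance requirement: a wff earns the empty subscript of a theorem only if every hypothesis was actually used and then discharged.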
Exercises on 8.3

1. In the schematic of a partial proof below, fill in the justifications for lines 3, 9, and 18 using Hyp., →I, or →E. Fill in the subscript for the wff on line 18.

   1    ...              ...
   2    ...              ...
   3    A{1}             (fill in)
   4    ...              ...
   5    ...              ...
   6    ...              ...
   7    ...              ...
   8    B{1,6,7}         ...
   9    (A → B){6,7}     (fill in)
   10   ...              ...
   11   ...              ...
   12   ...              ...
   13   ...              ...
   14   ...              ...
   15   ...              ...
   16   ...              ...
   17   A{5}             ...
   18   B{(fill in)}     (fill in)

2. Construct a Fitch-style proof of A → (B → B) in classical logic. Then attempt a Fitch-style proof of it in the system as we have revised it so far. Discuss the differences between the two, and the reasons for them.
8.4 The Rules of System FE (Fitch-Style Formulation of the Logic of Entailment)

In this section, we discuss how to use the rules of the pure implicational calculus FE→, and then extend the system FE→ to include Fitch-style natural deduction rules for conjunction ("and"), disjunction ("or"), and negation. The resulting system is a genuine alternative to classical propositional logic. It is called FE; F indicates a Fitch-style natural deduction formulation, and E indicates the logical system, the logic of entailment. There are thirteen rules in all. All thirteen rules are shown in Table 8.2. Within proofs, we shall refer to the rules by their abbreviations; each rule's abbreviation is shown on the chart in boldface. We present and discuss the rules one by one; the discussion in this section is structured around how these thirteen rules are used in constructing proofs in FE→ and FE, and we discuss them in the order in which it makes the most sense for that discussion. From now on, we indicate that the arrow is used to indicate entailment in the formal system E by using the symbol →E instead of →.

In a Fitch-style formulation of a logical system, whether it be classical logic or the logic of entailment, the rule of hypothesis is one of the most basic rules used in structuring a proof. You will understand why this is so after you have gained some experience in proving wffs in such a system, if it is not already apparent to you. For one thing, the rule of hypothesis (Hyp.) must be used in order to subsequently employ the rule of entailment introduction (→E I) in a proof. In FE, as we have seen, the rule of hypothesis (Hyp.) requires that we tag the wff being put forth as a hypothesis with a unit class subscript:

Hyp. A step may be introduced as the hypothesis of a new subproof, and each new hypothesis receives a unit class {k} of numerical subscripts, where k is new.

The rule of entailment introduction (→E I) refers to the subscript required by Hyp.:

→E I From a proof of Ba on hypothesis A{k} to infer (A →E B)a−{k} provided k is in a.

The subscript a is a set of relevance indices; notice that the latter rule refers both to the relevance indices of the wffs it uses to justify introducing a new wff and to the relevance indices of the wff it justifies.
Table 8.2 The Rules of FE (Fitch-style natural deduction formulation of the logic of entailment)

Hyp. (Hypothesis): A step may be introduced as the hypothesis of a new subproof, and each new hypothesis receives a unit class {k} of numerical subscripts, where k is new.
Rep. (Repetition): Aa may be repeated, retaining the relevance indices a.
Reit. (Reiteration): (A → B)a may be reiterated, retaining the relevance indices a.
→I (Entailment introduction): From a proof of Ba on hypothesis A{k} to infer (A → B)a−{k} provided k is in a.
→E (Entailment elimination): From Aa and (A → B)b to infer Ba∪b.

Note: the five rules above are the five rules of FE→, the pure implicational fragment of FE.

∧I (Conjunction introduction): From Aa and Ba, infer (A ∧ B)a.
∧E (Conjunction elimination): From (A ∧ B)a infer Aa. From (A ∧ B)a infer Ba.
∨I (Disjunction introduction): From Aa to infer (A ∨ B)a. From Ba, to infer (A ∨ B)a.
∨E (Disjunction elimination): From (A ∨ B)a, (A → C)b, and (B → C)b, to infer Ca∪b.
dist. (Distribution): From (A ∧ (B ∨ C))a infer ((A ∧ B) ∨ C)a.
¬I (Negation introduction): From a proof of ¬Aa on the hypothesis A{k} to infer ¬Aa−{k} provided k is in a.
Contrap. (Contraposition): From Ba, and a proof of ¬Bb on hypothesis A{k} where k is in b, to infer ¬Aa∪b−{k}.
¬¬E (Double negation elimination): From ¬¬Aa to infer Aa.

Note: The thirteen rules above are all the rules needed to yield the (Fitch-style natural deduction formulation of the) system FE, a genuine alternative to classical propositional logic. From Entailment: The Logic of Relevance and Necessity, Volume I, Alan Ross Anderson and Nuel D. Belnap, Jr., Section 23.5 (page 276). Notational changes have been made: the notation for negation has been updated from the symbol ∼ to the symbol ¬; the notation for conjunction has been updated from the symbol & to the symbol ∧.
These two rules together can be used to plan how to justify introducing a wff of the form A →E B in a proof. You may justify a wff of the form A →E B only under the conditions that these two rules jointly specify; among these conditions are that you must have introduced A as a hypothesis, and you must have derived B using A. Here is an example of a schematic of a proof in which a wff of the form M →E S has been justified at a certain point in the proof. The line numbers to the left indicate the numbering of the lines of the proof. Notice that the line numbers need not correspond to the numbers used in the subscripts.

   1    ...
   2    ...
   3    ...
   4    ...               [unspecified step justified by rule in FE]
   5    M{2}              Hyp.
   6    ...               [unspecified step justified by rule in FE]
   7    ...               [unspecified step justified by rule in FE]
   8    ...               [unspecified step justified by rule in FE]
   9    S{1,2,3}          [line of proof justified by rules of FE]
   10   M →E S{1,3}       →E I, lines 5–9

The schematic of a subproof exhibited above as lines 5 through 10 is shown as nested immediately inside the outermost vertical line of the proof, and not subordinate to any hypothesis, but it could occur nested inside a subordinate proof, too. As in the Fitch-style formulation of classical propositional logic, vertical lines or "bars" are used to indicate the scope of a hypothesis. Every time we introduce a new hypothesis by the rule Hyp., we set it off with a new vertical bar just to the right of the rightmost vertical bar that occurs at that point in the proof. Whenever we use the →E I rule, we write the entailment being introduced (i.e., justified) next to the vertical bar just to the left of the one that was begun when we introduced the antecedent of the entailment as a hypothesis. This is referred to as "discharging" the hypothesis, and is illustrated in the schematic above, in which we used the rule Hyp. on line 5 and the rule →E I on line 10.

The use of subscripts provides a way to express and enforce the additional restrictions motivated by the aim of avoiding the so-called "paradoxes of implication": In order to justify the wff M →E S
on line 10, the subscript of M ({2}) must be a singleton subset of the subscript of S ({1, 2, 3}). Similarly, the rules tell us what subscript the derived wff M →E S is to be assigned: a straightforward application of the rules to the subscripted wffs invoked to justify M →E S by the two rules Hyp. and →E I determines that the subscript of the wff M →E S is the set resulting from subtracting the subscript of M, {2}, from the set that subscripts S, {1, 2, 3}. Per the rule of →E I, then, the subscript of M →E S is computed as {1, 2, 3} − {2}, or {1, 3}. The justification for the wff M →E S on line 10 of the proof consists of citing the rule →E I and the entire subproof in which S is derived from M, which is indicated by citing lines 5 through 9. Notice where the wff M →E S is placed on line 10: it is placed so as to indicate that it is outside the scope of the hypothesis M.

Hypotheses may be nested, i.e., a hypothesis may be introduced within the scope of another hypothesis. That is, in FE, as in Fitch-style proofs in classical logic, one subproof may occur within another. To illustrate, below is another schematic of a portion of a proof that shows the structure of a proof in which one hypothesis occurs within the scope of another. A, B, C, X, and Y are symbols for wffs in FE.

   12   ...                        [unspecified step justified by rule in FE]
   13   A{1}                       Hyp.
   14   B{2}                       Hyp.
   15   ...                        [unspecified step justified by rule in FE]
   16   ...                        [unspecified step justified by rule in FE]
   17   ...                        [unspecified step justified by rule in FE]
   18   C{4}                       Hyp.
   19   ...                        [unspecified step justified by rule in FE]
   20   ...                        [unspecified step justified by rule in FE]
   21   X{4,5,6}                   [line of proof justified by rules of FE]
   22   C →E X{5,6}                18–21, →E I
   23   ...                        [unspecified step justified by rule in FE]
   24   ...                        [unspecified step justified by rule in FE]
   25   ...                        [unspecified step justified by rule in FE]
   26   Y{1,2,3,5,6}               [line of proof justified by rules of FE]
   27   B →E Y{1,3,5,6}            14–26, →E I
   28   A →E (B →E Y){3,5,6}       13–27, →E I
The above schematic of a portion of a proof illustrates how the rules of Hyp. and →E I can work in a nested manner. In an actual proof, every line would contain a wff of FE and a justification for entering that wff on that line of the proof. Hypotheses can also be introduced serially any number of times. The schematic of a portion of a proof shown below illustrates serial introduction of hypotheses, without either of the hypotheses being subordinate to the other.

   6    A{1}          Hyp.
   7    ...           [unspecified step justified by rule in FE]
   8    ...           [unspecified step justified by rule in FE]
   9    B{1,3}        [line of proof justified by rules of FE]
   10   A →E B{3}     6–9, →E I
   11   A{5}          Hyp.
   12   ...           [unspecified step justified by rule in FE]
   13   ...           [unspecified step justified by rule in FE]
   14   B{5,6}        [line of proof justified by rules of FE]
   15   A →E B{6}     11–14, →E I
Each time the wff is introduced as a hypothesis, it is given a new subscript, even when the same wff is used more than once. In the example above, A was assigned a subscript of {1} when introduced as a hypothesis on line 6, but was assigned a subscript of {5} when it was introduced on line 11. Likewise, whenever a wff is introduced as a new hypothesis inside a subproof — even one in which that wff is the hypothesis — it is assigned a new subscript. This should not be surprising, if it is kept in mind that subscripts have to do with keeping track of the justifications given for a particular use of a wff in the course of a particular proof, at a particular point in that proof.

Whereas the rules of hypothesis (Hyp.) and entailment introduction (→E I) can be used to construct proofs of entailments, the rule of entailment elimination (→E E) is used to construct proofs that make use of entailments to derive other wffs. The rule is:

→E E
From Aa and (A →E B)b to infer Ba∪b.
In terms of a schematic of a portion of a proof:

   i    Aa               [line of proof justified by rules of FE]
   j    (A →E B)b        [line of proof justified by rules of FE]
   k    Ba∪b             i, j, →E E
Except for the use of subscripts, this resembles the rule in classical logic known as modus ponens. Below is a specific example of the use of the rule →E E in FE:

   1    ...
   2    ...
   3    M{2}               [line of proof justified by rules of FE]
   4    (M →E S){1,3}      [line of proof justified by rules of FE]
   5    S{1,2,3}           3, 4, →E E
In the rule of →E E, unlike in the rule of →E I, the subscripts are not used to restrict what can be derived using the rule, as compared to the classical counterpart of the rule. Whatever the subscripts of A and (A →E B), the rule of →E E can be used to derive the wff B. The use of subscripts is still essential, though: when this rule is used, they keep track of which other wffs are relevant to the conclusion. They indicate which wffs were actually used in justifying S at a particular point in this particular proof. The important role of relevance indices in this example of the rule is that, in order to derive S at this point in the proof, we need to tag S with a relevance index that is the union of the set of relevance indices for M and the set of relevance indices for M →E S. Per the rule of →E E for this example, the subscript of S is computed as {2} ∪ {1, 3}, or {1, 2, 3}.

Both the rules of reiteration and repetition in FE incorporate the use of subscripts on wffs. In FE, they differ slightly from each other both in when they apply and in what can be reiterated. The rule of reiteration in FE contains the restriction "reiterate only entailments," as discussed in the section about revising classical logic to address the aspect of necessity in entailments. The rule of repetition doesn't contain any such restriction. Below we discuss and illustrate the use of these two rules.

The rule of repetition (Rep.) is used to justify a wff that appears on an earlier line of a proof or subproof, within that same proof or subproof,
without a change in the scope of hypothesis. It can be used to repeat a hypothesis. The rule is:

Rep. Aa may be repeated, retaining the relevance indices a.

One thing we can do using this rule is to assert A{k} under the supposition A{k}, i.e., we may repeat the hypothesis in a subproof. This is useful in proving the wff A →E A, which is known as the "law of identity," in FE→. We give the proof in FE→ here, so that the reader may compare it with the proof of it in FE given earlier:

   1    A{1}         Hyp.
   2    A{1}         1, Rep.
   3    A →E A       1–2, →E I
More generally, if Aa has been deduced under any supposition B{k}, we are entitled to infer Aa on the supposition B{k}. Illustrating these uses of the rules graphically:

   1    A{1}         Hyp.
   2    A{1}         1, Rep.
   3    ...

and

   1    B{1}         Hyp.
   2    ...
   3    ...
   4    A{1,2,3}     [some justification according to rules of FE]
   5    A{1,2,3}     4, Rep.
The rule of reiteration (Reit.), like the rule of repetition (Rep.), permits one to repeat a wff that occurs on an earlier line of the proof. There is an important difference between Reit. and Rep., though: whereas Rep. is used to repeat a wff within a proof or subproof without a change in the scope of hypothesis, Reit. can be used to repeat a wff derived from a hypothesis A in an outer proof, into a subproof being constructed under another hypothesis, say B. There is a restriction on the wffs that this
rule can be applied to: only entailments can be reiterated. The reasoning behind this restriction was discussed in Section 8.2. The statement of the rule of reiteration is:

Reit. (Reiteration) (A →E B)a may be reiterated, retaining the relevance indices a.

These five rules are all the rules of FE→ (the pure implicational fragment of FE). Using these five rules, one can construct proofs of wffs; since the arrow is the only connective for which FE→ has an introduction rule, the arrow is the only connective that wffs in FE→ will contain. We illustrate some points about constructing proofs in FE→ with an example, and encourage the reader to carry out the associated exercises.

The wff (A →E (A →E B)) →E (A →E B) is known as "the law of contraction." To construct a proof of it in FE→, we begin by noticing that the main connective is the third occurrence of the arrow. In order to use the rule of entailment introduction (→E I) to justify the wff (A →E (A →E B)) →E (A →E B), we must derive (A →E B) from the hypothesis (A →E (A →E B)). We can lay out our plan to do so, leaving the intermediate steps to be determined and filled in later. Our first step in constructing the proof looks like this:

   1      (A →E (A →E B)){1}                   Hyp.
   2      ...
   ⋮
   j−1    ...
   j      (A →E B){1}                          ?
   j+1    (A →E (A →E B)) →E (A →E B)          1–j, →E I
Steps 2 through j − 1 are left blank, since they are yet to be determined, and a question mark indicates that the justification of line j, which we aim to derive, is yet to be determined. (Of course, if actually carrying out this proof for the first time, we would not know how many lines the proof contains. The use of j − 1, j, and j + 1 to number lines in this example is meant to indicate that, in practice, lines may be left unnumbered until the proof is completed, or renumbered after the proof is completed.) The next steps in constructing the proof are focused towards filling in these parts of the potential proof. Thus we next address the question: “how might it be possible to justify line j?”; i.e., what rule can be used to justify it, and what wff must be cited along
with the rule to do so? Notice that, unlike proofs in classical logic, in FE→ we must pay attention to the relevance indices that are required in order to use the rule; accordingly, we place the relevance indices on the wff in step j that are needed in order to justify step j + 1.

Skill in outlining potential proofs that can subsequently be successfully filled in tends to come with practice, once the technique of proving things in Fitch-style systems described above is understood. It develops hand in hand with the facility one develops in using the various particular rules. One might think of carrying out this part of the task of constructing proofs as being a skillful broker between needs and resources: between identifying the wffs that are needed in order to deduce the desired wff, and the wffs that can be deduced from the resources available. Imagining the task thus, one is prodded on the one hand to think of how the desired wff could possibly be derived using the rules, and, on the other hand, to think of what can be derived from what has been hypothesized in the proof so far. The process has sometimes been described as working from both ends of the proof — from the bottom up (or "backward") and from the top down (or "forward") simultaneously — until the entire proof is filled in. One advantage of such a technique is that, at each step, one's efforts are directed towards proving a specific wff with specific relevance indices. Such direction of one's efforts can be very helpful in finding a proof, especially if the proof is complicated. After the proof is completed, evaluating its correctness requires only knowledge of the rules of FE→ in order to verify straightforwardly that each step is permitted. However, most people will find understanding and practicing this technique crucial to becoming adept at constructing proofs.

To return to the proof at hand, what rule might be used to justify the wff A →E B on line j with a relevance index of 1? As with all wffs whose main connective is the arrow, the rule of entailment introduction (→E I) is one possibility to consider. A little reflection shows that none of the other four rules (Hyp., Rep., Reit., and →E E) can be used to do so using the wffs that occur above j at this point in the construction of the proof. In order to use the rule →E I to justify the wff A →E B on line j with a relevance index of 1, we need, on previous lines, a hypothesis A with a unit class relevance index different from 1, and a subproof in which we have derived the wff B with relevance indices of 1 and whatever relevance index is assigned to the hypothesis A. Thus we lay out the needed structure, i.e., on line 2 we introduce A as a hypothesis with the new (i.e., not previously occurring) relevance index 2, justifying that step by citing the rule Hyp., and on line j − 1 we
fill in the wff B with the subscript {1, 2}.

   1      (A →E (A →E B)){1}                   Hyp.
   2      A{2}                                 Hyp.
   3      ...
   4      ...
   ⋮
   j−1    B{1,2}                               ?
   j      (A →E B){1}                          2–(j−1), →E I
   j+1    (A →E (A →E B)) →E (A →E B)          1–j, →E I
How do we know what subscripts the wff is going to have before we know how it is going to be obtained? We don't. The technique we are using calls for laying out a tentative strategy. At this point, we may not know whether or not we will be able to derive the wff B with subscript {1, 2} under the hypothesis A with subscript {2}. What we do know is that if we can do so, then we can justify line j. This is working backward from line j + 1. Completing the proof has been reduced to the relatively straightforward task of filling in the steps of the subproof. The completed proof is given below, to show how the rules and the relevance indices are employed in the remaining steps.

   1    (A →E (A →E B)){1}                   Hyp.
   2    A{2}                                 Hyp.
   3    (A →E (A →E B)){1}                   1, Reit.
   4    (A →E B){1,2}                        2, 3, →E E
   5    B{1,2}                               2, 4, →E E
   6    (A →E B){1}                          2–5, →E I
   7    (A →E (A →E B)) →E (A →E B)          1–6, →E I
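Because evaluating a finished proof requires only the rules, the relevance indices of the seven lines above can be replayed mechanically. The following is a hedged sketch of ours, purely illustrative, with Python frozensets standing in for the subscript classes:

```python
# Replaying the relevance indices of the contraction proof above
# (a sketch of ours; frozensets stand in for subscript classes).

s1 = frozenset({1})     # line 1: (A ->E (A ->E B)){1}            Hyp.
s2 = frozenset({2})     # line 2: A{2}                            Hyp.
s3 = s1                 # line 3: reiteration retains {1}
s4 = s2 | s3            # line 4: ->E E unions indices: {1, 2}
s5 = s2 | s4            # line 5: ->E E again: still {1, 2}
assert 2 in s5          # ->E I on lines 2-5 is licensed: k = 2 is in a
s6 = s5 - {2}           # line 6: (A ->E B){1}
assert 1 in s6          # ->E I on lines 1-6 is licensed: k = 1 is in a
s7 = s6 - {1}           # line 7: the theorem carries the empty subscript
print(sorted(s4), sorted(s6), sorted(s7))   # [1, 2] [1] []
```

That the final subscript is empty reflects the fact that the theorem depends on no undischarged hypotheses.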
Further examples of proofs in FE→ are provided as exercises. Theorems of FE→ are also theorems of FE.

We now turn to extending FE→ to FE. The five rules discussed so far can be used to derive theorems in FE in which the only connective that occurs is the arrow. To extend the system FE→ to the system FE, we need to add rules for the use of other connectives. Whereas FE→, being a pure implicational system, is an alternative to other pure implicational systems, the system FE is a genuine alternative to classical propositional logic, since it contains
wffs with connectives for the conjoining of propositions, the negation of propositions, and the disjunction of propositions, in addition to the arrow connective. The kinds of rules that are needed are: rules for introducing a wff whose main connective is among these additional three connectives, rules for eliminating one of these three main connectives in a wff (justifying other new wffs in the process), and rules for addressing wffs in which there is more than one occurrence of these three additional connectives, or in which one or more of these three additional connectives occurs along with the arrow connective. We'll discuss the additional rules needed for negation first, then for conjunction, and finally for disjunction.

Adding the rules for the negation connective to FE→ yields the system FE¬→.9 FE¬→ warrants being called a system because it has been shown that we can add negation to the system E→ and develop a calculus of entailment and negation, independently of the connectives for conjunction and disjunction. Further, it can be shown that E¬→ is free of the fallacies of necessity and relevance, i.e., that the three fallacies we identified cannot be derived in it. We shall not go into such proofs here; the interested reader is referred to (Anderson and Belnap 1975, p. 107; p. 119ff.).

There are three rules associated with the connective for negation: the rule of Negation Introduction, the rule of Contraposition, and the rule of Double Negation Elimination. These rules look very much like rules of the same name in classical logic, but there are important differences; as you read and reflect on them, pay attention to the role of relevance indices in the rules.

Negation Introduction (¬I). From a proof of ¬Aa on hypothesis A{k}, to infer ¬Aa−{k}, provided k is in a.

Contraposition (Contrap.). From Ba and a proof of ¬Bb on the hypothesis A{k}, where k is in b, to infer ¬Aa∪b−{k}.

Double Negation Elimination (¬¬E). From ¬¬Aa to infer Aa.

We can also derive a rule of Double Negation Introduction using these three rules for negation along with the rules of FE→: From Aa to infer ¬¬Aa. We turn to considering how each of the three rules looks schematically, as used in Fitch-style natural deductions.

9 The three rules associated with negation are given in "Entailment and Negation," Chapter 2, of Anderson and Belnap's Entailment, Volume 1 (pp. 107–110).
Put in terms of a schematic illustration, the rule of Negation Introduction (¬I) looks like this:

   i      A{k}         Hyp.
   ⋮
   j      ¬Aa          ...
   j+1    ¬Aa−{k}      i, j, ¬I (provided k is in a)
Notice that the restriction that the relevance index k must be in the class of relevance indices a ensures that the derivation of ¬A cited on line j + 1 actually uses the hypothesis A. The rule says that if ¬A can be (relevantly) derived from A, then you can assert ¬A.

There are some rules that are equivalent to the above rule of negation introduction, and so could be used in place of it. That is, the same things that can be proved in FE¬→ using the rule of negation introduction can be proved if it is replaced by either of these equivalent alternate rules, and vice versa. Two alternate equivalent rules of negation introduction are: (i) "From (A →E ¬A)a to infer ¬Aa"; and (ii) "From a proof with hypothesis A{k} containing steps Ba and ¬Bb, to infer ¬Aa∪b−{k}, provided k is in either or both of a and b."

Put in terms of a schematic illustration, the rule of contraposition (Contrap.) looks like this:

   i        Ba             [unspecified justification]
   ⋮
   j        A{k}           Hyp.
   ⋮
   j+m      ¬Bb            ...
   j+m+1    ¬Aa∪b−{k}      i, j through j+m, Contrap. (provided k is in b)
Notice that the restriction that the relevance index k must be in the class of relevance indices b ensures that the derivation of ¬B cited on line j + m actually uses the hypothesis A. The rule says that if you have derived B (with a class of relevance indices a) and if ¬B (with a class of relevance indices b) can be derived from a hypothesis A (with the unit class of relevance indices {k}), then you can assert
¬A (with a class of relevance indices that is the union of a and b, minus the relevance index k). Consider how this rule compares with the rule of contraposition in classical logic: an important difference is the requirement that the derivation of ¬B actually uses A. The use of relevance indices within proofs in FE allows us to express this restriction in the logic of entailment.

There are a number of other alternative rules that are equivalent to this rule of contraposition; they fall naturally into two groups consisting of rules of similar formats:

1. From (A →E B)a to infer (¬B →E ¬A)a,
   From (A →E ¬B)a to infer (B →E ¬A)a,
   From (¬A →E B)a to infer (¬B →E A)a,
   From (¬A →E ¬B)a to infer (B →E A)a; and

2. From (A →E B)a and ¬Bb to infer ¬Aa∪b,
   From (A →E ¬B)a and Bb to infer ¬Aa∪b,
   From (¬A →E B)a and ¬Bb to infer Aa∪b,
   From (¬A →E ¬B)a and Bb to infer Aa∪b.

We allow one to cite the rule of contraposition (Contrap.) and use any of these eight equivalent rules as well as the statement of the rule given first ("from Ba and a proof of ¬Bb on the hypothesis A{k}, where k is in b, to infer ¬Aa∪b−{k}").

Put schematically, the rule of Double Negation Elimination is very simply:

   i    ¬¬Aa     [unspecified justification of line i]
   ⋮
   j    Aa       i, ¬¬E
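The negation rules use the same set arithmetic as →E I and →E E. As a rough sketch of ours (the Python helper names are hypothetical, and the code is illustrative only), the index computations for ¬I and Contrap. look like this:

```python
# Sketch (ours) of the index arithmetic in the negation rules of FE.

def neg_intro(a, k):
    """¬I: from a proof of (not A)_a on hypothesis A_{k}, infer
    (not A)_{a - {k}}, provided k is in a."""
    if k not in a:
        raise ValueError("hypothesis unused: k not in a")
    return a - {k}

def contrap(a, b, k):
    """Contrap.: from B_a and a proof of (not B)_b on hypothesis A_{k}
    (with k in b), infer (not A)_{(a U b) - {k}}."""
    if k not in b:
        raise ValueError("hypothesis unused: k not in b")
    return (a | b) - {k}

print(sorted(contrap(frozenset({1}), frozenset({2, 3}), 3)))   # [1, 2]
```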
As explained above, adding the three rules of negation to FE→ yields the system FE¬→. However, neither FE→ nor FE¬→ contains wffs with connectives for conjunction or disjunction. We discuss the considerations involved in choosing the rules of introduction and elimination for each briefly before stating them below.

Rules in FE for the Conjunction Connective. For conjunction of two propositions, what is clear is that we wish the rule of conjunction introduction to permit introducing the conjunction of two propositions when each of the two propositions has been derived. But how should the relevance indices be handled? Should conjunction of two propositions that have different relevance indices be permitted, or
not? Recall that we did permit rules that cited two wffs with differing relevance indices (e.g., entailment elimination and contraposition), so it is not without precedent for a rule to do so. Whether or not the rule of conjunction introduction ought to permit introducing a conjunction of two wffs on the basis of their occurrence on earlier lines of a proof on which they have different relevance indices is not a matter that need be decided in isolation from the other rules of the system, though. In the approach we are describing, that of Anderson and Belnap, one consideration we ought to take into account in considering the rules for a connective is their effect on deducibility within the system as a whole. Specifically, then, we ought to consider how adding a proposed rule of conjunction introduction to FE¬→ will affect the deducibility of wffs in the system. As was emphasized in the comparisons of the rules for the systems F→ and FE→ (the comparison in Table 8.1 between Fitch formulations of the pure implicational calculi of classical logic and the logic of entailment, respectively), we aim for a system in which the so-called paradoxes of implication are not deducible. More generally, we aim for a system in which wffs known to be fallacious are not deducible.10 Thus, in determining how relevance indices for conjunction ought to be handled in any proposed rule of conjunction introduction, we consider the effect of the choice of introduction rule on the wffs that can be derived using whatever rule of conjunction elimination we choose as well.

10 For a fuller description and defense of this approach, see Belnap's view in his 1962 paper "Tonk, Plonk, and Plink."

Considering both rules together, the choice becomes much clearer. First, we note that, to capture what is meant by conjunction, the rule of conjunction elimination should permit deducing either one of the conjuncts of a conjunction. The conjunct thus deduced should have the same class of relevance indices as the conjunction does, since there is no basis for selecting any of the indices in the class of relevance indices by which the conjunction is tagged over the others. In choosing how to handle relevance indices in the rule of conjunction introduction, then, it is important to ensure that it would be appropriate to tag either conjunct with the relevance indices by which the conjunction is tagged (since either conjunct can be derived directly from the conjunction). It then becomes clear that — and why — the rule of conjunction introduction in FE ought to require that both conjuncts have the same relevance indices, and that the wff resulting from conjoining them ought to be tagged with the same relevance indices as well.

Were the requirement that both conjuncts of a conjunction have the same relevance indices not included in the rule of conjunction
introduction, there would be no way to keep track of which relevance indices are associated with each conjunct, and thus no way to infer a conjunct from a conjunction while still maintaining the use of relevance indices. The restrictions on relevance indices in the rules for conjunction in FE ensure that the rules of conjunction introduction and conjunction elimination cannot be used, one after another, to change the relevance indices on a particular wff. To see this, consider what could be deduced were such a restriction not in place. If a conjunction could be formed by conjoining a wff with another wff tagged with different relevance indices, then, by subsequently applying the rule of conjunction elimination to the resulting conjunction, one could tag a given wff with arbitrary relevance indices! Clearly, such a feature in a logical system is to be avoided, for if relevance indices could be changed arbitrarily, relevance indices could no longer be relied upon to hold any significance. If relevance indices could be changed arbitrarily, the notation could not be used to restrict steps in proofs that lead to deriving the dreaded paradoxes of implication.

We work out an example of one alternative to illustrate. Suppose the rule instead to be that a conjunction can be formed from two conjuncts with differing relevance indices, and that the relevance indices for the conjunction be the union of the two classes of relevance indices on the conjuncts. We could then prove (B → C) → (A → (B → C)), as follows (the incorrect conjunction introduction rule is on the third line of the proof; the other rules are all actual rules of FE):

   (B → C){1}                       Hyp.
   A{2}                             Hyp.
   (A ∧ (B → C)){1,2}               – INCORRECT ∧I rule –
   (B → C){1,2}                     Conjunction Elimination (∧E)
   A → (B → C){1}                   →I
   (B → C) → (A → (B → C))          →I
Allowing the introduction of a conjunction between two wffs that do not have the same relevance indices (as is done on the third line in the above deduction using such an incorrect rule), along with the conjunction elimination rule in FE, would undo the work that keeping track of the relevance indices does in the other rules. The fact that we would be able to derive an implication that does not respect the requirement of relevance if such a rule were added to the pure implication calculus
FE→ is a sign that something is not right with adding such a rule of conjunction introduction to FE→. Thus, we choose the following rules for conjunction introduction and conjunction elimination:

Conjunction Introduction (∧I) rule of FE. 'From Aa and Ba, infer (A ∧ B)a.' Put schematically:

   i    Aa           ?
   ⋮
   j    Ba           ?
   ⋮
   k    (A ∧ B)a     i, j, ∧I

Conjunction Elimination (∧E). 'From (A ∧ B)a infer Aa. From (A ∧ B)a infer Ba.' Put schematically (the conjunction elimination rule may be cited when using either one):

   i    (A ∧ B)a
   ⋮
   j    Aa           i, ∧E

or

   i    (A ∧ B)a
   ⋮
   j    Ba           i, ∧E
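To make the contrast concrete, here is a sketch of ours (illustrative only; the function names are hypothetical) of the two candidate conjunction introduction rules as index arithmetic, showing how the rejected union-style rule would let ∧I followed by ∧E re-tag a wff with indices it never depended on:

```python
# Sketch (ours) contrasting FE's conjunction introduction, which demands
# identical subscripts, with the incorrect union-style rule discussed above.

def and_intro(a, b):
    """FE's rule ∧I: conjuncts must carry identical relevance indices."""
    if a != b:
        raise ValueError("∧I requires both conjuncts to have the same indices")
    return a

def and_intro_incorrect(a, b):
    """The rejected variant: take the union of the two index sets."""
    return a | b

def and_elim(a):
    """Rule ∧E: either conjunct inherits the conjunction's indices."""
    return a

# With the incorrect rule, ∧I followed by ∧E re-tags (B -> C){1} as
# (B -> C){1,2} -- 'laundering' an index that B -> C never depended on:
laundered = and_elim(and_intro_incorrect(frozenset({2}), frozenset({1})))
print(sorted(laundered))        # [1, 2]

# FE's actual rule refuses the step:
try:
    and_intro(frozenset({2}), frozenset({1}))
except ValueError as e:
    print(e)
```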
There are other considerations in support of the same conclusion regarding how relevance indices ought to be handled in the natural
deduction rules for conjunction in FE. There are independent reasons, which we shall not provide here, for holding A →E (B ∧ C) to be equivalent to (A →E B) ∧ (A →E C).11 Recall that the purpose of relevance indices on a given wff was to keep track of which wffs were used in deducing the given wff, and that entailment can be regarded as the inverse of deducibility. If (B ∧ C) is entailed by A, then A is relevant to (B ∧ C). Likewise, if B is entailed by A, A is relevant to B, and if C is entailed by A, A is relevant to C. Thus, per the equivalence above (i.e., that A →E (B ∧ C) is equivalent to (A →E B) ∧ (A →E C)), what is relevant to the conjunction B ∧ C is what is relevant to each of the conjuncts. Anderson and Belnap draw on this point — "what is relevant to a conjunction is what is relevant to each conjunct"12 — to argue for the appropriateness of the choice of conjunction introduction rule in FE they made, i.e., that two propositions can be conjoined only if they have identical relevance indices.

Rules in FE for the Disjunction Connective. Disjunction is somewhat more ambiguous than conjunction, as there are different senses of "or." However, we can identify what we would like to capture in a connective for disjunction in the logic of entailment. First, we would like the rules of E to say that, from the disjunction "A ∨ B" and the two entailments "A →E C" and "B →E C," one may infer the proposition C. There is a further question about what the rules permit regarding relevance indices, which we shall address a little later. Secondly, we also wish to capture the feature of disjunction that, if in fact we know A, we may assert A ∨ B, though how relevance indices are to be handled for such a step in a proof in FE must be addressed in the rules for disjunction, too. The rules chosen for disjunction introduction (∨I) and disjunction elimination (∨E) in FE are as follows:

Disjunction Introduction (∨I). From Aa to infer (A ∨ B)a. From Ba to infer (A ∨ B)a.

Disjunction Elimination (∨E). From (A ∨ B)a, (A →E C)b, and (B →E C)b, to infer Ca∪b.

11 This is the path taken in Anderson and Belnap 1975, p. 271. The equivalence of A →E (B ∧ C) and (A →E B) ∧ (A →E C) comes from earlier sections of the book, in which they develop axiomatic formulations of E. Since they develop the Fitch-style formulation for E after they have already developed an axiomatic formulation and proved things about E with it, they draw on their results from the axiomatic formulation.

12 Anderson and Belnap, Entailment, Vol. 1, p. 272.
Notice the restriction in the rule of ∨E that requires that the relevance indices of the two entailments A →E C and B →E C be identical, as in the rule of conjunction introduction. However, when using the rule of ∨E, the subscripts on the two entailments need not be the same as that of the disjunction A ∨ B; as with the rule of entailment elimination, the relevance indices of the wff being inferred include the relevance indices of the wffs cited in applying the rule. Schematically, the disjunction elimination rule is:

   i    (A ∨ B)a
   ⋮
   j    (A →E C)b
   ⋮
   m    (B →E C)b
   ⋮
   n    Ca∪b        i, j, m, ∨E
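The index arithmetic of ∨E can be captured the same way. The sketch below is ours and purely illustrative; it enforces the requirement that the two entailments carry identical indices and takes the union with the disjunction's indices:

```python
# Sketch (ours) of the index arithmetic in ∨E: the two entailments must
# carry the same subscript b, and the conclusion gets the union a U b.

def or_elim(a, b1, b2):
    """From (A v B)_a, (A ->E C)_b and (B ->E C)_b, infer C_{a U b}."""
    if b1 != b2:
        raise ValueError("∨E requires identical indices on the two entailments")
    return a | b1

print(sorted(or_elim(frozenset({1}), frozenset({3}), frozenset({3}))))  # [1, 3]
```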
Distribution Laws. Finally, we need a rule about the distribution of conjunction over disjunction, and one about the distribution of disjunction over conjunction. Such rules allow one to transform a wff whose main connective is a conjunction into one whose main connective is a disjunction, and vice versa. Such transformations are often used in proofs in order to put wffs into a form required in order to apply a rule. In classical logic, one can derive such distribution laws from the classical analogues of the twelve rules above. However, in the logic of entailment, although we can do this for the distributive law of disjunction over conjunction, we cannot derive the distribution of conjunction over disjunction. Thus the following is added to FE, as the thirteenth rule:

Distribution of Conjunction over Disjunction (dist.). From (A ∧ (B ∨ C))a to infer ((A ∧ B) ∨ C)a. Put schematically:

   i    (A ∧ (B ∨ C))a
   ⋮
   j    ((A ∧ B) ∨ C)a     i, dist.
These thirteen rules yield the system FE: a Fitch-style formulation of E, the logic of entailment. They are exhibited in Table 8.2. There are also axiomatic formulations of the system E. The different formulations of E are equivalent in that they yield the same theorems: whatever can be derived in one formulation of E can be derived in all the others. The system E of which all these are formulations, however, is distinct from the system of classical propositional logic. As discussed earlier, E is a genuine alternative to classical logic: if identical vocabularies are used, the sets of wffs of the languages are identical — but not everything that can be derived in classical logic can be derived in E.

One of the aims we began with in describing the development of the rules of FE was that the paradoxes of implication cannot be derived in FE. In particular, we regard it a virtue of the logic of entailment that the two paradoxes that are failures of relevance, i.e., (A ∨ ¬A) → B and A → (B → A), are not derivable in it. We did so because in these implications the antecedent of the conditional need not be relevant to the consequent, and we considered the relevance of an antecedent to the consequent of an implication basic to the notion of implication. In our discussion of the fragment FE→ of FE, we compared the rules of FE→ and the pure implicational fragment of classical propositional logic, to highlight the features of the rules that blocked the derivation of A → (B → A).

Now that we have added the rules for negation, conjunction, and disjunction, we can address the question of derivability in FE for another of the three paradoxes of implication we identified: (A ∧ ¬A) → B. If one goes through the derivations in classical logic for (A ∧ ¬A) → B, and attempts to rewrite them using the rules of FE, one finds that the proof cannot be carried out in FE. We consider this a virtue, of course, since (A ∧ ¬A) → B is an implication in which the antecedent is not relevant to the consequent. However, considering this a virtue of FE puts us at odds with those who hold that the disjunctive syllogism should be a tenet of any acceptable propositional logic. The topic warrants a section of its own.
Exercises on 8.4

1. Fill in the relevance indices and justifications in the proof in FE→ below.

   1    ((A →E A) →E B){(fill in)}       Hyp.
   2    A{(fill in)}                     (fill in)
   3    A{(fill in)}                     (fill in)
   4    (A →E A){(fill in)}              (fill in)
   5    B{(fill in)}                     (fill in)
   6    (((A →E A) →E B) →E B)           (fill in)

2. How would you begin constructing a proof of the following? ((A →E B) →E ((B →E C) →E (A →E C))) (Hint: The main connective of the wff you are trying to prove is the arrow after (A →E B).) After that first step, what would your next goal be? Explain your strategy in terms of the practice of working "from both ends."

3. Construct a proof of ((A →E B) →E ((B →E C) →E (A →E C))) in FE→.

4. Prove ((A →E B) →E ((C →E A) →E (C →E B))) in FE→.

5. Show that adding the rule for negation introduction — from (A →E ¬A)a, to infer ¬Aa — is equivalent to adding the rule for negation introduction that was chosen for FE, i.e., the rule ¬I in FE. (Hint: Show first that you can replace any segment of a proof in FE in which you use the rule ¬I by a segment of proof that uses the rule "from (A →E ¬A)a, to infer ¬Aa," but does not use the rule ¬I. Then show that you can replace any segment of proof in FE in which you use "from (A →E ¬A)a, to infer ¬Aa," but not the rule ¬I, by a segment of proof that uses the rule ¬I instead.)

6. The system FE permits deriving wffs that include negation, conjunction, and disjunction.
   (a) Can you explain why (A ∧ ¬A) →E B, which is derivable in classical propositional logic, is not derivable in FE? If so, provide your explanation. If not, at least provide your attempt or attempts at proving it in FE, and show where these attempts lead.
   (b) Can you find some other theorems of classical logic containing the connectives for conjunction and disjunction that are not derivable in the system FE? If so, identify one (or more), and explain why you suspect that it is not possible to derive it in the system FE.

7. Construct a proof of the following in FE: (A ∧ B) →E (C ∨ B).

8. Construct a proof of the following in FE: (A ∧ B) ∨ (A ∧ C) →E (A ∧ (B ∨ C)). Briefly explain the proof strategy you employed in constructing the proof.
8.5 The Connective "Or," Material Implication, and the Disjunctive Syllogism

It was explained earlier that, in classical logic, the arrow connective is treated as material implication. In so doing, the arrow connective is related to the connectives for negation and "or," i.e., the connectives for negation and disjunction. This is because, by the very notion of material implication, taking the arrow connective to be material implication equates "A → B" with "not-A or B."13 If the latter formula is substituted for the former in the rule for arrow elimination (modus ponens) in classical propositional logic, a rule known as the disjunctive syllogism results. That is, if implication is taken to be material implication, as is done in classical logic, then the rule "from A → B and A, we may infer B" (modus ponens) amounts to the same rule of inference as "from not-A ∨ B and A, we may infer B" (which is one form of the disjunctive syllogism). However, this is not so in the logic of entailment. In the logic of entailment, the arrow connective is taken to be relevant implication,
13 In this section we refer to the arrow connective and use not-A instead of the symbols used for negation when we are speaking about logic and reasoning in general and not about any particular system of logic.
which differs from material implication, and the disjunctive syllogism ("from A or B and not-A to infer B") is not a permissible rule.

The disjunctive syllogism ("from A or B and not-A to infer B") and the claim that a contradiction of any sort entails any given proposition whatsoever ("from A and not-A to infer B") are related. This is because the claim that a contradiction of any sort entails any given proposition whatsoever appeals to the disjunctive syllogism as a mode of inference in classical logic.14 The reasoning behind the claim "from A and not-A to infer B" proceeds as follows in classical logic (we use the English "not" and "or" here, as our interest is in the status of the disjunctive syllogism):

   1    A and not-A     Hypothesis
   2    A               1, Conjunction Elimination
   3    not-A           1, Conjunction Elimination
   4    A or B          2, Disjunction Introduction
   5    B               3, 4, Disjunctive Syllogism
Thus, if the disjunctive syllogism is a rule in one's logical system, then from A and not-A one will be able to derive anything whatsoever, and the implication "(A and not-A) implies B," where A and B can be any wffs whatsoever, will be derivable in the system. It is thus no surprise that the disjunctive syllogism is not an admissible rule in the logic of entailment.

The question is: should the disjunctive syllogism be a rule in one's logic? If yes,15 then classical logic has the right view of it and the logic of entailment has the wrong view of it. If not, then the logic of entailment has the right view of it and classical logic has the wrong view of it. The way that Anderson and Belnap analyze the situation, there is something that both classical logic and the logic of entailment get right, but classical logic gets something wrong as well, and in the end it is seen that it is the logic of entailment that has the right view of things.

14 The argument is usually attributed to C. I. Lewis; it appears in Lewis and Langford, Symbolic Logic, pp. 250–251.

15 As discussed in earlier sections, there are philosophers (e.g., John Burgess) who do think so.

Anderson and Belnap point out that there are really two different senses of "or" at work in the derivation above. In line 4, when, in classical logic, someone justifies the introduction of a disjunction based solely upon the fact that one of the disjuncts is true, they are employing
a truth-functional sense of "or." In line 5, however, when, in classical logic, someone employs the disjunctive syllogism, they are really employing another sense of "or": what we might call an intensional sense of "or." An intensional sense of "or" just means that, in a statement of the form "A or B," the "or" is meant in the sense that it would support the subjunctive conditional: if A weren't true, then B would be true. Some examples are: "If I hadn't turned the light on, the room would be dark," or "If the event had not been held outdoors, it would have been held in the auditorium of Blue Hall." There is nothing wrong with using the intensional sense of "or" per se; we do it all the time in natural language. Using "or" in either sense is fine; the problem arises when one isn't clear about which sense is in use, and applies inappropriate rules to a disjunction as a result.

Anderson and Belnap's critique of the paradoxical situation in classical logic that "anything follows from a contradiction" is that the rule for disjunction introduction in classical logic is appropriate only for the truth-functional sense of "or," whereas the rule for disjunction elimination in classical logic (i.e., the disjunctive syllogism) is appropriate only for the intensional sense of "or." There's a mismatch in classical logic: one sense of "or" is employed for the disjunction introduction rule, and another sense is employed for the disjunction elimination rule.

What does the logic of entailment do, then? The logic of entailment uses the truth-functional "or" throughout, and uses rules appropriate for it for both the disjunction introduction rule and the disjunction elimination rule. Thus, the logic of entailment permits inferring a disjunction from just one of the disjuncts, as does classical logic. This is appropriate for the truth-functional sense of "or." However, the logic of entailment does not permit the use of the disjunctive syllogism; the disjunction elimination rule for the logic of entailment permits instead only "from (A ∨ B)a, (A →E C)b, and (B →E C)b, to infer Ca∪b." This is a disjunction elimination rule that is appropriate for the truth-functional sense of "or," whereas the disjunctive syllogism is not.

What if the intensional "or" is meant? In the logic of entailment, this would be expressed using the arrow connective, i.e., the connective for entailment, which captures both relevance and necessity. There is no way to reduce an entailment to a formula built from the negation and disjunction connectives in the logic of entailment, though. Unlike the arrow connective in classical logic, entailment is not reducible to the material conditional, which explains why the disjunctive syllogism is not a derived rule in the logic of entailment.

You may feel that there is something right about the disjunctive syllogism despite Anderson and Belnap's critique. We saw that one of the advantages of rejecting the disjunctive syllogism was the freedom
from one of the burdens of accepting the disjunctive syllogism as a rule; the disjunctive syllogism (along with the rule that permits inferring a disjunction from just one of the disjuncts) permits deriving one of the paradoxes of implication: that from a contradiction one may infer anything at all. You may wonder: what if there are no contradictions? Wouldn't it be unobjectionable to use the disjunctive syllogism if there were no contradictions? This is a worthy line of speculation, but on closer examination, it is hard to see how one could effectively incorporate the requirement that there be no contradictions into a rule of reasoning.16

16 This point is made in the section under the heading "I'm allright, Jack" in Anderson and Belnap, Entailment, Vol. 2, Section 81.

There is some consolation to be taken in another rule that is similar to the disjunctive syllogism, however. Later in this book, after we have presented the four-valued semantics for the logic of entailment, we will see that there is a valid implication in the logic of entailment that is somewhat similar to the disjunctive syllogism. Instead of the implication

((¬A ∨ B) ∧ A) → B,

which corresponds to the disjunctive syllogism and is not valid in E, we will see that the implication

((¬A ∨ B) ∧ A) → (A ∧ ¬A) ∨ B

is valid in E. We have not yet explained how to show that a statement is valid in E; validity can be shown using a semantics for E, and we will provide a four-valued one for the propositional calculus of E in the next section of this book, in which the truth values are Told True, Told False, Told Both, and Told Neither. Meanwhile, we provide the above entailment and here merely state the result that it is valid in the system E.

The resemblance between the above implication and the disjunctive syllogism is easy to see — the consequent is slightly different. In the latter implication, the consequent is (A ∧ ¬A) ∨ B, rather than simply B. Here is Anderson and Belnap's remark on the significance of the difference between the two: "This is right on target. If the reason that (A ∨ B) ∧ ¬A is getting thought of as a Truth is because A has been labeled as both told True and told False, then we certainly do not want to go around inferring B. The inference is wholly inappropriate in a context where inconsistency is a live possibility."

This remark about inconsistency being a live possibility hints that perhaps a common ground could be found between those who view
the disjunctive syllogism as a good way to reason and those who don't. Some who view the disjunctive syllogism as a good way to reason may not see that there is much difference between ((¬A ∨ B) ∧ A) → (A ∧ ¬A) ∨ B, which is valid in the logic of entailment, and ((¬A ∨ B) ∧ A) → B, which is the disjunctive syllogism and is not valid in the logic of entailment. Perhaps those who do not consider inconsistency a live possibility could be persuaded to replace the disjunctive syllogism by ((¬A ∨ B) ∧ A) → (A ∧ ¬A) ∨ B. Proponents of relevance logic have good reasons for rejecting the disjunctive syllogism, due to the critique described above, as well as for practical reasons. There are in fact contexts in which inconsistency is a live possibility, in a sense that does make a difference to which rules of reasoning are used; we consider some such contexts in the section on the four-valued semantics for E. The matter is not considered settled among philosophers, though: debates about the status of the disjunctive syllogism are still lively and sometimes impassioned.

There is one more important and striking clarification that ought to be made before we leave the topic of the disjunctive syllogism. The distinction is between the existence of a proof of an entailment in E versus the existence of a proof of the consequent of an entailment. There are some careful distinctions to be drawn regarding valid entailments and the existence of proofs which, once understood, help dispel some misconceptions about the logic of entailment. While there is no proof in the logic of entailment that A and ¬A ∨ B entail B, it has been shown that whenever there is a proof of A and a proof of (¬A ∨ B) in the logic of entailment, a proof of B also exists in the logic of entailment. All that can be said is that the proof of B exists, however; there is no general method of generating a proof of B from the proofs of A and (¬A ∨ B).

We leave the topic of the disjunctive syllogism with a summary of what is and is not true about it in the system E (per Anderson and Belnap 1975, §25.1):

1. It is not true that A ∧ (¬A ∨ B) →E B. HOWEVER, it is true that ((¬A ∨ B) ∧ A) →E (A ∧ ¬A) ∨ B.

2. It is not true that there is a proof that A and ¬A ∨ B entail B.

3. It is not true that there is a deduction of B from premises A and ¬A ∨ B. HOWEVER, it is true that whenever A and (¬A ∨ B) are provable, then B is provable. That is, whenever there is a proof of A and a proof of (¬A ∨ B) in the logic of entailment, then a proof of B in the logic of entailment also exists.

That the items that are true of the logic of entailment in the above list are so should help dispel the discomfort that some feel about rejecting
the disjunctive syllogism as a rule of the system. It is not clear how widely appreciated these distinctions are, however.17
Exercises on 8.5

1. Give some examples of the use of the disjunctive syllogism in everyday use that you feel are good arguments. Do you find that the "or" in the premise is the intensional or, or only the truth-functional or?

2. Can you think of an interpretation of (((¬A ∨ B) ∧ A) →E (A ∧ ¬A) ∨ B) that would be a reasonable thing to say in some context? If so, provide the interpretation and describe the context.

3. Imagine that you are asked to serve on a panel discussing how different logics treat the paradoxes of implication. One of the speakers is going to argue that "relevance logic rejects too much." In the advance copy of his talk you have been given, he provides the following example of the use of the disjunctive syllogism.

   [My vacuum cleaner] was not where it belonged in the downstairs hall closet. Eventually, I looked for it everywhere (in the house) except in the upstairs bedroom and the storage room. (I assumed it was somewhere in the house.) Then I discovered it was not in the upstairs bedroom, and I reasoned by disjunctive syllogism: Either the vacuum cleaner is in the upstairs bedroom or it is in the storage room. The vacuum cleaner is not in the upstairs bedroom. Therefore, the vacuum cleaner is in the storage room.18

   The speaker then reasons that the "or" in the first premise is not intensional, because it is not true that if the vacuum cleaner had not been in the storage room, it would have been in the upstairs bedroom. Rather, he maintains, what is true is that if the vacuum cleaner had not been in the storage room, it would have been "back where it belongs in the downstairs hall closet." Thus, he says, Anderson and Belnap's critique of the disjunctive syllogism is not convincing to him, and he cannot accept relevance logic because "relevance logic rejects more traditionally valid forms than most of us meant to discard."19

   Exercise: Prepare a response to this portion of the speaker's talk in which you give a critique of his argument. Can you defend relevance logic's rejection of the disjunctive syllogism in the face of his argument? If so, write such a defense. Hint: You may wish to review the summary at the end of this section listing what is and what is not true about the disjunctive syllogism in the system E. Given what the speaker says about the conclusion that could be drawn were the vacuum cleaner not in the storage room, you might begin by evaluating both the case where the vacuum cleaner is in the storage room and the case where it is not in the storage room. You might then ask whether the speaker's reasoning about the location of his vacuum cleaner is actually an example that is more appropriately represented formally by (((¬A ∨ B) ∧ A) → (A ∧ ¬A) ∨ B), which is valid in relevance logic, rather than an example of the disjunctive syllogism. Or, you may wish to develop another sort of critique of the speaker's argument.

17 The discussion by John Burgess in Burgess, "No Requirement," and Burgess, Philosophical Logic, does not indicate an appreciation of these facts. Nor does the discussion in Sanford, If P, then Q, take cognizance of them.

18 This is the example of the use of the disjunctive syllogism given by David H. Sanford, exactly as it is stated in his classic book If P, then Q, p. 132.

19 The overall argument against accepting relevance logic's response to the paradoxes of implication attributed to the fictitious speaker is meant to be David H. Sanford's argument given on pp. 131–132 of his If P, then Q.
9 Semantics for Relevance Logic: A Useful Four-Valued Logic
In this chapter we look at one semantics for the logic of entailment.1 There are many possible semantics for a given logic, and a variety of semantics has been developed for E and fragments of E. Different semantics are useful for different purposes. The four-valued semantics presented here is especially useful for certain types of automated reasoning applications.
9.1 Interpretations, Valuations, and Many Valued Logics

As explained in Part 1 of this book, the truth value of a wff is determined by a valuation function. The valuation function comes into play after an interpretation of the atomic statements has been made. An interpretation associates a truth value with each statement letter; for each wff, the valuation function takes as input the interpretation of the statement letters that occur in that wff and maps the wff to a truth value. Basically, the valuation is a calculation of the truth value of a wff from the truth values of the statement letters, using the defining truth tables of the connectives that occur in the wff. Up to now, the set of truth values was a set of just two truth values (e.g., {T, F} or {0, 1}), and the truth tables for the connectives in classical logic reflected that fact. Recall also from Part 1 that a language has two parts, a syntax and a semantics; the two-valued valuation of wffs is part of the semantics.

Semantics using other kinds of truth values, other kinds of truth tables, and other kinds of valuation functions have been developed. One that has been developed for the system E is a four-valued logic where the four truth values are the subsets of {T, F}. That is, the four truth values are the elements of the set { { }, {T}, {F}, {T, F} }. Even to someone who insists that a proposition can only be either true or false, a semantics with more than two truth values can still
1 The information in this section is based in part upon "A Useful Four-Valued Logic: How a Computer Should Think," from Anderson and Belnap's Entailment, Vol. 2, section 81.
be useful. One might want to use a truth value to indicate that it isn't known whether the statement is true or false. Such truth values are sometimes referred to as epistemic truth values; they reflect what one knows, rather than what is. The contrast between epistemic truth values and ontological truth values is associated with the contrast between a state of knowledge and a state of the world. An example drawn from real life is the confusion surrounding the reporting of the ballots cast in the US presidential election in 2000: even after all the ballots had been cast, people could (and did) distinguish between being told by an official tally that more ballots had been cast for Bush than for Gore, and the fact that more ballots had been cast for Bush than for Gore. Even if all the problems associated with determining what constituted casting a ballot for a particular candidate were settled, so that it would be fair to say that it was either true or false that more ballots were cast in a certain district for Bush than for Gore, the possibility of contradictory reports is not thereby eliminated. As anyone who watched the news on the evening of the election can attest, there were contradictory reports.

This kind of situation is not uncommon; there are various kinds of situations in which contradictory reports can arise. In general, using different methods of detecting something, or taking multiple measurements with a less than perfectly reliable method, can present a user with inconsistent information. You might have one piece of evidence that a certain statement is true, and another that it is false. The use of epistemic truth values doesn't require that the situation be one in which inconsistency may arise. The fact that there are such situations, though, means that there is some use for a logic that tolerates inconsistent reports. By use, we mean a logic we can use to draw inferences; by tolerate inconsistent reports, we mean that the logic can be used to draw useful inferences even when given inconsistent reports. We will be considering one kind of epistemic truth value assignment in which inconsistency is tolerated: the case of automated reasoning using wffs where the value assigned to a wff reflects what one has "been told."
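Since the four truth values are just the subsets of {T, F}, they can be represented directly as sets, which makes the "more information" comparisons of the coming sections literal set inclusion. The following minimal Python sketch illustrates this; the names are illustrative choices of ours, not taken from the text.

```python
# The four epistemic truth values, represented as subsets of the
# two "told" marks; all names here are illustrative.
TOLD_TRUE, TOLD_FALSE = "told true", "told false"

NONE = frozenset()                         # never told anything
T    = frozenset({TOLD_TRUE})              # told true, never told false
F    = frozenset({TOLD_FALSE})             # told false, never told true
BOTH = frozenset({TOLD_TRUE, TOLD_FALSE})  # told true and told false

FOUR = {NONE, T, F, BOTH}  # the set of four truth values, called "4" below

def carries_no_more_info(x, y):
    """x carries no more information than y: here, just set inclusion."""
    return x <= y

print(carries_no_more_info(NONE, T))   # True
print(carries_no_more_info(T, BOTH))   # True
print(carries_no_more_info(T, F))      # False: T and F are incomparable
```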
Exercise on 9.1

1. (a) Identify a situation in which inconsistencies can arise for each of the following four activities or topics: sports, scientific work, medicine, and detective work.
(b) Think of a specific inconsistency that might realistically arise in each of the four situations you identified above. Clarification of what is being asked for: For the example in this section, the situation in which inconsistencies can arise was “media reporting of election results,” and “concurrent reports that Gore had won and that Gore had lost” was an example of a specific inconsistency that arose. Your examples may be more detailed if you like.
9.2 Contexts in Which This Four-Valued Logic Is Useful

Rather than simply giving further examples, we here identify the general characteristics of the kind of situation in which this four-valued logic is useful.

(i) The logic is to be used by an ARTIFICIAL REASONER (i.e., a computer). Because it is an artificial reasoner, using an unfamiliar logic is no disadvantage. Also, artificial reasoners do not have the computational or memory limitations of a human.

(ii) The user is to DRAW INFERENCES. The user (computer) is also to fulfill the role of answering questions based on what it has been told. The artificial reasoner is sufficiently sophisticated that it does more than just retrieve data. When asked a question, it is to attempt to derive an answer. Thus it has a function of reasoning with (making deductions, i.e., drawing inferences from) the information it has been given.

(iii) There is a THREAT OF INCONSISTENCY. There are multiple sources of information. No source is regarded as overriding any other. Thus, all information is to be kept. The threat of inconsistency is not to be dealt with by dismissing some of the information. Informally put, the user is to follow the principle: "Don't throw information away!" Yet, the artificial reasoner is expected to carry out those tasks that can be carried out in spite of the inconsistency: it is expected to record the inconsistencies that it finds, and to continue reasoning in a sensible manner with respect to things unrelated to the inconsistent information. The expectation here is in sharp contrast to the situation with classical two-valued systems: if an inconsistency exists
when using classical logic, there are really no constraints on what the user might deduce, and such a reasoning system could not be relied upon to produce useful answers about any topic.

(iv) The user is NOT A COMPLETE REASONER. We are here considering the case in which the computer does not attempt to resolve inconsistencies. There do exist automated strategies for belief revision, in which inconsistencies are eliminated; we are considering contexts in which this is not done.

(v) The user is to BASE ANSWERS ONLY ON WHAT IT HAS "BEEN TOLD." Even though the user is to draw inferences and respond to questions by reasoning from all the information available to it, the inferences drawn are to reflect the shortcomings in what it has been told; e.g., its answers should reflect that there is missing information and inconsistent information in what it has been told, and it is to employ methods that do not result in drawing inappropriate inferences due to these shortcomings in the available information.
9.3 The Artificial Reasoner's (Computer's) "State of Knowledge"

We imagine the artificial reasoner, or computer, to operate as follows:

(i) The computer receives assertions and denials of atomic sentences.2 Upon receiving an assertion, it is to mark the item "told true." Upon receiving a denial, it is to mark the item "told false."

(ii) The computer has exactly four possibilities for any particular atomic sentence: told true but never told false, i.e., {"told true"}; told false but never told true, i.e., {"told false"}; never told true and never told false, i.e., { }; told true and told false, i.e., {"told true," "told false"}.

2 Methods for dealing with compound (i.e., non-atomic) inputs such as conjunctions of atomic sentences have been developed (see Section 81.3 of Anderson and Belnap, Entailment, Vol. 2, p. 524ff.). For practical reasons, here we show only how atomic inputs are dealt with.
(iii) We give the four possibilities in (ii) the following proper names, respectively: T, F, None, Both. We refer to the set of these four truth values as 4. That is, the set we call 4 is {T, F, None, Both}.
(iv) The computer may have questions put to it, which it is to answer. In terms of answering a question of the form "p?," the computer answers as follows: if the item has the value T, then answer "yes"; if the item has the value F, then answer "no"; if the item has the value None, then answer "don't know"; if the item has the value Both, then answer "yes and no."
(v) As the computer receives additional information, it incorporates it into the information it already has. It does this by representing what it has been told by a setup. A setup maps atomic sentences into the set 4 = {T, F, None, Both}, according to the following rules:

If an atomic formula p is affirmed, the setup is revised as follows:
• If the current value of the atomic formula p is None, it is mapped to T.
• If the current value of the atomic formula p is Both, it is mapped to Both.
• If the current value of the atomic formula p is T, it is mapped to T.
• If the current value of the atomic formula p is F, it is mapped to Both.

If an atomic formula p is denied, the setup is revised as follows:
• If the current value of the atomic formula p is None, it is mapped to F.
• If the current value of the atomic formula p is Both, it is mapped to Both.
• If the current value of the atomic formula p is T, it is mapped to Both.
• If the current value of the atomic formula p is F, it is mapped to F.
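Viewed through the subset representation of the four values, these revision rules amount to a single operation: add the appropriate "told" mark to whatever marks the sentence already carries. The following minimal Python sketch, with illustrative names of our own choosing, implements the setup, the revision rules in (v), and the question-answering behavior in (iv).

```python
# A sketch of the setup and its revision rules; names are illustrative.
TOLD_TRUE, TOLD_FALSE = "told true", "told false"

setup = {}  # maps atomic sentences to subsets of the marks; absent = None = {}

def affirm(p):
    """p was asserted: add the "told true" mark to p's current value."""
    setup[p] = setup.get(p, frozenset()) | {TOLD_TRUE}

def deny(p):
    """p was denied: add the "told false" mark to p's current value."""
    setup[p] = setup.get(p, frozenset()) | {TOLD_FALSE}

def answer(p):
    """Answer the question "p?" from the current state of knowledge."""
    v = setup.get(p, frozenset())
    return {frozenset(): "don't know",
            frozenset({TOLD_TRUE}): "yes",
            frozenset({TOLD_FALSE}): "no",
            frozenset({TOLD_TRUE, TOLD_FALSE}): "yes and no"}[v]

affirm("Bush won FL"); deny("Bush won FL"); affirm("Bush won NC")
print(answer("Bush won FL"))  # yes and no  (value Both)
print(answer("Bush won NC"))  # yes         (value T)
print(answer("Bush won CA"))  # don't know  (value None)
```

Taking the union of marks reproduces each of the eight cases listed above; for example, affirming p when its current value is F adds "told true" to {"told false"}, yielding Both.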
Thus we envision that the database we are working with can be represented by a list of statements. Each statement is assigned an element of the set 4, which reflects the computer's current state of knowledge (in terms of what it has been "told"). For the example of keeping track of what has been reported about statements as to which presidential candidate has won a certain state, we can visualize the hypothetical database of the example as shown in Figure 9.1.
Figure 9.1 Chart representing a hypothetical setup.

Statement    | Truth Value Assigned
Bush won NC  | {told true} = T
Gore won NC  | {told false} = F
Nader won NC | {told false} = F
Bush won FL  | {told true, told false} = Both
Gore won FL  | {told true, told false} = Both
Nader won FL | {told false} = F
Bush won CA  | { } = None
Gore won CA  | { } = None
Nader won CA | { } = None

(vi) Notice that as information is added, there are certain directions in which the assignments move, and others in which they do not: From None, an assignment can move "up" to states in which it
has more information, i.e., to T or to F (but never down from T or from F to None). From T or F, an assignment can move "up" to Both, but never down from Both to T or to F. So we see that the values assigned are ordered by a relationship we might intuitively describe as a relation of "more information." This ordering can be pictured in a diagram in which each node corresponds to an element of the lattice:

        Both
       /    \
      T      F
       \    /
        None

Approximation Lattice A4

In this Hasse diagram, upwards on the graph represents higher in the ordering ("more information"). Here the ordering is the relation of set inclusion, if you recall that: T is the set {"told true"}, that is, "told true and not ever told false"; F is the set {"told false"}, that is, "told false and not ever told true"; Both is {"told true," "told false"}, that is, "told true and told false"; and None is { }, the empty set, that is, "not ever told true nor ever told false."

We will define such diagrams, called Hasse diagrams, a little later, and we will see why the mathematical structure represented in the diagram is called the "approximation lattice A4." The diagram reflects the rules in (v). The rules in (v) determine how the computer is to establish the setup and update it as it receives more information (in the form of affirmations or denials of atomic sentences). With these updated setups, the computer is able at any point in time to answer questions that can be replied to in terms of its current state of knowledge about an item that appears as an atomic statement in its database.

What we will do next is to extend the question-answering ability developed so far. In addition to answering questions about the computer's state of knowledge of atomic statements, the computer is to answer questions about its state of knowledge of compound sentences formed from the atomic sentences and the connectives negation (¬), conjunction (∧), and disjunction (∨).
Exercises on 9.3

1. Imagine that an automated reasoner (computer) is being employed to help keep up-to-date data about flood damage after a disaster. The computer is to answer requests for information about whether there has been flood damage at the following sites: Walnut School, Kent School, Chapman School, Negri School, Schupp School, Beechwood Market, Walnut Market, Yak Market, and Merry Market. It will be queried every half hour by a news anchor.
Draw a table showing the computer's state of knowledge each time it is queried by the news anchor. It receives the information at uneven intervals, depending on when observers have new information to report. The incoming reports received from various observers are as follows:

At 1:00  Walnut School: Flood Damage
         Walnut Market: No Flood Damage
         Beechwood Market: Flood Damage
At 1:08  Chapman School: No Flood Damage
         Beechwood Market: Flood Damage
         Yak Market: No Flood Damage
At 1:16  Schupp School: Flood Damage
         Negri School: Flood Damage
         Chapman School: No Flood Damage
At 1:38  Walnut Market: Flood Damage
         Negri School: Flood Damage
         Walnut School: No Flood Damage
At 1:50  Chapman School: Flood Damage
         Schupp School: Flood Damage
         Beechwood Market: Flood Damage
At 2:08  Walnut Market: Flood Damage
         Beechwood Market: Flood Damage

2. Explain how your table in Exercise 1 illustrates the point that more information means going upwards on the diagram of the approximation lattice A4 given in this section.
9.4 Negation in This Four-Valued Logic

Suppose the computer has the atomic sentence p in its setup and is asked the question "¬p?" How should it reply? Using an intuitive notion of what the values indicate in terms of being "told" whether something is true or false, and our desideratum "don't throw information away!," we would get:
If p has the value:              | None | T | F | Both
¬p should be assigned the value: | None | F | T | Both
Reflection on the above truth table for negation along with the diagram of the approximation lattice A4 displayed in the previous section
yields the following observations: if the computer was in a given state of information for p at a certain point in its updating process, it did not assign a state of information that was lower in A4 (i.e., "lower" meaning "less information") to ¬p. That is, if the computer was told at least one of "told true" or "told false" regarding p, it was told at least one of "told true" or "told false" regarding ¬p; and if it assigned Both to p, which is "more" information than T or F, then it did not assign a value corresponding to any less information to ¬p, i.e., ¬p is assigned Both if p is. (The reader should convince herself or himself of the preceding.)

It is natural at this point to ask whether there is a principle here we can generalize. We will address this question a little later, and propose one, but the reader is encouraged at this point to try his or her own hand at identifying one, to gain a sense of the patterns involved as the computer incorporates new information into what we might call its "state of knowledge" (Exercise 2 below).

We can think of negation as a function from the set 4 = {None, F, T, Both} onto the set 4 = {None, F, T, Both}; i.e., the function maps the set 4 onto itself. The observations above regarding the map for negation (about the "up" and "down" directions in the lattice A4) hint at general principles about how such mappings should behave, i.e., which ones are the good ones and which ones are not as good. These have been given a mathematical formulation. The mathematical structure used to capture these intuitions is a lattice. We need some definitions in order to be able to formulate the corresponding mathematical statements.
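In the subset representation, the table for negation amounts to swapping the two "told" marks. A minimal Python sketch (names ours, chosen for illustration) makes this concrete and confirms that the function reproduces the table above and maps 4 onto itself.

```python
# Negation in the four-valued logic: swap "told true" and "told false".
TOLD_TRUE, TOLD_FALSE = "told true", "told false"
VALUES = {"None": frozenset(), "T": frozenset({TOLD_TRUE}),
          "F": frozenset({TOLD_FALSE}),
          "Both": frozenset({TOLD_TRUE, TOLD_FALSE})}
NAME = {v: k for k, v in VALUES.items()}

def neg(v):
    """Don't throw information away: told p false means told (not p) true,
    and told p true means told (not p) false."""
    marks = set()
    if TOLD_FALSE in v:
        marks.add(TOLD_TRUE)
    if TOLD_TRUE in v:
        marks.add(TOLD_FALSE)
    return frozenset(marks)

for name, v in VALUES.items():
    print(name, "->", NAME[neg(v)])   # None->None, T->F, F->T, Both->Both

assert {neg(v) for v in VALUES.values()} == set(VALUES.values())  # onto 4
```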
Exercises on 9.4

1. In this section, it was stated that "we can think of negation as a function from the set 4 = {None, F, T, Both} onto the set 4 = {None, F, T, Both}." Is it possible to describe negation in classical two-valued logic as a function? If so, provide such a description. If not, explain.
2. Consider the following proposed principle concerning negation in this four-valued logic: ¬A should be marked with at least "told true" just in case A is marked with at least "told false." Is this principle consistent with the truth table for negation given in this section? If so, does this principle determine all the truth value assignments in the table in this section? Discuss.
9.5 Lattices: A Brief Tutorial

One way to define a lattice is in terms of a set and a relation on it that partially orders the set. We simply presented the approximation lattice A4 in a previous section without defining what a lattice was or explaining why it was called an approximation lattice. In this section, we first define the terms that will allow us to define the notion of a lattice, and illustrate it with a variety of examples. This section culminates with a formal definition of a lattice.

Definition. A set is said to be partially ordered by a relation ⊑ if ⊑ is a binary (two-place) relation on the set satisfying reflexivity, antisymmetry, and transitivity.
(i) ⊑ satisfies reflexivity if, for any a in the set, a ⊑ a.
(ii) ⊑ satisfies antisymmetry if, for any a and b in the set, a ⊑ b and b ⊑ a imply a = b.
(iii) ⊑ satisfies transitivity if, for any a, b, and c in the set, a ⊑ b and b ⊑ c imply a ⊑ c.

A partially ordered set is sometimes called a poset. Two elements a and b of a poset are said to be comparable if either a ⊑ b or b ⊑ a. Otherwise, the two elements are said to be incomparable. In some posets, every pair of elements is comparable, but there are also posets in which not every pair is comparable.

There may be many different relations defined on the same set. Some of these relations might be partial orderings while others are not. That's why a partial order (and hence a lattice that is defined in terms of it) is defined in terms of a set and a relation on the set. Sometimes people just refer to the set when the relation is implied by the context. It is important to keep in mind, however, that the relation is part of the definition of a particular partially ordered set, too, even if it is not always made explicit. We will be using two different partial orders on the set 4, so it will be important in what follows to be clear about which partial order (and hence which lattice) is being referred to.

In order to better understand the notion of a partial order, we present a variety of examples of partially ordered sets:

Example 1. Let the set be the set N of all the natural numbers. Let the binary relation be the relation "is less than or equal to" (i.e., ≤). It is straightforward to verify that this is a partial ordering by verifying that (i), (ii), and (iii) hold.

Example 2. Let the set be the set of subsets of a certain set S. (In set theory, this is known as the "powerset" of the set S and is designated
by "P(S).") Let the relation be the relation of set inclusion, i.e., ⊆. This example is abstract, but it can be illustrated easily with a concrete example. Take an example of a set of a few elements, write out all the subsets, and then show that the relation of set inclusion satisfies (i), (ii), and (iii). You can also sketch the relation graphically in two dimensions, as we did in showing the relation of "less information than" for the set 4. The set of letters of the alphabet is an example whose powerset you would not want to have to write out, but it is a useful example to think about. Thinking through Examples 1 and 2 will develop an understanding of a partially ordered set.

For the specific purpose at hand, let's look at the subsets of the set of two truth values: 2 = {T, F}. The powerset of this set is a set of subsets: i.e., P(2) = { { }, {T}, {F}, {T, F} }. (The order doesn't matter here, as we are merely listing all the subsets of 2.) Note that { } denotes the empty set and is a subset of every set. If we were to write out the set inclusion relations, we would find that what we have is very similar to the set 4 with the ordering of "less information than"!

Example 3. A set of propositions, along with the relation of "is entailed by," is a partially ordered set. Convince yourself that "is entailed by" satisfies (i), (ii), and (iii). (Note: A proposition might have more than one wff that expresses it. If two wffs are equivalent, we say that they express the same proposition.)

A lattice is defined in terms of a partially ordered set; it is a special case of a partially ordered set. To give the definition of a lattice, we need first to define the terms upper bound, lower bound, least upper bound, and greatest lower bound.

Definition. An upper bound of a subset B of a partially ordered set A is an element u of A such that for every b in B, b ⊑ u.

Notice that what the definition states about u's membership is that it is an element of A. It might be in B, or it might not!

Example 4. Let the set A be the set of all the natural numbers, and the relation be the relation of "less than or equal to." Let the subset B be the set {2, 3, 8, 17, 201}. Is 17 an upper bound of B? NO. What about 201? YES. What about 202? Also YES. What about 437? Also YES. (There are many upper bounds of B in this example.)

Example 5. Let the set A be the powerset of all the letters of the alphabet; i.e., A = P(L), where the set L is the set of all the letters
of the alphabet), and let the relation be the relation of set inclusion. Elements of A are thus sets of letters. Let B be the subset of A that consists of all the sets of letters that can be used to form a word (you are allowed to use the same letter more than once in the word). For example, the following elements of A would be in B: {a, t}, {c, a, t}, {b, a, t}, {a}, {a, d}, {e, l, p, h, a, n, t}, whereas the following elements of A would not be in B: {q}, {k, z}, {b, x, n}. What is an upper bound of B? Well, the set L, the set of all the letters of the alphabet, would be an upper bound, because every set in B would be included in it. And the set L is in A, because the set A is the powerset of L, that is, the set of all the subsets of L, and so A includes the set L itself. Notice that this upper bound of B is not in B, however.

Just out of interest, we might ask: are there any other upper bounds of B in Example 5? In Example 5, L is the only upper bound. Since every member of B must be included in the upper bound, the upper bound must include all letters of the alphabet that are used in any word whatsoever. So it has to include all the letters included in L. There's no other set in A that includes all the letters included in L. Sometimes there are many upper bounds; sometimes, as in this case, there is only one.

Example 6. Let the set A be the set of fractions of the form 1/2^n, where n is a natural number, with the ordering "is less than or equal to." Then A is the set {1, 1/2, 1/4, 1/8, 1/16, 1/32, . . .}. Let B be the subset of A {1/4, 1/8, 1/128}. Then the upper bounds of B are 1/4, 1/2, and 1. Notice that all the upper bounds are in A, but not all of them are in B.

We now define least upper bound.

Definition. The least upper bound (l.u.b.) of a subset B of a partially ordered set A is an upper bound u such that if u′ is also an upper bound of B, then u ⊑ u′.

So, although upper bounds are not in general unique, least upper bounds are unique. The proof that least upper bounds are unique is straightforward. Suppose u and u′ are both least upper bounds of B. Since u is a least upper bound and u′ is an upper bound, u ⊑ u′; and since u′ is a least upper bound and u is an upper bound, u′ ⊑ u. By antisymmetry of the relation ⊑, u = u′. QED.

Let us revisit our examples of upper bounds, and figure out what the least upper bound is in each example.
• Example 4 above: There were an infinite number of upper bounds of B. But the least upper bound of B is 201. It just so happens that 201 is in B as well as in A.
• Example 5 above: The upper bound of B was the set L, the set of all the letters of the alphabet. There was only one upper bound, and it is thus the least upper bound. In this example, unless there is a word that uses all the letters of the alphabet, the least upper bound of B is not in B. If there is a word that uses all the letters of the alphabet, then the least upper bound of B would be in B.
• Example 6 above: There were three upper bounds of B: 1/4, 1/2, and 1. The least upper bound of B is 1/4. In Example 6, too, the least upper bound of B is in B as well as in A.

However, not every set has an upper bound, much less a least upper bound. An example of a set with no upper bound is the set of natural numbers. Similar to how we defined upper bounds and least upper bounds, we define lower bounds and greatest lower bounds.

Definition. A lower bound of a subset B of a partially ordered set A is an element l of A such that for every b in B, l ⊑ b.

Let's see what the lower bounds of B would be for some of the examples given earlier.
• Example 4 above: A is the set of all the natural numbers, and B is the set {2, 3, 8, 17, 201}. B has lower bounds in A of 0, 1, and 2.
• Example 5 above: A is the powerset of the set L of letters of the alphabet, and B is the set of all sets of letters that make up a word. What lower bounds does B have? Well, a lower bound has to be contained in every member of B. The sets in B don't have any member in common, because there is no letter of the alphabet that occurs in every word of the English language. Thus, the only lower bound of B is the empty set { }.
• Example 6 above: A was the set of fractions of the form 1/2^n, where n is a natural number. B is the set {1/4, 1/8, 1/128}. B has lower bounds of 1/128, 1/256, and so on. There are an infinite number of lower bounds of B in this example. (Notice that not every set has a lower bound that is contained in the set itself. An example is the set of positive real numbers less than 1; it has a lower bound (0), but this lower bound is not in the set of positive real numbers.)

The greatest lower bound is defined in a manner similar to how the least upper bound was defined.

Definition. The greatest lower bound (g.l.b.) of a subset B of a partially ordered set A is a lower bound l such that if l′ is also a lower bound of B, then l′ ⊑ l.
Applying this definition, we can identify the greatest lower bounds for the examples given above.
• Example 4 above: B has lower bounds 0, 1, and 2. The greatest lower bound is 2.
• Example 5 above: The only lower bound that B has is { }, so { } is the greatest lower bound of B. In this example, the g.l.b. of B is not in B.
• Example 6 above: B has an infinite number of lower bounds, but the greatest lower bound is 1/128. In this example, the g.l.b. of B is in B.

When we consider subsets B of A that consist of only two elements of A, there is a special term for the l.u.b.'s and the g.l.b.'s. They are called joins and meets of the two elements in B, respectively.

Definition. Suppose x and y are elements of A. We define the join of x and y as the least upper bound of the set {x, y}. The join of x and y is denoted by "x ⊔ y." We define the meet of x and y as the greatest lower bound of the set {x, y}. The meet of x and y is denoted by "x ⊓ y."

You might wonder: Do the join and meet of two elements of a partially ordered set A always exist? The answer is no, not always. For the special case in which any two elements of the partially ordered set have both a meet and a join, we call the partially ordered set a lattice. We end this section with the definition of a lattice, as promised.

Definition. A lattice is a non-empty partially ordered set such that any two of its elements have both a join and a meet.
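These definitions are easy to check mechanically on a small finite example. The following Python sketch (our own illustration, with names of our choosing) verifies the partial-order axioms for set inclusion on the powerset P(2) of Example 2, computes joins and meets, and confirms that the result is a lattice.

```python
# Checking the poset axioms and the lattice property on P({T, F})
# under set inclusion; all names here are illustrative.
from itertools import chain, combinations

def powerset(s):
    s = list(s)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

A = powerset({"T", "F"})          # { {}, {T}, {F}, {T, F} }
leq = lambda x, y: x <= y         # the partial order: set inclusion

# (i) reflexivity, (ii) antisymmetry, (iii) transitivity
assert all(leq(a, a) for a in A)
assert all(a == b for a in A for b in A if leq(a, b) and leq(b, a))
assert all(leq(a, c) for a in A for b in A for c in A
           if leq(a, b) and leq(b, c))

def join(x, y):
    """Least upper bound of {x, y}, or None if it does not exist."""
    ubs = [u for u in A if leq(x, u) and leq(y, u)]
    least = [u for u in ubs if all(leq(u, v) for v in ubs)]
    return least[0] if least else None

def meet(x, y):
    """Greatest lower bound of {x, y}, or None if it does not exist."""
    lbs = [l for l in A if leq(l, x) and leq(l, y)]
    greatest = [l for l in lbs if all(leq(v, l) for v in lbs)]
    return greatest[0] if greatest else None

# A lattice: every pair of elements has both a join and a meet.
assert all(join(x, y) is not None and meet(x, y) is not None
           for x in A for y in A)
print(sorted((set(x) for x in A), key=len))  # the four elements of P(2)
```

For P(2), join is set union and meet is set intersection. Since the A4 ordering is also set inclusion (on subsets of {"told true," "told false"}), the same check applies there.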
Exercises on 9.5

1. Let A be the set of the letters of the alphabet, i.e.,

A = {a, b, c, d, . . . , w, x, y, z}.    (9.1)

Let S be the powerset of A, that is, the set of all subsets of A. (So, for instance, the set of letters in the word "cat," {c, a, t}, is an element of S.) The relation of set inclusion is a binary relation on S and, in fact, the set S is partially ordered by the relation of set inclusion. Answer the following questions:
(i) Show how the following elements of S are ordered by the relation of set inclusion: the letters in the word "a," the letters in the word "act," the letters in the word "tack," the letters in the word "tacky," and the letters in the word "dog."
(ii) What is the greatest lower bound of the set of letters in the word "act" and the set of letters in the word "tack"?
(iii) What is the least upper bound of the set of letters in the word "a" and the set of letters in the word "tack"?
(iv) What is the least upper bound of the set of letters in the word "a" and the set of letters in the word "dog"?
(v) What is the greatest lower bound of the set of letters in the word "tacky" and the set of letters in the word "dog"?

2. Show that the approximation lattice A4 presented earlier (in Section 9.3) satisfies the mathematical definition of a lattice presented at the end of this section.
9.6 Finite Approximation Lattices and Scott's Thesis

A "finite approximation lattice" is a finite lattice that satisfies a nonmathematical condition in addition to satisfying the definition of a lattice. The nonmathematical condition is: it is appropriate to read x ⊑ y as "x approximates y." If the condition is nonmathematical, why mention it? The condition is used to capture the intuitive notion of "less information." Recall the remark that the notion of some of the values of 4 having "more information" than others might be useful in implementing the guidance: "Don't throw information away!"

The term "approximation lattice" is used in Scott's Thesis. Scott's Thesis is a thesis about choosing the functions that we ought to consider as candidates for mapping one approximation lattice to another (including the case of mapping an approximation lattice onto itself). Recall that the function of negation mapped the set 4 = {None, F, T, Both} onto itself. Now that we have the mathematical structure of a lattice, we can consider the lattice consisting of the set 4 and the relation of "less information than." It turns out that Scott's Thesis gives guidance as to which functions from that lattice onto itself are the "good" ones. If we restrict ourselves to finite lattices, Scott's Thesis is as follows.

Scott's Thesis (for finite lattices).3 In the presence of finite lattices A and B naturally thought of as approximation lattices, pay attention only to the monotonic functions from A into B, resolutely ignoring all other functions as violating the nature of A and B as approximation lattices.

3 The statement called "Scott's Thesis" given here is a revised version of the statement in Anderson and Belnap to which they gave the name "Scott's Thesis." We have revised it here only to make it specific to finite lattices; they give a more general form of this statement and explain that both the name of the thesis and the formulation are their own.
A monotonic function g from the finite lattice A into the finite lattice B is one such that if a ⊑ b in the lattice A, then g(a) ⊑ g(b) in the lattice B. Belnap remarks that Scott's Thesis is analogous to Church's Thesis4 in that it is a statement connecting an informal notion and a formal one, just as Church's Thesis is. In Part 2, the status of Church's Thesis was explained; we saw that there is "evidence for" the claim that Church's Thesis is true, i.e., that it is the right way to formalize an informal concept, but there can be no mathematical proof for a statement of this kind.

Now, let's see what guidance Scott's Thesis would have given us in the case of negation, and compare it with what we actually concluded earlier about how the negation function should work. First, we note that the set 4 = {None, F, T, Both} constitutes a lattice under the ordering relation "approximates the information in," as follows:
                Both {"told true," "told false"}
               /    \
  T {"told true"}    F {"told false"}
               \    /
                None { }

Approximation Lattice A4

Both is at the top and contains the most information; None is at the bottom and contains no information. "Approximates the information in" is the relation going "uphill" in this Hasse diagram.
4 Anderson and Belnap, Entailment, Vol. 2, p. 509.
This is the approximation lattice A4 presented previously, with which you are already familiar. (Recall that we are defining a particular lattice in terms of a particular set and a particular relation on that set; here the relation is "approximates the information in.") What Scott's Thesis tells us is that we should consider only monotonic functions. In our case, the lattice A and the lattice B are the same lattice, i.e., the lattice A4. Incorporating this fact and Scott's Thesis together: the kinds of functions suitable for connectives are monotonic functions that map the approximation lattice A4 into itself.
9.7 Applying Scott’s Thesis to Negation, Conjunction, and Disjunction We now apply this guidance to identifying truth tables in our fourvalued semantics for the connectives negation, conjunction, and disjunction. Applying Scott’s Thesis to negation, to figure out the mapping from the lattice A4 onto itself, we would start out first specifying that the negation function would map the value of F to T and T to F, based on the definitions of classical truth table considerations applied to “told true” and “told false.” Let us denote the negation function by g; then g(F) = T and g(T) = F. Then, the advice to consider only monotonic functions would dictate our choices, as follows: Since None F, one requirement on g is that g(None) g(F), i.e., that g(None) T. • Since None T, another requirement on g is that g(None) g(T), i.e., that g(None) F. •
Thus, we can conclude that the only assignment satisfying both these requirements is g(None) = None. By similar reasoning, since F ⊑ Both, g(F) ⊑ g(Both), i.e., T ⊑ g(Both). And, since T ⊑ Both, g(T) ⊑ g(Both), i.e., F ⊑ g(Both). The only assignment that satisfies both the requirement that T ⊑ g(Both) and the requirement that F ⊑ g(Both) is g(Both) = Both. So we see that applying Scott's Thesis gives the same result that we got earlier. (This is "evidence" in favor of Scott's Thesis.)

An observation that the reader may find helpful is this: graphically, we can think of the negation function as mapping the values of the approximation lattice A4 onto itself: Both gets mapped to itself, None gets mapped to itself, T gets mapped to F, and F gets mapped to T. Visually, we can see that although this shifts some elements
of the lattice around, the order between any two elements of A4 is preserved (under the ordering relation "approximates the information in").

The same method can be applied to conjunction and disjunction. We will skip the details here because they are rather tedious, and merely indicate what the general approach would be. First, we can stipulate that F ∧ F gets mapped to F, as do T ∧ F and F ∧ T, and that T ∧ T gets mapped to T. Then, we look at what requirements Scott's Thesis provides. Since these connectives are two-place functions, we treat one row (or, alternatively, column) of the conjunction table at a time. That is, let's consider the one-place function (None ∧ ___). We can then go through the same steps as we did for the one-place function of negation. That addresses the first row of the table for ∧. Then, consider the one-place function (F ∧ ___), which addresses the second row of the table for conjunction. Then consider (T ∧ ___), and finally (Both ∧ ___). Finally, we collect all the requirements, and see which values are determined.

Even after using Scott's Thesis, which amounts to restricting ourselves to monotonic functions only, we are still left in the situation that not all the entries in the table are determined. It turns out that the entries are determined, however, if we specify a rather minimal relationship between conjunction and disjunction. If a and b are elements of 4 = {None, F, T, Both}, we stipulate that a ∧ b = a iff a ∨ b = b, and a ∧ b = b iff a ∨ b = a. The choice of relation between ∧ and ∨ here is based upon how the two elements of 2 are related by conjunction and disjunction in classical logic. This narrows the possibilities for the entries in the tables until they are fully determined. Some entries in Tables 9.1 and 9.2 were determined by considerations based upon classical truth tables (those whose arguments are both classical values, F or T), others by monotonicity requirements; the remaining entries, still undetermined after using truth table considerations and monotonicity, were determined by the use of the relation between ∧ and ∨ just stipulated above. Tables 9.1 and 9.2 give the truth values to be assigned to wffs formed from atomic wffs by the connectives ∧ and ∨. Thus, these tables provide a method for the computer to construct answers to questions about formulas that are formed from the atomic statements in its setup by the connectives of negation, conjunction, and disjunction.
Table 9.1 Truth Tables for Wffs with the Connective for Conjunction.

  ∧    | None | F | T    | Both
 ------+------+---+------+------
  None | None | F | None | F
  F    | F    | F | F    | F
  T    | None | F | T    | Both
  Both | F    | F | Both | Both
Table 9.2 Truth Tables for Wffs with the Connective for Disjunction.

  ∨    | None | F    | T | Both
 ------+------+------+---+------
  None | None | None | T | T
  F    | None | F    | T | Both
  T    | T    | T    | T | T
  Both | T    | Both | T | Both
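As a check on the tables, one can verify mechanically that the three connectives are monotonic functions with respect to the approximation lattice A4, as Scott's Thesis requires, and that the stipulated relation between ∧ and ∨ holds. The Python sketch below (an illustration of ours, with names we have chosen) transcribes the tables and tests both claims.

```python
# Verifying that ¬, ∧, and ∨ from the tables above are monotonic on A4.
VALUES = ["None", "F", "T", "Both"]
MARKS = {"None": frozenset(), "T": frozenset({"told true"}),
         "F": frozenset({"told false"}),
         "Both": frozenset({"told true", "told false"})}

def approx(a, b):
    """The A4 ordering: a approximates (carries no more info than) b."""
    return MARKS[a] <= MARKS[b]

NEG = {"None": "None", "F": "T", "T": "F", "Both": "Both"}
CONJ = {"None": {"None": "None", "F": "F", "T": "None", "Both": "F"},
        "F":    {"None": "F",    "F": "F", "T": "F",    "Both": "F"},
        "T":    {"None": "None", "F": "F", "T": "T",    "Both": "Both"},
        "Both": {"None": "F",    "F": "F", "T": "Both", "Both": "Both"}}
DISJ = {"None": {"None": "None", "F": "None", "T": "T", "Both": "T"},
        "F":    {"None": "None", "F": "F",    "T": "T", "Both": "Both"},
        "T":    {"None": "T",    "F": "T",    "T": "T", "Both": "T"},
        "Both": {"None": "T",    "F": "Both", "T": "T", "Both": "Both"}}

for a in VALUES:
    for a2 in VALUES:
        if not approx(a, a2):
            continue
        assert approx(NEG[a], NEG[a2])   # negation is monotonic
        for b in VALUES:                 # so are ∧ and ∨, in each argument
            assert approx(CONJ[a][b], CONJ[a2][b])
            assert approx(CONJ[b][a], CONJ[b][a2])
            assert approx(DISJ[a][b], DISJ[a2][b])
            assert approx(DISJ[b][a], DISJ[b][a2])

# The stipulated relation between ∧ and ∨ also holds:
assert all((CONJ[a][b] == a) == (DISJ[a][b] == b)
           for a in VALUES for b in VALUES)
print("All three connectives are monotonic with respect to A4.")
```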
Exercises on 9.7

1. Suppose the epistemic state of the computer is represented by the following single setup (the setup is represented by a horizontal table rather than a vertical one just to save space):

 A    | B    | C | D | E    | F | G    | H
 Both | None | T | T | Both | F | None | T
Suppose the computer is asked the questions below. Say whether it answers YES, NO, DON'T KNOW, or YES&NO. (Hint: This is just a matter of using the truth tables for our four-valued logic.)

F? ¬F? A? ¬A? ¬B? (A&B)? (A ∨ B)? (C&D&H)? (C&D&F)? (G&H)? (G ∨ H)? ¬(G ∨ H)?

2. Suppose we had wanted to define negation as follows:
  p  | None | F | T | Both
  ¬p | F    | T | F | Both
Is this another possibility permitted by Scott's Thesis, or would Scott's Thesis have ruled out this choice? Prove your answer.

3. Could Scott's Thesis apply in determining the negation function for classical logic? How about the functions for conjunction and disjunction in classical logic? Why or why not?
9.8 The Logical Lattice L4

The tables given just above for the truth values of conjunction and disjunction determine a lattice consisting of the elements of 4 if we order the elements of the set 4 according to the relation defined as follows:

• The "join" (l.u.b.) of any two elements is their disjunction.
• The "meet" (g.l.b.) of any two elements is their conjunction.
• The informal idea of the relation is that uphill is "more true."

The result is another, different lattice. How can we tell that it is a lattice? The resulting partial order on the set 4 is determined to be a lattice because the meets and joins of any two elements of 4 using this relation are also in 4.
          T
        /   \
    None     Both
        \   /
          F

Logical Lattice L4

T is at the top; it is the "most true and least false." F is at the bottom; it is the "most false and least true." "Greater presence of true and absence of false" is the relation going "uphill" in this Hasse diagram.
The order here is not a reflection of how much information is contained in the truth value, but, roughly speaking, of how much evidence of being "true" there is for the statement to which that truth value is assigned. The strongest evidence for the truth of a statement would be T = {"told true"}, that is, told true and not told false. The worst situation in terms of evidence for the truth of a statement would be if it were assigned F = {"told false"}, that is, told false and not told true. Intermediate between these two are the situations of being told nothing about the statement, i.e., None = { }, or being told both that the statement was true and that it was false, i.e., Both = {"told true," "told false"}.

Let us denote the ordering on the logical lattice L4 by the symbol "≤" to reflect that the ordering that yields the logical lattice L4 is different from the set inclusion relation that orders the set 4 and yields the approximation lattice A4. Then a ≤ b says that a is lower than b in the lattice ordering, and here that means roughly that "b is more true and/or less false than a."

Since a ∧ b is the meet of a and b, and a ∨ b is the join of a and b, we can use the lattice as a graphical way to tell us how to determine
truth values of conjunctions and disjunctions of atomic formulas. You can read right off the lattice the fact that F conjoined with anything is F and that T disjoined with anything is T. You can check that this lattice encapsulates all the assignments in the tables for ∧ and ∨ given just above. Thus, one can simply refer to the Hasse diagram for the lattice L4 in lieu of referring to the tables for conjunction and disjunction. The lattice ordering expressed by the Hasse diagram induces a semantics for all wffs in the language, defined inductively as follows:

s(A ∧ B) = s(A) ∧ s(B),
s(A ∨ B) = s(A) ∨ s(B),
s(¬A) = ¬s(A),

where, here, A and B can be any wffs whatsoever; i.e., they are not restricted to atomic wffs.
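To make the inductive definition concrete, the following Python sketch (our illustration; the representation of wffs as nested tuples and all names are our own choices) computes s for arbitrary wffs by taking ∧ as meet and ∨ as join in L4, given a setup assigning one of the four values to each atomic sentence.

```python
# Evaluating wffs over a setup, with ∧ as meet and ∨ as join in L4.
VALUES = ["F", "None", "Both", "T"]
# The L4 order: F is the bottom, T is the top, None and Both incomparable.
LE = {(a, b) for a in VALUES for b in VALUES if a == b or a == "F" or b == "T"}
NEG = {"F": "T", "T": "F", "None": "None", "Both": "Both"}

def meet(a, b):
    lbs = [x for x in VALUES if (x, a) in LE and (x, b) in LE]
    return next(x for x in lbs if all((y, x) in LE for y in lbs))

def join(a, b):
    ubs = [x for x in VALUES if (a, x) in LE and (b, x) in LE]
    return next(x for x in ubs if all((x, y) in LE for y in ubs))

def s(wff, setup):
    """Truth value of a wff, e.g. ("or", ("not", "A"), "B"), given a setup."""
    if isinstance(wff, str):                    # atomic sentence
        return setup[wff]
    op, *args = wff
    if op == "not":
        return NEG[s(args[0], setup)]
    if op == "and":
        return meet(s(args[0], setup), s(args[1], setup))
    if op == "or":
        return join(s(args[0], setup), s(args[1], setup))
    raise ValueError("unknown connective: " + op)

setup = {"A": "Both", "B": "None"}
print(s(("or", ("not", "A"), "B"), setup))  # ¬Both ∨ None = Both ∨ None = T
```

The meets and joins computed this way agree cell-by-cell with Tables 9.1 and 9.2, which is just the statement that the tables and the lattice L4 encode the same information.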
9.9 Intuitive Descriptions of the Four-Valued Logic Semantics

The semantics defined above does not rely on any particular facts or insights about the meaning in natural language of the connectives "¬," "∧," or "∨." A formal logic is neutral about how the English words "not," "and," or "or" ought or ought not to be interpreted. The semantics presented in the previous sections could be presented in an abstract manner without reference to how the connectives are used in any natural language. However, since the intent of philosophical logic is often to discern the structure of good reasoning at least in part from understanding discourse in natural languages, it may be of interest to reflect upon the intuitive basis for the rules our computer employs, i.e., for how questions ought to be answered on the basis of information reported to our artificial reasoner.

We already examined the intuitive notions we drew upon for how negation ought to work. There is also a nice way to express the intuitions underlying the above rules for conjunction and disjunction. Below are summaries of intuitive statements of the rules expressed in the tables and by the logical lattice above. The intuitive rules below are expressed in terms of how much information a user has, so the phrase "at least" is with respect to the ordering we used for the approximation lattice A4. We emphasize once again that the logical lattice L4 and the approximation lattice A4 are different lattices, even though they have the same elements. They are different partial orders on the same set. To make this clear, and to exhibit what is the same and what is different, a diagram showing the lattice A4 and the lattice L4 side by side is provided
in Figure 9.2; studying Figure 9.2 until you can produce it yourself and explain it is strongly recommended.
        Both                              T
       /    \                           /   \
      T      F                      None     Both
       \    /                           \   /
        None                              F

  Approximation Lattice A4          Logical Lattice L4

"Approximates the information in" is the relation going "uphill" in the A4 diagram; Both is at the top and contains the most information, and None is at the bottom and contains no information. "Presence of true and absence of false" is the relation going "uphill" in the L4 diagram; T is at the top ("most true" and "least false"), and F is at the bottom ("most false" and "least true").

Notice that the lattice A4 and the lattice L4 are different partial orders on the same set 4 ({T, F, Both, None}); they are ordered by different relations. The elements of the set 4 are as follows:
T is {"told true"}, that is, "told true and not ever told false."
F is {"told false"}, that is, "told false and not ever told true."
Both is {"told true," "told false"}, that is, "told true and told false."
None is { }, the empty set, that is, "not ever told true nor ever told false."

Figure 9.2. The Lattice A4 and the Lattice L4.
The intuitive statements of the rules corresponding to the information in the logical lattice L4 and Tables 9.1 and 9.2 are as follows:

• Conjunction (of wffs A and B):
  • Mark (A ∧ B) with at least told True just in case both A and B have been marked with at least told True.
  • Mark (A ∧ B) with at least told False just in case at least one of A and B has been marked with at least told False.
• Disjunction (of wffs A and B):
  • Mark (A ∨ B) with at least told True just in case at least one of A and B has been marked with at least told True.
  • Mark (A ∨ B) with at least told False just in case both A and B have been marked with at least told False.
The two rules for conjunction completely determine how to mark conjunctions of any two wffs that have received truth value assignments, and the two rules for disjunction completely determine how to mark disjunctions of any two wffs that have received truth value assignments. These rules are not needed by an artificial reasoner, of course; the tables suffice for an artificial reasoner. They may be helpful for human reasoners, though. The fact that these intuitive rules give results that coincide with those in the table, which were motivated using different principles than these rules, may also be taken as some sort of confirmation that the choices made in defining the connectives via the tables presented in the previous section are well motivated.
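The intuitive marking rules translate directly into operations on the sets of "told" marks, and one can confirm that they reproduce Tables 9.1 and 9.2 exactly. Here is a minimal Python sketch (our illustration, with names of our own choosing).

```python
# The intuitive rules for ∧ and ∨, computed on the sets of "told" marks.
TT, TF = "told true", "told false"
VALUES = {"None": frozenset(), "T": frozenset({TT}),
          "F": frozenset({TF}), "Both": frozenset({TT, TF})}
NAME = {v: k for k, v in VALUES.items()}

def conj(a, b):
    marks = set()
    if TT in a and TT in b:          # told true iff both told true
        marks.add(TT)
    if TF in a or TF in b:           # told false iff at least one told false
        marks.add(TF)
    return frozenset(marks)

def disj(a, b):
    marks = set()
    if TT in a or TT in b:           # told true iff at least one told true
        marks.add(TT)
    if TF in a and TF in b:          # told false iff both told false
        marks.add(TF)
    return frozenset(marks)

# Print the full tables; they coincide with Tables 9.1 and 9.2.
for x in VALUES:
    row = ["{} ∧ {} = {:4}".format(x, y, NAME[conj(VALUES[x], VALUES[y])])
           for y in VALUES]
    print("   ".join(row))
```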
Exercise on 9.9

1. Recall that a lattice is a special case of a partially ordered set, and a partially ordered set is defined in terms of a set and a relation that induces a partial order on it. Let 2 = {told true, told false}. Let 4 be the powerset of 2, i.e., 4 = { { }, {told false}, {told true}, {told true, told false} }. We named the elements of 4 as follows: None = { }; F = {told false}; T = {told true}; Both = {told true, told false}.
(i) Draw the lattice A4, the "approximation lattice" for the set 4, without referring to the diagrams provided in the text. Also state the relation on 4 for this lattice. Prove that A4 satisfies the definition of a partially ordered set.
(ii) Draw the lattice L4, the "logical lattice" for the set 4, without referring to the diagrams provided in the text. Also state the relation on 4 for this lattice. Prove that L4 satisfies the definition of a partially ordered set.
(iii) Does the relation in (ii) induce a partial order on the set 2? Is it a lattice? Support your answers.
9.10 Inferences and Valid Entailments

We now have a semantics that enables our automatic reasoner to assign truth values to any wff given the truth values of all the atomic formulas in those wffs. But that's not all we need to do logic. We want our artificial reasoner to be able to make inferences, and we wish it to make only good inferences. Thus, we need to be able to evaluate inferences made in this four-valued semantics.

It is not immediately straightforward how one ought to use this four-valued logic to evaluate inferences, i.e., to distinguish between valid and invalid inferences. In classical propositional logic, we used truth tables to find truth valuations of wffs, and an inference was valid if it was impossible for the conclusion of the inference to be false and the premises to be true. As discussed in Part 1, there is a way of using truth tables to determine whether or not an inference in classical propositional logic is valid: if the implication whose antecedent is the conjunction of the premises and whose consequent is the conclusion is a tautology, then the inference is valid. The definition of a tautology in classical propositional logic is a wff that is true for all interpretations. Put informally, a tautology "comes out as T on all the rows of the truth table."

Might we apply that criterion here? The answer is no. That criterion doesn't apply for our four-valued logic, mainly because the value T in our four-valued logic is "told true," and it has neither the significance nor the function that the value T of "truth" has in two-valued classical propositional logic. Indeed, in this four-valued semantics there are no wffs that are always assigned the value T, so there is no analogue of a tautology to appeal to. Another difference between this logic semantics and the two-valued semantics for classical propositional logic is that in our logic semantics, there is no truth value assigned to an entailment that is determined by the truth tables for the other connectives. As discussed in the section on material implication and the disjunctive syllogism, the entailment connective is not reducible to the other connectives, whereas material implication is reducible to the other connectives (i.e., A → B is equivalent to not-A or B if the arrow is taken to be the connective for material implication). Yet, even though we are not able to assign truth values to an entailment in the same manner as we can for the other connectives using the four-valued truth tables developed above, we might still ask: Is there a way to generalize the criterion for valid entailments so that there is an appropriate criterion that does apply? The answer to that question is yes.
The line of reasoning that leads to the generalization of the criterion goes as follows. One way to describe the criterion for a valid inference in classical logic is that a valid inference never "takes you from truth to falsehood"; by this is meant that, if the premises of an inference are true, and the inference has a valid form, the conclusion of the inference will not be false. Here we have four truth values, none of which functions exactly like the truth value True of two-valued classical logic, and none of which functions exactly like the truth value False of two-valued classical logic. However, the logical lattice does provide a partial ordering on our four truth values which, we pointed out above, could be thought of as "more true than and less false than." Speaking loosely, in terms of the graphical depiction of the logical lattice L4 shown above, going "upwards" (vertically), or "uphill," in the logical lattice corresponds to being closer to the truth, i.e., to being "more true." More accurately, since we are talking about epistemic values, we can think of these truth values in terms of evidence of being true or evidence of being false, or "the presence of the true and the absence of the false."

It turns out that an appropriate criterion for a valid inference in this four-valued logic is, to put it informally, that an inference never take one farther away from the truth, i.e., never take one to a conclusion that is "less true" or "more false" than the premises. It is this criterion that functions as the appropriate analogue of the classical criterion of "never taking one from truth to falsehood" in our four-valued logic. In terms of the graphical depiction of the logical lattice L4 above, the criterion for a valid inference is that the inference only "go uphill" in L4, if it moves at all. The criterion that the inference never take one from a place higher in the lattice L4 to a place lower in the lattice L4, i.e., that an inference "never goes downhill," is the generalization we seek.

We can use the criterion that a valid inference is one that never takes one downhill in the logical lattice L4 (along with the truth tables for negation, disjunction, and conjunction in our four-valued logic) to sort through entailments and identify some valid ones and some invalid ones. Evaluating whether a wff whose main connective is the arrow is a valid entailment becomes a completely straightforward matter of computation. The computation is carried out using the tables above to compute the truth values of the antecedent and the consequent of a given conditional for all possible combinations, and then using the logical lattice to determine whether the consequent is either the same truth value as, or higher in the lattice L4 than, the antecedent. We work out a simple example with one propositional variable to illustrate:
Example: p ∧ ¬p → p

 p    | ¬p   | p ∧ ¬p | Antecedent/Consequent of p ∧ ¬p → p | Direction in L4
 F    | T    | F      | F / F                               | Not downhill
 Both | Both | Both   | Both / Both                         | Not downhill
 None | None | None   | None / None                         | Not downhill
 T    | F    | F      | F / T                               | Not downhill
The truth table for an entailment with two propositional variables will be more involved in that it will have 16 rows, rather than 4, but the computation is straightforward. It is easily verified that p → p ∨ q is valid. It is also straightforward to show that the theorem schemas corresponding to the paradoxes of implication, p ∧ ¬p → q and p → q ∨ ¬q, are not semantically valid. Nor is (p ∨ q) ∧ ¬p → q, the schema corresponding to the disjunctive syllogism, semantically valid. We mentioned earlier that we would explain how to show that (A ∨ B) ∧ ¬A → (A ∧ ¬A) ∨ B is semantically valid. The reader can now verify that the theorem schema corresponding to the wff (p ∨ q) ∧ ¬p → (p ∧ ¬p) ∨ q is semantically valid according to the criterion just articulated. These are left as exercises.
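For readers who want to check such claims mechanically rather than by hand, the following Python sketch (our illustration; the tuple representation of wffs and all names are our own) enumerates every assignment of the four values to the atoms and tests that the antecedent-to-consequent step never goes downhill in L4.

```python
# Testing semantic validity of entailments: never downhill in L4.
from itertools import product

VALUES = ["F", "None", "Both", "T"]
LE = {(a, b) for a in VALUES for b in VALUES if a == b or a == "F" or b == "T"}
NEG = {"F": "T", "T": "F", "None": "None", "Both": "Both"}

def meet(a, b):
    lbs = [x for x in VALUES if (x, a) in LE and (x, b) in LE]
    return next(x for x in lbs if all((y, x) in LE for y in lbs))

def join(a, b):
    ubs = [x for x in VALUES if (a, x) in LE and (b, x) in LE]
    return next(x for x in ubs if all((x, y) in LE for y in ubs))

def s(wff, setup):
    if isinstance(wff, str):
        return setup[wff]
    op, *args = wff
    if op == "not":
        return NEG[s(args[0], setup)]
    f = meet if op == "and" else join
    return f(s(args[0], setup), s(args[1], setup))

def atoms(wff):
    if isinstance(wff, str):
        return {wff}
    return set().union(*(atoms(part) for part in wff[1:]))

def valid(antecedent, consequent):
    """Valid iff s(consequent) is never lower than s(antecedent) in L4."""
    names = sorted(atoms(antecedent) | atoms(consequent))
    return all((s(antecedent, dict(zip(names, vals))),
                s(consequent, dict(zip(names, vals)))) in LE
               for vals in product(VALUES, repeat=len(names)))

p_and_not_p = ("and", "p", ("not", "p"))
print(valid(p_and_not_p, "p"))                              # True
print(valid(p_and_not_p, "q"))                              # False: paradox of implication
print(valid(("and", ("or", "p", "q"), ("not", "p")), "q"))  # False: disjunctive syllogism
print(valid(("and", ("or", "p", "q"), ("not", "p")),
            ("or", p_and_not_p, "q")))                      # True
```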
Exercises on 9.10

1. Using the truth tables for negation, conjunction, and disjunction in the four-valued logic presented in this book, along with the logical lattice L4, apply the criterion that entailment goes only uphill in order to determine which of the following are valid entailments in E. Explain the reasoning used in obtaining your answer.
(a) ¬p ∧ q → q
(b) q ∨ p → p
(c) (p ∧ ¬p) → q
(d) p → (q ∨ ¬q)
(e) (p ∨ q) ∧ ¬p → q
(f) (p ∨ q) ∧ ¬p → (p ∧ ¬p) ∨ q

2. How does the criterion "entailment goes [only] uphill" in E compare to the criterion for a valid entailment in classical logic? Is there a logical lattice for two-valued classical logic? If so, identify the set and partial order that defines the lattice and sketch the graphical depiction of the lattice.
10 Some Concluding Remarks on the Logic of Entailment
Any of the numerous proof systems of the logic of entailment can be used to define the system E; the system FE presented in Part 3 of this book was only one such proof system. Likewise, the semantics presented here is only one of numerous semantics that have been developed for E. As we mentioned, fragments of E can also be defined, and proof systems and logic semantics for some of these fragments have been developed as well. We presented only the propositional calculus for E; the reader may be interested to know that a quantifier logic for E has also been developed.

The literature on the logic of entailment is extensive, and the research has developed in many different directions. There is research in the discipline of pure mathematics at the intersection of logic and algebra, which has been most fruitful.1 There is research in practical applications, as both the proof theory and the four-valued semantics presented here have been employed in artificial intelligence applications (e.g., SNePS); these applications are much more sophisticated than the question-answering application presented here, and are becoming ever more so.2 There is research in the discipline of philosophical logic into developing relevance logic further and proving more things about it. There is even some research in other fields of philosophy (theory of knowledge, epistemology, ontology, and Buddhist philosophy). One research direction that is especially closely related to the paradoxes discussed here is an unresolved debate about the significance of relevance logic to paradoxes in philosophy of science initiated by C. Kenneth Waters's "Relevance Logic Brings Hope to Hypothetico-Deductivism."3

It would probably take dozens, if not hundreds, of pages just to list all the work related to relevance logic that has been done in these
316
Part 3: Philosophical Logic
various areas. Even so, many people who are not already in one of the communities of specialists working in these areas are not cognizant of the simple basics of the nature of the logic of entailment. Even among those who have heard something about “Anderson and Belnap’s relevance logic,” misconceptions of it abound.4 It is hoped that the reader who has taken the time to reflect on the philosophical topics in this part of the book will find himself or herself well equipped to address some of the basic misconceptions to be found, such as the claim that relevance logic is not a formal logic, or that the only possible truth functional connective for implication is the material conditional. More importantly, perhaps, the reader will have gained some insight into, and appreciation of, the kinds of concerns that can be addressed in philosophical logic.
References

[1] Anderson, Alan R., and Belnap, Nuel D. Entailment: The Logic of Relevance and Necessity, Vol. 1. Princeton: Princeton University Press, 1975.
[2] Anderson, Alan R., Belnap, Nuel D., and Dunn, J. Michael. Entailment: The Logic of Relevance and Necessity, Vol. 2. Princeton: Princeton University Press, 1992.
[3] Beall, J. C., and Restall, Greg. Logical Pluralism. Oxford: Clarendon Press, 2006.
[4] Belnap, Nuel D. "Tonk, Plonk, and Plink," Analysis, Vol. 22, No. 6, June 1962, 130–134. Reprinted in Cahn, Steven M. Thinking About Logic. New York: Westview Press, 2011.
[5] Belnap, Nuel D., and Anderson, Alan R. "The Pure Calculus of Entailment," Journal of Symbolic Logic, Vol. 27, No. 1, March 1962.
[6] Bimbó, K. "Relevance Logics," in Philosophy of Logic, ed. D. Jacquette, Vol. 5 of Handbook of the Philosophy of Science, ed. D. Gabbay, P. Thagard, and J. Woods. Amsterdam: Elsevier, 2006, 723–789.
[7] Burgess, John P. Philosophical Logic. Princeton: Princeton University Press, 2009.
[8] Burgess, John P. "No Requirement of Relevance," in The Oxford Handbook of Philosophy of Mathematics and Logic, ed. Stewart Shapiro. New York: Oxford University Press, 2005.
[9] Fitch, Frederic Brenton. Symbolic Logic: An Introduction. New York: Ronald Press Company, 1952.
[10] Haack, Susan. Philosophy of Logics. Cambridge: Cambridge University Press, 1978.
[11] Lewis, Clarence Irving, and Langford, Cooper Harold. Symbolic Logic. New York: Dover, 1932.
[12] Priest, Graham. Logic: A Very Short Introduction. New York: Oxford University Press, 2000.
[13] Restall, Greg. "Substructural Logics," The Stanford Encyclopedia of Philosophy (Spring 2008 Edition), ed. Edward N. Zalta. URL = http://plato.stanford.edu/archives/spr2008/entries/logic-substructural/.
[14] Russell, Bertrand. Principles of Mathematics. 1st, 2nd, and 3rd editions. London and New York: Routledge, 1903, 1938, 1992.
[15] Sanford, David H. If P, then Q: Foundations of Conditional Reasoning, 2nd edition. London and New York: Routledge, 2003.
[16] Shapiro, Stuart C. "SNePS: A Logic for Natural Language Understanding and Commonsense Reasoning," in Natural Language Processing and Knowledge Representation: Language for Knowledge and Knowledge for Language, ed. Lucja M. Iwanska and Stuart C. Shapiro. Menlo Park, CA/Cambridge, MA: AAAI Press/MIT Press, 2000, 175–195.
[17] Waters, C. Kenneth. "Relevance Logic Brings Hope to Hypothetico-Deductivism," Philosophy of Science, Vol. 54, No. 3, 1987, 453–464.
Index
θ-subsumption, 20, 50
ω-consistency, 217
Ackermann's function, 188
addition: LRM-computable, 142; recursive, 169; RM-computable, 126
aids: Aid 1, 20, 49; Aid 2, 21, 49; Aid 3, 21, 49
algorithm, 28; definition of, 100; properties of, 102–103
alphabet: of predicate logic, 31; of propositional logic, 3; of resolution logic, 14; of rewrite systems, 158
Anderson, Alan R., and Nuel D. Belnap, Jr., 230–232, 234, 237–240, 241, 252–254, 257, 277, 282–285, 303
arc: definition of, 22; ancestor, 22; immediate successor, 22; level of an, 22
arrow connective, 228, 234–239, 244, 246, 281, 283
assignment, 33
association-to-the-right, 4
atom, 3
Belnap, Nuel D. See Anderson, Alan R., and Nuel D. Belnap, Jr.
Burgess, John, 231–232, 282n, 286n17, 316n4
busy beaver function, 128
Church-Turing Thesis, 139, 172
classical logic, 22, 223–224, 227–233, 238–239; alternatives to, 223–228, 232–233, 238, 240, 248, 253, 259–263, 271; extensions of, 225, 227, 239; Fitch's Natural Deduction System for, 243–250
clause, 15, 46; empty, 15; far parent, 63; given, 16; ground, 47; near parent, 63; negative, 71; parent, 15; positive, 71; unit, 15
clause ordering, 63, 67; suitable, 69, 73, 75
clause set, 18; ground, 47; unsatisfiable, 19
closed tree: size, 24. See also tree
closure: universal, 39
code: of an instruction, 146, 198; of a program, 146, 199; of a P-computation that halts, 199–200; of a w-step, 199
coding and decoding functions and relations, 191–193
coding system: primitive recursive, 191
completeness, 14
Completeness Theorem: first-order, 50, 57, 60; OI-resolution, 73, 76; OSL-resolution, 69; propositional, 24
Gödel Incompleteness Theorem, 218
Gödel numbering: computations, 199; discussion of, 104–105; instructions and programs, 146, 199
Graph Theorem, 213
Haack, Susan, 226n3
Halting Problem, 99, 145; for a fixed program, 151; semi-decidable, 116
Herbrand, 51; term, 52; universe, 51
Herbrand-Gödel Theorem, 51
hierarchy of connectives, 4, 32
Hilbert's Decision Problem, 97, 156–158; semi-decidable, 117
Hilbert's Tenth Problem, 96, 184–185; semi-decidable, 116
HLT, 147
Horn: Alfred, 62; clause, 69; definite clause, 69; wff, 69
hypothesis, 13
implication, 5, 226, 229, 241–244, 249–250, 253. See also material implication
induction: complete, 81; mathematical, 81; on partial recursive functions, 208; on primitive recursive functions, 191; on recursive functions, 168
inference system: complete, 17
infix notation, 34, 86
instructions: for the limited register machine, 141; for the register machine, 124
interpretation, 7, 33
K, 147, 183, 197
Kleene computation relation Tn, 197
Kleene Normal Form Theorem, 210
label: ancestor, 22; immediate successor, 22; level of a, 22
lattice, 297–298; definition of, 301
Lattice A4, 294, 303–304, 310
Lattice L4, 308–312
level: of a labeled binary tree, 22
Lewis, C. I., 238–239, 282n4
Lifting Lemma, 53, 60
Lifting Theorem for Resolution, 60
list: initial, 23
literal: definition of, 15, 46; ground, 47; resolving, 16
literals: complementary, 15
logic: affirmative, 13; deductive, 3, 13; first-order, 31; refutation-based, 14. See also classical logic
logical consequence, 8
logical equivalence, 8
logical pluralism, 233
logical symbols, 3, 31
LRM-computable function, 141
machine model of computation, 123–124
material implication, 8, 234, 238, 281. See also Official view
mathematical model of computation, 165
metalogic, 223–224
minimalization: bounded, 179–180; of an n+1-ary: partial function, 206; regular function, 112; regular computable function, 112; regular RM-computable function, 150
model, 8
monotonic function, 302–304
most general unifier (mgu), 42; algorithm for, 43
multiplication: LRM-computable, 142; recursive, 169; RM-computable, 133
negation, 271–273, 281, 295–296
negation-as-failure, 85
node: ancestor, 22; definition of, 22; immediate successor, 22; level of a, 22
nonclassical logics, 223–225
noncomputable function: existence of, 109
nonlogical symbols, 3, 32
O-clause, 63, 67; top, 63, 68–69, 73, 79
occurs check, 43, 85
Official view, 234, 236, 238, 258
OI-resolution logic: definition of, 69–70; Completeness Theorems for, 73, 76; Soundness Theorem for, 76
operations on computable functions: composition, 110; minimalization, 112; primitive recursion, 110
operations on relations, 114, 176
operations on RM-computable functions: composition, 137; minimalization, 150; primitive recursion, 137
ordering rule, 63. See also clause ordering
OSL-resolution logic: definition of, 62–63, 67; Soundness and Completeness Theorem for, 69
output: of P-computation, 125
paradoxes of implication, 228–231, 238, 248
partial recursive functions: enumeration of n-ary, 211
partially ordered set, 297–301
P-computation, 125, 155–156
Post-Markov solution of Word Problem, 159, 162
predicate, 31; induction, 81
prefix: of a deduction, 16; notation, 86
prenex normal form, 39, 56
Priest, Graham, 237n
primitive recursion, 110, 137
product rules: bounded, 171
program: COPY(j,k), 131; join of, 133; k-special, 128; MOVE(j,k), 132; P[k,w,z], 136; PR-representable, 189; for the register machine, 124; standard form, 133; ZERO[j,w], 136
projection functions: definition of, 165; RM-computable, 132; starting function, 165, 188
Prolog: example programs in, 85–91; relationship with OI-resolution, 77; systems of: C-Prolog, 61; SWI-Prolog, 61
proof procedure, 41
proper subset, 20
proper subtraction: definition of, 108; recursive, 170; RM-computable, 127
quantifier: existential, 32; universal, 32
quantifier rules: bounded, 178
RE relations: enumeration of n-ary, 212
recursive function, 166; partial, 206
refutation, 16. See also deduction of a wff
register machine, 124; computation: informal definition, 125; precise definition, 155–156
relation: characteristic function of, 115; decidable, 114; Diophantine, 184; listable, 120; primitive recursive, 188; recursive, 175; RE, 182; RM-decidable, 144; RM-semi-decidable, 149; RE not recursive, 184, 197; semi-decidable, 116; undecidable, 104
relations: composition of recursive, 176
relevance, 228–230, 232, 234, 236–239, 253–256
Replacement Theorem, 11, 40
resolution: formal system, 14–16, 45–46; input, 69–70; linear, 62; OI-resolution, 69; OSL-resolution, 67; propositional Soundness Theorem, 19; s-linear, 62, 65, 67; unit, 30, 71. See also Completeness Theorem
Restall, G., 226n4
restriction: resolution, 20
rewrite system, 158
RM-computable functions: composition of, 137
Robinson, J. A., 24, 42, 53, 60
rule of inference: resolution, 15, 46
Russell, Bertrand, 234, 237–238
s-resolution, 65–68
s-resolvent, 68
Sanford, David H., 232n11, 233n13, 286n17, 286n18, 287n, 316n
SCF, 54
scope: of a quantifier, 32
Scott's Thesis, 302–305, 307
search aids. See aids
semantic tree: closed, 24; definition of, 23; failure node, 24; inference node, 24
simulation of a program: by a first-order language, 156; by rewrite system, 159
Skolem, Thoralf, 51; conjunctive normal form, 54; function, 56
sort, 34
Soundness Lemma (also Soundness Step), 18
Soundness Theorem: first order, 50, 57; OI-resolution, 76; OSL-resolution, 69; propositional, 19
standard form, 133
starting functions, 165–166
statement, 3; letter, 3
subroutines, 131–132
substitution: instance, 41; most general unifier, 42
sum rules: bounded, 171
System FE, 261–281
tautology, 7
term: as wff component, 32; ground, 47
Thue's word problem: statement of, 96; unsolvability of, 162
top O-clause. See O-clause
tree: binary, 22; branch, 22; closed semantic, 24; labeled binary, 22
truth table, 6
Turing: Halting Problem, 116, 145; solution to Decision Problem, 156–158
undecidable relation: existence of, 104
Unification Lemma, 42, 60
unifier, 41–43
unsatisfiable clause set, 19; minimally: definition of, 68; properties of, 71
unsatisfiable wff, 8
valid entailment. See entailment
valuation, 7, 32–34, 37, 82
variable: bound, 32; free, 32
variable disjoint, 46
Waters, C. Kenneth, 315
well-formed formula. See wff
wff: atomic, 3; closed, 39; first-order, 32; ground, 47; propositional, 3; satisfiable, 8, 39; valid, 7, 39. See also unsatisfiable wff
Word Problem, 96
w-step, 154