English Pages 551 Year 2006
Adam Olszewski, Jan Woleński, Robert Janusz. (Eds.) Church’s Thesis After 70 Years
ontos mathematical logic edited by Wolfram Pohlers, Thomas Scanlon, Ernest Schimmerling, Ralf Schindler, Helmut Schwichtenberg Volume 1
ontos verlag Frankfurt | Paris | Ebikon | Lancaster | New Brunswick
Bibliographic information published by Die Deutsche Bibliothek Die Deutsche Bibliothek lists this publication in the Deutsche Nationalbibliographie; detailed bibliographic data is available in the Internet at http://dnb.ddb.de
North and South America by Transaction Books Rutgers University Piscataway, NJ 08854-8042 [email protected] United Kingdom, Ire Iceland, Turkey, Malta, Portugal by Gazelle Books Services Limited White Cross Mills Hightown LANCASTER, LA1 4XS [email protected]
Livraison pour la France et la Belgique: Librairie Philosophique J.Vrin 6, place de la Sorbonne ; F-75005 PARIS Tel. +33 (0)1 43 54 03 47 ; Fax +33 (0)1 43 54 48 18 www.vrin.fr
2006 ontos verlag P.O. Box 15 41, D-63133 Heusenstamm www.ontosverlag.com ISBN 3-938793-09-0 2006 No part of this book may be reproduced, stored in retrieval systems or transmitted in any form or by any means, electronic, mechanical, photocopying, microfilming, recording or otherwise without written permission from the Publisher, with the exception of any material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use of the purchaser of the work Printed on acid-free paper ISO-Norm 970-6 FSC-certified (Forest Stewardship Council) This hardcover binding meets the International Library standard Printed in Germany by buch bücher dd ag
Contents

Preface

Darren Abramson
  Church’s Thesis and Philosophy of Mind
Andreas Blass, Yuri Gurevich
  Algorithms: A Quest for Absolute Definitions
Douglas S. Bridges
  Church’s Thesis and Bishop’s Constructivism
Selmer Bringsjord, Konstantine Arkoudas
  On the Provability, Veracity, and AI-Relevance of the Church–Turing Thesis
Carol E. Cleland
  The Church–Turing Thesis. A Last Vestige of a Failed Mathematical Program
B. Jack Copeland
  Turing’s Thesis
Hartmut Fitz
  Church’s Thesis and Physical Computation
Janet Folina
  Church’s Thesis and the Variety of Mathematical Justifications
Andrew Hodges
  Did Church and Turing Have a Thesis about Machines?
Leon Horsten
  Formalizing Church’s Thesis
Stanisław Krajewski
  Remarks on Church’s Thesis and Gödel’s Theorem
Charles McCarty
  Thesis and Variations
Elliott Mendelson
  On the Impossibility of Proving the “Hard-Half” of Church’s Thesis
Roman Murawski, Jan Woleński
  The Status of Church’s Thesis
Jerzy Mycka
  Analog Computation and Church’s Thesis
Piergiorgio Odifreddi
  Kreisel’s Church
Adam Olszewski
  Church’s Thesis as Formulated by Church — An Interpretation
Oron Shagrir
  Gödel on Turing on Computability
Stewart Shapiro
  Computability, Proof, and Open-Texture
Wilfried Sieg
  Step by Recursive Step: Church’s Analysis of Effective Calculability
Karl Svozil
  Physics and Metaphysics Look at Computation
David Turner
  Church’s Thesis and Functional Programming

Index
Preface

The 1930s became the golden decade of mathematical logic. A new generation of logicians, including Kurt Gödel, Alfred Tarski, Jacques Herbrand, Gerhard Gentzen, Arend Heyting, Alonzo Church and Alan Turing, joined older masters such as Hilbert, Brouwer and Skolem. The received paradigm of logic, emphatically and optimistically expressed by Hilbert’s famous phrase “Wir müssen wissen, und wir werden wissen”, had to be replaced by a more limited one, a change forced by the discoveries of incompleteness, undefinability and undecidability. In 1936 three men, Church, Post and Turing, almost simultaneously and mutually independently, proposed the identification of the intuitive concept of computability with the mathematical concept of recursiveness. Although this proposal had three fathers, it was baptized as Church’s thesis (CT), and this label has been preserved to this day. The thesis became one of the conceptual highlights of all time, at least in the area of logic and the foundations of mathematics. It is, in fact, a very rare case in history that a proposal with all the features of a definition becomes as dexterous as Church’s thesis. On the one hand, almost everyone accepts it as very satisfying; on the other hand, it is continuously discussed by mathematicians, logicians, philosophers, computer scientists and psychologists. The discussion is many-sided and concerns the evidence for Church’s thesis, its history, its various formulations, possible objections, and its role in mathematics and applications in philosophy. This collection of papers, mostly commissioned specifically for the present volume, is intended as a testimony to the great significance and fascinating story of the statement that computability is equivalent to recursiveness.
Our intention was to focus on the following topics: Church’s formulation of his thesis and its interpretations, different formulations of CT, CT and intuitionism, CT and intensional mathematics, CT and physics, the epistemic status of CT, CT and philosophy of mind, the (dis)provability of CT, and CT and functional programming. Yet we have decided to publish these articles in the alphabetical order of the authors’ names. We hope that the result does proper justice to this beautiful landmark in the history of conceptual analysis. However, we cannot abstain from reporting an anecdote. We began our attempts to start this project with an offer to one of the leading world publishers, hoping that the password ‘Church’s thesis’ would open every door. Our query was forwarded to their religious (sic!) department, which, in turn, did not find the book interesting enough to publish. Habent sua fata libelli. Fortunately, Ontos Verlag had no problem with finding out the proper meaning of ‘Church’s thesis’.
Acknowledgments

We are also grateful to the following publishing houses for giving permission to reprint the listed papers:

1. A K Peters Ltd: for P. Odifreddi, “Kreisel’s Church”.
2. The Association for Symbolic Logic: for W. Sieg, “Step by Recursive Step: Church’s Analysis of Effective Calculability”.
3. Bulletin of the European Association for Theoretical Computer Science: for A. Blass, Y. Gurevich, “Algorithms: A Quest for Absolute Definitions”.

The Editors
Darren Abramson∗
Church’s Thesis and Philosophy of Mind

1. Introduction

In this paper I examine implications of Church’s Thesis, and related theses, for the philosophy of mind. I will argue that the present indeterminate status of theses concerning what functions physical objects compute has significant import for some arguments over what mental states are. I show, however, that other arguments are not affected by this status, and I argue against claims that we can prove that certain views in philosophy of mind fail by consideration of computability.
∗ D. Abramson, Department of Philosophy, Dalhousie University, Halifax, Nova Scotia, Canada, B3H 4P9.

2. The Church–Turing Thesis and the Chinese Room

B. Jack Copeland has written extensively on the topic he calls ‘hypercomputation’ [Copeland 2002b], occasionally with reference to the philosophy of mind [Copeland 2000; 2002c]. Hypercomputation is referred to as ‘computation beyond the Turing limit’ [Copeland and Sylvan 1999]. A machine that hypercomputes, then, can produce values of a function that is not Turing computable. Copeland’s thesis is that a crucial error has crept into the writings of numerous philosophers of mind that becomes clear upon consideration of hypercomputation. In his writings, Copeland claims to find a widespread philosophical error that concerns the Church–Turing Thesis, often called Church’s Thesis. I will refer to it here as ‘CTT’. Since Copeland takes such great care to distinguish ‘CTT properly so-called’ from other related theses, let us agree to understand by CTT the definition he gives.
CTT: “[...] no human computer, or machine that mimics a human computer, can out-compute a universal Turing machine” [Copeland 2002a, p. 67].

Copeland’s complaint can be easily summarized. The locutions ‘computable by a physical device’ and ‘computable by a machine following an effective procedure’ mean different things, and may describe different sets of functions. Copeland thus distinguishes between CTT and another thesis, which we may call ‘PCTT’ for ‘physical Church–Turing Thesis’.

PCTT: The physically computable functions are a subset of the Turing computable functions.

The so-called ‘Church–Turing Fallacy’ is the conflation of PCTT with CTT. In some of his papers, Copeland cites examples in which, he claims, well known philosophers of mind and cognitive scientists argue as follows. Some proposition P follows from CTT; everyone thinks CTT is true; therefore P. However, Copeland says, the Church–Turing Fallacy is committed insofar as P only follows from PCTT and not CTT.

To better understand what is at stake, let us examine a supposed instance of the fallacy. Copeland claims that Searle’s famous Chinese Room Argument founders on the Church–Turing Fallacy [Copeland 2002c]. Recall the form that Searle’s Chinese Room Argument takes. Its conclusion is that there is no computer program such that a machine can possess understanding merely by implementing it.¹ Searle imagines himself, a philosopher who speaks no Chinese, and an implementation of a program which supposedly can take as inputs Chinese stories and then understand them, interacting in an enclosed room. Searle imagines himself carrying out the program’s instructions inside the enclosed room and observes that neither he, nor the room, understands Chinese, regardless of what people outside the room may think.

¹ Note that this thesis is consistent with computational functionalism, which states merely that implementing some programs is sufficient for having mental states. Whether mental states count as understanding may depend on facts other than computational ones.
We will now look a little closer at aspects of Searle’s argument relevant to CTT. The so-called ‘many-mansions reply’ consists of the claim that Searle has merely shown that existing computers do not understand, and there may come a point in the future at which other, human-constructed machines do in fact think, perhaps due to novel causal powers they possess. Here is Searle’s response to the many-mansions reply.

    The interest of the original claim made on behalf of artificial intelligence is that it was a precise, well defined thesis: mental processes are computational processes over formally defined elements. I have been concerned to challenge that thesis. If the claim is redefined so that it is no longer that thesis, my objections no longer apply because there is no longer a testable hypothesis for them to apply to. [Searle 1980, p. 197]²
Of considerable relevance to our discussion is the ambiguity in the word ‘computational’ in Searle’s original article. A charitable interpretation, given Copeland’s concern over misinterpreting CTT, is ‘effectively computable’. However, Copeland points out that Searle’s later writings ask us to read the Chinese Room Argument as concerning any form of computation, where computation simpliciter is “the manipulation of formal symbols” [Copeland 2002c, p. 120]. Let us generalize the argument to take into account arbitrary manipulations of symbols by supposing that instead of a program, we are considering any rule-governed relationship between inputs and outputs that may be implemented in physical matter. Then, successfully constructing the thought experiment requires the assumption that John Searle could effectively follow the functional specification associated with understanding Chinese. If Searle is limiting his view to effectively computable functions, then his assumption, that he could carry out an arbitrary implementation of such a function, would be justified. After all, ‘effectively computable’ simply describes those functions that a person, with enough time, paper, and pencil, can compute. However, as we have seen, he takes ‘functional specification’ to be more general than ‘effectively computable’. He takes it to mean ‘any symbol manipulation that may occur’. So, to conclude that the symbol manipulations which functionalism claims to underlie cognition can be effectively computed, Searle needs PCTT. We might think that some intermediate thesis is required, for example that all of the ‘humanly computable’ functions are Turing computable, where a humanly computable function is one that a person can compute, effectively or otherwise. However, this thesis is too weak, since the functions that underlie cognition may not be computable by humans.³ So, Copeland says, Searle commits the Church–Turing Fallacy. If PCTT is false, which it very well may be, then some physically computable functions might not be Turing computable. Then, from CTT, it follows that some physically computable functions are not effectively computable, by John Searle or anyone else. So, the Chinese Room Argument proves nothing about whether cognition really is formal symbol manipulation. This is, in essence, Copeland’s argument.

A first response that one might make is that Searle really is, despite lack of care in some places, concerned with what Turing-equivalent computers can do. After all, in the article that introduced the Chinese Room Argument, Searle [1980] is explicitly interested in evaluating the claims of contemporary practitioners of ‘Strong AI’, those who believe that the programs they are writing to be run on mainframe and desktop computers do really understand stories, and can reason. For the time being, though, let us agree with Copeland’s exegesis: Searle intends his argument against computational functionalism in general. After all, irrespective of Searlean hermeneutics, it is of interest to see if some kind of computational functionalism can survive the Chinese Room Argument.

Notice that Copeland’s analysis leads in the right direction, but ignores a key property of Searle’s argument: its thoroughgoing modal character. Copeland says that Searle needs PCTT to be able to pick an arbitrary function, and then imagine himself computing it in the Chinese room. However, he needs the following stronger thesis: $\Box$PCTT.

² Pagination follows the reprint of Searle’s article in [Haugeland 1997].
³ We can easily imagine an epistemic limitation on our ability to ascertain what functions biological tissue computes, for example.

Thought experiments work by showing us what is possible. If we claim a necessary identity and are then shown that this breaks down in a possible world, we must revise our identity claim. However, we just don’t know if PCTT is true. We construct a thought experiment to investigate what it’s like to compute whatever functions underlie cognition. To do so, we must assume that PCTT holds in the arbitrary possible world we are considering. So, the Searlean strategy requires $\Box$PCTT, not merely PCTT or $\Diamond$PCTT.

However, the bare PCTT is already a modal claim. The claim that all physically computable functions are Turing computable presumably does not trivially fail in a finite universe. In other words, we take ‘physically computable’ to be some sort of augmentation of ‘effectively computable’. In investigating the class of effectively computable functions we see, as Turing did, what can be done with access to unbounded quantities of time, pencil, and paper. When we investigate PCTT, presumably, we ought still to consider unbounded access to resources. We may add, therefore, to any physical computation an unbounded number of steps which involve use of physical devices and/or substances, say, chunks of radium and geiger counters. Now, suppose that all the radium and geiger counters in the universe are made inaccessible, by being thrown into a black hole. In investigating PCTT we would still want to know what we could compute with access to objects consistent with the laws of physics. So, we would ignore the contingent unavailability of radium and geiger counters and reason over their use in deciding if PCTT were true. Notice that widespread agreement that all people are mortal does not prevent us from thinking that people can add, and the adding function has an infinite domain. If it turns out that effective methods do not capture all the methods at work in physical computation, then we must treat physical methods in the same unbounded fashion with which we treat effective methods.

If we take PCTT to be ‘intrinsically modal’ in the sense described above, and subscribe to popular views of rigid designation and scientific law, PCTT implies $\Box$PCTT. For, suppose it is the case that all the functions which are physically computable are Turing computable.
By an externalist theory of reference, we take ‘the laws of physics’ to determine a set of possible worlds such that anything that is consistent with the laws of physics in our world is consistent with the laws of physics in those worlds. If it turns out that nothing can travel faster than the speed of light in this world, then in no physically possible world can anything travel faster than the speed of light. The same goes for physical computability.

Of course, if we construe PCTT non-modally, or at least with only the same modal relaxations as CTT, then we get different results. Then, Searle really does need $\Box$PCTT. For, suppose that there is some physical property P which human brains need unbounded access to in order to compute the functions underlying, say, Chinese understanding. Suppose also that, in the limit, measuring P results in an uncomputable series of values. Then assuming that PCTT is true won’t permit us to conclude that in other possible worlds what is physically computable is Turing- and effectively computable. For, the actual world might have finite physical resources, while some possible worlds might have infinite physical resources. With deference to Copeland, I find it most convenient to simply read PCTT in the thoroughgoing modal manner described above.

As we will see very shortly, the matters of rigid designation and possible worlds are of central importance to understanding the impact of hypercomputation on the philosophy of mind. After looking at another argument involving conceivable worlds and functionalism, I will be in a position to offer a new, relaxed thesis: a generalized computational functionalism.
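The theses at issue in this section can be set side by side schematically. The following is only a sketch under stated assumptions: the symbols $\mathrm{EC}$, $\mathrm{TC}$ and $\mathrm{PC}_w$ (the effectively computable functions, the Turing computable functions, and the functions physically computable in world $w$), the actual world $@$, and the sets of worlds $W$ and $W_{\mathrm{phys}}$ are notation introduced here for illustration, and the set-inclusion reading of CTT is a rough paraphrase of Copeland's formulation rather than his own wording.

```latex
% Requires amsmath and amssymb. All symbols are illustrative assumptions.
\begin{align*}
\text{CTT:} &\quad \mathrm{EC} \subseteq \mathrm{TC} \\
\text{PCTT:} &\quad \mathrm{PC}_{@} \subseteq \mathrm{TC} \\
\Diamond\text{PCTT:} &\quad \exists w \in W\; (\mathrm{PC}_{w} \subseteq \mathrm{TC}) \\
\Box\text{PCTT:} &\quad \forall w \in W\; (\mathrm{PC}_{w} \subseteq \mathrm{TC})
\end{align*}
% Modal reading: W_phys is the set of worlds that share the actual laws of
% physics, and physical computability is taken to be fixed by those laws.
\begin{align*}
&\forall w \in W_{\mathrm{phys}}\; (\mathrm{PC}_{w} = \mathrm{PC}_{@}) \\
&\text{hence}\quad \mathrm{PC}_{@} \subseteq \mathrm{TC}
  \;\Longrightarrow\; \forall w \in W_{\mathrm{phys}}\; (\mathrm{PC}_{w} \subseteq \mathrm{TC})
\end{align*}
```

With $W$ restricted to $W_{\mathrm{phys}}$, the conclusion of the second display is $\Box$PCTT. On the non-modal reading the first premise of that display is unavailable, since worlds may differ in their physical resources, and $\Box$PCTT has to be assumed separately.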
3. A Second Modal Argument

In this section I examine a more recent argument that attempts to show, on the basis of the purported failure of a version of CTT, that computational functionalism is false. In their recent book, Bringsjord and Zenzen [2003] make this claim. However, since their goal is to convince their reader that at least part of what constitutes our mental life is hypercomputation, they are sensitive to issues of computability. Following Searle’s [1992] lead, they argue that, given certain a priori considerations, these three propositions are not cotenable.

1. Persons are material things, viz., their brains (or some proper part of their nervous systems).
2. There is a conceptual or logical connection between P-consciousness, i.e. phenomenal consciousness, as in [Block 1995], and the structure of, and information flow in, brains. Necessarily, if a person $a$ enjoys a stretch of P-consciousness from $t_i$ to $t_k$, this consciousness is identical to some computation $c$ from $t_i$ to $t_k$ of some Turing machine or other equivalent computational system $m$, instantiated by $a$’s brain.
3. A person’s enjoying P-consciousness is conceptually or logically distinct from that person’s brain being instantiated by a Turing machine running through some computation.

With Searle [ibid.], they note that the following scenario both seems plausible, and demonstrates that these three propositions cannot be simultaneously held. A neuroscientist incrementally replaces portions of my brain with little silicon-based computers designed to mimic the information flow of brain matter perfectly. As bits of brain matter are replaced, conscious experience disappears bit by bit, but my outward behavior remains the same. Bringsjord and Zenzen argue that, if this scenario seems possible, then we must reject the second proposition above, and thereby reject computationalism.

Here is their argument in a nutshell. The second proposition tells us that in all possible worlds, having a brain with the right Turing-computable information flow implies the possession of particular conscious experiences. However, this is incompatible with the third proposition above, and the plausible claim that conceivability is a guide to possibility. The possible scenario described shows the conceptual distinction between P-consciousness and instantiating a Turing machine running through some computation.

What happens, however, when we remove the second clause of the second proposition above? Suppose we simply hold that ‘there is a conceptual or logical connection between P-consciousness and the structure of, and information flow in, brains’. Given the truncated version of the second proposition, it seems at first that we can employ precisely the same scenario as before to reject the claim of necessary identity between objects instantiating the right information flow, and objects having P-consciousness. However, Copeland’s lesson against the Chinese Room Argument can be recast. For, suppose that the information flow instantiated in our brains is not recursive.
Consider an arbitrary set S that is not recursively enumerable. Now, suppose that if computational neuroscientists could discover the truth of the matter, they would find that a fundamental property of the brain is to receive summed inputs from afferent neurons, convert that input to a real number r, and fire to an efferent neuron if and only if the initial, non-repeating decimal value of r is in S. First, this is an exemplary instance of ‘structured information flow’. Second, despite commitments we may have against PCTT, this is at least as plausible as any zombie scenario. Finally, notice that we cannot even construct the zombie scenario once we admit that this outlandish, mathematical characterization of brain activity might be true. For, all the computers we know how to build are implementations of Turing-complete architectures. In other words, any set of natural numbers decidable by our bits of silicon is decidable by a Turing machine. So, we can deny that the scenario involving the surgeon and his replacement parts picks out any possible world. Once we observe that recursion theory shows us the conceptual possibility of physical structures that implement non-Turing computable functions, both zombies and the Chinese Room fail alike as counterexamples to a generalized form of computational functionalism.
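The reasoning about the hypothetical neuron can be spelled out in a few lines. This is a sketch, not a result taken from the literature under discussion: the coding function $d$, the firing function $\mathrm{fire}$, the candidate silicon replacement $g$, and the simplifying assumption that every natural number occurs as $d(r)$ for some input $r$ are all introduced here for illustration.

```latex
% Requires amsmath and amssymb. d, fire, g and the surjectivity of d are assumptions.
\begin{align*}
&S \subseteq \mathbb{N} \text{ is not recursively enumerable, hence not decidable.} \\
&d(r) \in \mathbb{N} \text{ codes the initial, non-repeating decimal segment of } r. \\
&\mathrm{fire}(r) = 1 \iff d(r) \in S. \\
&\text{Suppose a silicon part computes a Turing computable } g : \mathbb{N} \to \{0,1\} \\
&\quad \text{with } g(d(r)) = \mathrm{fire}(r) \text{ for every input } r. \\
&\text{Then } g(n) = 1 \iff n \in S \text{ for all } n \in \mathbb{N},
  \text{ so } S \text{ is decidable: contradiction.}
\end{align*}
```

Under these assumptions no part built on a Turing-equivalent architecture reproduces the firing behavior on all inputs, which is why the replacement scenario cannot be constructed once the supposition about S is granted.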
4. Thought Experiments and Philosophy of Mind

Careful attention to CTT and related theses does not solve all of the problems we might have in holding computational functionalism. To show this, I will invoke Kripke’s arguments concerning rigid designation of natural kind terms. According to Lycan, Kripke’s comments on identity, necessity, and materialism can be easily modified to concern not only the identity theorists he has in mind in his writings, but functionalists also [Lycan 1974]. Lycan considers the now quaint machine-state functionalism. However, let us paraphrase, with slight alterations, his application of rigid designators to machine-state functionalism, allowing that having mental states may be identical to the computing of a function that no Turing machine can compute. ‘According to us, my pain is identical with my functional state (of type) $S_p$. “My functional state $S_p$” is just whatever state of me can be construed as obeying the symbolically sensitive method that defines $S_p$’ [see ibid., p. 688 for comparison]. Given the failure of machine-state functionalism, we must offer something other than ‘state of a Turing machine’ for defining $S_p$.

In a moment I will show why I think we cannot answer Lycan’s version of Kripke’s challenge by appeal to the failure of CTT. First, however, we will examine more closely what the above statement of computational functionalism amounts to. I concede that Block’s argument against the validity of any finitely bounded Turing Test shows that we are committed to counting as important for possessing mental abilities “the character of the internal information processing that produces [them]” [Block 1981, p. 5]. However, I am reluctant to use the phrase ‘algorithm’, or especially, ‘effective method’ in defining $S_p$ since we are discussing the possibility that there might be other forms of information processing. So, by consideration of the fact that physical computability might outrun effective computability, we have a proposal for a modified version of computational functionalism. $S_p$ includes any method, effective, physical, or otherwise, that biological systems have available for converting inputs to outputs. I call this a ‘relaxed’ version of computational functionalism because it takes its general form while permitting more computational methods.

While this view seems to deal with certain arguments against traditional computational functionalism, we will now see that it is not immune to others. Recall Kripke’s famous challenge. We can conceive of worlds in which we have pain but instantiate none of the correlates the materialist tells us are identical to pain. Lycan shows us that the same is true for functionalism. Since we fix the reference of pain directly, we cannot claim that we are mistaking conceivable worlds in which there merely appears to be pain for ones in which there actually is pain, at least not in the usual way. When we are told why it is that we can conceive of worlds in which there appears to be, say, heat but not mean molecular kinetic energy, we accept that we fix the reference of heat by something contingently related to the heat itself, namely by our experiences of heat. We can’t use this strategy in the case of pain since we do fix its reference directly. Moreover, there is no difficulty in imagining possible worlds in which we have mental states, but there are only disembodied souls and no machines computing any functions at all.
The move that countered the zombie and Chinese Room cases does not work here because we begin by imagining mental phenomena, which we can do easily, and the thought experiment does not rely on our ability to imagine some other phenomena. I take it that there are live, interesting projects for showing why we are mistaken in taking these apparently conceivable worlds to be possible. I can’t do the topic justice here, but for two examples of an attempt to provide a non-Kripkean (i.e., disanalogous to the heat case) solution to the problem of apparently conceivable worlds as just described, see George Bealer’s paper ‘Mental Properties’ [1994] and the recent book Names, Necessity, and Identity by Christopher Hughes [2004]. Because our imagination limits itself here to the possible existence of disembodied souls (say), we cannot block the crucial step against a theory of mind by invoking the Church–Turing fallacy.

However, we should be optimistic that progress has been made. There is intuitive appeal to the idea that the possible worlds which afford counterexamples to different conditionals must be dealt with in different ways. The first conditional at work in the Chinese Room Argument is that, in any possible world, if I have certain functional properties then I have certain mental ones also. The conditional being examined now is that, in any possible world, if I have mental properties, I have certain computational ones also. Suppose we start with a theory according to which mental states, events, and processes are identical to computational ones. Then, when we conceive of worlds with the computational phenomena but not the mental ones, there should be something about the computational phenomena we have failed to notice. Similarly, when we are dazzled by our apparent ability to imagine the mental phenomena but not the functional ones, it should be the imagined properties of the mental phenomena that reveal our error. So, I claim that the failure to address Kripke’s problem for computational functionalism is to be expected.

Also, it is no small matter to deal directly with threatening thought experiments in which there are all the computational phenomena but none of the mental ones. Suppose we think with Yablo that “[almost] everything in The Conscious Mind [Chalmers 1997] turns on the single claim. The claim is that there can be zombie worlds: worlds physically like our own but devoid of consciousness.” [Yablo 1999, p. 455]. Then we have ammunition against some of the most prominent, recent threats to materialism. Note that the argument presented here against the conceivability of zombie worlds can be applied to other cases including dancing and inverted qualia.
In responding to Yablo, Chalmers tells us “[most commentators on his book] bite the bullet and argue that psychophysical necessities are different in kind from the Kripkean examples, and not explicable by the two-dimensional [semantic] framework.” [Chalmers 1999, p. 477]. With the line of argument being presented, we can avoid bullet biting and take on necessary psychophysical (token) identification directly.
5. Do We Hypercompute? So far, I have claimed that it is possible that people hypercompute, and that this possibility is enough to defend a relaxed form of computationalism against some familiar thought experiments. In this section, I will insist on taking the middle ground: I don’t think we can prove that we hypercompute. In a recent paper Bringsjord and Arkoudas have presented an argument which they think proves that people do hypercompute [Bringsjord and Arkoudas 2004]. In essence, they apply Kripke’s technique for arguing against materialists in order to show that we hypercompute. Let Dxyz mean ‘Turing machine x determines whether Turing machine y halts on input z. Let m range over Turing machines. Then, the unsolvability of the halting problem tells us that there is a particular Turing machine m0 such that Proposition 1 ∀m∃i¬3Dmm0 i Let p range over people. Then, we are told, computationalism can be construed as Proposition 2 ∀p∃m p = m 1 and 2 together let us derive Proposition 3 ∀p∃i¬3D pm0 i In plain language, 3 says that no matter who you are, there is a Turing machine out there—the one the proof that shows the halting problem is unsolvable says exists—and some input, such that we cannot determine whether that machine halts on that input. To complete their reductio against computationalism, Bringsjord and Arkoudas introduce the following premise which is inconsistent with 3: Proposition 4 ∀p∀i3Dpm0 i 3 and 4 together yield a contradiction. Since we are not prepared to reject Turing’s proof of the unsolvability of the halting problem, we must instead reject 2. Since people compute functions, and are identical to no Turing machine, Bringsjord and Arkoudas conclude
Darren Abramson
that people hypercompute. 4 says that for any person and any input, it is possible for that person to decide whether Turing machine $m_0$ halts on that input. They do, of course, recognize that 4 is tendentious, and provide some arguments for it. Two main lines of argument can be isolated.

First, they claim, mathematical practice by children, let alone experts, reveals the method by which we hypercompute. A favorite example, given in [Bringsjord and Zenzen 2003] as well as the paper under discussion, is the familiar pictorial argument for $\lim_{n\to\infty} 1/2^n = 0$. According to the authors, we can complete an infinite number of steps in a finite amount of time to see that the equality holds; by completing such supertasks, human beings can hypercompute. However, understanding the precise relationship between the phenomenal nature of doing mathematics and the computational resources underlying this ability is difficult, to say the least. Nor does the mere fact that we use calculus or picture proofs imply that we ever actually complete an infinite number of mathematical steps in a proof.

The second argument for 4 runs as follows. For over 50 years, mathematicians, including Turing and Gödel, have investigated the properties of notional machines that solve the halting problem. Many results concerning the structure of the arithmetical hierarchy make extensive use of such machines: see the classic discussion in [Rogers 1967, p. 129] for an introduction. So, it seems to involve no contradiction to suppose that our brain mechanisms make regular queries of the halting set just as oracle machines do.

Could this possibility be an illusion? It is far from obvious that hypercomputation is an intrinsic property of our mental life in the same way that pain sensations are constitutive of being in pain. As a matter of fact, either we hypercompute or we don’t. If we do, then it is possible that we do.
But if we do not hypercompute, then despite the imaginability of our brains behaving as though they are Turing machines with oracles attached, we may one day discover that we do not hypercompute. We are in precisely the same position with respect to claims that ‘persons possibly hypercompute’ as 15th century alchemists were with respect to the claim that ‘heat can exist in the absence of mean molecular kinetic energy’. Bringsjord and Arkoudas paraphrase Chalmers [1997] and say that “when some state of affairs ψ seems, by all accounts, to be perfectly coherent, the burden of proof
is on those who would resist the claim that ψ is logically or mathematically possible” [Bringsjord and Arkoudas, p. 183]. I have hinted at how this burden must be borne, and now will do so explicitly. For any Turing degree T, one can coherently hold the view that mentation is the same as instantiating information flow which is not above T in the Kleene hierarchy. Now, if it is constitutive of having a human mental life to instantiate functions of a particular Turing degree, then anyone who thinks that they can imagine a person computing functions above that degree has deluded themselves. It is true that we don’t know which Turing degree, or less, is the correct one for fixing our mental properties. For that very reason, we may deny that some scenarios, such as those in which we solve the halting problem, are possible. We may grant that such scenarios hold epistemic possibility. We do not have recourse, as Kripke does, to the fact that we fix the reference of mental states directly by phenomenal feel.

In fact, first-order axiomatizations of mathematics suggest that we have good reason for limiting the epistemic possibility of hypercomputation. That is, if mathematicians think that, despite the tedium involved, all mathematical results are really consequences of the Peano or Zermelo–Fraenkel axioms, then I have positive reasons to deny the possibility claimed by Bringsjord and Arkoudas. A significant argument, which is not yet on offer, is that mathematics is what mathematics feels like. In short, we may question the metaphysical possibility of hypercomputation (and reject 4) on the basis of necessary identity and an epistemic position provided by the philosophy of mathematics.
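For reference, the reductio discussed in this section can be laid out line by line. The following is only a sketch in the notation of Propositions 1 to 4; the numbering of step (5) and the annotations on the right are added here for illustration. The step from (1) and (2) to (3) assumes that identicals may be substituted inside the modal context, an assumption the text above does not spell out.

```latex
\begin{align*}
&(1)\quad \forall m\,\exists i\,\neg\Diamond D\,m\,m_0\,i
  && \text{unsolvability of the halting problem}\\
&(2)\quad \forall p\,\exists m\;\, p = m
  && \text{computationalism, as construed above}\\
&(3)\quad \forall p\,\exists i\,\neg\Diamond D\,p\,m_0\,i
  && \text{from (1) and (2), substituting } p \text{ for } m\\
&(4)\quad \forall p\,\forall i\,\Diamond D\,p\,m_0\,i
  && \text{premise of Bringsjord and Arkoudas}\\
&(5)\quad \bot
  && \text{from (3) and (4)}
\end{align*}
```

Rejecting (2) is one way out of the contradiction; the argument of this section is that (4) may be rejected instead.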
6. Conclusion

I have shown three things. First, assume that we don’t know whether it is constitutive of us to hypercompute. Then zombie arguments fail, because we are unable to construct the relevant thought experiments; the same goes for the Chinese Room. Second, we cannot conclude that we do hypercompute from the mere appearance that it is logically possible. Finally, those who seek the philosophical benefits of computationalism may find them in its relaxed form, in which ‘information flow’, Turing computable or not, underlies our mental lives. Each of these conclusions has been reached by consideration of CTT and related theses.
It might be argued that these conclusions are merely of conditional interest. If we have good reason to think that hypercomputation is not possible, and that PCTT or a modal counterpart holds, then the conclusions I have argued for would be in vain. Deflationary arguments such as offered in [Kreisel 1982] and [Davis 2005] rely on epistemological arguments against the metaphysical possibility of hypercomputation, and so are less than convincing. I do not have the resources here to offer a positive argument for a model of hypercomputation; however, I leave the reader to the other papers in this collection that do.
References

Bealer, G. [1994], “Mental Properties”, Journal of Philosophy 91(4), 185–208.
Block, N. [1981], “Psychologism and Behaviorism”, The Philosophical Review 90(1), 5–43.
Block, N. [1995], “On a Confusion about a Function of Consciousness”, Behavioral and Brain Sciences 18.
Bringsjord, S. and Arkoudas, K. [2004], “The Modal Argument for Hypercomputing Minds”, Theoretical Computer Science 317, 167–190.
Bringsjord, S. and Zenzen, M. [2003], Superminds: People Harness Hypercomputation and More, Kluwer.
Chalmers, D. [1997], The Conscious Mind, Oxford University Press.
Chalmers, D. [1999], “Materialism and the Metaphysics of Modality”, Philosophy and Phenomenological Research 59(2), 473–493.
Copeland, B.J. [2000], “Narrow versus Wide Mechanism: Including a Reexamination of Turing’s Views on the Mind-Machine Issue”, The Journal of Philosophy 96, 5–32.
Copeland, B.J. [2002a], Computationalism: New Directions, chapter: Narrow versus Wide Mechanism, MIT Press, pp. 59–86.
Copeland, B.J. [2002b], “Hypercomputation”, Minds and Machines 12, 461–502.
Copeland, B.J. [2002c], Views into the Chinese Room: New Essays on Searle and Artificial Intelligence, chapter: The Chinese Room from a Logical Point of View, Oxford University Press, pp. 109–122.
Copeland, J. and Sylvan, R. [1999], “Beyond the Universal Turing Limit”, Australasian Journal of Philosophy 77, 46–66.
Davis, M. [2005], “The Myth of Hypercomputation”, in Alan Turing: Life and Legacy of a Great Thinker, (C. Teuscher ed.), Springer-Verlag, pp. 195–210.
Haugeland, J. [1997], Mind Design II, MIT Press.
Hughes, C. [2004], Kripke: Names, Necessity, and Identity, Oxford University Press.
Kreisel, G. [1982], Review: “A Computable Ordinary Differential Equation Which Possesses No Computable Solution; The Wave Equation With Computable Initial Data Such That Its Unique Solution Is Not Computable”, The Journal of Symbolic Logic 47(4), 900–902.
Lycan, W.G. [1974], “Kripke and the Materialists”, Journal of Philosophy 71(18), 677–689.
Rogers, H. [1967], Theory of Recursive Functions and Effective Computability, MIT Press.
Searle, J. [1980], “Minds, Brains, and Programs”, Behavioral and Brain Sciences 3.
Searle, J. [1992], The Rediscovery of the Mind, MIT Press.
Yablo, S. [1999], “Concepts & Consciousness”, Philosophy and Phenomenological Research 59(2), 455–463.
Andreas Blass, Yuri Gurevich∗
Algorithms: A Quest for Absolute Definitions

What is an algorithm? The interest in this foundational problem is not only theoretical; applications include specification, validation and verification of software and hardware systems. We describe the quest to understand and define the notion of algorithm. We start with the Church–Turing thesis and contrast Church’s and Turing’s approaches, and we finish with some recent investigations.
∗ A. Blass, Mathematics Department, University of Michigan, Ann Arbor, MI 48109; Y. Gurevich, Microsoft Research, One Microsoft Way, Redmond, WA 98052.

1. Introduction

In 1936, Alonzo Church published a bold conjecture that only recursive functions are computable [Church 1936]. A few months later, independently of Church, Alan Turing published a powerful speculative proof of a similar conjecture: every computable real number is computable by the Turing machine [Turing 1937]. Kurt Gödel found Church’s thesis “thoroughly unsatisfactory” but later was convinced by Turing’s argument. Later yet he worried about a possible flaw in Turing’s argument. In Section 2 we recount briefly this fascinating story, provide references where the reader can find additional details, and give remarks of our own.

By now, there is overwhelming experimental evidence in favor of the Church–Turing thesis. Furthermore, it is often assumed that the Church–Turing thesis settled the problem of what an algorithm is. That isn’t so. The thesis clarifies the notion of computable function. And there is more, much more to an algorithm than the function it computes. The thesis was a great step toward understanding algorithms, but it did not solve the problem of what an algorithm is.

Further progress in foundations of algorithms was achieved by Kolmogorov and his student Uspensky in the 1950s [Kolmogorov 1953; Kolmogorov and Uspensky 1958]. The Kolmogorov machine with its reconfigurable “tape” has a certain advantage over the Turing machine. The notion of pointer machine was an improvement of the notion of Kolmogorov machine. These issues are discussed in Section 3.

This paper started as a write-up of the talk that the second author gave at the Kolmogorov Centennial conference in June 2003 in Moscow. The talk raised several related issues: physics and computation, polynomial time versions of the Turing thesis, recursion and algorithms. These issues are very briefly discussed in Section 4.

In 1991, the second author published the definition of sequential abstract state machines (ASMs, called evolving algebras at the time) [Gurevich 1991]. In 2000, he published a definition of sequential algorithms derived from first principles [Gurevich 2000]. In the same paper he proved that every sequential algorithm A is behaviorally equivalent to some sequential ASM B. In particular, B simulates A step for step. In Section 5 we outline the approach of [Gurevich 2000].

In 1995, the second author published the definition of parallel and distributed abstract state machines [Gurevich 1995]. The Foundations of Software Engineering group at Microsoft Research developed an industrial strength specification language AsmL that allows one to write and execute parallel and distributed abstract state machines [AsmL]. In 2001, the present authors published a definition of parallel algorithms derived from first principles as well as a proof that every parallel algorithm is equivalent to a parallel ASM [Blass and Gurevich 2003]. Section 6 is a quick discussion of parallel algorithms.
The problem of defining distributed algorithms from first principles is open. In Section 7 we discuss a few related issues. Finally let us note that foundational studies go beyond satisfying our curiosity. Turing machines with their honest counting of steps enabled computational complexity theory. Kolmogorov machines and pointer machines enabled better complexity measures. Abstract state machines enable precise executable specifications of
software systems, though this story is only starting to unfold [ASM, AsmL, Börger and Stärk 2003].

Added in proof. This paper was written in 2003. Since then the ASM characterization has been extended to small-step interactive algorithms. Work continues on other aspects [Gurevich 2005].
2. The Church–Turing Thesis

2.1. Church + Turing
The celebrated Church–Turing thesis [Church 1936, Turing 1937] captured the notion of computable function. Every computable function from natural numbers to natural numbers is recursive and computable, in principle, by the Turing machine. The thesis has been richly confirmed in practice. Speaking in 1946 at the Princeton Bicentennial Conference, Gödel said this [Gödel 1990 (article 1946)]:

Tarski has stressed in his lecture (and I think justly) the great importance of the concept of general recursiveness (or Turing’s computability). It seems to me that this importance is largely due to the fact that with this concept one has for the first time succeeded in giving an absolute definition of an interesting epistemological notion, i.e., one not depending on the formalism chosen. In all other cases treated previously, such as demonstrability or definability, one has been able to define them only relative to the given language, and for each individual language it is clear that the one thus obtained is not the one looked for. For the concept of computability, however, although it is merely a special kind of demonstrability or decidability, the situation is different. By a kind of miracle it is not necessary to distinguish orders, and the diagonal procedure does not lead outside the defined notion.

2.2. Turing − Church
It became common to speak about the Church–Turing thesis. In fact the contributions of Church and Turing are different, and the difference between them is of importance to us here. Church’s thesis was a bold hypothesis about the set of computable functions. Turing analyzed what can happen during a computation and thus arrived at his thesis.
Church’s Thesis. The notion of an effectively calculable function from natural numbers to natural numbers should be identified with that of a recursive function.

Church [1936] had in mind total functions. Later Kleene [1938] improved on Church’s thesis by extending it to partial functions. The fascinating history of the thesis is recounted in [Davis 1982]; see also [Sieg 1997].

Originally Church hypothesized that every effectively calculable function from natural numbers to natural numbers is definable in his lambda calculus. Gödel didn’t buy that. In 1935, Church wrote to Kleene about his conversation with Gödel [Davis 1982, p. 9].

In discussion [sic] with him the notion of lambda-definability, it developed that there was no good definition of effective calculability. My proposal that lambda-definability be taken as a definition of it he regarded as thoroughly unsatisfactory. I replied that if he would propose any definition of effective calculability which seemed even partially satisfactory I would undertake to prove that it was included in lambda-definability. His only idea at the time was that it might be possible, in terms of effective calculability as an undefined notion, to state a set of axioms which would embody the generally accepted properties of this notion, and to do something on this basis.
Church continued: Evidently it occurred to him later that Herbrand’s definition of recursiveness, which has no regard to effective calculability, could be modified in the direction of effective calculability, and he made this proposal in his lectures. At that time he did specifically raise the question of the connection between recursiveness in this new sense and effective calculability, but said he did not think that the two ideas could be satisfactorily identified “except heuristically”.
The lectures of Gödel mentioned by Church were given at the Institute for Advanced Study in Princeton from February through May 1934. In a February 15, 1965, letter to Martin Davis, Gödel wrote [Davis 1982, p. 8]:

However, I was, at the time of these lectures [1934], not at all convinced that my concept of recursion comprises all possible recursions.
Soon after Gödel’s lectures, Church and Kleene proved that the Herbrand–Gödel notion of general recursivity is equivalent to lambda definability (as far as total functions are concerned), and Church became sufficiently convinced of the correctness of his thesis to publish it. But Gödel remained unconvinced.

Indeed, why should one believe that lambda definability captures the notion of computability? The fact that lambda definability is equivalent to general recursivity, and to various other formalizations of computability that quickly followed Church’s paper, proves only that Church’s notion of lambda definability is very robust. To see that a mathematical definition captures the notion of computability, one needs an analysis of the latter. This is what Turing provided to justify his thesis.

Turing’s Thesis. Let Σ be a finite alphabet. A partial function from strings over Σ to strings over Σ is effectively calculable if and only if it is computable by a Turing machine.

Remark. Turing designed his machine to compute real numbers but the version of the Turing machine that became popular works with strings in a fixed alphabet. Hence our formulation of Turing’s thesis.

Turing analyzed a computation performed by a human computer. He made a number of simplifying without-loss-of-generality assumptions. Here are some of them. The computer writes on graph paper; furthermore, the usual graph paper can be replaced with a tape divided into squares. The computer uses only a finite number of symbols, a single symbol in a square. “The behavior of the computer at any moment is determined by the symbols which he is observing, and his ‘state of mind’ at that moment”. There is a bound on the number of symbols observed at any one moment. “We will also suppose that the number of states of mind which need to be taken into account is finite [...] If we admitted an infinity of states of mind, some of them will be ‘arbitrarily close’ and will be confused”.
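The assumptions listed above (a tape of squares, a finite symbol set, finitely many states, one observed square at a time) can be illustrated with a minimal sketch of a one-tape machine. The function name `run`, the table format, the blank symbol `_` and the example program `SUCCESSOR` are all hypothetical choices made here for illustration; they are not taken from Turing's paper.

```python
from collections import defaultdict

BLANK = "_"  # assumed blank symbol for unwritten squares


def run(program, word, state="scan", halt="done", max_steps=10_000):
    """Run a one-tape machine on `word` and return the final tape contents.

    `program` maps (state, observed symbol) to (new state, written symbol, move),
    where move is -1, 0 or +1. The table is finite, so both the symbol set and
    the set of states are finite, as in the assumptions above.
    """
    tape = defaultdict(lambda: BLANK, enumerate(word))  # tape divided into squares
    head = 0
    steps = 0
    while state != halt:
        if steps >= max_steps:
            raise RuntimeError("step budget exhausted")
        # The next action depends only on the current state and observed symbol.
        state, symbol, move = program[(state, tape[head])]
        tape[head] = symbol
        head += move
        steps += 1
    return "".join(tape[i] for i in sorted(tape)).strip(BLANK)


# Hypothetical example: append one stroke to a unary numeral.
SUCCESSOR = {
    ("scan", "1"): ("scan", "1", +1),    # skip over existing strokes
    ("scan", BLANK): ("done", "1", 0),   # write one more stroke and halt
}

print(run(SUCCESSOR, "111"))
```

The step counter also hints at why this model gives an honest measure of running time: each loop iteration does a bounded amount of work.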
He ends up with a Turing machine simulating the original computation. Essentially Turing derived his thesis from more or less obvious first principles though he didn’t state those first principles carefully. “It seems that only after Turing’s formulation appeared,” writes Kleene in [1981, p. 61], “did Gödel accept Church’s thesis, which had
then become the Church–Turing thesis.” “Turing’s arguments,” he adds in [Kleene 1988, p. 48], “eventually persuaded him.”

Church’s lambda calculus was destined to play an important role in programming theory. The mathematically elegant Herbrand–Gödel–Kleene notion of partial recursive functions served as a springboard for many developments in recursion theory. The Turing machine gave us honest step counting and became eventually the foundation of complexity theory.

2.3. Remarks on Turing’s Analysis
Very quickly the Church–Turing thesis acquired the status of a widely shared belief. Meantime Gödel grew skeptical of at least one aspect of Turing’s analysis. In a remark published after his death, Gödel writes this [Gödel 1990 (article 1972a, p. 306)].

A philosophical error in Turing’s work. Turing in his [Turing 1937, p. 250], gives an argument which is supposed to show that mental procedures cannot go beyond mechanical procedures. However, this argument is inconclusive. What Turing disregards completely is the fact that mind, in its use, is not static, but constantly developing, i.e. that we understand abstract terms more and more precisely as we go on using them, and that more and more abstract terms enter the sphere of our understanding. There may exist systematic methods of actualizing this development, which could form part of the procedure. Therefore, although at each stage the number and precision of the abstract terms at our disposal may be finite, both (and therefore, also Turing’s number of distinguishable states of mind) may converge toward infinity in the course of the application of the procedure.
Gödel was extremely careful in his published work. It is not clear whether the remark in question was intended for publication as is. In any case, the question whether mental procedures can go beyond mechanical procedures is beyond the scope of this paper, which focuses on algorithms. Furthermore, as far as we can see, Turing did not intend to show that mental procedures cannot go beyond mechanical procedures. The expression “state of mind” was just a useful metaphor that could be and in fact was eliminated: “we
avoid introducing the ‘state of mind’ by considering a more physical and definite counterpart of it” [Turing 1937, p. 253].

But let us consider the possibility that Gödel didn’t speak about biology either, that he continued to use Turing’s metaphor and worried that Turing’s analysis does not apply to some algorithms. Can an algorithm learn from its own experience, become more sophisticated and thus compute a real number that is not computable by the Turing machine? Note that the learning process in question is highly unusual because it involves no interaction with the environment. (On the other hand, it is hard to stop brains from interacting with the environment.) Gödel gives two examples “illustrating the situation”, both aimed at logicians.

Note that something like this indeed seems to happen in the process of forming stronger and stronger axioms of infinity in set theory. This process, however, today is far from being sufficiently understood to form a well-defined procedure. It must be admitted that the construction of a well-defined procedure which could actually be carried out (and would yield a non-recursive number-theoretic function) would require a substantial advance in our understanding of the basic concepts of mathematics. Another example illustrating the situation is the process of systematically constructing, by their distinguished sequences $\alpha_n \to \alpha$, all recursive ordinals $\alpha$ of the second number-class.
The logic community has not been swayed. “I think it is pie in the sky!” wrote Kleene [1988, p. 51]. Here is a more expansive reaction of his [Kleene 1988, p. 50]. But, as I have said, our idea of an algorithm has been such that, in over two thousand years of examples, it has separated cases when mathematicians have agreed that a given procedure constitutes an algorithm from cases in which it does not. Thus algorithms have been procedures that mathematicians can describe completely to one another in advance of their application for various choices of the arguments. How could someone describe completely to me in a finite interview a process for finding the values of a number-theoretic function, the execution of which process for various arguments would be keyed to more than the finite subset of our mental states that
would have developed by the end of the interview, though the total number of mental states might converge to infinity if we were immortal? Thus Gödel’s remarks do not shake my belief in the Church–Turing thesis [...]
If Gödel’s remarks are intended to attack the Church–Turing thesis, then the attack is a long shot indeed. On the other hand, we disagree with Kleene that the notion of algorithm is that well understood. In fact the notion of algorithm is richer these days than it was in Turing’s days. And there are algorithms, of modern and classical varieties, not covered directly by Turing’s analysis, for example, algorithms that interact with their environments, algorithms whose inputs are abstract structures, and geometric or, more generally, non-discrete algorithms. We look briefly at the three examples just mentioned.

Interactive algorithms. This is a broad class. It includes randomized algorithms; you need the environment to provide random bits. It includes asynchronous algorithms; the environment influences action timing. It includes nondeterministic algorithms as well [Gurevich 2000 (sec. 9.1)]. Clearly, interactive algorithms are not covered by Turing’s analysis. And indeed an interactive algorithm can compute a non-recursive function. (The nondeterministic Turing machines, defined in computation theory courses, are known to compute only partial recursive functions. But a particular computation of such a machine cannot in general be simulated by a deterministic Turing machine.)

Computing with abstract structures. Consider the following algorithm P that, given a finite connected graph G = (V, E) with a distinguished vertex s, computes the maximum distance of any vertex from s.

A  S := {s} and r := 0.
B  If S = V then halt and output r.
C  If S ≠ V then S := S ∪ {y : ∃x (x ∈ S ∧ E(x, y))} and r := r + 1.
D  Go to B.
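Steps A to D above can be sketched directly in a language with built-in sets. The sketch below is only an illustration: the function name `max_distance` and the representation of E as a set of ordered pairs are assumptions made here, and the connectivity check is an added safeguard that is not part of P.

```python
def max_distance(vertices, edges, s):
    """Return the maximum distance of any vertex from s.

    `edges` is a set of ordered pairs (x, y) standing for E(x, y); for an
    undirected graph both (x, y) and (y, x) are expected to be present.
    """
    V = set(vertices)
    S, r = {s}, 0                                   # step A
    while S != V:                                   # step B: halt when S = V
        # Step C: add every neighbor of S in a single set-valued assignment.
        neighbors = {y for (x, y) in edges if x in S}
        if neighbors <= S:
            # Added safeguard: P assumes a connected graph.
            raise ValueError("graph is not connected")
        S = S | neighbors
        r += 1
    return r                                        # output r


# Hypothetical example: the path 0 - 1 - 2 - 3.
path_edges = {(0, 1), (1, 0), (1, 2), (2, 1), (2, 3), (3, 2)}
print(max_distance({0, 1, 2, 3}, path_edges, 0))
```

The set comprehension hides an iteration over the edges in some order chosen by the implementation, which is exactly the kind of detail that the abstract formulation of P leaves open.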
P is a parallel algorithm. Following Turing’s analysis we have to break the assignment S := S ∪ {y : ∃x (x ∈ S ∧ E(x, y))} into small tasks
of bounded complexity, e.g. by going systematically through every x ∈ S and every neighbor y of x. But how will the algorithm go through all x ∈ S? The graph G is not ordered. A nondeterministic algorithm can pick an arbitrary vertex and declare it the first vertex, pick one of the remaining vertices and declare it the second vertex, etc. But a deterministic algorithm cannot do that. Algorithms like P are not covered directly by Turing’s analysis. But there is an easy patch if you don’t care about resources and use parallelism. Let n be the number of vertices. In parallel, the desired algorithm orders the vertices in all n! possible ways and then carries out all n! computations.

Non-discrete computations. Turing dealt with discrete computations. His analysis does not apply directly e.g. to the classical, geometrical ruler-and-compass algorithms. The particular case of ruler-and-compass algorithms can be taken care of; such algorithms do not allow you to compute a non-recursive function [Kijne 1956]. In general, however, it is not clear how to extend Turing’s analysis to non-discrete algorithms.
3. Kolmogorov Machines and Pointer Machines

The problem of the absolute definition of algorithm was attacked again in 1953 by Andrei N. Kolmogorov; see the one-page abstract [Kolmogorov 1953] of his March 17, 1953, talk at the Moscow Mathematical Society. Kolmogorov spelled out his intuitive ideas about algorithms. For brevity, we express them in our own words (rather than translate literally).

• An algorithmic process splits into steps whose complexity is bounded in advance, i.e., the bound is independent of the input and the current state of the computation.
• Each step consists of a direct and immediate transformation of the current state.
• This transformation applies only to the active part of the state and does not alter the remainder of the state.
• The size of the active part is bounded in advance.
• The process runs until either the next step is impossible or a signal says the solution has been reached.
In addition to these intuitive ideas, Kolmogorov gave a one-paragraph sketch of a new computation model. The ideas of [Kolmogorov 1953] were developed in the article [Kolmogorov and Uspensky 1958] written by Kolmogorov together with his student Vladimir A. Uspensky.

The Kolmogorov machine model can be thought of as a generalization of the Turing machine model where the tape is a directed graph of bounded in-degree and bounded out-degree. The vertices of the graph correspond to Turing’s squares; each vertex has a color chosen from a fixed finite palette of vertex colors; one of the vertices is the current computation center. Each edge has a color chosen from a fixed finite palette of edge colors; distinct edges from the same node have different colors. The program has this form: replace the vicinity U of a fixed radius around the central node by a new vicinity W that depends on the isomorphism type of the digraph U with the colors and the distinguished central vertex. Contrary to Turing’s tape whose topology is fixed, Kolmogorov’s “tape” is reconfigurable.

Remark. We took liberties in describing Kolmogorov machines. Kolmogorov and Uspensky require that the tape graph is symmetric: for every edge (x, y) there is an edge (y, x). The more liberal model is a bit easier to describe. And the symmetry requirement is inessential in the following sense: any machine of either kind can be step-for-step simulated by a machine of the other kind.

Like Turing machines, Kolmogorov machines compute functions from strings to strings; we skip the description of the input and output conventions. In the footnote to the article title, Kolmogorov and Uspensky write that they just wanted to analyze the existing definitions of the notions of computable functions and algorithms and to convince themselves that there is no hidden way to extend the notion of computable function. Indeed, Kolmogorov machines compute exactly Turing computable functions.
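The tape graph described above can be sketched as a small data structure. This is an illustration only, not the Kolmogorov–Uspensky formalism: the class name `TapeGraph`, its fields, and the choice to measure the vicinity along outgoing edges are assumptions made here, and the replacement rule of the program is left out.

```python
from dataclasses import dataclass, field


@dataclass
class TapeGraph:
    """Colored directed graph with a distinguished computation center."""
    vertex_color: dict                               # vertex -> vertex color
    out_edges: dict = field(default_factory=dict)    # vertex -> {edge color: target}
    center: object = None

    def add_edge(self, source, color, target):
        edges = self.out_edges.setdefault(source, {})
        # Distinct edges from the same node must have different colors, so the
        # out-degree is bounded by the size of the edge palette.
        if color in edges:
            raise ValueError("edge color already used at this vertex")
        edges[color] = target

    def vicinity(self, radius):
        """Vertices reachable from the center in at most `radius` steps
        along outgoing edges (an assumed reading of 'vicinity')."""
        seen = {self.center}
        layer = {self.center}
        for _ in range(radius):
            layer = {t for v in layer
                     for t in self.out_edges.get(v, {}).values()} - seen
            seen |= layer
        return seen


# Hypothetical example: a chain a -> b -> c -> d with center a.
g = TapeGraph(vertex_color={v: "white" for v in "abcd"}, center="a")
g.add_edge("a", "red", "b")
g.add_edge("b", "red", "c")
g.add_edge("c", "red", "d")
print(sorted(g.vicinity(2)))
```

Because the radius and the palettes are fixed, the vicinity has bounded size, which is what allows a finite program to react to its isomorphism type.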
It seems, however, that they were more ambitious. Here is a somewhat liberal translation from [Kolmogorov and Uspensky 1958, p. 16]:

To simplify the description of algorithms, we introduced some conventions that are not intrinsic to the general idea, but it seems to us that the generality of the proposed definition remains plausible in spite of the conventions. It seems plausible to us that an arbitrary algorithmic process satisfies our definition of algorithms. We would like to emphasize that we are talking not about a reduction of an arbitrary algorithm to an algorithm in the sense of our definition but that every algorithm essentially satisfies the proposed definition.
In this connection the second author formulated a Kolmogorov–Uspensky thesis [Gurevich 1988, p. 227]: “every computation, performing only one restricted local action at a time, can be viewed as (not only being simulated by, but actually being) the computation of an appropriate KU machine”. Uspensky concurred [Uspensky 1992, p. 396].

Kolmogorov’s approach proved to be fruitful. It led to a more realistic complexity theory. For example, given a string x, a Kolmogorov machine can build a binary tree over x and then move fast about x. Leonid Levin used a universal Kolmogorov machine to construct his algorithm for NP problems that is optimal up to a multiplicative constant [Levin 1973; Gurevich 1988]. The up-to-a-multiplicative-constant form is not believed to be achievable for the multitape Turing machine model popular in theoretical computer science. Similarly, the class of functions computable in nearly linear time $n(\log n)^{O(1)}$ on Kolmogorov machines remains the same if Kolmogorov machines are replaced e.g. by various random access computers in the literature; it is not believed, however, that the usual multitape Turing machines have the same power [Gurevich and Shelah 1989].

Kolmogorov machines allow one to do reasonable computations in reasonable time. This may have provoked Kolmogorov to ask new questions. “Kolmogorov ran a complexity seminar in the 50s or early 60s,” wrote Leonid Levin, a student of Kolmogorov, to us [Levin 2003a]. “He asked if common tasks, like integer multiplication, require necessarily as much time as used by common algorithms, in this case quadratic time. Unexpectedly, Karatsuba reduced the power to $\log_2(3)$ [Karatsuba and Ofman 1963].” (Readers interested in fast integer multiplication are referred to [Knuth 1981].)

It is not clear to us how Kolmogorov thought of the tape graph. One hypothesis is that edges reflect physical closeness. This hypothesis collides with the fact that our physical space is finite-dimensional.
As one of us remarked earlier [Gurevich 2000, p. 81], “In a finite-dimensional Euclidean space, the volume of a sphere of radius n is bounded by a polynomial of n. Accordingly, one might expect a polynomial bound on the number of vertices in any vicinity of radius n (in the graph theoretic sense) of any state of a given KU machine, but in fact such a vicinity may contain exponentially many vertices.”

Another hypothesis is that edges are some kind of channels. This hypothesis too collides with the fact that our physical space is finite-dimensional.

Probably the most natural approach would be to think of informational rather than physical edges. If vertex a contains information about the whereabouts of b, draw an edge from a to b. It is reasonable to assume that the amount of information stored at every single vertex a is bounded, and so the out-degree of the tape graph is bounded. It is also reasonable to allow more and more vertices to have information about b as the computation proceeds, so that the in-degree of the tape graph is unbounded. This brings us to Schönhage machines. These can be seen as Kolmogorov machines (in the version with directed edges) except that only the out-degrees are required to be bounded. The in-degrees can depend on the input and, even for a particular input, can grow during the computation.

“In 1970 the present author introduced a new machine model (cf. [Schönhage 1970]) now called storage modification machine (SMM),” writes Schönhage in [1980], “and posed the intuitive thesis that this model possesses extreme flexibility and should therefore serve as a basis for an adequate notion of time complexity.” In article [Schönhage 1980], Schönhage gave “a comprehensive presentation of our present knowledge of SMMs”. In particular, he proved that SMMs are “real-time equivalent” to successor RAMs (random access machines whose only arithmetical operation is $n \mapsto n + 1$). The following definitions appear in [Schönhage 1980, p. 491].

Definition.
A machine $M'$ is said to simulate another machine $M$ “in real time”, denoted $M \xrightarrow{r} M'$, if there is a constant c such that for every input sequence x the following holds: if x causes $M$ to read an input symbol, or to print an output symbol, or to halt at time steps $0 = t_0 < t_1 < \cdots < t_l$, respectively, then x causes $M'$ to act in the very same way with regard to those external actions at time steps $0 = t'_0 < t'_1 < \cdots < t'_l$ where $t'_j - t'_{j-1} \le c\,(t_j - t_{j-1})$ for $1 \le j \le l$. For machine classes $\mathcal{M}$, $\mathcal{M}'$ real time reducibility $\mathcal{M} \xrightarrow{r} \mathcal{M}'$ is defined by the condition that for each $M \in \mathcal{M}$ there exists an $M' \in \mathcal{M}'$ such that $M \xrightarrow{r} M'$. Real time equivalence $\mathcal{M} \xleftrightarrow{r} \mathcal{M}'$ means $\mathcal{M} \xrightarrow{r} \mathcal{M}'$ and $\mathcal{M}' \xrightarrow{r} \mathcal{M}$. □

Dima Grigoriev proved that Turing machines cannot simulate Kolmogorov machines in real time [Grigoriev 1980].

Schönhage introduced a precise language for programming his machines and complained that the Kolmogorov–Uspensky description of Kolmogorov machines is clumsy. For our purposes here, however, it is simplest to describe Schönhage machines as generalized Kolmogorov machines where the in-degree of the tape graph may be unbounded. It is still an open problem whether Schönhage machines are real time reducible to Kolmogorov machines.

Schönhage states his thesis as follows: “$\mathcal{M} \xrightarrow{r} \mathrm{SMM}$ holds for all atomistic machine models $\mathcal{M}$.”

Schönhage writes that Donald E. Knuth “brought to his attention that the SMM model coincides with a special type of ‘linking automata’ briefly explained in volume one of his book (cf. [Knuth 1968, pp. 462–463]) in 1968 already. Now he suggests calling them ‘pointer machines’ which, in fact, seems to be the adequate name for these automata.” Note that Kolmogorov machines also modify their storage. But the name “pointer machine” fits Knuth–Schönhage machines better than it fits Kolmogorov machines.

A successor RAM is a nice example of a pointer machine. Its tape graph consists of natural numbers and a couple of special registers. Each special register has only one pointer, which points to a natural number that is intuitively the content of the register. Every natural number n has only a pointer to n + 1, a pointer to another natural number that is intuitively the content of register n, and a pointer to every special register.

The notion of pointer machine seems to us an improvement over the notion of Kolmogorov machine (and of course the notion of Kolmogorov machine was an improvement over the notion of Turing machine). And the notion of pointer machine proved to be useful in the analysis of the time complexity of algorithms.
In that sense it was successful. It is less clear how much of an advance all these developments were from the point of view of absolute definitions. The pointer machine reflected the computer architecture of real computers of the time. (The modern tendency is to make computers with several CPUs, central processing units, that run asynchronously.)
Remark. In an influential 1979 article, Tarjan used the term “pointer machine” in a wider sense [Tarjan 1979]. This wider notion of pointer machines has become better known in computer science than the older notion.
4. Related Issues

We mention a few issues touched upon in the talk that was the precursor of this paper. It is beyond the scope of this paper to develop these issues in any depth.

4.1. Physics and Computations
What kind of computations can be carried out in our physical universe? We are not talking about what functions are computable. The question is what algorithms are physically executable. We don’t expect a definitive answer soon, if ever. It is important, however, to put things into perspective. Many computer science concerns are above the level of physics. It would be great if quantum physics allowed us to factor numbers fast, but this probably will not greatly influence programming language theory. Here are some interesting references.

• Robin Gandy attempted to derive Turing’s thesis from a number of “principles for mechanisms” [Gandy 1980]. Wilfried Sieg continues this line of research [Sieg 1999].
• David Deutsch [1985] designed a universal quantum computer that is supposed to be able to simulate the behavior of any finite physical system. Gandy’s approach is criticized in [Deutsch, Ekert and Lupaccini 2000, pp. 280–281]. Deutsch’s approach and quantum computers in general are criticized in [Levin 2003b (sec. 2)].
• Charles H. Bennett and Rolf Landauer pose in [1985] important problems related to the fundamental physical limits of computation.
• Marian Boykan Pour-El and Ian Richards [1989] investigate the extent to which computability is preserved by fundamental constructions of analysis, such as those used in classical and quantum theories of physics.
4.2. Polynomial Time Turing’s Thesis
There are several versions of the polynomial time Turing’s thesis discussed in theoretical computer science. For simplicity, we restrict attention to decision problems.

To justify the interest in the class P of problems solvable in polynomial time by a Turing machine, it is often declared that a problem is feasible (=practically solvable) if and only if it is in P. Complexity theory tells us that there are P problems unsolvable in time $n^{1000}$. A more reasonable thesis is that a “natural problem” is feasible if and only if it is in P. At the 1991 Annual Meeting of the Association for Symbolic Logic, Steve Cook argued in favor of that thesis, and the second author argued against it. Some of the arguments can be found in [Cook 1991] and [Gurevich 1993] respectively.

A related but different version of the polynomial time Turing thesis is that a problem is in P if it can be solved in polynomial time at all, by any means. The presumed reason is that any polynomial time computation can be polynomial time simulated by a Turing machine (so that the computation time of the Turing machine is bounded by a polynomial of the computation time of the given computing device). Indeed, most “reasonable” computation models are known to be polytime equivalent to the Turing machine. “As to the objection that Turing machines predated all of these models,” says Steve Cook [2003], “I would reply that models based on RAMs are inspired by real computers, rather than Turing machines.”

Quantum computer models can factor arbitrary integers in polynomial time [Shor 1997], and it is not believed that quantum computers can be polynomial time simulated by Turing machines. For the believers in quantum computers, it is more natural to speak about probabilistic Turing machines. We quote from [Bernstein and Vazirani 1997].
Just as the theory of computability has its foundations in the Church–Turing thesis, computational complexity theory rests upon a modern strengthening of this thesis, which asserts that any “reasonable” model of computation can be efficiently simulated on a probabilistic Turing Machine (an efficient simulation is one whose running time is bounded by some polynomial in the running time of the simulated machine). Here, we take reasonable to mean in principle physically realizable.
Turing’s analysis does not automatically justify any of these new theses. (Nor does it justify, for example, the thesis that polynomial time interactive Turing machines capture polynomial time interactive algorithms.) Can any of the theses discussed above be derived from first principles? One can analyze Turing’s original justification of his thesis and see whether all the reductions used by Turing are polynomial time reductions. But one has to worry also about algorithms not covered directly by Turing’s analysis.

4.3. Recursion
According to Yiannis Moschovakis, an algorithm is a “recursor”, a monotone operator over partial functions whose least fixed point includes (as one component) the function that the algorithm computes [Moschovakis 2001]. He proposes a particular language for defining recursors. A definition may use various givens: functions or recursors. Moschovakis gives few examples and they are all small ones. The approach does not seem to scale to algorithms interacting with an unknown environment. A posteriori the approach applies to well understood classes of algorithms. Consider for example non-interactive sequential or parallel abstract state machines (ASMs) discussed below in Sections 5 and 6. Such an ASM has a program for doing a single step. There is an implicit iteration loop: repeat the step until, if ever, the computation terminates. Consider an operator that, given an initial segment of a computation, augments it by another step (unless the computation has terminated). This operator can be seen as a recursor. Of course the recursion advocates may not like such a recursor because they prefer stateless ways. We are not aware of any way to derive from first principles the thesis that algorithms are recursors.
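The idea of a least fixed point of a monotone operator over partial functions can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: partial functions are represented as dictionaries over a finite domain, the operator `gcd_operator` and the helper `least_fixed_point` are names invented here, and the iteration from the empty function is the standard Kleene construction rather than Moschovakis’s own language for recursors.

```python
# Sketch: least fixed point of a monotone operator on partial functions.
# Partial functions are dicts; the empty dict is the nowhere-defined function.
# All names below are illustrative assumptions, not notation from the paper.

def gcd_operator(f, domain):
    """Apply the operator once: define gcd(a, b) wherever f already suffices."""
    g = {}
    for (a, b) in domain:
        if a == 0:
            g[(a, b)] = b                      # base case needs nothing from f
        elif (b % a, a) in f:
            g[(a, b)] = f[(b % a, a)]          # recursive case reads f only
    return g

def least_fixed_point(operator, domain):
    """Iterate the operator from the empty function until nothing changes."""
    f = {}
    while True:
        g = operator(f, domain)
        if g == f:
            return f
        f = g

N = 12
domain = [(a, b) for a in range(N) for b in range(N)]
gcd = least_fixed_point(gcd_operator, domain)
print(gcd[(6, 9)])
```

Because the operator is monotone and the domain is finite, the approximations grow until they stabilize; the stable dictionary is the function that the Euclidean recursion computes on this domain.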
5. Formalization of Sequential Algorithms

Is it possible to capture (=formalize) sequential algorithms on their natural levels of abstraction? Furthermore, is there one machine model that captures all sequential algorithms on their natural levels of abstraction? According to [Gurevich 2000], the answer to both questions is yes. We outline the approach of [Gurevich 2000] and put forward a slight but useful generalization.
As a running example of a sequential algorithm, we use a version Euc of Euclid’s algorithm that, given two natural numbers, computes their greatest common divisor d.

1. Set a = Input1, b = Input2.
2. If a = 0 then set d = b and go to 1
   else set a, b = b mod a, a respectively and go to 2.
Initially Euc waits for the user to provide natural numbers Input1 and Input2. The assignment on the last line is simultaneous. If, for instance, a = 6 and b = 9 in the current state then a = 3 and b = 6 in the next state.

5.1. Sequential Time Postulate
A sequential algorithm can be viewed as a finite or infinite state automaton.

Postulate (Sequential Time). A sequential algorithm A is associated with

• a nonempty set S(A) whose members are called states of A,
• a nonempty¹ subset I(A) of S(A) whose members are called initial states of A, and
• a map $\tau_A : S(A) \to S(A)$ called the one-step transformation of A.

The postulate ignores final states [Gurevich 2000 (sec. 3.3.2)]. We are interested in runs where the steps of the algorithm are interleaved with the steps of the environment. A step of the environment consists in changing the current state of the algorithm to any other state. In particular it can change the “final” state to a non-final state. To make the one-step transformation total, assume that the algorithm performs an idle step in the “final” states. Clearly Euc is a sequential time algorithm. The environment of Euc includes the user who provides input numbers (and is expected to take note of the answers).

¹ In [Gurevich 2000], I(A) and S(A) were not required to be nonempty. But an algorithm without an initial state couldn’t be run, so is it really an algorithm? We therefore add “nonempty” to the postulate here.
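The sequential-time view of Euc can be sketched in Python as follows. The sketch rests on assumptions that are not in the text: a state is a dictionary, the implicit program counter is made explicit as a field `label` with values 1 and 2, `None` stands for a not yet computed d, and the environment has already supplied Input1 and Input2 before the run starts. The function names `tau` and `run` are likewise chosen here for illustration.

```python
# Sketch: Euc as a state automaton with a total one-step transformation.
# The dict layout and the "label" field are assumptions of this sketch.

def tau(state):
    """One step of Euc; every right-hand side is read from the old state."""
    new = dict(state)
    if state["label"] == 1:
        new["a"], new["b"] = state["Input1"], state["Input2"]
        new["label"] = 2
    elif state["a"] == 0:
        new["d"] = state["b"]
        new["label"] = 1
    else:
        # Simultaneous assignment a, b = b mod a, a
        new["a"], new["b"] = state["b"] % state["a"], state["a"]
    return new

def run(state, steps):
    """Apply tau the given number of times, with no environment steps."""
    for _ in range(steps):
        state = tau(state)
    return state

initial = {"label": 1, "a": None, "b": None, "d": None,
           "Input1": 6, "Input2": 9}
after_two = run(initial, 2)    # a = 3, b = 6, as in the example in the text
after_four = run(initial, 4)   # d has been set and control is back at label 1
print(after_two["a"], after_two["b"], after_four["d"])
```

An environment step would correspond to replacing the dictionary between two calls of `tau`, for instance by writing new values into Input1 and Input2.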
This sequential-time postulate allows us to define a fine notion of behavioral equivalence.

Definition. Two sequential time algorithms are behaviorally equivalent if they have the same states, the same initial states and the same one-step transformation.

The behavioral equivalence is too fine for many purposes but it is necessary for the following.

Corollary. If algorithms A and B are behaviorally equivalent then B step-for-step simulates A in any environment.

The step-for-step character of simulation is important. Consider a typical distributed system. The agents are sequential-time but the system is not. The system guarantees the atomicity of any single step of any agent but not of a sequence of agent’s steps. Let A be the algorithm executed by one of the agents. If the simulating algorithm B makes two steps to simulate one step of A then another agent can sneak in between the two steps of B and spoil the simulation.

5.2. Small-Step Algorithms
An object that satisfies the sequential-time postulate doesn’t have to be an algorithm. In addition we should require that there is a program for the one-step transformation. This requirement is hard to capture directly. It will follow from other requirements in the approach of [Gurevich 2000].

Further, a sequential-time algorithm is not necessarily a sequential algorithm. For example, the algorithm P in subsection 2.3 is not sequential. The property that distinguishes sequential algorithms among all sequential-time algorithms is that the steps are of bounded complexity. The algorithms analyzed by Turing [1937] were sequential:

    The behavior of the computer at any moment is determined by the symbols which he is observing and his ‘state of mind’ at that moment. We may suppose that there is a bound B to the number of symbols or squares which the computer can observe at one moment. If he wishes to observe more, he must use successive observations. We will also suppose that the number of states of mind which need be taken into account is finite.
The algorithms analyzed by Kolmogorov in [1953] are also sequential: “An algorithmic process is divided into separate steps of limited complexity.”

These days there is a tendency to use the term “sequential algorithm” in a wider sense, as the opposite of “distributed algorithm”. That is, “sequential” often means what we have called “sequential-time”. So we use the term “small-step algorithm” as a synonym for the term “sequential algorithm” in its traditional meaning.

5.3. Abstract State Postulate
How does one capture the restriction that the steps of a small-step algorithm are of bounded complexity? How does one measure the complexity of a single-step computation? Actually we prefer to think of bounded work instead of bounded complexity. The work that a small-step algorithm performs at any single step is bounded, and the bound depends only on the algorithm and does not depend on input. This complexity-to-work reformulation does not make the problem easier of course. How does one measure the work that the algorithm does during a step? The algorithm-as-a-state-automaton point of view is too simplistic to address the problem. We need to know more about what the states are. Fortunately this question can be answered.

Postulate (Abstract State).

• States of a sequential algorithm A are first-order structures.
• All states of A have the same vocabulary.
• The one-step transformation $\tau_A$ does not change the base set of any state.
• S(A) and I(A) are closed under isomorphisms. Further, any isomorphism from a state X onto a state Y is also an isomorphism from $\tau_A(X)$ onto $\tau_A(Y)$.

The notion of first-order structure is well-known in mathematical logic [Shoenfield 1967]. We use the following conventions:
• Every vocabulary contains the following logic symbols: the equality sign, the nullary relation symbols true and false, and the usual Boolean connectives.
• Every vocabulary contains the nullary function symbol undef.
• Some vocabulary symbols may be marked static. The remaining symbols are marked external or dynamic or both.² All logic symbols are static.
• In every structure, true is distinct from false and undef, the equality sign has its standard meaning, and the Boolean connectives have their standard meanings on Boolean arguments.

The symbols true and false allow us to treat relation symbols as special function symbols. The symbol undef allows us to deal with partial functions; recall that first-order structures have only total functions. The static functions (that is the interpretations of the static function symbols) do not change during the computation. The algorithm can change only the dynamic functions. The environment can change only the external functions.

It is easy to see that higher-order structures are also first-order structures (though higher-order logics are richer than first-order logic). We refer to [Gurevich 2000] for justification of the abstract-state postulate. Let us just note that the experience of the ASM community confirms that first-order structures suffice to describe any static mathematical situation [ASM].

It is often said that a state is given by the values of its variables. We take this literally. Any state of a sequential algorithm should be uniquely determined (in the space of all states of the algorithm) by the interpretations of the dynamic and external function symbols.

What is the vocabulary (of the states) of Euc? In addition to the logic symbols, it contains the nullary function symbols 0, a, b, d, Input1, Input2 and the binary function symbol mod. But what about labels 1 and 2? Euc has an implicit program counter. We have some freedom in making it explicit. One possibility is to introduce a Boolean variable, that is a nullary relational symbol, initialize that takes value true exactly in those states where Euc consumes inputs. The only dynamic symbols are a, b, d, initialize, and the only external symbols are Input1, Input2.

² This useful classification, used in [Gurevich 1991; 1995] and in ASM applications, was omitted in [Gurevich 2000] because it wasn’t necessary there. The omission allowed the following pathology in the case when there is a finite bound on the size of the states of A. The one-step transformation may change the values of true and false and modify appropriately the interpretations of the equality relation and the Boolean connectives.

5.4. Bounded Exploration Postulate and the Definition of Sequential Algorithms
Let A be an algorithm of vocabulary Υ and let X be a state of A. A location $\ell$ of X is given by a dynamic function symbol f in Υ of some arity j and a j-tuple $\bar{a} = (a_1, \ldots, a_j)$ of elements of X. The content of $\ell$ is the value $f(\bar{a})$.

An (atomic) update of X is given by a location $\ell$ and an element b of X and denoted simply $(\ell, b)$. It is the action of replacing the current content a of $\ell$ with b.

By the abstract-state postulate, the one-step transformation preserves the set of locations, so the state X and the state $X' = \tau_A(X)$ have the same locations. It follows that $X'$ is obtained from X by executing the following set of updates:

$$\Delta(X) = \{(\ell, b) : b = \mathrm{Content}_{X'}(\ell) \neq \mathrm{Content}_X(\ell)\}$$

If A is Euc and X is the state where a = 6 and b = 9 then $\Delta(X) = \{(a, 3), (b, 6)\}$. If Y is a state of A where a = b = 3 then $\Delta(Y) = \{(a, 0)\}$.

Now we are ready to formulate the final postulate. Let X, Y be arbitrary states of the algorithm A.

Postulate (Bounded Exploration). There exists a finite set T of terms in the vocabulary of A such that $\Delta(X) = \Delta(Y)$ whenever every term $t \in T$ has the same value in X and Y.

In the case of Euc, the term set {true, false, 0, a, b, d, b mod a, initialize} is a bounded-exploration witness.

Definition. A sequential algorithm is an object A that satisfies the sequential-time, abstract-state and bounded-exploration postulates.
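The update set Δ(X) of Euc can be computed directly from X and its successor state, as in the following Python sketch. The representation is an assumption made for illustration: states are dictionaries, all locations are nullary and named by strings, `None` plays the role of undef, and the helper names `tau`, `delta` and `state` are invented here.

```python
# Sketch: the set of nontrivial updates Delta(X) for Euc.
# Locations are the dynamic nullary symbols a, b, d, initialize.

DYNAMIC = ("a", "b", "d", "initialize")

def tau(X):
    """One step of Euc; all reads use the old state X."""
    Y = dict(X)
    if X["initialize"]:
        Y["a"], Y["b"], Y["initialize"] = X["Input1"], X["Input2"], False
    elif X["a"] == 0:
        Y["d"], Y["initialize"] = X["b"], True
    else:
        Y["a"], Y["b"] = X["b"] % X["a"], X["a"]
    return Y

def delta(X):
    """Updates (location, new content) whose new content differs from the old."""
    Y = tau(X)
    return {(loc, Y[loc]) for loc in DYNAMIC if Y[loc] != X[loc]}

def state(a, b, d=None, initialize=False, Input1=0, Input2=0):
    """Convenience constructor for a state of Euc (hypothetical helper)."""
    return dict(a=a, b=b, d=d, initialize=initialize,
                Input1=Input1, Input2=Input2)

print(delta(state(6, 9)))   # the updates of a to 3 and of b to 6
print(delta(state(3, 3)))   # only the update of a to 0; b keeps the value 3
```

In the second state the assignment to b writes the value b already has, so it does not appear in the computed set, in agreement with the example in the text.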
5.5. Sequential ASMs and the Characterization Theorem
The notion of a sequential ASM rule of a vocabulary Υ is defined by induction. In the following definition, all function symbols (including relation symbols) are in Υ and all terms are first-order terms.

Definition. If f is a j-ary dynamic function symbol and $t_0, \ldots, t_j$ are first-order terms then the following is a rule:

    f(t_1, ..., t_j) := t_0.

Let ϕ be a Boolean-valued term, that is ϕ has the form $f(t_1, \ldots, t_j)$ where f is a relation symbol. If $P_1$, $P_2$ are rules then so is

    if ϕ then P_1 else P_2.

If $P_1$, $P_2$ are rules then so is

    do in-parallel
        P_1
        P_2

The semantics of rules is pretty obvious but we have to decide what happens if the constituents of the do in-parallel rule produce contradictory updates. In that case the execution is aborted. For a more formal definition, we refer the reader to [Gurevich 2000]. Syntactically, a sequential ASM program is just a rule; but the rule determines only single steps of the program and is supposed to be iterated. Every sequential ASM program P gives rise to a map $\tau_P(X) = Y$ where X, Y are first-order Υ-structures.

Definition. A sequential ASM B of vocabulary Υ is given by a sequential ASM program Π of vocabulary Υ, a nonempty set S(B) of Υ-structures closed under isomorphisms and under the map $\tau_\Pi$, a nonempty subset I(B) ⊆ S(B) that is closed under isomorphisms, and the map $\tau_B$ which is the restriction of $\tau_\Pi$ to S(B).

Now we are ready to formulate the theorem of this section.

Theorem [*] (ASM Characterization of Small-Step Algorithms). For every sequential algorithm A there is a sequential abstract state machine B behaviorally equivalent to A. In particular, B simulates A step for step.
If A is our old friend Euc, then the program of the desired ASM B could be this.

    if initialize then
        do in-parallel
            a := Input1
            b := Input2
            initialize := false
    else
        if a = 0 then
            do in-parallel
                d := b
                initialize := true
        else
            do in-parallel
                a := b mod a
                b := a

We have discussed only deterministic sequential algorithms. Nondeterminism implicitly appeals to the environment to make the choices that cannot be algorithmically prescribed [Gurevich 2000]. Once nondeterminism is available, classical ruler-and-compass constructions can be regarded as nondeterministic ASMs operating on a suitable structure of geometric objects.

A critical examination of [Gurevich 2000] is found in [Reisig 2003].
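The three rule forms and the abort on contradictory updates can be sketched as a tiny interpreter in Python. This is a simplified illustration, not the formal semantics of [Gurevich 2000]: rules are nested tuples, terms are Python callables standing in for first-order terms, only nullary dynamic symbols are supported, and the names `updates`, `step`, `InconsistentUpdates` and `EUC` are assumptions of the sketch.

```python
# Sketch: update-set semantics for assignment, conditional and parallel rules.

class InconsistentUpdates(Exception):
    """Raised when parallel rules write different values to one location."""

def updates(rule, X):
    """Return the updates {location: value} that rule produces in state X."""
    kind = rule[0]
    if kind == "assign":                  # f := t0, for nullary f only
        _, loc, term = rule
        return {loc: term(X)}
    if kind == "if":                      # if phi then P1 else P2
        _, phi, p1, p2 = rule
        return updates(p1 if phi(X) else p2, X)
    if kind == "par":                     # do in-parallel P1 ... Pn
        result = {}
        for sub in rule[1:]:
            for loc, val in updates(sub, X).items():
                if loc in result and result[loc] != val:
                    raise InconsistentUpdates(loc)
                result[loc] = val
        return result
    raise ValueError(f"unknown rule kind: {kind}")

def step(program, X):
    """Fire the program once: evaluate in X, then apply all updates at once."""
    Y = dict(X)
    Y.update(updates(program, X))
    return Y

EUC = ("if", lambda X: X["initialize"],
       ("par",
        ("assign", "a", lambda X: X["Input1"]),
        ("assign", "b", lambda X: X["Input2"]),
        ("assign", "initialize", lambda X: False)),
       ("if", lambda X: X["a"] == 0,
        ("par",
         ("assign", "d", lambda X: X["b"]),
         ("assign", "initialize", lambda X: True)),
        ("par",
         ("assign", "a", lambda X: X["b"] % X["a"]),
         ("assign", "b", lambda X: X["a"]))))

X = dict(a=None, b=None, d=None, initialize=True, Input1=6, Input2=9)
for _ in range(4):
    X = step(EUC, X)
print(X["d"])
```

Collecting all updates before applying any of them is what makes the assignment a := b mod a, b := a simultaneous; the exception models the abort mentioned in the text.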
6. Formalization of Parallel Algorithms

Encouraged by the success in capturing the notion of sequential algorithms in [Gurevich 2000], we “attacked” parallel algorithms in [Blass and Gurevich 2003]. The attack succeeded. We gave an axiomatic definition of parallel algorithms and checked that the known (to us) parallel algorithm models satisfy the axioms. We defined precisely a version of parallel abstract state machines, a variant of the notion of parallel ASMs from [Gurevich 1995], and we checked that our parallel ASMs satisfy the definitions of parallel algorithms. And we proved the characterization theorem for parallel algorithms: every parallel algorithm is behaviorally equivalent to a parallel ASM.

The scope of this paper does not allow us to spell out the axiomatization of parallel ASMs, which is more involved than the axiomatization of sequential ASMs described in the previous section. We just explain what kind of parallelism we have in mind, say a few words about the axioms, say a few words about the parallel ASMs, and formulate the characterization theorem. The interested reader is invited to read (critically!) the paper [Blass and Gurevich 2003]. More scrutiny of that paper is highly desired.

6.1. What Parallel Algorithms?
The term “parallel algorithm” is used for a number of different notions in the literature. We have in mind sequential-time algorithms that can exhibit unbounded parallelism but only bounded sequentiality within a single step. Bounded sequentiality means that there is an a priori bound on the lengths of sequences of events within any one step of the algorithm that must occur in a specified order. To distinguish this notion of parallel algorithms, we call such parallel algorithms wide-step. Intuitively the width is the amount of parallelism. The “step” in “wide-step” alludes to sequential time.

Remark. Wide-step algorithms are also bounded-depth where the depth is intuitively the amount of sequentiality in a single step; this gives rise to a possible alternative name shallow-step algorithms for wide-step algorithms. Note that the name “parallel” emphasizes the potential rather than restrictions; in the same spirit, we choose “wide-step” over “shallow-step”.

Here is an example of a wide-step algorithm that, given a directed graph G = (V, E), marks the well-founded part of G. Initially no vertex is marked.

1. For every vertex x do the following. If every vertex y with an edge to x is marked then mark x as well.
2. Repeat step 1 until no new vertices are marked.

6.2. A few Words on the Axioms for Wide-Step Algorithms
Adapt the sequential-time postulate, the definition of behavioral equivalence and the abstract-state postulate to parallel algorithms simply by replacing “sequential” with “parallel”. The bounded-exploration postulate, on the other hand, specifically describes sequential algorithms. The work that a parallel algorithm performs within a single step can be unbounded. We must drop the bounded-exploration postulate and assume, in its place, an axiom or axioms specifically designed for parallelism.

A key observation is that a parallel computation consists of a number of processes running (not surprisingly) in parallel. The constituent processes can be parallel as well. But if we analyze the computation far enough then we arrive at processes, which we call proclets, that satisfy the bounded-exploration postulate. Several postulates describe how the proclets communicate with each other and how they produce updates. And there is a postulate requiring some bound d (depending only on the algorithm) for the amount of sequentiality in the program. The length of any sequence of events that must occur in a specified order within any one step of the algorithm is at most d.

There are several computation models for wide-step algorithms in the literature. The two best-known models are Boolean circuits and PRAMs [Karp and Ramachandran 1990]. (PRAM stands for “Parallel Random Access Machines”.) These two models and some other models of wide-step algorithms that occurred to us or to the referees are shown to satisfy the wide-step postulates in [Blass and Gurevich 2003].

6.3. Wide-Step Abstract State Machines
Parallel abstract state machines were defined in [Gurevich 1995]. Various semantical issues were elaborated later in [Gurevich 1997]. A simple version of parallel ASMs was explored in [Blass, Gurevich and Shelah 1999]; these ASMs can be called BGS ASMs. We describe, up to an isomorphism, an arbitrary state X of a BGS ASM. X is closed under finite sets (every finite set of elements of X constitutes another element of X) and is equipped with the usual set-theoretic operations. Thus X is infinite but a finite part of X contains all the essential information. The number of atoms of X, that is elements that are not sets, is finite, and there is a nullary function symbol Atoms interpreted as the set of all atoms. It is easy to write a BGS ASM program that simulates the example parallel algorithm above.
    forall x ∈ Atoms
        if {y : y ∈ Atoms : E(y, x) ∧ ¬M(y)} = ∅
        then M(x) := true

Note that x and y are mathematical variables like the variables of first-order logic. They are not programming variables and cannot be assigned values. In comparison to the case of sequential ASMs, there are two main new features in the syntax of BGS ASMs:

• set-comprehension terms {t(x) : x ∈ r : ϕ(x)}, and
• forall rules.

In [Blass and Gurevich 2000], we introduced the notion of a background of an ASM. BGS ASMs have a set background. The specification language AsmL, mentioned in the introduction, has a rich background that includes a set background, a sequence background, a map background, etc. The background that naturally arises in the analysis of wide-step algorithms is a multiset background. That is the background used in [Blass and Gurevich 2003].

6.4. The Wide-Step Characterization Theorem
Theorem [**] (ASM Characterization of Wide-Step Algorithms). For every parallel algorithm A there is a parallel abstract state machine B behaviorally equivalent to A. In particular, B simulates A step for step.

Thus, Boolean circuits and PRAMs can be seen as special wide-step ASMs (which does not make them any less valuable). The existing quantum computer models satisfy our postulates as well [Grädel and Nowack 2003] assuming that the environment provides random bits when needed. The corresponding wide-step ASMs need physical quantum-computer implementation for efficient execution.
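To make the notion of a single wide step concrete, the graph-marking example of Subsection 6.1 can be sketched in Python. The sketch assumes a finite graph given as a set of vertices and a set of edge pairs (y, x) meaning an edge from y to x; the function names `wide_step` and `well_founded_part` and the sample graph are invented for illustration. Python evaluates the comprehension sequentially, so the parallelism is only modeled: every vertex is tested against the same current marking, and all new marks are applied together.

```python
# Sketch: marking the well-founded part of a directed graph, one wide step
# at a time. Edges are pairs (y, x) read as "an edge from y to x".

def wide_step(vertices, edges, marked):
    """Mark every vertex all of whose predecessors are marked in the old state."""
    newly = {x for x in vertices
             if all(y in marked for (y, z) in edges if z == x)}
    return marked | newly

def well_founded_part(vertices, edges):
    """Repeat the wide step until no new vertices are marked."""
    marked = set()
    while True:
        nxt = wide_step(vertices, edges, marked)
        if nxt == marked:
            return marked
        marked = nxt

V = {1, 2, 3, 4, 5}
E = {(1, 2), (2, 3), (3, 5), (4, 5), (5, 4)}
print(well_founded_part(V, E))   # vertices 4 and 5 lie on a cycle and stay unmarked
```

The amount of work in one call of `wide_step` grows with the graph, while the chain of events that must happen in order within the step (test the predecessors, then mark) has fixed length, which is the bounded sequentiality described above.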
7. Toward Formalization of Distributed Algorithms

Distributed abstract state machines were defined in [Gurevich 1995]. They are extensively used by the ASM community [ASM] but the problem of capturing distributed algorithms is open. Here we concentrate on one aspect of this important problem: interaction
between a sequential-time agent and the rest of the system as seen by the agent. One may have an impression that this aspect has been covered because all along we studied runs where steps of the algorithm are interleaved with steps made by the environment. But this interleaving mode is not general enough.

If we assume that each agent’s steps are atomic, then interleaving mode seems adequate. But a more detailed analysis reveals that even in this case a slight modification is needed. See Subsection 7.1.

But in fact an agent’s steps need not be atomic because agents can interact with their environments not only in the inter-step fashion but also in the intra-step fashion. It is common in the AsmL experience that, during a single step, one agent calls on other agents, receives “callbacks”, calls again, etc. It is much harder to generalize the two characterization theorems to intra-step interaction.

7.1. Trivial Updates in Distributed Computation
Consider a small-step abstract state machine A. In Section 5, we restricted attention to runs where steps of A are interleaved with steps of the environment. Now turn attention to distributed computing where the agents do not necessarily take turns to compute. Assume that A is the algorithm executed by one of the agents. Recall that an update of a location $\ell$ of the current state of A is the action of replacing the current content a of $\ell$ with some content b. Call the update trivial if a = b. In Section 5 we could ignore trivial updates. But we have to take them into account now. A trivial update of $\ell$ matters in a distributed situation when the location $\ell$ is shared: typically only one agent is allowed to write into a location at any given time, and so even a trivial update by one agent would prevent other agents from writing to the same location at the same time.

Recall that $\Delta(X)$ is the set of nontrivial updates computed by the algorithm A at X during one step. Let $\Delta^+(X)$ be the set of all updates, trivial or not, computed by A at X during the one step. It seems obvious how to generalize Section 5 in order to take care of trivial updates: just strengthen the bounded-exploration postulate by replacing $\Delta$ with $\Delta^+$. There is, however, a little problem. Nothing in the current definition of a small-step algorithm A guarantees that there is a $\Delta^+(X)$ map associated with it. ($\Delta(X)$ is definable in terms of X and $\tau_A(X)$.) That is why we started this subsection by
assuming that A is an ASM. Euc also has a $\Delta^+(X)$ map: if X is the state where a = 6 and b = 9 then $\Delta^+(X) = \{(a, 3), (b, 6)\}$, and if Y is a state of A where a = b = 3 then $\Delta(Y) = \{(a, 0)\}$ and $\Delta^+(Y) = \{(a, 0), (b, 3)\}$.

To generalize Section 5 in order to take into account trivial updates, do the following.

• Strengthen the abstract-state postulate by assuming that there is a mapping $\Delta^+$ associating a set of updates with every state X of the given algorithm A in such a way that the set of nontrivial updates in $\Delta^+(X)$ is exactly $\Delta(X)$.
• Strengthen the definition of behavioral equivalence of sequential algorithms by requiring that the two algorithms produce the same $\Delta^+(X)$ at every state X.
• Strengthen the bounded exploration postulate by replacing $\Delta$ with $\Delta^+$.

It is easy to check that Theorem [*], the small-step characterization theorem, remains valid.

Remark. In a similar way, we refine the definition of wide-step algorithms and strengthen Theorem [**], the wide-step characterization theorem.

Remark. Another generalization of Section 5, to algorithms with the output command, is described in [Blass and Gurevich 2003]. The two generalizations of Section 5 are orthogonal and can be combined. The output generalization applies to wide-step algorithms as well.

7.2. Intra-Step Interacting Algorithms
During the execution of a single step, an algorithm may call on its environment to provide various data and services. The AsmL experience showed the importance of intra-step communication between an algorithm and its environment. AsmL programs routinely call on outside components to perform various jobs. The idea of such intra-step interaction between an algorithm and its environment is not new to the ASM literature; external functions appear already in the tutorial [Gurevich 1991]. In simple cases, one
can pretend that intra-step interaction reduces to inter-step interaction, that the environment prepares in advance the appropriate values of the external functions. In general, even if such a reduction is possible, it requires an omniscient environment and is utterly impractical.

The current authors are preparing a series of articles extending Theorems [*] and [**] to intra-step interacting algorithms. In either case, this involves
• axiomatic definitions of intra-step interacting algorithms,
• precise definitions of intra-step interacting abstract state machines,
• the appropriate extension of the notion of behavioral equivalence,
• verification that the ASMs satisfy the definitions of algorithms,
• a proof that every intra-step interacting algorithm is behaviorally equivalent to an intra-step interacting ASM.
Acknowledgment

We thank Steve Cook, John Dawson, Martin Davis, Sol Feferman, Leonid Levin, Victor Pambuccian and Vladimir Uspensky for helping us with references. We thank John Dawson, Sol Feferman, Erich Grädel, Leonid Levin, Victor Pambuccian and Dean Rosenzweig for commenting, on very short notice (because of a tight deadline), on the draft of this paper.
References ASM, ASM Michigan Webpage, , maintained by James K. Huggins. AsmL, The AsmL Webpage, . Bennett, C.H. and Landauer, R. [1985] “Fundamental Physical Limits of Computation”, Scientific American 253(1) (July), 48–56.
Bernstein, E. and Vazirani, U. [1997], “Quantum Complexity Theory”, SIAM Journal on Computing 26, 1411–1473. Blass, A. and Gurevich, Y. [1997], “The Linear Time Hierarchy Theorem for Abstract State Machines and RAMs”, Springer Journal of Universal Computer Science 3(4), 247–278. Blass, A. and Gurevich, Y. [2000], “Background, Reserve, and Gandy Machines”, Springer Lecture Notes in Computer Science, 1862, pp. 1–17. Blass, A. and Gurevich, Y. [2003], “Abstract State Machines Capture Parallel Algorithms”, ACM Transactions on Computational Logic 4(4), 578–651. Blass, A., Gurevich, Y., and Shelah, S. [1999], “Choiceless Polynomial Time”, Annals of Pure and Applied Logic 100, 141–187. Börger, E. and Stärk, R. [2003], Abstract State Machines, Springer. Church, A. [1936], “An Unsolvable Problem of Elementary Number Theory”, American Journal of Mathematics 58, 345–363. Reprinted in [Davis 1965, pp. 88–107]. Cook, S.A. [1991], “Computational Complexity of Higher Type Functions”, Proceedings of 1990 International Congress of Mathematicians, Kyoto, Japan, Springer-Verlag, pp. 55–69. Cook, S.A. [2003], Private communication. Davis, M. [1965], “The Undecidable”, Raven Press. Davis, M. [1982], “Why Gödel Didn’t Have Church’s Thesis”, Information and Control 54, 3–24. Deutsch, D. [1985], “Quantum Theory, the Church–Turing Principle and the Universal Quantum Computer”, Proceedings of the Royal Society, A 400, 97–117. Deutsch, D., Ekert, A., and Lupaccini, R. [2000], “Machines, Logic and Quantum Physics”, The Bulletin of Symbolic Logic 6, 265–283. Gandy, R.O. [1980], “Church’s Thesis and Principles for Mechanisms”, in The Kleene Symposium, (J. Barwise, H.J. Keisler, and K. Kunen eds.), North-Holland, pp. 123–148. Gandy, R.O. [1988], “The Confluence of Ideas in 1936”, in The Universal Turing Machine: A Half-Century Story, (R. Herken ed.), Oxford University Press, pp. 55–111.
Gödel, K. [1990], “Collected Works”, vol. II, Oxford University Press. Grädel, E. and Nowack, A. [2003], “Quantum Computing and Abstract State Machines”, Springer Lecture Notes in Computer Science, 2589, pp. 309–323. Grigoriev, D. [1980], “Kolmogorov Algorithms are Stronger Than Turing Machines”, Journal of Soviet Mathematics 14(5), 1445–1450. Gurevich, Y. [1988], “Kolmogorov Machines and Related Issues”, in Current Trends in Theoretical Computer Science, (G. Rozenberg and A. Salomaa eds.), World Scientific, 1993, pp. 225–234; originally in Bull. EATCS 35(1988). Gurevich, Y. [1991], “Evolving Algebras: An Attempt to Discover Semantics”, in Current Trends in Theoretical Computer Science, (G. Rozenberg and A. Salomaa eds.), World Scientific, 1993, pp. 266–292; originally in Bull. EATCS 43(1991). Gurevich, Y. [1993], “Feasible Functions”, London Mathematical Society Newsletter 206 (June), 6–7. Gurevich, Y. [1995], “Evolving Algebra 1993: Lipari Guide”, in Specification and Validation Methods, (E. Börger ed.), Oxford University Press, pp. 9–36. Gurevich, Y. [1997], “May 1997 Draft of the ASM Guide”, Technical Report CSE–TR–336–97, EECS Department, University of Michigan. Gurevich, Y. [2000], “For Every Sequential Algorithm There Is an Equivalent Sequential Abstract State Machine”, ACM Transactions on Computational Logic 1(1), 77–111. Gurevich, Y. and Huggins, J.K. [1993], “The Semantics of the C Programming Language”, Springer Lecture Notes in Computer Science, 702, 274–308. Gurevich, Y. and Shelah, S. [1989], “Nearly Linear Time”, Springer Lecture Notes in Computer Science, 363, 108–118. Gurevich, Y. and Spielmann, M. [1997], “Recursive Abstract State Machines”, Springer Journal of Universal Computer Science 3(4), 233–246. Gurevich, Y. [2005], “Interactive Algorithms 2005”, Springer Lecture Notes in Computer Science, 3618, pp. 26–38,
(J. Jedrzejowicz and A. Szepietowski eds.). An extended-with-an-appendix version is to be published in Interactive Computation: The New Paradigm, (D. Goldin, S. Smolka, and P. Wegner eds.), Springer-Verlag. Karatsuba, A. and Ofman, Y. [1963], “Multiplication of Multidigit Numbers on Automata”, Soviet Physics Doklady (English translation) 7(7), 595–596. Karp, R.M. and Ramachandran, V. [1990], “Parallel Algorithms for Shared-Memory Machines”, in Handbook of Theoretical Computer Science, vol. A: Algorithms and Complexity, (J. van Leeuwen ed.), Elsevier and MIT Press, pp. 869–941. Kijne, D. [1956], “Plane Construction Field Theory”, Van Gorcum, Assen. Kleene, S.C. [1938], “On Notation for Ordinal Numbers”, Journal of Symbolic Logic 3, 150–155. Kleene, S.C. [1981], “Origins of Recursive Function Theory”, Annals of the History of Computing 3(1) (January), 52–67. Kleene, S.C. [1988], “Turing’s Analysis of Computability, and Major Applications of It”, in The Universal Turing Machine: A Half-Century Story, (R. Herken ed.), Oxford University Press, pp. 17–54. Knuth, D.E. [1968] The Art of Computer Programming, vol. 1: Fundamental Algorithms, Addison-Wesley, Reading, MA. Knuth, D.E. [1981], The Art of Computer Programming, vol. 2: Seminumerical Algorithms, Addison-Wesley, Reading, MA. Kolmogorov, A.N. [1953], “On the Concept of Algorithm”, Uspekhi Mat. Nauk 8(4), 175–176, Russian. An English translation is found in [Uspensky and Semenov 1993, pp. 18–19]. Kolmogorov, A.N. and Uspensky, V.A. [1958], “On the Definition of Algorithm”, Uspekhi Mat. Nauk 13(4), 3–28, Russian. Translated into English in AMS Translations 29(1963), 217–245. Levin, L.A. [1973], “Universal Search Problems”, Problemy Peredachi Informatsii 9(3), 265–266, Russian. The journal is translated into English under the name Problems of Information Transmission. Levin, L.A. [2003a], Private communication.
Levin, L.A. [2003b], “The Tale of One-Way Functions”, Problemy Peredachi Informatsii 39(1), 92–103, Russian. The journal is translated into English under the name Problems of Information Transmission. The English version is available online at . Moschovakis, Y.N. [2001], “What Is an Algorithm?” in Mathematics Unlimited, (B. Engquist and W. Schmid eds.), Springer-Verlag, pp. 919–936. Pour–El, M.B. and Richards, I. [1989], “Computability in Analysis and Physics”, (Perspectives in Mathematical Logic), Springer-Verlag. Reisig, W. [2003], “On Gurevich’s Theorem on Sequential Algorithms”, Acta Informatica 39, 273–305. Schönhage, A. [1970], “Universelle Turing Speicherung”, in Automatentheorie und Formale Sprachen, (J. Dörr and G. Hotz eds.), Bibliogr. Institut, Mannheim, pp. 369–383. In German. Schönhage, A. [1980], “Storage Modification Machines”, SIAM Journal on Computing 9, 490–508. Shoenfield, J.R. [1967], “Mathematical Logic”, Addison-Wesley. Shor, P.W. [1997], “Polynomial-Time Algorithms for Prime Factorization and Discrete Logarithms on a Quantum Computer”, SIAM Journal on Computing 26(5), 1484–1509. Sieg, W. [1997], “Step by Recursive Step: Church’s Analysis of Effective Calculability”, The Bulletin of Symbolic Logic 3(2), 154–180. Sieg, W. [1999], “An Abstract Model for Parallel Computations: Gandy’s Thesis”, The Monist 82(1), 150–164. Tarjan, R.E. [1979], “A Class of Algorithms Which Require Nonlinear Time to Maintain Disjoint Sets”, Journal of Computer and System Sciences 18, 110–127. Turing, A.M. [1937], “On Computable Numbers, With an Application to the Entscheidungsproblem”, Proceedings of London Mathematical Society, Series 2 42(1936–1937), 230–265; correction, ibidem 43, pp. 544–546. Reprinted in [Davis 1965, pp. 155–222] and available online at .
Uspensky, V.A. [1992], “Kolmogorov and Mathematical Logic”, Journal of Symbolic Logic 57(2), 385–412. Uspensky, V.A. and Semenov, A.L. [1993], “Algorithms: Main Ideas and Applications”, Kluwer.
Douglas S. Bridges∗
Church’s Thesis and Bishop’s Constructivism

∗ D.S. Bridges, Department of Mathematics and Statistics, University of Canterbury, Christchurch, New Zealand.

The twin pillars of constructivism in the first half of the twentieth century were Brouwer’s intuitionism (INT, dating from his Amsterdam Ph.D. thesis [Brouwer 1907]) and Markov’s recursive constructive mathematics (RUSS) [Markov]. The first of these introduced to mathematics Brouwer’s continuity principle (a feature that renders intuitionistic and classical mathematics (CLASS) superficially contradictory but, in reality, barely comparable) and his Fan Theorem, which predated the appearance of its contrapositive, König’s Lemma, in classical mathematics. On the other hand, Markov’s insight was that computability theory, based on any of the equivalent models of computation created in the 1930s and drawn together under the banner of Church’s Thesis, would provide a practicable framework for computable analysis. In fact, Markov did not use any of the then existing models of computation, but created a new notion of algorithm that now bears his name.

The development of analysis using Markov algorithms and intuitionistic logic (abstracted by Heyting [Heyting] from the practice of intuitionistic mathematics) had some remarkable successes. In particular, it showed that many aspects of analysis that are taken for granted in the classical development cannot be applied in a constructive one. Examples of this are:
• A positive-valued uniformly continuous mapping f : [0, 1] → R whose infimum is 0.
• A sequence $(I_n)_{n \geq 1}$ of bounded open intervals such that $R = \bigcup_{n \geq 1} I_n$ and $\sum_{n=1}^{N} |I_n| < 1$ for each $N$.
• A compact subset of [0, 1] that is not Lebesgue measurable.

Of course, we have to be careful when interpreting these three statements. Let $R_c$ denote the computable (recursive) real line, which is countable but not recursively so. The first of the three bulleted statements really says that there exists a positive-valued recursive function f from [0, 1] ∩ $R_c$ to $R_c$ that is recursively uniformly continuous and has infimum 0. Stated thus, the result is classically correct, even if the rather casual form of its statement on the right of the bullet appears to be classically false.

The problem with developing analysis with Church’s Thesis à la Markov is that framing the proofs in terms of the language of Markov algorithms or, equivalently, recursive functions, does not come easily to the working analyst; nor does reading the resulting mathematics (see [Kushner]). In contrast, Errett Bishop’s approach to constructive analysis, which uses intuitionistic logic and first came to the attention of the mathematical community on the publication of his monograph [Bishop 1967], enables one to develop analysis like a normal analyst, produces proofs that read like analysis rather than logic, and always gives results that are consistent with classical analysis, intuitionism, and Markov-style mathematics. In other words, each of CLASS, INT and RUSS can be regarded as a model of Bishop’s constructive mathematics (BISH).¹

1 Very recent work of Andrej Bauer [Bauer] shows that, by means of a realizability interpretation, proofs in BISH can be translated into proofs in Weihrauch’s Type Two Effectivity formalism [Weihrauch] for computable analysis using classical logic.

A consequence of this is that if, for a certain classically valid proposition P, we can produce in RUSS a “recursive counterexample” showing that P is recursively false, then we know that P cannot be proved in BISH. Metakides et al. [Metakides] have just such an example in the case of the Hahn–Banach theorem: they produce a recursively presented Banach space B, a recursively presented linear subspace S of B, and a recursive linear functional u on S such that
(i) the kernel of u is located² in S (whence u has a computable norm) and
(ii) there is no recursively continuous linear functional v on B that extends u and has the same norm as u.

Thus we know that the version of the Hahn–Banach theorem proved in BISH, in which the extension of a linear functional with located kernel can be carried out with the norm increased by any preassigned positive number, cannot be improved to yield extension of the functional plus preservation of the norm.

Sometimes recursive counterexamples lead to independence results. For example, since there is a recursive counterexample to the uniform continuity theorem,

Every continuous mapping f : [0, 1] → R is uniformly continuous,

which is provable in CLASS, that theorem is independent of BISH: it can be neither proved nor disproved within BISH. In order to disprove the theorem, we have to add to BISH something like Church’s thesis; whereas in order to prove it, we need to work in another model of BISH, such as CLASS or, as it happens, INT [Dummett, p. 87]. Another example of this independence phenomenon occurs with the proposition

Every compact subset of R is Lebesgue measurable,

which holds in CLASS but is false in RUSS [Bridges–Richman, p. 64].

2 A subset A of a metric space (X, ρ) is located in X if the distance ρ(x, A) = inf {ρ(x, y) : y ∈ A} exists (that is, is computable) for each x ∈ X.

3 This vision is a rather restricted one, since in BISH plus Church’s thesis one can establish interesting positive results, such as the continuity of all real-valued functions on [0, 1].

Given that Church’s thesis has a role to play in BISH, even if that role could be seen³ as the rather negative one of providing examples that show the limitations on what can be proved within BISH unsupplemented by other principles, can we find a simple way of bringing the power of that thesis into BISH without getting involved in the details of recursive function theory or any equivalent apparatus for the discussion of algorithms? We can, using Fred Richman’s approach [Richman] based on the notion of a partial function algorithm:
that is, a mapping A : N × N → N ∪ {⊥} with the property

∀m ∀n (A(m, n) ≠ ⊥ ⇒ A(m, n + 1) = A(m, n)).

The underlying idea here is that A(m, n) = ⊥ if, on the input m, the program represented by A has executed n + 1 steps without stopping; while if it stops and outputs a natural number k after executing the (n + 1)th step, then A(m, i) = k for all i ≥ n + 1. Associated with the partial function algorithm A is a partial function f : N → N with the following properties:
• the domain of f is {m ∈ N : ∃n ∈ N (A(m, n) ≠ ⊥)},
• for each m in the domain of f, if A(m, n) ≠ ⊥, then f(m) = A(m, n).

We can then prove, within BISH, that a partial function from N to N has countable domain if and only if it equals the partial function associated with some partial function algorithm [Bridges–Richman, p. 50, Lemma (1.1)]. To bring into play the effective power of Church’s thesis, it remains to postulate Richman’s axiom:⁴

CPF: There is an enumeration of the set of all partial functions from N to N with countable domains.

4 In reading this axiom, one must remember that we are working within BISH; so, for example, the enumeration postulated in CPF has the property that for each n ∈ N we can find the nth term in the list of all partial functions with countable domains.

For convenience, we then fix an enumeration $\varphi_0, \varphi_1, \varphi_2, \ldots$ of the set of all computable partial functions with countable domains, and a corresponding enumeration $D_0, D_1, D_2, \ldots$ where $D_n = (D_{n,k})_{k \geq 0}$ is a sequence of finite subsets of N such that $D_{n,0} \subset D_{n,1} \subset D_{n,2} \subset \cdots$
and the domain of $\varphi_n$ is the set $\bigcup_{k \geq 0} D_{n,k}$. We think of $D_{n,k}$ as the set of those natural numbers that are known to be elements of the domain of $\varphi_n$ after the execution of the $(k + 1)$th step of the program that computes $\varphi_n$.

We now regard RUSS formally as BISH plus CPF. To illustrate proof in RUSS, we show that the halting problem is unsolvable. First, we prove that for each total function f : N → {0, 1} there exists n ∈ N such that f(n) = 0 if and only if $\varphi_n(n)$ is defined. Indeed,

$f^{-1}(0) = \bigcup_{n \geq 0} \bigl( f^{-1}(0) \cap \{0, 1, \ldots, n\} \bigr)$

is countable and therefore the domain of some (computable) partial function $\varphi$. It remains to choose n such that $\varphi = \varphi_n$. It is now easy to show that the halting problem cannot be solved: suppose there exists a total function h : N × N → {0, 1} such that for all m, n ∈ N, $\varphi_m(n)$ is defined if and only if h(m, n) = 1; then by taking f(n) = h(n, n) in the foregoing little result, we obtain an n for which f(n) = 0 if and only if f(n) = 1, a contradiction.

Within BISH plus CPF we can also prove that such “omniscience principles” as

LPO: For each binary sequence $(a_n)_{n \geq 1}$, either $a_n = 0$ for all n or else there exists n such that $a_n = 1$

and

LLPO: For each binary sequence $(a_n)_{n \geq 1}$ such that $a_m a_n = 0$ for all distinct m and n, either $a_{2n} = 0$ for all n or else $a_{2n+1} = 0$ for all n

are false. We can also prove Specker’s theorem, a strong counterexample to the sequential compactness of [0, 1]:

There exists an increasing Specker sequence $(s_n)_{n \geq 1}$ of rational numbers in the Cantor set C with the following property: for each x ∈ R there exist N ∈ N and δ > 0 such that $|x - s_n| \geq \delta$ whenever $n \geq N$.

Specker’s theorem gives rise to several important recursive counterexamples. Here are three. With $(s_n)_{n \geq 1}$ a Specker sequence in C, define a sequence $(f_n)_{n \geq 1}$ of continuous functions on [0, 1] with the following properties:
(i) $f_n(s_n) = 1$, $f_n$ vanishes outside an interval $I_n \subset (0, 1)$ centred on $s_n$, and $f_n$ is linear in each half of $I_n$;
(ii) if $m \neq n$, then $I_m \cap I_n = \emptyset$.

Then $\sum_{n=1}^{\infty} f_n$ is a bounded continuous mapping on [0, 1] that is not uniformly continuous; $\sum_{n=1}^{\infty} n f_n$ is a continuous mapping that is unbounded on [0, 1]; and

$1 - \sum_{n=1}^{\infty} (1 - 2^{-n}) f_n$

is a continuous mapping on [0, 1] that is everywhere positive but has infimum 0.

One of the most significant theorems provable in BISH plus CPF is the Čeitin–Kreisel–Lacombe–Shoenfield theorem, which asserts that every reasonable function from R to R is pointwise continuous. (We will not go into detail here about what is “reasonable”.) Note that this theorem requires us to add to BISH not just CPF, but also Markov’s principle:

For each binary sequence $(a_n)_{n \geq 1}$ with the property that not all the terms are 0, there exists n such that $a_n = 1$.

This principle embodies an unbounded search: to find n such that $a_n = 1$, just keep testing the terms of the sequence until, as is guaranteed by Markov’s principle, such a term turns up. For this and other reasons,⁵ it is regarded with unease by most practitioners of BISH, and even by many of RUSS.

In view of the Čeitin–Kreisel–Lacombe–Shoenfield theorem, we cannot hope to produce discontinuous functions from R to R within BISH. On the other hand, discontinuous functions certainly exist in CLASS, so we cannot expect to prove, within BISH alone, that every function from R to R is continuous; we need the extra hypotheses of Church’s thesis and Markov’s principle, or else Brouwer’s continuity principle.

5 Another reason is that Markov’s principle is independent of Heyting arithmetic: that is, Peano arithmetic with intuitionistic logic [Bridges–Richman, pp. 137–138].
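The notion of a partial function algorithm and the unbounded search behind Markov’s principle can be made concrete with a minimal Python sketch. It is an illustration only, not part of Richman’s or Bridges’s development: the names `toy_algorithm`, `associated_partial_function` and `markov_search` are hypothetical, ⊥ is modelled by `None`, the toy program is invented for the example, and the optional `fuel` bound exists only so that the sketch can be run safely (the mathematical definition has no such bound).

```python
from typing import Callable, Optional


def toy_algorithm(m: int, n: int) -> Optional[int]:
    """A(m, n) for an invented toy program.

    The program repeatedly subtracts 2 from its input; when it reaches 0 it
    stops and outputs m // 2. On odd inputs it gets stuck at 1 and never stops.
    Returns None (standing for ⊥) if the program is still running after
    n + 1 steps.
    """
    x = m
    for _ in range(n + 1):
        if x == 0:
            return m // 2  # stopped: later values of n give the same output
        if x >= 2:
            x -= 2
        # x == 1: the program spins forever without changing its state
    return None


def associated_partial_function(
    A: Callable[[int, int], Optional[int]], m: int, fuel: Optional[int] = None
) -> Optional[int]:
    """Search n = 0, 1, 2, ... for the first n with A(m, n) != ⊥.

    Without `fuel` this loops forever when m is outside the domain.
    """
    n = 0
    while fuel is None or n < fuel:
        result = A(m, n)
        if result is not None:
            return result
        n += 1
    return None  # gave up: says nothing about whether m is in the domain


def markov_search(a: Callable[[int], int]) -> int:
    """Return the least n >= 1 with a(n) == 1.

    Terminates only if not all terms of the binary sequence are 0.
    """
    n = 1
    while a(n) != 1:
        n += 1
    return n


if __name__ == "__main__":
    print(associated_partial_function(toy_algorithm, 6))
    print(markov_search(lambda n: 1 if n * n > 50 else 0))
```

Running out of `fuel` is not evidence that the input lies outside the domain; that gap between "not yet found" and "does not exist" is what makes the search unbounded.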
In conclusion, let me return to the question of how one approaches constructivity in mathematics. It seems to me that there are two main roads one can take. The first is open to those who believe that classical logic is the right one (the only one?) to use. However, they then have to be very careful about excluding “decisions” that classical logic allows but are clearly non-computational: for example, classical logic enables us to prove that

∀x ∈ R (x = 0 ∨ ¬(x = 0)),    (1)

which is a decision outwith the competence of a physical computer. In order to exclude such non-computational statements, it is then necessary to work within a carefully prescribed algorithmic framework, such as that of recursive function theory. In turn, this requires careful interpretation of the resulting theorems, as indicated at the end of the second paragraph of this article.

The alternative road is taken by those who are, perhaps, more flexible in their attitude to logic, and who are prepared, when working on constructive mathematics,⁶ to adopt intuitionistic logic, which automatically excludes such non-computational statements as (1). On this road we do not need a special algorithmic framework for developing mathematics constructively, so we can work in the usual manner of, say, a classical analyst, except that we have to pay closer attention to the underlying logic. Moreover, we gain the advantage of multiple interpretations, including the recursive one, of our constructive mathematics.

Many people seem to believe that it is unbearably hard to give up the classical-logical way of thinking and to use systematically the less familiar intuitionistic logic. However, my experience, shared with others working in constructive mathematics, is that it is not hard to make that logical transition, and that the resulting mathematics is fascinating and rewarding to develop.
6 It is important to emphasise this point: unless you are a dyed-in-the-wool believer that intuitionistic logic is the correct logic, and hence that all nonconstructive mathematics is devoid of meaning, you would be well advised to use classical logic to handle such things as, for example, the higher reaches of abstract set theory.
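Why statement (1) is beyond a computer can be illustrated with a minimal Python sketch. It rests on an assumption that is not spelled out in the article: a real number x is represented by a function giving, for each n, a rational within 2^-n of x. The names `Real` and `witness_nonzero` are hypothetical. The sketch can confirm x ≠ 0 once an approximation is large enough, but no finite number of approximations can ever confirm x = 0.

```python
from fractions import Fraction
from typing import Callable, Optional

# Assumed representation: x(n) is a rational with |x - x(n)| <= 2**-n.
Real = Callable[[int], Fraction]


def witness_nonzero(x: Real, max_n: int) -> Optional[int]:
    """Look for n <= max_n with |x(n)| > 2**-n, which proves x != 0.

    Returns the first such n, or None if none is found up to max_n.
    A None result does NOT show that x = 0.
    """
    for n in range(max_n + 1):
        if abs(x(n)) > Fraction(1, 2 ** n):
            return n
    return None


if __name__ == "__main__":
    zero: Real = lambda n: Fraction(0)
    tiny: Real = lambda n: Fraction(1, 2 ** 40)  # exact at every precision

    # With a budget of 30 approximations, zero and tiny look the same.
    print(witness_nonzero(zero, 30), witness_nonzero(tiny, 30))
    # A larger budget separates tiny from 0, but never certifies zero.
    print(witness_nonzero(zero, 50), witness_nonzero(tiny, 50))
```

Raising `max_n` only moves the problem: for any budget there is a nonzero real that the search cannot tell apart from 0.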
References Bauer, A. [2005], “Realizability as the Connection Between Computable and Constructive Mathematics”, Proceedings of CCA 2005, Kyoto, Japan, 25–29 August 2005 [to appear]. Bishop, E.A. [1967], Foundations of Constructive Analysis, McGraw-Hill, New York. Bridges, D.S. and Richman, F. [1987], Varieties of Constructive Mathematics, London Math. Soc. Lecture Notes 97, Cambridge Univ. Press. Brouwer, L.E.J. [1907 (1981)], Over de Grondslagen der Wiskunde, Doctoral Thesis, University of Amsterdam, 1907. Reprinted with additional material (D. van Dalen, ed.), Mathematisch Centrum, Amsterdam, 1981. Dummett, M.A.E. [2000], Elements of Intuitionism, Oxford Logic Guides 39, Clarendon Press, Oxford, 2nd Edn. Heyting, A. [1930], “Die formalen Regeln der intuitionistischen Logik”, Sitzungsber. preuss. Akad. Wiss. Berlin, pp. 42–56. Kushner, B.A. [1985], Lectures on Constructive Mathematical Analysis, Amer. Math. Soc., Providence RI. Markov, A.A. [1954], Theory of Algorithms [Russian], Trudy Mat. Instituta imeni V.A. Steklova 42, Izdatel’stvo Akademii Nauk SSSR, Moskva. Metakides, G., Nerode, A., and Shore, R. [1985], “Recursive Limits on the Hahn–Banach Theorem”, in Errett Bishop: Reflections on Him and His Research, (M. Rosenblatt ed.), Contemporary Math. 39, Amer. Math. Soc., Providence, R.I. Richman, F. [1983], “Church’s Thesis Without Tears”, J. Symbolic Logic 48, 797–803. Weihrauch, K. [2000], Computable Analysis, Springer-Verlag, Heidelberg.
Selmer Bringsjord, Konstantine Arkoudas∗
On the Provability, Veracity, and AI-Relevance of the Church–Turing Thesis

After providing some brief background on the Church–Turing thesis, we discuss Mendelson’s much-publicized change of heart regarding the provability of that thesis, and defend the standard conception of the thesis as a mathematically unprovable proposition. Next, the first author (Bringsjord) offers an argument that aims to establish the outright falsity of the thesis. The argument is controverted by the second author (Arkoudas), who puts forth several objections against it. Bringsjord replies to these objections, and to a number of other potential objections, and proceeds to consider previous attacks on the Church–Turing thesis and compare them with his own. The final section of the paper is devoted to an examination of arguments for the computational conception of mind from the Church–Turing thesis. We distinguish between strong and weak forms of computationalism, and we analyze an argument for weak computationalism on the basis of the Church–Turing thesis, which we show, contra Copeland, to be formally valid. Accordingly, if the thesis is true, then, by virtue of this argument, weak computationalism is validated.

∗ S. Bringsjord, K. Arkoudas, Department of Cognitive Science, Department of Computer Science, Rensselaer AI & Reasoning (RAIR) Lab, Rensselaer Polytechnic Institute, Troy NY 12180 USA.
1. Background

At the heart of the Church–Turing thesis (CTT from now on) is the notion of an algorithm, characterized in traditional fashion by Mendelson as

an effective and completely specified procedure for solving a whole class of problems. [...] An algorithm does not require ingenuity; its application is prescribed in advance and does not depend upon any empirical or random factors. [Mendelson 1990, p. 225]
A function f : A → B is then called effectively computable iff there exists an algorithm that an idealized computing agent can follow in order to compute the value f(a) for any given a ∈ A.¹ Without loss of generality, we can restrict attention to so-called number-theoretic functions, i.e., functions that take N to N (where N is the set of natural numbers). Briefly, the justification for this restriction is a technique known as arithmetization. Using ideas made popular by Gödel, one can devise encoding and decoding algorithms that will represent any finite mathematical structure (e.g., a graph, a context-free grammar, a formula of second-order logic, a Java program, etc.) by a unique natural number. By using such a scheme, a function from, say, Java programs to graphs, can be faithfully (and effectively) represented by a function from N to N. Similar techniques can be used to represent a function of multiple arguments by a single-argument function.

The notion of an effectively computable function is informal, since it is based on the somewhat vague concept of an algorithm. CTT also involves a more formal notion, that of a Turing-computable function. A (total) function f : N → N is Turing-computable iff there exists a Turing machine which, starting with n on its tape (perhaps represented by n strokes |), leaves f(n) on its tape after processing, for any n ∈ N. (The details of the processing are harmlessly left aside for now; see, e.g., [Lewis and Papadimitriou 1997] for a thorough development.) Given this definition, CTT amounts to:

CTT A function f : N → N is effectively computable if and only if it is Turing-computable.

1 Turing [1936] spoke of “computists” and Post [1934] of “workers,” humans whose sole job was to slavishly follow explicit, excruciatingly simple instructions.
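The arithmetization technique mentioned above can be sketched in a few lines of Python. This is only one possible scheme, chosen for brevity and not taken from the paper: `pair`/`unpair` use the Cantor pairing function to fold two arguments into one, and `encode_text`/`decode_text` (hypothetical names) code a finite string, standing in for any finite structure written down as text, by a natural number. The helper `lift` shows how a function on strings then becomes a function from N to N.

```python
from math import isqrt
from typing import Callable, Tuple


def pair(x: int, y: int) -> int:
    """Cantor pairing: a bijection from N x N onto N."""
    return (x + y) * (x + y + 1) // 2 + y


def unpair(z: int) -> Tuple[int, int]:
    """Inverse of pair."""
    w = (isqrt(8 * z + 1) - 1) // 2  # index of the diagonal containing z
    y = z - w * (w + 1) // 2
    return w - y, y


def encode_text(s: str) -> int:
    """Code a string by a unique natural number.

    A leading 0x01 byte is prepended so that distinct strings (including
    ones whose encoding starts with zero bytes) get distinct codes.
    """
    return int.from_bytes(b"\x01" + s.encode("utf-8"), "big")


def decode_text(n: int) -> str:
    """Inverse of encode_text on the numbers it produces."""
    raw = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return raw[1:].decode("utf-8")


def lift(f: Callable[[str], str]) -> Callable[[int], int]:
    """Turn a function on strings into a function on codes.

    Only meaningful on numbers produced by encode_text.
    """
    return lambda n: encode_text(f(decode_text(n)))


if __name__ == "__main__":
    print(pair(3, 5), unpair(pair(3, 5)))
    upper_on_codes = lift(str.upper)
    print(decode_text(upper_on_codes(encode_text("java"))))
```

A function of two arguments g(x, y) can be represented in the same spirit by the single-argument function z ↦ g(*unpair(z)).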
The term “Church’s thesis” will be used synonymously with “the Church–Turing thesis,” and CT will serve as an abbreviation for the former.
2. Mendelson and the Provability of Church’s Thesis

In a widely circulated and debated essay, [Mendelson 1990] reversed his earlier position [Mendelson 1963] on the provability of Church’s thesis and went on to argue against the standard conception of the thesis as mathematically unprovable. He contended that it is wrong to say that the thesis is not mathematically provable “just because it states an equivalence between a vague, imprecise notion (effectively computable function) and a precise mathematical notion (partial-recursive function),” and gave what he considered to be a perfectly precise mathematical proof of the easier half of the thesis. The issue clearly boils down to what counts as a mathematical proof, and indeed the bone of contention here can be traced to different understandings of that concept.

The question of what exactly ought to count as a mathematical proof is somewhat controversial; there have been widely differing opinions. Some thinkers in the “quasi-empirical” tradition (a tradition that is largely associated with Lakatos [1976], and which emphasizes the fallibility of mathematics and the impossibility of securing foundations for it) view mathematical proofs as informal arguments (Lakatos called proofs “thought experiments”), and hold, as Kitcher [1977] puts it, that:

The mistake is to regard proofs as instruments of justification. Instead, we should see them as tools of discovery, to be employed in the development of mathematical concepts and the refinement of mathematical conjectures.
Anti-foundationalists in general, particularly those who are sympathetic to social constructivism (e.g., [Ernest 1998]), hold that mathematical proofs are essentially social constructs, inextricably tied to a particular time and culture. According to such views, a mathematical proof is any argument—indeed, any thing, e.g., a diagram— that can convince fellow mathematicians that a certain claim is true. Rigor, they claim, particularly of the formal axiomatic variety, is not a realistic model of mathematical discovery and can actually stifle innovation. “Too much rigor can lead to rigor mortis,” as Kleiner
[1991] puts it. On the other end of the spectrum, we have formalists of various degrees, in the tradition of Frege, Russell, and Hilbert, who view proofs as they are treated in proof theory [Buss 1998; Troelstra and Schwichtenberg 1996]: as perfectly rigorous mathematical objects in and of themselves. Now, if one concurs with the Fregean approach and regards proofs as crisp mathematical objects (e.g., sequences or trees of formulas) arising in the context of some formal system or other, then clearly there can only be proofs of statements that represent well-formed formulas of such systems. Accordingly, since CTT in its usual formulation is not expressed as a well-formed sentence of a formal system (since “effectively computable” is taken as a pretheoretical, informal concept), then there is ipso facto no mathematical proof for it. If, by contrast, one is willing to entertain a more liberal view, whereby a mathematical proof need not be tied to any particular formal system, but need only be a convincing argument possibly involving informal, intuitive notions, then clearly it could be that CTT is mathematically provable. However, it seems to us that when logicians and authors of computability and logic texts state that CTT is not provable, they do so under the former understanding, and so, contra Mendelson, what they say is not only true but trivially so. In fact, such authors would not deny that the thesis might be mathematically provable if one adopts the latter interpretation of “mathematical proof” as any cogent piece of argumentation. Indeed, almost invariably such authors point out that there is ample empirical evidence for CTT. The issue is simply that they are not willing to acknowledge such evidence, however convincing, as constituting a mathematical proof. So the issue raised by Mendelson has little to do with CTT per se, and more to do with the old debate about the nature of mathematical proof. 
There is perhaps another way to understand Mendelson’s assertion that we can have mathematical proofs about informal concepts, a weaker interpretation. On that interpretation, one could understand Mendelson as saying that given certain informal concepts C1, ..., Cn, it is possible to incorporate C1, ..., Cn as undefined (i.e., primitive) notions in some appropriate formal theory T, postulate certain propositions about them, and then proceed to give rigorous proofs in the extended theory T′, proofs that may involve C1, ..., Cn. This amounts to an implicit and possibly partial characterization of
the new concepts instead of explicit, total definitions of them within T. The new theory T′ will be a non-conservative extension of T, and might be regarded as a quasi-formalization of C1, ..., Cn. We obviously take no issue with that possibility. Indeed, we believe that this is how Mendelson’s claim that he has given a perfectly precise mathematical proof of the easier half of CTT should be understood. That “proof” can be readily formalized in a typed higher-order logic where we have a theory of the natural numbers and functions on them already available as libraries (as we do, say, in HOL [Gordon and Melham 1993]), and where “effectively computable” is taken as a primitive notion in the form of a unary relation on number-theoretic functions. We could then define the partial recursive functions as the inductive closure of the initial functions under the operations of composition, recursion, and minimalization, and couch Mendelson’s argument as an inductive proof. We could also formalize Mendelson’s proof in a first-order setting within an extension of ZFC, again by taking “effectively computable” as a primitive notion in addition to the binary relation of set membership. There is nothing wrong with doing this, if we are dealing with a new concept C that we want to elucidate and can think of no explicit definition for it inside an already established theory. Taking C as a primitive and postulating appropriate propositions about it is a good way to experiment with the concept. (Usually, of course, this will happen by extending other theories rather than by starting from scratch.) It forces us to make explicit assumptions about C that integrate it rigorously with other previously established concepts and results, and gain experience with the kind of reasoning and results that such assumptions permit.
But we gain nothing if the concept is one that has already been explicitly defined within another formal theory in a way that has proven satisfactory, i.e., in a way that has been empirically validated. Why take something as primitive when we can evidently define it explicitly in terms of other primitives that appear more fundamental (more primitive, so to speak)? We only end up violating Occam’s razor and increasing the chances of inconsistency, which is a risk that we take every time we postulate a new axiom (and we have to postulate some axioms about the primitives, as otherwise we will not be able to prove anything about them). Conservative theory extensions are obviously preferable to non-conservative extensions.
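The inductive closure mentioned above (initial functions closed under composition, recursion, and minimalization) can be made concrete with a small Python sketch. Everything named here (`Zero`, `Succ`, `Proj`, `Compose`, `PrimRec`, `Mu`, `evaluate`, `ADD`) is our own illustrative choice and not part of any formalization discussed in the text. The sketch represents partial recursive functions as terms and evaluates them by recursion on the structure of the term; it is meant only as an illustration of what an inductively defined class looks like, not as a proof of anything about the informal notion of effective computability.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical term language for partial recursive functions.
# Arguments are always passed as tuples of natural numbers.

@dataclass(frozen=True)
class Zero:            # initial function: Z(x1, ..., xn) = 0
    pass

@dataclass(frozen=True)
class Succ:            # initial function: S(x) = x + 1
    pass

@dataclass(frozen=True)
class Proj:            # initial function: returns the i-th argument (0-based)
    i: int

@dataclass(frozen=True)
class Compose:         # outer(inner_1(xs), ..., inner_m(xs))
    outer: object
    inner: Tuple[object, ...]

@dataclass(frozen=True)
class PrimRec:         # f(0, xs) = base(xs); f(n+1, xs) = step(n, f(n, xs), xs)
    base: object
    step: object

@dataclass(frozen=True)
class Mu:              # least y such that body(y, xs) == 0 (may not terminate)
    body: object

def evaluate(term, args):
    """Evaluate a term on a tuple of naturals, by recursion on the term."""
    if isinstance(term, Zero):
        return 0
    if isinstance(term, Succ):
        return args[0] + 1
    if isinstance(term, Proj):
        return args[term.i]
    if isinstance(term, Compose):
        inner_vals = tuple(evaluate(t, args) for t in term.inner)
        return evaluate(term.outer, inner_vals)
    if isinstance(term, PrimRec):
        n, rest = args[0], args[1:]
        acc = evaluate(term.base, rest)
        for k in range(n):
            acc = evaluate(term.step, (k, acc) + rest)
        return acc
    if isinstance(term, Mu):
        y = 0
        while evaluate(term.body, (y,) + args) != 0:
            y += 1     # unbounded search: this is where partiality enters
        return y
    raise TypeError(f"not a term: {term!r}")

# Addition by primitive recursion: add(0, m) = m; add(n+1, m) = add(n, m) + 1.
ADD = PrimRec(base=Proj(0), step=Compose(Succ(), (Proj(1),)))
```

Each branch of `evaluate` handles one clause of the inductive definition, so the interpreter has the same shape as an argument by induction over the closure: one case per initial function and one case per closure operation.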
This can be seen more clearly by analyzing Mendelson’s argument for the easy half of CTT in detail. The argument has the form of an inductive proof in which both the basis step and the inductive step are asserted instead of proved. The fact that it has the form of an inductive argument is appropriate, since the set of partial recursive functions is inductively defined as a closure system—the smallest set of number-theoretic functions that contains the initial functions and is closed under composition, recursion, and minimalization. Now any closure system (or equivalently, any subalgebra built by a generating set) has an associated principle of structural induction, stating that if the basis elements have a certain property, and if that property is preserved by the operations, then every element of the closure has the property. In the case of the partial recursive functions the principle takes the following form: Let S be any set of number-theoretic functions and suppose that the following two conditions hold:
1. All initial functions are in S (basis step).
2. If f1, ..., fn are in S and g is obtainable from f1, ..., fn through composition, recursion, and/or minimalization, then g is in S (inductive step).
We are then entitled to conclude that S contains all partial recursive functions. Mendelson’s argument can be understood as an instantiation of this pattern, where S is the set of all effectively computable number-theoretic functions, where “effectively computable” is represented by some undefined, primitive unary predicate E applying to such functions. But Mendelson’s “proof” simply asserts both the basis step and the inductive step (where S here is the set of effectively computable functions). In other words, he postulates two axioms:
A1: The initial functions are effectively computable.
A2: Effective computability is preserved by composition, recursion, and minimalization.
Of course these propositions are not arbitrary.
They seem true in the intended interpretation, and Mendelson tries to justify them with two informal and very brief parenthetical arguments. But A1 and A2 are asserted nevertheless, as no mathematical derivations are given for
them, only informal appeals to intuition. (Of course, it is conceivable that A1 and A2 could be deduced from some other unspecified and perhaps more fundamental axioms about E, but Mendelson does not do anything of the sort.) Now an inductive mathematical proof that postulates both the basis and the inductive steps is tantamount to simply asserting that the inductive closure at hand has the relevant property. Mathematically, it has very little value. So let us assess the present state of affairs from a formal perspective. We have obtained a new theory T′ as a non-conservative extension of a formal theory T by introducing a new primitive notion E. And we have postulated two axioms in T′ that amount to a formal underspecification of E: We have claimed that the extension of E is a superset of the partial recursive functions. We have gained nothing by doing so. We have proliferated our conceptual ontology without settling the question of the exact extension of E relative to the formally defined set of partial recursive functions. It would in fact be impossible to settle that question within T′ without arbitrarily postulating yet more axioms. Given that the weak interpretation of Mendelson seems theoretically unsatisfactory and renders his proof trivial, we are inclined to think that what he has in mind is the stronger claim to which we alluded earlier, namely, that there are bona fide mathematical proofs that cannot be accurately couched in any formal system whatsoever. A mathematical proof of CTT, in particular, would have to lie outside the confines of any particular formal system in that “effective computability” would need to remain entirely unformalized, taken neither as an undefined term of a formal system nor as a defined one.
Accordingly, if CTT is mathematically provable in the intuitive sense, then there exist informal mathematical proofs that are inherently unformalizable, in the sense that they cannot be captured in any axiomatic system, even in principle. That is an extremely strong claim. We are not aware of any evidence for it. We are aware of ample evidence for its negation. Certainly it is a widely held belief that any branch of modern mathematics can be developed within ZFC (e.g., Henson [1984] states “it is an empirical fact that all of mathematics as presently known can be formalized within the ZFC system”). That is not simply an article of faith. As Mendelson is well aware, such development has been carried out in painstaking and tremendously extensive detail by many mathematicians. There may be doubts as to whether ZFC can capture everything that mathematicians are interested in talking about, but that mainly refers to intensionality issues (e.g., modal notions such as epistemic states at given points in time), not to extensional questions of proof existence. And here we are not even talking about the adequacy of ZFC specifically; we are talking about all formal axiomatic systems, the claim being that there are informal mathematical proofs that cannot be captured in any formal system. This contradicts what one might call “Hilbert’s thesis”: that the identification of the informal concept of mathematical proof with formal deduction in axiomatic systems is correct, meaning that any informal mathematical proof can be represented by some rigorous deduction in an appropriate formal system, and conversely, any such deduction can be viewed as representing an informal mathematical proof (this is “the easier half” of Hilbert’s thesis). In fact, there are increasing amounts of evidence to suggest that any style of argumentation capable of serving as genuine mathematical justification can be made perfectly rigorous. For example, for a long time diagrams were thought to be suspect and “rigorous” mathematicians would warn against the pitfalls of using such informal devices in mathematical proofs. But recent work (much of it sparked by the efforts of the late Jon Barwise) has shown that Venn diagrams, higraphs, and Peirce diagrams can all be made perfectly precise [Hammer 1995; Shin 1995], with rigorous notions of syntax, semantics, soundness, completeness, etc. Even more recent results by the authors [Arkoudas 2005] suggest that it is possible to formalize arbitrary diagrammatic proofs: we have designed and analyzed a new domain-independent logical framework for heterogeneous natural deduction capable of combining diagrammatic and symbolic inference for arbitrary finite domains, proven it sound, and given detailed algorithms for implementing a proof checker for it.
Accordingly, given that the mathematical provability of CTT in that sense would contradict Hilbert’s thesis, and given the overwhelming evidence in favor of that thesis and the absence (to the best of our knowledge) of any evidence against it, we conclude that CTT is indeed mathematically unprovable. Even if one remains agnostic on the proper reading of “mathematical proof,” however, there is still much to take issue with in Mendelson’s arguments. For instance, he writes:
The concepts and assumptions that support the notion of partial-recursive function are, in an essential way, no less vague and imprecise than the notion of effectively computable function; the former are just more familiar and are part of a respectable theory with connections to other parts of logic and mathematics. (The notion of effectively computable function could have been incorporated into an axiomatic presentation of classical mathematics, but the acceptance of CT made this unnecessary.) The same point applies to [PT, FT, and TT]. Functions are defined in terms of sets, but the concept of set is no clearer than that of function and a foundation of mathematics can be based on a theory using function as primitive notion instead of set. Tarski’s definition of truth is formulated in set-theoretic terms, but the notion of set is no clearer than that of truth. The model-theoretic definition of logical validity is based ultimately on set theory, the foundations of which are no clearer than our intuitive understanding of logical validity. [Mendelson 1990, p. 232]
But Mendelson doesn’t establish these statements; he simply asserts them. By our lights—and by the lights of many others—the concept of a set and the relation of set membership, which are ultimately the only two primitive concepts underlying the notion of a Turing-computable (equivalently, partial recursive) function, are much clearer than the notion of an algorithm, which is the main concept underlying the informal notion of an effectively computable function.2 They are likewise clearer than the concepts of logical validity and entailment, limits and convergence, and all other examples mentioned by Mendelson. Indeed, we claim that the set concept is inherently more foundational than—and hence it is “essentially different” from—the concept of an algorithm. This is not a fringe view. Maddy [1997] calls the foundational view of sets “a pillar of contemporary orthodoxy,” citing quotations such as the following: “All mathematicians do mathematics in set theory, consciously or unconsciously” [Levy 1979]. Views like the following one by the mathematical logician Moschovakis [1998] are common:
I believe that most mathematical theories (and all the nontrivial ones) can be clarified considerably by having their basic notions modeled faithfully in set theory; that for many of them a (proper) set-theoretic foundation is not only useful but necessary—in the sense that their basic notions cannot be satisfactorily explicated without reference to fundamentally set-theoretic notions; and that set-theoretic foundations of mathematical theories can be formulated so that they are compatible with a large variety of views about truth in mathematics and the nature of mathematical objects. [Moschovakis 1998, p. 9]
2 We do not think that the well-known independence results in ZFC, or the proliferation of unorthodox set-theoretic axioms and interpretations, undermine the clarity of the primitive concept any more than the existence of non-standard models of arithmetic or the incompleteness of Peano’s axioms make the concept of natural number any less clear—or the existence of hyperbolic and elliptic geometries makes the Euclidean concepts of points and lines any less clear.
Mendelson also claims that
Another difficulty with the usual viewpoint concerning CT is that it assumes that the only way to ascertain the truth of the equivalence asserted in CT is to prove it.
But no such assumption is “usually” made, either explicitly or tacitly. Indeed, virtually all authors gladly accept the truth of the thesis even though they consider it mathematically unprovable. If they assumed that mathematical proof was the only way to ascertain the truth of the thesis, they would be in the rather embarrassing position of enthusiastically endorsing a proposition whose truth there is no way, by their own assumption, to ascertain. Rather, the usual viewpoint is that the thesis is as “ascertained” as any proposition with empirical import can hope to be: eminently likely, but in principle subject to refutation.
3. Is Church’s Thesis True?
In this section Bringsjord presents an argument purporting to show that Church’s thesis is false.3 Arkoudas regards the thesis as true and presents three objections to Bringsjord’s argument, in Section 4.1, Section 4.2, and Section 4.3. Bringsjord replies to these objections, and proceeds to consider some additional possible objections. In Section 5 Bringsjord discusses previous attacks on Church’s thesis.
3 Alert readers will realize that if Church’s thesis is false, it follows immediately that it’s unprovable, since presumably nothing false can be proved. Such readers will thus perhaps wonder why success in the present section doesn’t render the previous section otiose. The answer is simply that, together, we have both endeavored to take Mendelson seriously: To grapple directly with the provability issue, independent of other arguments.
Bringsjord’s suspicion that CT is false first arose in connection with the concept of productive sets, which have two properties:
P1 They are classically undecidable (= no program, Turing machine, etc. can decide such sets).
P2 There is a computable function f from the set of all standard programs to any such set, a function which, when given a candidate program P (for deciding the set in question), yields an element of the set for which P will fail.
Put informally, a set A is productive iff it’s not only classically undecidable, but also if any program proposed to decide A can be counter-exampled with some element of A. Clearly, if a set A′ has these properties, then A′ ∉ Σ0 and A′ ∉ Σ1. If A′ falls somewhere in AH, and is effectively decidable, then CTT falls. But what could possibly fit the bill? Bringsjord has become convinced that the set S of all interesting stories provides a perfect fit.
Figure 1: Various Letter As
This no doubt catches you a bit off guard. Interesting stories? Well, let us first remind you that the view that there are productive
sets near at hand is far from unprecedented. Douglas Hofstadter [1982], for example, holds that the set A of all As is a productive set. In order to satisfy P1, A must forever resist attempts to write a program for deciding this set; in order to satisfy P2, there must at minimum always be a way to “stump” a program intended to decide A. That A satisfies both these conditions isn’t all that implausible— especially when one faces up to the unpredictable variability seen in this set. For example, take a look at Figure 1, taken from Graphic Art Materials Reference Manual [1981]. In order for a program to decide A, it must capitalize on some rules that capture the “essence” of the letter in question. But what sorts of rules could these be? Does the bar in the middle need to touch the sides? Apparently not (see 2 A). Does there have to be a bar that approximates connecting the sides? Apparently not (see 7 G). And on and on it goes for other proposed rules.4 However, it must be conceded that no argument for the productivity of A has been provided by Hofstadter. For all we know, some company could tomorrow announce a letter recognition system that will work for all As. The situation is a bit different in the case of the mathematician Peter Kugel [1986], who makes clever use of an elementary theorem in unmistakably arguing that the set of all beautiful objects is located above Σ1 in AH: We seem to be able to recognize, as beautiful, pieces of music that we almost certainly could not have composed. There is a theorem about the partially computable sets that says that there is a uniform procedure for turning a procedure for recognizing members of such sets into a procedure for generating them. Since this procedure is uniform—you can use the same one for all computable sets—it does not depend on any specific information about the set in question. 
So, if the set of all beautiful things were in Σ1, we should be able to turn our ability to recognize beautiful things into one for generating them [...] This suggests that a person who recognizes the Sistine Chapel Ceiling as beautiful knows enough to paint it, [which] strikes me as somewhat implausible. [Kugel 1986, pp. 147–148]
4 Relevant here is Hofstadter’s Letter Spirit program, which generates fonts from the first few letters in the font in question. For an argument that this program, and others, aren’t really creative, see [Bringsjord, Ferrucci and Bello 2001].
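The theorem Kugel appeals to, that a recognizer for a Σ1 set can be uniformly turned into a generator, is usually established by dovetailing. The Python sketch below is a minimal illustration under our own assumptions: a recognizer is modeled as a function `accepts_within(x, steps)` that reports whether candidate `x` is accepted within a given step budget, and the toy recognizer for perfect squares is purely hypothetical. None of these names come from Kugel or from the text.

```python
from itertools import count, islice

def accepts_within(x, steps):
    """Hypothetical step-bounded recognizer for perfect squares.

    It tries k = 0, 1, ..., steps - 1 and accepts once k * k == x.
    With an unbounded budget it would run forever on non-squares,
    which is the behavior of a recognizer that is not a decider.
    """
    return any(k * k == x for k in range(steps))

def enumerate_members(recognizer):
    """Turn a step-bounded recognizer into a generator by dovetailing.

    At stage b, every candidate below b is run for b steps, so each
    member is eventually emitted and no single candidate can block
    the search. The construction never inspects what the set is.
    """
    seen = set()
    for budget in count(1):
        for x in range(budget):
            if x not in seen and recognizer(x, budget):
                seen.add(x)
                yield x

first_four = list(islice(enumerate_members(accepts_within), 4))
```

The transformation is uniform in the sense the quotation stresses: `enumerate_members` uses the recognizer only as a black box, so the same wrapper works for any set that has a recognizer of this kind.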
The main problem with this line of reasoning is that it’s disturbingly exotic. Beauty is perhaps a promising candidate for what Kugel is after, but it must be conceded that most of those scientists who think seriously about human cognition don’t think a lot about beauty. Indeed, they don’t seem to think at all about beauty.5 And this isn’t (they would insist) because beauty is a daunting concept, one that resists recasting in computational terms. The stance would doubtless be that beauty is left aside because one can exhaustively analyze cognition (and replicate it on a machine) without bothering to grapple in earnest with this concept. This claim about the irrelevance of beauty may strike some as astonishing, and it certainly isn’t a view affirmed by each and every computationalist, but we gladly concede it for the sake of argument: for the record, we grant that ignoring beauty, in the context of attempts to model, simulate, and replicate mentation, is acceptable.6 However, Bringsjord thinks there is another concept that serves our purposes perfectly: namely, the concept of a story. Stories are thought by many to be at the very heart of cognition. For example, in their lead target chapter in Knowledge and Memory: The Real Story [Wyer 1995], Roger Schank and Robert Abelson, two eminent scientists working in the area of cognition and computation, boldly assert on the first page that “virtually all human knowledge” is based on stories.7 Schank and Abelson go on to claim that since the essence of cognition inheres in narrative, we can jettison propositional, logic-based, rule-based, formal... schemes for knowledge representation. Among the 17 commentators who react to the target piece, 13 affirm the story-based view (the remaining four authors are skeptical). Moreover, this book is one of many in the same family. 
For example, Schank has devoted a book to the view that stories are at the very heart of human cognition: [Schank 1995]. For another example, note that Dennett’s [1991] Consciousness Explained can be read as a defense of the view (his “multiple drafts” view of consciousness) that thinking amounts to spinning out parallel stories.
5 A search for coverage of this concept in standard texts about cognition—e.g., [Ashcraft 1994] and [Stillings et al. 1995]—turns up nothing whatever.
6 What argument could be mustered for ignoring beauty in the context of attempts to reduce cognition to computation, or to build an artificial agent capable of behaviors analogous to human ones typically taken to involve beauty? We envisage an argument running parallel to the one John Pollock [1995] gives for ignoring human emotions in his attempt to build an artificial person. Pollock’s view, in a nutshell, is that human emotions are in the end just “time savers;” with fast enough hardware, and clever enough algorithms, artificial persons could compute the need to quickly flee (say) a lion, whereas we take one look and immediately feel a surge of fear that serves to spark our rapid departure.
7 An insightful review of this book has been written by Tom Trabasso [1996].
The other nice thing about stories, from our perspective, is that apparently one of us knows a thing or two about them, in connection to computation. For over a decade, Bringsjord worked at creating an artificial agent capable of autonomously creating sophisticated fiction. Bringsjord first discussed this project in his What Robots Can and Can’t Be [1992], in which he specifically discussed the challenge of characterizing, precisely, the class of interesting stories. (His main claim was that formal philosophy offers the best hope of supplying this characterization.) For those who seek to build agents capable of creative feats like good storytelling, this is a key challenge. It’s easy enough to build systems capable of generating uninteresting stories. For example, the world’s first significant artificial story generator, TALE-SPIN [Meehan 1981], did a good job of that. Here, for example, is one of TALE-SPIN’s best stories:
“Hunger” Once upon a time John Bear lived in a cave. John knew that John was in his cave. There was a beehive in a maple tree. Tom Bee knew that the beehive was in the maple tree. Tom was in his beehive. Tom knew that Tom was in his beehive. There was some honey in Tom’s beehive. Tom knew that the honey was in Tom’s beehive. Tom had the honey. Tom knew that Tom had the honey. There was a nest in a cherry tree. Arthur Bird knew that the nest was in the cherry tree. Arthur was in his nest. Arthur knew that John was in his cave. [...]
How are things to be improved? How is one to go about building an agent capable of creating interesting stories? It was the sustained attempt to answer this question, in conjunction with the concept of productivity discussed above, that persuaded Bringsjord that CT is indeed false. Let us explain. First, to ease exposition, let S^I denote the set of all interesting stories. Now, recall that productive sets must have two properties, P1 and P2; let’s take them in turn, in connection with S^I. First, S^I must be classically undecidable; i.e., there is no program (or
TM, etc.) which answers the question, for an arbitrary story in S^I, whether or not it’s interesting. Second, there must be some computable function f from the set of all programs to S^I which, when given as input a program P that purportedly decides S^I, yields an element of S^I for which P fails. It seems to us that S^I does have both of these properties—because, in a nutshell, Bringsjord and colleagues seemed to invariably and continuously turn up these two properties “in action.” Every time someone suggested an algorithm-sketch for deciding S^I, it was easily shot down by a counter-example consisting of a certain story which is clearly interesting despite the absence in it of those conditions regarded by the proposal to be necessary for interestingness. (It has been suggested that interesting stories must have inter-character conflict, but monodramas can involve only one character. It has been suggested that interesting stories must embody age-old plot structures, but some interesting stories are interesting precisely because they violate such structures, and so on.) The situation we have arrived at can be crystallized in deductive form as follows.8
Arg3
(9) If S^I ∈ Σ1 (or S^I ∈ Σ0), then there exists a procedure P which adapts programs for deciding members of S^I so as to yield programs for enumerating members of S^I.
(10) There’s no procedure P which adapts programs for deciding members of S^I so as to yield programs for enumerating members of S^I.
∴ (11) S^I ∉ Σ1 (or S^I ∉ Σ0).    10, 11
(12) S^I ∈ AH.
∴ (13) S^I ∈ Π1 (or above in the AH).    disj syll
(14) S^I is effectively decidable.
∴ (15) CT is false.    reductio
8 Please note that the labeling in this argument is intentional. This argument is one Bringsjord is defending anew in the present chapter, and desires to preserve it precisely as it has been previously articulated [see Bringsjord and Zenzen 2003]. The argument is now followed by new objections from Arkoudas, given in Section 4.
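For the Σ0 case of premise (9), the adaptation from deciding to enumerating is especially simple: list every candidate in a fixed order and keep those the decider accepts. The following Python sketch illustrates this under stated assumptions. Stories are modeled as finite strings over a small alphabet, and `is_palindrome` is a hypothetical stand-in for a decider, chosen only because it is total and easy to check; nothing here models interestingness.

```python
from itertools import count, islice, product

def all_strings(alphabet):
    """Yield every finite string over the alphabet, shortest first."""
    for length in count(0):
        for chars in product(alphabet, repeat=length):
            yield "".join(chars)

def enumerate_decided(decides, alphabet):
    """Adapt a total decider into an enumerator of the decided set."""
    return (s for s in all_strings(alphabet) if decides(s))

def is_palindrome(s):
    """Hypothetical total decider used only for illustration."""
    return s == s[::-1]

first_six = list(islice(enumerate_decided(is_palindrome, "ab"), 6))
```

The wrapper `enumerate_decided` does not depend on which decider it is handed, which is the sense in which the adapting procedure of premise (9) exists whenever a decider does.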
Clearly, Arg3 is formally valid. Premise (9) is not only true, but necessarily true, since it’s part of the canon of elementary computability theory. What about premise (10)? Well, this is the core idea, the one expressed above by Kugel, but transferred now to a different domain: People who can decide S^I, that is, people who can decide whether something is an interesting story, can’t necessarily generate interesting stories. Student researchers in Bringsjord’s laboratory have been a case in point: with little knowledge of, and skill for, creating interesting stories, they have nonetheless recognized such narrative. That is, students who are, by their own admission, egregious creative writers, are nonetheless discriminating critics. They can decide which stories are interesting (which is why they know that the story generators AI has produced so far are nothing to write home about), but producing the set of all such stories (including, as it does, such works as not only King Lear, but War and Peace) is quite another matter. These would be, necessarily, the same matter if the set of all interesting stories, S^I, was in either Σ0 or Σ1, the algorithmic portion of AH. But what’s the rationale behind (14), the claim that S^I is effectively decidable? The rationale is simply the brute fact that a normal, well-adjusted human computist can effectively decide S^I. Try it yourself: First, start with the sort of story commonly discussed in AI; for example:
“Shopping” Jack was shopping at the supermarket. He picked up some milk from the shelf. He paid for it and left.9
Well? Your judgement? Uninteresting, we wager. Now go back to “Hunger,” and come up with a judgement for it, if you haven’t done so already. Also uninteresting, right? Now render a verdict on “Betrayal,” a story that can be produced by Bringsjord and Ferrucci’s [2000] BRUTUS:
“Betrayal” Dave Striver loved the university. He loved its ivy-covered clocktowers, its ancient and sturdy brick, and its sun-splashed verdant greens and eager youth. He also loved the fact that the university is free of the stark unforgiving trials of the business world—only this isn’t a fact: academia has its own tests, and some are as merciless as any in the marketplace. A prime example is the dissertation defense: to earn the PhD, to become a doctor, one must pass an oral examination on one’s dissertation. This was a test Professor Edward Hart enjoyed giving. Dave wanted desperately to be a doctor. But he needed the signatures of three people on the first page of his dissertation, the priceless inscriptions which, together, would certify that he had passed his defense. One of the signatures had to come from Professor Hart, and Hart had often said—to others and to himself—that he was honored to help Dave secure his well-earned dream. Well before the defense, Dave gave Hart a penultimate copy of his thesis. Hart read it and told Dave that it was absolutely first-rate, and that he would gladly sign it at the defense. They even shook hands in Hart’s book-lined office. Dave noticed that Hart’s eyes were bright and trustful, and his bearing paternal. At the defense, Dave thought that he eloquently summarized Chapter 3 of his dissertation. There were two questions, one from Professor Rodgers and one from Dr. Teer; Dave answered both, apparently to everyone’s satisfaction. There were no further objections. Professor Rogers signed. He slid the tome to Teer; she too signed, and then slid it in front of Hart. Hart didn’t move. “Edward?” Rogers said. Hart still sat motionless. Dave felt slightly dizzy. “Edward, are you going to sign?” Later, Hart sat alone in his office, in his big leather chair, saddened by Dave’s failure. He tried to think of ways he could help Dave achieve his dream.
9 From page 592 of [Charniak and McDermott 1985]. The story is studied in the context of attempts to resolve pronouns: How do we know who the first occurrence of ‘He’ refers to in this story? And how do we render the process of resolving the pronoun to Jack as a computational one?
This time, interesting, right? Now at this point some readers may be thinking: “Now wait a minute. Isn’t your position inconsistent? On the one hand you cheerfully opine that ‘interesting story’ cannot be captured. But on the other you provide an interesting story!—a story that must, if I understand your project, capitalize upon some careful account of interestingness in narrative.” “Betrayal” is based in significant part upon formalizations, in intensional logic, of definitions taking the classic form of necessary and
sufficient conditions seen in analytic philosophy. These definitions are given for “immemorial themes;” in “Betrayal” the two themes are self-deception and, of course, betrayal. Here is approximately the definition of betrayal with which brutus works:10 D Agent sr betrays agent sd at tb iff there exists some state of affairs p and ∃ti , tk (ti ≤ tk ≤ tj ≤ tb ) such that 1 sd at ti wants p to occur; 2 sr believes that sd wants p to occur; 30 (3 ∧ 60 ) ∨
600 sd wants at tk that there is no action a which sr performs in the belief that thereby p will not occur;
400 there is some action a such that: 400 a sr performs a at tb in the belief that thereby p will not occur; and 400 b it’s not the case that there exists a state of affairs q such that q is believed by sr to be good for sd and sr performs a in the belief that q will not occur; 50 sr believes at tj that sd believes that there is some action a which sr will perform in the belief that thereby p will occur.
All of this sort of work (i.e., the gradual crafting of such definitions in the face of counter-example after counter-example; the crafting in the case of betrayal is described in Chapter 4 of [Bringsjord and Ferrucci 2000]) is perfectly consistent with the absence of an account of ‘interesting story.’ In fact, this kind of philosophical analysis figures in the observation that proposed accounts of interestingness are invariably vulnerable to counter-example. For example, suppose we try (here, schematically) something Bringsjord and colleagues have tried: Let $c_1, \ldots, c_n$ enumerate the definitions of all the immemorial themes involved in narrative. Now suppose we venture a definition having the following structure.

D′ A story $s$ is interesting iff
  1 [...]
  ⋮
  k $s$ instantiates (inclusive) either $c_1$ or $c_2$ or [...] or $c_n$.
  k + 1 [...]
  ⋮
  p [...]

10 Note that the variables $t_i$ range over times, and that ≤ means “earlier or simultaneous.” Note also the following clauses, which appear in clause 3′.
  3 $s_r$ agrees with $s_d$ that $p$ ought to occur;
  6′ $s_d$ wants that there is some action $a$ which $s_r$ performs in the belief that thereby $p$ will occur.
The problem—and, alas, Bringsjord has experienced it time and time again—is that along will come a counter-example; in this case, a story which explicitly fails to satisfy k from D′’s definiens will arrive. For example, an author can write a very interesting story about a phenomenon like betrayal as cashed out in definition D, except that instead of clause 4″, the following weaker clause is satisfied.

  4′ there is some action $a$ which $s_r$ performs in the belief that thereby $p$ will not occur.

The story here might involve a courageous, self-sacrificial mother who assures her addicted son that she will procure drugs to relieve his misery (as he desires), but intends only to confront the pusher and put an end to his destructive dealings. Ironically, clearly some of the interestingness in this story will derive precisely from the fact that the mother is not betraying her son. On the contrary, she plans to save him and others. In short, devising accounts like D′ seems to be to fight a battle that can never be won; good narrative cannot be bottled. Selmer will now endeavor to reply to several possible objections to his argument, the first three of which were expressed by his coauthor.
4. Objections to Arg3

4.1. Objection 1
Arkoudas expressed his first objection as follows: The “set of all interesting stories” is an inherently fuzzy concept; it does not have a precise extension. Your argument rests on a confusion between defining a set and computing one. Questions of formal computability start after one has precisely defined a set S via arithmetization techniques as a set of natural numbers (or, more generally—and sans arithmetization—as
a set of strings over some countable symbol set). Ideally, the definition should be given rigorously via a logical formula of the form

    x ∈ S ⇔ F(x)    (1)

where F is a completely formal statement (no undefined symbols in it) of one free variable x. Of course, one need not descend to this level of detail and may instead offer a high-level definition that omits certain tedious details. But convincing remarks must be made to show that it is indeed possible (at least in principle) to go from the high-level definition sketch to a rigorous one of the form (1). Otherwise one cannot claim to have a mathematical object about which mathematical statements (such as Turing-computability or lack thereof) can be made. The vast majority of the sets in the arithmetic hierarchy (AH) are of course uncomputable. All of them, however, are precisely definable. What your own argument indicates is simply that the concept of an interesting story does not have a clear extension: Every time your students tried to come up with precise sufficient and necessary conditions to characterize it, someone came up with a counter-example. This has nothing to do with computability; it has to do with the set’s definability. So to argue about exactly where in the AH the set of interesting stories resides (above or below the Turing limit) is to put the cart before the horse, as you have not even given us a single reason to believe that the set would be in the AH at all. To show that a set is in the AH, you need to convince us that it can be defined by a sequence of alternating quantifier blocks over a recursive predicate. No such definition seems to exist for your set.
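For concreteness, the “sequence of alternating quantifier blocks over a recursive predicate” that the objection asks for has the following standard shape. This is the textbook normal form, offered here as a gloss and not as part of Arkoudas’ wording; the symbols $R$, $y_i$ and $Q$ are generic and do not come from the text.

```latex
% S is at level \Sigma_n (n >= 1) iff, for some recursive predicate R,
x \in S \;\Leftrightarrow\; \exists y_1\, \forall y_2\, \exists y_3 \cdots Q y_n \; R(x, y_1, \ldots, y_n)
% where Q is \exists if n is odd and \forall if n is even.

% S is at level \Pi_n iff, for some recursive predicate R,
x \in S \;\Leftrightarrow\; \forall y_1\, \exists y_2\, \forall y_3 \cdots Q' y_n \; R(x, y_1, \ldots, y_n)
% where Q' is \forall if n is odd and \exists if n is even.
```

On this reading, the Turing-decidable sets are exactly those that are both $\Sigma_1$ and $\Pi_1$, and the objection’s complaint is that no formula of either shape has been exhibited for the set of interesting stories.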
We routinely apply concepts like ‘Turing-computable’ to objects that aren’t defined in the narrow way described in this objection. In fact, we have already seen reference to such an object above, in Figure 1: the set of all A’s. This set isn’t defined in the rigorous way Arkoudas venerates. Now of course he requires that such definitions be achievable only in principle. But can the set of all A’s be narrowly defined, given more insight, time, and energy? No one really knows. Nonetheless, we still specify and implement computer programs that take as input various A’s, and we still (witness Hofstadter) try to determine whether the set of all A’s is Turing-decidable. The attack on CTT should not be forced to abide by constraints more stringent than those guiding the practice of computer science.
It is worth noting, as well, that, like A’s, stories can be visual. For example, see Figure 2. How does computing over visual objects work, relative to a formal scheme (recursion theory) that is purely linguistic in nature? Again, no one yet knows. Therefore, to repeat, the constraints Arkoudas has in mind are too stringent. Nonetheless, to simplify the dialectic that follows, we will pretend that stories are invariably textual in nature.
Figure 2: What is the story that can be constructed from these snapshots? (Reprinted here with paid permission from Psychological Corporation.)
4.2. Objection 2
Arkoudas’ second objection runs as follows: My second main criticism of your argument concerns your claim that $S^I$ is effectively decidable. (I’ll disregard for now the inherent fuzziness of $S^I$, since the points I want to make here are orthogonal to that issue.) Quoting from your text:

    But what’s behind the rationale for (14)? [That’s the premise that $S^I$ is effectively decidable.] The rationale is simply the fact that a normal, well-adjusted human computist can effectively decide $S^I$.

But, to falsify CT, you need to come up with a non-Turing-computable set that ‘a normal, well-adjusted human computist’ can nevertheless decide by way of an algorithm. By definition, to show that a set A is effectively decidable you need to demonstrate the existence of an algorithm that an idealized computist could use to decide the membership problem for A. Ideally, you would do this constructively: you would show us the algorithm whose existence you are claiming. Perhaps you could also argue indirectly for the algorithm’s existence, by trying to derive some sort of contradiction from the assumption that no such algorithm exists. But you do not do that. What you show is that for the two or three particular short story excerpts that you cite, most people would come to the expected judgment. No one would doubt that, but it is quite irrelevant. It says nothing about the existence or non-existence of an algorithm. Consider an analogous hypothetical argument:

    What is the rationale for the claim that the halting problem is effectively decidable? The simple fact that a normal, well-adjusted human computist can effectively decide whether any given program always halts or not. Consider the program

        x := y * z;

    Well? Your judgment? A halter, we wager. Ok, now try this:

        if true then x := 1 else x := 2;

    Also a halter, right? Now try this one:

        while true do ;

    This time a non-halter, right? Ergo, there is an algorithm for deciding the halting problem.
In fact even if you presented millions of positive and negative examples of programs that were correctly classified by humans with regard to termination, we could still infer nothing whatsoever about the existence of an algorithm for the halting problem.

Moreover, I would claim that, by your line of reasoning, all sets are effectively decidable. Consider any set A whatsoever. Now it is clear that “a normal, well-adjusted human” will be able to correctly identify some objects as belonging to A and some objects as not belonging to A, for otherwise they can hardly be said to understand what A is. By your reasoning, this would appear to licence the conclusion that A is effectively decidable. Indeed, if we reject Church’s thesis, as you do, and refuse to identify “algorithm” with any precise set-theoretic notion, then we can never deduce that a given set is not effectively decidable. So if you claim that a decision algorithm for a certain set exists (as you do for $S^I$) and yet you refuse to present the alleged algorithm, no one could possibly falsify your claim. Accordingly, if we take the standard view that an unfalsifiable statement is not scientific, then we ought to conclude that your assertion that “$S^I$ is effectively decidable” is not a scientific statement—it is an article of faith.
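The point of the hypothetical argument in this objection can be made concrete with a small sketch. Nothing below comes from the text: programs are modeled, purely as an assumption, by Python generators that yield once per step, and `guess_halts` is a hypothetical step-bounded guesser. It gets the three sample programs right and still misjudges a fourth that halts just beyond its budget, so agreement on examples says nothing about whether a decision algorithm exists.

```python
from itertools import islice

def assign():            # models: x := y * z;
    yield                # one step, then the program halts

def branch():            # models: if true then x := 1 else x := 2;
    yield                # evaluate the condition
    yield                # perform the assignment, then halt

def spin():              # models: while true do ;
    while True:
        yield            # never halts

def slow(n=10_000):      # hypothetical extra case: halts, but only after n steps
    for _ in range(n):
        yield

def guess_halts(program, budget=1_000):
    """Run `program` for at most budget + 1 steps; guess 'halts' iff it finished in time."""
    steps = sum(1 for _ in islice(program(), budget + 1))
    return steps <= budget

verdicts = {p.__name__: guess_halts(p) for p in (assign, branch, spin, slow)}
```

Here `verdicts` agrees with intuition on `assign`, `branch` and `spin`, but reports `slow` as a non-halter; raising the budget only moves the failure to a slower program.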
This objection mistakenly conflates two senses of ‘effectively computable.’ One sense, invoked in our background section, and a direct reflection of the framework and language used by Turing and Post (see note 1), is based on what a “computist” or “worker” can do; i.e., on what a human being, working mechanically, can accomplish. The second sense, which is clear in what the objection states, is that of what is algorithmically computable. Let e.c.$_1$ refer to the former sense, and e.c.$_2$ refer to the latter. Arkoudas is certainly correct that $S^I$ cannot be said to be e.c.$_2$ on the strength of what a computist can do: one needs in this case to present the algorithm. But when one has in mind e.c.$_1$, as I do, the one and only piece of evidence to bring forward is the observation that it is transparent that a computist can handle the task at hand—and I mean the arbitrary task at hand. This is clearly the case in the story domain: read it, judge it, spit out the verdict. Arkoudas goes on to offer programs designed to confirm his objection, and to generalize his objection. But the sample programs are irrelevant to the case at hand, for the simple reason that we can
put on display programs that stump human computists with respect to haltingness.11 But no such counter-examples can be provided in the case of interesting stories. Finally, as to the generalization to the proposition that we can never be sure that any set is not effectively decidable, which is purported to go through if my position is assumed, I do clearly follow Turing and Post (and many others in the relevant tradition, e.g. [Sieg and Byrnes 1996]) in holding that it must be self-evident that the computist or worker can prevail in all cases. Even young students realize, for example, that long division, when time and energy are unbounded, is perfectly reliable. This realization comes not only because particular examples like 456 ÷ 8 are unproblematic, but also because it’s evident that the trick will work for any relevant pair, and hence no counter-example (unlike the case of the halting problem) is forthcoming.

4.3. Objection 3
Konstantine’s third objection is actually a pair of related objections. The first of the pair is this: You use Kugel’s argument to justify your claim that $S^I$ is not Turing-computable. Unfortunately, Kugel’s argument is enthymematic and flawed. Kugel starts by making the following two assumptions: (a) there is an algorithm for recognizing “beautiful objects;” and (b) there is an algorithm for generating the set of all “objects,” beautiful or not. (In my view both assumptions are problematic for various reasons—e.g., both “beautiful” and “object” are ill-defined—but in any event these are two assumptions that Kugel must make because they are needed by the result from elementary computability theory to which he appeals, so let us go along for the sake of the argument.) He then glibly cites the aforementioned result (omitting, as we will see, a key assumption of the result), which states that for any set A ⊆ U whose elements are drawn from some countable universe U, if

  (i) there is an algorithm for deciding the membership problem for A (namely, given any member x ∈ U, do we have x ∈ A or x ∈ U \ A?); and
  (ii) there is an algorithm for generating the universe U;

then there is an algorithm for enumerating A (to wit: Use the algorithm from (ii) to start listing the elements of the universe U, deploying the procedure from (i) to weed out elements which are not members of A). Using (a) and (b) for (i) and (ii), Kugel concludes that the set of all beautiful objects is algorithmically enumerable, which means that (idealized) persons could generate arbitrarily large numbers of beautiful objects even if they had zero creativity, as long as they could effectively recognize beauty. In Kugel’s words: “This suggests that a person who recognizes the Sistine Chapel Ceiling as beautiful knows enough to paint it, [which] strikes me as somewhat implausible.” Since he views this conclusion as counter-intuitive, he rejects assumption (a) via reductio ad absurdum, inferring that the set of beautiful objects cannot possibly be effectively decidable (or Turing-computable, by Church’s thesis). But the oddness which Kugel attributes to the conclusion actually lies in assumption (b), an assumption which Kugel neglects to state even though it is a crucial premise of the theorem he invokes. If one assumes, as Kugel does, that a person has an algorithm for generating all objects, then that person already “knows enough to paint the Sistine Chapel Ceiling”—as well as compose Beethoven’s ninth symphony, write Tolstoy’s War and Peace, and so on. Hence, assumption (b) is just as counter-intuitive as the conclusion that Kugel finds implausible, and therefore, by his own lights, we have just as good grounds to reject it. If we reject it, however, we can no longer appeal to the theorem that is the centerpiece of Kugel’s reasoning, and his whole argument collapses.

11 E.g., at least in the days before Wiles changed the landscape [Wiles 1995, Wiles and Taylor 1995], we could stump computists with a Turing machine M such that it halts iff Fermat’s Last Theorem is true and provable, and spins forever iff FLT is false. Any number of Turing machines like this could be dreamed up now.
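The theorem cited in this objection (a decider for A plus a generator for U yields an enumerator for A) can be sketched in a few lines. The sketch is illustrative only: the universe is taken to be the natural numbers and the decider is a primality test, both stand-ins chosen here and not drawn from Kugel or Arkoudas; `enumerate_set` is a hypothetical name.

```python
from itertools import count, islice
from math import isqrt

def enumerate_set(universe, decide):
    """Condition (ii) supplies `universe`, condition (i) supplies `decide`:
    list the universe and weed out every element the decider rejects."""
    for x in universe:
        if decide(x):
            yield x

def is_prime(n):
    """Stand-in decider for A: trial division on a natural number."""
    if n < 2:
        return False
    return all(n % d for d in range(2, isqrt(n) + 1))

# count() generates the universe 0, 1, 2, ...; take the first five members of A.
first_primes = list(islice(enumerate_set(count(), is_prime), 5))
```

Both premises do real work in the sketch: without a generator for the universe there is nothing to filter, which is the assumption the objection says Kugel leaves unstated.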
Arkoudas is of course perfectly right about the situation: Any standard typographic set A that is Turing-enumerable can presumably be enumerated by even a dim human being: just follow the relevant instructions. Even dim human beings, after all, can locate entries in a dictionary; they do so by essentially following the standard algorithm for lexicographic ordering, which takes as a starting place the ordering on the starting alphabet. (In English, ‘A’ comes before ‘B’ comes before ‘C,’ and so on.) Computists needn’t be humans: they could be, say, pigeons. Thus, a trained pigeon could write King Lear, sooner or later. All of this is completely uncontroversial. However, it doesn’t tell in the least against Kugel’s point,
which is based on the real-life fact that (e.g.) Shakespeare12 created King Lear. He imagined the characters, arranged the narrative, wrote the dialogue, and so on.

The second part of the pair is expressed as follows: Since you (Bringsjord, not Kugel) actually claim that the set of interesting stories $S^I$ is effectively decidable, the foregoing theorem from computability theory can be adapted to show that, contrary to what you claim, $S^I$ is effectively enumerable, i.e., there is an algorithm for generating all and only the interesting stories. Let $A_I$ be the algorithm that you claim can decide $S^I$. We can of course represent every element of $S^I$ by a string of English letters and punctuation characters. And there is an obvious algorithm $A_U$ that effectively enumerates the set of all strings of English letters and/or punctuation characters; call that set U (this is our universe here, U ⊇ $S^I$). Now here is an algorithm for cranking out interesting stories: start enumerating the set U by using algorithm $A_U$; as each string in U is generated, use algorithm $A_I$ to decide if it represents an interesting story. If it does, keep it in the list, otherwise strike it out. It is easy to see that if one accepts your own assumptions, then this algorithm generates all and only the elements of $S^I$.
This objection founders for reasons already canvassed. What we know is that $S^I$, the set of interesting stories, is effectively decidable. We know this, again, because we ourselves can be the verifying computists. It hardly follows from this (as has been previously noted) that we have on hand the algorithm (and that is Arkoudas’ word: algorithm) $A_I$. This inference succeeds only if the two previously distinguished senses of effectively computable (e.c.$_1$ and e.c.$_2$) are erroneously conflated.

12 I (Bringsjord) shy away from speaking of Michelangelo, for the simple reason that while I can write fiction, I can’t paint. I do suspect that painters would confirm, in the case of Michelangelo, what I say about Shakespeare.

4.4. Objection 4

In the next objection, we see a variant of Arkoudas’ final objection:
“Look, Bringsjord, you must have gone wrong somewhere! Stories are just strings over some finite alphabet. In your case, given the stories you have put on display above, the alphabet in question is { Aa, Bb, Cc, [...], :, !, ;, [...]}, that is, basically the characters we see before us on our computer keyboard. Let’s denote this alphabet by ‘E.’ Elementary string theory tells us that though $E^*$, the set of all strings that can be built from E, is infinite, it’s countably infinite, and that therefore there is a program P which enumerates $E^*$ (P, for example, can resort to lexicographic ordering). From this it follows that your S, the set of all stories, is itself countably infinite. (If we allow, as no doubt we must, all natural languages to be included—French, Chinese, and even Norwegian—the situation doesn’t change: the union of a finite (or for that matter a countably infinite) number of countably infinite sets is still just countably infinite.) So what’s the problem? You say that your students are able to decide $S^I$? Fine. Then here’s what we do to enumerate $S^I$: Start P in motion, and for each item S generated by this program, call your students to pass verdict on whether or not S is interesting. This composite program—call it P′: P working in conjunction with your students—enumerates $S^I$. So sooner or later, P′ will manage to write King Lear, War and Peace, and even more recent belletristic narrative produced by Bringsjord’s favorite author: Mark Helprin.”13
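The program P described in this objection is easy to sketch, assuming that “lexicographic ordering” is read as shorter strings first and alphabet order within each length. The two-letter alphabet and the names `enumerate_strings`, `composite` and `judge` are hypothetical choices made here for illustration.

```python
from itertools import count, islice, product

def enumerate_strings(alphabet):
    """P: yield every finite string over `alphabet`, shorter strings first,
    and strings of equal length in alphabet order."""
    for length in count(0):
        for letters in product(alphabet, repeat=length):
            yield "".join(letters)

def composite(alphabet, judge):
    """P': run P and keep each string on which `judge` returns True.
    `judge` is a placeholder for the students' verdict."""
    return (s for s in enumerate_strings(alphabet) if judge(s))

# The first seven strings over a toy two-letter alphabet.
first_seven = list(islice(enumerate_strings("ab"), 7))
```

The enumerator is plainly mechanical. The filter, by contrast, can only be written down here by passing `judge` as a Python callable, which already treats the judging as computable; the sketch therefore models the objection’s assumption and does not support it.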
The reasoning here is fallacious. The reason is straightforward, and decisive: An assumption is made here that the composite information processing, i.e., P plus what the computist is doing in judging stories, falls at the level of Turing machines. While it’s of course true that P is at this level (it’s just lexicographic ordering yet again) we don’t know that what computists are doing as literary critics (if you will) is Turing-computable. In fact, that’s exactly the issue at hand, and hence the objection is nothing more than a petitio.

13 Bringsjord has responded to this objection in earlier publications (see the chapters on Church’s thesis in Bringsjord & Ferrucci [2000] and Bringsjord & Zenzen [2003]). The following response is a new one, and supplants previous ones, which are confessedly inadequate.
4.5. Objection 5
The next objection is an attempt to resurrect Arkoudas’ first objection: “I now see your error, Selmer: premise (12) in Arg3. If $S^I$ is to be in AH, then your key predicate—‘Interesting’; denote it by ‘I’—must be a bivalent one. (More precisely, I must be isomorphic to a predicate that is built via quantification out of the totally computable bivalent predicates of $\Sigma_0$.) But a moment’s reflection reveals that I isn’t bivalent: different people have radically different opinions about whether certain fixed stories are interesting! Clearly, though Jones and Smith may share the same language, and may thus be able to fully understand ‘Shopping,’ ‘Hunger,’ ‘Betrayal,’ King Lear, and War and Peace, their judgements may differ. ‘Shopping’ might be downright thrilling to an AInik interested in determining how, upon reading such a story, humans know instantly that the pronoun ‘He’ refers to Jack.”14
14 This intelligent objection is originally due to Michael McMenamin [1992], though a number of thinkers have conveyed its gist to us.

It is important to realize that we are talking about stories qua stories; stories as narrative. Hence a better way to focus the present objection is to note that Jones may find King Lear to be genuine drama, but monstrously boring drama (because, he says, King Lear is but a lunatic), while Smith is transfixed. It’s undeniable that differences of opinion like those existing between Jones and Smith are common. But this fact is not a threat to Bringsjord’s argument. First, note that such differences are present in all domains, not just in the domain of narrative. Wittgenstein, remember, teased much out of a clash between someone who says that 2 + 2 = 4 and someone who flatly denies it—so even the arithmetical realm, if Objection 5 goes through, would lack bivalent properties, and if anything is suffused with bivalence, it’s arithmetic. Moreover, there is nothing to prevent us from stipulating that these agents come decked out with some fixed “value system”—for judging stories. In fact, let us henceforth insist that I be read as not just interesting simpliciter, but interesting given (what must surely be one of the world’s most refined systems for gauging stories) the knowledge and ability of none other than Umberto Eco.15 Our new predicate, then, can be $I_{UE}$.

The objection could perhaps be sustained as follows: “I seriously doubt that Umberto Eco has a fixed effective decision system by which he decides. I take it this is an illusion predicated on the fact that Eco has the authority to say what interests him (à la Wittgenstein on the incorrigibility of ‘introspection’). Whatever Eco sincerely pronounces ‘interesting’ is interesting for Eco; what he says goes. This seems akin to what you two envision your ‘decked out’ agents doing (just reading and pronouncing); this seems unlike effective deciding. You might as well say that each of us has an effective procedure for deciding the set of things that will be said by us in our lifetime: just by saying that we do we ‘enumerate the set.’ You might as well say the U.S. Supreme Court has a rote procedure for deciding cases: in deciding them they ‘enumerate the set’ of Supreme Court decisions. Eco’s own infallibility being a matter of authority, nothing guarantees that identically ‘decked out’ agents—lacking authority—will decide the same as him (or each other for that matter).”
This is a decidedly weak objection. Clearly, one claim made against us is simply that Eco has no system by which he judges interestingness. But this claim is wrong. The reason is that Eco doesn’t rely on mere authority: he presents the system; again, we refer interested readers to [Eco 1979]. (One might say that Eco has become an authority because he has described his system.) Given this, the analogies to the Supreme Court, and to what we say in our lifetimes, fail. In neither of these domains is there even the hint of a description of the scheme by which verdicts are produced; the situations are therefore disanalogous. We do suspect that individual members of the Supreme Court would be analogous to Eco. Indeed, analyses of and careful commentaries on Supreme Court opinions routinely contain descriptions of the scheme deployed by a particular member of the Court.

15 Those unfamiliar with Eco’s non-fiction work might start with his surprising reasons for finding Ian Fleming’s 007 (James Bond) series to be very interesting; see “Chapter Six: Narrative Structures in Fleming,” in [Eco 1979].
4.6. Objection 6
“At the start of this chapter you affirmed Mendelson’s characterization of ‘algorithm.’ Let me remind you that according to that characterization, ‘An algorithm does not require ingenuity.’ Are you not now bestowing remarkable ingenuity upon the readers/judges you have in mind?”

Recall that in order to parse ‘effectively computable,’ as we have noted, it’s necessary to invoke the generic concept of an agent, either Turing’s “computist” or Post’s “worker.” (At the very least, the standard way to unpack ‘effectively computable’ is through this generic concept.) The agent in question, as none other than Elliott Mendelson reminded us nearly forty years ago [Mendelson 1963], needn’t be a human agent, because, following the mantra at the heart of computability theory, we impose no practical restrictions on the length of calculations and computations. It follows immediately that the agents we have in mind have enough raw time and energy to process the longest and most complex contenders in S. Furthermore, if we are going to seriously entertain CTT, we must, all of us, allow the agents in question to have certain knowledge and ability, for example the knowledge and ability required to grasp the concepts of number, symbol, change, movement, instruction, and so on. The agents we have in mind are outfitted so as to be able to grasp stories, and the constituents of stories. And in deploying I, and in moving to $I_{UE}$, we assume less on the part of agents (workers, computists, etc.) than what even defenders of CT through the years have assumed. This is so because such thinkers freely ascribe to the agents in question the knowledge and ability required to carry out sophisticated proofs—even proofs which cannot be formalized in first-order logic. The agents capable of deciding $S^I$ need only read the story (and, for good measure, read it n subsequent times—something mathematicians routinely do in order to grasp proofs), and render their decision.

4.7. Objection 7
“Yes, but what your computists do is not decomposable into smaller, purely mechanical steps, which is the hallmark of an algorithm. They are supposed to read a story (and, if I understand you,
perhaps read it again some finite number of times), and then, just like that, render a judgment. This is more like magic than mechanism.”
Figure 3: A Flow-Diagram Fragment That Entails Non-Halting

To see the problem with this objection, let’s prove, in a thoroughly traditional manner, that a certain well-defined problem is effectively solvable. Recall that all Turing machines can be recast as flow diagrams (e.g., see [Boolos and Jeffrey 1989]). Next, note that any TM represented by a flow diagram having as part the fragment shown in Figure 3 would be a non-halting TM (because if started in state 1 with its read/write head scanning the leftmost 1 in a block of 1s—and we can assume the alphabet in question to be a binary one consisting of {0,1}—it will loop forever in this fragment). Let m be a fixed TM specified for computist Brown in flow diagram form, and let this diagram contain the fragment of Figure 3. Suppose that Brown looks for a minute at the diagram, sees the relevant fragment, and declares: “Nonhalter!” In doing this, Brown assuredly decides m, and his performance is effective. And yet what’s the difference between what Brown does and what our “Eco-ish” agents do? The activity involved is decomposable in both cases. There are innumerable “subterranean” cognitive processes going on beneath Brown’s activity, but they are beside the point: that we don’t (or perhaps can’t) put them on display does not tell against the effectiveness in question. The fact is that Brown simply looks at the diagram, finds the relevant fragment, assimilates, and returns a verdict.16 The same is true of our agents in the case of stories.

Before turning to consider other attacks on CT, we point out that the predicates I and $I_{UE}$ really aren’t exotic, despite appearances to the contrary. All those who try to harness the concepts of theoretical computer science (concepts forming a superset of the formal ones canvassed in this book) in order to get things done end up working with predicates at least as murky as these two. A good example is to be found in the seminal work of John Pollock, which is based on the harnessing of theoretical computer science (including AH) so as to explicate and implement concepts like warrant, defeasibility, prima facie plausibility, and so on.17

16 Our example is perfectly consistent with the fact that the set of TMs, with respect to whether or not they halt, is not Turing-decidable.
5. Arg3 in Context: Other Attacks on CT

Over the past six decades, the possibility of CT’s falsity has not only been raised,18 but CT has been subjected to a number of outright attacks. While we obviously don’t have the book-long space it would take to treat each and every attack, we think it’s possible to provide a provisional analysis that is somewhat informative, and serves to situate Bringsjord’s own attack on CT. What this analysis shows, we think, is that Arg3 is the most promising attack going. Following R.J. Nelson [1987], we partition attacks on CT into three categories:

CAT1 Arguments against the arguments for CT;
CAT2 Arguments against CT itself; and
CAT3 Arguments against doctrines (e.g., the computational conception of mind) which are said (by some, anyway) to presuppose CT.
Consider CAT3 first. Perhaps the most promising argument in this category runs as follows. Assume for the sake of argument that all human cognition consists in the execution of effective processes (in brains, perhaps). It would then follow by CT that such processes are Turing-computable, i.e., that computationalism is true.

17 Here is one example from [Pollock 1995]: Pollock’s OSCAR system is designed so as to constantly update that which it believes in response to the rise and fall of arguments given in support of candidate beliefs. What constitutes correct reasoning in such a scheme? Pollock notes that because a TM with an ordinary program can’t decide theorems in first-order logic (the set of such theorems isn’t Turing-decidable), answering this question is quite tricky. He ingeniously turns to super-computation for help: the basic idea is that OSCAR’s reasoning is correct when it generates successive sets of beliefs that approach the ideal epistemic situation in the limit. This idea involves AH, as Pollock explains.

18 Boolos and Jeffrey, for example, in their classic textbook Computability and Logic [1989], provide a sustained discussion of CT—and take pains to leave the reader with the impression that CT can be overthrown.
However, if computationalism is false, while there remains incontrovertible evidence that human cognition consists in the execution of effective processes, CT is overthrown. Attacks of this sort strike us as unpromising. For starters, many people aren’t persuaded that computationalism is false (despite some careful arguments we have ourselves given; e.g., see Bringsjord & Arkoudas [2004]). Secondly, this argument silently presupposes some sort of physicalism, because the evidence for the effectiveness of cognition (in the sense that all cognition is effective; only this view can support an overthrow of CT in CAT3) no doubt derives from observation and study of processes in the central nervous system. Thirdly, it is certainly at least an open question as to whether the processes involved are effective. Indeed, by Bringsjord’s lights, some of the processes that constitute cognition aren’t effective. What about CAT1? The main issue with the work of all those who intend to attack CT by attacking the time-honored rationales for it is that such work can at best expose flaws in particular arguments for the thesis, but cannot refute the thesis itself. For example, William Thomas [1973] seeks to capitalize on the fact (and it is a fact, that much is uncontroversial) that the main rationale behind CT involves empirical induction—a form of reasoning that has little standing in mathematics. Unfortunately, Thomas’ observations don’t threaten CT in the least, as is easy to see. Most of us believe, unshakably believe, that the universe is more than 3 seconds old— but what mathematical rationale have we for this belief? As Russell pointed out, mathematics is quite consistent with the proposition that the universe popped into existence 3 seconds ago, replete not only with stars, but with light here on Earth from stars, and also with minds whose memories include those we have. 
More generally, of course, from the fact that p doesn’t follow deductively from a set of propositions Γ, it hardly follows that p is false; it doesn’t even follow that p is the slightest bit implausible.

We are left, then, with CAT2, the category into which Bringsjord’s attack on CT falls. How does Arg3 compare with other attacks in this category? To support the view that Bringsjord’s attack is superior, let us consider a notorious argument from four decades back, one due to László Kalmár [1959] (and rejected by none other than Elliott Mendelson [1963]), and the only other modern attack on CT that we know of, one given by Carol Cleland [1993; 1995].19

5.1. Kalmár’s Argument against CT
Here’s how Kalmár’s argument runs. First, he draws our attention to a function g that isn’t Turing-computable, given that f is:20

\[ g(x) = \mu y\,(f(x, y) = 0) = \begin{cases} \text{the least } y \text{ such that } f(x, y) = 0 & \text{if } y \text{ exists} \\ 0 & \text{if there is no such } y \end{cases} \]

Kalmár proceeds to point out that for any n ∈ N for which a natural number y with f(n, y) = 0 exists, “an obvious method for the calculation of the least such y [...] can be given,” namely, calculate in succession the values f(n, 0), f(n, 1), f(n, 2), . . . (which, by hypothesis, is something a computist or TM can do) until we hit a natural number m such that f(n, m) = 0, and set y = m.

On the other hand, for any natural number n for which we can prove, not in the frame of some fixed postulate system but by means of arbitrary—of course, correct—arguments that no natural number y with f(n, y) = 0 exists, we have also a method to calculate the value g(n) in a finite number of steps: prove that no natural number y with f(n, y) = 0 exists, which requires in any case but a finite number of steps, and gives immediately the value g(n) = 0. [Kalmár 1959, p. 74]
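The “obvious method” in the first half of Kalmár’s description is ordinary unbounded search, and it may help to see it written out. The following Python sketch is only an illustration: the step cap max_steps and the test function is_square_root are assumptions of ours, introduced so that the code terminates, and are not part of Kalmár’s argument.

```python
from typing import Callable, Optional


def mu_search(f: Callable[[int, int], int], n: int, max_steps: int) -> Optional[int]:
    """Return the least y < max_steps with f(n, y) == 0, or None if none is found.

    The cap max_steps is an assumption added for runnability; the search
    Kalmar describes simply keeps going.
    """
    for y in range(max_steps):
        if f(n, y) == 0:
            return y
    # Inconclusive: failing to find a zero so far proves nothing about later y.
    return None


def is_square_root(n: int, y: int) -> int:
    """Hypothetical f: equals 0 exactly when y * y == n."""
    return 0 if y * y == n else 1
```

When a zero exists the search finds it after finitely many evaluations of f. When none exists the loop alone never justifies the answer g(n) = 0; that is the gap Kalmár proposes to fill with “arbitrary” correct proofs.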
19 Perhaps we should mention here something that students of CT and its history will be familiar with, viz., given an intuitionistic interpretation of ‘effectively computable function,’ CT can be disproved. The basic idea is to capitalize on the fact that any subset of N is intuitionistically enumerable, while many such sets aren’t effectively enumerable. (A succinct presentation of the disproof can be found on page 592 of Nelson [1987].) The main problem with such attacks on Church’s thesis, of course, is that they presuppose (certain axioms of; see, e.g., Kreisel [1965; 1968]) intuitionistic logic, which most reject.
20 The original proof can be found on page 741 of [Kleene 1983].

Kalmár goes on to argue as follows. The definition of g itself implies the tertium non datur, and from it and CT we can infer the existence of a natural number p which is such that

(i) there is no natural number y such that f(p, y) = 0; and
(ii) this cannot be proved by any correct means.
Kalmár claims that (i) and (ii) are very strange, and that therefore CT is at the very least implausible. This argument is interesting, but really quite hopeless, as a number of thinkers have indicated. For example, as Mendelson [1963] (see also Moschovakis’ [1968] review of both Kalmár’s paper and Mendelson’s reaction) points out, Kalmár’s notion of ‘correct proof,’ for all Kalmár tells us, may fail to be effective, since such proofs are outside the standard logical system (set theory formalized in first-order logic). This is surely historically fascinating, since, as we have seen, it would be Mendelson who, nearly thirty years later, in another defense of CT (the one we examined earlier), would offer a proof of the ‘only if’ direction of this thesis, a proof that he assumes to be correct but one that he admits to be beyond ZF. But the root of Kalmár’s problem is that his proofs, on the other hand, are wholly hypothetical: we don’t have a single one to ponder. And things get even worse for Kalmár (as Nelson [1987] has pointed out), because even absent the proofs in question, we know enough about them to know that they would vary for each argument to g that necessitates them, which would mean that Kalmár has failed to find a uniform procedure, a property usually taken to be a necessary condition for a procedure to qualify as effective. Though Kalmár does anticipate the problem of lack of uniformity,21 and though Bringsjord personally happens to side with him on this issue, it is clear that his argument against CT fails: If Kalmár’s argument is to succeed, (ii) can be supplanted with

(ii′) this cannot be proved by any effective means.
But then how can the argument be deductively valid? It is not, at bottom, a reductio, since (i) and (ii′) surely are not absurd, and this is the only form a compelling version of the argument could at core be. Kalmár himself, as we have noted, confesses that his argument is designed only to show that CT is implausible, but this conclusion goes through only if (i) and (ii′), if not absurd, are at least counterintuitive. But are they? For some, perhaps; for others, definitely not.

21 He says: By the way, [the assumption that the procedure in question] must be uniform seems to have no objective meaning. For a school-boy, the method for the solution of the diverse arithmetical problems he has to solve does not seem uniform until he learns to solve equations; and several methods in algebra, geometry and theory of numbers which are now regarded group-theoretic methods were not consider as uniform before group-theory has (sic) been discovered. [Kalmár 1959, p. 73]

Our own take on Kalmár’s argument is that it can be rather easily shown to be flawed as follows: First, let m_1, m_2, . . . , m_n, m_{n+1}, . . . enumerate the set of Turing machines. Now substitute for Kalmár’s g the following function.

\[ h(m_i) = \begin{cases} 1 & \text{if } m_i \text{ halts} \\ 0 & \text{if } m_i \text{ doesn't halt} \end{cases} \]
Recall that if a TM halts, simulating this machine will eventually reveal this fact. This allows us to produce an exact parallel to Kalmár’s reasoning: Start with m_1; proceed to simulate this machine. Assuming it halts, return 1, and move on to m_2, and do the same for it; then move to m_3, and so on. While this process is running, stand ready to prove “not in the frame of some fixed postulate system but by means of arbitrary—of course, correct—arguments” that the machine m_i fails to halt, in which case 0 is returned. The parody continues as follows. Given CT, and the law of the excluded middle (which the definition of the function h presupposes), we infer two implausible propositions, propositions so implausible that CT is itself cast into doubt. They are:

(i_h) there exists an m_k such that h(m_k) = 0; and
(ii′_h) this cannot be proved by any effectively computable means.
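The asymmetry the parody trades on, that simulation can confirm halting but can never confirm non-halting, can be made concrete. In the Python sketch below, machines are modelled as generator functions that yield once per step; this modelling, the step budget, and the names halts_within, counts_to and loops_forever are all assumptions chosen for illustration and stand in for genuine Turing machines.

```python
from typing import Callable, Iterator, Optional

# Hypothetical stand-in for a Turing machine: a generator yielding once per step.
Machine = Callable[[], Iterator[None]]


def halts_within(machine: Machine, budget: int) -> Optional[int]:
    """Return 1 if the machine is seen to halt within `budget` steps, else None.

    None means "no verdict yet": simulation alone never licenses returning 0.
    """
    run = machine()
    for _ in range(budget):
        try:
            next(run)          # advance the simulation by one step
        except StopIteration:  # the generator finished, i.e. the machine halted
            return 1
    return None


def counts_to(limit: int) -> Machine:
    """Build a machine that takes `limit` steps and then halts."""
    def machine() -> Iterator[None]:
        for _ in range(limit):
            yield
    return machine


def loops_forever() -> Iterator[None]:
    """A machine that never halts."""
    while True:
        yield
```

However large the budget, halts_within(loops_forever, budget) returns None rather than 0; the value h(m_k) = 0 has to come from somewhere other than the simulation, exactly as (i_h) and (ii′_h) lead one to expect.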
This is a parody, of course, because both of these propositions are fully expected and welcomed by all those who both affirm CT and have at least some familiarity with the formalisms involved.

Now, what about Bringsjord’s case against CT? First, the narrational case is deductive, as Arg3 makes plain. Second, the process of reading (and possibly rereading a finite number of times) a story, assimilating it, and judging whether or not it’s interesting on a fixed evaluation scheme: this process is transparently effective. (Indeed, related processes are routinely requested on standardized tests containing reading comprehension problems, where stories are read, perhaps reread, and judged to express one from among n “main ideas.”) Third, the process we’re exploiting would seem to be uniform.22

5.2. Cleland’s Doubts about CT
Cleland [1993; 1995] discusses three variants on our CT:

CT1 Every effectively computable number-theoretic function is Turing-computable.
CT2 Every effectively computable function is Turing-computable.
CT3 Every effective procedure is Turing-computable.

Before evaluating Cleland’s arguments against this trio, some exegesis is in order. First, each of these three theses is a conditional, whereas CT, as we have explained, is a biconditional. There should be no question that the biconditional is more accurate, given not only Mendelson’s authoritative affirmation of the biconditional form, but also given that Church himself originally refers to his thesis as a definition of “effectively calculable function” in terms of “recursive function” [Church 1940].23 However, since we have happily conceded the ‘if’ direction in CT, there is no reason to worry about this aspect of Cleland’s framework. The second point is that by ‘number-theoretic’ function Cleland simply means a mapping from N to N. We thus now understand function simpliciter, as for example it’s used in CT2, to allow functions from the reals to reals.24 There is of course no denying that Church and Turing failed to advocate CT2, but CT1 is certainly the “left-to-right” direction of our CT.

22 No doubt test designers are correct that a uniform procedure needs to be followed in order to excel in their reading comprehension sections. So why wouldn’t the process at the heart of Arg3 be uniform as well?
23 On the other hand, Church then immediately proceeds to argue for his “definition,” and the reader sees that he is without question urging his readers to affirm a thesis.
24 It will not be necessary to present here the formal extension of computability with number-theoretic functions to computability with functions over the reals. For the formal work, see, e.g., [Grzegorczyk 1955; 1957].

Now, what does Cleland say against CT1-CT3? She claims, first, that CT3 can be disproved; the argument is simply this. One type of effective procedure coincides with what Cleland calls “mundane procedures,” which are “ordinary, everyday procedures such as recipes for making Hollandaise sauce and methods for starting camp fires; they are methods for manipulating physical things such as eggs and pieces of wood” [Cleland 1995, p. 11]. Turing machine procedures, on the other hand, are “methods for ‘manipulating’ abstract symbols” [Cleland 1995, p. 11]. Since mundane procedures have “causal consequences,” and TMs (qua mathematical objects) don’t, it follows straightaway that mundane procedures aren’t Turing-computable, that is, ¬CT3.25

Cleland’s reasoning, when formalized, is certainly valid. The problem is that CT3 (on that reading) has next to nothing to do with those propositions placed in the literature under the title “Church’s Thesis”! CT3 is a variant that no one has ever taken seriously. It may seem to some that CT3 has been taken seriously, but this is only because one construal of it, a construal at odds with Cleland’s, has in fact been recognized. On this construal, that a procedure is Turing-computable can be certified by either a relevant design (e.g., a TM flow-graph for making Hollandaise sauce, which is easy to come by) or by a relevant artifact (e.g., an artificial agent capable of making Hollandaise sauce, which again is easy to come by). At any rate, we’re quite willing to concede that CT3, on Cleland’s idiosyncratic reading, is provably false. (Note that we have known for decades that even CT1, on an intuitionistic (and hence idiosyncratic) reading of “effectively computable function,” is provably false. See note 19.) It’s worth noting that Cleland herself has sympathy for those who hold that her reading of CT3 is not a bona fide version of Church’s Thesis [Cleland 1995, p. 10].

What, then, about CT2 and CT1? Here Cleland no longer claims to have a refutation in hand; she aims only at casting doubt on these two theses. This doubt is supposed to derive from reflection upon what she calls “genuinely continuous devices” [Cleland 1995, p.
18], which are objects said to “mirror” Turing-uncomputable functions [Cleland 1995, pp. 16–17]. An object is said to mirror a function iff (a) it includes a set of distinct objects which are in one-to-one correspondence with the numbers in the field of the function, and (b) the object pairs each and every object corresponding to a number in the domain of the function with an object corresponding to the appropriate number in the range of the function. Cleland takes pains to argue, in intuitive fashion, that there are objects which mirror Turing-uncomputable functions (e.g., an object moving through a 2-dimensional Newtonian universe). She seems unaware of the fact that such objects provably exist, in the form, for example, of analog chaotic neural nets and, generally, analog chaotic dynamical systems [Siegelmann and Sontag 1994, Siegelmann 1995]. (These objects are known to exist in the mathematical sense. Whether they exist in the corporeal world is another question, one everyone, including Cleland, admits to be open.)

25 In [Bringsjord and Zenzen 2002] we explain why Cleland’s placing recipes for such things as cheese balls alongside mathematical accounts of computation is unacceptable.

We will be able to see Cleland’s fundamental error (and, indeed, the fundamental error of anyone who attacks CT by taking her general route) if we pause for a moment to get clear about the devices in question. Accordingly, we’ll present here an analog dynamical system via the “analog shift map,” which is remarkably easy to explain. First let’s get clear on the general framework for the “shift map.” Let A be a finite alphabet. A dotted sequence over A is a sequence of characters from A∗ wherein one dot appears. For example, if A is the set of digits from 0 to 9, then 3.14 is a dotted sequence over A. Let Ȧ be the set of all dotted sequences over A. Dotted sequences can be finite, one-way infinite (as in the decimal expansion of π), or bi-infinite. Now, let k ∈ Z; then the shift map

\[ S^k : \dot{A} \to \dot{A} : (a)_i \mapsto (a)_{i+k} \]
shifts the dot k places, negative values for a shift to the left, positive ones a shift to the right. (For example, if (a)_i is 3.14159, then with k = 2, S^2(3.14159) = 314.159.) Analog shift is then defined as the process of first replacing a dotted substring with another dotted substring of equal length according to a function g : Ȧ → Ȧ. This new sequence is then shifted an integer number of places left or right as directed by a function f : Ȧ → Z. Formally, the analog shift is the map

\[ \Phi : a \mapsto S^{f(a)}(a \oplus g(a)), \]

where ⊕ replaces the elements of the first dotted sequence with the corresponding element of the second dotted sequence if that element is in the second sequence, or leaves it untouched otherwise. Formally:

\[ (a \oplus g)_i = \begin{cases} g_i & \text{if } g_i \in A \\ a_i & \text{if } g_i \text{ is the empty element} \end{cases} \]
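These definitions can be exercised mechanically on finite data. The Python sketch below is a minimal illustration under stated assumptions: a dotted sequence is stored as a pair of strings, f and g are looked up from a table keyed on the one symbol either side of the dot, and a left-infinite string is represented by the opaque marker "P", which is a convenience of ours and not something a real computist could write down. The table RULES transcribes the rule table of the Siegelmann example discussed next.

```python
from typing import Dict, Tuple

Dotted = Tuple[str, str]            # (symbols left of the dot, symbols right of it)
Rules = Dict[str, Tuple[int, str]]  # "x.y" (symbols around the dot) -> (f, g)
PI = "P"                            # opaque marker for a left-infinite string (assumption)


def shift(a: Dotted, k: int) -> Dotted:
    """S^k: move the dot k places, right for k > 0 and left for k < 0."""
    left, right = a
    if k >= 0:
        if k > len(right):
            raise ValueError("not enough symbols to the right of the dot")
        return left + right[:k], right[k:]
    finite_left = len(left) - (1 if left.startswith(PI) else 0)
    if -k > finite_left:
        raise ValueError("not enough finite symbols to the left of the dot")
    cut = len(left) + k
    return left[:cut], left[cut:] + right


def overlay(a: Dotted, g: str) -> Dotted:
    """a (+) g: symbols present in g overwrite those of a, aligned at the dot."""
    left, right = a
    g_left, g_right = g.split(".")
    if g_left.startswith(PI):
        new_left = g_left  # infinite domain of effect: everything on the left is replaced
    else:
        new_left = left[:len(left) - len(g_left)] + g_left
    return new_left, g_right + right[len(g_right):]


def step(a: Dotted, rules: Rules) -> Dotted:
    """One application of the analog shift: overlay g, then shift by f."""
    left, right = a
    f, g = rules[left[-1] + "." + right[0]]
    return shift(overlay(a, g), f)


def run(a: Dotted, rules: Rules, max_steps: int = 100) -> Dotted:
    """Iterate until a fixed point is reached; max_steps is a safety cap."""
    for _ in range(max_steps):
        nxt = step(a, rules)
        if nxt == a:
            return a
        a = nxt
    raise RuntimeError("no fixed point within max_steps")


RULES: Rules = {"0.0": (1, "P."), "0.1": (1, ".10"), "1.0": (0, "1.0"), "1.1": (1, ".0")}
```

Calling run(("000001", "10110"), RULES) reaches a fixed point whose right-hand part is "00". The sketch runs only because "P" is never expanded; it manipulates a name for the infinite string, not the string itself.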
Both f and g have “finite domains of dependence” (DoDs), which is to say that they depend only on a finite dotted substring of the sequence on which they act. The domain of effect (DoE) of g, however, may be finite, one-way infinite, or bi-infinite. Here is an example from [Siegelmann 1995, p. 547] which will make things clear, and allow us to see the fatal flaw in Cleland’s rationale for doubting CT2 and CT1. Assume that the analog shift is defined by (where π₂ is the left-infinite string . . . 51413 in base 2)

DoD    f    g
0.0    1    π₂
0.1    1    .10
1.0    0    1.0
1.1    1    .0
and that we have a starting sequence of u = 000001.10110; then the following evolution ensues:

000001.00110
0000010.0110
π₂.0110
π₂0.110
π₂0.100
π₂01.00
π₂01.00
π₂01.00

At this point the DoD is 1.0 and hence no changes occur; this is a fixed point. Only the evolution from an initial dotted sequence to a fixed point counts.26 In this case the input-output map is defined as the transformation of the initial sequence to the final subsequence to the right of the dot (hence in our example u as input leads to 00). The class of functions determined by the analog shift includes as a proper subset the class of Turing-computable functions (the proof is straightforward: Siegelmann [1995]).

26 For a nice discussion of the general concept of a fixed point in connection with supertasks, see [Steinhart 2002].

Moreover, the analog shift map is a mathematical model of idealized physical phenomena (e.g., the motion of a billiard ball bouncing among parabolic mirrors). From this it follows that we provably have found exactly what Cleland desires, that is, a genuinely continuous device that mirrors a Turing-uncomputable function. So, if Cleland can establish that

(16) If x mirrors a function, then x computes it,

she will have overthrown both CT2 and CT1. Unfortunately, given our analysis of the analog shift map, we can see that Cleland doesn’t have a chance; here is how the reasoning runs. Recall, first, the orthodox meaning of ‘effectively computable function,’ with which we started this chapter: a function f is effectively computable provided that an agent having essentially our powers, a computist (or worker), can compute f by following an algorithm. So let’s suppose that you are to be the computist in the case of the analog shift map. There is nothing impenetrable about the simple math involved; we’ll assume that you have assimilated it just fine. So now we would like you to compute the function Φ as defined in our example involving π. To make your job as easy as possible, we will guarantee your immortality, and we will supply you with an endless source of pencils and paper (which is to say, we are “idealizing” you). Now, please set to work, if you will; we will wait and observe your progress...

What happened? Why did you stop? Of course, you stopped because you hit a brick wall: it’s rather challenging to write down and manipulate (or imagine and manipulate mentally) strings like π in base 2! (Note that the special case where the DoE of g is finite in the analog shift map generates a class of functions identical to the class of Turing-computable ones.) Yet this is precisely what needs to be done in order to attack CT2 and CT1 in the way Cleland prescribes. Cleland sees the informal version of the problem, for she writes:

Is there a difference between mirroring a function and computing a function? From an intuitive standpoint, it seems that there is. Surely, falling rocks don’t compute functions, even supposing that they mirror them. That is to say, there seems to be a difference between a mere representation of a function, no matter how detailed, and the computation of a function. [Q:] But what could this difference amount to? [Cleland 1995, p. 20]
She then goes on to venture an answer to this question:

A natural suggestion is that computation requires not only the mirroring of a function but, also, the following of a procedure; falling rocks don’t compute functions because they don’t follow procedures. [Cleland 1995, p. 20]

Cleland then tries to show that this answer is unacceptable. The idea is that since the answer doesn’t cut it, she is entitled to conclude that (16) is true, that is, that there isn’t a difference between mirroring a function and computing a function,27 which then allows the mere existence of (say) an idealized billiard ball bouncing among parabolic mirrors to kill off CT2 and CT1. What, then, is Cleland’s argument for the view that the “natural suggestion” in response to Q fails? It runs as follows:

Turing machines are frequently construed as purely mathematical objects. They are defined in terms of the same kinds of basic entity (viz., sets, functions, relations and constants) as other mathematical structures. A Turing machine is said to compute a number-theoretic function if a function can be defined on its mathematical structure which has the same detailed structure as the number-theoretic function concerned; there isn’t a distinction, in Turing machine theory, between computing a function and defining a function [...] If computing a function presupposes following a procedure, then neither Turing machines nor falling rocks can be said to compute functions. [Cleland 1995, p. 21]
27 This reasoning is certainly enthymematic (since it hides a premise to the effect that there are no other answers that can be given to question Q), but we charitably leave this issue aside.

This argument is an enthymeme; its hidden premise is that ‘compute’ is used univocally in the relevant theses, i.e., that ‘compute’ means the same thing on both the left and right sides of CT, CT1, and CT2. This premise is false. The locution ‘f is effectively computable,’ on the orthodox conception of Church’s Thesis, does imply that there is an idealized agent capable of following an algorithm in order to compute f. But it hardly follows from this that when ‘compute’ is used in the locution ‘f is Turing-computable’ (or in the related locution ‘TM M computes f’), the term ‘compute’ must have the same meaning as it does in connection with idealized agents. Certainly anyone interested in CT, and in defending it, would hasten to remind Cleland that the term ‘compute’ means one thing when embedded within CT’s left side, and another thing when embedded within CT’s right side.28 Having said this, however, and having implicitly conceded the core mathematical point (viz., that at least some definitions of TMs and Turing-computability deploy ‘compute’ in the absence of the concept of “following”29), we should probably draw Cleland’s attention to the formal approach we took, where in order to characterize information-processing beyond the Turing Limit, we distinguished between a TM as a type of architecture, and a program which this architecture follows in order to compute.

Cleland never intended to literally refute CT1 and CT2. (As we have seen, she did intend to refute the heterodox CT3, and for the sake of argument we agreed that here she succeeds.) But she fails even in her attempt to cast doubt upon these theses, and therefore CT is unscathed by her discussion.
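The distinction between a TM as an architecture and the program that architecture follows can be illustrated against the quadruple definition (S, Σ, f, s) cited in note 29. The Python sketch below is an assumption-laden illustration, not the formal apparatus of the chapter: the blank is written "_", the moves are written "<=" and "=>", the machine is taken to halt when f is undefined for the current state and symbol, and FLIP is a hypothetical example program.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple

BLANK, LEFT, RIGHT = "_", "<=", "=>"  # notation chosen for this sketch


@dataclass
class TuringMachine:
    states: FrozenSet[str]                         # S
    alphabet: FrozenSet[str]                       # Sigma (contains BLANK)
    delta: Dict[Tuple[str, str], Tuple[str, str]]  # f: (state, symbol) -> (action, state)
    start: str                                     # s


def run(tm: TuringMachine, tape_input: str, max_steps: int = 10_000) -> str:
    """The 'architecture': a fixed loop that follows whatever table tm.delta holds.

    Assumption: the machine halts when delta has no entry for (state, symbol).
    """
    tape = dict(enumerate(tape_input))
    head, state = 0, tm.start
    for _ in range(max_steps):
        key = (state, tape.get(head, BLANK))
        if key not in tm.delta:
            break                      # no applicable instruction: halt
        action, state = tm.delta[key]
        if action == LEFT:
            head -= 1
        elif action == RIGHT:
            head += 1
        else:
            tape[head] = action        # write a symbol of Sigma
    else:
        raise RuntimeError("step budget exhausted")
    if not tape:
        return ""
    cells = range(min(tape), max(tape) + 1)
    return "".join(tape.get(i, BLANK) for i in cells).strip(BLANK)


# The 'program': a hypothetical table that flips every bit of its input.
FLIP = TuringMachine(
    states=frozenset({"scan", "move"}),
    alphabet=frozenset({"0", "1", BLANK}),
    delta={
        ("scan", "0"): ("1", "move"),
        ("scan", "1"): ("0", "move"),
        ("move", "0"): (RIGHT, "scan"),
        ("move", "1"): (RIGHT, "scan"),
    },
    start="scan",
)
```

Here run plays the role of the architecture and FLIP.delta the role of the program; run(FLIP, "0110") follows the table cell by cell and returns "1001". On this picture there is a clear sense in which something is being followed even on the right side of CT.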
28 Unexceptionable parallels abound: We can say ‘My friend told me that Burlington is a nice city,’ and we can say ‘My CD-ROM travel program told me that Burlington is a nice city,’ but we needn’t accept the view that ‘told me’ means the same in both utterances.
29 Consider, e.g., one Bringsjord uses in teaching mathematical logic: A Turing machine is a quadruple (S, Σ, f, s) where
1. S is a finite set of states;
2. Σ is an alphabet containing the blank symbol —, but not containing the symbols ⇐ (“go left”) and ⇒ (“go right”);
3. s ∈ S is the initial state;
4. f : S × Σ → (Σ ∪ {⇐, ⇒}) × S (the transition function).

6. Church’s Thesis and Computationalism

In this final section we briefly discuss the relationship between CTT and computationalism, the view, roughly, that cognition is computation. The plan is as follows. We start by clarifying computationalism, and end up distinguishing between “weak” and “strong” versions of the doctrine. Next, we consider an argument deemed by Copeland to be fallacious. We show that the argument is formally valid once neatened, and is aimed at validating weak computationalism.30 Of course, since this argument has CTT as a premise, it is sound only if Bringsjord’s argument against CTT given in Section 3 fails.

6.1. What Is Computationalism?
Propelled by the writings of innumerable thinkers (this touches but the tip of a mammoth iceberg of relevant writing: [Peters 1962], [Barr 1983], [Fetzer 1994], [Simon 1980], [Simon 1981], [Newell 1980], [Haugeland 1985], [Hofstadter 1985], [Johnson–Laird 1988], [Dietrich 1990], [Bringsjord 1992], [Searle 1980], [Harnad 1991]), computationalism has reached every corner of, and indeed energizes the bulk of, contemporary AI and cognitive science. However, this isn’t to say that the view has been once and for all defined. The fact is, the doctrine is exceedingly vague. Myriad one-sentence versions of it float about; e.g.,

1. Thinking is computing.
2. Cognition is computation.
3. People are computers (perhaps with sensors and effectors).
4. People are Turing machines (perhaps with sensors and effectors).
5. People are finite automata (perhaps with sensors and effectors).
6. People are neural nets (perhaps with sensors and effectors).
7. Cognition is the computation of Turing-computable functions.
8. ...
30 This is as good a place as any to point out that earlier, we brought to your attention that some have maintained that computationalism presupposes CTT. This view would amount to C → CTT, and if CTT is false, it would of course follow by modus tollens that C is false. However, we also pointed out that this conditional is questionable. But we focus in this final part on the converse: whether CTT implies (perhaps in conjunction with some additional premises) computationalism.

For present purposes, such a list isn’t particularly helpful. We need to settle on one proposition, so that we can have some hope of productively discussing the relationship between CTT and computationalism. The one that we pick (unsurprisingly, given that we have anchored our discussion of Church’s Thesis to CTT) is the fourth, and we unpack it into a “strong” and “weak” version, and simply drop the parenthetical:

C^s People are Turing machines.
C^w People can be simulated by Turing machines.

We can put these doctrines perspicuously: Given that p ranges over persons, and m over Turing machines, we simply say:

C^s ∀p ∃m p = m
C^w ∀p ∃m S(m, p)

Having on hand these versions of computationalism will prove valuable when it comes time to consider whether this doctrine is entailed by CTT.31 Of course, as you might expect, more will need to be said about what ‘simulation’ means here.

6.2. The ‘Simulation Fallacy’—Isn’t a Fallacy
Copeland tells us that one commits the ‘Simulation Fallacy’ “by believing that the Church–Turing thesis, or some formal result proved by Turing or Church, secures the truth of the proposition that the brain can be simulated by a Turing machine” [1998, p. 133]. As a paradigmatic example of the fallacy at work, Copeland gives us this passage from John Searle:

Can the operations of the brain be simulated on a digital computer [read: Turing machine—B.J.C.]? [...] The answer [...] seems to me [...] demonstrably ‘Yes’ [...] That is, naturally interpreted, the question means: Is there some description of the brain such that under that description you could do a computational simulation of the operations of the brain. But given Church’s thesis that anything that can be given a precise enough characterization as a set of steps can be simulated on a digital computer, it follows trivially that the question has an affirmative answer. The operations of the brain can be simulated on a digital computer in the same sense in which weather systems, the behavior of the New York stock market, or the pattern of airline flights over Latin America can. [1992, pp. 200–201]

31 We do not consider in this chapter the various attacks on C^s in the literature, many of which have been published by Bringsjord (e.g., to pick just one, see [Bringsjord 1999]), joined in some cases by Arkoudas (e.g., see [Bringsjord and Arkoudas 2004]).
It seems to us that Searle’s reasoning here is perfectly valid. The proposition he seeks to establish is this one:

(S) Under some description d, the activity of the brain is Turing-computable.

His argument for (S) is really quite straightforward. He points out that under some descriptions, the activity of the brain can be given a “precise enough characterization as a set of steps,” that is, can be characterized in algorithmic, or effectively computable, steps. Since that which is algorithmic is Turing-computable by CTT, (S) follows. In stark sequence, Searle’s argument is
Arg4
    Under some particular description d, the activity of the brain is effectively computable.
    CTT
∴   Under some particular description d, the activity of the brain is Turing-computable.
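The bare logical form of Arg4 can be checked mechanically. The following is a minimal Lean 4 sketch, under the assumption that “effectively computable” and “Turing-computable” are read as predicates E and T over a type D of descriptions, and that CTT is used only in its left-to-right direction; the names D, E, T and arg4 are hypothetical labels introduced for this illustration.

```lean
-- Premise: some description d makes the brain's activity effectively computable.
-- CTT (left-to-right): whatever is effectively computable is Turing-computable.
-- Conclusion: some description d makes the brain's activity Turing-computable.
theorem arg4 {D : Type} (E T : D → Prop)
    (premise : ∃ d, E d) (ctt : ∀ x, E x → T x) : ∃ d, T d :=
  match premise with
  | ⟨d, hd⟩ => ⟨d, ctt d hd⟩
```

The proof reuses the witness d supplied by the premise, so the validity of the argument form does not depend on which description is chosen; whether the premises are true is a separate matter.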
Not only does this seem to be a perfectly valid argument, but there would seem to be descriptions that could be set to d to make the first premise true. We have in mind the common, general description (seen in the fields of cognitive science and AI) of the brain in terms of (conventional) artificial neural networks, which provably only process information in effectively computable ways.32 According to Copeland, other well-known thinkers have been led astray by the Church–Turing fallacy. Victims supposedly include the Churchlands, and Johnson–Laird. But their arguments, characterized by Copeland as bald attempts to “deduce from Church’s Thesis [= our CTT] that the mindbrain can in principle be simulated by a Turing machine” [Copeland 1998, p. 133], can all be charitably read as formidable instances of Arg4.

32 See, for example, the discussion of the mindbrain and neural nets in [Russell and Norvig 2002; Bringsjord and Zenzen 1997].
Figure 4: Snapshot of Virtual Storm Front Approaching Troy

It’s very important to realize that Searle and others, when speaking here of ‘simulation,’ do not have in mind the technical sense introduced early in the study of computability. This is the technical sense used in simulation proofs, for example that a multi-tape Turing machine can be shown by simulation to be equivalent in power to a standard one-tape Turing machine (a readable version of this proof is given in [Lewis and Papadimitriou 1981]). Instead, what these thinkers have in mind is the sense of ‘simulation’ at work when, for example, the weather is simulated.33 Consider the case of a hurricane H. H exists in the real world, and can of course wreak tremendous havoc. The virtual version of H (call it H_sim), on the other hand, is quite benign, and is thoroughly digital, regulated by standard computation. More importantly, H_sim is built by selectively attending to only some of the attributes possessed by H; the former is far from an atom-by-atom representation of the latter in some computational system. What Searle has in mind is only some description: there is a description of the brain that admits of a Turing-computable simulation. This is a very weak claim; Copeland evidently fails to appreciate just how weak it is. Such a claim, in the case of the weather, is in fact easy to concretize courtesy of what many of us routinely consult: radar. Figure 4 shows a snapshot of a simulation of a storm that, as I type this sentence, is about to come crashing through Troy and then over the Taconic mountains just east of the city. Each of these snapshots can be chained together to make a temporally extended simulation; and this simulation, most assuredly, is Turing-computable. The same kind of thing can without question be done for the brain.

33 Note that Searle’s [1980] famous Chinese Room Argument against strong AI explicitly invokes a sense of simulation matching the sense in use in the realm of weather.

Notice as well the rather obvious connection between (S) and C^w. First, instead of speaking of human brains, Searle could speak of human persons. Second, to say that under some description d, x is Turing-computable, is just to say that x can be simulated by a Turing machine. So, there is a simple variant of Searle’s argument that runs like this:
Arg4′
    Under some particular description d, the (cognitive) activity of persons is effectively computable.
    CTT
∴   C^w
We conclude that, contra Copeland, “Weak” computationalism (or, as it’s sometimes called, “Weak” AI) does indeed follow from CTT. Of course, if Bringsjord’s argument against Church’s thesis in Section 3 is sound, then the case in question, while based on formally valid reasoning, nonetheless fails for the simple reason that one of the premises in it, CTT, is false.
References Arkoudas, K. [2005], “Combining Diagrammatic and Symbolic Reasoning”, Technical Report 2005–59, MIT Computer Science and Artificial Intelligence Lab, Cambridge, USA. Ashcraft, M. [1994], Human Memory and Cognition, Harper-Collins, New York, NY. Barr, A. [1983], “Artificial Intelligence: Cognition as Computation”, in The Study of Information: Interdisciplinary Messages, (F. Machlup ed.), Wiley-Interscience, New York, NY, pp. 237–262. Boolos, G.S. and Jeffrey, R.C. [1989], Computability and Logic, Cambridge University Press, Cambridge, UK.
Bringsjord, S. [1992], What Robots Can and Can’t Be, Kluwer, Dordrecht, The Netherlands. Bringsjord, S. [1999], “The Zombie Attack on the Computational Conception of Mind”, Philosophy and Phenomenological Research 59.1, 41–69. Bringsjord, S. and Arkoudas, K. [2004], “The Modal Argument for Hypercomputing Minds”, Theoretical Computer Science 317, 167–190. Bringsjord, S. and Ferrucci, D. [2000], Artificial Intelligence and Literary Creativity: Inside the Mind of Brutus, a Storytelling Machine, Lawrence Erlbaum, Mahwah, NJ. Bringsjord, S., Ferrucci, D., and Bello, P. [2001], “Creativity, the Turing Test, and the (Better) Lovelace Test”, Minds and Machines 11, 3–27. Bringsjord, S. and Zenzen, M. [1997], “Cognition is not Computation: The Argument from Irreversibility?”, Synthese 113, 285–320. Bringsjord, S. and Zenzen, M. [2002], “Toward a Formal Philosophy of Hypercomputation”, Minds and Machines 12, 241–258. Bringsjord, S. and Zenzen, M. [2003], Superminds: People Harness Hypercomputation, and More, Kluwer Academic Publishers, Dordrecht, The Netherlands. Buss, S. [1998], “Introduction to Proof Theory”, in Handbook of Proof Theory, Studies in Logic and the Foundations of Mathematics 137, (S. Buss ed.), Elsevier. Charniak, E. and McDermott, D. [1985], Introduction to Artificial Intelligence, Addison-Wesley, Reading, MA. Church, A. [1940], “A Formulation of the Simple Theory of Types”, Journal of Symbolic Logic 5, 56–68. Cleland, C. [1993], “Is the Church-Thesis True?”, Minds and Machines 3, 283–312. Cleland, C. [1995], “Effective Procedures and Computable Functions”, Minds and Machines 5, 9–23. Copeland, B.J. [1998], “Turing’s O-Machines, Searle, Penrose and the Brain”, Analysis 58(2), 128–138. Davis, M.D., Sigal, R., and Weyuker, E.J. [1994], Computability, Complexity, and Languages, 2nd edn, Academic Press.
Dennett, D. [1991], Consciousness Explained, Little, Brown, Boston, MA. Dietrich, E. [1990], “Computationalism”, Social Epistemology 4(2), 135–154. Eco, U. [1979], The Role of the Reader: Explorations in the Semiotics of Texts, Indiana University Press, Bloomington, IN. Ernest, P. [1998], Social Constructivism as a Philosophy of Mathematics, State University of New York Press. Fetzer, J. [1994], “Mental Algorithms: Are Minds Computational Systems?”, Pragmatics and Cognition 2.1, 1–29. Gordon, M.J.C. and Melham, T.F. [1993], Introduction to HOL, a Theorem Proving Environment for Higher-Order Logic, Cambridge University Press, Cambridge, England. Graphic Art Materials Reference Manual [1981], Letraset, New York, NY. Grzegorczyk, A. [1955], “Computable Functionals”, Fundamenta Mathematicae 42, 168–202. Grzegorczyk, A. [1957], “On the Definitions of Computable Real Continuous Functions”, Fundamenta Mathematicae 44, 61–71. Hammer, E.M. [1995], Logic and Visual Information, CSLI Publications, Stanford, California. Harnad, S. [1991], “Other Bodies, Other Minds: A Machine Incarnation of an Old Philosophical Problem”, Minds and Machines 1(1), 43–54. Haugeland, J. [1985], Artificial Intelligence: The Very Idea, MIT Press, Cambridge, MA. Henson, C.W. [1984], Review of Set Theory: An Introduction to Independence Proofs by K. Kunen, Bulletin of the American Mathematical Society (New Series) 10, 129–131. Hofstadter, D. [1982], “Metafont, Metamathematics, and Metaphysics”, Visible Language 14(4), 309–338. Hofstadter, D. [1985], “Waking Up from the Boolean Dream”, Metamagical Themas: Questing for the Essence of Mind and Pattern, Bantam, New York, NY, pp. 631–665. Johnson-Laird, P. [1988], The Computer and the Mind, Harvard University Press, Cambridge, MA.
Kalmár, L. [1959], “An Argument Against the Plausibility of Church’s Thesis”, in Constructivity in Mathematics, (A. Heyting ed.), North-Holland, Amsterdam, The Netherlands, pp. 72–80. Kitcher, P. [1977], “On the Uses of Rigorous Proof”, Science 196, 782–783. Kleene, S.C. [1983], “General Recursive Functions of Natural Numbers”, Math. Annalen 112, 727–742. Kleiner, I. [1991], “Rigor and Proof in Mathematics: A Historical Perspective”, Mathematics Magazine 64(5), 291–314. Kreisel, G. [1965], “Mathematical Logic”, in Lectures in Modern Mathematics, (T. Saaty ed.), John Wiley, New York, NY, pp. 111–122. Kreisel, G. [1968], “Church’s Thesis: A Kind of Reducibility Thesis for Constructive Mathematics”, in Intuitionism and Proof Theory, Proceedings of a Summer Conference at Buffalo, N.Y., (A. Kino, J. Myhill, and R. Vesley eds.), North-Holland, Amsterdam, The Netherlands, pp. 219–230. Kugel, P. [1986], “Thinking May Be More Than Computing”, Cognition 18, 128–149. Lakatos, I. [1976], Proofs and Refutations: the Logic of Mathematical Discovery, Cambridge University Press. Levy, A. [1979], Basic Set Theory, Springer. Lewis, H.R. and Papadimitriou, C.H. [1981], Elements of the Theory of Computation, Prentice Hall, Englewood Cliffs, NJ. Lewis, H.R. and Papadimitriou, C.H. [1997], Elements of the Theory of Computation, Prentice Hall. Maddy, P. [1997], Naturalism in Mathematics, Oxford University Press. McMenamin, M. [1992], Deciding Uncountable Sets and Church’s Thesis, [unpublished manuscript]. Meehan, J. [1981], “Tale-spin”, in Inside Computer Understanding: Five Programs Plus Miniatures, (R. Schank and C. Reisbeck eds.), Lawrence Erlbaum, Englewood Cliffs, NJ, pp. 197–226. Mendelson, E. [1963], “On Some Recent Criticism of Church’s Thesis”, Notre Dame Journal of Formal Logic 4(3), 201–205.
Mendelson, E. [1990], “Second Thoughts about Church’s Thesis and Mathematical Proofs”, Journal of Philosophy 87(5), 225–233. Moschovakis, Y. [1968], “Review of Four Recent Papers on Church’s Thesis”, Journal of Symbolic Logic 33, 471–472. One of the four papers is Kalmár [1959], “An Argument Against the Plausibility of Church’s Thesis”, in Constructivity in Mathematics, (A. Heyting ed.), Amsterdam, The Netherlands: North-Holland, pp. 72–80. Moschovakis, Y.N. [1998], “On Founding the Theory of Algorithms”, in Truth in mathematics, (H.G. Dales and G. Oliveri eds.), Oxford Science Publications, pp. 71–104. Nelson, R.J. [1987], “Church’s Thesis and Cognitive Science”, Notre Dame Journal of Formal Logic 28(4), 581–614. Newell, A. [1980], “Physical Symbol Systems”, Cognitive Science 4, 135–183. Peters, R.S. (ed.) [1962], Body, Man, and Citizen: Selections from Hobbes’ Writing, Collier, New York, NY. Pollock, J. [1995], Cognitive Carpentry: A Blueprint for How to Build a Person, MIT Press, Cambridge, MA. Post, E. [1944], “Recursively Enumerable Sets of Positive Integers and their Decision Problems”, Bulletin of the American Mathematical Society 50, 284–316. Rogers, H. [1967], Theory of Recursive Functions and Effective Computability, McGraw-Hill Book Company. Russell, S. and Norvig, P. [2002], Artificial Intelligence: A Modern Approach, Prentice Hall, Upper Saddle River, NJ. Schank, R. [1995], Tell Me a Story, Northwestern University Press, Evanston, IL. Searle, J. [1980], “Minds, Brains and Programs”, Behavioral and Brain Sciences 3, 417–424. Searle, J. [1992], The Rediscovery of the Mind, MIT Press, Cambridge, MA. Shin, S.-J. [1995], The Logical Status of Diagrams, Cambridge University Press. Sieg, W. and Byrnes, J. [1996], “K-Graph Machines: Generalizing Turing’s Machines and Arguments”, in Gödel 96, Lecture Notes in Logic, Springer-Verlag, New York, NY, pp. 98–119.
Siegelmann, H. [1995], “Computation Beyond the Turing Limit”, Science 268, 545–548. Siegelmann, H. and Sontag, E. [1994], “Analog Computation via Neural Nets”, Theoretical Computer Science 131, 331–360. Simon, H. [1980], “Cognitive Science: The Newest Science of the Artificial”, Cognitive Science 4, 33–56. Simon, H. [1981], “Study of Human Intelligence by Creating Artificial Intelligence”, American Scientist 69(3), 300–309. Steinhart, E. [2002], “Logically Possible Machines”, Minds and Machines 12(2), 259–280. Stillings, N., Weisler, S., Chase, C., Feinstein, M., Garfield, J., and Rissland, E. [1995], Cognitive Science, MIT Press, Cambridge, MA. Thomas, W. [1973], “Doubts about Some Standard Arguments for Church’s Thesis”, Papers of the Fourth International Congress for Logic, Methodology, and Philosophy of Science, Bucharest, D. Reidel, Amsterdam, The Netherlands, pp. 13–22. Trabasso, T. [1996], “Review of Knowledge and Memory: The Real Story”, Minds and Machines 6, 399–403. Troelstra, A.S. and Schwichtenberg, H. [1996], Basic Proof Theory, Cambridge University Press, Cambridge, England. Turing, A.M. [1936], “On Computable Numbers with Applications to the Entscheidungsproblem”, Proceedings of the London Mathematical Society 42, 230–265. Wiles, A. [1995], “Modular Elliptic Curves and Fermat’s Last Theorem”, Annals of Mathematics 141(3), 443–551. Wiles, A. and Taylor, R. [1995], “Ring-Theoretic Properties of Certain Hecke Algebras”, Annals of Mathematics 141(3), 553–572. Wyer, R.S. [1995], Knowledge and Memory: The Real Story, Lawrence Erlbaum, Hillsdale, NJ.
Carol E. Cleland∗

∗ C.E. Cleland, Philosophy Department and Center for Cognitive Science, University of Colorado, Boulder, CO, USA

The Church–Turing Thesis. A Last Vestige of a Failed Mathematical Program

1. Introduction

The historical roots of theoretical computer science are embedded in a debate over the foundations of mathematics that occurred at the turn of the twentieth century. This is somewhat surprising because physical science is prima facie a more natural place to look for answers to questions about the capacities of concrete machines to perform tasks. But the Turing analysis of computation1, which still dominates mainstream computer science, is not grounded in considerations from physical theory. It is directly descended from a radical but failed metaphysical theory of mathematics, namely, Hilbert’s formalist program.

1 There is debate over whether Alan Turing actually held the view that is currently imputed to him; see Copeland [1998]. What I have in mind by the Turing account of computation, however, is the received view on Turing’s account, which may or may not reflect what Turing actually thought.

Hilbert’s formalist program was precipitated by the discovery of paradoxes in Georg Cantor’s set theory. Cantor’s set theory introduced the perplexing idea of the actual infinite into mathematics, that is, the idea of the infinite as a completed entity, as opposed to an unending process or an indefinitely large or small magnitude. His purpose was to give mathematicians access to the real number system as a fully articulated abstract structure for representing the points on the line of geometry. In order to do this, Cantor needed to make good mathematical sense out of the mysterious irrational
magnitudes that had plagued mathematicians since the time of the ancient Greeks. By the late nineteenth century, concern over the nature of the irrationals had again become acute. They seemed indispensable for clarifying the foundations of mathematical analysis, which provides the theoretical underpinnings for the invaluable calculus. Efforts to account for the irrational magnitudes using concepts from limit theory, which construes the infinite as merely potential (i.e., in terms of an unending process), had met with failure. Many mathematicians were reluctantly coming to the conclusion that understanding the irrational magnitudes requires recourse to the full strength of completed infinities. Cantor was the first to formulate a rigorous mathematical theory of the actual infinite. He founded it upon a theory of sets. But as mathematicians soon discovered, there are serious flaws in the foundation that Cantor bequeathed them. Hilbert’s formalist program for mathematics represents a drastic effort to salvage Cantor’s set theory from the paradoxes that threatened it. In Hilbert’s poignant words, “no one shall be able to drive us from the paradise that Cantor created for us” [Hilbert 1925]. The first section of this paper traces the development of Hilbert’s formalist program from its roots in antiquity to the paradoxes discovered in Cantor’s theory of sets. The unifying thread running through this discussion is the problem of matching up the numbers of arithmetic with the points on the line of geometry and the metaphysical status of the mysterious irrational magnitudes, which are an inevitable by-product of such attempts. In this context, it is important to keep in mind that my exposition is highly selective and incomplete. I am not adumbrating a new historical interpretation of the history of mathematical thought about the irrationals or the development of Hilbert’s views about the nature of mathematics. 
My purpose is merely to undermine the widespread view that the Turing account of computation rests upon solid (albeit inconclusive) mathematical foundations. As I explain, Hilbert’s program represents a drastic break with traditional thought about the nature of mathematics, a break whose credibility rested almost exclusively upon its ability to circumvent the paradoxes afflicting Cantor’s set theory. Unfortunately, Hilbert’s formalist program was soundly defeated by Kurt Gödel’s incompleteness theorems.
The second section of this paper explores the extent to which the Church–Turing thesis and the Turing account of computation to which it gave rise are based upon Hilbert’s formalist program for mathematics. As is well known, the Church–Turing thesis grew out of attempts by Alonzo Church, Kurt Gödel, Alan Turing, and others to solve the Entscheidungsproblem, a problem peculiar to Hilbert’s program. What is not fully appreciated, however, is the extent to which the Turing account is based upon problematic assumptions that are very specific to Hilbert’s program, assumptions that are not shared by other, more plausible, accounts of the nature of mathematics. As I shall show, the credibility of the Turing account is significantly diminished if these dubious formalist assumptions are rejected.
2. Pythagoras to Euclid

Concern over the metaphysical status (a.k.a. nature) of irrational “numbers” has plagued mathematicians for well over two thousand years. Their discovery is traditionally attributed to the Greek philosopher Pythagoras (c. 569–500 B.C.). Pythagoras founded a mystical order around the arithmetic of the whole numbers (positive integers). The Pythagoreans discovered that they could construct a fruitful system of mathematics by representing geometrical entities in terms of whole numbers and their ratios; they did not view ratios of whole numbers as bona fide (viz., rational) numbers. The Pythagorean theorem provides a particularly salient example of the Pythagorean approach. Although it is unclear whether Pythagoras or one of his followers discovered it, and some scholars believe that it was familiar to the earlier Babylonians, the Pythagorean theorem fit beautifully into the Pythagorean program of number worship. The Pythagorean theorem expresses the relationship of the hypotenuse of a right triangle to its other two sides in arithmetical terms: the square of the magnitude (c) of the length of the hypotenuse is equal to the sum of the squares of the magnitudes (a and b) of the lengths of the other two sides, or c² = a² + b². The Pythagoreans’ success in arithmetizing geometry led them to believe that reality is fundamentally arithmetical, which explains their mysticism about the whole numbers. But dark clouds loomed on the horizon. If one applies the Pythagorean theorem to a right triangle having sides of length 1, one
gets an incommensurable ratio of whole numbers (i.e., a ratio that is not measurable by whole number multiples of a common unit) as the value for the length of the hypotenuse. This ratio is the root of the equation c² = 2. Because the value of c cannot be written as the ratio of two whole numbers it fell outside the system of numbers venerated by the Pythagoreans; the discombobulated Pythagoreans referred to it as “alogon”2 (irrational). This is not surprising to us because we believe that √2 (as it is now written) is neither a positive integer nor a rational number (which can be uniquely represented as a ratio of positive integers). It is a different kind of number, namely, an irrational number. But it was devastating to the Pythagoreans. Legend has it that some of them committed suicide and the rest swore an oath of secrecy punishable by death. One member of the order, Hippasus, reportedly paid the ultimate penalty for leaking the terrible secret to the outside world! The failure of the Pythagorean program transformed Greek mathematics. Geometry had been shown to be more powerful than arithmetic (which at this time was limited to the whole numbers). Reversing the Pythagorean approach, Greek mathematicians, most conspicuously Euclid, developed geometrical accounts of the arithmetic of the whole numbers, including the problematic incommensurable ratios. In Book V of Euclid’s Elements, numbers are represented as lengths, angles, areas, and volumes. Arithmetical operations are interpreted in terms of geometrical constructions. The sum of two lengths is a length and the product of two lengths is an area, for example. In other words, numbers were construed as geometrical entities. The Greek geometrical interpretation of arithmetic dominated mathematical thought for two thousand years.
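The incommensurability at issue in this section can be probed numerically. The following Python sketch is only an illustration: the function name `integer_solutions` and the search bound are choices made here, not anything found in the historical sources. It looks for whole numbers p and q with p² = 2q², which is what would be required for the hypotenuse of the unit right triangle to be a ratio p/q. A finite search is of course no proof of irrationality; it merely shows that no such ratio turns up within the bound.

```python
from math import isqrt

def integer_solutions(max_q):
    """Return all pairs (p, q) with 1 <= q <= max_q and p*p == 2*q*q."""
    hits = []
    for q in range(1, max_q + 1):
        target = 2 * q * q
        p = isqrt(target)      # largest integer whose square does not exceed target
        if p * p == target:    # exact equality would make p/q a square root of 2
            hits.append((p, q))
    return hits

# An empty list means that no ratio p/q was found within the bound.
print(integer_solutions(10_000))
```

The search uses only whole-number arithmetic, so it stays within the resources the Pythagoreans themselves accepted.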
2 Thanks to Chris Shields for the translation and additional information on the way in which the Pythagoreans used the word.

3. The Development of Mathematical Analysis

The debate over the nature of the mysterious irrational magnitudes played a central role in the evolution of analysis from its origins in the theory of the calculus; see Boyer [1959] for a detailed history of the development of analysis. René Descartes’ invention of the Cartesian coordinate system, which provided the first systematic method for assigning numbers to elements of geometrical figures,
opened up the possibility of representing geometrical objects algebraically, something that had been unavailable to the Greeks. The calculus unified and systematized a plethora of diverse techniques for analyzing continuous processes and solving geometrical problems in the context of the Cartesian coordinate system. Isaac Newton and Gottfried Leibniz independently developed the calculus in the seventeenth century. Taking motion as the paradigm of a continuous process, Newton interpreted a curve as the path traced by a moving point. Leibniz introduced the precursor of the contemporary concept of a function. In keeping with the geometrical interpretation of magnitudes inherited from the Greeks, he loosely associated functions with properties of curves [Kitcher 1983, pp. 171-72]. Algebraic expressions were used to represent functions but, in light of their intended geometrical interpretations, they were viewed as encapsulating arithmetical relations holding between the numerical values of the x- and y- axes (abscissa and ordinate) of the Cartesian coordinate system in the plane. As we shall see, the problem of reconciling algebraic representations of function and number with their geometrical conceptions played a crucial role in the evolution of analysis. In keeping with the geometrical conception, the basic operations of the calculus, integration and differentiation, were characterized geometrically. Integrating a function involved summing an infinite number of infinitesimally thin rectangles to produce a specific numerical value for the area under a curve. Differentiating a function depended upon the idea of a ratio of infinitesimal values (corresponding to infinitely small differences in ordinate and abscissa) yielding a specific numerical value for the slope of the tangent to a curve at a point; this value was interpreted as a rate of change. 
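The two geometrical operations just described can be imitated with finite quantities. The Python sketch below is an illustration under stated assumptions: the names `riemann_sum` and `difference_quotient`, and the choice of the squaring function, are introduced here and are not from the text. It replaces “infinitely many infinitesimally thin rectangles” with n thin rectangles, and the “ratio of infinitesimal values” with a ratio of small but finite differences, which is precisely the substitution that the early calculus could not justify.

```python
def riemann_sum(f, a, b, n):
    """Sum the areas of n rectangles of width (b - a) / n under f on [a, b]."""
    width = (b - a) / n
    return sum(f(a + i * width) * width for i in range(n))

def difference_quotient(f, x, h):
    """Ratio of a small change in ordinate to a small change in abscissa."""
    return (f(x + h) - f(x)) / h

def square(x):
    return x * x

# Thinner rectangles: the sums settle toward the area under the curve.
for n in (10, 100, 1000):
    print(n, riemann_sum(square, 0.0, 1.0, n))

# Smaller increments: the ratios settle toward the slope of the tangent.
for h in (0.1, 0.01, 0.001):
    print(h, difference_quotient(square, 1.0, h))
```

For the squaring function the sums approach 1/3 and the ratios approach 2, but no finite n or nonzero h ever yields those values exactly.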
Functions that proved difficult to analyze were approximated by means of infinite series whose terms, while infinite in number, consisted of familiar functions (e.g., of powers, of sines and cosines). Thus at its inception the calculus drew upon the concept of the infinitely large (e.g., infinite series or sums) and the infinitely small (infinitesimals). This was not a happy state of affairs for seventeenth and eighteenth century mathematicians, who almost universally viewed completed infinities as metaphysical monstrosities. Differentials, as Leibniz dubbed the infinitesimal increments symbolized by “dx” (Newton called them “fluxions”), were very peculiar quantities. They approached zero closer than any finite quantity without actually
reaching it. At different stages in the solution to the same problem they were treated inconsistently, at one point being taken to be extremely small but nevertheless nonzero values and at another point being taken as equal to zero. In addition, there was the problem of making good sense of the idea of performing algebraic operations (e.g., addition and division) on infinitesimal quantities or infinite sequences of (finite or infinitesimal) quantities: How could an unending sequence of separately performed operations yield an exact numerical value? If division by zero is prohibited, how can division by an infinitesimal quantity be legitimate? In the ensuing years, the calculus gradually expanded into an algebraic study of functions known as “analysis”. The central problem of early analysis was finding an algebraic mode of representation for an arbitrary curve; it wasn’t clear that every curve (that can be drawn) is algebraically representable. Infinite series representations had proven their mettle for representing difficult functions, and thus seemed ideal candidates. The algebra soon took on a life of its own. Mathematicians discovered that they could use infinite series to define esoteric functions that were difficult to picture geometrically. Algebra seemed to be more powerful than geometry for thinking about functions. As a consequence, geometrical reasoning fell increasingly under suspicion, and the use of infinite series to represent functions took on central theoretical importance. The calculus slowly ceased to be a geometrical study of continuous processes (in the intuitive pictorial sense) and became an algebraic study of more enigmatic infinite numerical processes. But the notion of function was still inextricably linked to geometry through the idea that to any well-defined function there corresponds a graph (the “path” traced by a point moving in conformity with the numerical regularity expressed by the algebraic expression of the function). 
The problem was to make sense of the graphs of the esoteric functions. In order to do this, analysts needed an arithmetical model of the points on a line (broadly construed to include both finite line segments and infinite lines) which could replace simple geometrical reasoning about lines, curves, areas, and tangents while preserving the successes that had been founded upon it. The French mathematician Augustin-Louis Cauchy undertook the challenge. Cauchy focused on the fact that there are points on a line that have no exact numerical representation but can nonetheless be approximated to any desired degree of accuracy by infinite sequences of rational numbers. This suggested the possibility of using infinite series to extend the set of known “numbers” (the rationals) to include the mysterious irrational magnitudes, which had heretofore been understood geometrically in terms of lengths, and hence divisions or points. But in order to define the irrational numbers in terms of infinite sequences of rational numbers, Cauchy needed mathematical recourse to the concept of infinity. Like his predecessors, he loathed the actual infinite. He therefore opted for the merely potential infinite, a concept legitimated by Aristotle two thousand years earlier. Cauchy introduced the potential infinite into mathematics by means of the concept of a limit, which he defined as a “fixed value” that may be approached as closely as one wishes but is never actually reached. That is, he explicated the potentially infinite in terms of the idea of an unending mathematical process. With a mathematically precise concept of the potential infinite at hand, Cauchy turned his attention to providing an algebraic reconstruction of analysis. He defined the real numbers in terms of the limits of infinite converging sequences of rational numbers, the irrationals being identified with the limits of sequences that fail to converge to a rational value. 
Mathematicians at last had an expanded set of numbers for representing (indexing) the points on a line. Cauchy also reconstructed the basic concepts of the calculus (continuity, infinitesimal, differentiability, and the integral) in terms of the concept of a limit. A fully algebraic theory of analysis that could dispense with simple intuitive geometrical reasoning and yet make sense of the graphs of the esoteric functions seemed at hand. Unfortunately, Cauchy failed to fully deliver on his promises.
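The idea of approaching a point on the line by rational numbers can be made concrete. The Python sketch below is an illustration only: the averaging rule x → (x + 2/x)/2 (often called the Babylonian or Heron method) and the name `rational_approximations` are chosen here for convenience and are not Cauchy’s own construction. Every term it produces is an exact ratio of whole numbers, and the squares of the terms come as close to 2 as one likes without any term ever squaring to 2.

```python
from fractions import Fraction

def rational_approximations(steps):
    """Yield the rationals x_0, x_1, ... produced by the rule x -> (x + 2/x) / 2."""
    x = Fraction(1)
    for _ in range(steps):
        yield x
        x = (x + 2 / x) / 2    # exact rational arithmetic, no rounding

terms = list(rational_approximations(6))
for x in terms:
    # Print each term together with how far its square is from 2.
    print(x, float(x * x - 2))
```

The sketch exhibits an unending process of the kind a potential infinite is meant to capture; whether that process has anything to converge to is a separate question, and one that the process itself does not answer.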
Defining an “irrational number” as the limit of a converging sequence of rational numbers that fails to converge to a rational value amounts to assuming the existence of the very entities whose existence is in question. Indeed, why conclude that such a sequence converges to any number whatsoever? One cannot appeal to geometrical intuitions (lengths and points of division) to justify this claim if the goal is to construct a wholly arithmetical theory of the real numbers. Moreover, Cauchy’s limit theory wasn’t powerful enough to enable him to completely escape the need for geometrical reasoning. The algebra sometimes became unwieldy, and he was forced to fall back
upon intuitive geometrical conceptions of the real number system in order to justify his conclusions. Most famously, on the basis of geometrical intuitions, he assumed that continuous functions or curves are differentiable at most points. It wasn’t long before mathematicians discovered functions that violate some of Cauchy’s geometrically grounded assumptions. Around 1830 Bernhard Bolzano defined a continuous function having no derivative. (Unfortunately, the mathematical community ignored his discovery, and Karl Weierstrass, who rediscovered such functions thirty years later, is usually credited with it.) Similarly, the great German mathematician Bernhard Riemann exhibited a function f(x) that is discontinuous at infinitely many points in an interval and yet has an integral that defines a continuous function F(x) that lacks a derivative at each of these infinitely many points. Riemann also defined a function that is discontinuous for all rational arguments and continuous for all irrational arguments. Viewed in the context of their intended geometrical interpretations, these functions were truly pathological. They characterized a continuous line as divided into an infinite number of discontinuities in an unimaginable fashion; see [Tiles 1989, pp. 80-82] for a detailed discussion. More specifically, they introduced an infinite density of discontinuities into a line segment; every subinterval, however small, contained them. As a consequence, they could not be viewed as introduced successively, as required by Cauchy’s theory. They divided a line segment in one fell swoop, so to speak. This suggested that a continuous line is actually divided (vs. merely potentially divisible) into an infinite collection of distinct points, one for each division introduced by a discontinuity. With great reluctance, mathematicians began to seriously entertain the possibility that a satisfactory arithmetical model of the points on a line requires recourse to the actual infinite.
A number of mathematicians, most notably, Karl Weierstrass, Richard Dedekind, and Georg Cantor, pursued the development of an arithmetical theory of the real numbers that could make sense of the pathological functions. Pursuing an analogy with the point-discontinuities in the graphs of the pathological functions, Dedekind identified real numbers with unique “cuts” in the rational number system. These cuts divide the rational numbers into two classes A and B such that every member of A is less than every member of B. The √2, for instance, is defined as the cut that divides the rational
numbers into the class A consisting of all the negative rational numbers, zero, and those positive rational numbers whose squares are less than 2, and the class B consisting of all the remaining rational numbers. Weierstrass and Cantor, in contrast, reworked Cauchy’s ideas. Cleverly circumventing the problem of presupposing the existence of irrational numbers, Weierstrass identified the limits of converging sequences with the sequences themselves, as opposed to a fixed value that is ever more closely approached without being reached. Utilizing this notion of limit, he developed a complex and subtle theory of the real numbers. Cantor, who is the focus of our discussion in the next section, grounded his theory of the real numbers on Weierstrass’s concept of a limit, defining a real number in terms of an infinite set of infinite sequences of rational numbers. These differing theories of the real numbers all have in common an appeal to actual (vs. merely potential) infinities. Real numbers are defined in terms of infinitely large collections (sequences, classes, or sets) of rational numbers. In order to preserve the uniqueness of individual real numbers, which is required if they are to provide an arithmetic model for the baffling point-structure of the line as revealed by the pathological functions, these collections must be construed as fully determinate, infinite wholes. Mathematicians could no longer ignore the actual infinite. To wrap up, by the end of the nineteenth century, the metaphysical stance on mathematics inherited from the ancient Greeks (viz., that arithmetic is to be understood geometrically) had been reversed. It was generally conceded that clarification of the foundations of analysis, with its puzzling pathological functions, required an arithmetical theory of the real numbers, with their problematic denizens the irrationals. The irrationals were becoming more and more puzzling.
Unlike √2, the vast majority of the irrationals cannot be constructed with the tools of classical geometry. The late nineteenth century discovery (by Ferdinand Lindemann) that π is transcendental explained the inability of Greek geometers to square the circle with compass and straight edge, that is, to find a square with the same area as a given circle. It also explained the inability of eighteenth century mathematicians to square the circle by algebraic means. Algebraic irrationals such as the √2 are the roots of polynomial equations with rational coefficients. Transcendental magnitudes such as π and e (the base of the natural logarithms) are not. It is tempting to dismiss magnitudes that not only lack constructible geometrical counterparts but are not even the roots of algebraic equations as imposters, denizens of the fevered imagination of overwrought mathematicians. But transcendental irrationals play indispensable roles throughout mathematics. As everyone knows, the area of a circle is equal to πd²/4 (where ‘d’ stands for the diameter). Similarly, the base e of the natural logarithms is utilized in diverse areas of mathematics, including trigonometry, the calculus, and probability theory. As an example, the function f(x) = eˣ is equal to its own derivative. Like algebraic irrationals, however, the transcendental irrationals have decimal expansions that are unending and nonrepeating. It thus seemed clear to most late nineteenth century mathematicians that a satisfactory arithmetical theory of the real numbers requires recourse to the actual infinite. But there was a catch. Not only was the concept of the actual infinite vague and poorly understood, it wasn’t even obvious that it was intelligible. The actual infinite transcends all possible experience. It is perceptually indistinguishable from the unending and the indefinitely large or small. 
Moreover, as Galileo had recognized long ago, the idea of a completed infinite collection of numbers seems inherently paradoxical; he pointed out that each positive integer has a unique square associated with it and vice versa, and hence that the squares of the positive integers can be exhaustively paired off with all the positive integers even though it seems obvious that there are fewer of the former [Clark 2002]. If it were to bear the weight being placed upon it, the concept of the actual infinite needed careful clarification and analysis. Mathematicians required a rigorous theory of the actual infinite. No one saw this more clearly than Cantor.
Utilizing the ostensibly innocuous concept of set (as an arbitrary collection of items), Cantor developed the first mathematical theory of the actual infinite. The apparent simplicity and extreme generality of the concept of set promised more than the clarification of the foundations of analysis. It held forth the promise of placing all of mathematics upon a secure foundation.
4. Cantor, Hilbert and Gödel
Cantor assumed (the Cantor–Dedekind axiom) that there is a one-to-one correspondence between the real numbers and the points
on the line of geometry, which was now construed very differently than the intuitively picturable line of traditional geometry. The problem of understanding the mysterious structure of the real (as it was now called) line was thus transformed into the problem of understanding the structure of the real numbers. Utilizing Weierstrass’s provocative idea of identifying the limits of converging sequences of rational numbers with the sequences themselves, Cantor defined the real numbers as sets of converging sequences of rational numbers; in contemporary terminology, each real number is identified with an equivalence class of converging sequences of rationals. These sets are not only infinite but their members consist of completed infinities (entire Weierstrassian sequences). Thus Cantor was explicitly committed to sets that are completed infinite totalities.
It was clear to Cantor that the concept of set, which played the crucial role in his definition of the real numbers, needed clarification. He was using it in an intuitive way that might disguise subtle difficulties, particularly when the sets under consideration were extremely large or complicated. Accordingly, Cantor turned his attention to the task of formulating a mathematically rigorous theory of sets, a theory that would provide a solid foundation for understanding the actual infinite, and hence for explicating the structure of the real numbers with their problematic irrationals. Cantor construed a set as any collection of definite, well distinguished items. The items in a set could be entities of any sort, from physical objects (such as numerals, pebbles, butterflies, or planets) to abstract entities (e.g., numbers, concepts, ideas, or sets). Moreover, on his account, the items in a set (as opposed to the method by means of which they are grouped together) wholly determine the identity of the set.
Thus a set can consist of completely unrelated items; no overarching principle of unity among its members is required. The basic set operations (union, intersection, and complement) were defined in terms of the general concept of set, e.g., the intersection of two sets is defined as the set whose members belong to both sets. Having formulated a general theory of sets, Cantor proceeded to construct a theory of the actual (completed) infinite. Cantor took what Galileo had regarded as a paradoxical feature of infinite sets (viz., that a proper subset of an infinite set can be placed into one-to-one correspondence with the set itself) as fundamental. As an
example, the set of positive even integers can be placed into one-to-one correspondence with the set of positive integers. The set of positive integers includes all the positive even integers plus the positive odd integers. Thus one would think that the set of positive even integers is smaller than the set of positive integers. Nonetheless the two sets can be paired off in such a way that each member of the former corresponds to a unique member of the latter and vice versa. They thus seem to be of the same size after all. Cantor embraced this paradox as revealing a fundamental feature of infinite sets. According to Cantor, both sets have the same size or cardinality, namely, aleph-zero. Aleph-zero is the cardinality of all the countably infinite sets, regardless of the identity or order of their elements.³
Using his new set theory, Cantor developed his famous transfinite cardinals. The power set operation played the pivotal role. The power set of a given set consists of all the subsets (proper and improper) of the set. It thus represents a completed set that has more members than the original set. When it comes to infinite sets, such as the set of all positive integers, it can be demonstrated that their power sets not only have more members but they cannot be placed into one-to-one correspondence with the sets from which they are derived. They thus represent larger completed infinities. Unlike the set of positive even integers, the power set of the set of positive integers cannot be placed into one-to-one correspondence with the set of positive integers. Thus the power set of the set of positive integers, which is of the same size as the set of real numbers, is both infinite and larger than the set of positive integers.
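The two claims in the preceding paragraphs can be displayed on finite fragments with a short Python sketch. It is offered only as an illustration and is not part of Cantor's own presentation: the function names (pair_with_evens, power_set, diagonal_set) are hypothetical, and the diagonal construction is the standard argument for why no pairing of a set with its power set can be exhaustive, which the text asserts but does not spell out. A finite computation cannot establish anything about completed infinities; it only exhibits the pattern that the proofs generalize.

```python
from itertools import combinations


def pair_with_evens(n):
    """Pair each positive integer k <= n with the positive even integer 2k."""
    return [(k, 2 * k) for k in range(1, n + 1)]


def power_set(items):
    """Return all subsets (proper and improper) of a finite collection."""
    items = list(items)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]


def diagonal_set(items, f):
    """Diagonal set for an assignment f of subsets to items.

    It contains exactly those x that do not belong to f(x), so it differs
    from f(x) on the item x, for every x.
    """
    return frozenset(x for x in items if x not in f(x))


# A hypothetical attempt to assign a subset of {1, 2, 3} to each member.
attempt = {1: frozenset({1, 2}), 2: frozenset(), 3: frozenset({1, 3})}
missed = diagonal_set([1, 2, 3], attempt.get)
```

Whatever assignment is tried in place of attempt, the subset returned by diagonal_set is absent from the assigned subsets, so the assignment fails to exhaust the power set.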
³ A cardinal number designates the size of a set independently of the order in which it is arranged; an ordinal number designates the order of a member of a set with respect to a well ordering of the set.
Cantor realized that he could continue the process of forming the power set of infinite sets ad infinitum, yielding a hierarchy of higher order cardinals, each of which represents a numerically larger (completed) infinity than the original set. Cantor’s theory of sets thus produced a theory of completed infinities, something that had never before been available to mathematicians. The reversal of the post-Pythagorean tradition of interpreting numbers geometrically seemed finally complete. The fine-grained point structure of the line of geometry could be interpreted
in terms of the arithmetic of the real numbers, which construed the problematic irrational magnitudes as authentic numbers.
Unfortunately, however, paradoxes were soon discovered in Cantor’s precise mathematical account of sets and transfinite numbers. His transfinite ordinals were shown to be paradoxical (the Burali–Forti paradox), and he himself discovered a paradox (“Cantor’s paradox”) involving his transfinite cardinals. Bertrand Russell discovered the most serious paradox, however, viz., the self-contradictory set of all sets that are not members of themselves. The latter was particularly devastating because it involved only the concept of set, and thus struck at the very foundations of Cantor’s set theory along with everything that had been built upon it. The discovery of paradoxes in Cantor’s wonderful new set theory plunged mathematics into crisis; for a more extensive discussion of the history, see [Körner 1960, Sieg 1999]. Without analysis much of modern mathematics (including the calculus, analytical geometry, abstract algebra, and most of applied mathematics) would disappear. It was clear to most mathematicians that clarifying the foundations of mathematical analysis required legitimating the full strength of the real number system with its mysterious irrational numbers. Cantor’s theory seemed the best hope for achieving this.
David Hilbert endeavored to save Cantor’s set theory from the ravages of the paradoxes by radically reconceptualizing mathematics.⁴ According to Hilbert, mathematics is not, as traditionally thought, the study of abstract objects such as the number 2 or geometrically straight lines. It is instead the study of formal systems. The terms and operations of formal systems consist of finite numbers of primitive symbols (marks or “strokes”) and finite numbers of “purely mechanical” operations on these symbols. Considered as part of a formal system, the symbols are meaningless. They may stand for anything whatsoever or nothing at all. All that matters is their structure.
⁴ Hilbert was the originator of the formalist program in mathematics but there is disagreement about his actual views. It is generally conceded that his version of formalism is weaker than the position now associated with formalism. Indeed, Hilbert has been criticized for his fence-straddling realism about mathematics; he identified the content of mathematics with physical marks and their manipulations. My concern in this paper, however, is not with the history of Hilbert’s ideas so much as the contemporary understanding of the formalist position. For a detailed discussion of Hilbert’s early views and their relation to those of later formalists, see [Körner 1960, Sieg 1999].
Any structural feature may play the role of symbol so long as different primitive symbols are represented by distinct structures. The operations of a formal system are sensitive only to the structure (vs. meaning) of a symbol. Their purpose is to construct strings of symbols (formulae) and to transform them into other strings of symbols in a step-by-step fashion (i.e., to construct proofs). Hilbert’s plan was to formalize enough of classical arithmetic for doing analysis while avoiding the set theoretic paradoxes. To accomplish this he needed the right kind of formalism, namely, a formalism whose formal consistency corresponds to the logical consistency of the relevant portion of classical arithmetic. The basic idea was to start with an axiomatization of classical arithmetic (of which a number were already available, including Whitehead and Russell’s, in Principia Mathematica), and then formalize it. In the process of formalization, the traditional content of arithmetic is stripped away. The features that remain are purely formal. The task is to show that the resultant formalism provides a consistent and complete formal theory of classical arithmetic. Proofs of consistency and completeness could not, however, employ methods resting upon suspect transfinite ideas. This would defeat the whole purpose of the formalist program. The powerful existence proofs of classical mathematics were thus unavailable to Hilbert. Moreover, a collection of specialized decision procedures tailored to particular problem types wouldn’t count either. What Hilbert needed was nothing less than a definite finite formal procedure that could be used to unequivocally decide the provability of any claim in formalized mathematics. This decision problem became known as Hilbert’s Entscheidungsproblem.⁵
Hilbert’s formalist program was effectively destroyed by Kurt Gödel’s first incompleteness theorem, which demonstrated that a formal system rich enough to encapsulate elementary arithmetic could not be both consistent and complete.
⁵ Hilbert officially introduced the Entscheidungsproblem in 1928 in a textbook on logic co-authored with his student Ackermann.
In essence, Gödel demonstrated that not even elementary arithmetic could be fully captured in a formal system. All that remained of Hilbert’s original program was the Entscheidungsproblem. Gödel’s work strongly suggested a negative answer to the Entscheidungsproblem. A number of mathematicians, most notably Gödel, Church, and Turing, set out to prove that the Entscheidungsproblem is unsolvable. This work
produced the Church–Turing thesis, which lies at the heart of the Turing account of computation.
5. The Church–Turing Thesis
In order to show that the Entscheidungsproblem is unsolvable, mathematicians needed a consensus about what constitutes an “effective” (i.e., definite and finite) procedure for computing a number theoretic function (a function defined on the positive integers). The Church–Turing thesis provides this. It is founded upon two independent analyses of the concept of effective procedure. The first is based upon the lambda-calculus, a formal logical system developed by Alonzo Church in the nineteen thirties. Church argued [Church 1935] that lambda-definability represents an intuitively plausible notion of what it means for a number theoretic function to be effectively calculable. When it was subsequently shown [Church 1935; Kleene 1936] that lambda-definability is extensionally equivalent to Herbrand–Gödel general recursiveness, the other well-known formal logical analysis of effective computability, Church postulated [1935] that lambda-definability defines the set of computable functions. He proposed that there aren’t any number theoretic functions that are effectively computable but not lambda-definable. This bold conjecture became known as “Church’s thesis.”
At almost the same time, Alan Turing, who was unaware of Church’s work, proposed a different analysis of effective calculability [Turing 1936]. Like Church, Turing was trying to show that the Entscheidungsproblem is unsolvable. Instead of using a formal logical calculus, he based his analysis on the concept of an extremely simple, abstract “mechanism” now known as a “Turing machine.” Turing subsequently demonstrated [Turing 1937] that Turing (machine) computability is extensionally equivalent (vis-à-vis the computation of the number theoretic functions) to lambda-definability and general recursiveness.
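To give a feel for what lambda-definability amounts to, the following Python sketch encodes numbers and two arithmetical functions using nothing but function abstraction and application. The encoding is the familiar one known as Church numerals; the helper names church and to_int are assumptions introduced here for readability, and the sketch starts from zero rather than from one purely for convenience. It is an illustration of the idea, not a reconstruction of Church's own notation.

```python
# Church numerals: the number n is the function that applies f to x n times.
zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))
add = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))
mul = lambda m: lambda n: lambda f: m(n(f))


def church(k):
    """Build the Church numeral for a non-negative Python int k (helper only)."""
    n = zero
    for _ in range(k):
        n = succ(n)
    return n


def to_int(n):
    """Read a Church numeral back as a Python int by counting applications."""
    return n(lambda i: i + 1)(0)


five = to_int(add(church(2))(church(3)))
six = to_int(mul(church(2))(church(3)))
```

Only zero, succ, add, and mul belong to the encoding proper; church and to_int merely translate to and from ordinary integers so that the results can be inspected.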
The fact that three independent and ostensibly different analyses of effective calculability were extensionally equivalent strongly suggested to mathematicians that they had finally captured the decidable (a.k.a. computable) number theoretic functions. Church’s thesis became the Church–Turing thesis. Turing’s analysis had a conceptual advantage over the others, for it seemed closer to the intuitive idea of carrying out a computation. Turing introduced it in the context of an idealized person
doing calculations with pencil and paper. Moreover, he explicitly characterized these idealized humans as “machines” and described their activity as “purely mechanical.” In Turing’s words, “A function is said to be ‘effectively calculable’ if its values can be found by some purely mechanical process. We may take this statement literally, understanding by a purely mechanical process one which could be carried out by a machine” [Turing 1939]. Given this, it is hardly surprising that the concept of a Turing machine is part of the foundation of theoretical computer science. Not only could the universal Turing machine compute all the computable functions (the Church–Turing thesis) but it also “performed” these calculations in a manner reminiscent of the way in which physical machines operate.
6. Inadequacies of the Turing Account of Computation
Turing machine procedures are often characterized as providing paradigms of effective procedure in virtue of (1) the “formal” (and hence ultimately general) character of their symbols and operations, (2) the “purely mechanical” (and, hence, physically realistic) character of their basic operations, and (3) the “perfect precision” with which their instructions “describe” or “define” the “actions” that they prescribe (making them error free). As I have argued elsewhere [Cleland 2001; 2002; 2004], however, the attraction of the Turing account rests upon conceptual confusions, more specifically, the idea of an “operation” prescribing an “action” that is “formal,” “mechanical,” and “well-defined” or “precisely described”, and the idea of a “symbol” that is “formal,” “uninterpreted,” and “shaped.” When these concepts are carefully disentangled, the plausibility of Turing’s account as a foundation for theoretical computer science is seriously undermined. Let us briefly review these arguments.
Turing machines are characterized in the literature in two quite different ways. They are sometimes described informally as abstract, machine-like entities and sometimes described formally as set theoretic structures.⁶
⁶ I shall restrict our discussion to the simplest versions, namely, deterministic sequential Turing machines; everything that I say applies fairly straightforwardly to the more complex deterministic and multidimensional Turing machines.
These different characterizations underscore the hybrid nature of Turing machines as both machine-like and purely formal, which is at the heart of the claim that Turing’s account supplies a firm
foundation for theoretical computer science. Because it is closer to the intuitive notion of computing a function, we shall begin our discussion with the informal analysis.
Turing machines are described informally as consisting of a “mechanism” (finite state machine) coupled to an external storage medium, known as the “tape,” through an abstract device called the “head.” The tape is divided into squares, and may be indefinitely extended in either direction. In the standard set up, each square of the tape is “occupied” by one of two distinct “symbols,” traditionally represented by “S₀” and “S₁,” and the head, which is always positioned over a single square of the tape, is characterized as “performing” one of five, extremely simple, basic operations; it can “erase” a symbol, “write” a symbol, “move” to the left one square, “move” to the right one square, and “halt” over a square. At any given time, the machine is characterized as being in one of a finite number of “internal states,” q₁, ..., qₙ. A Turing machine is said to be in a particular state qᵢ only if it is about to carry out a particular instruction i. The instructions are conditional in form. As an example, a Turing machine instruction might read as follows: ‘If the symbol in the square being scanned is an S₁, erase it and print an S₀; otherwise [the symbol is an S₀] move to the left one square.’ What the head does is completely determined by the identity of the symbol being scanned (whether it is an S₀ or an S₁) and the internal state of the “machine”, that is, the instruction that it is implementing. Turing machines compute arithmetical functions by implementing “programs” consisting of lists of conditional instructions. The arguments and values of the function are encoded on the tape as strings of S₀s and S₁s. As it follows the program, the “machine” transforms the initial string, which represents the argument, in a step-by-step fashion into a final string representing the value of the function.
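The informal description above can be made concrete with a minimal simulator, sketched here in Python. Several choices are assumptions rather than features of the account just given: instructions are stored as a table keyed by internal state and scanned symbol, each entry combines writing with a single move (a common textbook formulation rather than the one-operation-per-instruction format described above), S0 is treated as the blank, and the names run, SUCCESSOR, and max_steps are hypothetical. The example program computes the successor function on arguments encoded as unbroken strings of S1s.

```python
from collections import defaultdict

S0, S1 = "S0", "S1"  # the two tape symbols; S0 is assumed to be the blank


def run(program, tape, state="q1", head=0, max_steps=10_000):
    """Run a deterministic Turing machine and return the visited tape as a list.

    program maps (state, scanned symbol) -> (symbol to write, move, next state),
    where move is "L", "R", or "N" (stay put) and the state "halt" stops the run.
    """
    cells = defaultdict(lambda: S0, enumerate(tape))  # unbounded both ways
    steps = 0
    while state != "halt":
        if steps == max_steps:
            raise RuntimeError("step budget exhausted before halting")
        write, move, state = program[(state, cells[head])]
        cells[head] = write
        head += {"L": -1, "R": 1, "N": 0}[move]
        steps += 1
    if not cells:
        return []
    return [cells[i] for i in range(min(cells), max(cells) + 1)]


# Hypothetical example program: append one S1 to a string of S1s.
SUCCESSOR = {
    ("q1", S1): (S1, "R", "q1"),    # scan rightward across the argument
    ("q1", S0): (S1, "N", "halt"),  # first blank: write one more S1 and halt
}

result = run(SUCCESSOR, [S1, S1, S1])
```

The strings "S0" and "S1" could be replaced by any two distinguishable Python values without altering which function is computed, so long as the table is relabeled to match.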
In keeping with the formalist framework for mathematics, Turing machine “symbols,” the entities purportedly written and erased, are characterized as “uninterpreted.” They are manipulated solely in virtue of their structure or “shape,” as opposed to their meaning or content. Thus the “shapes” of the symbols of a Turing machine are crucial to the identity of the “actions” that it “performs.” But what constitutes the “shape” of a Turing machine symbol?
Turing machines are abstract entities that can be realized by many different kinds of physical systems. Non-geometrical characteristics such as being the color blue, 2 kg, or a single flash of light may be used to instantiate a Turing machine symbol as well as geometrical characteristics such as the shape of the alpha-numerical character S₀, the numeral 2, or a pebble. This means that Turing machine symbols cannot be said to have “shapes” in the traditional geometrical sense. Indeed, the only physical constraints on the symbols of a Turing machine hold within (vs. across) its physical realizations. Within a particular realization, every token of the same symbol must share some (it does not matter what) physical property and all tokens of different symbols must differ in some (it does not matter what) physical property. Considered independently of a particular instantiation, the most that may be said is that different symbols have different but not any definite distinguishing physical properties. The idea of a mere difference in some completely indeterminate physical property is not enough, however, to secure the idea that the symbols of a Turing machine have distinguishing structures of some sort. But even supposing that Turing machine “symbols” had identifying structures, the instructions of a Turing machine program couldn’t be said to provide us with “precise specifications” of “action.” Action presupposes something to manipulate, but having something to manipulate isn’t enough to specify an action. As an example, it wouldn’t suffice to specify just a knife and an apple in a recipe for apple pie. One also needs to describe what is to be done to the apple by the knife. There are many possibilities and they include quartering it, slicing it, dicing it, and pounding it.
Despite the use of familiar English expressions for action, such as “erase” and “move,” the instructions of a Turing machine (qua multiply realizable “machine”) do not specify what is to be done to their symbols by their basic operations. What counts as “erasing” or “writing” a symbol is left completely open. If the symbols of a Turing machine were instantiated by pebbles “erasing” a symbol could be realized by activities as diverse as painting it, pulverizing it, or removing it from a tin can. This brings us to the much-lauded mechanical character of Turing machines. Turing machine operations and “actions” are traditionally characterized as “purely mechanical.” This helps to explain the
prominent role played by Turing machines in theoretical computer science. Nevertheless, Turing machine operations and “actions” cannot be said to be “mechanical” in anything like the physical scientist’s sense. Forces operating over a distance without any intervening causal chain are capable of satisfying the instructions of a Turing machine program as readily as the stereotypically mechanical actions (“pushes” and “pulls”) of Newtonian physics. Even magical actions, such as angels creating and annihilating pebbles in tin cans, could satisfy them! This is a consequence of the fact that (in keeping with the formalist framework for mathematics) the only constraints on Turing machine “actions” are purely structural, namely, they must be (1) distinct from one another and (2) occur in a time-ordered sequence. These constraints have nothing to do with the intrinsic character of the actions involved; they represent purely external relations among actions. Whether or not an action is “mechanical” in the sense of physical science, however, crucially depends upon its intrinsic causal character. Because the intrinsic character of the “actions” prescribed by its instructions doesn’t affect the operation of a Turing machine, it is a mistake to characterize Turing machines as “mechanical” devices.
In summary, on the informal version of the Turing account, it is a mistake to describe their instructions as “precisely describing” or “well defining” the “actions” that they “perform,” and it is also a mistake to characterize the operations that they prescribe as “mechanical” in anything like the sense in which physical scientists use the word. This significantly undermines the idea that Turing machines provide a firm foundation for theoretical computer science, which, after all, is primarily concerned with the theoretical capacities of physical machines (of any sort) to compute functions. This is not a new point.
Turing’s student Robin Gandy claims that Turing never intended his analysis to apply to physical machines [Gandy 1980; 1988]; he was trying to capture the idea of a person following an instruction “without thought or volition” (a.k.a. “mechanically”). Gandy has attempted to extend Turing’s analysis to physical machines in light of very general considerations from contemporary physical theory [Gandy 1988]. Oron Shagrir argues, however, that Gandy’s account does not encompass all instances of finite machine computation [Shagrir 2002], suggesting perhaps that Turing’s account cannot be extended to physical computation. In
any case, however, it seems clear that an adequate understanding of the computational capacities of physical machines presupposes an understanding of the causal structure of physical processes.
One might suspect that the formal version is immune to these conceptual difficulties; for they seem to be consequences of the informal way in which Turing machines are characterized in some of Turing’s writings and in introductory computer science texts to this day. Alas this is not the case. In the formal account, Turing machines are analyzed in terms of set theoretic structures, which consist of functions and relations (both of which are identified with sets of n-tuples, ordered in the case of the former) and constants. The “usual (prototypical) structure” for a Turing machine includes three binary functions (known as the “next place” function, the “next symbol” function, and the “next state” function) and two constants (corresponding to the symbols of the informal account), which are typically designated by the numerals 0 and 1.⁷ At best the constants may be interpreted as integers; to take them as numerals, which have identifying shapes, would preclude instantiating a Turing machine with pebbles or flashes of light which lack the requisite shapes. If this is the case, however, we lose even the minimalist idea that distinct Turing machine “symbols” must differ in at least some physical property. For integers (qua abstract mathematical objects) have no physical characteristics whatsoever. We are thus left with a relation of bare numerical difference among symbols. As reflection on the possibility of numerically distinct but qualitatively identical objects (e.g., two identical leaves) reveals, bare numerical difference is incompatible with the idea that uninstantiated Turing machine symbols have distinguishing structures of any sort.
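As an illustrative sketch only, the three functions of such a structure can be written out in Python as finite sets of ordered pairs (dictionaries keyed by state and symbol), with the integers 0 and 1 serving as the two constants. The names NEXT_SYMBOL, NEXT_PLACE, NEXT_STATE, step, and trace are hypothetical, as is the convention that state 0 marks halting; the text fixes no notation for any of these. The tables encode a successor machine on strings of 1s.

```python
# Each "binary function" takes a (state, symbol) pair as its two arguments.
# Hypothetical convention: state 1 is the working state, state 0 means halted.
NEXT_SYMBOL = {(1, 1): 1, (1, 0): 1}   # symbol left in the scanned square
NEXT_PLACE = {(1, 1): +1, (1, 0): 0}   # displacement of the scanned position
NEXT_STATE = {(1, 1): 1, (1, 0): 0}    # state paired with the next configuration


def step(config):
    """Map one configuration (state, position, tape) to its successor."""
    state, pos, tape = config
    key = (state, tape.get(pos, 0))        # unlisted squares hold the constant 0
    new_tape = {**tape, pos: NEXT_SYMBOL[key]}
    return (NEXT_STATE[key], pos + NEXT_PLACE[key], new_tape)


def trace(config):
    """Return the whole ordered sequence of configurations up to halting."""
    history = [config]
    while history[-1][0] != 0:
        history.append(step(history[-1]))
    return history


history = trace((1, 0, {0: 1, 1: 1}))
```

The value of trace is a fixed, ordered list of configurations. Nothing in the three tables refers to erasing, writing, or moving as activities; they record only which values are paired with which.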
⁷ The usual structure for a Turing machine is usually articulated in terms of the universal Turing machine. This does not, however, affect the point I am making.
Moreover, insofar as they are construed as functions in the set theoretic sense (as ordered n-tuples) the basic Turing machine operations do not require dynamic change, let alone change that qualifies as mechanical, for their realization. They may be instantiated by spatially as well as temporally ordered structures. A mineral crystal could instantiate a Turing machine just as well as a laptop computer. In short, on the formal account, Turing machine instructions
cannot be characterized as prescribing actions of any sort, let alone mechanical action.
This brings us to what is perhaps the most problematic aspect of the formal account of Turing machines. Mathematicians do not identify Turing machines with their usual structures. They identify them only up to isomorphism. Considered independently of a particular instantiation, a Turing machine is nothing more than a class of isomorphic structures. The structures in this class include physical machines, mineral crystals, angelic games, and abstract mathematical systems such as the “usual structure” for the Turing machine; although it is picked out as representing the class as a whole, the usual structure does not have a special status with respect to the other structures. And this brings us to the important point. A class of Turing machine instantiations can no more be viewed as a Turing machine than the class of all red objects can be viewed as a red object. Considered independently of a specific instantiation, Turing machine symbols and operations amount to nothing more than logical roles in a second order structure that (strictly speaking) is not a Turing machine. At best, Turing machines (qua formal mathematical entities) may be viewed as mere procedure schemas containing placeholders for symbols and operations. It is not until we descend to the level of the individual structures in the equivalence class defining a Turing machine that we get authentic symbols and operations. In short, on the formal analysis, one cannot be said to have specifications of action, however imprecise, and one can certainly not be said to have specifications of mechanical action.
When one considers that the Turing account originated in an attempt to settle Hilbert’s Entscheidungsproblem none of this should be very surprising. Despite the misleading informal language sometimes used by Turing and others to describe them, Turing machines are not highly abstract machines.
They are as purely formal as the algorithms supplied by Church’s lambda-definability and Herbrand–Gödel general recursiveness. This is not a coincidence. All three proposals are remnants of Hilbert’s failed formalist program for mathematics.
7. Is the Church–Turing Thesis True?
This brings us back to the Church–Turing thesis. In light of the above considerations, just how plausible is the claim that the Turing
computable functions exhaust the class of the computable (by any means) functions? If Hilbert’s formalist program had succeeded, the Church–Turing thesis would be on firm theoretical ground. Classical arithmetic would have been fully captured in a formal system. As a consequence, the formal possibilities for computing functions would exhaust all the possibilities, setting an absolute logical limit on what functions are computable. But Hilbert’s program did not succeed. It succumbed to Gödel’s incompleteness theorems. Gödel’s incompleteness theorems demonstrate that a purely formal account of mathematics cannot provide us with a satisfactory account of elementary arithmetic, let alone the rest of mathematics. Given the failure of the formalist program, the claim that Turing computability captures the class of computable functions is suspect. Not all programs in the foundations of mathematics would find a formalist proof convincing. Intuitionism and Platonism, two historically important and highly influential schools of thought about the nature of mathematics, reject the idea that everything important to classical arithmetic can be captured in a formalization; for a more detailed discussion, see Körner [1960]. Something meaningful and unformalizable that renders arithmetic logically (vs. merely formally) inconsistent or consistent could have been left out of the formalism.
In this context, it is important to keep in mind that the failure of Hilbert’s program represents not only a failure to formalize arithmetic but also a failure to successfully arithmetize the graphs of the exotic functions of Bolzano, Weierstrass, and Riemann. For as discussed earlier, making satisfactory arithmetical sense of the structure of these graphs requires recourse to completed infinities. Insofar as Hilbert’s program cannot provide a complete and consistent formal reconstruction of even elementary arithmetic, it cannot be viewed as saving Cantor’s theory from paradox.
We are left with the unsettling possibility that the graphs of the pathological functions (and other continuous processes and structures as well) are not fully arithmetizable after all. This has implications for our theoretical understanding of computability. As an example, the claim that the computational capacities of analog computers (which exploit continuous physical processes) and digital computers (whose prototypes are Turing machines) are the same rests upon a tacit supposition to the effect that
the fine structure of any continuous process or structure is fully analyzable in terms of the arithmetic of the real number system. If this is not the case, these arguments fail, and the question of the computational capacities of analog devices vis-à-vis digital devices is left open. Put more generally, we cannot exclude the possibility of unformalizable but nevertheless viable effective decision procedures that could be utilized to compute Turing uncomputable functions.
Whatever plausibility the Church–Turing thesis has must rest upon nonmathematical considerations having to do with intuitive notions about the possibilities for computing functions and the nature of physical processes. Because it is in large part responsible for the foundational role of Turing machines in theoretical computer science, Turing’s informal account is a good place to look for such considerations. Significantly, Turing’s account is what finally convinced Gödel that mathematicians had captured the computable functions; he was not convinced by the previously established extensional equivalence (vis-à-vis the computation of the number theoretic functions) of the explicitly formal analyses of computability provided by lambda definability and general recursiveness. In Gödel’s words, “[...] with this concept [Turing’s computability] one has for the first time succeeded in giving an absolute definition of an interesting epistemological notion” [Gödel 1946].
As we have seen, Turing embraced a highly anthropomorphic view of computation. Turing machines were introduced in the context of a person plugging away at arithmetical calculations with pencil and paper. But the claim that human beings provide good models for the computational capacities of physical machines is highly suspect. Physical devices can accomplish many things (e.g., escape the pull of Earth’s gravitational field, withstand tremendous pressures at great ocean depths) that cannot be accomplished by unaided human beings.
Why suppose that the situation is any different in the case of computing functions? It is not enough for defenders of the Turing view to retort that Turing machines represent idealized human beings that are not subject to human frailties. This merely sidesteps the central question, which is why one should model computability on the capacities of limited creatures like humans in the first place. Moreover, there are many different ways in which the computational activity of human beings could be idealized. Turing opted for setting no upper bound on the number
of actions his “machines” could perform; he allowed them unlimited space and time in which to operate. He could have idealized human action in other ways, however. He might have set no upper bound on the speed with which they perform their actions. What justification can there be for selecting the former over the latter? An obvious rejoinder is that there is an upper limit to the speed of physical processes in our universe. Einstein’s theory of relativity tells us that no real-valued mass can travel faster than light. But this is an empirical consideration. Why not invoke empirical considerations in the case of space and time too? Entropy, for instance, sets limits to the capacities of physical objects to perform actions in time. No physical device can go on and on, performing actions forever. It will eventually break down or wear out. What justification can there be for preferring one empirically unrealistic idealization of human action to another? Furthermore, not all computers behave like a Turing machine. Turing machines compute functions in a step-by-step fashion. They treat computational processes as sequences of discrete, primitive subcalculations. While some arithmetically challenged human beings may operate in this plodding manner, many computers do not. Analog computers provide particularly salient examples, but they are not the only ones. Insofar as functions are identified with sets of ordered pairs, any process that achieves the requisite pairings ought to count as a bona fide computation; how it achieves the pairings shouldn’t matter. As earlier, Turing machines seem to be too closely modeled on the behavior of a very peculiar physical process, namely, the overt computational behavior of a self-conscious human being. The familiar retort is that the Church–Turing thesis isn’t concerned with how physical processes actually compute functions. It only says that no physical processes can compute something that couldn’t be computed by a Turing machine. 
But given the failure of the formalist program in mathematics, what could motivate such a claim? The standard response is that Turing’s analysis initially faced two competitors, Herbrand–Gödel general recursiveness, and Church’s lambda-calculus. Turing demonstrated that all three accounts are extensionally equivalent vis-à-vis the computation of the number-theoretic functions. There is thus strong inductive support for the claim that mathematicians have captured the decidable (a.k.a. computable) number-theoretic functions. But this response
ignores the fact that all three accounts are grounded in the formalist perspective on mathematics. Indeed, each proposal was explicitly designed to resolve Hilbert’s Entscheidungsproblem. Viewed from this perspective, it is not so surprising that they end up classifying the same functions as computable. All three are built on the same conceptual bedrock. This brings us to the oft-repeated allegation that no one has been able to come up with a computable function that cannot be computed by a Turing machine. The burden of proof, the argument goes, is on opponents of the Church–Turing thesis to provide such a function. This allegation does not, however, provide much support for the Church–Turing thesis. For it studiously ignores the growing body of literature on inductive machines (e.g., Burgin [1983], Gold [1964], Putnam [1965]), analogue chaotic neural nets (e.g., Siegelmann [1995]), and accelerating (Zeus) machines (e.g., Copeland [2002], Steinhart [2002]) claiming to have done just this! Defenders of the received view on computability ignore this literature because they are not willing to countenance an activity that fails to conform to the strictures of the official Turing line as a computation. In other words, there is more than a whiff of circularity here. It is sometimes objected that no one could possibly verify that a physical machine computed a Turing uncomputable function, and hence that the whole controversy is a mere tempest in a teapot. But as I have argued elsewhere [Cleland 1993; 1995], the verification problem is not unique to hypercomputational devices. It afflicts all physical devices for computing functions. As an example, my hand calculator will cease to function long before it can finish computing a total function such as addition. If no electronic glitches occur during its remarkably short life, my calculator will compute a partial function that is consistent with its computing addition. 
But this partial function is also consistent with my calculator’s computing any one of an uncountably infinite number of total functions that are not addition. This underscores a frequently overlooked point. The claim that a physical machine computes a certain function is nothing more than an empirical hypothesis. Its plausibility ultimately depends upon physical considerations, both empirical and theoretical (e.g., probabilistic causal relations, counterfactual suppositions grounded in physical law), as well as mathematical considerations (e.g., identity relations among different arithmetical operations). We cannot
dismiss the possibility that we will someday have reasons for believing that some physical device computes a Turing uncomputable function that are just as compelling as our reasons for believing that our hand calculators compute addition. The use of radioactive processes in place of pseudo-random algorithms suggests how this might happen. Turing machines cannot simulate truly random processes; they can only implement pseudo-random processes. According to the theory of quantum mechanics, however, radioactive decay provides us with truly random, discrete physical processes. Because of this, radioactive processes are sometimes used in place of pseudo-random algorithms when there is a need for truly random input to a computation. As I have discussed elsewhere [Cleland 2002], whether physical processes, quantum or otherwise, can be harnessed to compute Turing uncomputable functions (in a robust sense of “compute”) ultimately depends upon the causal (vs. formal) structure of the world.

In conclusion, considerations from mathematics provide little support for the Church–Turing thesis; its mathematical plausibility rests upon Hilbert’s formalist program for mathematics, which was effectively destroyed by Gödel’s incompleteness theorems many years ago. Furthermore, non-mathematical considerations, based upon intuitively plausible scenarios for computing functions and our current understanding of the nature of physical processes, do not provide much support for the Church–Turing thesis either. There is little reason to suppose that some exotic physical device couldn’t compute a Turing uncomputable function in exactly the same sense in which my hand calculator is said to “compute” an ordinary arithmetic function such as addition.
In order to succeed in designing such machines, however, computer scientists need to focus more on considerations from physical science and the structure of causation [Cleland 2001; 2002] than on fiddling with the structure of traditional Turing machines, and this includes hypercomputational devices, which as I have argued elsewhere [Cleland 2004] are still too tightly coupled to the Turing framework for computation. In short, theoretical computer science needs to become less of a formal mathematical discipline and more of a science. Seventy long years have passed since Church first proposed his thesis. It is time for a change.
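The hand-calculator argument made earlier in this chapter can be illustrated with a small sketch. It assumes, purely for illustration, that a device only ever processes inputs below some bound before it fails; the names `impostor`, `observed_behaviour` and the bound `limit` are hypothetical. The finite record of outputs is then exactly the same for addition and for a total function that is not addition.

```python
from itertools import product

def addition(x: int, y: int) -> int:
    return x + y

def impostor(x: int, y: int, limit: int = 100) -> int:
    # Agrees with addition whenever both inputs are below `limit`
    # and returns 0 everywhere else. `limit` is a hypothetical stand-in
    # for the largest input the device handles before it breaks down.
    return x + y if x < limit and y < limit else 0

def observed_behaviour(f, limit: int = 100) -> dict:
    # The finite record of input/output pairs the device actually produces.
    return {(x, y): f(x, y) for x, y in product(range(limit), repeat=2)}

# The two finite records coincide although the total functions differ.
same_record = observed_behaviour(addition) == observed_behaviour(impostor)
differs_later = impostor(100, 1) != addition(100, 1)
```

Any other choice of values beyond the bound would yield a further total function consistent with the same record, which is why the claim that the device computes addition has to rest on more than the outputs observed.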
References

Boyer, C.B. [1959], The History of the Calculus and its Conceptual Development, New York: Dover.
Burgin, M.S. [1983], “Inductive Turing machines”, Sov. Math. Dok., pp. 1289–1293.
Church, A. [1936], “An Unsolvable Problem of Elementary Number Theory”, reprinted in M. Davis [1965], The Undecidable, New York: Raven Press, pp. 89–115.
Clark, M. [2002], Paradoxes from A to Z, London and New York: Routledge.
Cleland, C.E. [1993], “Is the Church–Turing Thesis True?”, Minds and Machines 3, 283–312.
Cleland, C.E. [1995], “Effective Procedures and Computable Functions”, Minds and Machines 5, 9–23.
Cleland, C.E. [2001], “Recipes, Algorithms, and Programs”, Minds and Machines 11, 219–237.
Cleland, C.E. [2002], “On Effective Procedures”, Minds and Machines 12, 159–179.
Cleland, C.E. [2004], “The Concept of Computability”, Theoretical Computer Science 317, 209–225.
Copeland, J. [1998], “Turing’s O-machines, Searle, Penrose, and the Brain”, Analysis 58, 129–131.
Copeland, J. [2002], “Accelerating Turing machines”, Minds and Machines 12, 281–301.
Gandy, R. [1980], “Church’s Thesis and Principles for Mechanism”, in The Kleene Symposium, (J. Barwise, H.J. Keisler, and K. Kunen eds.), Amsterdam: North-Holland, pp. 123–148.
Gandy, R. [1988], “The Confluence of Ideas in 1936”, in The Universal Turing Machine: A Half-Century Survey, (R. Herken ed.), Oxford: Oxford University Press, pp. 55–112.
Gödel, K. [1946], “Remarks Before the Princeton Bicentennial Conference on Problems in Mathematics”, reprinted in M. Davis [1965], The Undecidable, New York: Raven Press, p. 84.
Gold, M.E. [1964], “Limiting Recursion”, J. Symbolic Logic 30, 28–48.
Hilbert, D. [1925], “On the Infinite”, reprinted in From Frege to Gödel, (J. Van Heijenoort ed.), Cambridge, Mass.: Harvard University Press, p. 376.
Kitcher, P. [1983], The Nature of Mathematical Knowledge, Oxford: Oxford University Press.
Kleene, S. [1936], “Definability and Recursiveness”, Duke Mathematical Journal 2, 340–353.
Körner, S. [1960], The Philosophy of Mathematics, New York: Dover.
Putnam, H. [1965], “Trial and Error Predicates and the Solution to a Problem of Mostowski”, J. Symbolic Logic 30, 49–57.
Shagrir, O. [2002], “Effective Computation by Humans and Machines”, Minds and Machines 12, 221–240.
Sieg, W. [1999], “Hilbert’s Programs: 1917–1922”, Bull. of Symbolic Logic 5, 1–44.
Siegelmann, H.T. [1995], “Computation beyond the Turing Limit”, Science 268, 545–548.
Steinhart, E. [2002], “Logically Possible Machines”, Minds and Machines 12, 259–280.
Tiles, M. [1989], The Philosophy of Set Theory, New York: Dover.
Turing, A.M. [1936; 1937], “On Computable Numbers with an Application to the Entscheidungsproblem”, reprinted in M. Davis [1965], The Undecidable, New York: Raven Press, pp. 115–154.
Turing, A.M. [1939], “Systems of Logic Based on Ordinals”, reprinted in M. Davis [1965], The Undecidable, New York: Raven Press, p. 160.
B. Jack Copeland∗
Turing’s Thesis

1. Computers and Computers

When Turing uses the word ‘computer’ in his early papers, he does not employ it in its modern sense. Many passages make this obvious, for example the following:

Computers always spend just as long in writing numbers down and deciding what to do next as they do in actual multiplications, and it is just the same with ACE [...] [T]he ACE will do the work of about 10,000 computers [...] Computers will still be employed on small calculations. [Turing 1947, pp. 387, 391]
(The ACE or Automatic Computing Engine was an electronic stored-program computer designed by Turing and built at the National Physical Laboratory, London. A pilot version first ran in 1950 and at the time was the fastest computer in the world.¹) Turing introduces his ‘logical computing machines’ (his term for what we now call Turing machines) with the intention of providing an idealised description of a certain human activity, the tedious one of numerical computation, which was at that time the occupation of many thousands of people in commerce, government, and research establishments. These people were referred to as ‘computers’. The term ‘computing machine’ was used increasingly from the 1920s to refer to small calculating machines which mechanised elements of the human computer’s work. For a complex calculation, several dozen human computers might be required, each equipped with a desk-top computing machine.

∗ B.J. Copeland, University of Canterbury, New Zealand.
¹ For information on the Automatic Computing Engine see Copeland (ed.) [2005a].
Turing prefaces his first description of a Turing machine with the words: ‘We may compare a man in the process of computing a [...] number to a machine’ [Turing 1936, p. 59]. The Turing machine is a model, idealised in certain respects, of a human computer. Wittgenstein put this point in a striking way: ‘Turing’s “Machines”. These machines are humans who calculate’ [Wittgenstein 1980, §1096]. It is a point that Turing was to emphasise, in various forms, again and again. For example: ‘A man provided with paper, pencil, and rubber, and subject to strict discipline, is in effect a universal machine’ [1948, p. 416]. The electronic stored-program digital computers for which the universal Turing machine was a blueprint are, each of them, computationally equivalent to a Turing machine with a finite tape, and so they too are, in a sense, models of human beings engaged in computation. Turing chose to emphasise this when explaining the new electronic machines in a manner suitable for an audience of uninitiates: ‘The idea behind digital computers may be explained by saying that these machines are intended to carry out any operations which could be done by a human computer’ [1950a, p. 444]. He makes the point a little more precisely in the technical document containing his design for the ACE:

The class of problems capable of solution by the machine can be defined fairly specifically. They are [a subset of] those problems which can be solved by human clerical labour, working to fixed rules, and without understanding [...] [1945, p. 386]
(Turing went on to characterise the subset in terms of the amount of time and paper available to the human clerk.) It was presumably because Turing considered the point under discussion to be essential for understanding the nature of the new electronic machines that he chose to begin his Programmers’ Handbook for Manchester Electronic Computer with this explanation:

Electronic computers are intended to carry out any definite rule of thumb process which could have been done by a human operator working in a disciplined but unintelligent manner. [1950b, p. 1]
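The idea of a clerk ‘working to fixed rules, and without understanding’ can be made concrete with a minimal sketch of a machine table and the loop that follows it. Everything named here (`run_turing_machine`, the `SUCCESSOR` table, the unary encoding of numbers, the `max_steps` budget) is an assumption chosen for illustration and is not Turing’s own notation.

```python
from collections import defaultdict

BLANK = "_"

def run_turing_machine(table, tape, state="start", max_steps=10_000):
    """Follow a machine table until the state 'halt' is reached.

    table maps (state, scanned_symbol) -> (symbol_to_write, move, next_state),
    where move is -1 (left), 0 (stay) or +1 (right).
    max_steps is a hypothetical safety budget, not part of the model.
    """
    cells = defaultdict(lambda: BLANK, enumerate(tape))
    head, steps = 0, 0
    while state != "halt":
        if steps == max_steps:
            raise RuntimeError("step budget exhausted before halting")
        write, move, state = table[(state, cells[head])]
        cells[head] = write
        head += move
        steps += 1
    used = [i for i, symbol in cells.items() if symbol != BLANK]
    if not used:
        return ""
    return "".join(cells[i] for i in range(min(used), max(used) + 1))

# Unary successor: scan right over the 1s, write one more 1, then halt.
SUCCESSOR = {
    ("start", "1"): ("1", +1, "start"),
    ("start", BLANK): ("1", 0, "halt"),
}

result = run_turing_machine(SUCCESSOR, "111")
```

Each pass through the loop is one rule-governed action of the kind a human computer could carry out with paper and pencil: read a symbol, consult the table, write, move.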
Effective Procedures and the Entscheidungsproblem
Why was it that Turing modelled the Turing machine on the human computer? He wished to show that there is no uniform method by which a human computer could carry out a certain task; and his strategy was to establish this by means of proving that no Turing machine can carry out the task in question. He introduced the Turing machine in the course of arguing that the Entscheidungsproblem, or decision problem, for the predicate calculus (posed by Hilbert²) is unsolvable. Here is Church’s account of the Entscheidungsproblem:

By the Entscheidungsproblem of a system of symbolic logic is here understood the problem to find an effective method by which, given any expression Q in the notation of the system, it can be determined whether or not Q is provable in the system. [Church 1936, p. 41]
‘Effective’ and its synonym ‘mechanical’ are terms of art in mathematical logic. A mathematical method is termed ‘effective’ or ‘mechanical’ if and only if it can be set out in the form of a list of fully explicit instructions that admit of being followed by an obedient human clerk (the computer) who works with paper and pencil, reliably but without insight or creativity, for as long as is necessary. In the case of the propositional calculus, the truth table test is an effective method meeting the requirement set out by Church. Turing showed by means of a two-stage argument that there can be no such method in the case of the predicate calculus. First, he proved formally that there is no Turing machine that is able to determine, in a finite number of steps, whether or not any given formula Q of the predicate calculus is a theorem of the predicate calculus. Second, he argued more informally for the proposition that whenever there is an effective method for performing a mathematical task then the method can be carried out by a Turing machine in some finite number of steps. These two stages jointly secure the result that there is no effective method for determining whether or not an arbitrary formula Q of the predicate calculus is a theorem of the calculus.

Notice that this result does not entail that there can be no machine for determining this. The Entscheidungsproblem for the predicate calculus is the problem of finding a humanly executable procedure of a certain sort, and the fact that there is none is consistent with the claim that some machine may nevertheless be able to decide arbitrary formulae of the calculus; all that follows is that such a machine, if it exists, cannot be mimicked by a human computer. Turing’s (and Church’s) discovery was that there are limits to what a human computer can achieve, for all that their result is often portrayed as concerning the limitations of mechanisms in general (see for example Andrew Hodges in this volume, whose claims are discussed below).

² See Hilbert and Ackermann [1928].
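The truth table test mentioned above is a convenient example of an effective method: a fixed list of instructions that always terminates and requires no insight. A minimal sketch follows, under the assumption (made only for this illustration) that a formula is represented as a Python function from an assignment of truth values to a truth value; the names `is_tautology`, `implies`, `peirce` and `contingent` are hypothetical.

```python
from itertools import product

def is_tautology(formula, variables):
    """Truth table test: check the formula under every assignment."""
    return all(
        formula(dict(zip(variables, values)))
        for values in product([False, True], repeat=len(variables))
    )

def implies(a, b):
    # Material implication.
    return (not a) or b

# Peirce's law, ((p -> q) -> p) -> p, holds under all four assignments.
peirce = lambda v: implies(implies(implies(v["p"], v["q"]), v["p"]), v["p"])

# p -> q fails when p is true and q is false.
contingent = lambda v: implies(v["p"], v["q"])

peirce_ok = is_tautology(peirce, ["p", "q"])
contingent_ok = is_tautology(contingent, ["p", "q"])
```

The procedure inspects 2^n rows for n variables, so it always finishes. The result described in this section is that no analogous always-terminating procedure exists for theoremhood in the predicate calculus.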
2. Turing’s Thesis

The proposition that any effective method can be carried out by means of a Turing machine is known variously as ‘Turing’s thesis’ and the ‘Church–Turing thesis’. Turing stated his thesis in numerous places, with varying degrees of rigour. The following formulation is one of the most accessible:

LCMs [logical computing machines] can do anything that could be described as ‘rule of thumb’ or ‘purely mechanical’. [1948, p. 414]
Turing adds ‘This is sufficiently well established that it is now agreed amongst logicians that “calculable by means of an LCM” is the correct accurate rendering of such phrases’ (ibid.). Church [1936] proposed the (not quite) equivalent thesis that whenever there is an effective method for calculating the values of a function on the positive integers then the function is recursive (not quite equivalent because Turing did not restrict attention to functions on the positive integers, mentioning also ‘computable functions of [...] a real or computable variable, computable predicates, and so forth’ [Turing 1936, p. 58]). The term ‘Church–Turing thesis’ seems to have been introduced by Kleene (with a flourish of bias in favour of his mentor Church):

So Turing’s and Church’s theses are equivalent. We shall usually refer to them both as Church’s thesis, or in connection with that one of its ... versions which deals with ‘Turing machines’ as the Church–Turing thesis. [1967, p. 232]
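As a small illustration of what it means for different formal analyses to agree on a function, the sketch below defines addition twice: once by primitive recursion from the successor function, and once with Church numerals in the style of the lambda calculus. The encodings and the names `add_recursive`, `church`, `unchurch` and `church_add` are assumptions made for this example, and the comparison is a finite spot check, not a proof of equivalence.

```python
def succ(n: int) -> int:
    return n + 1

def add_recursive(m: int, n: int) -> int:
    # Primitive recursion on n:
    #   add(m, 0) = m
    #   add(m, n + 1) = succ(add(m, n))
    return m if n == 0 else succ(add_recursive(m, n - 1))

def church(n: int):
    # The Church numeral for n applies a function f to x exactly n times.
    return lambda f: lambda x: x if n == 0 else f(church(n - 1)(f)(x))

def unchurch(numeral) -> int:
    # Decode a numeral by counting applications of succ starting from 0.
    return numeral(succ)(0)

# Lambda-calculus addition: apply f n times, then m more times.
church_add = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))

# Spot check on a finite grid of arguments.
agree = all(
    add_recursive(m, n) == unchurch(church_add(church(m))(church(n)))
    for m in range(20)
    for n in range(20)
)
```

Agreement of this kind concerns only which input/output pairs are produced, not how the two definitions arrive at them.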
3. Theses to be Distinguished from Turing’s Thesis

Another proposition, very different from Turing’s thesis, namely that a Turing machine can compute whatever can be calculated by any machine, is nowadays sometimes referred to as the Church–Turing thesis or as Church’s thesis. This loosening of established terminology is potentially misleading, since neither Church nor Turing endorsed, implied, nor even implicitly gestured towards, this further proposition. There are numerous examples of this and other extended usages in the literature. The following are typical (for further examples see Copeland [1997]).

That there exists a most general formulation of machine and that it leads to a unique set of input-output functions has come to be called Church’s thesis. [Newell 1980, p. 150]

Church–Turing thesis: If there is a well defined procedure for manipulating symbols, then a Turing machine can be designed to do the procedure. [Henry 1993, p. 149]
More distant still from anything that Church or Turing actually wrote are the following theses concerning computability and physics:

The first aspect that we examine of Church’s Thesis [...] [w]e can formulate, more precisely: The behaviour of any discrete physical system evolving according to local mechanical laws is recursive. [Odifreddi 1989, p. 107]

I can now state the physical version of the Church–Turing principle: Every finitely realizable physical system can be perfectly simulated by [Turing’s] universal model computing machine [...] This formulation is both better defined and more physical than Turing’s own way of expressing it. [Deutsch 1985, p. 99]
The Maximality Thesis
It is important to distinguish between Turing’s thesis and the stronger proposition that whatever functions (in the mathematical sense of ‘function’) can be generated by machines can be generated by a universal Turing machine.³ (To say that a machine m generates a function f is to say that for each of the function’s arguments, x, if x is presented to m as input, m will carry out some finite number of atomic processing steps at the end of which it produces the corresponding value of the function, f(x).) I call this stronger proposition the ‘maximality thesis’ and will use expressions such as ‘Turing’s thesis properly so called’ for the proposition that Turing himself endorsed.⁴

Maximality Thesis: All functions that can be generated by machines (working on finite input in accordance with a finite program of instructions) are Turing-machine computable.

³ Gandy [1980] is one of the few writers to draw such a distinction.
The maximality thesis itself admits of two interpretations, according to whether the phrase ‘can be generated by a machine’ is taken in the this-worldly sense of ‘can be generated by a machine that conforms to the physical laws (if not to the resource constraints) of the actual world’, or in a sense that abstracts from the issue of whether or not the notional machine in question could exist in the actual world. The former version of the thesis is an empirical proposition whose truth-value is unknown. The latter version of the thesis is known to be false: there are notional machines that generate functions no Turing machine can generate (see the next section).

As previously remarked, the word ‘mechanical’, in technical usage, is tied to effectiveness, ‘mechanical’ and ‘effective’ being used interchangeably. (Gandy has outlined the history of this usage of the word ‘mechanical’ [1988].) Thus statements like the following are to be found in the technical literature:

Turing proposed that a certain class of abstract machines could perform any ‘mechanical’ computing procedure. [Mendelson 1964, p. 229]
Understood correctly, this remark attributes to Turing not the maximality thesis but Turing’s thesis properly so called. However, this usage of ‘mechanical’ tends to obscure the possibility that there may be machines, or biological organs, that generate (or compute, in a broad sense) functions that cannot be computed by Turing machine. For the question ‘Can a machine execute a procedure that is not mechanical?’ may appear self-answering, yet this is precisely what is asked if the maximality thesis is questioned.

⁴ Gandy [1980] uses the label ‘thesis M’ but not the term ‘maximality thesis’ (and his thesis M differs in certain respects from the maximality thesis stated here).
In the technical literature, the word ‘computable’ is sometimes tied by definition to effectiveness: a function is said to be computable if and only if there is an effective procedure for determining its values. Turing’s thesis then becomes:

Every computable function can be computed by Turing machine.

Corollaries such as the following are sometimes offered:

certain functions are uncomputable in an absolute sense: uncomputable even by [Turing machine], and, therefore, uncomputable by any past, present, or future real machine. [Boolos and Jeffrey 1980, p. 55]
To a casual reader of the technical literature, statements such as this may appear to say more than they in fact do. But tying the term ‘computable’ to the concept of effectiveness cannot settle the truth-value of the maximality thesis. All that is true, if the term ‘computable’ is to be used in this way, is that a machine falsifying the maximality thesis cannot be described as computing the function that it generates.
4. Hypercomputation

A hypercomputer is any machine, notional or real, that is able to generate (or compute in a broad sense of the term) functions or numbers that cannot be computed in the sense of Turing [1936], i.e. cannot be computed with paper and pencil in a finite number of steps by a human clerk working effectively. Hypercomputers generate functions or numbers, or more generally solve problems or carry out tasks, that lie beyond the reach of the universal Turing machine. The additional computational power of a hypercomputer may arise because the machine possesses, among its repertoire of fundamental operations, one or more processes that no human being unaided by machinery can perform. Or the additional power may arise because certain of the restrictions customarily imposed on the human computer are absent in the case of the hypercomputer: for example, the restrictions that data take the form of symbols on paper, that all data be supplied in advance of the computation, and that the rules followed by the computer remain fixed for the duration of the computation. In one interesting family of hypercomputers, what is relaxed is the restriction that the human computer produce the result, or each digit of the result, in some finite number of steps (Copeland [1998b], [2002a]). Some notional hypercomputers are discrete state machines while others involve infinite precision measurements or real-valued connection weights and the like. In this section I describe two of the simplest models of hypercomputation.

Partially Random Machines

A partially random machine (the term is from Turing’s [1948, p. 416]) is a machine some of whose actions are the outcome of random influences but whose operations are otherwise determined (e.g. by a program). Some partially random machines are hypercomputational (Copeland [2000, pp. 28–31]). A simple example consists of a Turing machine linked to a source providing an infinite sequence of binary digits that is random, $r_1, r_2, \ldots, r_n, \ldots$. Suppose the machine is set up (for example) to print out on its tape the sequence of digits $2 \times r_1, \ldots$: since this sequence is itself random, the universal Turing machine cannot produce it. As Church argued, if a sequence of digits is random then there is no function $f(n) = r_n$ that is calculable by the universal Turing machine [1940, pp. 134–135].

Hypercomputation via Perfect Measurement

Let ‘τ’ (for Turing) be the number $0.h_1 h_2 h_3 \ldots$, where $h_i$ is 1 if the ith Turing machine halts when started with a blank tape, and is 0 otherwise (see Copeland [1998a], [2000]). According to some classical physical theories, the world contains continuously-valued physical magnitudes (for example, the theory of continuous elastic solids in Euclidean space). The magnitude of some physical quantity might conceivably be exactly τ units. Suppose that some mechanism A stores exactly τ units of such a physical quantity, which for the sake of vividness one might call ‘charge’. Suppose further that a mechanism B can measure the quantity of ‘charge’ stored in A to any specified number of significant figures. B determines $h_n$ by measuring A’s charge to sufficiently many significant figures and outputting the nth digit of the result. A and B together are able to solve the problem: ‘Determine, for any given n, whether or not the nth Turing machine halts’. This problem, known as the ‘halting problem’, cannot be solved by the universal Turing machine.

Is this arrangement of notional components a machine? In [2000] I argue that it is so in the sense of ‘machine’ crucial to the historical debate between mechanists and anti-mechanists about physiological and psychological mechanism (a debate involving such figures as Descartes, Hobbes, and de La Mettrie). Bechtel and Richardson [1993, p. 23] speak aptly of the mechanists’ twin heuristic strategies of decomposition and localisation. The former heuristic seeks to decompose the activity of the system whose functioning is to be explained into a number of subordinate activities; the latter attributes these subordinate activities to specific components of the system. The core of the claim, as put forward by the historical mechanists, that such-and-such naturally occurring item (a living body, say) is a machine is this: the item’s operation can be accounted for in monistic, materialist terms and in a manner analogous to that in which the operation of an artefact, such as a clockwork figure or church organ, is explained in terms of the nature and arrangement of its components. The notional arrangement just described, no less than the universal Turing machine, is a machine in the sense that its behaviour is the product of the nature and arrangement of its material parts. The claim that the mind is a machine, in the sense of ‘machine’ used by the historical mechanists, is evidently consistent with the hypothesis that the mind is a form of hypercomputer (see further Copeland [2000]).

Of course, the perfect measuring device B is thoroughly notional. There is no claim that the device could actually be built, nor even that it is consistent with the physics of the real world. The AB machine is merely an easy-to-present illustration of the key idea that appropriate mechanisms are able to solve problems unsolvable by Turing machine.
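The division of labour between A and B can be sketched in a few lines. The sketch is only about the read-out step: B measures to n places and reports the nth digit. The stored magnitude used here is a made-up finite rational, labelled as a placeholder; it is not τ and does not encode any halting data, since no program could supply τ. The names `measure`, `nth_digit` and `PLACEHOLDER_CHARGE` are hypothetical.

```python
from fractions import Fraction
from math import floor

def measure(stored: Fraction, figures: int) -> Fraction:
    """Mechanism B: the stored magnitude truncated to `figures` decimal places."""
    return Fraction(floor(stored * 10**figures), 10**figures)

def nth_digit(stored: Fraction, n: int) -> int:
    """Measure to n places and output the nth digit after the point."""
    return floor(measure(stored, n) * 10**n) % 10

# Placeholder for the 'charge' in A: an arbitrary digit string 0.1011001.
# These digits are invented for the example and are NOT halting information.
PLACEHOLDER_CHARGE = Fraction(1011001, 10**7)

digits = [nth_digit(PLACEHOLDER_CHARGE, n) for n in range(1, 8)]
```

All of the extra power in the notional arrangement comes from what is stored in A together with the unlimited precision of B; the digit extraction itself is ordinary arithmetic.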
(In his discussion of hypercomputation in this volume, Hodges reinterprets statements in Copeland and Proudfoot [1999] concerning the potential importance of hypercomputation as claiming that this simple model of hypercomputation represents ‘a potential technological revolution as great as that of the digital computer’. Hodges has somehow confused the general theory with this pocket-sized illustration of it.)
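Returning to the first model in this section, the architecture of a partially random machine (a deterministic program fed by an external source of digits) can be sketched as follows. The sketch assumes that the machine prints 2 × r_n for each incoming digit r_n, which generalises the single term given in the text. It uses the operating system's entropy source through Python's `secrets` module purely as a stand-in; nothing here shows that this source is random in the sense the argument requires. The function names are hypothetical.

```python
import secrets
from itertools import islice

def random_source():
    """Stand-in for the external source r_1, r_2, ...: one bit per request."""
    while True:
        yield secrets.randbits(1)

def partially_random_machine(source):
    """The determined part of the machine: print 2 * r for each digit r received."""
    for r in source:
        yield 2 * r

# First sixteen symbols printed by the machine on one run.
first_outputs = list(islice(partially_random_machine(random_source()), 16))

# With a fixed input sequence the machine's behaviour is fully determined.
replayed = list(partially_random_machine(iter([1, 0, 1])))
```

The program part is trivially Turing-computable; whatever lies beyond the universal Turing machine is contributed by the source.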
5. Turing’s O-machines: a General Framework for Hypercomputation

Turing introduced his abstract o-machines in his PhD thesis (Princeton, 1938; subsequently published as Turing [1939], this is a classic of recursive function theory). An o-machine is a Turing machine augmented with an ‘oracle’, a fundamental process that is able to solve some problem not solvable by the universal Turing machine. (For example, the problem solved by the oracle may be the halting problem; the A-B machine described in the preceding section is in effect an oracle.) The concept of oracular computation can profitably be employed in theorizing about hardware, about brains, and about minds. Turing introduced oracle machines with the following words:

Let us suppose that we are supplied with some unspecified means of solving number-theoretic problems; a kind of oracle as it were. We shall not go any further into the nature of this oracle apart from saying that it cannot be a machine. With the help of the oracle we could form a new kind of machine (call them o-machines), having as one of its fundamental processes that of solving a given number-theoretic problem. [1939, p. 156]
By the statement that an oracle ‘cannot be a machine’, Turing perhaps meant that an oracle cannot be a machine of the kind so far considered in his discussion, viz. a machine that calculates effectively—a Turing machine. His statement is then nothing more than a reiteration of what he himself had shown in his [1936]. (This interpretation is to an extent supported by the fact that in the preceding pages Turing several times uses simply ‘machine’ where he clearly intends ‘computing machine’.) Or perhaps his view might have been that o-machines are machines one of whose fundamental processes is implemented by a component that is not in turn a machine. Not every component of a machine need be a machine. One might even take a more extreme view and say that the atomic components of machines—e.g. squares of paper—are never machines. (Modern writers who believe that there can be nonmechanical physical action might well choose to speak of machines with components that, while physical, are not themselves machines.)
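The structure of an o-machine, an ordinary step-by-step procedure with one additional fundamental process, can be sketched by passing the oracle in as a parameter. The oracle used below is a deliberately decidable placeholder (membership in the set of perfect squares) so that the code can run; a genuine oracle in Turing's sense would answer a question that no program can. The names `Oracle`, `count_members_below` and `stand_in_oracle` are hypothetical.

```python
from math import isqrt
from typing import Callable

# An oracle answers yes/no questions about natural numbers in a single step.
Oracle = Callable[[int], bool]

def count_members_below(bound: int, oracle: Oracle) -> int:
    """An effective procedure that consults the oracle as a primitive.

    The procedure never inspects how the oracle works; it only uses its answers.
    """
    count = 0
    for n in range(bound):
        if oracle(n):  # one 'fundamental process' of the o-machine
            count += 1
    return count

def stand_in_oracle(n: int) -> bool:
    # Placeholder only: a decidable set, chosen so the sketch is runnable.
    return isqrt(n) ** 2 == n

squares_below_100 = count_members_below(100, stand_in_oracle)
```

Replacing `stand_in_oracle` with a different black box changes what the whole machine can do without changing a line of the surrounding procedure, which is the sense in which the oracle is one component among others.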
Hodges, writing in defence of his earlier assertion that (in Turing [1936]) ‘Alan had [...] discovered something almost [...] miraculous, the idea of a universal machine that could take over the work of any machine’ [1992, p. 109], suggests that o-machines, which prima facie are a counterexample to his assertion, are according to Turing not really machines [2003, pp. 50–51]. (Hodges’ 1992 assertion amounts to the claim that the maximality thesis is implied by the discoveries set out by Turing in his 1936 paper.) Quoting Turing’s statement that an oracle ‘cannot be a machine’, Hodges concludes that o-machines are ‘only partly mechanical’ [2003, p. 50]. However, Hodges’ inference is logically on a par with: ‘Ink is not a machine, therefore a Turing machine is only partly mechanical’; machines can have parts that are not themselves machines. Hodges’ interpretation of Turing’s discussion in [1939] tends to disregard Turing’s statement that with ‘the help of the oracle we could form a new kind of machine’ (my italics). Turing said that oracle machines are machines; and there seems no reason not to take him at his word. Hodges does not address the fact that Turing repeatedly refers to o-machines as machines. For example [1939, p. 156]: Given any one of these machines [...] If the machine is in the internal configuration [...] These machines may be described by tables of the same kind as those used for the description of a-machines [...]
(a-machines are what we now call Turing machines; Turing [1936, p. 60].) Turing’s description of o-machines is entirely abstract; he introduced them in order to exhibit an example of a certain type of mathematical problem (the section in which o-machines are introduced is entitled ‘A type of problem which is not number-theoretic’). In [1936] he exhibited a problem which cannot be solved by effective means, the problem of determining, given any Turing machine, whether or not it prints infinitely many binary digits. All problems equivalent to this one he termed ‘number-theoretic’ (noting that he was using the term ‘number-theoretic’ in a ‘rather restricted sense’) [1939, p. 152]. The o-machine concept enabled him to describe a new type of problem, not solvable by a uniform process even with the help of a number-theoretic oracle. The class of machines whose oracles solve number-theoretic problems is, he showed, subject to the same
diagonal argument that he used in [1936]. The problem of determining, given any such machine, whether or not it prints infinitely many binary digits is not one that can be solved by any machine in the class and, therefore, is not number-theoretic [1939, p. 157]. Turing appealed to o-machines in discussing the question of the completeness of the ‘ordinal logics’ that he described [1939, pp. 179–80]. Via a suitable class of oracle machines and the diagonal argument, a logic itself forms the basis for the construction of a problem with which it cannot deal. Turing, as he said, did ‘not go any further into the nature’ of an oracle. There are by now in the literature numerous ways of filling out the idea of an oracle, some more physical in flavour than others. To mention only a few examples, an oracle is in principle realisable by certain classical electrodynamical systems (Scarpellini [1963], [2003]); the physical process of equilibration (Doyle [2002]); an automaton travelling through relativistic spacetime (Pitowsky [1990], Hogarth [1992], [1994]); an ‘accelerating’ Turing machine (Copeland [1998b], [2002a]); an asynchronous network of Turing machines (Copeland and Sylvan [1999], Copeland [2002b]); a quantum mechanical computer (Calude and Pavlov [2002], Kieu [2002], Komar [1964], Stannett [1990], [2003]); an inter-neural connection (Siegelmann and Sontag [1994], Siegelmann [2003]); and a temporally evolving sequence of Turing machines, representing for example a learning mind (Copeland [2002b]). In an article in Scientific American Proudfoot and I suggested that Turing was an important forerunner of the modern debate concerning the possibility of uncomputability in physics and uncomputability in the action of the human mind (Copeland and Proudfoot [1999]). Turing more than anyone else is to be thanked for uniting mechanism (and especially mechanism about the mind) with modern mathematics.
He enriched mechanism with an abstract theory of (information-processing) machines, presenting us with an indefinitely ascending hierarchy of possible machines, of which the Turing machines form the lowest level. His work posed a new question: if the mind is a machine, where in the hierarchy does it lie? In sections 8 and 9 I attempt to cast new light on Turing’s own view of the mind.
6. Hodges on Church on Turing

In his 1937 review of Turing [1936], Church said:

The author [Turing] proposes as a criterion that an infinite sequence of digits 0 and 1 be ‘computable’ that it shall be possible to devise a computing machine, occupying a finite space and with working parts of finite size, which will write down the sequence to any desired number of terms if allowed to run for a sufficiently long time. As a matter of convenience, certain further restrictions are imposed on the character of the machine, but these are of such a nature as obviously to cause no loss of generality—in particular, a human calculator, provided with pencil and paper and explicit instructions, can be regarded as a kind of Turing machine. It is thus immediately clear that computability, so defined, can be identified with (especially, is no less general than) the notion of effectiveness as it appears in certain mathematical problems [...] [1937a, pp. 42–43]
Hodges claims that the characterisation of computability given in this passage ‘excluded’ the possibility of ‘machines that could exceed the power of Turing machines’ [2003, p. 49]. The passage, Hodges says, shows that Church ‘equat[ed] the scope of computability with the scope of machines’ [2002, p. 2]. (In other words, according to Hodges Church equated the scope of machines with the scope of computability in Turing’s 1936 sense.) Hodges also claims that his own earlier statement, ‘Alan had [...] discovered something almost [...] miraculous, the idea of a universal machine that could take over the work of any machine’ [1992, p. 109], ‘reflects’ the description of Turing’s work given by Church in this quotation [2003, p. 49]. There is no textual evidence whatsoever that Church was saying what Hodges claims he was saying. Church’s meaning is perfectly clear, and it is difficult to see how Hodges can have been so misled. Had Church said simply ‘devise a machine, occupying a finite space and with working parts of finite size’, rather than ‘devise a computing machine [...]’, then Hodges’ interpretation might perhaps have been more plausible. But Church (who was a careful man) did not say simply ‘machine’, he said ‘computing machine’. (In the usage of his day, a computing machine was a machine that works in accordance with a systematic method—a machine that takes over the work of
the human computer.) Far from equating ‘the scope of computability with the scope of machines’, Church (loosely following Turing) is explaining Turing’s concept of a computable sequence in terms of the action of a computing machine. (Turing: ‘A sequence is said to be computable if it can be computed by a circle-free machine’ [1936, p. 61].) Hodges goes on to argue as follows: since Church presented Turing’s position as excluding the possibility of ‘machines that could exceed the power of Turing machines’, it may reasonably be inferred that Turing himself intended to exclude this possibility, for the two men were in close contact at Princeton during the period 1936–1938, and Church surely would not have ‘made this statement lightly, in ignorance or defiance of Turing’s views’ [2003, p. 49]. In his ‘Did Church and Turing have a thesis about machines?’ Hodges attempts to add further weight to his argument, pointing out that Turing recorded no objection to the summary of his views given by Church in the above passage, whereas Turing surely would have objected had he regarded the summary as misrepresenting his ideas. One reason why this argument has no persuasive force is that Church’s review does misrepresent Turing’s position in some significant ways. In fact Hodges himself lists some of the respects in which it does so, saying: ‘As a summary of [Turing 1936], Church’s review was notably incorrect.’ The moral, which Hodges fails to draw, is that the absence of an objection from Turing cannot be presumed to indicate that Turing thought the review a correct summary of his position. This weakness aside, the crucial stumbling block for Hodges’ argument is that, as previously mentioned, the quotation has no tendency to show that Church ‘equat[ed] the scope of computability with the scope of machines’.
Hodges interprets Church’s clear remarks in an idiosyncratic way, and then proceeds to infer, on the ground that Turing raised no objection to the remarks, that Turing probably believed the remarks as interpreted by Hodges. But in fact Church was merely explaining the idea of a computable sequence in terms of the action of a computing machine: there is little mystery about why Turing voiced no objection to that! Hodges quotes from Church’s 1937 review of Post [1936] in an effort to substantiate his interpretation of Church (and so, via the argument just discussed, his interpretation of Turing). Church says: ‘To define effectiveness as computability by an arbitrary machine,
subject to restrictions of finiteness, would seem to be an adequate representation of the ordinary notion’ [1937b, p. 43]. Hodges seizes on the phrase ‘arbitrary machine’, thinking that because Church ‘based his observations on the concept of “an arbitrary machine”’, the quoted observation rules out the possibility of ‘machines that could exceed the power of Turing machines’ (see also Hodges [2003, p. 50]). Yet to define effectiveness as computability by an arbitrary machine is hardly to imply that the scope of machines is to be equated with the scope of computability, any more than defining, say, ‘twelveness’ as ‘provability by an arbitrary machine in no more than 12 lines’ implies that the scope of machines is restricted to producing proofs containing 12 lines or less. To define X as Y-ability by an arbitrary machine does not imply that Y-ability is the maximum that machines can achieve (nor that every machine is able to Y). Church’s use of the words ‘arbitrary machine’ therefore lends no credibility to Hodges’ interpretation of Church. In summary, Church’s remarks in these two reviews are consistent both with the thesis that the scope of machines is to be equated with the scope of computability and with the negation of this thesis. Church was simply silent about the thesis. Moreover, this thesis has nothing whatsoever to do with Church’s concern in the passages that Hodges quotes, namely explicating ‘the notion of effectiveness as it appears in certain mathematical problems’. It is curious that Hodges should expect this thesis to appear in Church’s explication of effectiveness. (Signs that something is amiss occur elsewhere in Hodges’ discussion of effectiveness. In ‘Did Church and Turing have a thesis about machines?’ he summarises my view as being that ‘Turing’s “oracle machine” is to be regarded as an example of effective computation’, adding that ‘embodying the oracle physically’ suffices for effectiveness.
Whereas it should be clear that the issue here is the possibility of non-effective physical action and the possibility of machines whose action transcends effective computation.)
7. Hodges on Turing on Discrete State Machines

In an analysis of Turing’s [1950a] Hodges says:

Not quite made explicit, but implicit in every statement, is that the operation of a discrete state machine is computable. [1997, p. 35]
Let us examine carefully Turing’s argument in [1950a] concerning discrete state machines. After giving an example of a simple discrete state machine with a total of three states or configurations, Turing describes the behaviour of the machine by means of a finite table of the sort that would now be called a look-up table, and then says:

This example is typical of discrete state machines. They can be described by such tables provided they have only a finite number of possible states [...] Given the table corresponding to a discrete state machine it is possible to predict what it will do. There is no reason why this calculation should not be carried out by means of a digital computer. Provided it could be carried out sufficiently quickly the digital computer could mimic the behaviour of any discrete state machine. [1950a, pp. 447–8] (my italics)
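The idea of predicting a finite discrete state machine from its table can be sketched in a few lines of Python. The table below is a hypothetical three-state machine invented for illustration; it is not offered as a transcription of the example in Turing [1950a], and the names `TRANSITIONS`, `OUTPUTS` and `mimic` are assumptions of the sketch.

```python
from typing import Dict, List, Tuple

# Hypothetical look-up table: (current state, input signal) -> next state.
TRANSITIONS: Dict[Tuple[str, int], str] = {
    ("q1", 0): "q2", ("q2", 0): "q3", ("q3", 0): "q1",
    ("q1", 1): "q1", ("q2", 1): "q2", ("q3", 1): "q3",
}

# Hypothetical output table: state -> output signal.
OUTPUTS: Dict[str, int] = {"q1": 0, "q2": 0, "q3": 1}


def mimic(start: str, inputs: List[int]) -> List[int]:
    """Predict the machine's outputs by repeated table look-up."""
    state = start
    trace = []
    for signal in inputs:
        state = TRANSITIONS[(state, signal)]  # one look-up per step
        trace.append(OUTPUTS[state])
    return trace


print(mimic("q1", [0, 0, 1, 0]))
```

The prediction works only because the table is finite and can be handed to the computer in full, which is the proviso that matters for the discussion that follows.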
Turing’s point appears to be that any discrete state machine whose total number of configurations is finite can be mimicked by a digital computer (and therefore by a Turing machine) since the computer can be given a finite look-up table setting out the behaviour of the machine. The textual evidence does not support Hodges’ interpretation. There is no indication in the text that Turing’s implicit position is that the action of each and every discrete state machine is computable. It is in any case unlikely that Turing thought that the behaviour of every discrete state machine with an unlimited number of configurations is computable. Some partially random discrete state machines exhibit uncomputable behaviour. Immediately before the discussion of discrete state machines to which Hodges is referring, Turing mentions an ‘interesting variant on the idea of a digital computer’, a ‘digital computer with a random element’ [1950a, p. 445]; and he also makes clear on the same page that he is considering digital computers ‘with an unlimited store’ (and so with an unlimited number of configurations). As Turing explains [1948, p. 416], a discrete-state machine containing a random element can be set up so as to choose between two paths of action by calling to the random element for a number and following one path if, say, the number is even and the other if it is odd. Provided that the number of possible configurations of the partially random discrete-state machine is unlimited, a Turing machine cannot calculate its behaviour (Section 4).
8. The Physics of the Mind

Turing often mentions the idea of partial randomness. For example, in a paper on machine intelligence he said, ‘one feature that I would like to suggest should be incorporated in the machines [...] is a “random element”’ [c. 1951, p. 475]. He continues: ‘This would result in the behaviour of the machine not being by any means completely determined by the experiences to which it was subjected’ (ibid.). Much interested in the issue of free will, Turing seems to have believed that the mind is a partially random machine. We have the word of one of Turing’s closest associates, mathematician and computer pioneer Max Newman, that Turing ‘had a deep-seated conviction that the real brain has a “roulette wheel” somewhere in it’.5 So far as is known, Turing’s only surviving discussion of these matters occurs in the typescript of a lecture that he gave in 1951 on BBC radio, entitled ‘Can Digital Computers Think?’ (Turing [1951]). In the course of his discussion Turing considers the claim that if ‘some particular machine can be described as a brain we have only to programme our digital computer to imitate it and it will also be a brain’ [1951, p. 483]. He remarks that this ‘can quite reasonably be challenged’, pointing out that there is a difficulty if the behaviour of the machine is not ‘predictable by calculation’, and he draws attention to Eddington’s view that ‘no such prediction is even theoretically possible’ on account of ‘the indeterminacy principle in quantum mechanics’ (ibid.). Turing’s overarching aim in the 1951 lecture is to answer the question posed by his title, and his strategy is to argue for the proposition that ‘[i]f any machine can appropriately be described as a brain, then any digital computer can be so described’ [1951, p. 482].
This proposition is consistent, he explains, with the possibility that the brain is the seat of free will:

To behave like a brain seems to involve free will, but the behaviour of a digital computer, when it has been programmed, is completely determined. [...] [I]t is certain that a machine which is to imitate a brain must appear to behave as if it had free will, and it may well be asked how this is to be achieved. One possibility is to make its behaviour depend on something like a roulette wheel or a supply of radium. [...] It is, however, not really even necessary to do this. It is not difficult to design machines whose behaviour appears quite random to anyone who does not know the details of their construction. [1951, pp. 484–5]

5 Newman in interview with Christopher Evans (‘The Pioneers of Computing: An Oral History of Computing’, London: Science Museum).
Turing calls machines of the latter sort ‘apparently partially random’; an example is a Turing machine in which ‘the digits of the number π [are] used to determine the choices’ [1948, p. 416]. Apparently partially random machines imitate partially random machines. As is well-known, Turing advocates imitation as the basis of a test, now known simply as the ‘Turing test’, that ‘[y]ou might call [...] a test to see whether the machine thinks’ [1952, p. 495]. If the brain is a partially random machine, an appropriately programmed digital computer may nevertheless give a convincing imitation of a brain. The appearance that this deterministic machine gives of possessing free will is ‘mere sham’; but free will aside, it is, Turing asserts, ‘not altogether unreasonable’ to describe a machine that ‘imitate[s] a brain’ as itself being a brain [1951, pp. 484, 482]. ‘Can Digital Computers Think?’ remained virtually unknown until 1999, when I included it in a collection of important unpublished work by Turing (Copeland [1999]). In an accompanying analysis of the text I highlighted Turing’s reference to Eddington’s view and emphasised that Turing was noting the possibility that an aspect of the physics of the brain might be uncomputable [1999, pp. 451–2]. I also pointed out that Roger Penrose was evidently mistaken when he attributed to Turing the view that ‘the computational capacities of any physical device must (in idealisation) be equivalent to the action of a Turing machine’ [1994, p. 21]. (Penrose even went so far as to dub the latter ‘Turing’s thesis’ (ibid.).) Referring to my 1999 analysis, Hodges acknowledges that Turing was contemplating the possibility of ‘something physical that may not be reducible to computable action’ [2003, p. 53]. (In his public lecture at the Lausanne Turing Day in June 2002 Hodges thanked me for drawing his attention to this point, citing my analysis.) 
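To return briefly to the ‘apparently partially random’ machines mentioned above: the following Python sketch shows one way a deterministic program might use the digits of π to settle a choice between two paths, following the even/odd convention described in Section 7. It is an illustration under stated assumptions, not Turing’s construction. The digit generator uses Gibbons’ unbounded spigot algorithm, a technique chosen here for convenience, and the names `pi_digits` and `choose_paths` are hypothetical.

```python
from itertools import islice
from typing import Iterator, List


def pi_digits() -> Iterator[int]:
    """Yield the decimal digits of pi (3, 1, 4, ...) one at a time.

    Uses Gibbons' unbounded spigot algorithm with exact integer arithmetic.
    """
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            yield n  # the next digit is now certain
            q, r, n = 10 * q, 10 * (r - n * t), (10 * (3 * q + r)) // t - 10 * n
        else:
            q, r, t, k, n, l = (
                q * k,
                (2 * q + r) * l,
                t * l,
                k + 1,
                (q * (7 * k + 2) + r * l) // (t * l),
                l + 2,
            )


def choose_paths(steps: int) -> List[str]:
    """Pick path 'A' on an even digit and path 'B' on an odd digit."""
    return ["A" if digit % 2 == 0 else "B" for digit in islice(pi_digits(), steps)]


print(choose_paths(8))
```

To an observer who does not know how the choices are generated the sequence of paths may look patternless, yet every step is fixed in advance, so a machine of this kind remains an ordinary Turing machine.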
In ‘Did Church and Turing have a thesis about machines?’, Hodges reiterates that in the 1951 lecture Turing noted the possibility of ‘something about physics that might be uncomputable’ and says ‘we can see Turing as helping to open the whole question of computability and
physics as it has slowly developed over the last 50 years’. I am glad that Hodges has come to agree with me on this important matter. In light of his change of mind, Hodges might be expected to revisit aspects of his earlier interpretation of Turing, for there are undoubted tensions. There is tension, for example, with his claim (discussed above) that in the 1950 paper Turing was implicitly maintaining that the action of any ‘discrete state machine is computable’ [1997, p. 35]; and there is tension also with Hodges’ suggestion (quoted above) that the maximality thesis is implied by the discoveries set out by Turing in his 1936 paper. Hodges’ position in ‘Did Church and Turing have a thesis about machines?’ appears to be that while in the years immediately following 1936 Turing may well have considered his work to rule out the possibility of machines whose action is uncomputable, we find a different view emerging ‘if we look at a different time-scale’. In other words, Hodges relieves the tension between his earlier and later interpretations by suggesting that Turing had changed his mind by 1951. As we shall see, this is not the only place where Hodges’ interpretational ideas require him to say that Turing had a change of mind. Hodges’ ad hoc suggestion that Turing altered his view is quite unnecessary, since there is no evidence that Turing at any time understood his work on computability to rule out the possibility of machines whose action is uncomputable. There is also obvious tension between the converted Hodges’ view of the relevant passage of ‘Can Digital Computers Think?’ and his assertion that Turing’s 1950 paper ‘summarizes Turing’s post-1945 claim that the action of the brain must be computable, and therefore can be simulated on a computer’ [2003, p. 51]. Hodges is aware of this tension, but contents himself with noting briefly that the passage from ‘Can Digital Computers Think?’ ‘runs against what Turing had said’ in the previous year [2003, p. 
53]—as though it were Turing himself who was guilty of inconsistency. Yet it is not Turing’s actual words that create the difficulty, but Hodges’ interpretation, whereby Turing is supposed to have claimed that ‘the action of the brain must be computable’. The extent to which this interpretation is supported by the historical record is the topic of the next section.
9. The Pre- and Post-War Turing on the Mind

Concerning Turing’s [1939] Hodges writes:

the evidence is that at this time [Turing] was open to the idea that in moments of ‘intuition’ the mind appears to do something outside the scope of the Turing machine [1997, p. 22]
but that: in the course of the war Turing dismissed the role for uncomputability in the description of mind, which once he had cautiously explored with the ordinal logics [1997, p. 51]
and: by 1945 Turing had come to believe computable operations had sufficient scope to include intelligent behaviour, and had firmly rejected the direction he had followed in studying ordinal logics. [1997, p. 30]
What changed Turing’s mind, Hodges suggests, was his experience at Bletchley Park: My guess is that there was a turning point in about 1941. After a bitter struggle to break U-boat Enigma, Turing could then taste triumph. Machines turned and people carried out mechanical methods unthinkingly, with amazing and unforeseen results. [...] [I] suggest that it was at this period that [Turing] abandoned the idea that moments of intuition corresponded to uncomputable operations. Instead, he decided, the scope of the computable encompassed [...] quite enough to include all that human brains did, however creative or original. [1997, pp. 28–29]
The mathematician Peter Hilton, Turing’s friend and colleague and a leading codebreaker at Bletchley Park, comments as follows (in a letter) on these passages by Hodges: I must say that, if Alan Turing’s thinking was undergoing so dramatic a change at that time, he concealed the fact very effectively.6
(Hilton adds: ‘I would never have said that we, working on Naval Enigma, “carried out mechanical methods unthinkingly”, nor that our results were “amazing and unforeseen”’.)

6 Letter from Hilton to Copeland (16 May 2003).
There is no textual evidence for the supposed sea-change in Turing’s thinking about the mind. What Turing said in [1939] is perfectly consistent with his post-war views. Indeed, Turing’s later work on the mind, far from representing a rejection of his earlier ideas, appears to be a development of them. Turing’s wartime letters to Newman provide a useful summary of Turing’s view at that time of the role of intuition in mathematics. An extract from one of these letters is printed in the appendix to this chapter. In the letter, Turing says that different Turing machines allow ‘different sets of proofs’ and ‘by choosing a suitable machine one can approximate “truth” by “provability” better than with a less suitable machine, and can in a sense approximate it as well as you please’. He points out that if one selects a ‘proof finding machine’ for proving a particular theorem, intuition is required in making the selection, just as if one constructed the proof for oneself. A human mathematician working according to the rules of a fixed logical system is in effect a proof-finding machine. When intuition supplies the mathematician with some new means of proof, he or she becomes a different proof-finding machine, capable of a larger set of proofs. If there is in principle a limit to the ability of human mathematicians to become transformed into successively more powerful proof-finding machines, no such limit has so far been discovered. As Turing said in 1936: Let δ be a sequence whose n-th figure is 1 or 0 according as n is or is not satisfactory. It is an immediate consequence of the theorem of §8 that δ is not computable. It is (so far as we know at present) possible that any assigned number of figures of δ can be calculated, but not by a uniform process. When sufficiently many figures of δ have been calculated, an essentially new method is necessary in order to obtain more figures. [1936, pp. 78–9]
(n is satisfactory if it is the description number of a circle-free Turing machine, i.e. a Turing machine that prints an infinite number of binary digits, and is unsatisfactory otherwise.) Neither in the wartime letters to Newman nor in [1939] did Turing attempt to explain the ‘activity of the intuition’ [1939, p. 192]. How does the mathematician manage the uncomputable transformation from one proof-finding machine to another? In 1939 Turing was content to leave this question to one side, making no ‘attempt to
explain this idea of “intuition” any more explicitly’ [1939, p. 215]. In his post-war work, on the other hand, Turing had a lot to say that is relevant to this question. In his post-war writing on mind and intelligence (Turing [1947], [1948], [1950a], [1951], [c. 1951], [1952], and [1953]) the term ‘intuition’ drops from view and what comes to the fore is the closely related idea of learning (in the sense of devising or discovering) new methods of proof. When a human mathematician is confronted by a problem that he or she is unable to solve, the mathematician ‘would search around and find new methods of proof’ [1947, pp. 393–4]. Turing argued forcefully that machines can do this too. The limits of a proof-finding Turing machine are determined by its table of instructions. The mechanism for acquiring new methods of proof lying beyond this limit must therefore involve the modification of the table of instructions. Turing said in 1947: What we want is a machine that can learn from experience. The possibility of letting the machine alter its own instructions provides the mechanism for this. [1947, p. 393]
And: One can imagine that after the machine had been operating for some time, the instructions would have altered out of all recognition. (ibid.)
(Ted Newman, one of the engineers at the National Physical Laboratory who built the ACE, remarked that Turing’s ‘particular purpose was to permit the writing of programs that modify programs, not in the simple way now common but rather in the way that people think’ [1994, p. 12].7) Modifying the table of instructions in effect transforms the learning machine into a different Turing machine. So a machine with the ability to learn is able to traverse the space of proof-finding Turing machines. The learning machine successively mutates from one proof-finding Turing machine into another, becoming capable of wider sets of proofs as it searches for and acquires new, more powerful methods of proof. Turing’s discussions of the nature of learning emphasize two points: the importance of the learner’s making and correcting mistakes, and the advantages of the involvement of a ‘random element’ in the learning process (see Turing [1948], [1950a], [1951], [c. 1951]). The idea appears to be that the partially random learning machine emulates the ‘activity of the intuition’ in its walk through the space of proof-finding Turing machines. The trajectory of the learning machine through this space might indeed be uncomputable, in the precise sense that the function on the non-negative integers whose value at i is the i-th Turing machine on the trajectory need not be computable by the universal Turing machine.

7 I am grateful to Teresa Numerico for drawing this article to my attention.
Appendix: Extract from a Letter from Turing to Newman8

I think you take a much more radically Hilbertian attitude about mathematics than I do. You say ‘If all this whole formal outfit is not about finding proofs which can be checked on a machine it’s difficult to know what it is about’. When you say ‘on a machine’ do you have in mind that there is (or should be or could be, but has not been actually described anywhere) some fixed machine on which proofs are to be checked, and that the formal outfit is, as it were about this machine. If you take this attitude (and it is this one that seems to me so extreme Hilbertian) there is little more to be said: we simply have to get used to the technique of this machine and resign ourselves to the fact that there are some problems to which we can never get the answer. On these lines my ordinal logics would make no sense. However I don’t think you really hold quite this attitude because you admit that in the case of the Gödel example one can decide that the formula is true i.e. you admit that there is a fairly definite idea of a true formula which is quite different from the idea of a provable one. Throughout my paper on ordinal logics I have been assuming this too.9 [...] If you think of various machines I don’t see your difficulty. One imagines different machines allowing different sets of proofs, and by choosing a suitable machine one can approximate ‘truth’ by ‘provability’ better than with a less suitable machine, and can in a sense approximate it as well as you please. The choice of a proof checking machine involves intuition, which is interchangeable with the intuition required for finding an Ω if one has an ordinal logic Λ, or as a third alternative one may go straight for the proof and this again requires intuition: or one may go for a proof finding machine. I am rather puzzled why you draw this distinction between proof finders and proof checkers. It seems to me rather unimportant as one can always get a proof finder from a proof checker, and the converse is almost true: the converse fails if for instance one allows the proof finder to go through a proof in the ordinary way, and then, rejecting the steps, to write down the final formula as a ‘proof’ of itself. One can easily think up suitable restrictions on the idea of proof which will make this converse true and which agree well with our ideas of what a proof should be like.10

8 A transcript of the complete letter is in The Essential Turing.

9 Editor’s note. Turing is referring to his [1939].
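Turing’s remark in the letter that ‘one can always get a proof finder from a proof checker’ can be illustrated by exhaustive search: enumerate candidate proofs in order of length and return the first that the checker accepts. The Python sketch below assumes a deliberately toy formal system invented for the purpose (a single axiom, the number 1, and two hypothetical rules named "d" and "i"); none of it is drawn from Turing’s own work, and the names `RULES`, `check_proof` and `find_proof` are assumptions of the sketch.

```python
from itertools import count, product
from typing import Optional

# Hypothetical inference rules of a toy system whose 'theorems' are integers.
RULES = {
    "d": lambda x: 2 * x,  # doubling rule
    "i": lambda x: x + 3,  # increment-by-three rule
}


def check_proof(proof: str, theorem: int) -> bool:
    """Proof checker: apply the named rules in order, starting from the axiom 1."""
    value = 1
    for step in proof:
        if step not in RULES:
            return False
        value = RULES[step](value)
    return value == theorem


def find_proof(theorem: int, max_length: Optional[int] = None) -> Optional[str]:
    """Proof finder built from the checker by brute-force enumeration.

    With max_length=None the search is unbounded and does not return
    if the theorem has no proof.
    """
    lengths = count(0) if max_length is None else range(max_length + 1)
    for length in lengths:
        for candidate in product("di", repeat=length):
            proof = "".join(candidate)
            if check_proof(proof, theorem):
                return proof
    return None


print(find_proof(11))
print(find_proof(3, max_length=4))
```

The finder adds no ingenuity to the checker; it simply tries everything. The price is that, without a bound on length, the search never reports failure when no proof exists.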
¹⁰ Research on which this article draws was supported in part by Marsden Grant no. UOC905.

References

Bechtel, W. and Richardson, R.C. [1993], Discovering Complexity: Decomposition and Localization as Strategies in Scientific Research, Princeton: Princeton University Press.
Boolos, G.S. and Jeffrey, R.C. [1980], Computability and Logic (2nd edition), Cambridge: Cambridge University Press.
Calude, C.S. and Pavlov, B. [2002], “Coins, Quantum Measurements, and Turing’s Barrier”, Quantum Information Processing 1, 107–127.
Church, A. [1936], “A Note on the Entscheidungsproblem”, Journal of Symbolic Logic 1, 40–41.
Church, A. [1937a], Review of Turing [1936], Journal of Symbolic Logic 2, 42–43.
Church, A. [1937b], Review of Post [1936], Journal of Symbolic Logic 2, 43.
Church, A. [1940], “On the Concept of a Random Sequence”, American Mathematical Society Bulletin 46, 130–135.
Copeland, B.J. [1997], “The Church–Turing Thesis”, in Stanford Encyclopedia of Philosophy, (E. Zalta ed.).
Copeland, B.J. [1998a], “Super Turing-Machines”, Complexity 4, 30–32.
Copeland, B.J. [1998b], “Even Turing Machines Can Compute Uncomputable Functions”, in Unconventional Models of Computation, (C. Calude, J. Casti, and M. Dinneen eds.), London: Springer-Verlag.
Copeland, B.J. (ed.) [1999], “The Turing–Wilkinson Lecture Series on the Automatic Computing Engine”, and “A Lecture and Two Radio Broadcasts on Machine Intelligence by Alan Turing”, in Machine Intelligence 15, (K. Furukawa, D. Michie, and S. Muggleton eds.), Oxford: Oxford University Press.
Copeland, B.J. [2000], “Narrow Versus Wide Mechanism”, Journal of Philosophy 96, 5–32.
Copeland, B.J. [2002a], “Accelerating Turing Machines”, Minds and Machines 12, 281–301.
Copeland, B.J. [2002b], “Hypercomputation”, Minds and Machines 12, 461–502.
Copeland, B.J. (ed.) [2004a], The Essential Turing, Oxford: Oxford University Press.
Copeland, B.J. (ed.) [2005a], Alan Turing’s Automatic Computing Engine: The Master Codebreaker’s Struggle to Build the Modern Computer, Oxford: Oxford University Press.
Copeland, B.J. and Proudfoot, D. [1999], “Alan Turing’s Forgotten Ideas in Computer Science”, Scientific American 280, 99–103.
Copeland, B.J. and Sylvan, R. [1999], Australasian Journal of Philosophy 77, 46–66.
Deutsch, D. [1985], “Quantum Theory, the Church–Turing Principle and the Universal Quantum Computer”, Proceedings of the Royal Society, Series A 400, 97–117.
Doyle, J. [2002], “What is Church’s Thesis? An Outline”, Minds and Machines 12, 519–520.
Gandy, R.O. [1980], “Church’s Thesis and Principles for Mechanisms”, in The Kleene Symposium, (J. Barwise, H.J. Keisler, and K. Kunen eds.), Amsterdam: North-Holland.
Gandy, R.O. [1988], “The Confluence of Ideas in 1936”, in The Universal Turing Machine: A Half-Century Survey, (R. Herken ed.), Oxford: Oxford University Press.
Henry, G.C. [1993], The Mechanism and Freedom of Logic, Lanham: University Press of America.
Hilbert, D. and Ackermann, W. [1928], Grundzüge der Theoretischen Logik, Berlin: Springer.
Hodges, A. [1992], Alan Turing: The Enigma, London: Vintage.
Hodges, A. [1997], Turing, London: Phoenix.
Hodges, A. [2002], “What would Alan Turing have done after 1954?”, Lecture at the Turing Day, Lausanne, 2002, Part 2.
Hodges, A. [2003], “What would Alan Turing have done after 1954?”, in Alan Turing: Life and Legacy of a Great Thinker, (C. Teuscher ed.), Berlin: Springer-Verlag.
Hogarth, M.L. [1992], “Does General Relativity Allow an Observer to View an Eternity in a Finite Time?”, Foundations of Physics Letters 5, 173–181.
Hogarth, M.L. [1994], “Non-Turing Computers and Non-Turing Computability”, PSA 1994 1, 126–38.
Kieu, T.D. [2002], “Quantum Hypercomputation”, Minds and Machines 12, 541–561.
Kleene, S.C. [1967], Mathematical Logic, New York: Wiley.
Komar, A. [1964], “Undecidability of Macroscopically Distinguishable States in Quantum Field Theory”, Physical Review, second series, 133B, 542–544.
Mendelson, E. [1964], Introduction to Mathematical Logic, New York: Van Nostrand.
Newell, A. [1980], “Physical Symbol Systems”, Cognitive Science 4, 135–183.
Newman, E. [1994], “Memories of the Pilot Ace”, Resurrection 9, 11–14.
Odifreddi, P. [1989], Classical Recursion Theory, Amsterdam: North-Holland.
Penrose, R. [1994], Shadows of the Mind: A Search for the Missing Science of Consciousness, Oxford: Oxford University Press.
Pitowsky, I. [1990], “The Physical Church Thesis and Physical Computational Complexity”, Iyyun 39, 81–99.
Post, E.L. [1936], “Finite Combinatory Processes – Formulation 1”, Journal of Symbolic Logic 1, 103–105.
Scarpellini, B. [1963], “Zwei Unentscheidbare Probleme der Analysis” [Two undecidable problems of analysis], Zeitschrift für mathematische Logik und Grundlagen der Mathematik 9, 265–289; and in English translation in Minds and Machines 13 (2003), 49–77.
Scarpellini, B. [2003], “Comments on ‘Two Undecidable Problems of Analysis’”, Minds and Machines 13, 79–85.
Siegelmann, H.T. and Sontag, E.D. [1994], “Analog Computation via Neural Networks”, Theoretical Computer Science 131, 331–360.
Siegelmann, H.T. [2003], “Neural and Super-Turing Computing”, Minds and Machines 13, 103–14.
Stannett, M. [1990], “X-Machines and the Halting Problem: Building a Super-Turing Machine”, Formal Aspects of Computing 2, 331–341.
Stannett, M. [2003], “Computation and Hypercomputation”, Minds and Machines 13, 115–53.
Turing, A.M. [1936], “On Computable Numbers, with an Application to the Entscheidungsproblem”, in The Essential Turing, Copeland (ed.) [2004a].
Turing, A.M. [1939], “Systems of Logic Based on Ordinals”, in The Essential Turing.
Turing, A.M. [1945], “Proposed Electronic Calculator”, in Alan Turing’s Automatic Computing Engine, Copeland (ed.) [2005a].
Turing, A.M. [1947], “Lecture on the Automatic Computing Engine”, in The Essential Turing.
Turing, A.M. [1948], “Intelligent Machinery”, in The Essential Turing.
Turing, A.M. [1950a], “Computing Machinery and Intelligence”, in The Essential Turing.
Turing, A.M. [1950b], “Programmers’ Handbook for Manchester Electronic Computer”, University of Manchester Computing Machine Laboratory; a digital facsimile is in The Turing Archive for the History of Computing.
Turing, A.M. [1951], “Can Digital Computers Think?”, in The Essential Turing.
Turing, A.M. [c.1951], “Intelligent Machinery, A Heretical Theory”, in The Essential Turing.
Turing, A.M. [1953], “Chess”, in The Essential Turing.
Turing, A.M., et al. [1952], “Can Automatic Calculating Machines Be Said To Think?”, in The Essential Turing.
Wittgenstein, L. [1980], Remarks on the Philosophy of Psychology, vol. 1, Oxford: Blackwell.
Hartmut Fitz∗
Church’s Thesis and Physical Computation

Introduction

In 1936 Alonzo Church first officially proposed to identify the ‘effectively calculable’ with the ‘λ-definable’ functions. This proposal became known as Church’s Thesis (henceforth CT).¹ Despite CT’s widespread acceptance, even today, some seventy-odd years later, the precise meaning and epistemic status of CT remain opaque. Being a very plastic statement, CT has been construed as a nominal definition, a rational reconstruction, a function-theoretic axiom, a principle for constructivism, a purely mathematical thesis, and even as an empirical hypothesis about minds, machines and the material world. It is this extraordinary range of possible interpretations which is responsible for the abiding fascination with CT. In this paper we want to examine one of the most extreme stretches of the imagination still associated with CT proper, namely the interpretation of CT as a physical principle. More specifically we want to point out what we believe are serious conceptual, methodological and epistemic difficulties in refuting CT by means of physical computation. First, however, some context and a few conventions need to be introduced.

∗ H. Fitz, Institute for Logic, Language & Computation, Nieuwe Doelenstraat 15, 1012 CP Amsterdam, the Netherlands. H.F. would like to thank Reinhard Blutner and Marian Counihan for critical comments on earlier versions of this paper.
¹ See Church [1936]. For historical accounts of CT, see Davis [1982], Gandy [1988] and Soare [1999].

(CTT) A function is effectively computable iff it is Turing machine computable.
We call this the Church–Turing Thesis. CTT is a scholarly construct because Church didn’t phrase ‘his’ thesis in terms of Turing machines (TM) and Turing did not articulate a ‘thesis’ at all. In their landmark papers, both Church [1936] and Turing [1936] explicitly refer to human calculability and computability, respectively. Church was primarily concerned with defining the concept of an effectively calculable function, whereas Turing analyzed the notion of a mechanical procedure, the primitive operations involved in the effective computation of a real number by a human clerk, unaided by machinery.² Unless the term ‘effectively computable’ is further qualified, CTT itself is neither about the limitations of computing machinery, realizable or notional, nor, in particular, about the limitations of idealized physical computers, be they natural or artificial. Therefore, the following mechanistic variant is a non-trivial restriction of CTT:

(MCT) A function is effectively computable by a machine iff it is Turing machine computable.

MCT admits of at least three interpretations with varying degrees of material significance, according to (i) whether the machine conforms to the laws of physics in the actual world, (ii) whether in addition it conforms to resource and other constraints, or (iii) whether the machine model abstracts from all laws and constraints of the actual world (cf. Copeland [1996a]). Gandy [1980] substantiated MCT by identifying four principles or ‘criteria for “being a machine”’. These principles determine the ‘form of description’ of a machine, impose a ‘limitation of hierarchy’ on its structure, and demand ‘unique reassembly’ and ‘local causation’. Hence, they abstract from concrete design and are stipulated to apply to radically diverse devices, ‘mechanical, electrical or merely notional’, as long as they are finitely operating, discrete, and deterministic. Because any machine satisfying the above principles is provably Turing machine equivalent, Gandy’s Mechanistic Thesis can be put as follows:

(GMT) A function is effectively computable by a finitely operating, discrete, deterministic machine iff it is Turing machine computable.

² A thorough analysis of Church and Turing’s relative merits regarding CTT can be found in Sieg [1994], reprinted with revisions in this volume.
Gandy formulated and motivated his four principles by appeal to contemporary physics which, for instance, “rejects the possibility of instantaneous action at a distance” [op. cit., p. 135]. Thus he interpreted MCT as a claim about machines which are nomologically possible in the actual world. Furthermore, his principles are sufficiently precise as to be testable but at the same time sufficiently general to be satisfiable irrespective of particular machine architecture. Whether every finitely operating, discrete, deterministic device satisfies Gandy’s principles, and consequently GMT itself, appears to be a well-defined empirical hypothesis. Machines are physical devices constrained by laws and limitations of the world they are constructed in. It is natural to ask, whether there are machine models with a physically realizable architecture which transcend Turing machine computability. Yet, the procedure of devising models and debating their physical feasibility is certainly reversible. We might also consider “physical systems as computational devices that process information much as computers do” [Mundici and Sieg 1995, p. 28] and attempt to construct a model which accurately describes the behavior of the system under consideration. Accordingly, the notion of a machine is extendable to arbitrary physical devices or systems which need not operate on a data structure according to a finite set of instructions, but could nonetheless be exploited for computational purposes. Both MCT, in its various forms, as well as GMT are therefore mechanistic restrictions of the more general Physical Church–Turing Thesis: (PCT) A function is effectively computable by a physical system iff it is Turing machine computable. Like MCT, PCT is subject to different interpretations depending on the envisaged nomological and ontological status of what is meant by ‘physical systems’. We focus here on compliance with known laws governing the actual world. 
On its most liberal reading, perhaps, PCT posits limitations on the kind of lawlike relations that can obtain between observables, namely Turing machine computable relations.³ It is widely held that PCT, the issue of whether physical systems or processes are Turing machine computable or not, is a genuine empirical question which will be decided by scientific exploration into matters of fact. Philosophical considerations and conceptual analyses are considered negligible or irrelevant. Like fundamental laws of electromagnetism or thermodynamics, the computational capacities of physical systems are subject to straightforward factual investigation. Voicing this widespread belief, Deutsch proclaims that

[c]omputers are physical objects, and computations are physical processes. What computers can or cannot compute is determined by the laws of physics alone and not by pure mathematics. [Deutsch 1997, p. 98]

³ Cf. [Hansson 1985]. Thus understood, PCT imposes a restriction on nature which is comparable to the elimination of the possibility of a perpetuum mobile by the laws of thermodynamics, see [Rosen 1988].
A similar view is expressed by Bennett and Landauer:

A computation, whether it is performed by electronic machinery, on an abacus or in a biological system such as the brain, is a physical process. [...] We are looking for general laws that must govern all information processing, no matter how it is accomplished. Any limits we find must be based solely on fundamental physical principles [...]. [Bennett and Landauer 1985, p. 48]⁴

⁴ See also Lloyd [2000], Etesi and Németi [2002], Hogarth [2004], and many more.

This physicalist doctrine concerning PCT rests on the implicit assumption that at the time when Church, Turing, Gödel, Post, Kleene, etc. pondered the abstract recursion-theoretic limits of computability, the validity or truth of CTT was a logico-mathematical issue, a matter of mathematics, simply because computing machinery didn’t exist in the 1930s. Real computers, however, are physical objects and hence, after the advent of computing machinery, CTT naturally evolved into PCT and became an empirical proposition, a matter of physics. Regarding the epistemic status of PCT such a physicalist stance is simplistic, as a comparison with the Church–Turing Thesis indicates. CTT associates a precise mathematical concept, partial recursiveness, with a vague intuitive notion, effective computability. No amount of justification from within mathematics can corroborate such a statement. Neither equivalence proofs, nor the failure to diagonalize, closure properties, absoluteness, and so forth, can provide the right kind of justification. Such intra-mathematical ‘evidence’ only shows that Turing machine computability is a natural and very robust mathematical concept. Justification for CTT is notoriously difficult to deliver. To this day the available justification reduces to the plausibility of Turing’s analysis of mechanical procedures. Moreover, it has been argued that the modal, constructive, epistemic and intensional flavor of our intuitive understanding of computability must be denied relevance when explicating effective computability if CTT be at all recognizable as adequate or even true.⁵ By parity of reasoning, because PCT aims at making precise the same pre-theoretic notion of computability in the realm of physics, the Physical Church–Turing Thesis cannot entirely be justified internally, from within physics, either. Just as in the case of the ordinary Church–Turing Thesis, a philosophical analysis needs to be accomplished before the project of investigating the validity of CTT in terms of physical computation becomes a meaningful endeavor in the first place.⁶

Notwithstanding, PCT has increasingly come under attack by what has been labelled hypercomputation, the alleged possibility of computational processes (natural, artificial or merely notional) which transcend the limits of Turing machine computability.⁷ In recent years, the field of hypercomputation has flourished like a tropical rain forest and by now it has already become impossible to enumerate all proposed hypercomputational models, let alone conduct an appropriate evaluation of them.⁸ Attempts at classifying such models, which should be applauded, have been made by Ord [2002], Cotogno [2003], and Stannett [2004]. Classification can be achieved along different dimensions, for example in terms of abstract computational strength, required resources such as time and memory, or the differential granularity of time and space. Undoubtedly, however, the most important dimension for categorization is the degree of potential physical feasibility. There are many instances
of purely notional models of hypercomputation, e.g. Zeno machines [Copeland 1998], infinite time Turing machines [Hamkins and Lewis 2000], prop-computability [Baer 1995], Turing-projectability [McCarthy and Shapiro 1987], and so forth. In some way or other these models involve an element of actual infinity which renders them physically infeasible, or else they are ineffective by involving unexplained basic operations, or both. For this reason, many hypercomputational models are not considered to jeopardize PCT. On the other hand, there are hypercomputational models which are directly inspired by natural processes and whose physical feasibility is therefore quite controversial.⁹ Such processes would seem effective because there are underlying causal relations which govern their behavior very much as a state-transition function determines Turing machine configurations. Thus, we must distinguish at least three different claims pertaining to PCT: (a) hypercomputational models are implementable by physical systems, (b) there are super-Turing computers in nature, and (c) we can find out whether there are super-Turing computers in nature. Regrettably, this distinction between modal, ontological, and epistemic aspects of PCT is often ignored in the hypercomputation literature.

In the remainder of this paper we seek to identify and elucidate five challenges which pose major problems to the endeavor, in fact, the very idea, of invalidating the Church–Turing Thesis by means of physical computation. These challenges concern all of the above issues (a)–(c), with an emphasis on epistemic arguments against the falsifiability of PCT, the relevance of which will become manifest in section 5.

⁵ See Fitz [2001] for an extensive discussion. Cf. Shapiro [1981].
⁶ Certainly, whatever we find out about the limitations of nomologically possible physical computers must be embedded in some accepted physical theory like general relativity or quantum mechanics. In this sense PCT evidently is a matter of physics.
⁷ Subsequently, the term ‘hypercomputation’ will be used as shorthand for the claim that PCT, on some appropriate reading, is inadequate or false.
⁸ See for example Copeland [2002] and the entire volume 317 of the journal Theoretical Computer Science, 2004 on super-recursive algorithms and hypercomputation. For critical reviews, refer to Teuscher and Sipper [2002] and Davis [2004].
Specifically, we will argue that:

(i) the notion of implementation is inherently vague,
(ii) non-computability in nature is experimentally intractable,
(iii) PCT is not epistemically on a par with ordinary physics,
(iv) there are methodological obstacles to analog hypercomputation,
(v) physical computation is conceptually underdetermined.

⁹ Examples will be given later in the paper.
We neither hope nor believe that these challenges are insurmountable, and they certainly don’t exempt us from the burden of examining each proposed model of physical hypercomputation individually. Nonetheless we are convinced that these are serious challenges undermining confidence in the possibility of reducing inquiries into the truth of CTT (or CT for that matter) to contemporary physics.
1. Vagueness of Implementation

PCT can be interpreted as positing a relation of function-theoretic equivalence between nomologically possible physical systems and a formal machine model via the abstract notion of computation. In order to determine whether this relation obtains we need to specify exactly what it means for a physical system to implement a computation. The standard view on implementation, tacitly adopted by many researchers in the field, is expressed here by Chalmers:

A physical system implements a given computation when the causal structure of the physical system mirrors the formal structure of the computation. [Chalmers 1994, p. 392]
Generally speaking, the idea is to specify a homomorphism ψ from internal states $p_1, \ldots, p_n$ of a physical system P to internal states $q_1, \ldots, q_m$ of a machine model M defining a computation in such a way that the causal state-transitional structure of P is projected onto the formal structure of state transitions of M.¹⁰ For instance, suppose $(q_i, a) \mapsto q_j$ is an element of M’s state transition function δ, where a is from a finite alphabet, and τ describes the causal dynamics of P. Then there are physical states $p_k$ and $p_l$ such that $\delta(\psi(p_k), \psi(b)) = \psi(\tau(p_k, b))$, where $\psi(p_k) = q_i$, $\psi(p_l) = q_j$, $\psi(b) = a$, and $\tau(p_k, b) = p_l$. Such a structure-preserving mapping ψ is considered a minimal requirement for defining implementation. According to this view, for a physical system P to implement M more is required than mere input-output equivalence. The system’s internal structure and its specific causal properties are relevant as well, rendering implementation an intensional notion.¹¹

¹⁰ Adding designated input and output states is straightforward.
¹¹ This approach doesn’t violate the doctrine of multiple realizability with respect to extensional equivalence and still admits e.g. the infamous pigeon trained to peck like a particular TM.
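For finite systems, the structure-preservation condition just stated can be checked mechanically. The following is a minimal Python sketch and not part of the original argument: the four-state physical system, the two-state parity automaton, and every name used (`tau`, `delta`, `psi_state`, `psi_input`, `is_implementation`) are hypothetical illustrations. It tests whether δ(ψ(p), ψ(b)) = ψ(τ(p, b)) holds for each physical transition, and whether each machine transition is witnessed by some physical one.

```python
def is_implementation(tau, delta, psi_state, psi_input):
    """Return True if psi maps the physical dynamics tau onto the machine table delta."""
    witnessed = set()
    for (p, b), p_next in tau.items():
        q, a = psi_state[p], psi_input[b]
        # Structure preservation: delta(psi(p), psi(b)) == psi(tau(p, b)).
        if delta.get((q, a)) != psi_state[p_next]:
            return False
        witnessed.add((q, a))
    # Every machine transition must be realized by at least one physical transition.
    return witnessed == set(delta)


# Hypothetical machine M: a parity automaton over the alphabet {"0", "1"}.
delta = {("even", "0"): "even", ("even", "1"): "odd",
         ("odd", "0"): "odd", ("odd", "1"): "even"}

# Hypothetical physical system P: four states; input "hi" advances, "lo" holds.
tau = {}
for k in range(4):
    tau[(f"p{k}", "lo")] = f"p{k}"
    tau[(f"p{k}", "hi")] = f"p{(k + 1) % 4}"

psi_input = {"lo": "0", "hi": "1"}
good_labelling = {"p0": "even", "p1": "odd", "p2": "even", "p3": "odd"}
bad_labelling = {"p0": "even", "p1": "even", "p2": "odd", "p3": "odd"}

print(is_implementation(tau, delta, good_labelling, psi_input))  # True
print(is_implementation(tau, delta, bad_labelling, psi_input))   # False
```

The same physical dynamics pass or fail the test depending solely on which labelling of states is supplied, which is the dependence on description that the rest of this section examines.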
Furthermore, not only regular succession of internal states in a ‘window of observation’ is required, but also that there be a lawlike connection between physical states. Such a connection supports counterfactual conditionals and must hold for all possible state transitions of the corresponding machine table on all permissible inputs. This condition rules out that some pattern of activity in some physical system like Hinck’s pail, a monkey on a typewriter, or the Searlean Word Star wall can be interpreted as implementing a computation. It also invalidates Putnam’s claim of universal realizability, the claim that every physical system under some appropriate description realizes every finite state machine [Putnam 1988, p. 120]. The input-output relation computed by a physical system P and the system’s internal causal organization must mimic the computational and state-transitional behavior of M in a non-contingent way.12 The sketched notion of implementation is admittedly vague and there is plenty of room for specification. Nonetheless, there are some fundamental problems with this characterization which cannot easily be remedied by further tinkering. It is straightforward to explicate the implementation relation for physical systems and finite state automata in terms of such a condition of lawlike, structural homomorphy. Turing machines, on the other hand, cause slight complications. A specification of a Turing machine which would fit in with this notion of implementation requires potentially infinite dimensional state vectors. This is because state transition rules relate complete configurations, or computational states, including, among other things, instantaneous descriptions of the work tape. 
If implementation is not taken to refer to individual computations but to the complete behavioral dispositions of a computational system this becomes problematic: a physical system implementing a Turing machine must be capable of assuming an infinity of discriminable states. An implementation relation between a physical system P and a TM e cannot therefore be established by any finite amount of actual, observable behavior of P.

¹² See also Chalmers [1996] and Copeland [1996b].

In physics it is common to talk about states of a physical system in terms of points in phase space. No doubt, this is an extremely useful abstraction. However, there are several disanalogies with computational states, especially if the physical system is defined over continuous variables. A computational state in, say, a real digital computer has a temporal dimension: it remains active for some non-zero amount of time to play its causal role and thus satisfy its logical function. If, on the other hand, internal states of physical systems are simply points in phase space they are by definition without spatio-temporal extension. Secondly, computational states effectuate change qua representational content. For instance, in a logic gate voltage levels (low/high) are representational (0/1) and therefore cause voltage level change in other logic gates of a digital circuit. In contrast, the standard notion of implementation opens up a divide between causal efficacy of physical states and merely ascribed representational content. More importantly, however, a particular computation that is being implemented determines the identity of computational states of the implementing system up to functional equivalence. If implementation is characterized in the above terms, the identity of physical states conversely determines the very computation that is being implemented. But the individuation or labelling of internal states of P is to a certain extent arbitrary, and their identity depends on a particular sampling. As a consequence, even though the ascription of states to P is constrained by the actual behavior of the physical system, implementation is not an objective but an observer-dependent relation between physical processes and computations. Moreover, because it is a one-many relation, it is inherently vague. Every physical system P can be interpreted as implementing some (not every) computation but which particular computation P implements depends on the labelling for the particular states sampled. Thus system P computes different functions under different descriptions and there is no a priori reason to exclude the possibility that P obtains different abstract computational power under different descriptions, or on different levels of description.
To illustrate this point, consider a simple harmonic oscillator S (which is among the simplest continuous dynamical systems), such as a mass attached to an ideal spring, whose periodic motion is governed by the second-order linear differential equation $\frac{d^2x}{dt^2} + \frac{k}{m}x = 0$. The exact solution, which gives the displacement of the mass as a function of time, is $x(t) = A\cos(\omega t + \varphi)$, where $A$ is the amplitude, $\omega$ the oscillation frequency and $\varphi$ the phase constant. Intuitively, if S computes anything at all then it is its integral curve x(t). According to the above notion of implementation, however, the system can easily be described as computationally associating any two values y, z ∈ [−1, 1]. Simply set the initial conditions to x(0) = 1 and determine the position of x at times $t_1, t_2$ such that $x(t_1) = y$ and $x(t_2) = z$. Obviously, this association supports counterfactual conditionals, if any relation does. Hence the trivial system S can be abused to implement computations for at least every computable function f : [0, 1] → [0, 1]. In fact, a similar proposition has been proved by Branicky [1995]. He showed that (a) some simple harmonic oscillator S point-wise implements every discrete (nonreversible) dynamical system in [−1, 1] and that (b) every TM is isomorphic to some discrete dynamical system in Z.¹³ Consequently, S implements a universal Turing machine.¹⁴ Of course this is just a case of reverse engineering where the dynamics of the system itself are utilized to compute appropriate read-out times and determine a labelling of states. However, it indicates precisely the flaws with the standard account of implementation. There are no stability conditions for ‘implemented states’ and no constraints on read-out times. Physical states which figure in implementation are causally related points in phase space which simply have to exist in an unspecified sense to be interpretable as computational states. But in an experimental setting, whether an implementation relation obtains, or rather which one, essentially depends on human epistemic procedures employed to label relevant physical states.¹⁵ One could argue, for instance, that labelled states have to be equidistant, that only certain sampling frequencies are permissible, etc. Different constraints on point-wise implementation have been proposed by Branicky, leading to a notion of implementation by interval. This is a step in the right direction, though it does not remove the essential dependence on human description.
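The read-out construction described above can be made concrete. The following Python sketch is an illustration only, under simplifying assumptions not stated in the text: amplitude 1, phase constant 0 (so that x(0) = 1), and a default frequency ω = 1. The function names `displacement`, `readout_time` and `oscillator_compute` are hypothetical.

```python
import math


def displacement(t, omega=1.0):
    """Position of the oscillator with A = 1 and zero phase, so that x(0) = 1."""
    return math.cos(omega * t)


def readout_time(value, omega=1.0):
    """Earliest time t >= 0 at which the displacement equals `value`."""
    if not -1.0 <= value <= 1.0:
        raise ValueError("value must lie in [-1, 1]")
    return math.acos(value) / omega


def oscillator_compute(f, y, omega=1.0):
    """'Compute' f(y) by reading the oscillator at suitably chosen times."""
    t_in = readout_time(y, omega)
    # The observer evaluates f here in order to pick the read-out time;
    # the oscillator contributes nothing beyond its trajectory.
    t_out = readout_time(f(y), omega)
    return t_in, t_out, displacement(t_out, omega)


t_in, t_out, result = oscillator_compute(lambda v: v * v, 0.5)
print(abs(displacement(t_in) - 0.5) < 1e-9)  # True: the input is read at t_in
print(abs(result - 0.25) < 1e-9)             # True: the 'output' is read at t_out
```

In this sketch all of the computational work is done when the read-out time is chosen, that is, by the agent who labels the states, which is one way of seeing why unconstrained read-out times trivialize the notion of implementation.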
A robust notion of implementation should neither depend on observer-relative ascription of computational states to a physical system, nor be based on unexplained human capacities like taking measurements at appropriate times. Read-out times, for instance, should presumably be computable in the initial conditions. But because we are bio-mechanical and thus physical systems, an explication of physical computation based on human intervention eats its tail.

¹³ Point-wise implementation is P-simulation in Branicky’s terminology.
¹⁴ This is intended in the sense of realizing every computation under some description, not in the sense of being capable of interpreting data as program code. A similar result is due to Moore [1990], who argues that the motion of a particle in a three-dimensional potential can be described as universal computation.
¹⁵ On a mathematical level of description, a system P implements many computations at once, without dependence on human labelling. This response merely retreats to a mysterious existential quantifier for such a labelling.

Finally, the implementation relation invokes a dubious notion of causation. Causation takes events as its relata, not physical states. Events are spatio-temporally extended objects, involving physical change of some property (at least on most philosophical accounts), not merely traversed locations in phase space represented as numerical data points. It is therefore difficult to make sense of causation occurring during the state-evolution of a continuous dynamical system. A solution to an ordinary differential equation, for instance, is a description of the geometric behavior of a dynamical system, but certainly not a causal explanation of that system’s behavior. What causes the system’s behavior is its initialization together with laws of motion and it is misguided to view two points in state-space as causally related because they are ordered temporally and lie on the same trajectory. The general idea of causation as inextricably tying together internal states of a physical system, which is underlying PCT, seems no more than a convenient but confused façon de parler.

In order to assess the validity of PCT it is helpful first to formulate a purely structural specification of physical computation in terms of an implementation relation. Unfortunately, there is little explanatory gain from such an endeavour if this relation hinges on ascriptional features of the physical system, like a particular labelling, and consequently the properties of epistemic procedures employed by external agents to determine such a labelling.
Given its fundamental importance and the ubiquity with which the term is appealed to in the debate over PCT, it is quite astounding how little attention and effort has hitherto been devoted to clarifying and codifying a notion of implementation. A notable exception is Goel [1992] who attempts to identify constraints on ‘computational information processing’ systems. On our current understanding, or rather lack thereof, implementation is an observer-dependent, epistemic notion, and not an objective relation between physical and computational systems. The philosophical burden of explicating the intuitive notion of computation can therefore not easily be obviated by appeal to physics; there is a conceptual analysis to be accomplished first.
2. Empirical Intractability

From a realist viewpoint the physical world divides into the real, the known and the knowable. PCT may in fact be false but is not known to be false at present. The issue pursued here is whether we could ever know, or have good reason to believe, that PCT is false. Is it possible for an epistemic agent to empirically identify a refutation of CT in terms of physical processes? The significance of this question lies in the fact that super-Turing capacities of physical systems would seem useful only if they can be witnessed and known by us. If the falsehood of PCT is unknowable this is impossible. There are several obstacles to ‘verifying falsification’.

Suppose we intend to falsify PCT empirically, by interpreting a sequence of measurable quantities as a computational process. Let us call such a sequence random if it is obtained from physical processes whose dynamics are not reliably predictable beyond some level of accuracy fixed by our best scientific theories and, furthermore, these physical processes are not systematically reproducible. For instance, average precipitation/mm² or seismic activity in Amsterdam at time $t_0$ is random in the above sense. By repeated measurements at times $t_1, \ldots, t_n$ coupled with a threshold read-out, we can convert these quantities into a binary sequence $m_1, \ldots, m_n$, $m_i \in \{0, 1\}$. Associating the time-index of each measurement with the corresponding bit, we obtain an initial segment of a numerical function ϕ : N → {0, 1}. Since the class of recursive functions is countable and the set of all functions N → {0, 1} is not, ϕ is recursive with zero probability on most reasonable accounts. But clearly ϕ is computable by an effective procedure: we simply need to store each pair $(t_n, m_n)$ as time progresses. Thus, for any argument n, the value ϕ(n) will eventually be known. Consequently, PCT is false.¹⁶ Intuitively we don’t accept this argument.
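The store-and-look-up procedure in this argument can be sketched as follows. This is a hedged illustration only: the readings and the threshold are hypothetical values standing in for real measurements, and the names `threshold_readout`, `build_table` and `phi` are not from the text. Note that the readings have to be supplied from outside the program, which already hints at why the procedure is not an autonomous computation.

```python
def threshold_readout(readings, threshold):
    """Convert measured quantities into bits: 1 if reading >= threshold, else 0."""
    return [1 if r >= threshold else 0 for r in readings]


def build_table(readings, threshold):
    """Store the pair (time index, bit) for every measurement taken so far."""
    return dict(enumerate(threshold_readout(readings, threshold)))


def phi(table, n):
    """Look up phi(n); the value exists only once the n-th measurement is made."""
    if n not in table:
        raise LookupError(f"measurement {n} has not been taken yet")
    return table[n]


# Hypothetical readings (e.g. precipitation values), not real data.
readings = [0.0, 2.3, 0.4, 5.1, 1.9]
table = build_table(readings, threshold=2.0)
print([phi(table, n) for n in range(len(readings))])  # [0, 1, 0, 1, 0]
```

At any moment only an initial segment of ϕ is available, and nothing in the table allows a value to be predicted or reproduced before its measurement is taken.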
16 This argument is essentially due to Bowie [1973, pp. 72–74]. His ‘mechanism’, however, depends on the unwarranted existence of ‘perfect’ random sources, which are not recursively simulable.
This is not because the sketched procedure violates our understanding of computation. The procedure draws on external information; it is not autonomous, and it does not strictly compute information, but merely relates data
types. But the major reason why we don’t buy into such refutations of PCT is that the procedure is not sufficiently transparent and deterministic for us as epistemic subjects. Even if the universe were in fact perfectly deterministic in some intelligible Laplacean sense, the procedure to compute ϕ remains epistemically random, that is non-deterministic relative to our ignorance.17 We therefore have no experimental means to test the alleged case of a violation of PCT, and not even a hypothesis on which function is being computed. As a consequence, the procedure is not practically utilizable for a specific purpose either. Epistemically random or erratic processes cannot be exploited to falsify PCT. But this is an epistemic worst-case scenario, which doesn’t warrant the conclusion that PCT is not empirically falsifiable. Are there epistemic conditions which are significantly more favorable to falsifying PCT? Let’s call a sequence of measurable quantities pseudorandom if it is obtained from physical processes which are not reliably predictable, but can be brought about repeatedly, in a systematic manner. As a paradigm case, we imagine the “possibility of our being presented some day (perhaps by some extraterrestrial visitor) with a (perhaps extremely complex) device or ‘oracle’ that ‘computes’ a noncomputable function” [Davis 1958, p. 11]. The internal organization of this black box is unknown to us but it has input and output slots and a lever to set it in motion. Suppose further that we conduct a series of test runs, repeatedly feeding various inputs into the black box and it turns out that it behaves perfectly uniform on each one— that is, it yields the same output on the same input in successive runs, and consumes approximately the same amounts of time and energy, etc. 
17 To prevent misunderstanding, local randomness in computation, such as in Rabin’s [1980] probabilistic algorithm to test primality, is a very different issue.
Even though we have no knowledge of how it operates, presumably we would agree that the device computes a numerical function by performing effective operations, and we could utilize it to obtain a pseudorandom sequence of ordered pairs. The observation that the device behaves regularly, however, merely adds to our confidence that it operates effectively. We are not thereby convinced that the device computes a non-recursive function. The reason is quite trivial. Any such pseudorandom sequence of ordered pairs observed by us is necessarily finite and hence a recursive
subset of the (possibly non-recursive) function the device computes. All observational data is finite but non-recursiveness is an inherently infinitistic property. Kreisel remarks that “only the most coarseminded would conclude from this that the mathematical property [of being recursive] is without any scientific significance.”18 But notice that the problem to empirically falsify PCT in the sketched scenario amounts to more than just the platitude that finite data can never provide conclusive evidence for a statement about an infinite sequence of events. As in the preceding case, there simply is no inductive hypothesis to begin with. Consequently there seems to be no way to discover non-recursive processes in nature simply by gazing at the phenomena. In this weak sense, PCT does not have any direct observational consequences, it is not straightforwardly refutable by empirical data. Yet it is premature to conclude that PCT can never be refuted by empirical evidence but only by logico-mathematical insight.19 Rather PCT can never be refuted by empirical evidence alone, unaccompanied by logico-mathematical insight. What is needed in addition to observational data is a mathematical theory expressing the laws underlying a sequence of measured observables and this is a conditio sine qua non for an empirical refutation of PCT—plain observation is impotent. The quest for causally efficacious non-recursive physical processes is futile without a formal description of these processes which is provably ‘non-effective’. Physical systems or processes can be described by dynamical laws, taking the form of differential equations, plus initial and background conditions in a specific experimental set up. Conjoin these in a theory T about the dynamical behavior of system S with respect to observables O. 
If S is placed in an initial state and runs its course until a measurement is made at time t which is interpreted as output, the theory T may tell us that the dynamic change in the observables O effectuated by S follows some non-recursive function (cf. Pitowsky [1990, p. 85]). Wang dubbed a physical theory algorithmic if the predictions are always computable when the initial conditions are computable [Wang 1993, p. 110].20 Call a theory T non-recursive iff it is not algorithmic. A non-recursive theory T tells us that any real physical system S whose dynamical behavior is adequately described by T with respect to O can be interpreted as computing a non-recursive function. If we have good reasons to believe that some system S is adequately described by such a theory T, we also have good reasons to believe that PCT is false.21 In order to falsify PCT by means of physical systems or processes such a modelling relationship between T and S regarding O must be experimentally established. For simplicity, suppose S, according to T, computes the halting function for TMs with respect to O. Ex hypothesi we must unambiguously relate finite observational data to the infinite behavioral repertoire of S. Can this be done? It would seem that this can be done no less convincingly than any other universal physical theory can be corroborated by finite observational data, in particular the physics employed in the design of personal computers itself. However, there are some critical disanalogies. First, in standard physical methodology, a single data point which matches theoretical prediction will count as confirming evidence. On the other hand, no finite set of data points can confirm the claim that S computes the halting function because computing a specific function is not a gradational, cumulative affair. In addition, it might be impossible for us even to verify that a sample sequence of numerical values is a subset of the halting function. Secondly, as a mathematical object (or a physical instantiation thereof, for that matter) a Turing machine is finite but it is an ‘idealization which brings in infinity’ (cf. Wang [1993]). Its infinite behavioral repertoire is encoded in a finite list of instructions, a program. A program is perspicuous and surveyable, revealing the machine’s functional organization independently of a particular physical realization. An epistemic agent can therefore become convinced that a programmable machine M, modulo hardware limitations or failure, is a computing device for a particular infinite function ϕ. Knowledge of this kind is fundamentally different from mere belief based on extrapolation from a finite table of observed values, because it supports counterfactual conditionals.
18 Quoted from [Odifreddi 1996, p. 397], see also this volume.
19 This has been put forth, for instance, by Thomas [1972].
20 Similarly, Kreisel called a theory mechanistic if “every sequence of natural numbers or every real number which is well defined (observable) according to theory is recursive or, more generally, recursive in the data.” Quoted from [Odifreddi 1996, p. 409].
21 Moreover, as Deutsch has claimed, “if the dynamics of some physical system did depend on a [non-recursive] function then that system could in principle be used to compute the function” [Deutsch 1985, p. 101].
For instance, we may consistently interpret some pseudorandom source S as implementing addition according to T. But support for this interpretation is based on matching the predictions of T with the actual behavior of S. Knowledge that a TM e computes the addition function is not based on such evidence. Inspection of its program code yields certainty that e adds, not just confidence which is inductively based on past observation that it will output a certain value. A claim to the effect that an algorithmic device computes a particular function ϕ is a statement about an infinitude of possible behavioral instances. Whatever natural number n the device had been fed with, it would have churned out ϕ(n). This is a counterfactual conditional which uniquely links a finite functional description with an infinite set. Via the program, we have access to the infinite dispositions of a TM, not just access to a finite chunk of observed behavior. On the other hand, any dynamical theory T of the behavior of a physical system S with respect to O must be established solely on the basis of confirmation and testing by observable instances. Epistemic justification for a statement reporting the empirical falsification of PCT by an extended ‘system’ (S, T, O) amounts to the justification of a counterfactual conditional. No claim is made that this kind of epistemic justification is impossible, but it is a difficult and serious challenge which needs to be addressed by anyone entertaining the hypothesis that PCT is false.
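The contrast between a finite table of observed values and a program can be made concrete in a small sketch. The Python fragment below is an illustration only: the observation table and the name recursive_extension are hypothetical and do not come from the text. It shows that any finite record of black-box behavior is reproduced exactly by more than one total computable function, so the record by itself favors no hypothesis about the infinite function being computed.

```python
from typing import Callable, Dict


def recursive_extension(observations: Dict[int, int], default: int) -> Callable[[int], int]:
    """Return a total computable function that agrees with every observed pair.

    Outside the finite table the function returns `default`, so it is a
    lookup table plus a constant and hence trivially recursive.
    """
    table = dict(observations)  # copy, so later changes to the input do not leak in

    def phi_star(n: int) -> int:
        return table.get(n, default)

    return phi_star


# Hypothetical record of input/output pairs obtained from a black box.
observed = {0: 1, 1: 0, 2: 0, 3: 1, 7: 1}

rival_a = recursive_extension(observed, default=0)
rival_b = recursive_extension(observed, default=1)

# Both rivals reproduce all the data ...
agree_on_data = all(rival_a(n) == m == rival_b(n) for n, m in observed.items())
# ... yet they differ on an argument that was never tested.
differ_elsewhere = rival_a(100) != rival_b(100)
```

By contrast, reading the source of recursive_extension settles its value on every argument at once, which is the kind of counterfactual-supporting knowledge the text attributes to inspecting a program.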
3. Epistemic Asymmetry

The naïve considerations presented in the preceding section can be rectified within the theory of computable empirical inquiry. It can be shown that in a mathematically precise sense PCT is epistemically distinct from more mundane natural laws. It has been remarked elsewhere that what we call ‘scientific method’ can be conceived as a self-correcting procedure that converges to knowledge reliably and effectively (see Glymour [1996] and Kelly [2001]). There is no infallible knowledge of universal laws which could be established by one-shot epistemic justification, once and for all times. Inductive ‘underdetermination’ in scientific investigation, however, can be remedied by admitting a finite number of retractions and thus relaxing epistemic standards for knowledge. For instance, a universal hypothesis is verifiable in the sense of asserting its truth, preliminarily, as long as conflicting evidence is absent. Similarly,
existential statements are falsifiable by declaring them false as long as the systematic attempt to produce a positive example fails. In both cases, the epistemic procedure yields knowledge in finite time, even though the moment of ‘convergence’ is not known. A statement which is verifiable and falsifiable in this sense is decidable in the limit.22 The proposition that a given physical system S is super-Turing, on the other hand, cannot be known in the limit in the same fashion. Gold [1965] gave a mathematical proof that, observing but a finite amount of input-output behavior of S, it is not possible to determine in the limit whether the function computed by that physical system is recursive or not.23 Intuitively this is obvious: since no finite amount of observational data will ever make us revise the claim that S is super-Turing, it is not verifiable (nor falsifiable) in the limit. In other words, the predicate ‘is super-Turing’ (STM) has a different ‘degree of underdetermination’ than universal hypotheses, which are limiting recursive iff they are Δ⁰₂ in the arithmetical hierarchy. It is therefore of a fundamentally different epistemic quality; it is not epistemically on a par with other universal hypotheses common in the natural sciences. Notice again that this is not just the familiar skeptical challenge that PCT, like other scientific hypotheses, cannot be known with certainty based on a finite amount of observation. There is an epistemic asymmetry between the two, in that universal scientific theories can in principle be known in the limit whereas PCT cannot.24 This asymmetry can be put into simple, distinct terms.
Ordinary scientific hypotheses and theories can be contradicted by evidence, invalidated by singular statements, even a single data point; the claim that PCT is false cannot.25 Therefore, at the very least we can conclude that in the absence of plausible epistemic strategies to assess hypotheses involving predicates like STM, it is doubtful whether PCT is an empirical statement at all.
22 This notion was introduced by Gold [1965] who coined the term limiting recursion for it.
23 See Theorem 11 in Gold’s paper: the class of primitive recursive black boxes over given input and output spaces is not limiting weak identifiable.
24 Decidability in the limit is a weak epistemic criterion, but such theories may at least be adequate for all practical purposes, even though they are not known in a strong sense.
25 According to the Quine–Duhem thesis any theory can be made logically compatible with any singular observation by adjusting background assumptions. We ignore this intricate issue here.
PCT is observationally inaccessible, hence hypercomputation cannot be an empirical consequence of any other scientific theory either. Yet, we do believe in the existence of many entities which are observationally inaccessible; philosophers call them ‘theoretical entities’. Theoretical entities are often indispensable to our best scientific theories, in that they play a crucial role in causal explanations of observable phenomena. And we are ontologically committed to them because it would otherwise be mysterious how they can figure in such causal explanations in the first place. So perhaps non-computability in nature should best be understood as such a theoretical entity, similar to electrons and quarks? It is certainly conceivable that one day the existence of hypercomputational systems needs to be posited for purely explanatory reasons, even though they are forever observationally inaccessible. Our belief in the existence of non-computability in nature may therefore hinge on whether such systems play any substantial role in our best scientific theories. And that, in turn, will depend on whether hypercomputation will ever assume an explanatory function in some empirical domain. If hypercomputational systems help us causally explain phenomena in the macroscopic physical world we will ultimately come to believe in their existence. In the current circumstances, however, PCT is a monolithic, rather isolated statement, bearing no immediate conceptual or explanatory relationship to physical or, in fact, any other empirically testable beliefs, apart from trivial ones such as the belief that no known physical system can compute the halting function for TMs.
26 A somewhat deviant opinion is voiced by Cooper and Odifreddi [2003], viz. that studying degrees of unsolvability is relevant for the empirical sciences. A justification for this claim is not easily recognizable in their paper.
Hypercomputation does not play a significant role in explaining any observable phenomenon; it is an idle, frictionless addition to our best scientific theories.26 We speculate that the decoupled status of PCT will not change any time soon, if ever, because non-computability is an infinitary function-theoretic property, whose natural habitat is mathematical logic, not the physical world. By contrast, several people have argued, on varying grounds, that some cognitive systems (like human beings) are non-computable, e.g. Lucas [1964], Penrose [1995], Horgan and Tienson [1996], and most recently Bringsjord and Zenzen [2003]. None of them claims, though, that hypercomputation itself
has an explanatory function, let alone could form the ‘missing link’ in a theory of consciousness and the mind. Furthermore, when studying the behavior and properties of physical systems like brains, it is far more interesting to answer the question of how these systems process information and to which end, not whether under idealized conditions they could be described as hypercomputational. In any case, optimists about hypercomputation need to elucidate its explanatory role for observable phenomena, before, by ampliative inference, we may grant it an ontological status comparable to other theoretical entities such as photons, electromagnetic fields or the big bang.27 In the natural sciences, theories are nodes in a web of beliefs whose confirmation or refutation may affect the tenability of other inferentially connected nodes. Thus, genuinely empirical beliefs can in principle be corroborated or disconfirmed by other scientific beliefs or theories which exhibit some kind of inferential relation to them. Accordingly, it is perfectly conceivable that non-computability in nature, the falsehood of PCT, is a theoretical i.e. logical consequence of an observationally accessible theory.28 It is possible that one day our best scientific theories predict or entail the existence of hypercomputational systems. Depending on the degree of confidence we have in our best scientific theories, we would have good reason to believe that PCT is false. The hypercomputational properties of such systems, however, would still neither be observable nor have observable consequences—they would remain an epiphenomenon of empirical science.
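The notion of verifying a universal hypothesis in the limit, on which the asymmetry argued for in this section rests, can be illustrated by a minimal sketch. The Python fragment below makes several assumptions that are not in the text: evidence arrives as a stream of booleans (True meaning "this instance conforms to the hypothesis"), and the function names are hypothetical. The learner asserts the hypothesis preliminarily and retracts at most once.

```python
from typing import Iterable, List


def conjectures_for_universal(evidence: Iterable[bool]) -> List[bool]:
    """Conjecture 'true' until a counterexample arrives, then 'false' forever."""
    refuted = False
    out = []
    for instance_conforms in evidence:
        if not instance_conforms:
            refuted = True
        out.append(not refuted)
    return out


def retractions(conjectures: List[bool]) -> int:
    """Count how often the learner changed its mind."""
    return sum(1 for a, b in zip(conjectures, conjectures[1:]) if a != b)


# Hypothetical evidence stream: the third observation contradicts the hypothesis.
stream = [True, True, False, True, True]
guesses = conjectures_for_universal(stream)
n_retractions = retractions(guesses)
```

The learner never announces that it has converged; it is merely guaranteed to stop changing its conjecture after finitely many data. According to the result of Gold cited above, no procedure of this kind exists for the predicate ‘is super-Turing’.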
27 Hypercomputation differs from electrons in another important respect. The scientific transition from optical to electron microscopes suggests that in principle we could ‘see’ electrons one day. Super-Turingness, on the other hand, is observationally intractable in a non-contingent way.
28 To belabor the obvious, such a theory would have to be different from a plain mathematical description of a physical system’s non-recursive behavior, because such a theory would be observationally inaccessible.

4. Continuous Computation Caveats

In this section we will discuss several recurrent topics surrounding PCT such as infinite precision, bounds on measurement, and recursive approximation. Before we do so we need to briefly sketch the notion of analog computation.29 Some of the more serious attempts to cast doubt on PCT build on the idea of analog computation, which can be found advertised under the slogan to ‘harness the power of the continuum.’ It must be pointed out that the distinction between analog and digital computation is not a clear-cut one. The key differences are as follows: digital computation proceeds according to discrete logical steps on a discrete-time, discrete-signal architecture. Analog computation on the other hand is a continuous physical process which converts the quantity of some observable into another quantity according to the physical laws governing the dynamics of this process. Digital computation abstracts away from the concrete properties of a physical substrate, whereas these properties are often crucial for the specific computational purpose of an analog device. In analog computers, no distinction can be made between logical software and physical hardware. In contrast to digital computation it is therefore difficult to specify a universal analog computer. Digital computers typically are serial machines, whereas analog computers naturally function in parallel in that multiple computational components may be involved simultaneously. Analog computers are operated by measuring the magnitude of physical quantities such as charge, voltage, etc. Based on the presumption that the physical world provides for real-valued quantities, the variables involved in analog computation, including time, vary continuously. This leads directly to computable functions of real numbers as a fundamental concept of analog computation. While digital computers represent binary quantities by discrete states such as open or closed relays, analog computers allow representing reals by physical quantities. It has been conjectured that this enhanced representational capacity also results in amplified computational power.
29 The term analog computation derives from computation by ‘analogy’ but is used today in a broader sense, meaning continuous or real computation. We stick with the prevalent terminology.
Analog computation has a long history, but the first large-scale analog computation device, the differential analyser, was designed and built by Bush [1931] at MIT (cf. Copeland [2000]). It was a mechanical, electrically powered device which worked similarly to a slide rule. Numbers were represented by directly measurable quantities and the differential analyser could solve differential equations with one variable. Computing was done by manually measuring the number of degrees that specific gears rotated. Later, Shannon [1941] incorporated Bush’s ideas into an idealized general purpose analog computer (GPAC). The GPAC is a ‘classical model’ of computation in that it is not super-Turing. Analog in the broadest sense are also Abramson’s extended Turing machines [Abramson 1971], which can store reals on a single square of tape, the generalization of Blum et al. [1989] to computation over arbitrary ordered rings, and Moore’s model of continuous-time computation [Moore 1996]. Several concrete analog machine models have been proposed with the explicit purpose of reaching ‘beyond the Turing limit’. We will briefly examine a few to convey the flavor of these recent attempts to invalidate PCT by analog computation. Copeland [1997] discusses so-called accumulator machines, where input is conceived of as charge. Non-computable quantities of charge result in noncomputable output. More sophisticated but logically similar are Stannett’s X-machines [1990], which ‘compute’ relations over arbitrary data types (hence the X) and demonstrably solve the halting problem for Turing machines.30 Another approach is based on parallel networks of TMs which operate according to asynchronous clock frequencies. Two regular TMs perform operations at integer and τ-times, respectively, with τ ∈ (0, 1) irrational. A third machine interleaves the output streams of both machines in order of their real-time appearance and outputs the n-th bit. Thus two trivial machines can be coupled to compute a non-computable real for any τ which is non-recursive itself.31 Irrespective of whether clocks which effectuate non-recursive real-length intervals of time are a coherent idea, being parasitic on such a device, asynchrony in networks seems to beg the question of PCT.
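The asynchronous network just described can be sketched for a computable clock ratio, which makes visible where any non-computability would have to come from. The Python fragment below rests on assumptions the text does not state: the first machine emits the bit 0 at times 1, 2, 3, ..., the second emits the bit 1 at times τ, 2τ, 3τ, ..., and τ is fixed to the computable irrational √2 − 1 purely for illustration. The comparison of firing times is done in exact integer arithmetic.

```python
from typing import List


def interleaved_bits(n_bits: int) -> List[int]:
    """First n_bits of the merged output stream for tau = sqrt(2) - 1."""
    bits = []
    m, k = 1, 1  # machine A fires next at time m, machine B next at time k * tau
    while len(bits) < n_bits:
        # k * tau < m  <=>  k * sqrt(2) < m + k  <=>  2 * k * k < (m + k) ** 2,
        # valid because all quantities are positive; ties cannot occur since
        # tau is irrational.
        if 2 * k * k < (m + k) ** 2:
            bits.append(1)  # B fires first
            k += 1
        else:
            bits.append(0)  # A fires first
            m += 1
    return bits


stream = interleaved_bits(10)
# The number of 1s preceding the j-th 0 equals floor(j / tau), so the merged
# stream encodes tau itself.
ones_before_second_zero = stream[: [i for i, b in enumerate(stream) if b == 0][1]].count(1)
```

In this sketch the merged stream is computable because τ is. The interleaving step contributes nothing beyond comparing firing times, which is in line with the worry voiced in the text that the model begs the question.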
30 The author also specifies a universal analog X-machine, conceding, however, that it is not physically implementable.
31 Details are to be found in Delchamps [1995].
An interesting class of ideas is that of analog recurrent connectionist networks (ARNN) with irrational connection matrices as described by Siegelmann and Sontag [1994; 1995]. The basic model for these networks consists of a fixed, finite number n of units, which are interconnected by real constant weights, each computing a polynomial net function and a continuous, quasi-sigmoid activation function. The networks are updated in discrete time but are analog in that they can be described in a continuous configuration space only. They have designated input channels, and a labelled subset of the n units delivers the output of the network to the environment as a binary stream. Equipped with rational weights, ARNNs compute only computable functions; likewise for computable real-valued weights. They obtain super-Turing capacities if arbitrary reals are admitted as weight constants. As a natural extension of standard models of the human central nervous system, ARNNs appear to be at least good candidates for physically realizable models which display non-computable behavior. ARNNs are special cases of dynamical systems. In general, the approach to invalidate PCT via dynamical systems is to represent the behavior of an imagined physical system as a set of differential equations, to prove that for some recursive initial conditions the solution is non-recursive, and to ask whether these equations have an interpretation as models of physically feasible devices (cf. Stewart [1991]). A result based on this methodology has been obtained by Pour–El and Richards [1981], who showed that the wave equation with unique solution does not preserve recursiveness. That is, for a computable initial condition the solution at subsequent integral time points is a non-recursive real.32 The wave equation itself is provided by our physics. It has been objected, however, that the initial condition, albeit computable, is such a complex function that it cannot be expected to arise naturally.33 Further indications that PCT could be invalid stem from the work of da Costa and Doria [1991; 1994] on the limits of computability in chaotic dynamical systems, and, most recently, Fouché [2000] on complex oscillations reflecting properties of Brownian motion. Clearly, however, PCT cannot be refuted by a notional, but only by a realizable system or device.
32 See also Pour–El and Richards [1979] for an ordinary differential equation with non-unique, non-recursive solutions.
33 See Pitowsky [1990, p. 87] and Kreisel [1982, p. 902].
34 For instance, it is disputable whether Wiener processes are good models of Brownian motion.
Consequently, the plausibility of these candidates hinges on their physical feasibility, a difficult issue beyond the scope of this paper.34 With the exception of those models directly inspired by natural processes, we are left with an awkward feeling that many customary analog models merely invoke computable operations on uncomputable data. As in the case of accumulator machines, asynchronous networks and ARNNs, which serve only as examples here, their computational capacities derive from the representational power of the continuum, and are not due to the procedural capacities of the proposed mechanisms. The alleged ‘super-Turingness’ originates from some non-computable real which figures in the computational process itself, as input, clock frequency, or design constant. Thus, in violating PCT these models essentially depend on the existence of exact real-valued physical quantities. There are three standard replies to such models, see e.g. Myhill [1966], Vergis et al. [1986], Trautteur [2005], and in particular Schonbein [2005].35 First, it has been objected that a finite system, such as an ARNN or any other bounded physical device computing over the reals, cannot represent or store the infinite amount of information inherent to irrational numbers and that therefore ARNNs are impossible devices. This argument is misguided because analog devices like ARNNs do not act on a data structure such as representations of real numbers. States of the system are represented as real vectors only in the corresponding mathematical model but these representations are purely ascriptional. Second, it has been doubted whether a physical device can function with ‘infinite precision’, whether it can be operationally sensitive to the exact quantity of an irrational number. The effects of environmental perturbations and especially thermal noise seem to exclude such infinite sensitivity with near certainty. On the other hand, it is by no means obvious that actual infinite precision is required for hypercomputation in every analog system.36 Presumably, the ‘precision problem’ needs to be addressed for each proposed analog model individually and cannot be answered across the board.
Maass and Orponen, for instance, proved that the computational power of ARNNs collapses to that of finite state automata when subjected to “any reasonable type of analog noise, even if their computation time is unlimited and if they employ arbitrary real-valued parameters” [Maass and Orponen 1998, p. 1082].
35 We draw on the latter’s discussion in the following paragraphs.
36 In fact, Siegelmann and Sontag [1994] have shown that unbounded linear precision suffices for their ARNNs to become super-Turing.
A third common reservation against analog hypercomputation concerns fundamental limitations to measurement. Whether we treat physical systems classically or quantum mechanically, there is consensus that the process of measuring an observable disturbs and influences the behavior of that system. These perturbations are uncontrollable beyond certain limits and may cause severe behavioral changes, e.g. in chaotic dynamical systems, at subsequent measurement points. Consequently, “only the behavior of the system as perturbed by the presence of the measuring apparatus is observable” [Fields 1996, p. 169]. Infinite precision on which an analog system may operationally rely for its computational capacities will be destroyed. It therefore seems impossible to witness analog hypercomputation by measurement. In similar vein, it has been argued that there is an absolute bound on the precision of any measuring device; there are no zero sensors. Measured quantities are always rational, and in a bounded range, measurement can only distinguish finitely many states of a physical system. Hence, under any behavioral description involving known methods of measurement, any analog device can perfectly be simulated by a Turing machine. Both these reservations are hard to deny. Nevertheless, bounds on the precision of measurement are a purely epistemic limitation which certainly does not preclude the existence of analog hypercomputation. Furthermore, exact, infinite precision measurement need not even be necessary for utilizing analog hypercomputational systems, as long as they have discrete input and output channels. In other words, bounds on measurement further cement the observational inaccessibility of hypercomputation but cannot be employed to argue against the metaphysical possibility of PCT being false. 
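The claim that a bounded-range, finite-resolution measurement can distinguish only finitely many states may be illustrated by a toy instrument model. The Python sketch below is an assumption-laden illustration and not a model of any real apparatus: the function measure, the range [0, 1] and the choice of 1000 bins are all hypothetical, and floating-point numbers stand in for physical magnitudes.

```python
def measure(x: float, lo: float, hi: float, n_bins: int) -> int:
    """Return the index of the bin that a finite-resolution instrument reports for x."""
    x = min(max(x, lo), hi)                       # readings saturate at the range limits
    index = int((x - lo) / (hi - lo) * n_bins)    # position of x in units of one bin width
    return min(index, n_bins - 1)                 # the upper endpoint falls into the last bin


irrational_like = 2 ** 0.5 / 2   # floating-point stand-in for sqrt(2)/2
rational = 0.7071                # a nearby rational value

reading_a = measure(irrational_like, 0.0, 1.0, 1000)
reading_b = measure(rational, 0.0, 1.0, 1000)

# All readings an instrument with 1000 bins can ever produce on [0, 1].
possible_readings = {measure(i / 10000, 0.0, 1.0, 1000) for i in range(10001)}
```

Both magnitudes yield the same reading, and the set of possible readings is finite, so a behavioral description phrased in terms of such readings can be tabulated. As the text stresses, this limits what can be observed, not what may exist.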
We have only scratched the surface of deep, intriguing epistemic and ontological problems in the preceding discussion which, we believe, no one involved with physical computation can ignore. Yet, there are more serious philosophical issues lurking. According to the doctrine of scientific realism, objects exist externally to us and independently of the mental. That is, they exist independently of our observation, description, or knowledge, as opposed to the anti-realist maxim esse est percipi (cf. [Devitt 1991]). We believe that objects exist if our best scientific theories claim or entail that they exist. On the other hand, no plausible variant of realist doctrines demands that we should accept our best scientific theories at face value. Rather
they are transitory stages in scientific development which increasingly approximate truth. Likewise, mathematical descriptions which are an integral part of our best scientific theories, are taken to be true only approximately on most reasonable accounts of theory realism. Thus it is doubtful whether there is any meaning to a scientific statement asserting the existence of analog hypercomputation relying on infinite precision. Optimists about such observationally inaccessible devices have an obligation to clarify the semantic relation between approximate scientific theories and a world of real-valued quantities before they are justified in asserting that their existence entirely depends on our best scientific theories. It is therefore a moot point whether PCT is a matter of physics. A related comment applies to the very concept of continuous space and time. Analog computation is literally committed to the existence of observables taking on irrational values. The mathematical continuum is an integral part of our best scientific theories, which describe many natural phenomena in terms of continuous space and time. Nonetheless, the continuum is a set theoretic construction—a useful fiction, not an ontological fact. The assumption of spatio-temporal continuity is inspired by subjective human experience, rather than based on scientific observation. Despite appearance to the contrary, the continuous/discrete divide does not mark a downright empirical question, because density and uncountability are forever beyond human ken. How we model time and space is therefore a largely pragmatic decision; there is no experimentally accessible fact of the matter.37 The idea of hypercomputational infinite precision devices seems to vanish behind this pragmatic veil. Studying continuous dynamical systems is the daily bread of many a physicist. 
It appears natural to exploit this knowledge in order to settle issues of physical hypercomputation by determining the computational content of such systems. In this manner we might hope to achieve an explication of computation itself, depending entirely on the scope of the laws of physics, and to eliminate uncertainty about the truth value of PCT. Given the problems and reservations outlined in this section, it seems, however, that continuous computation devices do not provide a smooth conceptual reduction of computation to physics but instead engender a tremendous explanatory inflation in rather controversial areas of philosophy.

37 For an extensive discussion of the ‘continuum-discrete conundrum’ see [Trautteur 2000].

4.1. Recursive Approximation
Epistemology does not settle ontology—that is to say, what exists is not determined by what is known. Yet, it would be inappropriate to ponder whether PCT is true and treat the question whether we could ever know, or could ever make use of physical hypercomputation, separately. Even on a purely ontological reading, the alleged falsity of PCT must have some observable consequences; otherwise the claim that nature realizes non-computability is a vacuous statement. Scientific methodology comprises the process of confirming or refuting theories by iterated prediction and testing. Consider a given system (S,T,O) and suppose T is non-recursive in the sense defined earlier in section 2. We now examine the possibility that PCT could have indirect consequences in that no other plausible recursive theory explains some observational data. Is it possible to become thoroughly convinced or at least obtain good reason to believe that (S,T,O) computes a non-recursive real based on a finite amount of finite precision data? The answer would seem tentatively positive if T were the only theory available which is consistent with the observational data, and even more so, if T were the only such theory conceivable. On the other hand, it is frequently assumed that from a logical point of view, for finite data we can always find a rival theory T ∗ which is in accord with that data and is recursive. There may be reasons to reject T ∗ on purely theoretical grounds and perhaps T in some definite sense best explains the observable data, but there always is such a recursive theory T ∗ . Is this assumption true? Quite a few authors seem to think that it is. For instance, Wang asserts that “an arbitrary real number or function can be approximated arbitrarily close by a computable number” [Wang 1993, p. 110]. Conjoined with the finite precision of measurement he concludes that “relative to what is observable, it is always possible to find algorithmic physical theories” [op. cit., p. 
111], although we may not actually have them at our disposal. The mere existence of such a recursive theory T ∗ seems to undermine any observational evidence for T as being the theory to best capture the data. In similar vein,
Dreyfus declares that “even an analogue computer, provided that the relation of its input to its output can be described by a precise mathematical function, can be simulated on a digital machine” which is not super-Turing [Dreyfus 1997, p. 72]. We subsume these claims as the Turing approximation hypothesis for analog computers: (TAH) The input-output behavior of any analog computer can be approximated by a Turing machine to any desired degree of accuracy. TAH is a bold claim about all physically realizable systems with all
kinds of architecture which in the broadest sense can be called analog computers. Naturally, we would expect TAH to be true for some classes of analog computers, and false for others. TAH has basically two interpretations. On the first interpretation, finite precision is supposed on the part of the analog device. TAH then claims that for any degree of precision imposed on the performance of the device there is a TM which simulates it. On the second interpretation, finite precision is supposed, through measurement, on the part of an observer, but the computational device itself is granted whatever degree of precision it requires to operate. TAH then claims that for any degree of precision imposed on measuring in- and output there is a TM which simulates the analog system. It is this interpretation in which we are interested. The significance of TAH for the current discussion is obvious: measurement has a finite degree of precision, hence it can capture the input-output behavior of an analog system only up to a certain degree of accuracy. TAH implies that there is a TM which simulates the analog system up to this degree of accuracy. Thus, there would always be a recursive theory T∗ which is in accordance with the observational data, viz. ‘system S behaves according to e’, where e is the index of the extensionally equivalent TM. In the eyes of an observer, T∗ is a recursive behavioral description of S. In a trivial sense, this version of TAH is true. If, by theory T, system S computes the non-recursive real r at computable time t on computable input n, and r is truncated after the first d digits of its decimal representation, then there is a TM which on input n prints out exactly those d digits. But TAH is just as trivially false if there is an analog system which computes a non-recursive real and if approximation is understood in the usual sense of recursive analysis. Turing has characterized recursive reals as those reals whose decimal representation can be successively printed out by a TM. In classical
mathematics a real is the limit of a Cauchy sequence of rationals. A recursive real r is defined as the limit of a recursive sequence of rationals which converges effectively to r. A sequence $\{q_n\}$ of rationals is called recursive if there exist recursive functions f, g and h such that $q_n = (-1)^{h(n)} \frac{f(n)}{g(n)}$. Effective convergence means that there is a recursive function e such that $m \geq e(n)$ implies $|r - q_m| \leq 2^{-n}$.38 Thus, for any desired degree of accuracy, the recursive ‘error function’ e indicates how far through the sequence we must proceed to ensure that it has converged with at least that degree of accuracy. This definition precisely captures the intuition that a real is recursive if it can be approximated to an arbitrary degree of accuracy by effective means. Hence, an analog system S which computes a non-recursive real r cannot be approximated by a single TM to any desired degree of accuracy. Is there an interesting sense in which either Wang’s claim or TAH may be true or false? There is, namely in a procedural sense, involving repeated prediction and measurement with increased accuracy (cf. [Copeland 1997]). Suppose measurement yields a finite amount of finite precision data, which are in accord with the predictions of the non-recursive theory T. Ex post facto these data can be explained by a recursive theory T∗ simply by being finite. However, the impossibility in general to effectively approximate (in the sense of CTT) the behavior of a system which computes an arbitrary real number r entails that T∗ eventually fails to account for a sequence of successive measurements progressing through a sequence of refined precision degrees. Suppose that according to theory T, system S computes a non-recursive real r, S is run on some input, and measurement is made with precision degree d.
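The definitions and the measurement scenario just described can be made concrete in a small sketch. It is an illustration under stated assumptions, not part of the original argument: since no program can output a non-recursive real, the system S is stood in for by the computable real √2; f, g, h, q and e follow the notation of the text, whereas `measure`, `first_refutation` and the consistency criterion (a prediction p agrees with a measurement of precision degree d if they differ by at most 2^-d) are hypothetical choices made here.

```python
from fractions import Fraction
from math import isqrt

# Recursive functions f, g, h defining q_n = (-1)^h(n) * f(n) / g(n).
def f(n: int) -> int:
    return isqrt(2 * 4**n)      # floor(sqrt(2) * 2^n)

def g(n: int) -> int:
    return 2**n

def h(n: int) -> int:
    return 0                    # every term of the sequence is positive

def q(n: int) -> Fraction:
    return (-1) ** h(n) * Fraction(f(n), g(n))

def e(n: int) -> int:
    # Error function: m >= e(n) guarantees |sqrt(2) - q_m| <= 2^-n.
    return n

def measure(d: int) -> Fraction:
    """Hypothetical measurement of S with precision degree d (error <= 2^-d)."""
    return q(e(d))

def first_refutation(rival: Fraction, max_d: int = 64):
    """Lowest precision degree at which a fixed rational prediction differs
    from the measurement by more than 2^-d, or None if none is found."""
    for d in range(max_d + 1):
        if abs(rival - measure(d)) > Fraction(1, 2**d):
            return d
    return None

rival = q(5)                    # a rival prediction that fits the data up to d = 5
print(first_refutation(rival))  # a finite precision degree greater than 5
```

The fixed rational prediction fits every measurement up to the precision at which it was taken and is rejected at a finite higher precision, which is the procedural reading sketched in the text. The stand-in has an obvious limit: because √2 is itself recursive, the program that computes `q` is a recursive description that is never refuted, which is exactly what cannot happen when r is non-recursive.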
Then there will be a TM and hence a recursive theory T∗ which conforms locally with the observational data, yet this very machine may not conform with data obtained from S by repeated measurement of higher precision degrees. In fact, if r is a non-recursive real, there cannot be any single recursive theory T∗ which is as good as T with respect to iterated experimental testing along a sequence of precision degrees.39 And this is true not only in the limit. If T is non-recursive and adequately describes the dynamical behavior of S, any recursive T∗ will be refuted after a finite number of measurements with ascending precision degrees. Consequently, if S is a physical system which is harnessable for computational purposes and is super-Turing under the description of T with respect to the observables O, and T in fact captures the dynamical behavior of S, then any competing recursive theory T∗ can in principle be defeated in finite time by finite data, even assuming finite precision in measurement. This means that if (S, T, O) is a physical computer exemplifying the falsehood of PCT, there is an epistemic strategy to eventually eliminate any recursive rival hypothesis.40 This strategy, however, may be infeasible in practice, in that it may take an eternity to refute T∗, or because we may quickly reach the limits of realizable precision. If there are ultimate limits to the precision of measurement, which is a near certainty according to contemporary physics, so much the worse. Even if a particular theory T∗ were defeated, we wouldn’t know whether this is because S in fact is super-Turing or because T∗ was just the wrong recursive hypothesis.

38 Cf. [Pour–El 1999]. It is interesting to note that all ‘effectivizations’ of classically equivalent definitions of real numbers turned out to be equivalent. This adds to the robustness of computability under CTT.

39 We ignore a natural objection: what happens to the scientific methodology of prediction and testing if the behavioral description T does not allow predictions, precisely because it is non-recursive? Indeed in the sense of Rice’s theorem, even recursive systems are highly unpredictable. A similar problem arises if S, according to T, exhibits sensitive dependence on initial conditions.

40 Furthermore, as Copeland has pointed out in [1997], even if S computes a recursive real and some TM does simulate it uniformly up to a certain degree of accuracy, there may not be any effective way of constructing a TM which does the same job for increased accuracy. Thus he interprets ‘can be simulated’ constructively.

5. Trivialization Problem

In the attempt to enrich the concept of computation beyond the Turing limit, the assumption was made that the dynamical behavior of any physical system can be interpreted as a computational process. This approach to PCT conjures up the threat of trivializing the notion of computation. If every physical system can be consistently interpreted as computing, then the notion of computation is vacuous. This is one aspect of the trivialization problem which has disastrous consequences for doctrines that essentially draw on the distinction between computing and non-computing systems, such as computationalism, which holds that the brain is a computer, digital or analog, classical or super-Turing.41 The consequences for PCT, on the other hand, don’t seem detrimental, because even if dynamos and digestion, egg-timers and sunflowers can be interpreted as computing, it is still a non-trivial empirical project to delimit the range of computable functions for all nomologically possible physical systems. There is a problem however for investigations into PCT as an empirical project if physical computation turns out to be nothing over and above interpretation as computing. This requires explanation. In a thoughtful paper Boyle distinguishes two ways computation can be viewed in the context of physical systems:

1) abstractly, in which case physical systems are interpreted as implementing one or more abstract computations, and 2) physically, in which case computing is the physical means by which certain systems [...] realize the abstract computations we interpret them as implementing. [Boyle 1994, p. 451]
According to the former, computation is an extrinsic, observer-relative property of physical systems, while according to the latter it is an intrinsic, observer-independent property of physical systems. Hence, there are in general two ways to block the trivialization of computation. The first one is to show that there are physical systems which cannot be consistently interpreted as implementing any abstract computation; the second one is to determine objective, physical criteria to distinguish genuine computing systems from non-computing systems. The prospects for the first strategy seem bleak from the start because of our capacity to interpret temporal changes in almost any observable feature of a physical system as a meaningful computational sequence. Harnad [1994] on the other hand concedes that any physical system can be ascribed computational properties but at the same time claims that these cannot in any case be given a systematic interpretation. Computations which do not make ‘systematic sense’ he calls trivial; however, he offers no criteria to distinguish between trivial and non-trivial computations. Moreover, artificially restricting this interpretation will restrict the notion of implementation but will not disqualify any physical system from the class of computing systems.

41 Dietrich [2001] responds that such a ‘trivialized’ computationalism is no less vacuous than the claim that ‘everything is made of atoms’. However, if computation is defined as ascribed information processing there clearly is a disanalogy between computationalism and atomism.

The second strategy to prevent trivialization has been taken by Boyle himself. He argues that computation is an intrinsic property only of those systems in which information processing can be associated with the causal mechanisms of pattern matching. Thus, while any physical system may be describable as computing, there is nonetheless a fact of the matter whether that system is a computer. Perhaps my radio alarm clock can be described as computing, perhaps it can even be described as a hypercomputer on a quantum mechanical level of description, but that doesn’t necessarily mean it is a computer if computers are a natural kind. For the sake of the argument, assume that computers are not a natural kind. This is certainly the orthodox view on physical computation, succinctly expressed by Churchland and Sejnowski:

[T]here is no intrinsic property necessary and sufficient for all computers, just the interest-relative property that someone sees value in interpreting a system’s states as representing states of some other system, and the properties of the system support such an interpretation. [Churchland and Sejnowski 1992, p. 65]
Interestingly, there is unanimity on this point across otherwise largely incompatible camps. Searle formulates and defends a very similar position: A physical state of a system is a computational state only relative to the assignment to that state of some computational role, function, or interpretation. [N]otions such as computation [...] do not name intrinsic physical features of systems. Computational states are not discovered within the physics, they are assigned to the physics. [Searle 1992, p. 210]
And again, constraints on the notion of implementation may prevent the possibility that every physical system can be interpreted as computing every function, but they can’t save computation from becoming an extrinsic property. What would be the consequences of such a view for PCT? If computation is not individuated by intrinsic properties, then there is no fact of the matter whether a physical system is a computational device or not. If there is no fact of the matter whether a physical
system computes or not, if computation is an extrinsic, observer-relative property of physical systems, then the full force of the epistemic objections to physical hypercomputation from the preceding sections kicks in. Epistemic objections to invalidating PCT don’t usually cause hypercomputationalists much discomfort because physics is concerned with what there is, the nature of things and their relations. If, however, computation is not an intrinsic property of physical systems, the computational description and the system’s actual computational behavior become inseparable. In other words, we can’t describe a system as computing some observationally accessible finite set of ordered pairs ϕ but at the same time claim that it is in fact computing the super-Turing function ψ ⊃ ϕ, because over and above the description as computing ϕ there is no fact of the matter whether it computes anything at all. Physical systems do not compute any function independent of or beyond a description as computing that function. Such a description must be based on observational data and consequently it is meaningless to say that a physical system is hypercomputational in a realist sense while being observationally inaccessible. PCT then is a claim about the limits to our methodology of ascribing computational content to physical systems, and no longer a claim about the limits of computation simpliciter; epistemic and ontological aspects of physical computation collapse. Computation cannot be a purely ascriptional, observer-dependent notion and at the same time PCT be a matter of fact, an open question of contemporary physics. Under such a conception of computation, PCT is still a meaningful, empirical project, but our epistemic procedures on which the ascription of computational content is based become a central issue of that project, indeed the very object of investigation.
To the extent that our epistemic procedures depend on observation and experiment, PCT remains a matter of physics, but in a very different sense than initially envisaged. So far, the trivialization of computation was merely presented as a challenge, not as an established fact. It should be obvious, however, that the trivialization problem needs to be addressed in order to clarify the very meaning of PCT. Otherwise the debate over non-computability in physical systems seems ill-founded. The conception of physical computation as an extrinsic, observer-relative property is
currently predominant, despite an important proviso.42 There is a distinction between information processing and computation, which marks off the difference between photosynthesizing flowers growing towards the sun, brains coordinating the sensorimotor behavior of frogs catching prey, etc. on the one hand, and physical systems harnessed for human computational purposes on the other. It is precisely the distinction between intrinsic, causally efficacious processes and processes externally interpreted as computations, relating ascribed representational content. A similar stance towards computation is implicit in Wittgenstein’s remarks on Turing machines. In [1980, §1096] Wittgenstein exclaims: “Turing’s ‘machines’. These machines are humans who calculate”. To compute a numerical function is to follow certain arithmetic rules. As is well-known, Wittgenstein argued that rule-following is an inherently normative activity. It requires the competence to explain, justify and correct one’s behavior. Thus, computation involves a whole host of normative concepts. Turing’s characterization of computation is an attempt to naturalize the normativity of rule-following behavior. According to Wittgenstein, however, for an agent (or mechanism) to compute, it is insufficient to just come up with the ‘correct’ result. Computation involves rule-following which is tied to regularity, but it does not reduce to regularity.43 Hence, computation is not an intrinsic feature of systems which behave in a regular fashion. Assuming the right kind of causally related successive internal states which produce such regular behavior is not constitutive of a computing system either. On Wittgenstein’s view, not even Turing machines compute—they merely display regular input-output behavior; computation as satisfaction of behavioral criteria is a conceptual confusion. Characterizing computation as an extrinsic, observer-relative notion has manoeuvered us into a peculiar situation. 
We should not give primacy to digital computers over other mechanical and physical systems which work in a fashion that is reliable and deterministic enough to be harnessed for computational purposes. After all digital computers are physical systems themselves. The transition from logically inspired artificial systems to physically inspired models of natural systems, interpreted as computational devices, is no less a trivialization than a very welcome liberation. To name only a few, quantum computing, Hopfield networks, membrane and DNA computing, liquid computation, etc. tremendously enrich the universe of computational paradigms. Likewise, the computational analysis of natural systems has become a major research strand, not only in physics but also in chemistry, biology, genetics, and neuroscience. But if computation is ultimately just systematically interpreted causal behavior, and there is no fact of the matter whether a system computes or not, then the scope of physical computation will be delimited by those input-output relations between measurable quantities which are observationally accessible and experimentally determinable. And then the truth of PCT will almost certainly be the right horse to back.

42 As a consequence, ‘physical computation’ should rather be called ‘computational physics’, to emphasize the ascriptional nature of computation.

43 Cf. [Shanker 1987] and the extensive and insightful discussion of Wittgenstein’s attitude towards Turing’s work therein.
6. A Note on Terminology

The inhomogeneous use of terminology in the discussion of physical computation tends to obscure subtle but important distinctions and makes it difficult to contrast positions and evaluate arguments. Hence, it is impairing the quality of the debate around PCT. For this reason we propose a prescriptive scheme for the use of relevant terminology (see figure below). This scheme is neither complete, nor absolute. Further conceptual clarification will reveal that it is inadequate in some respects, that it needs adjustment and repair. Notwithstanding, we believe the debate is in need of some unification of terminology and we offer this provisional scheme as a starting point and intuition pump, which is compatible both with common sense usage and some of the more technical definitions. Many of the sketched relations are self-explanatory and hardly controversial. Others, such as the notions of instantiation and implementation, require elucidation. For instance, a particular algorithm, or virtual machine, A, for computing function ϕ is an instantiation of the infinite class of Turing machines which represents the abstract computation of ϕ. This is compatible with computer science as well as platonist terminology. A specific Turing machine e, on the other hand, can be viewed as an idealization of a physical system P if that system implements an algorithmic system A which in turn is an instantiation of the equivalence class of TMs represented by e. P implements such a system A if the causally governed observable behavior of the system P can be homomorphically interpreted as A’s computation of a numerical function. Thus, the diagram reflects the inherent dependence of physical computation on observation and interpretation.
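As a minimal sketch of what a homomorphic interpretation might amount to for finite systems, the following reads it as a state mapping that commutes with the dynamics: interpreting a physical state and then taking one abstract step gives the same result as taking one physical step and then interpreting. This reading, the six-position dial, and all names in the code are assumptions introduced for illustration; the text itself leaves the notion of implementation in need of further clarification.

```python
def is_implementation(phys_states, phys_step, abs_step, interpret):
    """Check that the interpretation commutes with the dynamics on every
    state: interpret(phys_step(s)) == abs_step(interpret(s))."""
    return all(interpret(phys_step(s)) == abs_step(interpret(s))
               for s in phys_states)

# Hypothetical physical system P: a dial with six positions that
# advances by one notch per time step.
DIAL = range(6)

def advance(s):
    return (s + 1) % 6

# Two hypothetical algorithmic systems A.
def flip(b):
    return 1 - b            # one-bit toggle

def count3(k):
    return (k + 1) % 3      # modulo-3 counter

print(is_implementation(DIAL, advance, flip, lambda s: s % 2))    # True
print(is_implementation(DIAL, advance, count3, lambda s: s % 3))  # True
print(is_implementation(DIAL, advance, flip, lambda s: s // 3))   # False
```

The same dial supports two different interpretations, as a toggle and as a modulo-3 counter, while a third mapping fails the check. In this toy setting the interpretation is one-many yet not arbitrary, and which computation the dial 'performs' depends on the mapping an observer supplies.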
[Figure 1 (diagram not reproduced). Labels recoverable from the diagram: Differential equations, Machine model, Idealization, Approximation, Instantiation, Realization, Abstraction, Physical system, Causation, Observable behavior, Implementation, Algorithmic system, Emulation, Simulation, Computation, Interpretation, Numerical function.]

Figure 1: Proposed unification of terminology

Equally important, the purpose of the diagram is to regulate the liberal use of misleading terminology frequently encountered. For example, Turing machines don’t simulate physical systems; abstract dynamical systems, given by sets of differential equations, don’t compute numerical functions; physical systems don’t realize or instantiate computations. In fact, according to the diagram physical systems by themselves don’t compute at all, but can be understood as computing via the notion of implementation, which involves the ascription of information processing capacities, and which itself is in need of conceptual clarification.44 This philosophical task must be carried out before PCT can ascend to the level of truth and falsehood.

44 Even desktop personal computers are physical systems which compute only because they are used by humans, who supply a semantic interpretation of the system’s behavior. To inflect a Wittgensteinian dictum, a computation is what it is used for.
Concluding Remarks

The status of the Physical Church–Turing Thesis is a complex issue which has spawned considerable interest, with many proposals for physical hypercomputation having cropped up in recent years. Sometimes these are accompanied by pithy statements (“super-recursive computation is like a rocket that can take people beyond the Church–Turing Earth” [Burgin 2001, p. 7]) and high aspirations which aren’t cashed in upon close inspection. To detect the Achilles’ heel, an underlying supertask, a violation of physical principles, an element of actual infinity or covert ineffectiveness, more often than not is left to the reader. In this paper we presented some rather basic philosophical concerns regarding attempts to falsify PCT, which may appear quite old-fashioned vis-à-vis the glamorous quest for hypercomputation, but which need to be addressed within the framework of a fully-fledged theory of physical computation. To recapitulate, these were the following:
(i) The notion of implementation is one-many and therefore inherently vague, albeit not arbitrary. The approach towards determining the scope of computability within physics merely shifts the burden of explication to identifying constraints on setting up structural homomorphy.
(ii) Non-computability in nature appears to be experimentally inaccessible; programmability seems crucial for the justification of behavioral descriptions in function-theoretic terms.
(iii) Several epistemic asymmetries between PCT and ordinary physical theories indicate that the alleged falsehood of PCT is a statement very unlike any other scientific theory we currently believe in.
(iv) The idea to invalidate PCT by means of continuous computation is built on quicksand: not so much because infinite precision would be impossible or undetectable, but because it rests on controversial assumptions about the nature of scientific theories and the ontological status of the continuum as a mathematical model.
(v) There are no principles for physical computation—no criteria of intrinsic identity—and physical computation cannot be detached from a human observer. Not only can physical systems be described as computing in many different ways, but without a description as computing, no system is computing; an appropriate ascription of computational content is a necessary condition for a system to compute. Thus, physical computation is not a natural phenomenon, or an objective property of systems, but merely a mode
of description for their behavior and therefore essentially dependent on and constrained by human epistemic procedures. Consequently, the existence of non-computability in nature is, unlike a rare insect, not susceptible to discovery. In contrast, Copeland and Sylvan surmised that it would be “one of the greatest astonishments of science if the activity of Mother Nature were never to stray beyond the bounds of Turing-machine computability” [Copeland and Sylvan 1999, p. 64]. Given the indicated vagueness of implementation and the observer-dependence of computation this claim loses its apparent intelligibility; Mother Nature doesn’t compute. Setting aside these qualms, contemporary physics indeed does not indicate that non-computability in nature would be an anomaly, a monstrosity, or even an impossibility. But this is because contemporary physics does not mention computation as a natural phenomenon. The conditions for implementing hypercomputational models proposed to date, on the other hand, are physically problematic at best. Moreover, if there is some justification for the presented epistemic challenges, it would be equally surprising if we could ever reliably identify or utilize non-recursive sources in nature.45 Unless these challenges are met, a refutation of Church’s Thesis via physical processes remains the notorious Kleenean ‘pie in the sky’.46 We briefly want to indicate how these challenges could be met. In his seminal work, Marr [1982] discerned three levels of analysis for information-processing systems. Ever since it has become customary to analyze natural computation in Marr’s terms. We ascribe representational content and ‘syntactic’ properties to a physical system, interpret its behavior as a computational process, and view both these levels of analysis as distinct from the physical, the implementational level. 
It is this descriptive gap between the computational and the implementational level which seems to necessitate an external observer in order to make systematic sense of the information processing capacities of physical systems, harboring many of the outlined problems. Marr’s distinction between autonomous levels of analysis, however, may be inadequate for physical computation. Perhaps we are blind to the existence of non-computability in nature because we erroneously superimpose Marr’s descriptive framework onto physical systems. Collapsing Marr’s levels of description would inevitably lead to a non-algorithmic, purely causal theory of physical computation and mark a radical departure from orthodoxy.47 To work out such a theory is a difficult task with unknown chances of success.

Let’s finally put PCT into a historical perspective. A function is computable if there is a computational process to evaluate it. How one interprets the inconspicuous word ‘is’ in this context is critical. Much of the current debate over physical computation revolves around the possible existence of hypercomputational systems in a realist, ontological sense. Disputants are striving to refute PCT theoretically, without the prospect of ever being able to harness such a system for actual computation, even though our best scientific theories may warrant its existence. Yet, more than any other intuitive concept which has in the history of mathematics been reconstructed with formal rigor, computability involves an agent and carries strong modal and epistemic connotations. Church and Turing attempted to give a formal explication of computability precisely to answer the question whether a calculating human being has the ability to decide the validity of first-order logic expressions. Replacing Church’s human calculator by physical systems does not eliminate these modal and epistemic aspects of computability. In light of this, putting forward purely theoretical arguments that non-computability in nature is nomologically possible or even likely to exist, without being observationally accessible or useful for any practical purposes, appears to be a misguided project from the outset.

45 Presumably, if PCT will ever be conclusively falsified it will be due to insights from logic, a novel type of effective procedure whose physical implementation can instantly be apprehended.

46 This is not to deny, of course, the enormous ramifications the very notion of super-Turing computation has for mathematics, theoretical computer science, and perhaps even for philosophy, independent of the truth value of PCT.
47 Related views are currently gaining momentum in cognitive science where, incidentally, misinterpretations of the Church–Turing Thesis have done most damage.

References

Abramson, F.G. [1971], “Effective Computation over the Real Numbers”, in Twelfth Annual Symposium on Switching and Automata Theory, Institute of Electrical and Electronics Engineers, Northridge, CA.
Baer, R.M. [1995], “ET and an Infinitary Church’s Thesis”, The Mathematical Intelligencer 17(3), 57–61.
Bennett, C.H. and Landauer, R. [1985], “The Fundamental Physical Limits of Computation”, Scientific American 253(1), 48–56.
Blum, L., Shub, M., and Smale, S. [1989], “On a Theory of Computation and Complexity over the Real Numbers”, Bulletin of the American Mathematical Society 21(1), 1–46.
Bowie, G.L. [1973], “An Argument against Church’s Thesis”, Journal of Philosophy 70, 67–76.
Boyle, F. [1994], “Computation as an Intrinsic Property”, Minds and Machines 4(4), 379–389.
Branicky, M.S. [1995], “Universal Computation and other Capabilities of Hybrid and Continuous Dynamical Systems”, Theoretical Computer Science 138(1), 67–100.
Bringsjord, S. and Zenzen, M. [2003], “Superminds: People Harness Hypercomputation, and More”, Studies in Cognitive Systems, vol. 29, Kluwer Academic Publishers, Dordrecht.
Burgin, M. [2001], “How We Know What Technology Can Do”, Communications of the ACM, 44(11).
Bush, V. [1931], “The Differential Analyser: A New Machine for Solving Differential Equations”, Journal of the Franklin Institute 212, 447–488.
Chalmers, D.J. [1994], “On Implementing a Computation”, Minds and Machines 4, 391–402.
Chalmers, D.J. [1996], “Does a Rock Implement Every Finite-State Automaton?”, Synthese 108, 309–333.
Church, A. [1936], “An Unsolvable Problem of Elementary Number Theory”, American Journal of Mathematics 58, 345–363, reprinted in M. Davis [1965], The Undecidable, pp. 88–107.
Churchland, P.S. and Sejnowski, T. [1992], The Computational Brain, MIT Press, Cambridge, MA.
Cooper, S.B. and Odifreddi, P. [2003], “Incomputability in Nature”, in Computability and Models: Perspectives East and West, (S.B. Cooper and S.S. Goncharov eds.), Kluwer Academic/Plenum Publishers, pp. 137–160.
Copeland, B.J. [1996a], “The Church–Turing Thesis”, in Stanford Encyclopedia of Philosophy (J. Perry and E. Zalta eds.).
Copeland, B.J. [1996b], “What Is Computation?”, Synthese 108, 335–359.
Copeland, B.J. [1997], “The Broad Conception of Computation”, American Behavioral Scientist 40, 690–716.
Copeland, B.J. [1998], “Even Turing Machines Can Compute Uncomputable Functions”, in Unconventional Models of Computation, Proceedings of the 1st International Conference, New Zealand, (C.S. Calude, et al. eds.), Springer, pp. 150–164.
Copeland, B.J. [2000], “The Modern History of Computing”, in Stanford Encyclopedia of Philosophy, (J. Perry and E. Zalta eds.).
Copeland, B.J. [2002], “Hypercomputation”, Minds and Machines 12, 461–502.
Copeland, B.J. and Sylvan, R. [1999], “Beyond the Universal Turing Machine”, Australasian Journal of Philosophy 77(1), 46–67.
Cotogno, P. [2003], “Hypercomputation and the Physical Church–Turing Thesis”, British Journal for the Philosophy of Science 54, 181–223.
da Costa, N.C.A. and Doria, F.A. [1991], “Classical Physics and Penrose’s Thesis”, Foundations of Physics Letters 4, 363–374.
da Costa, N.C.A. and Doria, F.A. [1994], “Undecidable Hopf Bifurcation with Undecidable Fixed Point”, International Journal of Theoretical Physics 33, 1913–1931.
Davis, M. [1958], Computability and Unsolvability, McGraw-Hill, New York.
Davis, M. [1982], “Why Gödel Didn’t Have Church’s Thesis”, Information and Control 54, 3–24.
Davis, M. [2004], “The Myth of Hypercomputation”, in Alan Turing: Life and Legacy of a Great Thinker, (C. Teuscher ed.), Springer, Berlin, pp. 195–212.
Delchamps, D.F. [1995], “Harnessing the Power of the Continuum. Asynchrony, Emergence and Church’s Thesis”, Nonlinear Science Today.
Deutsch, D. [1985], “Quantum Theory, Church–Turing Principle and the Universal Quantum Computer”, Proceedings of the Royal Society, Series A 400, 97–117.
Deutsch, D. [1997], The Fabric of Reality, Allen Lane, New York.
Devitt, M. [1991], Realism and Truth, Blackwell, 2nd edition.
Dietrich, E. [2001], “The Ubiquity of Computation”, Psycoloquy, 12(40).
Dreyfus, H.L. [1997], What Computers Still Can’t Do: A Critique of Artificial Reason, MIT Press, Cambridge, MA, 5th edition.
Etesi, G. and Németi, I. [2002], “Non-Turing Computations via Malament–Hogarth Space-Times”, International Journal of Theoretical Physics 41, 341–370.
Fields, C. [1996], “Measurement and Computational Description”, in Machines and Thought: The Legacy of Alan Turing, vol. I, (P.J.R. Millican and A. Clark eds.), Oxford University Press.
Fitz, H. [2001], Church’s Thesis. A Philosophical Critique of the Foundations of Modern Computability Theory, Master’s thesis, Free University Berlin, Germany.
Fouché, W.L. [2000], “Arithmetical Representations of Brownian Motion I.”, Journal of Symbolic Logic 65(1), 421–442.
Gandy, R. [1980], “Church’s Thesis and Principles for Mechanisms”, in The Kleene Symposium, (J. Barwise, H.J. Keisler, and K. Kunen eds.), North-Holland, Amsterdam, pp. 123–148.
Gandy, R. [1988], “The Confluence of Ideas in 1936”, in The Universal Turing Machine. A Half-Century Survey, (R. Herken ed.), Oxford University Press, pp. 55–111.
Glymour, C. [1996], “The Hierarchies of Knowledge and the Mathematics of Discovery”, in Machines and Thought: The Legacy of Alan Turing, vol. I, (P.J.R. Millican and A. Clark eds.), Oxford University Press.
Goel, V. [1992], “Are Computational Explanations Vacuous?”, in Proceedings of the Fourteenth Annual Conference of the Cognitive Science Society, Hillsdale, NJ, Lawrence Erlbaum.
Gold, M.E. [1965], “Limiting Recursion”, Journal of Symbolic Logic 30(1), 28–48.
Hamkins, J.D. and Lewis, A. [2000], “Infinite Time Turing Machines”, Journal of Symbolic Logic 65(2), 567–604.
Hansson, S.O. [1985], “Church’s Thesis as an Empirical Hypothesis”, International Logic Review 16, 96–101.
Harnad, S. [1994], “Computation Is Just Interpretable Symbol Manipulation. Cognition Isn’t.”, Minds and Machines 4(4), 379–390.
Hogarth, M. [2004], “Deciding Arithmetic Using SAD Computers”, The British Journal for the Philosophy of Science 55(4), 681–691.
Horgan, T. and Tienson, J. [1996], Connectionism and the Philosophy of Psychology, MIT Press, Cambridge, MA.
Kelly, K. [2001], “The Logic of Success”, The British Journal for the Philosophy of Science 51, 639–666.
Kreisel, G. [1982], Review of [Pour–El and Richards, 1979; 1981], Journal of Symbolic Logic 47, 900–902.
Lloyd, S. [2000], “Ultimate Physical Limits to Computation”, Nature 406, 1047–1054.
Lucas, J.R. [1964], “Minds, Machines, and Gödel”, in Minds and Machines, (A.R. Anderson ed.), Englewood Cliffs, New Jersey.
Maass, W. and Orponen, P. [1998], “On the Effect of Analog Noise on Discrete Time Analog Computations”, Neural Computation 10, 1071–1095.
Marr, D. [1982], Vision: A Computational Investigation into the Human Representation and Processing of Visual Information, W.H. Freeman, San Francisco.
McCarthy, T. and Shapiro, S. [1987], “Turing Projectability”, Notre Dame Journal of Formal Logic 28(4), 521–535.
Moore, C. [1990], “Unpredictability and Undecidability in Dynamical Systems”, Physical Review Letters 64, 2354–2357.
Moore, C. [1996], “Recursion Theory on the Reals and Continuous-Time Computation”, Theoretical Computer Science 162, 23–44.
Mundici, D. and Sieg, W. [1995], “Paper Machines”, Philosophia Mathematica 3(3), 5–30.
Myhill, J. [1966], “Creative Computation Revisited”, technical report, USAF.
Odifreddi, P. [1996], “Kreisel’s Church”, in Kreiseliana, (P. Odifreddi ed.), AK Peters, Ltd., Wellesley, MA, pp. 389–417.
Ord, T. [2002], Hypercomputation: Computing more than the Turing Machine, Master’s thesis, University of Melbourne, Australia.
Penrose, R. [1995], Shadows of the Mind, Random House, London.
Pitowsky, I. [1990], “The Physical Church Thesis and Physical Computational Complexity”, Iyyun. Jerusalem Philosophical Quarterly 39, 81–99.
Pour–El, M.B. [1999], “The Structure of Computability in Analysis and Physical Theory: An Extension of Church’s Thesis”, in Handbook of Computability Theory, (E.R. Griffor ed.), Elsevier, Amsterdam, pp. 449–471.
Pour–El, M.B. and Richards, I. [1979], “A Computable Ordinary Differential Equation Which Possesses No Computable Solution”, Annals of Mathematical Logic 17, 61–90.
Pour–El, M.B. and Richards, I. [1981], “The Wave Equation with Computable Initial Data such that its Unique Solution is not Computable”, Advances in Mathematics 39, 215–239.
Putnam, H. [1988], Representation and Reality, MIT Press, Cambridge, MA.
Rabin, M.O. [1980], “Probabilistic Algorithm for Testing Primality”, Journal of Number Theory 12, 128–138.
Rosen, R. [1988], “Effective Procedures and Natural Law”, in The Universal Turing Machine. A Half-Century Survey, (R. Herken ed.), Oxford University Press, pp. 523–537.
Schonbein, W. [2005], “Cognition and the Power of Continuous Dynamical Systems”, Minds and Machines 15, 57–71.
Searle, J.R. [1992], The Rediscovery of the Mind, MIT Press, Cambridge, MA.
Shanker, S.G. [1987], “Wittgenstein versus Turing”, Notre Dame Journal of Formal Logic 28(4), 615–649.
Shannon, C.E. [1941], “Mathematical Theory of the Differential Analyser”, Journal of Mathematics 20, 337–354.
Shapiro, S. [1981], “Understanding Church’s Thesis”, Journal of Philosophical Logic 10, 353–365.
Sieg, W. [1994], “Mechanical Procedures and Mathematical Experience”, in Mathematics and Mind, (A. George ed.), Oxford University Press, pp. 71–117.
Siegelmann, H. and Sontag, E. [1994], “Analog Computation via Neural Networks”, Theoretical Computer Science 131(2), 331–360.
Siegelmann, H. and Sontag, E. [1995], “On the Computational Power of Neural Nets”, Journal of Computer and System Sciences 50(1), 132–150.
Soare, R.I. [1999], “The History and Concept of Computability”, in Handbook of Computability Theory, (E.R. Griffor ed.), Elsevier, Amsterdam, pp. 3–37.
Stannett, M. [1990], “X-Machines and the Halting Problem: Building a Super-Turing Machine”, Formal Aspects of Computing 2, 331–341.
Stannett, M. [2004], “Hypercomputational Models”, in Alan Turing: Life and Legacy of a Great Thinker, (C. Teuscher ed.), Springer, Berlin, pp. 135–152.
Stewart, I. [1991], “Deciding the Undecidable”, Nature 352, 664–665.
Teuscher, C. and Sipper, M. [2002], “Hypercomputation: Hype or Computation?”, Communications of the ACM 45(8), 23–24.
Thomas, W.J. [1972], Church’s Thesis and Philosophy, PhD thesis, Case Western Reserve University.
Trautteur, G. [2000], “Analog Computation and the Continuum-Discrete Conundrum”, in Grounding Effective Processes in Empirical Laws. Reflections on the Notion of Algorithm, (R. Lupacchini and G. Tamburrini eds.), Bulzoni Editore, Bologna, pp. 23–42.
Trautteur, G. [2005], “Beyond the Super-Turing Snare: Analog Computation and Digital Virtuality”, in New Computational Paradigms. First Conference on Computability in Europe, CiE 2005, Amsterdam, vol. 3526 of LNCS, (B.S. Cooper, B. Löwe, and L. Torenvliet eds.), Springer, pp. 507–514.
Turing, A.M. [1936], “On Computable Numbers with an Application to the Entscheidungsproblem”, Proceedings of the London Mathematical Society, Series 2 42, 230–265, reprinted in M. Davis [1965], The Undecidable, pp. 116–151.
Vergis, A., Steiglitz, K., and Dickinson, B. [1986], “The Complexity of Analog Computation”, Mathematics and Computers in Simulation 28, 91–113.
Wang, H. [1993], “On Physicalism and Algorithmism: Can Machines Think?”, Philosophia Mathematica 1(2), 97–138.
Wittgenstein, L. [1980], Remarks on the Philosophy of Psychology, vol. I, Blackwell, Oxford, transl. by G.E.M. Anscombe.
Janet Folina∗

Church’s Thesis and the Variety of Mathematical Justifications

∗ J. Folina, Macalester College.

1. Introduction

My son goes to a Montessori pre-school, and his classroom has both a “binomial cube” and a “trinomial cube”. It’s great: the kids get to play with blocks and do mathematics at the same time. But what is the nature of the mathematics they are so doing? The point of materials like these is to provide sensorial access to mathematical concepts and numerical equations. But does successfully putting the blocks of, say, the binomial cube together constitute a proof that $(a + b)^3 = a^3 + 3ab^2 + 3a^2b + b^3$? What are the relationships between the various entries into mathematics, types of mathematical understanding and justification, and proofs?

Along with binomial and trinomial cubes, Church’s Thesis raises questions about how to understand mathematical evidence that falls outside of traditional deductive proofs. It has been argued that Church’s Thesis [CT] is: proved [Gandy 1988]; partly proved [Mendelson 1990, Black 2000]; provable [Black 2000]; potentially provable [Mendelson 1990, Shapiro 1993]; unprovable but true [Mendelson 1963, Folina 1998]; contingent but possibly true [Cleland 1993]; and even false [Bowie 1973, Cleland 1993]. How can there be such a variety of opinions about a mathematical claim? If Church’s Thesis is mathematically proved, or even provable, it is hard to see how anyone could argue that it is contingent, never mind false. Mathematicians do not usually have disagreements of this sort, which suggests that there is a fundamental confusion (or two) here. Are people understanding “Church’s Thesis” in different ways? Are they understanding “proof” in different ways? It seems that the answer to at least one of these questions must be “yes”, but which one(s)?

I think the answer is “yes” to both questions. There are at least two central equivocations underlying the variety of opinions on the status of Church’s Thesis. One muddle is over CT itself; and the other is over the concept of mathematical proof. I will discuss both of these confusions, emphasizing the latter. They both impact the CT literature, and in particular the issue of CT’s justificatory status. The concept of mathematical proof, however, is important not just for understanding CT, but for the philosophy of mathematics more generally. I begin by trying to detangle some of the background concepts.
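For reference, the identity that the binomial cube is meant to embody can be derived symbolically as follows. This is the standard textbook expansion, added as an illustration and not part of the original argument; the reading of each term as a count of blocks (one cube of edge $a$, three $a \times a \times b$ prisms, three $a \times b \times b$ prisms, one cube of edge $b$) is an assumption about how the material is usually described.

```latex
\begin{align*}
(a+b)^3 &= (a+b)(a+b)^2 \\
        &= (a+b)(a^2 + 2ab + b^2) \\
        &= a^3 + 2a^2b + ab^2 + a^2b + 2ab^2 + b^3 \\
        &= a^3 + 3ab^2 + 3a^2b + b^3 .
\end{align*}
```

Each line follows from the previous one by distributivity and the collection of like terms. Whether fitting the eight blocks together amounts to this derivation, or only illustrates it, is exactly the kind of question raised above.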
2. What are some of the Background Issues?

I did not go to pre-school, Montessori or otherwise. I had never seen a trinomial cube before a recent parent information session at my son’s school. The visual and tactile representations of mathematical concepts and equations provide a sensorial basis, which is designed to supplement and deepen the eventual cognitive understanding of the standard equation/symbolic-based presentations. There is no question that there are valuable representations, presentations, tasks, activities, and instantiations that are not part of standard “rigorous”, axiomatic, or “extensional”, mathematics. What is still in question, however, is just what the relationships are between intensional and extensional concepts in mathematics; between visual representations and deductive proofs; between understanding and justifying; between informal intuitive concepts and precise definitions. Furthermore, these different relationships can get entangled in the arguments concerning the status of CT. I will first try to distinguish some of the main underlying questions, as a step towards clarifying what the disagreements are really about.

a. Intuitive, informal concepts versus precise, rigorous definitions. In 1963 Mendelson defended the traditional conception of CT as true but unprovable. In 1990 he made a surprising argument against his earlier stance. The later argument disputes the distinction between precise and imprecise mathematical concepts on which the traditional conception of CT (which he had earlier defended) is based. And, rather than argue that effective calculability is precise,
he argued the opposite in 1990: that it’s no less precise, or no more vague, than many other concepts used in mathematics. So, he concludes, the prejudice in favor of set theoretic definitions is misguided. The problem with this line of thinking, however, is that it ignores the progress made in mathematics during the last 200 or so years. Has no progress really been made? The thought that set theoretic definitions are only arbitrarily regarded as clearer than the intuitive notions of calculability, function, and so on, seems wrong. But it is worth noting that there is a similar, though not equivalent, distinction that is used to make an opposing claim.

b. Intensional versus extensional concepts. Extensional concepts, in the sense explained by Nicolas Goodman [1986], are those that can be clearly defined in mathematics so that the truth-values of sentences containing the relevant terms “are determined only by the denotations or referents [...] and not by their connotations or senses.” [Goodman, p. 475] Intensional mathematical concepts, he argues, include “provable”, “it is known that”, “able to be calculated”, etc. Viewed under this distinction, Church’s Thesis can be seen as identifying an intensional concept (effective calculability) with an extensional one (lambda definable functions, Turing computable functions, etc.). This distinction may appear to be a more specialized version of the above between intuitive and precise mathematical concepts. Goodman then argues that there is more to mathematics than what can be articulated with extensional concepts. For example, “talk about” mathematics is a genuine part of mathematics, though it cannot in general be reduced to or replaced by extensional language without loss of content [section 1]. This includes CT.
Now one might expect Goodman to argue here, similarly to Mendelson [1990], that since there is more to mathematics than what’s extensional, there is more to mathematical proof than extensional, set theoretic, proofs. But instead Goodman argues that if interpreted as a strict identity, then CT is false, for “talk about the algorithm is not merely talk about the behavior of a machine.” [p. 484] And he believes that this shows that effective calculability, or algorithm, and Turing computability are not strictly identical. Well, “talk about” is rather vague. But I think the main claim is that owing to essential differences between intensional and extensional concepts, identifications such as CT cannot be thought of as strict synonyms.
Yet it is not at all clear that this is the correct way to understand CT in the first place. The issue is muddied by Church himself who uses two intuitive concepts in his original paper: “the intuitive concept” [section 1; title of section 7] and “effectively calculable function of positive integers” [section 7]. Whether or not the original intuitive concept is truly intensional, however, seems to depend on which of these concepts is being explicated. As Mendelson [1990] points out, since the concept of function is already extensional (given mathematical developments in the 18th and 19th centuries), the concept of “effectively calculable function”, though intuitive, may not be intensional at all [p. 225]. There is thus some reason to think that CT was intended all along as a connection between two extensional concepts; and the aim was merely to come up with an account “which is thought to correspond satisfactorily to the somewhat vague intuitive notion.” [Church section 1] The “notion of computability” may well be intensional, as Bowie [1973] also argues. But that may be beside the point of CT.

c. Conceptual analyses versus partial definitions. Owing to the above differences, Goodman also argues that CT is not a conceptual analysis. An extensional account “at most provides a necessary condition for the existence of an algorithm [...] However, a Turing machine program without additional explanation is not an algorithm, and an algorithm is not as it stands a Turing machine program.” [p. 487] Owing to what might be called the “residue” in the intensional concept of “algorithm”, the various extensional accounts may denote the same functions as the intuitive concept, but they do not provide a full analysis of the intuitive concept. So, as with the argument above, CT is misunderstood as a precise rendering of an imprecise concept.
It may not be immediately obvious what’s so objectionable about calling CT a “conceptual analysis”, rather than a claim about extensional equivalence. But it does seem true that any one of the mathematical procedures proposed is narrower than the general idea (with the possible exception of Turing computability, which seems both quite general and intensionally closer). It is the distinction between how we calculate and what gets calculated. “Effective calculability” seems to be about a way of calculating, a kind of procedure. The extensional accounts are by nature indifferent to the way the functions are computed. I’m not sure, however, about the depth or significance
of this distinction, especially if CT is, as Mendelson [1990] asserts, really a set-theoretic identification between two extensions.

d. Understanding versus justifying. This distinction as I will use it comes from outside the CT literature, but it is relevant to the general issue of mathematical justification. For example, Giaquinto’s work (e.g., [1992]) on the use of diagrams, and in particular visualizing them, shows that there may be gradations of justification in geometry alone. Some appeals to diagrams seem genuinely nonjustificatory, and may be strictly preliminary to justifying. For example, diagrams can enhance or even generate an understanding of a mathematical concept or theorem. Thought experiments using diagrams might convince one of a claim without actually justifying it. Diagrams can also assist in fixing the singular terms used in a proof (as is done in traditional Euclidean geometry proofs). But there are also cases where the diagram, or our thinking about it, carries genuine justificatory weight, in that it can enable us to discover a geometric truth. I will consider Giaquinto’s example below. For the moment I will simply agree with him that visualizing can carry justificatory weight without constituting a proof.

A related issue is that of understanding more generally in mathematics, which is more than following each individual step of a proof. This might include understanding the sense of the whole of a proof; of the point of a theorem; of a theorem’s role in the larger system of mathematics; or having a sense of which theorems might be fruitful, and thus worth proving. None of this is articulable in what Goodman calls “extensional” math. So of course he is right that extensional mathematics is a mere fragment of mathematical activity in general. The implications of this for CT are addressed below, in sections 4–6. First let us also clarify what Church’s Thesis is, or was.
3. What was CT?

Church’s original aim was to propose a mathematical definition of the informal concept of “effective calculability”; and the definition he proposed was lambda definability. “The purpose of the present paper is to propose a definition of effective calculability which is thought to correspond satisfactorily to the somewhat vague intuitive notion in terms of which problems of this class are often stated [...]” [Church 1936, section 1]. Calling this proposal “Church’s Thesis” is
thus in many ways already misleading. Other candidates for defining the concept appeared around the same time, including Turing’s idea of a Turing machine and Gödel’s concept of recursive function. Some supporting arguments were more persuasive than others that the extensional definition successfully captures the informal concept. For example, Turing’s argument is generally recognized as very persuasive. But nobody at the time regarded himself as stating a thesis that might later be mathematically proved or disproved. Even if, as Black claims, what is thesis-like about CT is the implicit claim that the definition is correct of the intuitive notion [Black p. 244], such a thesis is not mathematical in the sense of something like Goldbach’s conjecture. Rather than a claim about mathematical objects such as numbers or sets, CT (insofar as it is an assertion) is a claim about a concept. Assessing it requires conceptual analysis; and though the concepts involved are mathematical, the method of conceptual analysis is more philosophical than mathematical. Of course, mathematical methods can be used to assess a definition. For example, a definition can be shown faulty if it is inconsistent with the rest of a framework; and it can be shown to yield fruitful and (intuitively) true consequences.1 So we can prove things about definitions; but that is not (at least it is not obviously) the same as proving the definitions. Axioms can be similarly assessed.

Calling the proposals a “thesis” is just one issue. Another is that now, when people refer to CT, they do so in more than one way. Owing to the provable formal equivalence of the main extensional proposals, people feel free to use them interchangeably. This is perhaps harmless, but it is odd to refer to the proposals of Turing or Gödel as “Church’s thesis”.
More problematically, the “intuitive” side of the proposed equivalence ranges in the discussion literature from “algorithm”, “effectively calculable function of positive integers”, “effectively calculable mathematical function”, “effective calculability”, “effective computability”, and “effective procedure” to simply “computability”. Church himself used both the first and the third as his “target” concept, but they seem very different in that (as I explained above) one seems “extensional” and the other seems “intensional”. The problem with articulating the target concept differently is that it leads people to make different claims about CT.

1 Thanks to David McCarty for this objection.
For example, Cleland [1993] discusses three different interpretations of the proper domain of the Church–Turing thesis which have appeared in the literature: “(1) the number theoretic functions; (2) all functions; (3) mental and/or physical phenomena.” [Cleland, p. 285] She then argues that though the thesis is not provable on any interpretation, on interpretation (3) it is false; on interpretation (2) it could be false (and it is contingent); and even on interpretation (1) the conceptual analysis is not clear enough to determine a truth value. She believes we can envision “mundane procedures” that yield causal processes, which cannot be ruled out as possibly “computing” number theoretic functions deemed uncomputable by the Turing analysis. [pp. 286–7] The mundane procedure she investigates in her paper, as I suppose a kind of analogy to undermine even interpretation (1), is that of following a recipe in a cook book. However, to class “mundane” procedures, such as following a cooking recipe, as on a par with adding and subtracting positive integers seems based on a rather thin analogy, one that is far from the original point of CT. Indeed, Cleland’s version of the “target” intuitive concept of CT throughout most of the paper seems to be “effective procedure”. This does link Turing machines and recipes since she analyzes a procedure intensionally, as “something to be followed”; and she interprets “effective” in a highly general way too. [p. 287] But these links are not very useful, or informative. As Rosser points out, “effective” in CT “is used in the rather special sense of a method each step of which is precisely predetermined and which is certain to produce the answer in a finite number of steps.” [1939, p. 225] Sure, there is something in common between following a recipe and calculation: they are both embodiments of some kind of rule-following behavior.
But the class of things-to-be-followed is simply too general and broad to be a plausible substitute for “effectively calculable function”. In other words, mathematical answers are not really like cakes. Thus Cleland’s argument that CT is, or could be, false seems based on an overly wide version of the informal concept articulated by CT. The point was to characterize the nature of a delimited class of mathematical functions. It was not intended to apply to nonmathematical procedures such as causally manipulating medium sized physical objects, as she readily notes [p. 295]. Yet she persists in reasoning analogically, that “we have every reason to suppose that
the formal structure of causal processes is potentially richer than the formal structure of Turing machine programs, and, hence, good reason for withholding judgement on even the narrowest version of the Church–Turing thesis [...] the limits to computation are not logical; they are causal.” [p. 309] I would have said exactly the opposite: the limits to computation are not causal; they are logical (and mathematical). Starting with a variant of the intuitive concept targeted (effective procedure rather than effective calculability, or “effectively calculable function of positive integers” [Church, p. 100]) thus leads Cleland to mis-assimilate the procedure of baking a cake with that of calculating 3 + 3. Shifting the target intuitive concept in this way, and perhaps giving too much credit to misinterpretations of CT in the literature, only further obscures the nature of CT.
4. What is a Proof?

There is more to be said about the above issues, but for the rest of this paper I will focus on another equivocation, which is significant not just for CT but for the philosophy of mathematics more generally. This concerns the concept of mathematical proof. What I will call the “Euclidean” (though perhaps only recently spelled out) concept of mathematical proof is that a proof is a deductively valid argument in some axiomatic (or suitably well defined) system. A proof need not be completely formal or symbolic; and it may be a proof sketch, which jumps over many smaller steps. But a central requirement of a mathematical proof in this sense is the idea that, if spelled out, the inference steps are all valid deductive steps, each of which is syntactically checkable (in principle).

Differing opinions over the justificatory status of CT, in particular the debate between those who think it provable and those who think it unprovable but true, can often be traced back to different opinions over the concept of proof. Those who argue that CT is provable do so in one of two ways. Either they explicitly claim that the Euclidean concept of proof is too narrow, or too rigid; and they urge its enlargement to include a wider variety of mathematical justifications. Or they simply presuppose a wider, broader conception of proof in their arguments. Either way, in order to argue that CT is provable, they must support (implicitly or explicitly) a “non-Euclidean” concept of mathematical proof. Now, calling this an
“equivocation” is perhaps too strong, for it’s really a disagreement, or a lack of consensus. But my point is that disagreements about CT—about its justificatory status—are sometimes not about CT at all, but about the underlying concept of mathematical proof. In this sense there is something like an equivocation in the literature (if not always within individual papers). I will argue below that the debate over the justificatory status of CT thus involves this more general problem in philosophy of mathematics, which boils down to a lack of clarity, precision, and/or agreement over the central concept of mathematical proof. This lack of agreement muddies discussions of CT, picture proofs, etc. I will also argue that there are good reasons to use a narrow concept of proof and try to clarify the nature of the epistemic contributions made by other kinds of justifications. Rather than assuming that all mathematical justifications are or must be “proofs”, a finer-grained taxonomy here might lead to a better understanding of the differences between the epistemic contributions made by the variety of justifications in mathematics. A narrow concept of mathematical proof, far from yielding a narrow concept of mathematical justification, can actually encourage us to enrich it—by investigating the different kinds of justifications that can be understood as mathematical. Whatever light has been shed on the nature of justifications that fall outside the traditional twentieth century concept of “proof” (e.g., by arguments concerning the proof-status of CT, or concerning “picture proofs”), this light does not undermine the traditional conception of proof itself. So it does not undermine the traditional conception of CT as in principle unprovable.
5. How does the Concept of Mathematical Proof affect Discussions of CT?
A striking feature of the debates about whether or not CT is provable is that people appear to take the same facts and draw opposite conclusions from them. Let us focus on the Mendelson/Black view that the informality of “effective calculability” does not prevent us from proving CT. And let us contrast this with Goodman’s view that the informality/intensionality of “effective calculability” prevents CT from being (strictly) true. I will assume that we cannot prove false things (and in any case both Mendelson and Black believe that CT is true) so these two views are inconsistent conclusions about the same facts.
Complicating matters, we have two informal concepts to consider in determining whether or not part or all of CT can be proved: “effectively calculable function” and “proof”. In order to know what can be proved about effectively calculable functions we must know what it means to effectively calculate; but we also must know what it means to be a proof. Both concepts are informal, intensional, and contentious. And they are intertwined. For example, on the traditional account of proof, when fully written out the validity of each proof-step is not a matter of interpretation. In other words each proof step is syntactically determined, mechanically checkable, i.e., checkable by a Turing machine—if, that is, CT is true!
In arguing that CT only offers a partial reconstruction, and thus not an analysis, Goodman focuses on the informal nature of “effective calculability”—which is, after all, a concept about a type of mathematical process. When Mendelson and Black argue that CT is partially or wholly proved or provable, they are calling into question the informal concept of proof. They are calling for a wider (partial?) analysis, one that allows non-deductive steps. Both arguments play on the difference between mathematics and formal extensional mathematical methods. Both argue that it would be profitable to pay more attention to the informal arguments and “intensional” concepts—the non-extensional aspects—of mathematics. And yet they draw contradictory conclusions about CT. The question is: who is right? Perhaps neither. If Mendelson is right that the explicatum of CT is itself extensional—in that it is a concept about a sub-class of functions, where “function” is to be understood extensionally—then Goodman’s argument may represent a misunderstanding of the point of CT.
Perhaps Goodman is in part trying to correct this misunderstanding, but his claim that CT is not strictly true depends on understanding the explicatum of CT intensionally. On the other hand, the Mendelson/Black view that CT is so far from false it is provably true depends too heavily on a reconstrual of the concept of “proof” as extremely open-ended and informal. Even if CT is a true connection between two extensional mathematical notions, one of them is still intuitive and undefined (apart from CT itself). So it cannot play a role in any proof which requires that its cen-
tral concepts be mathematically defined. Yet the arguments for a more open-ended concept of “proof”—one that doesn’t require that its central concepts be defined, perhaps—are faulty or nonexistent. When arguments exist they typically appeal to the fact that there are convincing informal arguments such as the evidence on behalf of CT or what one can “clearly see”. And they urge that since these are convincing mathematical reasons, they are genuine mathematical justifications. So they must be proofs. Now, this way of summarizing these types of arguments may seem ad hominem, because, as represented, the arguments involve a rather obvious conflation between what is mathematical and what is provable. Or, another way to put it, as represented, they depend on a false dichotomy between what is mathematically provable and what is not mathematical at all. But this dichotomy is implicitly presupposed rather regularly in arguments for a less strict concept of mathematical proof. The reason, I believe, is that we have as yet no good way to articulate and understand the epistemic contributions made by mathematical justifications that are not proofs, i.e., informal, non-deductive mathematical evidence. In the final section of this paper I will make a case for more work on this interesting topic. First, however, I will sketch why neither Mendelson nor Black shows that CT is provable. (i) Mendelson’s arguments Mendelson’s strategy has two parts. First, he aligns CT with other things he dubs “theses” that are not ordinarily regarded as theses at all. In fact, in explaining these other theses Mendelson articulates them as “definitions” and “notions”. For example he includes as “theses”: the modern extensional “definition” of function; Tarski’s “definition” of truth; the model-theoretic “definition” of validity; Weierstrass’ “definition” of limit; the “notion” of measure; the “definition of dimension in topology”; the “definition” of velocity; etc. [pp. 
231–232] So Mendelson’s first step is to widen the notion of a “thesis” to include “definitions” and “notions” that have been accepted. Second, Mendelson claims that we can “prove” at least some of these theses. But in so claiming he presupposes, rather than establishes, a very broad, weak concept of proof that includes all sorts of mathematical evidence—evidence which on the traditional con-
ception we would regard as having a different pedigree from genuine proof. For example, to support his claim that “it is completely unwarranted to say that CT is unprovable” [p. 232] he first asserts (i) that it “deserves” to be accepted just as the other definitions above deserved to be and are accepted. But of course that a definition deserves to be accepted, or even is accepted, doesn’t constitute any sort of proof. He also argues (ii) that CT is not a connection between a vague notion and a precise definition. Rather, he asserts that all mathematical concepts, including that of set, are vague. However, even if this were true, it would not show that CT or any of the other definitions cited are the kinds of things that can be proved. An alternative view is that they are each part of the framework for the proofs in the relevant system. Perhaps to deflect this rejoinder, Mendelson also argues (iii) that there are proofs connecting intuitive and precise mathematical notions (despite having just cast suspicion on such distinctions). He seems to offer the fact that CT is “acknowledged to be obvious” in its easier direction as a support for the existence of such proofs. But (obviously) that something is obvious is also not a proof, so Mendelson argues that “we can describe” procedures to compute the initial functions; and “we can describe procedures that will compute the new functions” produced by substitution and recursion. [p. 233] This is surely true; but it also depends on an intuitive recognition that the procedures so described fall under the intuitive concept of effectively calculable function. That is, it still depends on a version of CT; and thus cannot prove it. Mendelson defends his argument thus: “The fact that it is not a proof in ZF or some other axiomatic system is no drawback; it just shows that there is more to mathematics than appears in ZF.” [p. 
233] Again, it is of course true that there is more to mathematics than appears in ZF, but it has not been shown, or even argued, that there is more to mathematical proof than appears in ZF or any other axiomatic (or otherwise well-defined semi-formal) system. Mendelson’s final complaint (iv) is that “the usual viewpoint concerning CT is that it assumes that the only way to ascertain the truth of the equivalence asserted in CT is to prove it [... whereas] equivalences between intuitive notions and apparently more precise mathematical notions often are simply ‘seen’ to be true without proof [...]” [p. 233]. However, what he proposes is the “usual” viewpoint.
It is the view that CT provides part of the framework for a system and doesn’t require proof. It is no support for Mendelson’s claim that there is nothing in principle barring CT from being proved. (ii) Black’s rendition of the arguments By asserting that important “definitions” or “notions” can be proved Mendelson is clearly operating with a very broad concept of proof. Black [2000] offers an argument largely parallel to Mendelson’s; and like Mendelson he wants to extend the concept of proof. One of Black’s arguments is, with Mendelson, that the “easier” half of CT is proved; and he quotes Mendelson’s argument as well as one from Shoenfield. But the latter’s argument, just as Mendelson’s, depends on the assertion that the initial recursive functions (addition, multiplication, etc.) are “clearly calculable”—which Black refers to only as a “remark” made by Shoenfield [Black, p. 249]. Yet this, again, presupposes the base cases of this version of CT and does not, therefore, prove it. Something can be clear without being (thereby) proved. Black’s second argument is reminiscent of another claim made by Mendelson, that there is more to mathematics than what is provable in ZF. Black’s version is the following. “It is in any case clear that one can do mathematics without even implicit reference to formal axiomatic systems. Arithmetic was done like this for centuries.” [p. 250] But it turns out that much of what was “done” is provable by current standards, as Black admits. In any case not all that one “does” in mathematics is prove things; so this argument, as well as Mendelson’s, appeals to the false dichotomy between what is mathematically provable and what is not mathematical at all. Black, third, claims with Mendelson that “Dedekind’s thesis”— that the second-order Peano Axioms capture the intuitive natural number structure—“is proved in both directions.” [p. 250] So he believes that we can prove that axioms are true of an intuitive structure. His evidence? 
“It is clear that the natural numbers satisfy the Peano axioms, and by the categoricity of those axioms any system satisfying them must be isomorphic with the natural numbers.” [p. 250] However, again, “clear” does not mean proved. Black similarly claims that completeness proves “Tarski’s thesis” that “truth in every model coincides with intuitive validity.” [p. 251] This seems off-base too. What completeness proves is that for a given formal
system of first order logic the theorems of that system are those formulas that are true in every model. “Intuitive validity” is not in this proof. Concerning the traditional conception of CT as unprovable, Black says:
Given a suitably narrow account of what is required for proof, I think there is a sense in which this is correct: our reasons for believing Church’s thesis are not and could not be, that it is, say, a theorem of ZF. But from this it does not follow that those reasons are, or must ever remain merely a posteriori, empirical or inductive, or in no way deserving of the title ‘proof’. [p. 244]
So for Black there are two concepts of proof: a narrow account and a wider one. On the wider account, the title “proof” is one that can be more or less deserved by arguments depending on the degree to which they are not “merely a posteriori, empirical or inductive”. But if this is an argument against the narrow account of “proof”, it is not very persuasive. Why would anyone think that from a narrow conception of proof it would “follow” that any other evidence for a mathematical claim must be merely a posteriori? If anything, it is the wider conception of proof that is motivated by this way of drawing the distinction. And how does this argument support a broader conception of proof apart from some sort of confused appeal to emotion such as: these reasons (evidence for CT or other definitions) are a genuine part of mathematics; but if not called “proofs” they will be thrown out as merely a posteriori or inductive; so we ought to call them proofs; they deserve the title. This is potentially persuasive only if the sole alternative really is to dismiss all evidence that falls short of what we decide to title “proof”—and maybe not even then. But this is not the only alternative.
Consider the evidence for Goldbach’s conjecture that every even number greater than 2 is the sum of two primes. Every case tested so far verifies it. So there is a mass of “merely inductive” evidence. But because each case tested is itself a mathematical equation the evidence is inductive and a priori at the same time. So there is evidence that is mathematical, a priori, and inductive in nature. Yet despite the massive amount of evidence, and its apriority, it is not regarded as “deserving of the title ‘proof’.”
Interestingly, Black complains that I reject the informal arguments connecting intuitive and precise concepts, advocated by himself and Mendelson, as mathematical proofs for the reason that arguments like these “are not strictly speaking mathematical [...]”. But this way of putting it is a misunderstanding of my view. In fact, it attributes to me the false dichotomy I charge him and Mendelson [1990] with making. In the paper cited I remarked, “There is more to mathematics than rigorous [...] proofs [...] not every mathematical argument is a proof.” [p. 322] The evidence for CT does not add up to a mathematical proof, not because it is not mathematical. Rather, it is because proofs in mathematics require something more than convincing mathematical evidence. There are a variety of kinds of evidence in mathematics, many of which fail to add up to proofs. Rather than rejecting the evidence Black and Mendelson provide as mathematical, I agree that it is mathematical. I simply think it does not “deserve the title ‘proof’” just because it is mathematical.
Admittedly, we use the word “proof” in a wider sense than the deduction of logical and mathematical theorems. “The proof is in the pudding” doesn’t claim that there is a deductive argument inside a dessert. Here “proof” is used in the wider sense of “evidence” (and, of course “pudding” doesn’t mean pudding). “The burden of proof is on the prosecution” and “Guilt must be proved beyond a reasonable doubt” seem similar. These uses of “proof” to mean an acceptable level of evidence are simply another way to use the same word; and the wider conception of mathematical proof strikes me as somewhat similar. However, “proved” in mathematics should mean something specific to the exact subject matter of mathematics; something technical and rigorous. Otherwise, why don’t we consider Goldbach’s conjecture “proved”? We all know that there is more to mathematics than theorem proving in a deductive axiomatic framework. There is also building the trinomial cube. There are mathematical experiments. There are computer verifications.
There are new frameworks to invent. There are illuminating diagrams that can be drawn. The conclusion that these activities show that there is more to mathematical proofs than traditional deductions depends on the erroneous belief that there is NO more to mathematical justification than proof. In short, it is motivated by a false dichotomy between what is mathematically provable and what is not mathematical at all. What we really have is a trichotomy (at least): the non-mathematical, the mathematical but not provable, and the provable (theorems). Or perhaps cutting
it a little differently, within mathematics there is more than one justificatory status. Mathematical beliefs might be simply unjustifiable; they might have non-mathematical (empirical, pragmatic) justifications; they might have mathematical justifications that are not proofs; and they might have proofs. An epistemology of mathematics ought to include reflection on all of these categories. But I think we suffer most from the paucity of attention given to the third.
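The Goldbach example used in this section can be made concrete. The following Python sketch is an illustration only and is not drawn from any of the papers discussed: the function names `is_prime`, `goldbach_witness` and `verify_up_to` are hypothetical, and the search bound is arbitrary. Each successful check is an exact arithmetical fact, yet however far the bound is pushed, the run remains a finite list of confirmations and not a deduction of the general claim.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality test; adequate for small illustrative inputs."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def goldbach_witness(n: int):
    """Return a pair of primes (p, q) with p + q == n and p <= q, or None."""
    for p in range(2, n // 2 + 1):
        if is_prime(p) and is_prime(n - p):
            return (p, n - p)
    return None


def verify_up_to(limit: int) -> bool:
    """Check every even number from 4 to limit (inclusive) for a witness."""
    return all(goldbach_witness(n) is not None for n in range(4, limit + 1, 2))


if __name__ == "__main__":
    # Each line printed is one a priori confirmation of a single instance.
    for n in (4, 28, 100):
        print(n, goldbach_witness(n))
    # A finite sweep: confirming evidence, not a proof of the conjecture.
    print(verify_up_to(10_000))
```

A `True` result from `verify_up_to` reports only that no counterexample exists below the chosen bound, which is exactly the sense in which the evidence is mathematical and a priori but still inductive.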
6. Concluding Thoughts: The Undeniable Variety of Justifications
If we think of the deduction of theorems as the “hard core” of mathematics, which is what I want to call the “proofs”, then disagreements like the above show that some very important and interesting mathematical activity falls outside this core. The difficulty is that this part of mathematics—which we might call with Lakatos its “protective belt”—is very hard to articulate, and thus it’s hard to better understand it. There are also risks, which include blurring Frege’s distinction between logic and psychology; or between the context of discovery and the context of justification. Nevertheless, disagreements such as those surrounding the CT literature indicate the need for a finer grained taxonomy than is given by these traditional distinctions, in order to better articulate and understand this highly significant aspect of mathematics. (For an in-depth and historically detailed analysis of progress in mathematics in terms of its protective belt as well as heuristics, see Hallett [1979].) A place to start is with some clear examples of justifications that are not generally considered proofs. We can consider several well-known types.
(i) Evidence on behalf of axioms. This evidence cannot by definition count as proofs, I assume no matter what concept of proof is held, because evidence for axioms is evidence for accepting the framework for a system of mathematical proofs. But this means neither that there is no such evidence, nor that it is not mathematical. Maddy [1997], for example, discusses some of the evidence mathematicians typically count towards accepting large cardinal axioms, such as fruitfulness and consistency. These are mathematical reasons for the truth (or at least acceptability) of mathematical assertions (axioms), which are different in kind from the type of evidence provided by proofs. Of course, people have tried to derive axioms, such as the attempts to derive Euclid’s parallel postulate; and the logicists’ attempts to derive the Dedekind–Peano Axioms.2 But when successful, such proofs make the statements in question theorems of that system, not axioms.
(ii) Evidence for central definitions. Contra Mendelson and Black, I would class some of this sort of evidence as very similar to the evidence we count towards axioms. In addition to being clear and precise (mathematically usable), a definition can be evaluated by its fruitfulness and consistency with other results in mathematics. As mentioned above, definitions can be evaluated with proof-methods; e.g., they can be shown to produce a unique function. But like axioms, definitions are generally judged in terms of the deductions, or results, they produce. This is at the opposite “end”, so to speak, of a proof from that which is inferred. Axioms and definitions are the information put into a proof; theorems are what come out.
(iii) Inductive evidence. This provides a type of evidence different from the above. The inductive evidence on behalf of CT—that all the plausible extensional definitions are provably equivalent—is somewhat different from the inductive evidence for, say, Goldbach’s conjecture. The first involves proofs linking various explications. The latter is a more straightforward example of induction in that we have many confirmations of a general claim. Every even number so far tested is the sum of two primes. Each of the verifications is mathematical, and a priori. And together the weight of evidence is strong, yet merely inductive. But the evidence for CT is also inductive in that each time a new plausible extensional definition is proved equivalent to the others, there is an increase in confidence that the central concept has been captured. The weight of evidence is increased for each new equivalence proof; and this makes the evidence inductive in nature.
2 Thanks again to David McCarty for this objection.
But, though a priori and mathematical, the evidence in both of these cases falls short of proof. (iv) Thought experiments and visualizations provide another kind of evidence for axioms, definitions and theorems. For axioms one can imagine our ability to “add one more” to support the Dedekind–Peano Axiom that every number has a successor; or we can imagine dumping two baskets of apples together into a larger basket
to support the union axiom of set theory. The definition of “circle” as the locus of points equidistant to a given point can be “verified”, by visualization, as correct of our original intuitive concept obtained through everyday examples. Giaquinto provides a beautiful example of how visualization can lead to the discovery that “any square x is twice as big as the square whose corner-points are midpoints of x’s sides.” [p. 385] As with confirming the definition of “circle” the evidence here is not really analogous to an experiment, for we are not actually testing the consequences of a conjecture. The visual evidence quite directly and immediately convinces us of a general claim. Let me develop Giaquinto’s argument a bit for he shows that visualizing alone can yield several types of evidence, each of which falls short of a proof. Giaquinto’s main goal is to show that visualization can lead to mathematical discovery. He is clear that by this he does not mean that it automatically leads to proofs. He gives examples of mathematical discoveries without proofs—mentioning discoveries by children, savants and Fermat, who apparently announced many theorems without giving proofs. [p. 384] Ramanujan would be another example along these lines. The point is, if we accept these beliefs as discoveries, the process of discovery must sometimes be different from the process of following, or finding, a proof. Giaquinto then gives a convincing example of how visualization can be a means of geometric discovery. Imagine a square. Each of its four sides has a midpoint. Now visualize the square whose corner-points coincide with these four midpoints [...] Clearly, the original square is bigger than the tilted square contained within it. How much bigger? [...] By visualizing this figure, it should be clear that the original square is composed precisely of the tilted square plus four corner triangles [...] 
One can now visualize the corner triangles folding over, with creases along the sides of the tilted square. Many people conclude that the corner triangles can be arranged to cover the tilted inner square exactly, without any gap or overlap [...] Assuming [this], you will infer that the area of the original square is twice the size of the tilted inner square. [p. 385]
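The conclusion reached by visualization in the passage just quoted can also be checked by a short coordinate computation. The following worked equation is an illustration added here and is not part of Giaquinto's argument; the side length $s$ and the placement of the square with corners at $(0,0)$, $(s,0)$, $(s,s)$, $(0,s)$ are assumptions made for the example.

```latex
% Midpoints of the sides of the outer square (assumed side length s):
%   M_1 = (s/2, 0),  M_2 = (s, s/2),  M_3 = (s/2, s),  M_4 = (0, s/2).
\begin{align*}
  |M_1 M_2| &= \sqrt{\left(\tfrac{s}{2}\right)^2 + \left(\tfrac{s}{2}\right)^2}
             = \frac{s}{\sqrt{2}}, \\
  \overrightarrow{M_1 M_2} \cdot \overrightarrow{M_2 M_3}
            &= \left(\tfrac{s}{2}, \tfrac{s}{2}\right) \cdot
               \left(-\tfrac{s}{2}, \tfrac{s}{2}\right) = 0
               \quad \text{(adjacent sides are perpendicular)}, \\
  A_{\text{inner}} &= \left(\frac{s}{\sqrt{2}}\right)^2 = \frac{s^2}{2}, \\
  A_{\text{corners}} &= 4 \cdot \frac{1}{2} \cdot \frac{s}{2} \cdot \frac{s}{2}
                      = \frac{s^2}{2}, \\
  A_{\text{outer}} &= A_{\text{inner}} + A_{\text{corners}} = s^2
                    = 2\, A_{\text{inner}}.
\end{align*}
```

By symmetry all four sides of the inner figure have the length computed in the first line, so it is indeed a square. The four corner triangles together have the same area as the inner square, which corresponds to the visualized folding with no gap and no overlap.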
Key to the inference is the recognition of no overlap in the “folded layer”, nor any gap between it and the square under it. Giaquinto argues that this key belief comes directly from the visualization; furthermore, it yields a kind of knowledge that one can call “discovery”—though it does not yield public demonstration, or proof. Giaquinto also considers other functions that diagrams and visualizing might play in mathematics, including inner experiments and inductive inferences from sense experiences. [sections 4, 5] He agrees that we can arrive at geometric beliefs in both of these ways. But these ways of acquiring beliefs would not count as geometric discoveries because they would not be about the “perfect geometric forms” but either the forms as represented by the phenomenal experience (in the inner experiment), or the forms as imperfectly instantiated in the physical world. [pp. 389, 391] In thus distinguishing his example as a case of bona fide discovery, he gives us two other ways we might use visualization to acquire mathematical beliefs.
There would be more to consider in a detailed study of this interesting paper, but my point is this. If visualizing in geometry alone can yield different kinds of mathematical evidence (depending on the specifics of the case and on how it is used), then there is much more to say about the variety of evidence that lies below the level of public, demonstrable proof in mathematics more generally. Just as there are stronger and weaker justifications for empirical claims, there are stronger and weaker mathematical justifications. Now we might take these different justifying activities in mathematics and try to rank, or classify, them according to the weight they bring to bear on an assertion. But I don’t think this will be terribly fruitful.
Visual evidence might be completely definitive in one case (e.g., some of the “picture-proofs” of the Pythagorean theorem); and more like a confirmation in a different setting (e.g., imagining snipping off the angles of a triangle and re-arranging them into a straight line). I am thus wary of trying to rank types of justification in this way. We might, however, try to rank what’s justified; and articulate the various ways we do justify claims at that level. For example here are four different kinds of mathematical assertions.
(i) Hunches may be based on almost anything including “intuitions”; resemblances; mathematical and physical analogies (including isomorphisms); empirical experiments; and inductive evidence.
(ii) Conjectures should be more supported than mere hunches or even collective agreement. A conjecture announces the judgment that something should be provable; this view may be supported by empirical experiments, thought experiments, inductive evidence, fitness with other results, etc.
(iii) Theorems are justified by rigorous proofs, or public proof-sketches that can be elaborated into deductive arguments from accepted axioms (or other first principles) and definitions alone.
(iv) Axioms are based on collective agreement or intuitive judgments shared by a community that the statement ought to be part of the basic level of a mathematical framework; specific virtues include fruitfulness, intuitiveness, coherence, and consistency.
This is not offered as a complete list. But it shows why we should agree that looking more closely at the “protective belt” and “heuristics” of mathematics will be interesting and fruitful. There is more than one kind of assertion in mathematics and there is more than one kind of evidence to go along with the different assertion-types. Of these kinds of evidence, proofs provide the strongest, strictest possible, most distinctive type of evidence in mathematics (relative to a framework of course). My proposal is to understand “proof” in the narrow, deductive, or “Euclidean”, sense. Rather than denigrating other kinds of mathematical justifications, the rigorous conception draws attention to the protective belt. It encourages us to reflect on a wider set of justifications, in an aim for a more complete epistemology of mathematics. Different kinds of justifications provide different kinds of support in mathematics; and—even if we do not yet have a good understanding of them—there is no reason to lump them all under the category of “proofs”.
Just as CT is a prelude to a proof, so the traditional conception of proof is a prelude to a better understanding of the epistemic contributions made by the variety of mathematical justifications, including those supporting CT itself.3
3 Thanks for constructive feedback on a draft of this paper go to Geoff Gorham, David McCarty, and Rosamond Rodman.
References
Black, R. [2000], “Proving Church’s Thesis”, Philosophia Mathematica 8(3), 244–258.
Bowie, G.L. [1973], “An Argument Against Church’s Thesis”, The Journal of Philosophy 70(3), 66–76.
Church, A. [1936 (1965)], “An Unsolvable Problem of Elementary Number Theory”, [reprinted from:] The American Journal of Mathematics 58(1936), in Davis [1965, pp. 89–107].
Cleland, C. [1993], “Is the Church–Turing Thesis True?”, Minds and Machines 3, 283–312.
Davis, M. (ed.) [1965], The Undecidable, Raven Press, Hewlett, New York.
Folina, J. [1998], “Church’s Thesis: Prelude to a Proof”, Philosophia Mathematica 6(3), 302–323.
Gandy, R.O. [1988], “The Confluence of Ideas in 1936”, in The Universal Turing Machine, (R. Herken ed.), Oxford University Press, pp. 55–111.
Giaquinto, M. [1992], “Visualizing as a Means of Geometrical Discovery”, Mind and Language 7(4), 382–401.
Goodman, N. [1987], “Intensions, Church’s Thesis, and the Formalization of Mathematics”, Notre Dame Journal of Formal Logic 28(4), 473–489.
Hallett, M. [1979], “Towards a Theory of Mathematical Research Programmes” (I and II), The British Journal for the Philosophy of Science 30, 1–25 and 135–159.
Maddy, P. [1997], Naturalism in Mathematics, Clarendon Press, Oxford.
Mendelson, E. [1963], “On Some Recent Criticisms of Church’s Thesis”, Notre Dame Journal of Formal Logic IV(3), 201–205.
Mendelson, E. [1990], “Second Thoughts about Church’s Thesis and Mathematical Proofs”, The Journal of Philosophy 87(5), 225–233.
Rosser, J.B. [1939 (1965)], “An Informal Exposition of Proofs of Gödel’s Theorem and Church’s Theorem”, [reprinted from:] The Journal of Symbolic Logic 4(1939), in Davis [1965, pp. 223–230].
Shapiro, S. [1993], “Understanding Church’s Thesis, again”, Acta Analytica 11, 59–77.
Andrew Hodges∗
Did Church and Turing Have a Thesis about Machines?
∗ A. Hodges, Wadham College, University of Oxford, Oxford OX1 3PN, U.K.
This article draws attention to a central dispute in the interpretation of Church’s Thesis. More precisely, we are concerned with the Church–Turing thesis, as it emerged in 1936 when Church endorsed Turing’s characterization of the concept of effective calculability. (The article by Sieg in this volume details this history. It is valuable also to note from Krajewski, also in this volume, that the word ‘thesis’ was used only in 1952.) This controversy has a scientific aspect, concerning the nature of the physical world and what can be done with it. It has a historical aspect, to do with the ‘confluence of ideas in 1936’. We shall focus on the historical question, but it is the continuing and serious scientific question that lends potency to the history.
The principal protagonist in this matter is the philosopher B.J. Copeland, who when writing with his colleague D. Proudfoot for a wide readership in Scientific American, denounced a prevailing view of Church’s Thesis as ‘a myth’ [Copeland and Proudfoot 1999]. Copeland has made similar assertions in numerous leading articles for journals and works of reference, e.g. [Copeland 2000; 2002; 2004]. What is this myth? It is that the Church–Turing thesis places any limitation on what a machine can do. On the contrary, according to Copeland and Proudfoot, ‘Church and Turing claimed only that a universal Turing machine can match the behavior of any human mathematician working with paper and pencil in accordance with an algorithmic method—a considerably weaker claim that certainly does not rule out the possibility of hypermachines.’ ‘Hypermachines’ are defined by Copeland to be physical machines that outdo Turing
computability; Copeland and Proudfoot insist that Turing ‘conceived of such devices’ in 1938. The origin of this argument lies in the work of the logician Robin Gandy, who had himself been Turing’s student. His article ‘Principles of Mechanisms’ [Gandy 1980] distinguished Church’s thesis from what he called ‘Thesis M’, the thesis that what a machine can do is computable. Gandy [1988] also emphasised that Turing’s original argument was drawn from modelling the action of a human being working to a rule, and was not based on modelling machines, as may indeed readily be seen from [Turing 1936]. Gandy criticised Newman [1955] for saying that Turing embarked on ‘analysing the general notion of a computing machine’. There are good reasons for Gandy’s emphasis on this distinction, and for others to follow him in emphasising the model of human computation. One is that by the 1970s it was quite a common assumption that the digital computer already existed when Turing made his definition, and that he had written down an abstract version of it. This grossly understates Turing’s achievement: the digital computer arose only ten years after Turing wrote his paper, and it can be argued that his ‘universal machine’ supplied the principle on which the digital computer was based (directly, in his own plans, and indirectly, in von Neumann’s.) Another reason is that it must be appreciated that Turing was addressing Hilbert’s question about methods that can be applied by human mathematicians. Another reason lies in observing that Turing’s discussion of human memory and ‘states of mind’ makes his 1936 work basic to the cognitive sciences, and in particular to his own later discussion of artificial intelligence. Despite all this, however, this distinction was not clearly drawn by Church or Turing in that period of the ‘confluence’. The evidence comes from Church’s review of [Turing 1936] in which he endorsed Turing’s definition. 
Indeed the main point of this article is simply to bring this review [Church 1937a] to greater prominence. A full transcription of the review is given in [Sieg 1997, in this volume]. We need only the relevant opening paragraph.
The author [Turing] proposes as a criterion that an infinite sequence of digits 0 and 1 be “computable” that it shall be possible to devise a computing machine, occupying a finite space and with working parts of finite size, which will write down the sequence to any desired number of terms if allowed to run for a sufficiently long time. As a matter of convenience, certain further restrictions are imposed in the character of the machine, but these are of such a nature as obviously to cause no loss of generality—in particular, a human calculator, provided with pencil and paper and explicit instructions, can be regarded as a kind of Turing machine.
It is apparent that Church was unaware of Gandy’s distinction between the Church–Turing thesis and Thesis M. Indeed, if Church had actively set out to cultivate the ‘myth’ strenuously denounced by Copeland, he could hardly have done so more effectively. For Church’s words not only referred to machines, but actually claimed a definition of computability in terms of the properties of machines, considered as ‘devised’ objects with a ‘size’ in ‘space’. Note that Church could not have been using the word ‘machine’ with an implicit restriction to the Turing machine, because, by definition, he was introducing this new concept to readers ignorant of it. (Indeed, the expression ‘Turing machine’ was coined in this review.) ‘Computing machine’ here means any machine at all (of ‘finite size’) which serves to calculate. This assumed generality is confirmed in the immediately following article [Church 1937b] in the Journal of Symbolic Logic, where Church reviewed Post’s independently conceived formalism of a rule-following ‘worker’. Church criticised Post for requiring a ‘working hypothesis’ that it can be identified with effective calculability. He specifically contrasted Post’s formalism with Turing’s, and referring to Turing’s paper, wrote: To define effectiveness as computability by an arbitrary machine, subject to restrictions of finiteness, would seem an adequate representation of the ordinary notion, and if this is done the need for a working hypothesis disappears.
It was therefore the very generality of Turing’s machine concept, not its particular formalization, that led Church to commend it.
Copeland goes much further than Gandy and holds that Church and Turing positively excluded machines from their thesis. How can Copeland reconcile his assertion with the fact that Church based his observations on the concept of ‘an arbitrary machine’? In [Copeland 2002] there is no mention of Church’s review of [Turing 1936], and the problem is thus avoided. But Copeland does there cite the immediately following review of Post, with the same quotation as given above, and with the following gloss:
[...] he is to be understood not as entertaining some form of thesis M but as endorsing the identification of the effectively calculable functions with those functions that can be calculated by an arbitrary machine whose principles of operation are such as to mimic the actions of a human computer. (There is much that is ‘arbitrary’ about the machines described (independently, in the same year) by Turing and Post, for example the one-dimensional arrangement of the squares of the tape (or in Post’s case, of the ‘boxes’), the absence of a system of addresses for squares of the tape, the choice between a two-way and a one-way infinite tape, and, in Post’s case, the restriction that a square admit of only two possible conditions, blank or marked by a single vertical stroke.)
This gloss, with its bizarre interpretation of the word ‘arbitrary’, achieves Copeland’s reconciliation only by reversing the sense of Church’s statement. Church specifically described Turing’s human calculator as a particular example of a machine, not as the definitive form. Note that Church’s explicit use of the word ‘human’ confirms that his general setting for effective calculation is not necessarily human.
As a summary of [Turing 1936], Church’s review was notably incorrect. Turing had not even referred to machines of finite size, let alone defined computability in terms of their alleged powers. One might criticise Church’s review in other ways: he omitted the emulation of mental states which is a striking feature of Turing’s analysis. Stipulating a finite number of working parts, rather than a finite size, would better indicate the finite number of configurations of a Turing machine. Moreover, Church’s over-briefly stated condition of finiteness fails to bring out that the working space (the ‘tape’) must not be limited. Only a finite amount of space is used in any calculation, but there is no preset bound on how much may be demanded. A completely finite machine must repeat itself: one aspect of Turing’s breakthrough was that he saw how to keep a finiteness of specification but escape this limitation.
I owe to Wilfried Sieg [2005] the observation that Gödel followed Church and ascribed to Turing an analysis of machines. There seems to be no obvious answer to the question of why both Church and Gödel imputed to Turing’s analysis something that was not actually there.
Another question arises when we imagine Turing at the Graduate College in Princeton in 1937, reading Church’s review. Did he recoil with horror, seeing it as a travesty of his achievement, or did he see it as a legitimate variant or development of his theory? If Turing had regarded it as seriously misrepresenting his ideas, he would not have been deterred from saying so by Church’s seniority. He was shy socially but very confident of his own judgment in all sorts of matters. (Thus, regarding another kind of Church, he wrote in 1936: ‘As for the Archbishop of Canterbury, I consider his behaviour disgraceful’.) But he recorded no dissent.
If Turing had wished politely and properly to distance himself from Church’s version of his definition, and re-assert his own, he had the opportunity in his 1938 doctoral thesis, subsequently published as [Turing 1939]. Yet in that paper, when giving his own statement of the Church–Turing thesis, he simply characterized a computable function as one whose values can be found ‘by some purely mechanical process’, saying that this may be interpreted as ‘one which could be carried out by a machine’. This does not have the full force of Church’s words ‘arbitrary machine’, for the words ‘a machine’ could be read as meaning ‘a Turing machine’, but it notably makes no effort whatever to alert the reader to any distinction between ‘machine’ and ‘human rule-follower.’ It is hard to see how Turing could have left his wording in these terms if he had regarded Church’s formulation as a serious and misleading error. Moreover, Church also simply repeated his ‘machine’ characterization of computability in a later paper [Church 1940], which does not suggest that Turing had ever expressed an objection to it while they were in contact at Princeton.
It appears that Church and Turing (and others, like Gödel and Newman) used the word ‘machine’ quite freely as a synonym for ‘mechanical process’, without clearly distinguishing the model of a mechanical process given by the human rule-follower.
In fact, Church’s review did not offer an absurd distortion or extrapolation of what Turing had done. With some sketch of what was assumed about ‘machines’ and what was meant to be ‘obvious’ about the complete generality lying behind Turing’s formalization, it could have been justifiable. The work of Gandy [1980] showed that under quite reasonable conditions on what is meant by ‘machine’, his Thesis M is actually true. Sieg [2002] has extended and improved upon Gandy’s results. The main point is that their conditions allow for machines
which are not restricted to making one step at a time, but perform parallel computations. Even so, Gandy and Sieg’s analyses are far from being an exhaustive account of what a physical machine might be. They do not allow for the phenomenon of entangled quantum states, which is already of technological importance in quantum cryptography. For this reason alone, this type of logical analysis lags behind modern physics. Computer science depends upon the implementation of the logical in the physical, and the review of the distinguished computer scientist A.C.-C. Yao [2003] shows the depth and range of physical process now seen as relevant to its future progress. It is worth noting that Yao defines the Church–Turing thesis in terms of what can be computed by ‘any conceivable hardware system’, saying that ‘this may not have been the belief of Church and Turing, but it has become the common interpretation’ of their thesis. Yao regards the thesis not as a dogma but as a claim about physical laws which may or may not be true. Yao’s careful words about what Church and Turing believed are fair: we cannot know quite what they thought, but the evidence points to a standpoint closer in spirit to Yao’s than to Copeland’s. Yao is not quite so careful, however, in his statement of the thesis, for he omits to include a ‘finiteness’ condition such as Church emphasised. Some such condition is obviously essential—an infinitely long register of data must, for instance, be ruled out. Turing’s silence on the question of ‘arbitrary machines’ is rather surprising because he was in many ways an outsider to the rather isolated logicians’ world, having a broad grounding in applied mathematics and an interest in actual engineering. On the specific question of restriction to serial working, it is noteworthy that he had already discussed in [Turing 1936] how human mental ‘scanning’ of many symbols at once could be reduced to a serial process. 
Thus he could very well have initiated the kind of theory of machines later undertaken by Gandy—the more so since machines with parallel action (‘simultaneous scanning’, he called it) were crucial to Turing’s success with the Enigma in 1939–40. On broader issues too, he was well-qualified to point out that in 1937 formulations such as ‘parts’ of a machine and ‘sufficiently long time’ were already obsolete and demanded much more serious analysis: twentieth-century physics had transformed the classical picture of space, time and matter which Church’s words appealed to. At the age of sixteen he had understood
the basis of quantum entanglement and of curved space-time, and there was nothing to prevent him drawing attention to the questions thereby aroused. (Curiously enough, Gödel later found a solution of Einstein’s equations which exhibits closed timelike lines, a fact which in itself shows that the concept of ‘sufficiently long time’ is unsatisfactory without more refined analysis.)
Turing’s background in physics did in fact re-assert itself later on. First, in his individualistic trajectory, came his own engineering of machines at Princeton; then came an extensive wartime experience of electromagnetic and electronic machines which led to his digital computer design in 1945. In 1948, Turing’s report ‘Intelligent Machinery’ gave a brief analysis of ‘machines’ which did take note of a necessary grounding in physical concepts (for instance thermodynamics and the speed of light). In this paper Turing simply summarised computability using the phrases ‘rule of thumb’ and ‘purely mechanical’ as equivalents, without drawing a distinction between the human rule-follower model and the machine model. Indeed, Turing drew these ideas together in a discussion of ‘Man as a Machine’ and the brain as a physical system.
In his edition of Turing’s papers, Copeland [2004, p. 480] acknowledges that Turing wrote that a computer could replace ‘any calculating machine’, but explains this by saying that Turing ‘would’ have characterized a calculating machine as doing only what could be done by a human computer. But Turing never actually gave this definition, and indeed Turing [1948] gave his readers the reverse image: he described a program, to be worked out by a human rule-follower, as a ‘paper machine’. However, in this 1948 paper, and thereafter, Turing did refine the concept of ‘machine’.
He distinguished ‘active’ machinery from ‘controlling’ machinery, giving ‘bulldozer’ as an example of the former; the latter type, which we would probably now call ‘information-theoretic’, is the subject of his discussion. (Thus, we are concerned with abstracting what it is that makes a machine ‘mechanical’, not with its physical action.) Turing also distinguished ‘continuous’ from ‘discrete’ machines, and again it is the latter with which we are principally concerned. Turing’s main argument, both in this 1948 paper and in his very famous publication [Turing 1950], was that the action of the brain can be captured by a discrete ‘controlling’ machine. Of course, Turing by now had a more developed theory in which more ‘intelligent’ machine behaviour would be acquired through dynamical interaction with the environment, but it was all still within the arena of the discrete machine and governed by computable operations. Turing [1948] was at pains to point out that the brain is actually continuous, even though there is ‘every reason to suppose’ that a discrete model will suffice to model it. In [Turing 1950] he gave a more explicit argument for this supposition. Thus Turing began to raise questions about the connection of the computable and discrete with continuous physics.
Since the 1950s many leading figures (e.g. Weyl, Kreisel, Wigner, Feynman, Chaitin) have raised questions about the physical basis of the Church–Turing thesis. Many articles in this volume indicate the range of ideas now studied. One notable contributor to this broader picture is Roger Penrose, who argues that there must be an uncomputable aspect to the physics of quantum measurement [Penrose 1989; 1994]. Interestingly, Turing [1951] showed evidence of contemplating just this possibility. In this radio talk, mainly rehearsing his earlier arguments about modelling the brain by a Turing machine, Turing inserted a new point that the uncertainty in quantum mechanics might make this impossible. This single sentence, which stands in contrast to his 1950 assertions, is the only actual reference by Turing to an aspect of physical law that might be uncomputable. In his last year of life, Turing also started an investigation of the quantum measurement process [Gandy 1954]. Thus if we look at a longer time-scale, we can see Turing as helping to open the whole question of computability and physics as it has slowly developed over the last 50 years, a point developed in [Hodges 2004].
However, Copeland’s contention is specifically concerned with the meaning of what was formulated in 1936. He holds both that the Church–Turing thesis is true, and that physical machines may be capable of computing uncomputable functions.
The only way to reconcile these statements is to assert that Church and Turing positively held in 1936 that their concept of effective calculation did not refer to machines. The historical record does not support this contention. As we noted at the outset, there are both historical and scientific questions involved in this issue. One cannot separate them entirely because the views of great founding figures are of special significance and deserve to be studied. The importance of originators is reflected in the way Copeland enlists Turing in the cause of hypothesising machines which might perform uncomputable tasks, writing of Turing’s
allegedly ‘forgotten ideas.’ Copeland and Proudfoot specifically assert that Turing’s ‘oracle-machine’ [Turing 1939] is to be regarded as such a machine, suggesting various ways in which it could be physically realised, e.g. as a quantity of electricity to be measured with infinite precision [Copeland and Proudfoot 1999]. Copeland and Proudfoot do not explain how this infinitude could possibly be effected in accordance with any known physical principle, and of course there is no suggestion of any such thing in [Turing 1939]. They nevertheless announce this implemented oracle-machine as a potential technological revolution as great as that of the digital computer, crediting the ‘real Turing’ with this vision [Copeland and Proudfoot 1998]. They see the prevailing ‘myth’ about the Church–Turing thesis as an impediment to realising this ambition. These assertions, scientific and historical, are alike ill-founded.
References
Church, A. [1937a], Review of Turing [1936], Journal of Symbolic Logic 2, 42.
Church, A. [1937b], Review of Post [1936], Journal of Symbolic Logic 2, 43.
Church, A. [1940], “On the Concept of a Random Sequence”, Bull. Amer. Math. Soc. 46, 130–5.
Copeland, B.J. and Proudfoot, D. [1998], Enigma Variations, London: Times Literary Supplement, 3 July 1998.
Copeland, B.J. and Proudfoot, D. [1999], “Alan Turing’s Forgotten Ideas in Computer Science”, Scientific American 253:4, 98–103.
Copeland, B.J. [2000], “Narrow Versus Wide Mechanism: Including a Re-Examination of Turing’s Views on the Mind-Machine Issue”, Journal of Philosophy 96, 5–32.
Copeland, B.J. [2002], The Church–Turing Thesis, in Stanford Encyclopedia of Philosophy, (E.N. Zalta ed.).
Copeland, B.J. (ed.), and Turing, A.M. [2004], The Essential Turing, Oxford: Oxford University Press.
Gandy, R.O. [1954], letter to M.H.A. Newman, in the Turing Archive, King’s College, Cambridge, included in the Mathematical Logic vol. of The Collected Works of A.M. Turing, (R.O. Gandy and C.E.M. Yates eds.), Amsterdam: North-Holland, 2001.
Gandy, R.O. [1980], “Principles of Mechanisms”, in The Kleene Symposium, (J. Barwise, H.J. Keisler, and K. Kunen eds.), Amsterdam: North-Holland.
Gandy, R.O. [1988], “The Confluence of Ideas in 1936”, in The Universal Turing Machine: a Half-Century Survey, (R. Herken ed.), Berlin: Kammerer & Unverzagt.
Hodges, A. [2004], “What would Alan Turing have done after 1954”, in Alan Turing: Life and Legacy of a Great Thinker, (C. Teuscher ed.), Berlin: Springer.
Newman, M.H.A. [1955], “Alan Mathison Turing”, Biographical Memoirs of the Fellows of the Royal Society 1, 253–263.
Penrose, R. [1989], The Emperor’s New Mind, Oxford: Oxford University Press.
Penrose, R. [1994], Shadows of the Mind, Oxford: Oxford University Press.
Sieg, W. [1997], “Step by Recursive Step: Church’s Analysis of Effective Calculability”, Bulletin of Symbolic Logic 3, 154–180; also in this volume.
Sieg, W. [2002], “Calculations by Man and Machine: Conceptual Analysis”, in Reflections on the Foundations of Mathematics: Essays in Honor of Solomon Feferman, (W. Sieg, R. Sommer, and C. Talcott eds.), Lecture Notes in Logic 15, pp. 390–409.
Sieg, W. [2005], “Gödel on Computability”, to appear in Philosophia Mathematica.
Turing, A.M. [1936], “On Computable Numbers, with an Application to the Entscheidungsproblem”, Proc. Lond. Math. Soc. (2), 42, pp. 230–265.
Turing, A.M. [1939], “Systems of Logic Based on Ordinals”, Proc. Lond. Math. Soc. (2), 45, pp. 161–228.
Turing, A.M. [1948], “Intelligent Machinery”. Report for the National Physical Laboratory, published in various forms since 1968, e.g. in [Copeland and Turing 2004].
Turing, A.M. [1950], “Computing Machinery and Intelligence”, Mind 59, 433–460.
Turing, A.M. [1951], “Can Digital Computers Think”, BBC radio talk, available in [Copeland and Turing 2004].
Yao, A.C.-C. [2003], “Classical Physics and the Church–Turing Thesis”, Journal of the ACM 50, 100–105.
Added note: Since this article was written, Professor Copeland has advanced the discussion in this volume. He has widened the scope, and I thank the Editors for permitting a comment on one of his central additional arguments. This (page 162) rests on Turing’s 1950 discussion of computers. The ‘unlimited store’ described by Turing does not correspond, as Copeland asserts, to ‘an unlimited number of configurations’ in a Turing machine table of behaviour. This is because Turing’s 1950 explanation does not present a digital computer’s storage as analogous to the tape of a Turing machine. Instead, Turing omits the tape, and presents all the storage as internal to the machine. This makes it difficult to explain the full scope of computability, which requires the concept of unlimited tape. He has to refer to an ‘unlimited store’ instead. So his ‘unlimited store’ corresponds to the unlimited tape of a standard Turing machine, not to its configurations. (Turing says of the unlimited store that ‘only a finite amount can have been used at any one time’, just as with the storage on a Turing machine tape.) What Turing in 1950 calls a theoretical ‘infinitive capacity computer’ is the (universal) Turing machine of 1936. Its ‘states’ include what in the standard Turing machine description are states of the tape, which are indeed generally unbounded. Those discrete state machines with only a finite number of possible ‘states’—the condition that Copeland italicises as vital evidence—correspond to Turing machines which use only a finite tape, or equivalently, to totally finite machines which need no tape at all. Nothing here goes beyond computability. Rather, it emphasises the finite resources that Turing was discussing as necessary for mental behaviour—giving a figure of not much more than 10^10 bits of storage.
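The remark earlier in the article that a completely finite machine must repeat itself, and the note's point that a machine with finitely many total states needs no tape, can be illustrated with a short sketch. It is only an illustration under stated assumptions: the function name `find_repeat` and the five-state transition table are hypothetical and do not come from Turing's or Copeland's texts. The sketch runs a deterministic machine whose total state is drawn from a finite set and reports the first time a state recurs, which by the pigeonhole principle happens within as many steps as there are states.

```python
def find_repeat(step, start, num_states):
    """Run a deterministic machine until one of its total states recurs.

    With only `num_states` possible total states, the states visited at
    times 0..num_states cannot all be distinct, so a repeat is guaranteed.
    Returns (time of first visit, time of second visit) of the repeated state.
    """
    seen = {}                      # total state -> first time it was visited
    state = start
    for t in range(num_states + 1):
        if state in seen:
            return seen[state], t
        seen[state] = t
        state = step(state)
    raise AssertionError("unreachable if step maps states to states")


# Hypothetical 5-state machine: 0 -> 1 -> 2 -> 3 -> 4 -> 2 -> ...
table = {0: 1, 1: 2, 2: 3, 3: 4, 4: 2}
first, second = find_repeat(table.__getitem__, 0, len(table))
print(first, second)
```

From the second visit onward the behaviour is periodic, which is why an unbounded working space, and not a larger finite table, is what carries a machine beyond this limitation.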
Leon Horsten∗
Formalizing Church’s Thesis
This paper investigates to what extent Church’s Thesis can be expressed and investigated within a formal framework. In particular, we discuss the formalization of Church’s Thesis in intuitionistic arithmetic and formalizations of Church’s Thesis in systems of epistemic arithmetic.
1. The Formal Investigation of Church’s Thesis
Church’s Thesis (CT) states that every function on the natural numbers that is effectively computable is computable by a Turing machine. Effectively computable functions are functions computable by an idealized but human calculator working in a routine or algorithmic fashion, i.e., on the basis of a rule-governed procedure where no ingenuity is required for its execution.
Church’s Thesis is true. Most philosophers and mathematicians think that it is not mathematically provable.1 The reason that is usually given is that it is not a purely mathematical proposition. It is not even expressible in the language of mathematics, for its antecedent contains a notion (“algorithmic computability”) that does not belong to the language of mathematics.
∗ L. Horsten, University of Leuven. Thanks to Mark van Atten for helpful discussions about the subject matter of this paper, and thanks to Jan Heylen for comments on a version of this paper. I am indebted to Anne Troelstra for answering a proof-theoretical question about the intuitionistic reading of Church’s Thesis.
1 But this is controversial. See the discussion between Folina, Mendelson, Black and Shapiro [Mendelson 1990], [Folina 1998], [Black 2000], [Mendelson this volume], [Folina this volume], [Shapiro this volume].
CT may be amenable to formal investigation provided we find a suitable interpreted formal language in which it can be expressed, even if this language contains notions which are not strictly speaking purely mathematical.2 In a suitable formal context, questions about CT can be investigated with formal rigor. This paper discusses some attempts that have been made in this direction.
It is not hard to express CT in a formal language. One could just take the language of arithmetic, add to it a new predicate intended to apply to all algorithms, a relation symbol expressing the relation that holds between an algorithm and the function it computes, and perhaps some symbols standing for operations on algorithms. But that does not help much. For it is notoriously hard to formulate an acceptable criterion of identity for algorithms. In other words, the interpretation of the resulting formalism appears to be vague. Perhaps an illuminating set of basic axioms concerning the notion of algorithm can be formulated in this formal language, but no existing proposal has been met with widespread approval.
In order to circumvent this difficulty, we shall in this paper settle for approximations of CT in formal contexts. We shall look at formal languages with a relatively clear and not overly complicated intended interpretation in which CT can be approximatively expressed. The advantage of this approach is that precise answers to questions concerning CT can sometimes be obtained. But the answer will always be an answer about the question concerning the approximation of CT that one is investigating.
We shall be concerned with approximations, not with variations. I.e., we shall not be concerned with analogues of CT. We shall be concerned with attempts to come close to CT itself in a formal context.
We shall not address the question whether CT can be proved. We shall be mainly concerned with the question whether CT can be formally expressed within a rigorous framework. And as such, it will be considered as a hypothesis (not as an axiom).
Rather than being concerned with whether this hypothesis can be proved, we shall to a limited extent address the question whether facts of mathematical and philosophical interest can be derived from it.
2 Such an approach is favorably looked upon by [Mendelson this volume].
2. Church’s Thesis in Intuitionistic Mathematics
It has been claimed that CT can be expressed in the language of intuitionistic arithmetic, namely roughly as the following scheme:
∀x∃yA(x, y) → ∃e∀x∃m∃n[T(e, x, m) ∧ U(m, n) ∧ A(x, n)]    (ICT)
Here A ranges over all formulas of the language of first-order intuitionistic arithmetic.3 Note that a (weak) choice principle is presupposed in ICT, for A(x, y) expresses a relation, whereas the Turing machine e by definition computes a function.
The fact that ICT at least goes some way towards expressing CT is due to the semantics of the language of intuitionistic arithmetic. ∀x∃yA(x, y) intuitionistically means that there is a method for finding for all x at least one y which can be shown to stand in the relation A to x. And this seems close to asserting that an algorithm exists for computing A. There is disagreement in the intuitionistic literature about whether the method witnessing the truth of ∀x∃yA(x, y) should be accompanied by a proof that this method computes A. Some intuitionists deny this, saying that the method wears on its sleeve the task that it carries out. But in any case it has to be evident from the interpretation of ∀x∃yA(x, y).
ICT is false if lawless sequences are allowed in the constructive universe. For these are assumed to be generated by a method (free choice) but already at the outset it seems unlikely that all manners of freely generating sequences of natural numbers in the infinite limit result in a computable function. Even if lawless or choice sequences are not allowed in the constructive universe, ICT looks suspect. For it asserts that there is a uniform method for transforming a method for finding for all x a y which can be shown to stand in the relation A to x into a Turing machine which does the same thing. It is not clear what this uniform method looks like, unless one thinks of methods as almost by definition something like Turing machines. So most intuitionists believe that ICT has been shown to be false.
But this is at first sight puzzling. On the one hand we have intuitionists claiming that they have refuted CT.
On the other hand in virtually all of the contemporary literature outside intuitionism CT is regarded as true.
This puzzle disappears once it is noted that ICT does not really express CT. For one thing, a free choice process is not an algorithm in the sense explained in the first section of this paper. But even if we confine ourselves to the “lawlike” intuitionistic universe, ICT does not express CT. Firstly, the implication in ICT is constructive: it expresses a uniform transformation procedure. But the implication in CT is classical: it does not imply the existence of such a transformation procedure. And secondly, the consequent of ICT requires that for some e we have a proof that e computes A. But CT does not require this: it merely requires that some e in fact computes A.
But still, if we confine ourselves to the lawlike universe, ICT is a fair approximation of CT. It gives us reliable implications of CT, provided we read them carefully. For instance, in the context of intuitionistic arithmetic ICT implies a violation of a variant of the law of excluded third. This does not mean that the universal version of the classical law of excluded third is false, but only that an effective version of it is incorrect. If CT is correct, then for some properties φ(x), there is no uniform method for finding out for an arbitrary number whether it has φ or not.
3 We are assuming that the reader is familiar with Kleene’s T-predicate and U-function.
Intuitionism provides a useful setting for thinking about aspects of CT. Shapiro discussed the phenomenon that CT is often used to prove theorems that can be stated in the language of (classical) Peano Arithmetic (PA), i.e., to prove statements in which the notion of algorithm does not occur [Shapiro 1983, p. 215]:
In recursive function theory, there has developed an ‘informal’ method of argument, often called ‘argument by Church’s thesis’ that employs the [...] inference from computability to recursiveness. Typically, the object is to demonstrate the existence of a recursive function which has a given property P. Using this method, a mathematician first gives a procedure of calculation ‘informally’ (that is, outside of any particular formulation of algorithms) and then infers that the function computed by this procedure is recursive because it is computable. The mathematician then establishes that the function has the property P.
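The remark above that ICT implies a violation of a variant of the law of excluded third can be made concrete. The following is a sketch of one standard argument and is not taken from the text: the abbreviation K(x) for the self-halting predicate, the choice of the values 0 and 1, and the auxiliary machine d are assumptions introduced here for illustration.

```latex
% Assumption: K(x) abbreviates \exists m\, T(x, x, m), i.e. "machine x halts on input x".
\begin{align*}
&\text{Suppose } \forall x\,\bigl(K(x) \lor \neg K(x)\bigr).\\
&\text{Then } \forall x\,\exists y\, A(x,y), \text{ where }
  A(x,y) :\equiv (y = 0 \land K(x)) \lor (y = 1 \land \neg K(x)).\\
&\text{By ICT: } \exists e\,\forall x\,\exists m\,\exists n\,
  \bigl[\,T(e,x,m) \land U(m,n) \land A(x,n)\,\bigr].\\
&\text{So } e \text{ is a total machine that decides } K.\\
&\text{Choose } d \text{ such that } d \text{ halts on } x
  \text{ iff } e \text{ outputs } 1 \text{ on } x, \text{ i.e. iff } \neg K(x).\\
&\text{Then } K(d) \leftrightarrow \neg K(d), \text{ a contradiction.}\\
&\text{Hence } \mathrm{HA} + \mathrm{ICT} \vdash
  \neg\forall x\,\bigl(K(x) \lor \neg K(x)\bigr).
\end{align*}
```

What is refuted is the universally quantified, effectively read instance. Each numerical instance K(n) ∨ ¬K(n) remains classically true, which matches the careful reading recommended above.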
The question then arises whether such uses of CT are essential: can every theorem statable in the language of PA, say, that is proved using CT, also be proved without using CT? Generally it is assumed that the answer to this conservativity question is ‘yes’ but in the same breath it is added that since CT is an ‘informal’ principle, this cannot be formally proved.
In the intuitionistic setting, a precise answer can be given to one way of making this question precise. Let us restrict the discussion to the lawlike universe. Then a proof of a sentence of the form ∀x∃yA(x, y) consists (again modulo a choice principle) in a proof that A(x, y) is algorithmically computable. So in this setting, it seems that ICT can be meaningfully used to prove that certain functions are Turing-computable. Now the question arises whether modulo the double negation translation, intuitionistic arithmetic (HA, for Heyting Arithmetic) plus ICT is conservative over PA. The answer is affirmative. If we let δ be the double negation translation from the language of classical arithmetic to the language of intuitionistic arithmetic, then we have:4
Theorem 1. For every sentence φ of the language of classical arithmetic, if δ(φ) is provable in HA + ICT, then φ is provable in PA.
This theorem is a direct consequence of the fact that (1) realizability is conservative over almost-negative formulas and the double-negation translation transforms formulas into almost-negative formulas; (2) realizability makes ICT true. This conservativity phenomenon can be seen as a weak form of evidence for the thesis that CT is conservative over classical mathematics.
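The text does not say which variant of the double negation translation δ is intended. As a minimal sketch, assuming the Gödel–Gentzen version, the translation can be written as a recursion on formulas. The class names and the helper `neg` below are hypothetical, introduced only for this illustration; negation is treated as implication of falsum.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Atom:
    name: str


@dataclass(frozen=True)
class Bot:
    """Falsum; the negation of A is encoded as Imp(A, Bot())."""


@dataclass(frozen=True)
class And:
    left: "Formula"
    right: "Formula"


@dataclass(frozen=True)
class Or:
    left: "Formula"
    right: "Formula"


@dataclass(frozen=True)
class Imp:
    left: "Formula"
    right: "Formula"


@dataclass(frozen=True)
class Forall:
    var: str
    body: "Formula"


@dataclass(frozen=True)
class Exists:
    var: str
    body: "Formula"


Formula = Union[Atom, Bot, And, Or, Imp, Forall, Exists]


def neg(a):
    """Negation as implication of falsum."""
    return Imp(a, Bot())


def translate(f):
    """Goedel-Gentzen translation (assumed variant of the text's delta)."""
    if isinstance(f, Bot):
        return f
    if isinstance(f, Atom):
        return neg(neg(f))                      # atoms are doubly negated
    if isinstance(f, And):
        return And(translate(f.left), translate(f.right))
    if isinstance(f, Imp):
        return Imp(translate(f.left), translate(f.right))
    if isinstance(f, Or):                       # A or B  ~>  not(not A' and not B')
        return neg(And(neg(translate(f.left)), neg(translate(f.right))))
    if isinstance(f, Forall):
        return Forall(f.var, translate(f.body))
    if isinstance(f, Exists):                   # exists x A  ~>  not forall x not A'
        return neg(Forall(f.var, neg(translate(f.body))))
    raise TypeError(f"not a formula: {f!r}")


P = Atom("P")
excluded_third = Or(P, neg(P))
print(translate(excluded_third))
```

The output contains no disjunction and no existential quantifier, which is the syntactic feature that the conservativity argument relies on. In arithmetic the double negation on atoms is often omitted, since HA decides atomic equations; it is kept here so that the sketch follows the general definition.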
⁴ Thanks to Anne Troelstra for pointing this out to me.

3. Church’s Thesis in Epistemic Mathematics

The S4 laws of modal logic describe the propositional logic of the notion of reflexive provability. The reflexive notion of absolute provability should be carefully distinguished from the notion of provability in a formal system. As is well-known, the propositional logic of provability in a formal system is captured by the Gödel-Löb system GL of modal logic. The reflexivity axiom □A → A is invalid on this interpretation, whereas it is valid on the interpretation of □ as reflexive provability. The formal language of epistemic arithmetic then contains a sentential operator □ whose interpretation is the reflexive notion of provability. Aside from that, the language of epistemic arithmetic contains a constant 0 referring to the number 0, the function symbols
s (successor) and + and ×, and names for all primitive recursive functions. S4PA, Peano Arithmetic (with the defining equations for the primitive recursive functions) plus the S4 axioms formulated in the language of arithmetic extended with the operator □, describes the notion of reflexive provability in an arithmetical context. Actually, it seems that reflexive provability should be formalized as a predicate rather than as an operator. For we would like to be able to express things such as: “Some sentences are reflexively provable.” But this would seriously complicate the formalism. For we would have to take care to avoid the Kaplan–Montague paradox concerning absolute knowability [Montague 1963]. However, the effect of quantifying over sentences can also be achieved by adding a truth predicate to the language, governed by suitably restricted Tarski-biconditionals, for example. Anyway, here we keep matters simple and express reflexive provability by a sentential operator.

S4PA is known as a system of intensional or epistemic mathematics [Shapiro 1985b], but strictly speaking this is a misnomer. For the notion of reflexive provability is not a purely mathematical notion, although it is of course related to mathematics. The notion of reflexive provability is just as much a philosophical notion as the notion of truth is. It must be admitted that the notion of reflexive provability is less well understood than the notion of truth is: we shall have to return to this later.

Via Gödel’s modal translation, which closely follows Heyting’s proof semantics for the intuitionistic logical operations, S4PA is related to intuitionistic arithmetic by a faithfulness theorem [Goodman 1984], [Flagg and Friedman 1986]. But S4PA is nevertheless a classical theory. So aside from being able to express intuitionistic statements via the Gödel translation, it can express classical propositions as well as propositions which are in part constructive and in part nonconstructive.
This opens the possibility of improving on ICT as an approximation of CT by removing the effectiveness from the implication and from its consequent. Several proposals have been made for expressing CT in epistemic arithmetic, but here we shall look at the following:

(ECT) □∀x∃y□A(x, y) → ∃e∀x∃m∃n(T(e, x, m) ∧ U(m, n) ∧ A(x, n))
Here A ranges over all formulas of the language of classical first-order arithmetic plus the reflexive proof operator. As in the case of ICT, here too an epistemic choice principle is presupposed. The implication in ECT is classical. Therefore the first reason why ICT did not quite express CT does not apply here. Secondly, the whole consequent of ECT is nonconstructive. So also the second reason why ICT did not really express CT is not applicable here. The crucial part of ECT is its antecedent: □∀x∃y□A(x, y). It contains no direct mention of the notion of algorithm. But it is equivalent to expressing that a suitable algorithm exists provided that the following thesis holds:

Thesis 1. A proof witnessing the truth of □∀x∃y□A(x, y) for a given formula A must involve displaying an algorithm for computing A.

This thesis is related to the intuitionist claim that there is a close relation between a method for computing a function and a (constructive) proof that the function exists. It is difficult to see how ECT could be proved in epistemic mathematics from more fundamental principles. But in addition, ECT should probably not be adopted as an extra axiom. ECT is “necessarily” a hypothesis. For otherwise, the Necessitation rule can be applied to ECT. This would mean that there is a proof that for any algorithm computing a function, there is a Turing machine which computes that function. And as in the case of ICT, it would seem that this proof would have to involve a uniform transformation procedure for converting algorithms into Turing machines. And short of coming close to identifying algorithms with Turing machines by definition, it is hard to see what this transformation procedure could look like. ECT is consistent with epistemic mathematics [Flagg 1985]. In S4PA, one can prove that functions are effectively computable by proving sentences of the form □∀x∃y□A(x, y).
So again the question can be asked whether ECT allows us to prove purely arithmetical theorems that we could not prove without ECT. When ECT is taken as a hypothesis, the situation is the same as for ICT. When taken as a hypothesis, ECT is arithmetically conservative over PA [Halbach and Horsten 2000]. In other words, in epistemic mathematics, the informal notion of algorithmic computability, when it is governed by
ECT, does not give us new mathematical theorems. This is again a weak sort of evidence that CT is conservative over arithmetic. What if we do adopt ECT as an axiom? The question whether S4PA + ECT is arithmetically conservative over Peano arithmetic is still open, as far as I know, although it may well be the case that methods developed by Timothy Carlson can be used to show that S4PA + ECT is indeed conservative over PA [Carlson 1999], [Carlson 2000]. So the notion of algorithm is, at least insofar as we now know, unlike the notion of truth. For it is well-known that truth is nonconservative.

Other proof-theoretical questions can be thought of that have philosophical significance. It was noted in the beginning of this section that HA is faithfully embedded in first-order epistemic arithmetic via Gödel’s modal translation from the language of intuitionistic arithmetic to the language of epistemic arithmetic. If one adds ECT to epistemic arithmetic, ICT (under Gödel’s translation) does not directly follow from it. So one may wonder whether any new intuitionistic statements become provable from ECT (under Gödel’s modal translation). This again appears to be a question that is still open.

One marked weakness of the whole program of epistemic arithmetic is that the notion of reflexive provability is not very well understood at all. Some suspect that the notion is inherently vague. In any case, our theory of it is very restricted at the moment.⁵ A long time ago, Myhill encouraged researchers to take an axiomatic approach to absolute provability [Myhill 1960, p. 468]. In that respect, Kreisel was pessimistic [Kreisel 1983, pp. 86–87]:

Truth and general provability; at least so far, a distinction without much difference [...] nothing is formulated about general provability that does not also hold for truth [...] In fact, it seems open whether anything can be said about general provability in the language considered that does not also hold for truth.
⁵ I discuss this problem in some detail in [Horsten 2005b].

But even in the light of our feeble grasp on the notion of absolute provability, the situation is not that grim. One cannot say that we have at present no (axiomatic) understanding of the notion of reflexive provability at all. First, the logical principles
concerning absolute provability are probably not as strong as those for truth. It would seem implausible that the Tarski-biconditionals hold for the notion of absolute (classical) provability. Secondly, ECT is a candidate for precisely the sort of principle that Kreisel was calling for. For ECT might well be true; but if in ECT the concept of provability is replaced by that of (classical) truth, the resulting principle is provably incorrect. In sum, taking the notion of reflexive provability as primitive is not quite on a par with taking the notion of algorithm as primitive. For in contrast with the notion of algorithm, we do appear to have an elegant theory which yields the basic logical properties of reflexive provability. And even if the content of the notion of algorithm cannot be fully reduced to the reflexive notion of provability, if □∀x∃y□A(x, y) turns out to be true if and only if A is algorithmically computable, then this will have consequences that are interesting in their own right.
4. Intensional Aspects of Church’s Thesis

Shapiro believes that CT cannot be directly captured in terms of the absolute provability operator. CT concerns the notion of computability: a function is computable if there is an algorithm that computes it. So computability is objective in the sense that it “does not involve reference to a knowing subject” and extensional [Shapiro 1985b, p. 41]. But closely related to the notion of computability there is a pragmatic notion, which Shapiro calls calculability (or “effectiveness”, in the terminology of [Shapiro 1980]). Calculability is a property of presentations of algorithms: a function presentation F is calculable if there is an algorithm P such that it can be established that F represents P [Shapiro 1985b, p. 43]. Shapiro suggests that Epistemic Arithmetic can be used to shed light on this latter notion: the antecedent of CT expresses the calculability of a function presentation. The claim that the notion of computability does not involve reference to a knowing subject strikes me as untrue. For the notion of computability is explicated in terms of the notion of algorithm. And this latter notion does involve reference to a knowing subject, as is evident from Turing’s analysis [Turing 1936].⁶

⁶ Turing’s analysis is carefully described and discussed in [Soare 1996].

Turing explains
the notion of algorithm in terms of the notion of a computor, which is a human being who calculates following a routine procedure.⁷ It is true that human calculation (which is involved in CT) is not the same as human provability (which is involved in ECT). But Thesis 1 is supposed to connect the two notions. Computability is indeed extensional: if two function presentations denote the same function, then one is algorithmically computable if the other is. We have no reason to think that the antecedent of ECT is extensional. It is not hard to think of two functional expressions which may denote the same function but for which one satisfies the antecedent of ECT while the other does not. But calculability is at least epistemically more basic than computability. For to determine that a function is computable, we have to determine that there is a function presentation which is calculable, i.e., an algorithm [Shapiro 1980, p. 219]. In fact, calculability seems also to be ontologically more basic than computability. For the notion of computability is obtained by an act of abstraction from the notion of calculability.

There is an ambiguity at the heart of the notion of algorithm. On the one hand it is usually said that an algorithm is just a procedure for transforming numbers into other numbers. On the other hand, an algorithm is usually intended to compute a function given under a specific presentation. As such, an algorithm is more than a transformation procedure. Nicolas Goodman put it thus: “The specification of the algorithm is complete only if it includes a statement of the problem it is intended to solve” [Goodman 1987, p. 483]. This seems not quite right: it is rather a transformation procedure plus a proof that this procedure computes a function presented in a specific way. But it is not completely clear that even this is what is meant in the vulgar usage by the term ‘algorithm’.
⁷ This is emphasized in [Wittgenstein 1980, p. 1096].

For it might also be thought that it should be evident from the specification of the algorithm itself which problem it is intended to solve. The indeterminacy here mirrors the ambiguity of the meaning of implication in intuitionism, which was discussed in section 2. However this may be, this phenomenon explains the puzzlement that students experience upon first being told that the total function
f, defined as

    f(x) = 1 if Goldbach’s conjecture is true,
    f(x) = 0 otherwise,

is algorithmically computable (either the constant 0 function or the constant 1 function computes it). One has the feeling that, e.g., the constant 1 function is not really an algorithm for computing the function presented above unless it is accompanied by a proof that the algorithm computes the function in question. After all, a transformation procedure involves also a function presentation. The transformation procedure is an algorithm for computing a function presented in a specific way only if the graph of the transformation procedure is provably the graph of the function in the intensional sense of the word.

What has happened is that the contemporary textbook use of the term ‘algorithm’ is abstracted from the intensional use of the term algorithm just described. Mathematics is, at least officially, extensional: it abstracts from the ways in which mathematical objects are presented. The notion of algorithm is not a fully domesticated notion: it remains “informal”. But it has become a half-domesticated notion: it has been reduced to an extensional notion. But the extensional meaning of the contemporary use of the term ‘algorithm’ can be expressed in terms of the reflexive notion of provability, or at least it can be thus approximated. The idea is that a function is computable if it has a calculable presentation. In other words, we come closer to expressing CT if we say:

If a function has a calculable presentation, then it is Turing-computable.
In this sense, one might attempt some sort of reduction of the concept of computability to the concept of calculability. One advantage of this formulation is that its converse is obviously valid. And this is as it should be, for it is generally held that the converse of CT is obviously valid [Black 2000], [Mendelson this volume]. It is not in the least obvious that the converse of ECT is valid! But the above attempt to paraphrase CT quantifies over function presentations. So we have to move to a second-order epistemic framework. Let S4PA2 be exactly like S4PA, except that the language of the theory contains second-order predicate variables, and
the background arithmetical theory is second-order (classical) arithmetic. In the framework of S4PA2, the paraphrase of CT in terms of calculability can be expressed:

(ECT2) ∀X : [X expresses a function ∧ ∃Y : ∀x∀y(X(x, y) ↔ Y(x, y)) ∧ □∀x∃y□Y(x, y)] → (X expresses a recursive function)

This formalization comes closer to capturing the content of CT. In contrast to ECT, the converse of ECT2 is clearly true. And this is as it should be, for as was noted above, the converse of CT is clearly true. Moreover, in contrast to the antecedent of ECT, the antecedent of ECT2 is clearly extensional. Therefore ECT2 respects Shapiro’s stricture that was discussed in the beginning of this section. ECT2 shows how computability as an extensional notion is abstracted from what Shapiro calls the intensional notion of calculability.
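The Goldbach example from earlier in this section can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions: the names `const_zero`, `const_one`, `goldbach_holds_for` and `no_counterexample_up_to` are hypothetical and do not come from the text. It shows that both candidate transformation procedures are trivially executable, whereas a bounded search can at best accumulate evidence about which of them has the same graph as f; it establishes nothing about the conjecture itself.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality test, adequate for small illustrative inputs."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def goldbach_holds_for(n: int) -> bool:
    """True if n can be written as a sum of two primes."""
    return any(is_prime(p) and is_prime(n - p) for p in range(2, n // 2 + 1))


# The two candidate transformation procedures. Extensionally, exactly one of
# them computes f, but neither comes with a proof that it is the right one.
def const_zero(x: int) -> int:
    return 0


def const_one(x: int) -> int:
    return 1


def no_counterexample_up_to(bound: int) -> bool:
    """Bounded check of the even numbers 4..bound. This is evidence only:
    it never decides Goldbach's conjecture, so it never tells us which
    constant function is the function presented as f."""
    return all(goldbach_holds_for(n) for n in range(4, bound + 1, 2))
```

On the intensional reading discussed above, neither `const_zero` nor `const_one` counts as an algorithm for f as presented, because the required proof that its graph coincides with the graph of f is missing.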
5. More on the Epistemic Framework

Quine has famously insisted that any regimented theory worthy of the name should have a clear interpretation. Let us apply this to the epistemic background theories of the previous two sections: first- and second-order epistemic arithmetic. The epistemic framework is an intensional logic. Quine has always felt that intensional logics do not have a clear interpretation. In section 3 it was conceded that the notion of reflexive provability is not as clear as one would wish it to be. But Quine held that there is a specific problem with intensional logic: quantifying into intensional contexts is genuinely problematic [Quine 1955]. Against this, I maintain that in a formal context, quantification into epistemic contexts is unproblematic and uncomplicated as long as every object in the domain of discourse can only be referred to by transparent terms. And this is the case for the languages of first- and second-order epistemic arithmetic that we have employed. Here is a
sketch of the intended interpretation of quantification in epistemic arithmetic.⁸

Let us first consider first-order quantification. Terms of the language of S4PA must be built from individual variables, 0, s, +, × and primitive recursive function symbols. Given an assignment of numbers to the free variables, identity of denotation between two terms s and t is always decidable. So the Kripkean identity and substitution principles

    s = t → □(s = t)
    s = t → (φ(x \ s) ↔ φ(x \ t))

for all formulas φ are valid. Therefore we can read quantified statements in a “Gödelian”, substitutional manner. A statement ∃x□φ(x) can for all intents and purposes be read as: “there is a natural number such that when its standard Peano-numeral replaces the variable x in φ(x), a provable sentence results.” The reason is that modulo provable equivalence, every natural number is nameable in the language by a unique term. The situation becomes more complicated only when not every object in the domain of discourse has a name (such as in the case of the real numbers) or when we allow different terms that in fact refer to the same number, but not provably so.⁹ But this was avoided in the epistemic frameworks that we have relied on. The opaqueness was restricted to the predicate expressions and was not allowed to spread to the terms.

Substitution of predicates in intensional contexts is governed by the transitivity axiom of S4. So we do not have in S4PA2 the substitution principle

    ∀X∀Y : ∀x(Xx ↔ Yx) → (Φ(X) ↔ Φ(Y)),

but we do have the weaker:

    ∀X∀Y : □∀x(Xx ↔ Yx) → (Φ(X) ↔ Φ(Y)).

⁸ I have spelled out the intended semantics of first- and second-order languages of epistemic arithmetic in more detail in [Horsten 1998, section 4.2] and in [Horsten 2005a, section 2].
⁹ This latter situation may arise, for instance, when we include a description operator in the formal language.
Still, we have to be clear what an expression of the form ∃X□Φ(X) is supposed to mean. I suggest that we take the intended interpretation to be: “there is a set of natural numbers S and a presentation P_S of S such that when the second-order variable X is replaced in Φ by P_S, a provable sentence results.” It is important to be as liberal as possible with respect to admissible presentations of sets [Horsten 1998, p. 17, and footnote 30, p. 24]. Even a set itself is allowed to count as its own presentation. This ensures that the quantifier ∀X in ∀XΦ(X) has all sets of natural numbers in its range, even those that for all we know have no humanly conceivable presentation. Only certain kinds of presentations (notably presentations expressible in human languages) can figure in reflexive proofs. For this reason, a sentence such as ∀X□∀y(Xy ↔ Xy) is not valid. For of sets of numbers that have no humanly expressible and usable presentation it is impossible to prove anything. In this way both the absolute generality of second-order quantification and the impossibility of full-fledged de re knowledge of sets of objects can be respected. In sum, the interpretation of quantifying in epistemic contexts is straightforward and unproblematic for the epistemic theories that we have considered.
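The substitutional reading of first-order quantification described in this section rests on the fact that every natural number has a standard numeral. As a rough illustration only, the following Python sketch builds such numerals and substitutes them for a variable in a formula written as a plain string. The function names `numeral`, `substitute` and `instances` are hypothetical, and representing formulas as strings is an assumption of this sketch, not part of the formal framework. Nothing here models reflexive provability itself; the sketch only produces the instances whose provability the reading refers to.

```python
import re
from typing import List


def numeral(n: int) -> str:
    """Standard Peano numeral for n: s applied n times to 0."""
    if n < 0:
        raise ValueError("numerals exist only for natural numbers")
    term = "0"
    for _ in range(n):
        term = f"s({term})"
    return term


def substitute(formula: str, var: str, term: str) -> str:
    """Replace every whole-word occurrence of var in formula by term.
    Naive by design: it ignores variable binding altogether."""
    pattern = r"\b" + re.escape(var) + r"\b"
    return re.sub(pattern, lambda _match: term, formula)


def instances(formula: str, var: str, bound: int) -> List[str]:
    """The numeral instances of formula for the numbers 0 .. bound - 1."""
    return [substitute(formula, var, numeral(n)) for n in range(bound)]
```

On this reading, ∃x□φ(x) holds when at least one of the sentences produced in this way (for some natural number, with no bound) is reflexively provable.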
References

Black, R. [2000], “Proving Church’s Thesis”, Philosophia Mathematica 8, 244–258.
Carlson, T. [1999], “Ordinal Arithmetic and Sigma1 Elementarity”, Archive for Mathematical Logic 38, 449–460.
Carlson, T. [2000], “Knowledge, Machines, and the Consistency of Reinhardt’s Strong Mechanistic Thesis”, Annals of Pure and Applied Logic 105, 51–82.
Flagg, R. [1985], “Church’s Thesis is Consistent with Epistemic Arithmetic”, in [Shapiro 1985a, pp. 121–172].
Flagg, R. and Friedman, H. [1986], “Integrating Classical and Intuitionistic Type Theory”, Annals of Pure and Applied Logic 32, 27–51.
Folina, J. [1998], “Church’s Thesis: Prelude to a Proof”, Philosophia Mathematica 6, 302–323.
Folina, J. [this volume], “Church’s Thesis and the Variety of Mathematical Justifications”.
Goodman, N. [1984], “Epistemic Arithmetic is a Conservative Extension of Intuitionistic Arithmetic”, Journal of Symbolic Logic 49, 192–203.
Goodman, N. [1987], “Intensions, Church’s Thesis and the Formalization of Mathematics”, Notre Dame Journal of Formal Logic 28, 473–489.
Halbach, V. and Horsten, L. [2000], “Two Proof-Theoretic Remarks on EA + ECT”, Mathematical Logic Quarterly 46, 461–466.
Horsten, L. [1998], “In Defense of Epistemic Arithmetic”, Synthese 116, 1–25.
Horsten, L. [2005a], “Canonical Naming Systems”, Minds and Machines 15, 229–259.
Horsten, L. [2005b], “Remarks on the Content and Extension of the Notion of Provability”, Logique et Analyse 189–192, 15–32.
Kreisel, G. [1983], “Gödel’s Excursions into Intuitionistic Logic”, in P. Weingartner and L. Schmetterer, Gödel Remembered. Gödel-Symposium in Salzburg, 10–12 July 1983, Bibliopolis, pp. 65–186.
Mendelson, E. [1990], “Second Thoughts about Church’s Thesis and Mathematical Proofs”, Journal of Philosophy 87, 201–205.
Mendelson, E. [this volume], “On the Impossibility of Proving the ‘Hard-Half’ of Church’s Thesis”.
Montague, R. [1963], “Syntactical Treatments of Modality, with Corollaries on Reflexion Principles and Finite Axiomatizability”, Acta Philosophica Fennica 16, 153–167.
Myhill, J. [1960], “Some Remarks on the Notion of Proof”, Journal of Philosophy 57, 461–471.
Quine, W.V. [1955], “Quantifiers and Propositional Attitudes”, in W.V. Quine [1976], The Ways of Paradox and Other Essays, 3rd edition, Harvard University Press, pp. 100–112.
Shapiro, S. [1980], “On the Notion of Effectiveness”, History and Philosophy of Logic 1, 209–230.
Shapiro, S. [1983], “Remarks on the Development of Computability”, History and Philosophy of Logic 4, 203–220.
Shapiro, S. (ed.) [1985a], Intensional Mathematics, North-Holland.
Shapiro, S. [1985b], “Epistemic and Intuitionistic Arithmetic”, in [Shapiro 1985a, pp. 11–46].
Shapiro, S. [this volume], “Church’s Thesis: Computability, Proof, and Open-Texture”.
Soare, R. [1996], “Computability and Recursion”, Bulletin of Symbolic Logic 2, 284–321.
Turing, A. [1936], “On Computable Numbers, with an Application to the Entscheidungsproblem”, Proceedings of the London Mathematical Society 42, 230–265.
Wang, H. [1974], From Mathematics to Philosophy, Routledge & Kegan Paul.
Wittgenstein, L. [1980], Remarks on the Philosophy of Psychology, vol. 1, Blackwell.
Stanisław Krajewski∗

Remarks on Church’s Thesis and Gödel’s Theorem

∗ S. Krajewski, Institute of Philosophy, Warsaw University, Krakowskie Przedm. 3, 00–047 Warszawa, Poland.

0. Introduction

Church’s Thesis says that computable functions are exactly the recursive functions. The notion of computability or effective computability is a general, informal, natural language notion; ‘recursive’ is a strict mathematical concept. Church’s Thesis has been seen as a definition of computability, e.g. by Alonzo Church himself; as a theorem by those who analyzed the concept of computability, like Alan Turing and later Robin Gandy; and as a thesis that requires various forms of verification and supportive evidence. The last option, and terminology, has been widely accepted at least since [Kleene 1952]. The above division into three approaches is, of course, very rough, but most views on the nature of Church’s Thesis seem to be falling under at least one of those approaches. As a matter of fact, the three approaches are interconnected: for example, proving Church’s Thesis as a theorem provides support for it as a thesis, and, given enough evidence for the thesis, it is natural to propose it as the definition of computability.

According to Richard Epstein (to whom I am most grateful for comments which have improved the style and also the contents of this paper), Church’s Thesis is a formalization of a preformal notion, analogous to some other attempts “to formalize a pre-formal notion, such as geometry or, yes, even arithmetic (this is what a natural number is), or the real numbers, or [...]” It is silly, he says, to suspect a deep, philosophical truth in the fact that “the two main
attempts to formalize our notion of the real numbers (Dedekind cuts and Cantor approximations) turn out to define the same objects.” Therefore, Epstein concludes, Church’s Thesis “is a model, not a thesis at all. The only question is whether it is a good model.” It seems to me that to see Church’s Thesis as a model is very similar to seeing it as a definition. And by asking whether the model is good we re-introduce all the considerations about Church’s Thesis that were made to justify the definition or, for that matter, support the thesis; and the more careful they are, the more they can be used to demonstrate the theorem, Church’s Thesis as a theorem.

This view finds support in the work of Wilfried Sieg (see his [2002a], [2002b], and [200?]) who has illuminated the problem of the similarity and dissimilarity of Church’s Thesis to other cases in which some “thesis” was proposed. It seems that Church’s Thesis provides a rare example of the situation when a thesis was proposed without a serious analysis of the informal notion (here, computability). Sieg (inspired by the work of Gandy and also that of [Kolmogorov and Uspensky 1953]) has argued that by analyzing computations one can arrive at axioms which are satisfied by various mathematical constructions equivalent to that of recursive functions. He claims that there is no more need or place for any thesis. Still, the problem of the adequacy of the axioms appears here as it does in other cases; mathematicians must decide whether the axioms capture the properties of the informal notion. In the justification of the axioms various arguments used earlier to argue for the Thesis can be useful.

Whatever view we prefer, the fact remains that Church’s Thesis was proposed after discussions with Kurt Gödel and it is in various ways connected to the famous theorem of Gödel. The connections are worth reviewing; some of them are well established, but some others are not uncontroversial.
It was Stephen Kleene who first called Church’s Thesis a thesis. Several decades later the same scholar indicated a curious connection between Church’s Thesis and Gödel’s theorem; namely, Church’s Thesis easily implies incompleteness. However, it is argued in Section 1 that this statement is somewhat misleading, as the use of Church’s Thesis is inessential. Next, in Section 2, we recall the well known ways in which the proper use of Church’s Thesis is made in connection with Gödel’s theorem. Namely, Church’s Thesis makes possible a general formulation of the theorem, as Gödel himself emphasized
after Turing’s work. Also, the proofs of undecidability are seen as philosophically significant thanks to Church’s Thesis. In Section 3, the alleged demonstration, based on Gödel’s theorem, of the nonmechanical nature of the human mind is mentioned as also using Church’s Thesis, and a reference is made to an argument that refutes this claim of the Gödel-based demonstration of the superiority of mind over machines. This refutation can be presented as a precise theorem if Church’s Thesis is assumed.
1. Kleene’s Argument

Kleene in his article [1987], written for the special issue on Church’s Thesis of the Notre Dame Journal of Formal Logic, presented an argument to the effect that Church’s Thesis provides a simple proof of Gödel’s incompleteness theorem. Even though only a weak form of the theorem is derived, stating incompleteness, not the strong one giving an example of an undecidable sentence, the argument seems to be quite remarkable. It gives reason to say, as Kleene did, that “if in 1936 mathematicians had been ignorant of Gödel’s incompleteness theorem, one could have proposed Church’s thesis and let it lead one to Gödel’s theorem” ([1987, p. 491]). Kleene’s argument is closely related to a fact that must have become clear to Gödel and leaders in the field of mathematical logic quite early, but was systematically studied only by Judson Webb in [1980]. Namely, Gödel’s incompleteness gives “protection” to Church’s Thesis. To be more specific, if a standard system of number theory were complete, then we could extend all the functions representable in the system to total recursive functions, and then, the argument goes, the antidiagonal function would not be representable, even though it should be, due to Church’s Thesis, since it would clearly be computable.

Let us see in some more detail why the diagonal procedure would be lethal to Church’s Thesis in the presence of completeness. The argument is very general as it applies to any system S that is recursive (its proof procedure, whatever it may be, is recursive) and contains some basic arithmetic; these are the standard assumptions that make all recursive functions representable in S; we assume S consistent and also sufficiently sound, for example ω-consistent. Now, if S were complete we could proceed as follows to obtain the contradictory of Church’s Thesis. First, take the usual recursive enumeration of all (possibly
partial) recursive functions of one variable: f_1, ..., f_n, .... By Kleene’s normal form theorem, f_i(x) = U(μy T(i, x, y)). (Here T is the appropriate universal relation, and let ‘τ’ be the formula representing T in S.) Then we make all f_i’s total by putting the value 0 for each input on which the function is not defined. The extended total functions, g_1, ..., g_n, ..., are still recursive because by completeness either S proves ∃yτ(i, x, y) or S proves ∀y¬τ(i, x, y); both cannot be proved, by consistency, and whichever is proved must be true, by ω-consistency. In other words, given numbers i and x we search through the proofs in S and sooner or later get one of those formulas proved; in the first case f_i(x) is defined, in the second 0 is the value. Now by diagonalizing we define d(x) = g_x(x) + 1. The antidiagonal function d is obviously computable but must be non-recursive as it doesn’t appear on the list of all recursive functions, which is a contradiction with Church’s Thesis. This means we have demonstrated that if S is ω-consistent and complete, then ¬CT. And this is indeed equivalent to the weak form of Gödel’s theorem, on the additional assumption of Church’s Thesis: If Church’s Thesis, then if S is ω-consistent then S is incomplete.

Assuming the truth of Church’s Thesis, do we then get a seemingly new proof of Gödel’s theorem? Church’s Thesis is utilized in Kleene’s argument only once. It is used to get the recursiveness of function d. The function is computable, in the intuitive sense of the term, because, as explained above, given a number x we can search through the proofs in S and sooner or later get the proof of either ∃yτ(x, x, y) or of ∀y¬τ(x, x, y); in the first case f_x(x) is defined, and d(x) equals U(μy T(x, x, y)) + 1; in the second case d(x) = 1. We have described an algorithm, so by Church’s Thesis we conclude that d is recursive (which gives the desired contradiction). But wait, do we need Church’s Thesis at all? No.
Kleene himself observed in a footnote [1987, p. 495] that under the assumption of completeness and ω-consistency the functions g_i(x) are recursive, and in fact g_i(x) is recursive as a function of two variables i and x. That is all we need for Kleene’s argument. Church’s Thesis is not used. This case is just one example of many similar situations. Rather than present the proof that a function is recursive one sketches the algorithm or an intuitive description of the computation. This is often done by Hartley Rogers in his classic and highly influential book [1967]. This form of argument can be called a “proof by Church’s
Thesis”, as is done for example in [Asch and Knight 2000, p. 5]. Yet the name is misleading. The proof of recursiveness is omitted but it can be reconstructed (at least, in principle). We are certain that this can be done due to the experience we got in the early years of the subject, and in classrooms since, when many specific functions have been shown to be recursive, and general techniques have been developed. In fact, this evidence is a major reason for believing Church’s Thesis. Thus “the proof by Church’s Thesis” is rather a proof based on the evidence for Church’s Thesis: one can be confident of being able to fill the gap because of the experience with closing such gaps, or, to say the same in a different way, with unsuccessful attempts to refute Church’s Thesis. The use of Church’s Thesis is convenient but is not really necessary. It is inessential when the experts know that with enough effort one could reconstruct the appropriate formal definition.

If, as I have argued, Church’s Thesis is expendable in Kleene’s argument then what we get is a proof, without additional assumptions, of the weaker form of Gödel’s incompleteness theorem. Kleene wrote, “we thus get a (new?) proof of the absurdity of the completeness of F if F is ω-consistent.” [1987, p. 495, note 4]. The way he put it shows that he seems to have been tempted to say that it was a new proof and, at the same time, he was unsure, a rather strange situation, given that Kleene was one of the founders and preeminent leaders in the field. How can we explain that? The main reason is that probably nobody had offered that proof before but Kleene could not be sure. The proof was not needed because the general fact of incompleteness was already established. Another reason Kleene was uncertain might have resulted from the less than clear awareness that his use of Church’s Thesis was inessential.
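The two steps of Kleene’s argument discussed in this section, totalizing the listed functions and then diagonalizing, can be illustrated with a toy sketch in Python. Everything in it is an assumption made for illustration: `toy_partial` is a hypothetical stand-in for the enumeration f_i, with `None` marking an undefined value, and in this toy it is trivially decidable whether a value is defined. In the real argument that decision is exactly what the proof search in a complete, ω-consistent S would supply.

```python
from typing import Optional


def toy_partial(i: int, x: int) -> Optional[int]:
    """Toy stand-in for f_i(x): undefined (None) when x is divisible by i + 2."""
    if x % (i + 2) == 0:
        return None
    return i * x + 1


def g(i: int, x: int) -> int:
    """Totalized g_i(x): the value of f_i(x) where defined, 0 otherwise."""
    value = toy_partial(i, x)
    return 0 if value is None else value


def d(x: int) -> int:
    """Antidiagonal function d(x) = g_x(x) + 1."""
    return g(x, x) + 1


def differs_from_every_listed_function(limit: int) -> bool:
    """Check, for i below limit, that d disagrees with g_i at the argument i."""
    return all(d(i) != g(i, i) for i in range(limit))
```

Since d(i) = g_i(i) + 1 for every i, the function d cannot coincide with any g_i, which is the diagonal step of the argument; the sketch makes no claim about the enumeration of all recursive functions itself.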
2. Why We Need Church’s Thesis
The inessential use of Church’s Thesis discussed above is not something exceptional. Leaving gaps in a demonstration is a standard behavior in mathematics. Such shortcuts are necessary to avoid repetition of tedious details. In mathematical logic we deal similarly with formalization. As soon as we gather enough experience formalizing various arguments in first order logic, we can be confident when a given informal argument (in particular, a mathematical argument) can be translated into first order logic. Then, relying on
the completeness theorem, we can be sure that there exists a formal proof in the first order predicate calculus which corresponds to the argument. One never presents actual details but the matter is incontestable for experts. Perhaps the role of Church’s Thesis in those “proofs by Church’s Thesis” is analogous to the role of the completeness theorem in proofs within formal predicate calculus. The parallel runs as follows: informal algorithms are like informal mathematical proofs; translations of algorithms to the language of functions from numbers to numbers is like translating informal mathematical arguments into the language of the predicate logic; both translations must be faithful, that is, no essential aspect can be lost. (Of course, this requirement is not formalizable.) Then Church’s Thesis assures that the number function is recursive, and the completeness theorem assures the existence of a formal proof in the logical calculus. The two situations differ in an important respect: completeness for first order logic has been demonstrated, Church’s Thesis remains a hypothesis. To those for whom it is a definition or a model the problem remains how to justify that it is good enough, and in the absence of a conclusive proof it functions as a hypothesis. However to some, as has been mentioned, Church’s Thesis is more like a theorem. It would be interesting to see whether for them the demonstrations—like that of Turing who considered idealized human ‘computors’ (term introduced by [Gandy 1988, p. 75]) and that of [Gandy 1980] who considered discrete, deterministic mechanical devices—establish the general possibility of the “proof by Church’s Thesis” and thereby justify the phrase. In contrast to the considerations of specific functions, global statements about the recursiveness of all procedures or functions of some kind constitute a legitimate and irreducible way of using Church’s Thesis. 
In the same article [1987], Kleene refers to his papers, beginning with [1936], as presenting the work that made use of Church’s Thesis, immediately after it had been proclaimed, in connection with Gödel’s results. What he means is, however, completely different from the inessential use of the thesis in Kleene’s argument analyzed above. The other use of Church’s Thesis was made in order to generalize Gödel’s theorem from specific systems like PM or PA to arbitrary formal systems, or to any system with an arbitrary formal notion of ‘proof’ (and including some basic arithmetic). We want to be able to extend Gödel’s results to all notions of proof or derivability
based on form of expressions only, or, to use Kleene’s words, “meeting Hilbert’s demands for effectiveness” [Kleene 1987, p. 491]. Generality was the intention of Hilbert, in whose school problems concerning decidability were posed. The point was consciously stated by Emil Post and Alan Turing who also formulated versions of the thesis (their formulations were intensionally different but turned out to be extensionally equivalent). The purely formal proofs can be also called mechanical, and here the ‘mechanical’ is like ‘computable’ a general, preformal, natural language notion. When we identify, as Church’s Thesis proposes, the purely formal with the recursive, then we can be sure that arbitrary formal systems are just the recursively presented ones, and for those we have mathematical methods. A second use of Church’s Thesis is to get negative results. If some problem can be faithfully presented as a non-recursive function then on the basis of Church’s Thesis we can claim that the problem admits no computable solution whatsoever, or that no algorithm, in an informal sense, exists for the solution of the problem. Mathematical solution receives a broader importance because it is seen to refer not only to the strict mathematical concept but also to the seemingly wider informal notion. More generally, Church’s Thesis is “a bridge between the mathematics and the philosophical problems that generated the mathematics.” [Epstein and Carnielli 1989, p. 229]. The above two ways of using Church’s Thesis, for generalizing and for metamathematical negative results, are very well known. The distinction between the dispensable and essential uses of Church’s Thesis is well described in Odifreddi’s book [1989]. 
According to him, the avoidable use of Church’s Thesis, the shortcuts mentioned in Section 1 above, is made in Recursion Theory, and the essential use of Church’s Thesis is made in metamathematics by allowing for proofs of absolute unsolvability: “if we prove that a function is not recursive then, by the thesis, it is not computable by any effective means.” [p. 103]. The distinction is also stressed by Epstein and Carnielli who criticize the “proof by Church’s Thesis” as a confusion, “giving a fancy name to a routine piece of mathematics while at the same time denigrating the actual mathematics.” [1989, p. 229]. There exist other philosophical uses of Church’s Thesis, in connection with Gödel’s theorem, that take advantage of the two “essential” methods, the generalizing and the getting of negative results.
An interesting example is provided by the alleged refutation of mechanism.
3. Gödel’s Theorem and Mechanism
The anti-mechanist argument, known as Lucas’ argument after [Lucas 1961], and presented in a more sophisticated version by Penrose [1989] and [1994], uses Gödel’s theorem to disprove the thesis that the human mind is mechanical or equivalent to a machine. The idea is very simple: if the mind were equivalent to a machine, that machine would produce mathematical truths, but, assuming it would not assert a contradiction, it could not produce the Gödel sentence constructed for the totality of those truths. On the other hand, we can prove that the Gödel sentence is true. Thus, it would seem, we are better than any machine. Now, rather than enter an analysis of the pitfalls of the argument, which has been done by various logicians, and is done in detail in [Krajewski 2003] (in Polish; an English version is in preparation), let us notice that to proceed we should be able to define what is a machine. For example, we would not accept a device with a little man hidden inside. We would accept computers, also their hitherto unknown versions. What then is a machine? It is natural, and perhaps unavoidable, to refer to Church’s Thesis. All attempts to define an abstract machine give a concept equivalent to recursive functions and Turing machines. Consequently, information processing machines, whatever they are, give a product that can be listed by a recursive function. This description of the initial conditions of Lucas’ argument is standard; for example, David Lewis wrote that “to be a machine is [...] to be something whose output, for any fixed input, is recursively enumerable” [Lewis 1979, p. 375]. For such machines Gödel’s sentence can be constructed.
Any Lucas style argument must respond to every machine that is equivalent to a specifiable Turing machine, or rather to, at least, every consistent machine; furthermore, we require the reaction to consist in the presentation of an arithmetical sentence not “provable” by the machine, and we assume that this reaction is effectively determined (otherwise, we would, circularly, assume non-mechanical abilities of the mind). Now, we can refer to Church’s Thesis in order to obtain a partial recursive function F defined for at least consistent machines (i.e. machines whose arithmetical output is consistent) and
such that for an arbitrary number n, if the n-th machine is consistent and F(n) is defined, F(n) is an arithmetical sentence not provable by the arithmetical output of the n-th machine. The Theorem on Inconsistency, from my [2003], extending earlier results, states that under those assumptions the set of values of F is inconsistent. (Note that we assume neither that F(n) is produced using Gödel’s technique nor that F(n) is true.) A variant related to Penrose’s work, the Theorem on Unsoundness, also in [Krajewski 2003], states that if F is defined for (at least) sound machines (ones that prove only true sentences) then the set of values of F is unsound. Loosely put, the former result shows that Lucas is inconsistent, and the latter that Penrose is unsound. The above theorems can be seen as an extension of Gödel’s well known (since 1972) statement that it is not excluded by his results that “there may exist (and even be empirically discoverable) a theorem-proving machine which in fact is equivalent to mathematical intuition, but cannot be proved to be so, nor even be proved to yield only correct theorems of finitary number theory.” (After [Wang 1996, pp. 184–185]). Again, we can state the above results as referring not just to mathematical constructions but also to passionate anti-mechanists (without, to make it clear, expressing any conviction regarding the truth of mechanism) because, due to Church’s Thesis, the theorems apply to each effectively determined Lucas-style or Penrose-style procedure. And the procedure must be effectively determined because otherwise the mind is allowed to perform noneffective steps, which makes it automatically non-mechanical, so the claim is simply assumed and the whole business with Gödel’s theorem is superfluous.
And we are allowed to use Church’s Thesis here since Lucas, Penrose and others presenting that argument refer to Gödel’s theorem, so they must refer only to the machines which are equivalent to Turing machines and recursive functions. And such machines are not too specialized, but are of general, philosophical interest precisely because of Church’s Thesis. The above theorems constitute strict versions of a more general principle, or “the basic dilemma confronting anti-mechanism: just when the constructions used in its arguments become effective enough to be sure of” then, thanks to Church’s Thesis, in the version stating that a humanly effective procedure is recursive, “a machine can simulate them.” [Webb 1980, p. 232]. This situation
was realized by Post already about 1924 (see his paper published in [Davis 1965]), before Gödel’s work. His “Axiom of Reducibility for Finite Operations” [Davis 1965, p. 424] states that if a construction “is made completely conscious [...] it ought to be constructable purely mechanically”. This principle can be seen as an early version of Church’s Thesis in the form, “what is human-computable is computorable”, where the term ‘computorable’ is used, as in [Soare 1999] (cf. also [Sieg 1994]), to maintain the distinction between (idealized) humans who compute, or “computors”, and computing machines, or “computers”. (Cf., however, [Hodges 2002] for the argument that, historically speaking, the distinction is misleading because for Turing and Church the Thesis was about both humans and machines. Also, Sieg’s analysis mentioned above, in the Introduction, results in axioms that are good for both computers and computors.) An indirect, and perhaps inessential, connection to Gödel’s results can also be perceived in a well known remark by Georg Kreisel, who indicated the possibility of a hypothetical “systematic error” that makes the equivalence of various strict characterizations of computability less than decisive. He compared the issue to “the overwhelming empirical support” of the thesis that if an arithmetic identity is provable at all, it is provable in classical first-order arithmetic, while, due to Gödel and later developments, we know that some diophantine equations have no solutions but the proof of it is non-elementary. (See [Kreisel 1965, p. 144]). It is unclear to me whether there is another example, not based on Gödel’s theorem, of the mistaken but similarly “overwhelming empirical support” for some (meta)mathematical thesis.
References
Asch, C.J. and Knight, J.F. [2000], Computable Structures and the Hyperarithmetical Hierarchy, Amsterdam: Elsevier.
Davis, M. (ed.) [1965], The Undecidable, New York: Raven Press.
Epstein, R.L. and Carnielli, W.A. [1989], Computability. Computable Functions, Logic, and the Foundations of Mathematics, Pacific Grove: Wadsworth & Brooks/Cole, reprinted with corrections and a timeline Computability and Undecidability by Wadsworth, 2000.
Gandy, R. [1980], “Church’s Thesis and the Principles for Mechanisms”, in The Kleene Symposium, (Barwise, Keisler, and Kunen eds.), Amsterdam: North Holland, pp. 123–148.
Gandy, R. [1988], “The Confluence of Ideas in 1936”, in The Universal Turing Machine: A Half-Century Survey, (R. Herken ed.), Oxford University Press, 1988, pp. 55–111.
Hodges, A. [2002], “Alan Turing”, The Stanford Encyclopedia of Philosophy, Summer 2002 Edition, (E.N. Zalta ed.), ; see also an introduction by Hodges, .
Kleene, S.C. [1936], “General Recursive Functions of Natural Numbers”, Mathematische Annalen 112, 727–742.
Kleene, S.C. [1952], Introduction to Metamathematics, Amsterdam: North Holland.
Kleene, S.C. [1987], “Reflections on Church’s Thesis”, Notre Dame Journal of Formal Logic 28, No 4, 490–498.
Kolmogorov, A. and Uspensky, V.A. [1953], “On the Definition of an Algorithm”, (in Russian), Uspekhi Mat. Nauk, VIII, 125–176; (English translation: Amer. Math. Soc., Translations XXIX, 1963, 217–245).
Kreisel, G. [1965], “Mathematical logic”, Lectures on Modern Mathematics, III, (T.L. Saaty ed.), New York: Wiley, pp. 95–195.
Krajewski, S. [2003], Twierdzenie Gödla i jego interpretacje filozoficzne: od mechanicyzmu do postmodernizmu [=Gödel’s Theorem and its Philosophical Interpretations: from Mechanism to Post-modernism], Warsaw: IFiS PAN.
Lewis, D. [1979], “Lucas Against Mechanism II”, Canadian Journal of Philosophy, IX, 3, 373–6.
Lucas, J.R. [1961], “Minds, Machines, and Gödel”, Philosophy 36, 112–127.
Odifreddi, P.G. [1989], Classical Recursion Theory, Amsterdam: North Holland.
Penrose, R. [1989], The Emperor’s New Mind, Oxford University Press.
Penrose, R. [1994], Shadows of the Mind, Oxford University Press.
Rogers, H. [1967], Theory of Recursive Functions and Effective Computability, New York: McGraw-Hill.
Sieg, W. [1994], “Mechanical Procedures and Mathematical Experience”, in Mathematics and Mind, (A. George ed.), Oxford University Press, pp. 71–117.
Sieg, W. [2002a], “Calculations by Man & Machine: Conceptual Analysis”, Reflections on the Foundations of Mathematics, (Sieg, Sommer, and Talcott eds.), pp. 396–415, “Calculations by Man and Machine: Conceptual Analysis”, Lecture Notes in Logic 15, 390–409.
Sieg, W. [2002b], “Calculations by Man and Machine: Mathematical Presentation”, In the Scope of Logic, Methodology and Philosophy of Science, Proceedings of the 11th International Congress of Logic, Methodology and Philosophy of Science, (P. Gärdenfors, J. Woleński, and K. Kijania-Placek eds.), Kluwer Academic Publishers, pp. 247–262.
Sieg, W. [200?], “Church Without Dogma: Axioms for Computability”, to appear.
Soare, R.I. [1999], “The History and Concept of Computability”, Chapter 1 of Handbook of Computability Theory, (E.R. Griffor ed.), Amsterdam: Elsevier.
Wang, H. [1996], A Logical Journey. From Gödel to Philosophy, Cambridge: MIT Press.
Webb, J.C. [1980], Mechanism, Mentalism, and Metamathematics, Reidel.
Charles McCarty∗
Thesis and Variations
Discussion of Church’s Thesis has suffered for lack of a precise general framework within which it could be conducted. [Montague 1962, p. 121]
The word “computability” covers a lot of territory. In the midst of that territory lies a range of definability properties of mathematical functions, one or another way in which a relational dependency, generally infinitary, is captured by finitary, algorithmic or otherwise limited means. Often, but not always, abstract or concrete machines do that capturing: suitably articulated objects that can reproduce, point-by-point, the graphs of functions in accord with pre-established recipes. Among abstract computing machines, those receiving the most publicity are near kin and lineal descendants of the imaginary devices first described in Alan Turing’s On computable numbers [1936–37]. In truth, all manner of real and surreal gadgets can and do serve as computers and, thus, give life to correlative notions of computability: register machines, semi-Thue systems, terms in the lambda calculus, desktop computers, idealized abaci, Brouwerian creative subjects, not to mention actual human beings with sharpened pencils and reams of paper. The machines are one and all articulated: for example, we are required to nominate features or addenda of them to serve as input registers, for representing arguments on which the function computed may or may not be defined, and others as output registers, displaying the values of the function computed. They may also employ memory states, either internal and permanent or of the temporary, workspace variety. Minutiae aside, whether abstract, concrete, human or nonhuman, the computer of partial or total natural number function f must somehow be able to supply the appropriate output f(n) for each natural number input n on which f is defined. In the following, functions on the natural numbers and digital computers receive undivided attention. We recognize that analog devices still exert theoretical fascination, and that logicians have long been exploring computable functions over the real numbers and the infinite ordinals, to mention two ready examples. Our approach may be abstract, but our object is concrete: to chart in familiar modal-logical terms the computability properties of everyday computing devices like pocket calculators. Our findings rest on only one substantial assumption about them, that the successor function on the natural numbers is computed, in the manner just described, by at least one such quotidian device.
∗ C. McCarty, The Logic Program, 026 Sycamore Hall, Indiana University, Bloomington, IN, USA 47405.
Computability: A Feast of Modality
A general study of computability ought to be a festival of modality. A mathematical function f is computable when it is definable in a canonical fashion, when a kind of articulable object can compute it, a dingus trading in potential inputs and outputs that are interpretable as arguments and values of f. For each acceptable argument n, a computing machine must be able to take n as input, process it, and produce f(n) as output. Sad to say, these modalities often cause headaches and dizziness in philosophers whose thinking is locked within narrow physical confines. Conceived wholly as a physical object, a pocket calculator might well (but, perhaps, had better not; vide Variation Two infra) be thought to have and engage with a collection of states, computational or otherwise, that is strictly finite in size, as are the sizes of its input and output registers. Therefore, (some conclude that) as a matter of hard, cold physical fact, there is a fixed k ∈ N such that the calculator cannot accept as inputs and add together two natural numbers of value greater than k, although, for pairs of natural numbers less than k, it can represent them beautifully and add them correctly. A dizzy philosopher may then press the questions, “What does it mean to assert of a pocket calculator that it CAN add? Isn’t it mistaken to insist that the calculator, hemmed in by finiteness on every side, computes the infinite addition function and, hence, traverses in computation an infinite extent? This is something it literally cannot do! How can a number n be a potential input to it when the row of digits in n’s smallest
suitable representation is longer than the diameter of the physical universe measured in millimeters?” It is natural to respond that the calculator in my desk drawer does indeed add numbers and does compute the natural number addition function. Recognition of this ability prompts one to advance certain conditional sentences in the subjunctive mood, for example, “For any natural numbers m and n, were the calculator to have input and output facilities sufficiently capacious, it would be able to accept as inputs suitable representatives for those numbers, process them and output a numerical representative for their correct sum, m + n.” Suchlike sentences in the subjunctive are interpretable as making implicit reference to a multitude of counterfactual circumstances, in each of which the calculator gets a finite amount of extra input-output room and workspace memory, so that the concrete device is ready to carry out finitely many more of the requisite additions. The collection of all those circumstances may be deemed infinite: for any given circumstance, there is another, affording the device yet more incidental capacity. There is no all-embracing circumstance, no maximum circumstance, no one circumstance in which the calculator inputs all pairs ⟨m, n⟩, adds m to n, and responds with their sum. Although difficult or impossible to specify linguistically, the multitude of circumstances pertinent to matters of computational capacity is relatively specific. Only more philosophic dizziness is the upshot of confusing these circumstances with others, perhaps those married to physical possibility. For instance, the philosopher is mistaken who objects that these sentences in the subjunctive are indefensible, since we in fact remain quite ignorant of the counterfactual circumstances on which we think thereby to report.
It is to betray a confusion to ask, “What do you really know about what this physical object, this hand calculator, would do if you enlarged its input-output registers and working memory? Mightn’t its circuits overload and the whole thing start to melt? Couldn’t it seize up and all computing activity grind to a halt?” The sorts of counterfactual situations the philosopher is describing are irrelevant, or largely so, to the question of the computational capacity, strictly speaking, of the hand calculator. In truth, we are justified in advancing the kinds of counterfactual claims a knowledge of the calculator qua computer licenses, even though, as our interlocutor would have it, we possess much dimmer awareness of the physical properties the same gadget would manifest under similar but not identical circumstances, physically construed. The objector has conflated two distinct properties of the one calculator, merging separate ranges of counterfactual situations and different dimensions along which to explore the capacities of a single, real device. To speak of physical circumstances under which the circuitry in the calculator would overload and the device begin to melt is to conceive of it as a physical object subject to well-known physical vicissitudes, say, electron flow through its components. For that, the precise sizes and cross-sectional areas of the calculator’s components matter a good deal, as do its overall mass and temperature. By contrast, to speak of its computational capacity strictly is to raise for discussion situations in which temperature, mass and cross-sectional diameters have a reduced, even nugacious bearing. Those physical parameters matter computationally as little as the circumstances under which I accidentally crush the gadget or those in which my schnauzer drools on its keypad. Headaches and dizziness aside, we do know what sorts of questions would be relevant strictly to the issue of computational capacity. One can inquire after details of the algorithm implemented by the calculator without asking for the intentions of the device’s designer. One can sensibly ask whether the algorithm itself would permit input summands arbitrarily large. One can ask if the algorithm represents numbers in base ten or some alternative notation and how it manages when a “carry” is in view. Should we discover, upon investigation, that some natural number is essential to the workings of the algorithm implemented and bounds permanently the sizes of inputs, we may be called to revise, at least by qualification, any unqualified assertion that the device adds.
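The questions just listed can be made concrete with a minimal sketch, assuming (purely for illustration) that the calculator implements schoolbook addition on base-ten digit strings. The function name add_decimal and the parameter width are hypothetical and describe no actual device. The carry is the only state the algorithm itself keeps between columns, while width models a bound on register size that is incidental to the algorithm and can be lifted without changing it.

```python
from typing import Optional


def add_decimal(a: str, b: str, width: Optional[int] = None) -> str:
    """Schoolbook addition of two base-ten digit strings.

    `width` is a hypothetical register size; None models a circumstance
    in which the device has as much input-output room as it needs.
    """
    if width is not None and (len(a) > width or len(b) > width):
        raise OverflowError("summand does not fit the register")
    digits = []
    carry = 0  # the only state passed from column to column: 0 or 1
    i, j = len(a) - 1, len(b) - 1
    while i >= 0 or j >= 0 or carry:
        da = int(a[i]) if i >= 0 else 0
        db = int(b[j]) if j >= 0 else 0
        carry, digit = divmod(da + db + carry, 10)
        digits.append(str(digit))
        i -= 1
        j -= 1
    result = "".join(reversed(digits)) or "0"
    if width is not None and len(result) > width:
        raise OverflowError("sum does not fit the register")
    return result
```

With width left as None, the same column-by-column procedure handles summands of any length, so add_decimal("999", "1") yields "1000"; with width=3 the identical procedure refuses that sum. On this sketch no particular number is essential to the workings of the algorithm, which is the kind of finding that would support the unqualified assertion that the device adds.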
One commits a different, though analogous, error, when thinking that a device with input and output facilities for natural numbers can equally be said to compute any old function. For the calculator in my desk, and for any triple ⟨m, n, p⟩ of natural numbers, there is some possible world or other in which the calculator, perhaps supplied with extra working memory and input-output ability, would take represented m and n as inputs and output a standard representative of p. By reasoning in this fashion, one can conclude that all functions are computable by any input-output device, anything to which inputs and outputs are assigned, and that is anything whatsoever. However, when we treat a concrete device as a computer of one or another function, there is a particular, delimited range of counterfactual circumstances and behaviors of the device in those circumstances that are relevant, whether or not we can circumscribe that range in a non-circular fashion. The circumstance just described, in which the device outputs a “random” value p when m and n are input, does not lie in that range.
Consider the specifically athletic capacities of a player for sport. Imagine that, as a new season opens, the manager is interrogating an assistant on the prospects for the team’s star player. The manager asks, “How many home runs is Cal able to hit this season? Can he break his record of last year?” Normally, it would be perfectly reasonable for the assistant to reply, “Yes, he’s in better shape than he was at this stage last year and his swing has improved,” or “I don’t think so. As he ages, he keeps losing strength in his arms.” It would be less than satisfactory for the assistant to respond, “Like any of us, Cal might step off the curb and get run over by a bus,” or “Perhaps the stadium will catch fire during a game and Cal suffer severe burns.” The manager’s questions about his star’s athletic capacities have to be kept at noetic arm’s length from certain questions of what Cal is in fact going to do or is in fact likely to do over the season. To the manager’s questions, such answers from the assistant as “The latest contract negotiations with the players have broken down. I’m not sure that we will have a season this year” may be admitted, but do not, at least in the first instance, touch on Cal’s batting ability.
Consider the state of mathematical affairs conveyed when we say, of a particular, real beachball, that it is spherical. Among other things, it implies that the ball’s volume is proportional to the cube of its radius.
So, subjunctively speaking, doubling the ball’s radius would increase the amount of air enclosed by a factor of eight. The physical fact that the ball’s rubber skin would not allow that deformation without tearing is largely beside the point when addressing purely mathematical relations between its radius and volume. Were there convenient, reliable ways of measuring radii and correlative volumes of flexible spheres, beachballs could be worked like three-d slide rules to compute the function λn.n³.
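The factor of eight can be checked against the standard formula for the volume of a sphere. The symbols V and r are introduced here only for illustration and do not appear in the text.

```latex
V(r) = \tfrac{4}{3}\pi r^{3}
\quad\Longrightarrow\quad
V(2r) = \tfrac{4}{3}\pi (2r)^{3} = 8 \cdot \tfrac{4}{3}\pi r^{3} = 8\,V(r).
```

Reading volume off against radius thus yields the cube of the radius up to the fixed constant 4π/3, which is the sense in which an idealized beachball could serve to compute the cubing function.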
Logical Machines and Physical Devices
The distinction between physical and computational conceptions of one and the same device is reminiscent of a distinction that Ludwig Wittgenstein drew in Philosophical Investigations [Wittgenstein 1953, remarks 193 through 195], and also in the lectures recorded as [Wittgenstein 1976, pages 194 through 197]. In the latter, his example is not a beachball, but a scale model of a movable piston connected by a rod to a wheel so that the back-and-forth motion of the piston turns the wheel. The model is made of wood, with joining pins of smooth metal. Wittgenstein points out that, by manipulating the model, simple kinematical questions, for instance, “How far would the piston move were it to push the wheel through one quarter turn?” are answered. Confident predictions are also made from models of this sort, with confidence untroubled by the facts that real-life, physical pistons are subject to friction and that actual rods joining pistons to actual wheels can break, loosen, and expand in length. In his lectures, Wittgenstein calls the model, when employed in the rigid, frictionless fashion to answer or predict, “logical machinery”. So conceived, the wooden model becomes an inference scheme by which we draw conclusions about the behaviors of actual pistons, rods, wheels. One must distinguish as carefully as possible between the physical machine, on the one hand, and the logical machine or, as Wittgenstein wrote at Philosophical Investigations [Wittgenstein 1953, section 193], “the machine as symbol of its way of working” on the other.
The “machine as symbol” is not a peculiar quasiphysical condition of the machine or an ideal succubus clinging to an actual machine, but a property of one and the same thing revealed when the machine is treated as giving rise to rules of inference and, more specifically in the case of the model piston-rod-and-wheel, as generating rules of a rational kinematics in which wheel, rod, piston and their interconnections are absolutely rigid in length and diameter. Naturally, the piston of the logical machine is conceived as possessing a degree of freedom enabling it to travel back and forth along its enclosing cylinder, but no freedom to move in the vertical direction. Analogously, when one takes a view of the calculator as symbol or logical machine and asks about its computing capacity, its algorithm is conceived as rigid and its workspace memory as free to get larger or smaller without causing the device to break down or seize up.
Unless things go bad in one or another exemplary fashion—the baby tosses the calculator into the bathtub or it is passed repeatedly through a powerful magnetic field—it holds the status of logical machine, and its additive verdicts, on given summands, become rules. Its outputs are read out as arithmetic gospel. At tax-form time, weary taxpayers deploy hand calculators to make final checks on mental or pen-and-paper additions. The calculator results become definitive, exposing mistakes made on paper, much as an engineer might use the piston-and-wheel model to expose the failures in an actual piston-and-wheel setup. And rarely, barring bathtub accidents or magnetic fields, the other way around. As noted above, the concept of the model as logical machine enforces a distinction between features of the device that can legally vary, e.g., the angular position of the wheel, and those that cannot, e.g., the length of the rod. By the same token, the concept of calculator as logical computing machine requires a distinction between computational features of the machine that can vary, e.g., size of input, and those that cannot, e.g., such computational states of the algorithm as “carry 3”. In effect, the distinction is one between extrinsic states, those that enable ever larger inputs and outputs to be handled, and intrinsic states, those fixed with the algorithm. No one is pretending that this distinction affords a decision procedure such that, given a computing device and any of its states, we can tell infallibly and a priori from any description of the state whether it is intrinsic or extrinsic. In distinguishing between physical and logical machines, Wittgenstein did not wish to prevent anyone from treating concrete devices as logical machines. 
At section 195 of [Wittgenstein 1953], he wrote, “And it is quite true: the movement of the machine as symbol is predetermined in another way from that in which the movement as any given real machine is predetermined.” Chief among present goals are exploring the former kind of predetermination and assessing its impact on Church’s Thesis. There are four parts to come: one theme and three variations. The first proffers a regimentation of the idea of concrete computing device as logical machine on which both modal and nonmodal versions of Church’s Thesis get validated. On the basis of that regimentation, we attempt to explain why the extensional form of Church’s Thesis is so familiar although the modal version may be more natural. The variations are clear movements
Charles McCarty
away from the purely logical conception, three alternative ideas of concrete device, the last of which represents one way to think of computing machines as physical objects. Modal Church’s Thesis fails miserably on all of the variations. As important as any theorem or proof here is the survey of a mathematical workspace, a conscientiously modal workspace, in which to address issues surrounding computability and Church’s Thesis for actual, concrete computers. Evidence is thereby gathered on behalf of a (meta)thesis: when it comes to concrete computing devices, foundational considerations of computability are best assayed once expressed in a suitable extension of a modal language for arithmetic, as in [Shapiro 1985].
Concrete Machines and Mathematical Functions
Nowadays, logicians and programming-language semanticists have gained or inherited plenty of experience in analyzing algorithms run by real computing devices. They wield denotational semantics to attach, in rigorous fashion, abstract functions computed to algorithms doing the computing. Knowing those denotational attachments, they thereby know how to prove, usually via one or more inductive arguments, that the function attached is indeed the function computed. All this is as tight and tidy a mathematical business as one could want. For extra help analyzing and proving highly complicated algorithms, the experts have devised computer aids that prove their theorems automatically. Casting the formal net wide, one conjectures that this process could itself be formalized uniformly in a finite definitional extension of ZF set theory. (This cast is wide indeed, since a fragment of formal arithmetic would surely suffice. Yet, economy of means is not the point here.) This means that, for each concrete device D, there is a definitional extension of ZF, call it “ZF_D”, in which, first, each circumstance α relevant to the strictly computational capacity of D is definable explicitly as a numerical term α*. Second, in the language of ZF_D there is a formula φ_D such that, for each number pair ⟨m, n⟩ and circumstance α, D accepts m and outputs n in α if and only if ZF_D ⊢ φ_D(m, n, α*).
The set-theoretic formula φ_D represents, at least in part, the results of the logician’s or program analyst’s efforts applied to D. It follows that there is a circumstance α in which D accepts m and outputs n if and only if, for some α*, ZF_D ⊢ φ_D(m, n, α*). Conceived as a logical machine, D computes the function whose graph is the set of input-output pairs in all the circumstances in which the device can find itself. (In force throughout is the simplifying assumption that each machine computes a single function.) It follows from the above that the function computed by any concrete device is Σ₁. Moreover, since there are at the moment only finitely many different concrete devices, this process, associating with each device D the appropriate extension ZF_D and formula φ_D, is itself uniformizable. Putting it all together, one sees that there is a primitive recursive predicate of arithmetic M(e, m, n, p) with this property: for each D, there is a number e such that, on m, the function computed by D outputs n if and only if ∃p.M(e, m, n, p). In other words, a function computed by a concrete computing device is identical to the extension of a standard Σ₁ predicate of elementary arithmetic and uniformly so.
In this conjecture, Church’s Thesis has not already been granted and all questions begged. First, to embrace a claim of this kind about machines, algorithms, and the functions they compute is not yet to have made an agreement about the detailed internal structures of the algorithms that may or may not compute those functions, when implemented on some device. In particular, there is here no presumption that any algorithm for computing a computable function is faithfully represented by a Turing machine or encoded by natural numbers in a uniform fashion. Second, what is claimed is a conjecture, not an assumption. Because the number of extant concrete devices is finite, it admits of definitive proof or disproof.
Most likely, the proof would appear in the form of a vast argument by cases, associating with each machine the function it computes and its set-theoretic definition. Finally, it should be clear from the manner in which the conjecture is made that no target of the present writing is a skeptical problem of specifying, in a non-circular and naturalistic fashion, the functions or algorithms that humans or computing devices indeed compute, a problem that Saul Kripke thought to find in Wittgenstein’s later writings [Kripke 1982].
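The Σ₁ form ∃p.M(e, m, n, p) described above can be pictured as an unbounded search over witnesses p, each tested by a decidable check. The following Python sketch is illustrative only: the table `DEVICES`, the reading of p as a resource bound, and the names `M` and `sigma1_holds` are assumptions introduced here and are not the text's construction of M from ZF_D.

```python
from itertools import count

# Hypothetical stand-ins for finitely many concrete devices, indexed by e.
# Each entry maps an input m to (output, cost); cost plays the role of the
# resources the device needs before it can answer.
DEVICES = {
    0: lambda m: (m + 1, m),        # successor; larger inputs need more room
    1: lambda m: (2 * m, 2 * m),    # doubling
}

def M(e, m, n, p):
    """Decidable check: within resource bound p, device e maps m to n."""
    if e not in DEVICES:
        return False
    output, cost = DEVICES[e](m)
    return cost <= p and output == n

def sigma1_holds(e, m, n, max_p=None):
    """Semi-decide 'there is a p with M(e, m, n, p)' by trying p = 0, 1, 2, ..."""
    candidates = count() if max_p is None else range(max_p + 1)
    for p in candidates:
        if M(e, m, n, p):
            return True
    return False  # reached only when a finite max_p was supplied
```

Called without `max_p` on a pair outside the graph, the search never returns, which is the usual asymmetry of a Σ₁ predicate: membership is confirmed by a finite witness, while non-membership is not.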
By moving from devices computing functions “across possible circumstances” directly to the arithmetic definability of those functions, one skirts the vexed issue of Turing machines and their natural number encodings, although a Gödel numbering of formulae is required. A noted source of that vexation is the fallacious argument in [Turing 1936–37] for the conclusion that every function computable by a human being with pencil and paper is computed by a Turing machine. As proposed in [Gödel 1990] and [Kreisel 1972], and examined mathematically in [McCarty 1987], it is more plausible to represent calculating human beings as computers that are potentially infinite in certain respects. If so, students of the human computer are required to define and study, within intensional mathematics, Turing machines with potentially infinite collections of internal states, as in [Abramson 2006], or, within intuitionistic mathematics, abstract machines with state-sets that are Dedekind-finite, that is, finite with respect to intuitionistic means. In the latter case, one finds Turing machines with no ordinary natural number encoding: for some functions f computable by a human, there exists a Dedekind-finite Turing machine computing f, but no natural number e such that f = λn.{e}(n). Once Turing machines working over the natural numbers come to lack numerical codes, the question naturally arises, “What happens to those negative results of traditional computability theory, e.g., the unsolvability of the halting problem, that seem to depend in their formulations on the prospect that a machine operate on its own code?” This question has yet to be answered.
Theme: Possibility and Church’s Thesis
The metatheory reigning here is Brouwer’s intuitionistic mathematics. Do not be afraid: this presents no substantial barrier to full appreciation by a conventional mathematician. When, later in the current section, lawless sequences crop up, the conventional thinker can silently pass into an extension of his or her universe provided by an apposite topos containing lawless objects. When Markov’s Principle,
for any decidable set S of natural numbers, if ¬¬∃n.n ∈ S, then ∃n.n ∈ S,
and Church’s Thesis in its intuitionistic form, namely,
every function over the natural numbers is Turing computable,
are required, the apprehensive conventionalist can take refuge in an interpretation over Kleene’s realizability universe, where both Markov’s Principle and Church’s Thesis in the latter style obtain conventionally.
If a desktop computer, in its guise as logical machine, computes function f then, for each pair ⟨m, n⟩ in the graph of f, there is a counterfactual circumstance α in which the memory and workspace of the device are so enhanced that n is output in α when m is represented as input and the device activated on that input. Because each circumstance is strictly finite, no circumstance can contain all of f’s graph when that graph is infinite. Current concerns are restricted to extensions of functions computed, to input-output relations; little or no attention is paid to other aspects of processing. Hence, there is no loss in using finite sets of natural numbers to model circumstances, pairs rendered using primitive recursive pairing. With circumstances so understood, if a device produces output n on input m in circumstance α, the pair ⟨m, n⟩ is a member of α. The totality C of all circumstances so construed is naturally ordered by set inclusion. For modeling modality, finite pathways through C can replace its individual members. So, the accessibility relation ≤ on C is assumed to yield a countable, primitive recursive tree growing from a single root.
Throughout theme and variations, the object language will be an extension of the formal language for second-order Peano–Dedekind Arithmetic with unary modal operators □ and ♦. Second-order set variables such as X range over arbitrary C-indexed sets of natural numbers. In addition, there will be second-order function variables such as f, treated as set variables so that ⟨m, n⟩ ∈ f will convey that f(m) = n. These variables quantify only over the C-indexed subgraphs of functions computed by concrete machines. For each α ∈ C, the first-order quantificational domain attached to α is the set N of natural numbers.
X’s component at circumstance α is X_α; f’s component is f_α, that subset of the graph of f that is computed at α. For α ∈ C, m, n ∈ N, and C-indexed set X, α ⊩ X(n) whenever n ∈ X_α. Similarly, α ⊩ ⟨m, n⟩ ∈ f in case ⟨m, n⟩ ∈ f_α. Forcing for logical connectives, quantifiers and modal operators is defined as usual. A sentence φ of the extended second-order language will be said to hold in the model structure, in symbols ⊩ φ, if it is forced at
every circumstance. With this stage-setting in place, it is immediate that
Proposition: The internal mathematics of the model structure so defined contains the necessitations of all true sentences of conventional second-order Heyting Arithmetic including full, arbitrarily modalized comprehension. The internal logic of the structure extends intuitionistic S4 plus the first- and second-order Barcan Formulae. ■
With f ranging over functions computed by concrete devices and M(e, m, n, p) the primitive recursive predicate of the last section (which we henceforth treat as atomic), we have
⊩ ∀f ∃e □∀m, n (♦(⟨m, n⟩ ∈ f) ↔ ∃p M(e, m, n, p)).
Therefore, if the forcing condition α ⊩ ⟨m, n⟩ ∈ f is altered to
there is a circumstance β such that α ≤ β and ⟨m, n⟩ ∈ f_β,
we achieve a formal rendering of the “logical machine” conception: when f yields value n on argument m, there is a finite expansion of resources available to the machine computing f such that, once m is represented, the machine will accept that representation as input, and will produce a representation of n. To mark that conception, we write “⟨m, n⟩ ∈♦ f” in place of “⟨m, n⟩ ∈ f”. In such terms, our modal formulation of Church’s Thesis (call it “□CT”) is
∀f ∃e □∀m, n (⟨m, n⟩ ∈♦ f ↔ ∃p M(e, m, n, p)).
□CT asserts that each function f computed by a concrete device has at least one global index e that tracks the device from circumstance to circumstance and ensures throughout the Σ₁-definability of the function it computes. □CT should be compared with epistemic forms of Church’s Thesis on offer from [Flagg 1985], [Horsten 1998], and [Horsten 2006].
Theorem: ⊩ □CT. ■
If it is indeed right and proper to view concrete computing devices as modal-logical machines, why aren’t modal versions of Church’s Thesis more familiar? Why do customary statements of Church’s Thesis restrict themselves to what is computed, leaving what might
or must be computed in outer darkness? It is a comfort that the answer to these questions is Brouwerian: we conceive the course of the computational future, rendered as a path p through the tree C of the model structure, as lawless, as outside future mathematical control. This means that, beyond a knowledge that certain resources in circumstances α₀ ∈ p, α₁ ∈ p, …, α_k ∈ p have already become available (and that modal arithmetic continues to hold in every circumstance), we have no further mathematical ken of the computational circumstances α ∈ p that will someday arise. In consequence, a nonmodal Church’s Thesis will be seen to be true (for a perfectly reasonable notion of extensional truth) in each lawless future. Of course, popular magazines may offer us a knowledge (or its vain impostor) of ways in which technology will, in physical fact, multiply computational resources in future days. This presumptive knowledge of the actual course of physical reality is irrelevant to a theory of computing devices as logical machines.
The idea of extracting, from a modal or topological model, a nonmodal structure determined by a lawless path first featured in [Kreisel 1958]. To aid the extraction, we define a translation φ ↦ φ^□ recursively over a fragment of the nonmodal extended second-order language.
Definition:
1. For φ atomic and arithmetic, φ^□ = φ,
2. (⟨m, n⟩ ∈ f)^□ = ⟨m, n⟩ ∈♦ f,
3. φ ↦ φ^□ commutes with ∧, ∨, and ∃x ∈ N,
4. (φ → ψ)^□ = □(φ^□ → ψ^□), and
5. (¬φ)^□ = □¬(φ^□).
For present purposes, a lawless path through C is subject to these two principles.
Principles for Lawless Paths
1. (Density) For any strictly finite chain of circumstances α₀ ≤ α₁ ≤ … ≤ α_k, there is a lawless path p through C such that all of α₀, α₁, …, and α_k lie on p.
2. (Open Data) Let λq.P(q) be a property of paths q through C independent of lawless paths other than p. If P(p) holds, then there is an initial segment of p such that any lawless path sharing that segment also has P.
In brief, all one knows of the future course of p through C is the finite initial segment of circumstances in p that have already transpired. The proof of the following theorem requires both principles.
Theorem: Let p be a lawless path through C. The (meta)predicate ∃α ∈ p.α ⊩ φ^□ is a truth predicate on formulae φ for which the translation is defined, determining a nonmodal structure ℑ_p such that
∃α ∈ p.α ⊩ φ^□ if and only if ℑ_p ⊨ φ.
Proof: One adapts to modal formulae the relevant arguments from [Kreisel 1958] or [McCarty 1996]. ■
Corollary: Let p be a lawless path through C. The nonmodal Church’s Thesis CT,
∀f ∃e ∀m, n (⟨m, n⟩ ∈ f ↔ ∃p.M(e, m, n, p)),
holds in ℑ_p.
Proof: Let p be any lawless path through C, and let α ∈ p. We already know that α forces □CT,
∀f ∃e □∀m, n (⟨m, n⟩ ∈♦ f ↔ ∃p.M(e, m, n, p)).
That is, if f is a function computed by a concrete device, there is a natural number e such that, for all natural numbers m and n,
α ⊩ □(⟨m, n⟩ ∈♦ f ↔ ∃p.M(e, m, n, p)).
The latter is (equivalent to) a formula in the image of the φ ↦ φ^□ translation. We know thereby that CT holds in ℑ_p. ■
As far as logical machines are concerned, the course of the future through the computationally relevant circumstances is a lawless path. As we have just seen, what is forced by one or another circumstance α lying on lawless p is a part of the truth relative to p. □CT is forced at every circumstance on any such p. Hence, regardless of future computational history, CT will be one of the truths revealed in that history. Nonmodal CT should be compared with the second-order formalization of Church’s Thesis from [Kreisel 1965].
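The five clauses defining the translation from a nonmodal formula φ to its modal counterpart can be read as a recursion on the shape of φ. The Python sketch below follows those clauses one by one; the tuple encoding of formulas and the tag names (`"atom"`, `"mem"`, `"dmem"`, and so on) are conventions assumed here purely for illustration.

```python
# Formulas are nested tuples (an illustrative encoding, not the text's):
#   ("atom", text)             arithmetic atomic formula
#   ("mem", m, n, f)           <m, n> in f
#   ("dmem", m, n, f)          <m, n> in-diamond f (the modalized membership)
#   ("and", a, b), ("or", a, b), ("imp", a, b), ("not", a)
#   ("exists", var, body)      existential quantifier over N
#   ("box", a)                 necessitation of a

def box_translate(phi):
    """Apply the translation clause matching the outermost shape of phi."""
    tag = phi[0]
    if tag == "atom":                       # clause 1: arithmetic atoms unchanged
        return phi
    if tag == "mem":                        # clause 2: membership gets modalized
        _, m, n, f = phi
        return ("dmem", m, n, f)
    if tag in ("and", "or"):                # clause 3: commutes with and, or
        return (tag, box_translate(phi[1]), box_translate(phi[2]))
    if tag == "exists":                     # clause 3: commutes with exists
        return ("exists", phi[1], box_translate(phi[2]))
    if tag == "imp":                        # clause 4: box around the implication
        return ("box", ("imp", box_translate(phi[1]), box_translate(phi[2])))
    if tag == "not":                        # clause 5: box around the negation
        return ("box", ("not", box_translate(phi[1])))
    # The translation is defined only on a fragment of the language.
    raise ValueError(f"translation not defined for {tag!r}")
```

Raising an error on any other shape mirrors the fact that the translation is given only for a fragment of the nonmodal language; the sketch does not attempt to extend it.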
That was our theme: a recognizable version of Church’s Thesis, □CT, obtains in a model structure for modal second-order arithmetic representing a treatment of concrete computing devices as logical machines. This guarantees that a recognizable nonmodal version of Church’s Thesis will be true as the computational future unfolds. The present investigations and results suggest a justification for pursuing modal arithmetic and set theory not canvassed in [Shapiro 1985] or Craig Smoryński’s incisive review [Smoryński 1991], namely, the rigorous study of concrete objects and their counterfactual features not confined by a narrowly physical possibility. Now come the variations. They are generated by leaving the logical machine behind in one way or another, by imposing stricter requirements on possible circumstances and on the forcing conditions for ⟨m, n⟩ ∈ f. The first requirement sets a simple bound on searches for circumstances under which a concrete machine inputs m and outputs n; the second takes seriously a demand that C be modeled explicitly as potentially infinite; the third represents one reasonable reaction to the physicality of the machine M computing f. Through all three, at least one outcome is saliently preserved: □CT fails dramatically.
Variation One: Bounds on Size
Sometimes my thoughts also flutter with longing, anxious against the bars on the cage of finitude. [Du Bois–Reymond 1882, p. 110]
For α ∈ C, let us say that the size of α, s(α), is its maximum member, unless α is empty, in which case its size is 0. Hitherto, our definitions would have allowed circumstance β to be a direct descendant of α (in symbols, α ≤₁ β) in the tree ordering on C even were s(β) vastly to exceed s(α) in size. One can sensibly ask what happens to Church’s Thesis in the form □CT when limits are set on the circumstances that count as “single step” descendants and correlative limits on the number of circumstance-expansion steps required to produce values of a function. Thinking in this vein, one might maintain that there is a perfectly good reading of “cannot” (alternative, of course, to the pure conception of logical machine) on which we cannot increase the extrinsic capacities of a machine computing f by huge amounts in a single step. Moreover, we would be incapable, in a similar sense, of expanding the capacities of a
device arbitrarily often. If we were to be able to compute f with concrete machine M on a large input k, we could only enhance the capacity of M through a bounded series of limited, one-step enhancements. To model such an idea, let F and G be natural number functions, subject to the following conditions. Again, s(α) is the size of circumstance α.
1. F is strictly inflationary and monotone on N.
2. In the model structure, the tree on C is so trimmed that, for α and β in C, β is a direct descendant of α (on the current restricted conception) just in case α ≤₁ β (on the old conception) and s(β) is less than F(s(α)).
3. As for forcing, α ⊩ ⟨m, n⟩ ∈ f precisely when f(m) = n, and one is required to pass through a connected sequence of direct descendants of α (understood on the current conception) at most G(s(α)) in length before reaching a circumstance β at which ⟨m, n⟩ ∈ β.
The first two conditions allow that, although one can expand a machine’s capacity in a single step in a number of ways bounded by F (applied to the size of the current capacity), it is still possible, perhaps through many single-step enhancements, to expand the capacity to take on inputs and outputs that are arbitrarily large. The third condition imposes an α-dependent bound G(s(α)) on the number of iterated one-step expansions of α that can come into play when verifying a particular input-output relation at α. In other words, for one to see, in circumstance α, that the machine outputs n on input m, one can enlarge the resources available at α by an amount limited, via G, by the resources available at α.
Definition: A sentence φ in the pure language of the second-order modal arithmetic is persistent if, for any α and β with α ≤ β, α ⊩ φ implies β ⊩ φ.
Definition: A formula φ of the pure language is positively modal if and only if it is equivalent, over the model structure, to a formula ψ in which ∧ and ∨ are the sole propositional connectives and every predicative occurrence of a second-order constant or variable S within ψ is in a subformula of the form □S(t), where t is a term.
Proposition: Every positively modal sentence is persistent. ■
Corollary: Every sentence of pure first-order arithmetic is persistent. ■
The modal version of Church’s Thesis is now seen to fail: no ternary predicate Φ such that ¬Φ is positively modal is capable of capturing uniformly the extensions of all functions computed by concrete devices, given that the three conditions on ≤ and the forcing condition for ⟨m, n⟩ ∈ f are all in place. We assume here that the successor function on the natural numbers is computed by some concrete device.
Theorem: Let Φ be any ternary predicate in the language of pure, modal second-order arithmetic such that ¬Φ is positively modal. The formula
∀f ∃e □∀m, n (⟨m, n⟩ ∈ f ↔ Φ(e, m, n))
fails in every circumstance.
Proof: Let α be a circumstance. Let su be the successor function on N. Let k be such that ⟨k, k + 1⟩ is greater than F^{G(s(α))}(s(α)). For the sake of argument, assume that the formula displayed in the theorem statement is forced at α. Then, there is an e such that
α ⊩ □∀m, n (⟨m, n⟩ ∈ su ↔ Φ(e, m, n)).
Assume further that α ⊩ ⟨k, k + 1⟩ ∈ su. A simple calculation shows that there is a circumstance β such that ⟨k, k + 1⟩ ∈ β, where s(β) is at most F^{G(s(α))}(s(α)). But this violates the choice of k. Therefore, α ⊮ ⟨k, k + 1⟩ ∈ su. It follows that, for e as above, α ⊩ ¬Φ(e, m, n). Because F is strictly inflationary, there is a γ ∈ C such that α ≤ γ and γ ⊩ ⟨k, k + 1⟩ ∈ su. However, since ¬Φ(e, m, n) is persistent, γ ⊩ ¬Φ(e, m, n). Hence, γ ⊮ ⟨k, k + 1⟩ ∈ su, contradicting the foregoing. ■
Corollary: Let Φ be any ternary predicate in the language of pure, elementary arithmetic. The formula
∃e □∀m, n (⟨m, n⟩ ∈ f ↔ Φ(e, m, n))
fails in every circumstance.
Proof: Every purely first-order formula is positively modal. ■
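The counting at the heart of the proof for Variation One can be made concrete. The Python sketch below computes the size of a circumstance, the value of F iterated G(s(α)) times on s(α), and the least k whose pair ⟨k, k + 1⟩ exceeds that value. Cantor pairing, the particular F and G, and all function names are assumptions chosen for illustration; the text fixes only that pairing is primitive recursive and that F is strictly inflationary and monotone. `may_force` checks a necessary condition only, since the trimmed tree itself is not modeled.

```python
def pair(m, n):
    """Cantor pairing, one possible primitive recursive pairing (an assumption)."""
    return (m + n) * (m + n + 1) // 2 + n

def size(alpha):
    """s(alpha): maximum member of a finite circumstance, 0 when it is empty."""
    return max(alpha, default=0)

def reach_bound(alpha, F, G):
    """F iterated G(s(alpha)) times, starting from s(alpha)."""
    s = size(alpha)
    x = s
    for _ in range(G(s)):
        x = F(x)
    return x

def may_force(alpha, m, n, f, F, G):
    """Necessary condition only: f(m) = n and the code of <m, n> is small
    enough to occur in a circumstance whose size stays within the bound."""
    return f(m) == n and pair(m, n) <= reach_bound(alpha, F, G)

def least_escaping_k(alpha, f, F, G):
    """Least k whose pair <k, f(k)> lies beyond the bound, as in the proof.
    Terminates whenever the codes pair(k, f(k)) are unbounded in k."""
    bound = reach_bound(alpha, F, G)
    k = 0
    while pair(k, f(k)) <= bound:
        k += 1
    return k

# Illustrative choices (hypothetical, not from the text).
F = lambda x: 2 * x + 1      # strictly inflationary and monotone
G = lambda s: 2              # at most two expansion steps
succ = lambda m: m + 1
alpha = frozenset({pair(0, 1), pair(1, 2)})
```

With these choices the size of `alpha` is 8 and the bound is F(F(8)) = 35, so ⟨3, 4⟩ (coded as 32) passes the necessary condition while ⟨4, 5⟩ (coded as 50) cannot be forced at `alpha`, even though 5 is the successor of 4.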
Variation Two: Potential Infinity
It seems quite reasonable to judge a mathematical system by its usefulness. I admit that from this point of view intuitionism has as yet little chance of being accepted; [...] in my eyes its chances of being useful for philosophy, history and the social sciences are better. In fact, mathematics, from the intuitionistic point of view, is a study of certain functions of the human mind, and as such it is akin to these sciences. [Heyting 1976, p. 10]
As intuitionists, it is open to us to take more seriously the idea that the collection C of circumstances be potentially infinite. In particular, we can require the circumstances to comprise a subset of N containing no infinite, enumerable, constructive subset. Definition: A Dedekind-finite set of natural numbers is both nonfinite and noninfinite, that is, it contains no infinite, constructively enumerated set, but is not equipollent to any natural number. A straightforward constructive argument employing Church’s Thesis in its intuitionistic form shows that Dedekind-finite sets of natural numbers exist. These sets are simultaneously the intuitionistic analogues of Dekker’s recursively immune sets [Dekker 1954] and natural realizations of the old idea of potentially infinite sets. Since a Dedekind-finite set S is nonfinite, for every number n, S can contain a number greater than n. But S cannot be infinite; so it cannot be the case that, for any n, S does in fact contain a member greater than n. To be assured that a concrete device D computes f on input m, one might reasonably require more assurance than the mere fact, perhaps adventitious, that, for such m, there exists some extension or other β of the extrinsic resources currently on offer to D that would enable it, once armed with the extra resources available in β, to output f (m). It would certainly suffice were there to be a constructive function, represented by a constructive branch through the tree C of the model structure, enumerating an infinite sequence of circumstances in which the required input-output pairs can be located.
(Of course, the Principle of Open Data implies that such a sequence could not coincide with any lawless path through C.) With the set of circumstances construed as before, any Turing-computable f would automatically produce the requested constructive branch.
Matters become markedly different when our old notion of circumstance is altered to include with each circumstance a natural number weight, where one thinks of α’s weight as a measure of the cost or difficulty incumbent on bringing α into existence. With this reconception of circumstances, we shall ask whether the weights can always be arranged in a suitably constructive fashion if the set of circumstances is Dedekind-finite. For purposes of the current section, we adopt these restrictions.
1. A circumstance is now to be a pair consisting of a finite subset of N and a natural number weight m. Once again, C is to include all the (finite) circumstances arranged into a countable tree such that, if α and β are circumstances and α ≤ β in the tree, then the finite subset of numbers contained in α is a subset of that of β. We take the set of weights appearing in C to be a Dedekind-finite set, with different circumstances possessing different weights.
2. With regard to forcing, we say that α ⊩ ⟨m, n⟩ ∈ f if there is a constructive path p through C containing α with the property that, whenever f(q) = r, there is a β ∈ p such that ⟨q, r⟩ is in the finite subset associated with β and ⟨m, n⟩ is in the finite subset associated with α.
The second restriction requires that, if α represents an enlargement of the resources to a device so that a successful evaluation of f on m can occur, then α must appear on a constructively-generated path through all the relevant circumstances sufficient, if only eventually and little-by-little, to evaluate f on all its input values.
Roughly put, if a concrete device is able to compute an output on input m, one has to have to hand some mathematical recipe for achieving that result, and all other such results, with the device. When C and the forcing conditions for ⟨m, n⟩ ∈ f are restricted as above, the following outcome is immediate.
Theorem: Let f be any function computed by a concrete device and Φ any arithmetic formula, Σ₁ or otherwise, with nontrivial extension.
The formula
□∀m, n (⟨m, n⟩ ∈ f ↔ Φ(m, n))
fails at every circumstance. ■
Variation Three: The Physical Device, Revisited
One way to ensure that f be physically computable, at least in one classical sense, would be to insist that there be a path p through the collection C of circumstances that a physical object, classically construed, can follow such that p suffices for computing, if only incrementally, all the values of f. To model such a demand, we arm each circumstance α in C with, in addition to pairs ⟨m, n⟩ that lie on the graph of f, a real number pair to serve as its location. Assume further that all locations are drawn from the unit square. For a classical object to follow a path and the result be computable, it will be required that the path trace out, through the locations, a solution to an ordinary differential equation. To render these ideas formally, we adopt these restrictions to the original conception.
1. Let C be augmented so that each circumstance bears a location that is a pair of real numbers drawn from the unit square, with different circumstances having different locations.
2. An ordinary differential equation E is constructive when there is a constructive function F on the unit square such that E is
dy/dx = F(x, y).
3. Let E be an ordinary, constructive differential equation. For f a function computed by a concrete device, a path p through the tree giving C generates f according to E if and only if, whenever ⟨m, n⟩ ∈ f, there is an α ∈ p for which ⟨m, n⟩ appears in α and the collection of locations from circumstances in p all lie on some constructive solution to E.
4. For E as above, α ⊩ ⟨m, n⟩ ∈ f just in case α lies on a path p generating f according to E, f(m) = n, and ⟨m, n⟩ appears in some β ∈ p, β ≥ α.
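The third restriction above can be illustrated on a deliberately tame case. The Python sketch below takes the equation dy/dx = F(x, y) with F(x, y) = y, whose solution through (0, 0.25) is known in closed form and stays inside the unit square on [0, 1], and checks whether a finite path of circumstances covers a finite fragment of a graph while all its locations lie on that solution. Everything here is an assumption made for illustration: the choice of F, the initial value, the tolerance, the representation of a circumstance as a (pairs, location) tuple, and the restriction to a finite fragment. The theorem that follows concerns equations with no constructive solution at all, which this example is not.

```python
import math

def F(x, y):
    """Right-hand side of the illustrative equation dy/dx = F(x, y) = y."""
    return y

def solution(x, y0=0.25):
    """Closed-form solution through (0, y0); stays in the unit square on [0, 1]."""
    return y0 * math.exp(x)

def in_unit_square(loc):
    x, y = loc
    return 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0

def satisfies_equation(x, h=1e-6, tol=1e-4):
    """Numerically confirm that the slope of the solution matches F at x."""
    slope = (solution(x + h) - solution(x - h)) / (2 * h)
    return abs(slope - F(x, solution(x))) < tol

def path_generates(path, graph, tol=1e-9):
    """path: list of (pairs, location) circumstances; graph: set of (m, n).
    True when every pair of the finite graph fragment appears somewhere on
    the path and every location lies on the chosen solution curve."""
    seen = set().union(*(pairs for pairs, _ in path))
    locations_ok = all(
        in_unit_square(loc) and abs(loc[1] - solution(loc[0])) <= tol
        for _, loc in path
    )
    return graph <= seen and locations_ok
```

A path whose circumstances sit at (0, solution(0)) and (0.5, solution(0.5)) and together contain the pairs (0, 1) and (1, 2) passes the check for that two-element fragment; moving either location off the curve, or asking for a pair the path never contains, makes it fail.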
The idea behind the third restriction should be clear: when p generates f according to E, a sequence of locations from those circumstances sufficient to compute all the values of f can be strung out like beads on a string along the trajectory of an object obeying condition E. Then, α tells us that f(m) yields n when α can be extended along a path p generating f according to equation E to a circumstance β sufficient for m to be input and n output. With the restrictions in place, one sees that
Theorem: There are ordinary, constructive differential equations E such that
□∀m∀n (⟨m, n⟩ ∈ f ↔ Φ(m, n))
fails at every α for any f and any arithmetic Φ with nontrivial extension.
Proof: There are ordinary, computable differential equations E having no computable solution on the unit square, as was demonstrated in [Pour–El and Richards 1979]. Using Markov’s Principle and Church’s Thesis, one can deduce from this that there are ordinary, constructive differential equations having no constructive solution on the unit square. ■
Acknowledgements
The author is grateful to Darren Abramson, Nathan Carter, Janet Folina, and Christopher Tillman for their helpful comments and suggestions.
References
Abramson, D. [2006], “Functionalism and the Church–Turing Thesis”, Bloomington, IN: Department of Philosophy, Indiana University, ms.
Dekker, J.C.E. [1954], “A Theorem on Hypersimple Sets”, Proceedings of the American Mathematical Society 5, 791–796.
Du Bois–Reymond, P. [1882], Die allgemeine Functionentheorie, Tübingen, DE: Verlag der H. Laupp’schen Buchhandlung, pp. xiv+292. [Translations from this volume are my own].
Flagg, R.C. [1985], Church’s Thesis is Consistent with Epistemic Arithmetic, in Intensional Mathematics (S. Shapiro ed.), Amsterdam, NL: North-Holland, pp. 121–172.
Gandy, R. [1980], Church’s Thesis and Principles for Mechanisms, in The Kleene Symposium, (J. Barwise, et al. eds.), Amsterdam, NL: North-Holland, pp. 123–148.
Gödel, K. [1990], Some Remarks on the Undecidability Results, in Collected Works, vol. II, Publications 1938–1974, (S. Feferman, et al. eds.), Oxford, UK: Oxford University Press, pp. 305–306.
Heyting, A. [1976], Intuitionism. An Introduction, Third Revised Edition, Amsterdam, NL: North-Holland Publishing Company, pp. ix+137.
Horsten, L. [1998], “On Proposed Formalizations of Church’s Thesis in the Language of Epistemic Arithmetic”, The Bulletin of Symbolic Logic 4(2), June 1998, 213–214.
Horsten, L. [2006], Formalizing Church’s Thesis, [this volume].
Kreisel, G. [1958], “A Remark on Free Choice Sequences and Topological Completeness Proofs”, The Journal of Symbolic Logic 23, 369–388.
Kreisel, G. [1965], Mathematical Logic, in Lectures on Modern Mathematics, vol. 3, (T.L. Saaty ed.), New York, NY: Wiley, pp. 95–195.
Kreisel, G. [1972], “Which Number-Theoretic Problems Can Be Solved in Recursive Progressions on Π¹₁-Paths Through O?”, The Journal of Symbolic Logic 37, 311–334.
Kreisel, G. [1987], “Church’s Thesis and the Ideal of Informal Rigour”, Notre Dame Journal of Formal Logic 28(4), 499–519.
Kripke, S.A. [1982], Wittgenstein on Rules and Private Language: An Elementary Exposition, Cambridge, MA: Harvard University Press, pp. x+150.
McCarty, C. [1987], “Variations on a Thesis: Intuitionism and Computability”, Notre Dame Journal of Formal Logic 28(4), 536–580.
McCarty, C. [1996], Completeness for Intuitionistic Logic, in Kreiseliana: About and Around Georg Kreisel, (P. Odifreddi ed.), Wellesley, MA: A.K. Peters, Ltd., pp. 301–334.
Montague, R. [1962], Towards a General Theory of Computability, in Logic and Language: Studies Dedicated to Professor Rudolf Carnap on the Occasion of his Seventieth Birthday, (B. Kazemier and D. Vuysje eds.), Dordrecht, NL: D. Reidel Publishing Company, pp. 118–127.
Pour–El, M. and Richards, I. [1979], “A Computable Ordinary Differential Equation which Possesses no Computable Solution”, Annals of Mathematical Logic 17, 61–90.
Pour–El, M. and Richards, I. [1989], Computability in Analysis and Physics, Berlin, DE: Springer-Verlag, pp. x+206.
Shapiro, S. [1985], Intensional Mathematics, Amsterdam, NL: North-Holland, pp. v+230.
Smoryński, C.A. [1991], Review of Intensional Mathematics by Stewart Shapiro, The Journal of Symbolic Logic 56(4), 1496–1499.
Turing, A.M. [1936–37], “On Computable Numbers, with an Application to the Entscheidungsproblem”, Proceedings of the London Mathematical Society, Series 2 42, 230–265.
Wittgenstein, L. [1953], Philosophical Investigations, Third Edition, G.E.M. Anscombe (tr.), New York, NY: The Macmillan Company, pp. vi+250.
Wittgenstein, L. [1976], Wittgenstein’s Lectures on the Foundations of Mathematics, Cambridge, 1939, (C. Diamond ed.), Ithaca, NY: Cornell University Press, pp. 300.
Elliott Mendelson∗
On the Impossibility of Proving the “Hard-Half” of Church’s Thesis

As is well known, Alan Turing presented in his groundbreaking 1936 paper an argument to show that any computable¹ function f(x₁, ..., xₙ) from natural numbers to natural numbers could be computed by a certain kind of abstract machine (now called a Turing machine).² The class of these so-called Turing-computable functions turned out to be provably identical with other classes of functions that were thought to capture the intuitive notion of computable function, including the class of Kleene’s partial recursive functions. Turing’s argument was so convincing that logicians and mathematicians were willing to accept the truth of Church’s Thesis [CT] that a function is computable if and only if it is Turing-computable.³ The great incompleteness and undecidability results in the nineteen-thirties of Gödel, Church, Turing, Kleene, Rosser, and Tarski, as well as innumerably many results since then, depend for their scope and significance on the truth of CT. An argument somewhat similar to Turing’s but more general was given by Kolmogorov and Uspensky in 1953. In their case, as in all the previous treatments, the assertion of CT was made in a rather tentative way because it was considered an equivalence between an intuitive notion (computable function) and a “precise mathematical concept”. (Sometimes the word “precise” is replaced or supplemented by “well-defined” or “formal”.) A proof of such an equivalence is alleged to be impossible.

These reservations about CT have been directly challenged, first in an approach sketched in Gandy [1980], and then more thoroughly in papers by Sieg ([1994] and [1997]), in which the mathematical apparatus for defining functions is transparent and the argument that all computable functions are definable in this way is convincing. Of course, one has to see that any way of computing a function can be suitably regimented to fit the prescribed pattern. This is unavoidable here, as it was for Turing, for Gandy, and for Kolmogorov and Uspensky.

Another approach to a proof of CT, still unrealized, would call on us to add to ZF all the facts about computable functions that we know. Perhaps we would use here a predicate for computable function and possibly symbols for other notions. (I am not aware that this project has been seriously pursued anywhere.) Within this new theory we could prove the “easy half” of CT, namely, that every partial recursive function is computable, since all the assumptions used in any of the usual proofs for the “easy half” would be axioms of the theory. (See, for example, [Black 2000, pp. 249–250], or [Mendelson 1990, pp. 232–233].)

The “hard half” of CT, that every computable function is partial recursive, appears to be difficult to prove. Moreover, as was mentioned above, many people have been certain that the “hard half” is not only difficult to prove, but also impossible to prove. As indicated earlier, the alleged reason for this impossibility is that there is no way of deriving precise (or well-defined or formal) mathematical properties of a function from the assumption that the function has the intuitive, informal, and allegedly vague and imprecise property of being computable. A clear, well-written statement of this position may be found in [Folina 1998].

∗ E. Mendelson, Department of Mathematics, Queens College, Flushing, NY 11367.
¹ I have dropped the adverb “effectively” from “effectively computable”, since it is redundant.
² In actuality, Turing originally worked with infinite sequences of digits, so that he was dealing with computable real numbers in the interval [0, 1].
³ The assertion CT*, that a function is recursive if and only if it is computable and total, follows from CT but seems to be weaker than CT, and appears to have attracted no independent interest.
I would like to examine briefly the distinctions that are made between “computable function” and “partial recursive function”. I am not challenging the usual distinction between “precise” and “imprecise” in mathematical discussions. That often plays a useful role. What I do think questionable is a careless application of that distinction to fundamental concepts like “set”, “class”, “natural number”, “partial recursive function” and “computable function”. I do not believe that the distinction between “precise” and “imprecise” serves to distinguish “partial recursive function” from
“effectively computable function”. The concept of “partial recursive function” is not assumed to be known intuitively; rather, it is defined in terms of previously defined mathematical notions as well as one or more basic mathematical notions like “set”. The choice of basic notions will depend on the choice of the underlying foundational language and theory. In actuality, the latter choice is usually not specified. The definition will use various notions like “set”, “natural number”, “function”, and “sequence” that, in this context, are understood without question, even by mathematicians who are not thoroughly acquainted with a foundational language and theory. Those mathematicians will understand the concept of partial recursive function only insofar as they have an intuitive understanding of the notions that occur in the definition. Those mathematicians who are thoroughly acquainted with an underlying foundational language and theory must also have an intuitive understanding of the basic notion or notions of that language. Typically, “set” will be such a notion. Although many of us (but by no means all of us) are quite comfortable with that notion, would we want to say that “set” is a “precise” notion, or, on the other hand, that “set” is an “imprecise” notion? Somehow, “precise” and “imprecise” may not be appropriate adjectives in this case, even though we would not hesitate to say that the notion of “partial recursive function” is precisely defined. The “precision” has to do with the definition within the theory, not with the underlying concepts of the theory.

It also seems to me to be inappropriate to apply the adjectives “precise” or “imprecise” to the concept of “computable function”. If you believe that it is imprecise, you appear to be saying that it is vague;⁴ but the concept of “computable function” is hardly vague. After the usual explanation of its intended meaning, people seem to understand it without difficulty and to apply it in an appropriate way.
I do not know of any cases in which it leads to contradiction or paradox.⁵

Sometimes, the assumed fact that the definition (or the notion) of “partial recursive” is formal is taken to be a barrier between it and the “informal” notion of “computable function”. I believe that Alonzo Church himself pointed to this as a reason for the unprovability of CT. Of course, this distinction carries some weight only if the formal language in question has special significance; say it is the language of a set theory or a higher order predicate calculus. Even then, however, the reasoning that I have exhibited (for example, in the proof of the easy-half of CT) shows that logical connections exist between informal notions and concepts defined in those formal languages.

I am indebted to Professor Folina for pointing out in [Folina 1998] various deficiencies in my 1990 paper. First of all, contrary to the impression that might be obtained from some of the things I said, I do not deny the extreme importance to mathematics and logic of the definitions of partial recursive functions and Turing-computable functions. Second, in the list of examples that I gave on pp. 230–232 of my paper to show that equivalences can be proved between what are traditionally thought of as an intuitive concept and a “precise well-defined mathematical concept”, my comparisons of the usefulness of the defined concept with that of the intuitive concept are sometimes tenuous. Nevertheless, those examples do show that there is no impenetrable barrier between the two kinds of concepts. A critique of [Folina 1998] may be found in [Black 2000], which provided an incisive picture of the situation with respect to Church’s Thesis.

⁴ Folina [1998, p. 311] maintains that the boundaries of intuitive notions “aren’t precise enough to be shown equivalent to anything”.
⁵ I do not mean to say that its ultimate explanation is simple. We might need to have recourse to ideal mathematicians without limitations of time, space, or brain power.
Addendum

I should like to mention another example having to do with the connections between intuitive mathematical notions and related definitions in accepted formalized axiom systems like ZF. The notion of an ordered pair ⟨a, b⟩ began to be used regularly in the seventeenth century when it was needed in analytic geometry and the calculus. As far as I know, its intuitive meaning was taken for granted and no definition was called for. This situation prevailed until the first definition {{∅, {a}}, {{b}}} was given by Norbert Wiener [1914]. The simpler definition {{a}, {a, b}}, found by Kuratowski [1921], is now the definition used in most axiomatic set theories. The Wiener and Kuratowski ordered pairs are provably different and are not the only suitable definitions, but it was noticed early on that any definition would suffice for mathematical purposes if it yielded the equivalence

    ⟨a, b⟩ = ⟨c, d⟩ ↔ (a = c & b = d).    (∗)
(For a deeper study, see A. Oberschelp [1991].) Thus, in contradistinction to the case of “computable function” and its various definitions, there are many non-equivalent definitions of “ordered pair”. Moreover, none of the definitions of “ordered pair” yield any significant mathematical results beyond the equivalence (∗). In fact, each such definition entails counter-intuitive, although apparently harmless, consequences. For example, the Kuratowski definition implies that every object b belongs to a set that is a member of the set ⟨a, b⟩. The only advantage of any of the definitions of “ordered pair” is that it permits the construction of axiomatic set theories in which the symbol for the membership relation is the only non-logical symbol.⁶ As an alternative to using a definition of “ordered pair”, one could employ an undefined function letter f with the intention that f(a, b) stands for ⟨a, b⟩ and one could adopt (∗), as well as other suitable statements, as axioms.⁷ Such an alternative theory would correspond in the case of computable functions to a theory I suggested above in which there is a predicate with the intended meaning of “computable function” and in which CT would be taken as an axiom.
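The role of the equivalence (∗) can be illustrated with a small finite check. The following is only a sketch under stated assumptions: Python frozensets stand in for hereditarily finite sets, the test domain DOMAIN is an arbitrary choice, and the helper names (kuratowski, wiener, satisfies_star) are introduced here for illustration and do not come from the text. A finite check of this kind is of course no substitute for the set-theoretic proof.

```python
from itertools import product

def kuratowski(a, b):
    """Kuratowski pair {{a}, {a, b}}."""
    return frozenset({frozenset({a}), frozenset({a, b})})

def wiener(a, b):
    """Wiener pair {{∅, {a}}, {{b}}}, as written in the text."""
    empty = frozenset()
    return frozenset({
        frozenset({empty, frozenset({a})}),
        frozenset({frozenset({b})}),
    })

def satisfies_star(pair, domain):
    """Check (∗) on a finite domain: pair(a, b) == pair(c, d) iff a == c and b == d."""
    return all(
        (pair(a, b) == pair(c, d)) == (a == c and b == d)
        for a, b, c, d in product(domain, repeat=4)
    )

# Arbitrary small test domain (an assumption of this sketch): a few urelements and sets.
DOMAIN = [0, 1, 2, frozenset(), frozenset({0})]

print(satisfies_star(kuratowski, DOMAIN))         # both definitions pass the check
print(satisfies_star(wiener, DOMAIN))
print(kuratowski(0, 1) == wiener(0, 1))           # yet they are different sets
print(any(1 in m for m in kuratowski(0, 1)))      # b belongs to a member of ⟨a, b⟩
```

Both definitions pass the check for (∗) on the chosen domain while yielding different sets, which matches the point that nothing beyond (∗) is needed from a definition of “ordered pair”. By contrast, the unordered pair {a, b} fails the same check.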
⁶ Note that it is tempting to try to define ⟨a, b⟩ as the function g with domain {1, 2} and such that g(1) = a and g(2) = b. However, this definition is not available, since the notion of “function” is defined in terms of ordered pairs.
⁷ If individuals (that is, non-sets) are countenanced in the theory, one could require that ⟨a, b⟩ is always an individual or one can assume that ⟨a, b⟩ is always a set. In any of these theories, it would not be possible to prove that ⟨a, b⟩ is equal to the Wiener, Kuratowski, or any of the other possible definitions.

References

Black, R. [2000], “Proving Church’s Thesis”, Philosophia Mathematica 8, 244–258.
Folina, J. [1998], “Church’s Thesis: Prelude to a Proof”, Philosophia Mathematica 6, 302–323.
Gandy, R. [1980], Church’s Thesis and Principles for Mechanisms, in The Kleene Symposium, (J. Barwise, et al. eds.), North-Holland, pp. 123–148.
Kolmogorov, A. and Uspensky, V.A. [1953 (1963)], “On the Definition of an Algorithm” [in Russian], Uspekhi Mat. Nauk, VIII, 125–176, [English translation:] Amer. Math. Soc., Translations, XXIX, 1963, 217–245.
Kuratowski, K. [1921], “Sur la notion d’ordre dans la théorie des ensembles”, Fundamenta Mathematicae 2, 161–171.
Mendelson, E. [1990], “Second Thoughts about Church’s Thesis and Mathematical Proofs”, The Journal of Philosophy 87, 225–233.
Oberschelp, A. [1991], “On Pairs and Tuples”, Zeitschrift für Mathematische Logik und Grundlagenforschung 37, 55–56.
Shapiro, S. [1981], “Understanding Church’s Thesis”, Journal of Philosophical Logic, X, 353–365.
Shapiro, S. [1993], “Understanding Church’s Thesis, again”, Acta Analytica 11, 59–77.
Sieg, W. [1994], Mechanical Procedures and Mathematical Experience, in Mathematics and Mind, (A. George ed.), Oxford Univ. Press, pp. 71–117.
Sieg, W. [1997], “Step by Recursive Step: Church’s Analysis of Effective Calculability”, Bulletin of Symbolic Logic 3, 154–180.
Sieg, W. and Byrnes, J. [1996], K-Graph Machines: Generalizing Turing’s Machines and Arguments, in Gödel ’96. Lecture Notes in Logic, 6, (P. Hájek ed.) Springer, pp. 98–119.
Sieg, W. and Byrnes, J. [1999], Gödel, Turing, and K-Graph Machines, in Logic and Foundations of Mathematics, (A. Cantini, et al. eds.), Kluwer, pp. 57–66.
Turing, A. [1936], “On Computable Numbers, with an Application to the Entscheidungsproblem”, Proceedings of the London Mathematical Society 42, 230–265.
Wiener, N. [1914], “A Simplification of the Logic of Relations”, Proceedings of the Cambridge Philosophical Society 17, 224–227.
Roman Murawski, Jan Woleński∗
The Status of Church’s Thesis

Church’s thesis can be stated simply as the following equivalence:

(CT) A function is effectively computable if and only if it is partially recursive.¹

Thus (CT) identifies the class of effectively computable or calculable functions (we will treat these two categories as equivalent) with the class of partially recursive functions. This means that every element belonging to the former class is also a member of the latter class and conversely. Clearly, (CT) asserts the co-extensiveness of effective computability and partial recursivity. Since our aims here are not mathematical, the exact definition of recursive functions and their properties is not relevant. On the other hand, we want to stress the property of being effectively computable, which plays a basic role in philosophical thinking about (CT).²

∗ R. Murawski, Adam Mickiewicz University, Faculty of Mathematics and Comp. Sci., ul. Umultowska 87, 61–614 Poznań, Poland; J. Woleński, Jagiellonian University Institute of Philosophy, ul. Grodzka 52, 31–044 Kraków, Poland. The financial support (for R. Murawski) of the Committee for Scientific Researches (Grant no. 1 H01A04227) is acknowledged.
¹ We do not enter into the history of (CT) and its various formulations. Some people prefer to speak of the Church–Turing thesis: a function is effectively calculable if and only if it is Turing computable. In fact, Church used the concept of λ-definability. The principal historical details are to be found, for example, in Gandy [1988], Schulz [1997, pp. 159–171] (this book provides an extensive analysis of Church’s thesis, including the problems discussed in this paper) and Murawski [2004]. We choose the formulation via recursive functions for its applications in logic.
² Henceforth, “computable” is an abbreviation for “effectively computable” and “recursive” for “partially recursive”.
A useful notion in providing intuitions concerning effectiveness is that of an algorithm. It refers to a completely specified procedure for solving problems of a given type. Important here is that an algorithm does not require creativity, ingenuity or intuition (only the ability to recognize symbols is assumed) and that its application is prescribed in advance and does not depend upon any empirical or random factors. Moreover, this procedure is performable in a finite number of steps. Thus a function f : Nᵏ → N is said to be effectively computable (briefly: computable) if and only if its values can be computed by an algorithm. In other words, a function f : Nᵏ → N is computable if and only if there exists a mechanical method by which, for any k-tuple (a₁, ..., aₖ) of arguments, the value f(a₁, ..., aₖ) can be calculated in a finite number of prescribed steps. Three facts should be stressed here: (a) no actual human computability or empirically feasible computability is assumed in (CT); (b) functions are treated extensionally, i.e., a function is identified with an appropriate set of ordered pairs; (c) the concept of computability has a modal parameter (“there exists a method”, “a method is possible”) as its inherent feature.

Typical comments about (CT) are as follows ((i) Kalmár [1959, p. 72], (ii) Kleene [1952, pp. 318–319], Kleene [1967, p. 232], (iii) Rogers [1967, p. 20]):

(i) Church’s thesis is not a mathematical theorem which can be proved or disproved in the exact mathematical sense, for it states the identity of two notions only one of which is mathematically defined while the other is used by mathematicians without exact definition.³

(ii) While we cannot prove Church’s thesis, since its role is to delimit precisely a hitherto vaguely conceived totality, we require evidence that it cannot conflict with the intuitive notion which is supposed to be complete; i.e. we require evidence that every particular function which our intuitive notion would authenticate as effectively calculable is [...] recursive. The thesis may be considered a hypothesis about the intuitive notion of effective calculability; in the latter case, the evidence is required to give the theory based on the definition the intended significance.

(iii) This is a thesis rather than a theorem, in as much as it proposes to identify a somewhat intuitive concept phrased in exact mathematical terms, and thus is not susceptible of proof. But very strong evidence was adduced by Church, and subsequently by others, in support of the thesis. It [(CT) and other similar characterizations; R.M., J.W.] must be accepted or rejected on grounds that are, in large part, empirical. [...]. Church’s Thesis may be viewed as a proposal as well as a claim, that we agree henceforth to supply previously intuitive terms (e.g., “function computable by algorithm”) with certain precise meaning.

³ Kalmár argues against (CT), but we do not enter into this question.
These three quotations shed some light on several problems raised by (CT). Firstly, we can and should ask for evidence for it. We take the standard position that the implication from recursivity to computability (every recursive function is computable) is obvious and the opposite implication from computability to the mathematical definition of effective calculability, that is, recursivity (every computable function is recursive) has a sufficient justification.⁴ Secondly, one can ask about the fate of (CT) in some logical framework, in particular, in intuitionistic or constructive systems (see [Kleene 1952, pp. 318, 509–516], [Kreisel 1970], [McCarty 1987]), but we entirely neglect this question.⁵ Thirdly, there are various special problems, mostly philosophical, we believe, related to (CT). Does this thesis support mechanism in the philosophy of mind or not (see Webb [1980])? How is it related to structuralism in the philosophy of mathematics (see Shapiro [1983])? We also neglect this variety of questions, except for occasional parenthetical remarks aimed at exemplification. Fourthly, and this is our main concern in this paper, there arises the problem of the status of (CT). We split this topic into two subproblems. The first considers (CT) from the point of view of its function in mathematical language or various conceptual schemes. The second focuses on the character of (CT) as a statement or sentence. To be more specific, we note that one of the views considers (CT) as a definition. This gives an illustration of the former subproblem. However, independently of whether (CT) has the status of a definition or not, it is captured by a sentence. Now, we can ask whether this sentence is analytic or synthetic, a priori or a posteriori. This provides an illustration of the latter subproblem. Although both subproblems are closely related, their separation, even relative, makes the analysis of the status of (CT) easier.

⁴ The accessible evidence for (CT) is collected in every textbook of logic having a part on recursion theory or a monograph about the topic. See for example, [Kleene 1952, pp. 317–323], [Kleene 1967, pp. 232–242], [Murawski 1999, pp. 85–87], [Murawski 2004] and [Folina in this volume]. Since the implication (a) ‘if a function is recursive, then it is computable’ does not raise doubts, the Church thesis is sometimes reduced to (b) ‘if a function is computable, then it is recursive’. On this approach, (a) is called ‘the converse Church’s thesis’. We do not follow this custom.
⁵ Note, however, that although (CT) defines the concept of effective computability, it functions within classical, that is, not constructive metalogic.

The following views about the function of (CT) can be distinguished:⁶
(A) (CT) is an empirical hypothesis;
(B) (CT) is an axiom or theorem;
(C) (CT) is a definition;
(D) (CT) is an explication.

Ad (A). (CT) can be considered as referring to human (possibly idealized) abilities. Hence Church’s thesis is connected with questions concerning relations between mathematics and material or psychic reality. This view is represented by Post, who regarded the notion of computability as a notion of a psychological nature (Post [1965, pp. 408, 419]; page-reference to the first edition):

[...] for full generality a complete analysis would have to be made of all the possible ways in which the human mind could set up finite processes for generating sequences. [...] we have to do with a certain activity of the human mind as situated in the universe. As activity, this logico-mathematical process has certain temporal properties; as situated in the universe it has certain spatial properties.
Post maintained that (CT) should be interpreted as a natural law and insisted on its empirical confirmation (similarly DeLong [1970, p. 195]). However, one should be very careful with such claims. In fact, we have no general and commonly accepted psychological theory of human mental activities, even as far as the scope of computation is concerned. Hence, it is very difficult to think about (CT) as an empirical hypothesis about the human mind and its abilities. Possibly, (CT) can be considered as related to mechanism in the philosophy of mind (see above) and even as confirming this view. Although we do not deny that (CT) has an application in philosophy, we do not think that this thesis, as a tool for a philosophical analysis of the mind-body problem, functions as a genuine empirical hypothesis. We think that the use of (CT) for supporting mechanism in the philosophy of mind is very similar to deriving (or not) indeterminism from the principle of indeterminacy. We have to abstain from further remarks concerning this interpretation of (CT), because it would require a longer metaphilosophical discussion.

⁶ We order these proposals according to their plausibility from our point of view.

Ad (B). One should distinguish two understandings of axioms or theorems. Firstly, axioms can be considered as principal metatheoretical assumptions. When Kreisel (see [1970]) considers (CT) as a kind of reducibility axiom for constructive mathematics, he uses the concept of axiom in this sense. Yet we are inclined to think that his comparison of (CT) with the “hypothesis” V = L in set theory is rather misleading. In the case of V = L, there is no need to use quotes and say “hypothesis”, because we are dealing with a set-theoretical axiom, whereas Kreisel’s analysis does not result in an axiomatic system of set theory. This leads to the second use of the term “axiom”, referring to an element of an axiomatic system. A similar problem arises with respect to the category of being a theorem. Mendelson (see [1990, p. 230]) writes:

I would like to challenge the standard interpretation of CT as an unprovable thesis. My viewpoint can be brought out clearly by arguing that CT is another in a long line of well-accepted mathematical and logical “theses”, and that CT may be just as deserving of acceptance as those other theses. Of course, these theses are not ordinarily called “theses” and that is just my point.
Further, Mendelson mentions the following examples as comparable with (CT): (a) the set-theoretical definition of function as a kind of relation; (b) Tarski’s definition of truth; (c) the definition of logical validity; (d) Weierstrass’s definition of limit. And Mendelson continues [p. 232]:

[...] it is completely unwarranted to say that CT is unprovable just because it states an equivalence between a vague, imprecise notion (effectively computable function) and a precise mathematical notion (partial-recursive function).
Mendelson gives three more specific arguments as supporting his view [pp. 232–233]:

The concepts and assumptions that support the notion of partial-recursive function are, in an essential way, no less vague and imprecise than the notion of effectively computable function; the former are just more familiar and are the part of a respectable theory with connections to other parts of logic and mathematics. (The notion of effectively computable function could have been incorporated into an axiomatic presentation of classical mathematics, but the acceptance of CT made this unnecessary). [...].

The assumption that a proof connecting intuitive and precise mathematical notions is impossible is patently false. In fact, half of (CT) (the “easier” half), the assertion that all partial-recursive functions are effectively computable, is acknowledged to be obvious in all textbooks in recursion theory. A straightforward argument can be given for it. (The so-called initial functions are clearly effectively computable [...]. Moreover, the operations of substitution and recursion and the least-number operator lead from effectively computable functions to effectively computable functions. [...].) This simple argument is as clear a proof as I have seen in mathematics, and it is a proof in spite of the fact that it involves the intuitive notion of effective computability. [...].

Another difficulty with the usual viewpoint concerning CT is that it assumes that the only way to ascertain the truth of the equivalence asserted in CT is to prove it. In mathematics and logic, proof is not the only way in which a statement comes to be accepted as true. Of course, this is a consequence of the truism that not all truths can be proved; proofs must assume certain axioms and rules of inference.
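The structure of the argument for the “easier” half quoted above can be made concrete. The following is a minimal sketch, not Mendelson’s own presentation: it assumes Python callables on non-negative integers as stand-ins for effectively computable functions, and the names zero, succ, proj, compose, prim_rec and mu are labels chosen here for illustration. Each initial function is an obviously executable procedure, and each operation turns executable procedures into an executable procedure.

```python
def zero(*args):
    """Initial function: constant 0, for any number of arguments."""
    return 0

def succ(x):
    """Initial function: successor."""
    return x + 1

def proj(i):
    """Initial function: projection onto the i-th argument (0-based)."""
    return lambda *args: args[i]

def compose(f, *gs):
    """Substitution: (x1..xk) -> f(g1(x1..xk), ..., gm(x1..xk))."""
    return lambda *args: f(*(g(*args) for g in gs))

def prim_rec(base, step):
    """Recursion: h(xs, 0) = base(xs); h(xs, n+1) = step(xs, n, h(xs, n))."""
    def h(*args):
        *xs, n = args
        acc = base(*xs)
        for i in range(n):
            acc = step(*xs, i, acc)
        return acc
    return h

def mu(f):
    """Least-number operator: least n with f(xs, n) == 0.
    The search does not terminate if no such n exists (a partial function)."""
    def g(*xs):
        n = 0
        while f(*xs, n) != 0:
            n += 1
        return n
    return g

# Addition and multiplication built only from the pieces above.
add = prim_rec(proj(0), compose(succ, proj(2)))
mul = prim_rec(zero, compose(add, proj(2), proj(0)))

# Least n with n*n >= x. The test is written directly in Python for brevity
# (a shortcut of this sketch); it could itself be built from the combinators.
ceil_sqrt = mu(lambda x, n: 0 if n * n >= x else 1)

print(add(3, 4), mul(3, 4), ceil_sqrt(10))
```

The sketch shows only the direction from partial-recursive definitions to procedures; it has no bearing on the converse, which is the contested half.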
Of course, it is true that in mathematics there occur not only formal proofs but also other methods of justification accepted by mathematicians. Thus, Mendelson is right provided that the concepts of axiom and theorem are taken liberally. However, Mendelson’s view raises essential doubts (see also Shapiro [1993], Folina [in this volume]; Mendelson [in this volume] seems to take a more careful position). First of all, any discussion of (CT) and similar theses in the framework of logic or the foundations of mathematics requires an appeal to the metamathematical notions of proof, axiom and theorem, because otherwise the development of logic in any textbook of mathematical logic, including the celebrated Mendelson [1964], would be redundant. To prove Church’s thesis in a formal way, one should construct a formal system in which the concept of computability would be among the primitive notions and which would be based (among other things) on
axioms characterizing this notion. The task would then be to show that computability characterized in such a way coincides with recursiveness. But another problem would appear now: namely the problem of showing that the adopted axioms for computability do in fact reflect exactly the properties of computability (intuitively understood). Hence we would arrive at Church’s thesis again, though at another level and in another context. It should also be noted that Kleene and Rogers (see above) spoke about theorems in another sense, namely, as sentences (formulas) provable in an explicit formal (or formalizable) axiomatic system. “Being provable” means here being derivable by explicit inferential devices specified within an assumed system of logic. This treatment enables one to include axioms as a kind of theorem: logical axioms and their consequences are provable from the empty class of sentences, but extralogical axioms are their own proofs; that is, if A is an extralogical axiom of a system S, its proof in S consists solely of A itself (the same can be said about logical axioms as well).⁷

Secondly, even if the counterparts of intuitive and informal notions occur in axioms or theorems, they lose, entirely or partially, their ordinary or colloquial meanings (senses) and begin to function as the axioms (postulates) and definitions of a given S prescribe. If a chemist defines water as H₂O, the substance denoted by “water” in theoretical chemistry is just H₂O and not water in the ordinary sense. Similarly, sets in axiomatic set theory are understood according to axioms, but not as collections in colloquial language or even in informal mathematics. Moreover, we find the phrase “concepts and assumptions that support the notion” very unclear. The circumstances supporting notions are always of a psychological nature, but, when we go to theories, concepts are supported only by the axioms in which they occur. Thirdly, we do not agree that the concept of recursive function is more familiar than the concept of computable function. Fourthly, the fact that the former is an element of a respectable mathematical theory, but the latter is not, constitutes just the point. Fifthly, although “the notion of effectively computable function could have been incorporated into an axiomatic presentation of classical mathematics”, we doubt whether any way other than repeating the theory of recursive functions (or an equivalent theory) would be appropriate here. Thus, we think we can conclude that Mendelson’s position stems from the various uses of the concepts of proof, axiom and theorem.

⁷ There are also theorists who regard Church’s thesis as proved. Gandy (see Gandy [1988]) argues that Turing’s direct argument pointing out that every algorithm can be simulated on a Turing machine proves a theorem. He regards this analysis to be as convincing as typical mathematical work. This concerns (CT) in terms of Turing machines.

Yet we cannot ignore the role of (CT) in proofs. Rogers (see [1967, p. 21]) speaks about “proofs by Church’s Thesis” as relying on informal methods and, in this case, all evidence supporting (CT).⁸ In fact, these proofs translate results achieved in the theory of recursive functions into results about computability. In particular, (CT) is suited for establishing negative results in this respect. When one shows that a given function is not recursive, then one can conclude that it cannot be computed. Thus, we prove that the set of theorems of predicate calculus is not decidable because it is not recursive, where decidability is understood as a kind of computability. Perhaps the case of the first incompleteness theorem is the most interesting. Kleene (see [1987, p. 495, note 4]; see also Krajewski [in this volume]) observes that if S is an ω-consistent and complete system of formal number theory, then (CT) does not hold. Hence, we obtain, even by intuitionistic logic, that is, fully constructively, that if it is not true that ω-consistency of S implies its incompleteness, then (CT) does not hold. Webb (see [1980, p. 208]) considers the connection between the incompleteness of arithmetic and (CT) as very deep (emphasis follows the original; the letter S is introduced instead of F):

[...] true but unprovable sentences are just the guardian angels which look after the formalization of effectiveness found in any suitable S; they protect (CT) from refutation by the diagonal argument.
The irrefutability of (CT) by diagonalization means that no nonrecursive function can be defined in S, provided that it is incomplete. Kleene (see [1952, p. 302], [1967, pp. 250–254]) proved the so-called generalized Gödel theorem:
(GGT) There is no correct and complete formal system for the predicate “being a computable proof in S”.
8 This function of (CT) was indicated by Church in [1936a]: “[...] the author has proposed a definition of the commonly used term ‘effectively calculable’ and has shown on the basis of this definition that the general case of the Entscheidungsproblem is unsolvable.”
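The diagonal construction that Webb alludes to can be illustrated on a small scale. The following Python sketch is only a toy under stated assumptions: `listing` is a hypothetical finite list standing in for an effective enumeration of total functions, and `diagonal` is a name introduced here for illustration. It shows that the function d(n) = f_n(n) + 1 differs from the n-th listed function at the argument n, so any effective listing of total functions leaves out a function that is computable from the listing itself.

```python
from typing import Callable, Sequence

NumFunc = Callable[[int], int]


def diagonal(functions: Sequence[NumFunc]) -> NumFunc:
    """Return d with d(n) = f_n(n) + 1 for every index n of the listing."""
    def d(n: int) -> int:
        # Evaluate the n-th listed function at its own index and change the value.
        return functions[n](n) + 1
    return d


# Hypothetical finite listing; an assumption made only for this illustration.
listing = [
    lambda n: 0,        # f_0: constant zero
    lambda n: n + 1,    # f_1: successor
    lambda n: 2 * n,    # f_2: doubling
    lambda n: n * n,    # f_3: squaring
]

d = diagonal(listing)

# d disagrees with f_i at the argument i, for every index i in the listing.
differs = [d(i) != f(i) for i, f in enumerate(listing)]
print(differs)
```

The sketch covers only a finite list; the argument discussed in the text concerns enumerations of all functions definable in a system S, which no finite program can exhibit.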
Roman Murawski, Jan Woleński
This does not, however, support the claim that (CT) functions as an axiom or theorem. The proof of (GGT) is by Church’s thesis in the Rogers sense (see above) and, as Kleene himself points out (see Kleene [1967, p. 252]), can be easily converted into a derivation without appealing to (CT). Moreover, the actual significance of the protection of (CT) by the incompleteness phenomenon should be properly understood. We can make S complete by adding the ω-rule, which is non-finitary. This immediately leads to a new proof-predicate, which is not computable in the sense of (CT), that is, not recursive. Thus S’ = S + the ω-rule generates a wider class of number-theoretical functions than S itself. Now we can generalize (CT) in order to capture new functions; this would lead to a different concept of recursivity. We have the following version of the incompleteness theorem (see Smullyan [1993, p. 41]; we assume that S is consistent):
(GT’) If S is an axiomatizable system in which some non-recursive set is representable, then S is incomplete.
By contraposition, we obtain
(GT”) If S is complete, then S is non-axiomatizable or no non-recursive set is representable.
Since S’ is axiomatizable and complete, it has to represent some non-recursive set, where ‘recursive’ means what is usually accepted by this adjective. Hence, S’ extends the class of computable functions beyond (CT). The connection of incompleteness and (CT) is actually very deep but not purely deductive. In particular, it appears that the assumptions of Gödel’s theorem and (CT) are exactly the same as far as the finitary character of deductive devices is concerned. Kleene (see [1987, p. 495, note 4]) rightly points out that the dependence “if it is not true that ω-consistency of S implies its incompleteness, then (CT) does not hold” asserts (we prefer to say “suggests”) the absurdity of the completeness of S, “rather than giving a specimen of an undecidable formula”.
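The step from (GT’) to (GT”) is purely propositional, so it can be checked mechanically. The sketch below treats “S is axiomatizable”, “some non-recursive set is representable in S” and “S is complete” as three Boolean variables and verifies over all eight assignments that the two formulations agree. The function and parameter names are assumptions made for this illustration; incompleteness is read simply as the negation of completeness.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    """Material implication p -> q."""
    return (not p) or q


def gt_prime(axiomatizable: bool, represents_nonrecursive: bool, complete: bool) -> bool:
    # (GT'): axiomatizable and representing a non-recursive set implies incomplete.
    return implies(axiomatizable and represents_nonrecursive, not complete)


def gt_double_prime(axiomatizable: bool, represents_nonrecursive: bool, complete: bool) -> bool:
    # (GT''): complete implies non-axiomatizable or representing no non-recursive set.
    return implies(complete, (not axiomatizable) or (not represents_nonrecursive))


# Compare both formulations on every assignment of the three variables.
equivalent = all(
    gt_prime(*values) == gt_double_prime(*values)
    for values in product([False, True], repeat=3)
)
print(equivalent)
```

The check confirms only the logical form of the contraposition; it says nothing about which systems actually satisfy the three conditions.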
To put this in other words, the completeness of S would abolish its character as a system based on recursive (= computable) machinery.9 However, assume that one started with S’ as the proper number theory, arguing that mathematicians use the ω-rule, but keeping the standard intuition about effectively calculable functions. Obviously, one could say “well, I understand calculable functions as corresponding to recursive functions associated with S = S’ − the ω-rule”. The effect is exactly the same as in the case of (CT).
9 There is also another way (see also Kleene [1967, p. 252], although our argument goes further) to justify the same conclusion. Instead of adding the ω-rule, we could take all arithmetical truths in the standard model as axioms. Although the resulting system is complete and has a finite deductive machinery for proving every axiom by itself (in other words: every axiom has a proof consisting of a single element, that is, the axiom in question), it is based on a non-recursive axiomatic base. Hence, it defines a class of functions which is not recursive in the usual sense.
Ad (C). Church (see [1936, pp. 90, 100]; the italics follow the original; page references are to the reprint) introduced (CT) in the following way:
The purpose of the present paper is to propose a definition of effective calculability [...]. We now define the notion of [...] an effectively calculable function of positive integers by identifying it with the notion of a recursive function of positive integers.
A similar treatment was suggested by Gödel (see [1946, p. 150]; the page reference is to the reprint):
[...] one has for the first time succeeded in giving an absolute definition of an interesting epistemological notion, i.e., one not depending on the formalism chosen.
If we look at the formulation of (CT), its structure does not exclude considering it as a definition. To see this we can convert (CT) into
(CT’) a calculable function is a function which is recursive.
Now “calculable function” appears here as the definiendum, and the phrase “a function which is recursive” serves as the definiens. However, closer inspection immediately leads to some questions. The word “function” seems to have exactly the same meaning in both parts of (CT’). Hence, we cannot say that the concept of function is a genus which is determined by a differentia specifica. This means that (CT’) does not fall under the classical formula definitio fit per genus proximum et differentiam specificam. It appears rather that (CT) offers a definition of calculable by the category of recursive, independently of whether it is treated intensionally or extensionally. Since the classical formula of framing all definitions by the proximate genus and the specific difference proves to be inadequate, there
is nothing wrong with the fact that (CT) lacks this character. More important worries concern how (CT) should be qualified as a definition. Let us recall some traditional distinctions. Firstly, we distinguish nominal and real definitions, that is, those of words and things, respectively. Secondly, there are analytic (reporting), regulative and synthetic (designer) definitions. Now it is difficult to locate (CT) under these rubrics. Certainly, it is not a nominal definition, because nobody regards its acceptance or rejection as a matter of taste, convenience, etc. Even if one says that (CT) concerns the use of “calculable” to some extent, its obvious actual enterprise consists in providing an objectual (substantial) characterization. But what does “objectual” mean in this context? To touch the essence of a calculable function? Is the essence of calculable the same as the essence of recursive? We claim that since no straightforward answer can be given to these questions, any attempt to decide whether (CT), considered as a definition, is nominal or real is hopeless. We assess in a similar manner the question whether (CT) is analytic, regulative or synthetic. To start with the middle case, “calculable” is not a vague adjective like “bald” or “short”, because functions either are or are not calculable; they are not calculable to some degree that would have to be made precise.10 Since “calculable” needs to be defined in order to put it into a formal mathematical theory, the alleged definition cannot be analytic, but since (CT) is certainly not an arbitrary claim, this prevents its treatment as a synthetic proposal.
There is still another reason to doubt whether (CT) is a definition in the proper sense. We do not agree that the expression “calculable function” occurs only in colloquial language, understood as the speech of ordinary people; at most, “colloquial” means here “functioning in the language of ordinary mathematics” (we will return to this question).
10 We do not exclude the situation that some functions are approximately calculable, but admitting that possibility would change the meaning of calculability.
Now mathematical definitions can be considered similarly to axioms, proofs or theorems (see above). In one sense, a definition can explain some more or less established intuitions. For example, one can say that to obtain a natural number n, one should start with 0 and apply the operation of adding 1, performing this step n times. This definition of the natural number n is then to be formalized by the Peano axioms. There is no other way to check the correctness of such steps than by appealing to a
conformity between informal sources and formalizations. It is perhaps interesting that an informal definition of a natural number is not subjected to an evaluation as correct or not, although it is very far from being arbitrary. On the contrary, it is determined by quite definite intuitions stemming from ordinary counting. However, if we add definitions of addition and multiplication to formal (or formalizable) number theory, it is easy to establish whether they are correct or not. Analogously, (CT), as far as it functions in the language of informal mathematics, can be more or less supported by various data, but we are not able to establish its correctness in any other way than by working simultaneously with its informal and its formal exposition. What we can do with full precision is to assert that the definition of recursive function is correct or not, but we do this within formal (formalizable) mathematics. Our conclusion is that (CT) as a definition functions only as an exposition of intuitions and thereby is not a definition in the proper sense.
Ad (D). The concept of rational reconstruction is closely related to the notion of explication introduced by Carnap in the following way (see Carnap [1952, pp. 3–8]; the quotation is taken from pp. 3 and 7; the emphases follow the original):11
The task of explication consists in transforming a given more or less inexact concept into an exact one or, rather, in replacing the first by the second. We call the given concept (or the term used for it) the explicandum, and the exact concept proposed to take the place of the first (or the term proposed for it) the explicatum. The explicandum may belong to everyday language or to a previous stage in the development of scientific language. The explicatum must be given by explicit rules for its use, for example, by a definition which incorporates it into a well-constructed system of scientific either logicomathematical or empirical concepts. [...]
the task of explication may be characterized as follows. If a concept is given as explicandum, the task consists in finding another concept as its explicatum which fulfils the following requirements to a sufficient degree:
1. The explicatum is to be similar to the explicandum in such a way that, in most cases in which the explicandum has so far been used, the explicatum can be used; however, close similarity is not required, and considerable differences are permitted.
2. The characterization of the explicatum, that is, rules for its use (for instance, in the form of a definition), is to be given in an exact form, so as to introduce the explicatum into a well-connected system of scientific concepts.
3. The explicatum is to be a fruitful concept, that is, useful for the formulation of many universal statements (empirical laws in the case of a nonlogical concept, logical theorems in the case of a logical concept).
4. The explicatum should be as simple as possible; this means as simple as the more important requirements (1), (2), and (3) permit.12
11 We prefer to speak about explication rather than rational reconstruction. The second concept is used by Mendelson (see [1990, p. 229]) and Schulz (see [1997, p. 182]).
12 Carnap gives, among others, the following examples of this procedure: the explication of “warm” (explicandum) by “having such and such temperature” (explicatum) and “logical probability” (explicandum) by “the degree of confirmation” (explicatum).
Since we can admit that the idea of calculable function needs to be explained, (CT) conforms to the task of explication in Carnap’s sense and we can take the concept of calculable function as the explicandum, and the notion of recursive function as the proposed explicatum for the former. Clearly, it leads to replacing an intuitive and not quite exact concept by a fully legitimate mathematical category. Doubtless, requirements 2. and 3. are satisfied, because the rules for the explicatum are given in an exact form and it expresses a concept which is useful for the formulation of many universal theorems. Since the explicatum replaces the explicandum in all known applications, the most important part of 1. is fulfilled. The similarity of the explicatum to the explicandum and the simplicity of the former might be problematic, but this question does not need to worry users of (CT). A clear advantage of the method of explication consists in its neutrality with respect to the nominal/real distinction. Independently of whether we take the concept or the term as an explicandum (explicatum), the result is the same. In our case, it means that (CT) explicates the notion of calculable function as well as the term “calculable function”. Although treating (CT) as an explication appears to us essentially better than (A)-(C), we consider it to be somewhat unsatisfactory. Our reservations do not concern the clear vagueness of 1. and 4. in Carnap’s formulation. In fact, one should postulate the similarity and simplicity in question, although no exact measure is possible
here. In order to formulate our position let us return to the use of “calculable function”. As we already noted, we do not agree that it belongs to the vocabulary of colloquial speech. However, there is more to say. As we observed at the beginning of this paper, the concept of computability has a modal parameter, expressed by phrases like “there exists a method” or “a method is possible”. Although such notions always raise doubts concerning their content, boundaries, etc., it would be incorrect to say that they have no standard meaning, particularly if they belong to specialized languages. This matter can be explained by using Ryle’s analysis of the adjective “ordinary” (see Ryle [1953]). He distinguishes two expressions, namely (a) the use of ordinary language; (b) the ordinary use of expressions, and observes that “ordinary” has a different meaning in (a) and (b). He writes [pp. 302–304]:
When people speak of the use of ordinary language, the word “ordinary” is in implicit or explicit contrast with ‘out-of-the-way’, ‘esoteric’, ‘technical’, ‘poetical’, ‘notational’ or, sometimes, ‘archaic’. ‘Ordinary’ means ‘common’, ‘current’, ‘colloquial’, ‘vernacular’, ‘natural’, ‘prosaic’, ‘non-notational’, ‘on the tongue of Everyman’, and usually in contrast with dictions which only a few people know how to use, such as the technical terms or artificial symbolisms of lawyers, theologians, economists, philosophers, cartographers, mathematicians, symbolic logicians and players of Royal Tennis. There is no sharp boundary between ‘common’ and ‘uncommon’, ‘technical’ and ‘untechnical’ or ‘old-fashioned’ and ‘current’. [...]. But in the other phrase, ‘the ordinary use of the expression “...”’, ‘ordinary’ is not in contrast with ‘esoteric’, ‘archaic’ or ‘specialist’, etc. It is in contrast with ‘non-stock’ or ‘non-standard’. [...]. If a term is a highly technical term, most people will not know its stock use or, a fortiori, any non-stock use of it either, if it has any. [...].
A philosopher who maintained that certain philosophical questions are questions about the ordinary or stock uses of certain expressions would not therefore be committing himself to the view that they are questions about the uses of ordinary or colloquial expressions. He could admit that the noun ‘infinitesimals’ is not on the lips of Everyman and still maintain that Berkeley was examining the ordinary or stock use of ‘infinitesimals’, namely the standard way, if not the only way, in which this word was employed by mathematical specialists. Berkeley was not examining the use of a colloquial word; he was examining the regular or standard use of a relatively esoteric word. We are not contradicting ourselves if we say that he was examining the ordinary use of an unordinary expression.13
It is fairly problematic how to draw a borderline between the functioning of “computable” or “calculable” as common adjectives and situations in which they belong to a very specialized mathematical vocabulary. On the other hand, Church examined the standard (stock) use of the term “computable function” in the language of informal mathematics.14 However, this does not mean that the ordinary (standard) use suffices for some special tasks. When it does not suffice, the standard use of an expression requires a move toward making it more precise, and its explication appears profitable or even necessary in some circumstances.15 We think that explications of the ordinary (standard) use of expressions consist in their normalization by exact conceptual means drawn from well-established theories. Normalizations in this sense usually cohere with definite intuitions and this protects them from being arbitrary, but the conditions for their correctness remain partially open. Now it should be clear why Carnap’s formulations of constraints 1.–4., employing phrases like “close similarity” or “as simple as possible”, are subject to various and mutually conflicting interpretations. Although Carnap himself was very optimistic about the effectivity of the method of explication, a modest position appears to be more justified.
13 To prevent possible misunderstandings, we note that we do not subscribe to Ryle’s metaphilosophy, according to which most philosophical questions concern the ordinary (standard) use of expressions belonging to ordinary (common) language.
14 In fact, all of Mendelson’s examples (see above), perhaps except Tarski’s truth-definition, are of the same character.
15 The reasons for that cannot be given in advance. Avoiding paradoxes, building theories, the need to be more precise, etc. belong to typical causes. Although predictions are difficult here, scientific (or even colloquial) practice has decisive significance. However, we disagree, even very strongly, with the following opinion (Kalmár [1959, p. 79]): “There are pre-mathematical concepts which must remain pre-mathematical ones, for they cannot permit any restriction imposed by an exact mathematical definition. Among these belong, I am convinced, such concepts as that of effective calculability, or of solvability [...].” On the contrary, we are convinced that ‘must’ is entirely unjustified in this context.
In general, intuitive properties, like computability, cannot exactly coincide with
precise ones, like recursiveness or its formal equivalents. And this must always be remembered when we adopt Carnapian requirements as proper for the practice of normalization.16 In the case of (CT), we should also note that it replaces a modal property of functions by one that is non-modal and is definable in arithmetic. This feature of (CT) appeals to an intuition that Peano arithmetic and set theory accurately reflect the possibilities of computation. We are inclined to consider eliminating modalities in favour of purely extensional concepts as a very typical aspect of normalizations.17 Understanding (CT) as a result of normalization explains why it is sometimes considered as problematic and why there is no hope for changing this situation.18
We now pass to the second issue concerning the status of (CT), namely its character as a sentence. The choice of a position from the variety (A)-(D) determines, to some extent, the direction of analysis of how to solve the second subproblem. If one selects the position (A) as proper, (CT) must be understood as a genuine empirical statement, subject to confirmation or disconfirmation by empirical data like any other sentence of this type. The choice of (B) qualifies (CT) as a formula expressing a mathematical theorem and the further course depends on how such entities are conceived (analytic, tautologous, synthetic a priori, empirical, etc.). Similarly, (C) is associated with a view concerning the status of definitions. The matter looks different for various kinds of definitions: reporting, designer, regulative, real or nominal. Since we adopted (D) as a legitimate position, we only very roughly outline possible ways related to (A)-(C). As far as (D) is concerned, we will defend the view that (CT) is an analytic sentence of a sort, as well as a priori. The proposed solution requires a closer treatment of analyticity and aprioricity (we follow Woleński [2004], although we restrict our considerations to mathematics).
16 Just this character of normalizations causes typical conditions of the correctness of definitions not to apply to the former. Although we can speak about definitions in a wider sense, which also covers normalizations, the differences here are essential.
17 This view is completely independent of well-known arguments pro and contra modal scepticism, inspired by Quine. See Shapiro [1993] for a discussion of this matter in the context of (CT).
18 Although we deal with normalizations in science, we can find examples of them in simple situations taken from the Everyman perspective. In fact, every regulative definition, for example, of “being an adult”, consists in a normalization.
We begin with the former category. The crucial conceptual point consists in introducing two distinctions
directed to the concept of analyticity. Firstly, we distinguish analytic sentences in an absolute and a relative sense, and both of these in a semantic and a syntactic understanding. Absolute analytic sentences in the semantic sense are those which are derivable from the empty set of sentences. By the completeness theorem, they can be identified with theorems (= validities) of first-order logic, that is, formulas true in all models. The absolute analytic sentences in the syntactic sense are those which are absolute in the semantic sense and belong to a set for which the decision problem has a positive solution (this category is not essential here). With logic (= the set of first-order logical truths) well defined, we say that A is relatively semantic analytic in a theory T if and only if it is true in all models of T; A is relatively syntactic analytic in T if and only if it is relatively semantic analytic in T and belongs to a decidable subset of T. Now we add analytic sentences in the pragmatic sense. Assume that T has standard as well as non-standard models. We say that A is an analytic sentence in the pragmatic sense in a theory T if and only if it is true in standard (intended) models of T. The label “in the pragmatic sense” stresses the fact that there are no purely general semantic criteria of standardness. If we qualify some models as standard, we appeal to certain intuitions; the adjective “intended” displays this situation well. Every analytic sentence in the pragmatic sense is relative, but we have inclusions inside the class of analyticities: absolute syntactic sentences are a subset of absolute semantic sentences; absolute semantic sentences are a subset of relative sentences, for any theory T (logic belongs to every set of sentences closed by logical consequence); absolute analytic sentences are a subset of analyticities in the pragmatic sense.
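The inclusions among the kinds of analyticity can be illustrated with a deliberately small propositional analogue. This is only a sketch under strong simplifying assumptions: valuations of two atoms play the role of models, a theory is a list of Python predicates, and the choice of “intended” models is stipulated by hand, which mirrors the remark that standardness has no purely semantic criterion. All names below are hypothetical and introduced for illustration.

```python
from itertools import product

ATOMS = ("p", "q")


def valuations():
    """Yield every assignment of truth values to the atoms (the toy 'models')."""
    for bits in product([False, True], repeat=len(ATOMS)):
        yield dict(zip(ATOMS, bits))


def models(theory):
    """Valuations satisfying every axiom of the theory."""
    return [v for v in valuations() if all(axiom(v) for axiom in theory)]


def absolutely_analytic(sentence):
    """True in all valuations (toy counterpart of a logical truth)."""
    return all(sentence(v) for v in valuations())


def relatively_analytic(sentence, theory):
    """True in all models of the theory."""
    return all(sentence(v) for v in models(theory))


def pragmatically_analytic(sentence, intended):
    """True in the models singled out as intended."""
    return all(sentence(v) for v in intended)


theory = [lambda v: v["p"] or v["q"]]  # T = {p or q}
# Stipulated choice of 'standard' models; an assumption of this sketch.
intended = [v for v in models(theory) if v["p"] and v["q"]]

tautology = lambda v: v["p"] or not v["p"]
consequence_of_t = lambda v: v["q"] or v["p"]
true_in_intended_only = lambda v: v["p"] and v["q"]

print(absolutely_analytic(tautology), relatively_analytic(consequence_of_t, theory),
      pragmatically_analytic(true_in_intended_only, intended))
```

In this toy setting each class contains the previous one: the tautology is analytic in all three senses, the consequence of T is relative but not absolute, and the last sentence is analytic only in the pragmatic sense.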
Truths associated with the standard/non-standard distinction appear as perhaps the clearest kind of analytic sentences in the pragmatic sense. However, we also have another source of pragmatic analyticity, namely definitions. If a definition functions in a theory, its role as a generator of analytic sentences in the pragmatic sense is obvious. Observe that every definition occurs in some context, for example, legal or even colloquial. If we define “being adult” as “being a person who has completed 18 years”, we introduce a pragmatic analytic sentence. The qualification “pragmatic” is justified by its relation to an intuition, for instance, that an adult should be conscious of his or
her actions and their results, and this expectation is fulfilled by the average person who has completed 18 years. Explications and normalizations in the above sense can be considered as a further source of pragmatic analyticity. This immediately leads us to (CT) and its status. Since (CT) occurs in the language of informal mathematics (more strictly: metamathematics), its context is fairly definite. We can say that (CT) functions in a conceptual scheme of mathematics and equates two concepts within this body. One notion, namely that of computable (calculable) function, is informal and intuitive, but the second, namely that of recursive function, is a component of the relatively semantic analyticities of arithmetic. Now (CT) as an equivalence normalizes the use (or meaning) of “computable function” in informal metamathematics by equating it with the use of “recursive function”. This move is governed by intuitions concerning the idea of computability and related to the identification in question. Although the resulting sentence, namely (CT), is not a tautology, it can be considered as an analytic sentence in the pragmatic sense. Such sentences are always surrounded by some empirical data, which motivate or strengthen the intuitions. In the case of (CT), the equivalence of many and various formal normalizations of the concept of computable function confirms our initial intuitions as sufficiently correct, but it does not exclude that they could be revised. However, as long as a given normalization is accepted, (CT) should be considered as unconditionally true. We stress this point because there are views denying that (CT) is a sentence with a definite truth-value (see Shapiro [1993] for a critical discussion of this position). Remembering that a sentence A is always true (false) in a language (better: in a theory or at least in a conceptual scheme) should be enough to admit that (CT) is true, if accepted, even as a pragmatic analyticity. Is (CT) true a priori or a posteriori?
The answer depends on how a priori (a posteriori) is understood. In the Kantian understanding, that is, when a priori truths are entirely independent of experience, pragmatic analyticities are synthetic a posteriori. However, this theory of the a priori is too absolute for our tasks. We need a more liberal view, just modeled on the use of “a priori” in probability theory. Recall that an a priori probability is one which is ascribed to an event before studying this event in order to establish whether and how it is probable in the light of collected data. A priori probability
can be established relative to previously accumulated experience, for instance, when one assumes that the distribution of a statistical property in a population P is normal, although no investigation of P was performed. This approach suggests a distinction between the absolute and the relative a priori. Without deciding whether the absolute a priori occurs (we are inclined to think that only logic can serve as an example here), we admit that analytic sentences in the pragmatic sense are at the same time relative apriorities. Looking at normalizations from this perspective suggests that they are also simultaneously processes of apriorization. In fact, every conceptual system requires some elements accepted a priori for its cognitive stabilization. Consider (CT) once again. Mathematicians have various intuitions associated with computability and want to use them in a precise way. However, the concept of computable function is not enough in this respect; for example, it does not suffice to obtain results about decidability. On the other hand, formulating these results solely with the apparatus of recursive function theory does not suit mathematical intuitions (we do not evaluate, we only note the situation as we see it). Thus, mathematicians need a principle, a rule, etc. which would improve the situation. Even if it is formulated after collecting empirical data, the result consists in apriorization leading to the acceptance of (CT) as operative before, that is, relatively a priori to, investigations concerning a definitely stated problem, for example, the decidability of predicate calculus. We stress that the presence of empirical data is not at odds with analyticity and apriority, provided that both are understood as relative.19
19 We claim that a similar analysis concerns other famous proposals in metamathematics, for example, Tarski’s definition of truth and logical consequence.
References
Carnap, R. [1952], The Logical Foundations of Probability, University of Chicago Press, Chicago.
Church, A. [1936], “An Unsolvable Problem of Elementary Number Theory”, American Journal of Mathematics 58, 345–363; repr. in [Davis 1965, pp. 89–107].
Davis, M. (ed.) [1965], The Undecidable: Basic Papers on Undecidable Propositions, Unsolvable Problems, and Computable Functions, Raven Press, Hewlett, N.Y.
DeLong, H. [1970], A Profile of Mathematical Logic, Addison-Wesley, Reading, Mass.
Gandy, R. [1988], “Confluence of Ideas in 1936”, in The Universal Turing Machine: A Half-Century Survey, (R. Herken ed.), Oxford University Press, New York, pp. 55–111.
Gödel, K. [1946], “Remarks Before the Princeton Bicentennial Conference on Problems in Mathematics”, first published in [Davis 1965, pp. 84–88]; repr. in K. Gödel, Collected Works, (S. Feferman, et al. eds.), v. II: Publications 1938–1974, Oxford University Press, New York, pp. 150–153.
Kalmár, L. [1959], “An Argument Against the Plausibility of Church’s Thesis”, in Constructivity in Mathematics, Proceedings of the Colloquium Held at Amsterdam 1957, (A. Heyting ed.), North-Holland Publ. Comp., Amsterdam, pp. 72–80.
Kleene, S.C. [1952], Introduction to Metamathematics, Noordhoff, Groningen.
Kleene, S.C. [1967], Mathematical Logic, John Wiley & Sons, New York.
Kleene, S.C. [1987], “Reflections on Church’s Thesis”, Notre Dame Journal of Formal Logic 28, 490–498.
Kreisel, G. [1970], “Church’s Thesis: a Kind of Reducibility Axiom for Constructive Mathematics”, in Intuitionism and Proof Theory, (J. Myhill, A. Kino, and R.E. Vesley eds.), North-Holland Publ. Comp., Amsterdam, pp. 121–150.
McCarty, C. [1987], “Intuitionism and Computability”, Notre Dame Journal of Formal Logic 28, 536–580.
Mendelson, E. [1964], Introduction to Mathematical Logic, Van Nostrand, New York.
Mendelson, E. [1990], “Second Thoughts about Church’s Thesis and Mathematical Proofs”, The Journal of Philosophy 87, 225–233.
Murawski, R. [1999], Recursive Functions and Metamathematics. Problems of Completeness and Decidability, Gödel’s Theorems, Kluwer Academic Publishers, Dordrecht.
Murawski, R. [2004], “Church’s Thesis and Its Epistemological Status”, Annales UMCS Informatica, A12, 57–70.
Post, E. [1965], “Absolutely Unsolvable Problems and Relatively Undecidable Propositions: Account of an Anticipation”, in The Undecidable: Basic Papers on Undecidable Propositions, Unsolvable Problems, and Computable Functions, (M. Davis ed.), Raven Press, Hewlett, N.Y., pp. 340–433; repr. in E. Post, Solvability, Provability, Definability: Collected Works, (M. Davis ed.), Birkhäuser, Basel 1994, pp. 375–441.
Rogers, H. [1967], Theory of Recursive Functions and Effective Computability, McGraw-Hill, New York.
Ryle, G. [1953], “Ordinary Language”, The Philosophical Review, LXII, 252–271; repr. in G. Ryle, Collected Essays 1929–1968, v. 2, Thoemmes, Bristol, pp. 301–319.
Schulz, K.-D. [1997], Die These von Church. Zur erkenntnistheoretischen und sprachphilosophischen Bedeutung der Rekursionstheorie, Peter Lang, Frankfurt am Main.
Shapiro, S. [1983], “Understanding Church’s Thesis”, Journal of Philosophical Logic 10, 353–365.
Shapiro, S. [1993], “Understanding Church’s Thesis, Again”, Acta Analytica, 59–77.
Smullyan, R. [1993], Recursion Theory for Metamathematicians, Oxford University Press, Oxford.
Webb, J. [1980], Mechanism, Mentalism, and Metamathematics, D. Reidel Publishing Company, Dordrecht.
Woleński, J. [2004], “Analytic vs. Synthetic and Apriori vs. A Posteriori”, in Handbook of Epistemology, (I. Niiniluoto, M. Sintonen, and J. Woleński eds.), Kluwer Academic Publishers, Dordrecht, pp. 781–839.
Jerzy Mycka∗
Analog Computation and Church’s Thesis

The main purpose of this paper is to present the fundamentals of analog computation by means of real recursive functions and to point out their connections with Church’s thesis. We recall some models of analog computation (including those that can perform Turing-uncomputable tasks). We then support the suggestion that the hypercomputational capabilities of such systems can be explained by the use of infinite limits. Finally, we present a short discussion of Church’s thesis and its modification in this context.
∗ J. Mycka, Institute of Mathematics, University of Maria Curie–Skłodowska, Lublin, Poland. This paper is based on the previously published articles [Mycka 2004], [Mycka in pr.] by the same author.

Preliminaries

Alan Turing clarified the notion of an algorithm, giving it a precise meaning, and introduced a coherent framework for discrete computation. Within a short time, new results relating his model to other approaches, such as recursive functions or Church’s λ-calculus (for information about this subject see [Odifreddi]), gave a consistent theoretical basis to standard computation theory. However, all these models use enumerable domains and treat the time of computation as discrete. Computers need not have such qualities. Analog computers, with continuous rather than discrete internal states, were invented and discussed quite thoroughly. Unfortunately, because analog computation lacked a coherent theoretical basis and analog computer technology did not improve in the second half of the last century, analog computation was close to being forgotten. In recent years, however, for many reasons (new paradigms of computation, the search for good tools for numerical analysis, new technologies), this situation seems to be changing.

The basic model in the field of computation with continuous time is Shannon’s General Purpose Analog Computer [Shannon]. It was defined as a mathematical model of an analog device, the Differential Analyzer, the fundamental principles of which were described by Lord Kelvin in 1876 [Thomson]. The Differential Analyzer was developed at MIT under the supervision of Vannevar Bush and was built in 1931. Its input was the rotation of one or more drive shafts and its output was the rotation of one or more output shafts. From the early 1940s, the differential analyzers at Manchester, Philadelphia, Boston, Oslo and Gothenburg, among others, were used to solve problems in engineering, atomic theory, astrophysics, and ballistics, until they were dismantled in the 1950s and 1960s following the advent of electronic analog computers and digital computers [Bowles], [Holst].

The General Purpose Analog Computer (GPAC) is a computer whose computation evolves in continuous time. The outputs are generated from the inputs by means of dependences defined by a finite directed graph (not necessarily acyclic) in which each node is one of the following boxes.
• Integrator: a two-input, one-output unit with a setting for the initial condition. If the inputs are unary functions $u$, $v$, then the output is the Riemann–Stieltjes integral $\lambda t.\ \int_{t_0}^{t} u(x)\,dv(x) + a$, where $a$ and $t_0$ are real constants defined by the initial settings of the integrator.
• Constant multiplier: a one-input, one-output unit associated to a real number. If $u$ is the input of a constant multiplier associated to the real number $k$, then the output is $ku$.
• Adder: a two-input, one-output unit. If $u$ and $v$ are the inputs, then the output is $u + v$.
• Multiplier: a two-input, one-output unit. If $u$ and $v$ are the inputs, then the output is $uv$.
• Constant function: a zero-input, one-output unit. The value of the output is always 1.
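To make the role of these units concrete, the following minimal sketch numerically emulates a small circuit built from two integrators and one constant multiplier wired in a feedback loop. The wiring, the function name `simulate_sine_circuit`, the step count and the use of a Runge-Kutta scheme are all assumptions made for illustration; a GPAC evolves in continuous time, so a discrete simulation like this one is only an approximation of its behavior.

```python
import math


def simulate_sine_circuit(t_end, steps=10_000):
    """Numerically emulate a two-integrator GPAC-style circuit.

    Wiring (an illustrative assumption, not taken from the text):
      integrator 1: y1 = integral of y2 dt,  initial setting y1(0) = 0
      constant multiplier (k = -1): w = -y1
      integrator 2: y2 = integral of w dt,   initial setting y2(0) = 1
    The exact outputs of this circuit are y1 = sin t and y2 = cos t.
    """
    def rates(y1, y2):
        # Signals fed into the two integrators at the current instant.
        return y2, -1.0 * y1

    h = t_end / steps
    y1, y2 = 0.0, 1.0
    for _ in range(steps):
        # Classical fourth-order Runge-Kutta step, a discrete stand-in
        # for the continuous-time evolution of the circuit.
        a1, b1 = rates(y1, y2)
        a2, b2 = rates(y1 + 0.5 * h * a1, y2 + 0.5 * h * b1)
        a3, b3 = rates(y1 + 0.5 * h * a2, y2 + 0.5 * h * b2)
        a4, b4 = rates(y1 + h * a3, y2 + h * b3)
        y1 += h * (a1 + 2 * a2 + 2 * a3 + a4) / 6.0
        y2 += h * (b1 + 2 * b2 + 2 * b3 + b4) / 6.0
    return y1, y2
```

Calling `simulate_sine_circuit(1.0)` returns a pair that can be compared with `math.sin(1.0)` and `math.cos(1.0)`. The cycle in the wiring is what the phrase "not necessarily acyclic" allows for.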
[Figure: Representations of different types of units in a GPAC. Panels: a real constant multiplier unit associated to the value $k$ (input $u$, output $ku$); an adder unit (inputs $u$, $v$; output $u + v$); a multiplier unit (inputs $u$, $v$; output $uv$); a constant function unit (output 1); an integrator unit (inputs $u$, $v$; output $\lambda t.\ \int_{t_0}^{t} u(x)\,dv(x)$).]
Although the above notion of the GPAC seems fairly intuitive and natural, the accepted definition is due to [Pour–El]. Let us now present a precise version of her definition for functions of one variable. In the following, $I$ denotes a closed bounded interval with non-empty interior.

Definition 1 (Pour–El) The unary function $y$ is generated by a GPAC on $I$ if there exist a set of unary functions $y_1, \ldots, y_n$ and a set of initial conditions $y_i(a) = y_i^*$, $i = 1, \ldots, n$, where $a \in I$, such that:
1. $\bar{y} = (y_1, \ldots, y_n)$ is the unique solution on $I$ of a system of ODEs of the form
$$A(x, \bar{y})\,\frac{d\bar{y}}{dx} = b(x, \bar{y}) \qquad (2)$$
satisfying the initial conditions, where $A(x, \bar{y})$ and $b(x, \bar{y})$ are $n \times n$ and $n \times 1$ matrices, respectively. Furthermore, each entry of $A$ and $b$ must be linear in $1, x, y_1, \ldots, y_n$.
2. For some $1 \le i \le n$, $y = y_i$ on $I$.
3. $(a, y_1^*, \ldots, y_n^*)$ has a domain of generation with respect to the above equation, i.e., there are closed intervals $J_0, J_1, \ldots, J_n$ (with non-empty interiors) such that $(a, y_1^*, \ldots, y_n^*)$ is an interior point of $J_0 \times J_1 \times \cdots \times J_n$ and, furthermore, whenever $(b, z_1^*, \ldots, z_n^*) \in J_0 \times J_1 \times \cdots \times J_n$, there exist unary functions
$z_1, \ldots, z_n$ such that:
(i) $z_i(b) = z_i^*$ for $i = 1, \ldots, n$;
(ii) $(z_1, \ldots, z_n)$ satisfy equation (2) on some interval $I^*$ with non-empty interior such that $b \in I^*$;
(iii) $(z_1, \ldots, z_n)$ is unique on $I^*$.

The existence of a domain of generation indicates that the solution of the above equation remains unique under sufficiently small changes of the initial conditions.

Let us recall that a function $f(x)$ is differentially algebraic (see [Rubel 1988]) if this function and its derivatives satisfy a polynomial equation $P(x, f(x), \ldots, f^{(k)}(x)) = 0$ for some nonzero polynomial $P$ with rational coefficients. A function of several variables is differentially algebraic if it is a differentially algebraic function of each variable when the others are fixed. With this definition, Pour–El shows the following result:

Proposition 2 (Pour–El) If $y$ is generable on $I$ by a GPAC, then there is a closed subinterval $I' \subseteq I$ with non-empty interior such that $y$ is differentially algebraic on $I'$.

Another important model of analog computation is Rubel’s Extended Analog Computer (EAC) [Rubel 1993]. This model is similar to the GPAC, but it allows other types of units and several independent variables, because Rubel does not seek any equivalence with existing models. The new units give the EAC additional computational power relative to the GPAC. However, Rubel stresses that the EAC is a conceptual computer and that it is not known whether it can be realized by actual physical, chemical or biological devices. It is not even known whether it can compute all analytic functions, in which case it would be too broad to be interesting as a model of computation.

The EAC works on a hierarchy of levels, getting more versatile as one goes to higher levels in the hierarchy. At the lowest level, 0, it produces and manipulates real polynomials of any finite number of real variables. The EAC has no inputs from outside, but it has a finite number of settings (arbitrary real numbers).
The settings determine the behavior of the machine from the first level through all subsequent levels. The EAC has outputs, which are required to be real analytic. The outputs at level $n - 1$ can be used as inputs at level $n$. The EAC satisfies the condition that when inputs at some
level are modified by small errors, the outputs differ from the original outputs only by a small amount on each compact set. At each level, the actual computing is done by boxes of different kinds. First, there are the constant boxes, which produce arbitrary real constants. There are projection boxes, which produce any of the variables $x_1, \ldots, x_n$. Next, there are adders, multipliers and substituters (for given $v, u_1, \ldots, u_k$ they give $v(u_1, \ldots, u_k)$ as the output). Moreover, there are inverters, which for given $f_1, \ldots, f_k : \mathbb{R}^{n+k} \to \mathbb{R}$ find real analytic functions $y_1, \ldots, y_k$ of $x_1, \ldots, x_n$ such that for all $1 \le i \le k$ we have $f_i(x_1, \ldots, x_n, y_1, \ldots, y_k) = 0$. There are differentiators, which for any function produce the desired mixed derivatives. Certain sets in Euclidean space are produced by the machine for a given function $f$, namely $\{(x_1, \ldots, x_n) : f(x_1, \ldots, x_n) > 0\}$ or $\{(x_1, \ldots, x_n) : f(x_1, \ldots, x_n) \ge 0\}$. Then there are the analytic continuation boxes, which produce a unique analytic continuation for a given function and domain. The quintessential box is the boundary-value problem box, which solves a finite set of partial and ordinary differential equations on a given set with certain bound and boundary values prescribed. For a given function $f$ and a set $\Omega$ with a given boundary we can also use the restricted limit box, which defines $\varphi(x) = \lim_{y \to x,\ y \in \Omega} f(y)$.

Using the results from [Rubel 1988], [Rubel 1993] we can formulate the following statement.

Proposition 3 (Rubel 1993) The set of GPAC-computable functions is a proper subset of the set of EAC-computable functions.

For example, the EAC can generate the Γ and ζ functions (no GPAC can generate them).

A new approach was given by C. Moore in 1996. In [Moore] he defined a set of (vector-valued) functions on the reals (called R-recursive functions) in a way analogous to the classical recursive functions on the natural numbers.
His model also has a continuous time of computation (continuous integration instead of discrete recursion). Moore’s seminal paper gave rise to further developments in real recursive function theory (in spite of some problems with his definition and results). In [Mycka & Costa] one can find a definition, derived from Moore’s original formulation, which was introduced to avoid the problems
involved in the latter. It is important to note that the following definition is based on vector operations.

Definition 4 (Mycka & Costa) The set of real recursive vectors is generated from the real recursive scalars $0$, $1$, $-1$ and the real recursive projections $I_n^i(x_1, \ldots, x_n) = x_i$, $1 \le i \le n$, $n > 0$, by the operators:
1. composition: if $f$ is a real recursive vector with $n$ $k$-ary components and $g$ is a real recursive vector with $k$ $m$-ary components, then the vector with $n$ $m$-ary components ($1 \le i \le n$)
$$\lambda x_1 \ldots x_m.\ f_i(g_1(x_1, \ldots, x_m), \ldots, g_k(x_1, \ldots, x_m))$$
is real recursive.
2. differential recursion: if $f$ is a real recursive vector with $n$ $k$-ary components and $g$ is a real recursive vector with $n$ $(k + n + 1)$-ary components, then the vector $h$ of $n$ $(k + 1)$-ary components which is the solution of the Cauchy problem, for $1 \le i \le n$,
$$h_i(x_1, \ldots, x_k, 0) = f_i(x_1, \ldots, x_k),$$
$$\partial_y h_i(x_1, \ldots, x_k, y) = g_i(x_1, \ldots, x_k, y, h_1(x_1, \ldots, x_k, y), \ldots, h_n(x_1, \ldots, x_k, y)),$$
is real recursive whenever $h$ is of class $C^1$ on the largest interval containing $0$ in which a unique solution exists.
3. infinite limits: if $f$ is a real recursive vector with $n$ $(k + 1)$-ary components, then the vectors $h$, $h'$, $h''$ with $n$ $k$-ary components ($1 \le i \le n$)
$$h_i(x_1, \ldots, x_k) = \lim_{y \to \infty} f_i(x_1, \ldots, x_k, y),$$
$$h'_i(x_1, \ldots, x_k) = \liminf_{y \to \infty} f_i(x_1, \ldots, x_k, y),$$
$$h''_i(x_1, \ldots, x_k) = \limsup_{y \to \infty} f_i(x_1, \ldots, x_k, y),$$
are real recursive in the domain containing those points where these limits exist for all $1 \le i \le n$.
4. Arbitrary real recursive vectors can be defined by assembling scalar real recursive components.
5. If $f$ is a real recursive vector, then each of its components is a real recursive scalar function.

From the physical point of view, with such a definition of differential recursion we use only a finite amount of energy. The possibility of operations on undefined functions is excluded here: our functions are strict in the sense that for undefined arguments they are also undefined. But to obtain some interesting functions we should increase the power of this system by the use of the infinite limit operators. Let us point out that introducing infinite limits brings discontinuous functions into the set of real recursive functions.
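The last remark can be illustrated numerically. The sketch below uses the family $f(x, y) = (1/(1 + x^2))^y$, which is continuous in $x$ for every fixed $y$ but whose pointwise limit as $y \to \infty$ equals 1 at $x = 0$ and 0 elsewhere. The choice of this family and the helper names are assumptions made for illustration only, and evaluating at one large finite $y$ is merely a stand-in for the limit operator, not an implementation of it.

```python
def f(x, y):
    """Continuous in x for each fixed y: f(x, y) = (1 / (1 + x^2))^y."""
    return (1.0 / (1.0 + x * x)) ** y


def limit_at_large_y(x, y=1e9):
    """Evaluate f at one large y as a finite stand-in for lim_{y -> infinity}.

    This is only a numerical illustration; a true limit is not computed.
    """
    return f(x, y)
```

For small $y$ the values `f(0.0, y)` and `f(0.01, y)` are close to each other, while `limit_at_large_y(0.0)` and `limit_at_large_y(0.01)` already differ by the full jump from 1 to 0 in floating-point arithmetic.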
Properties of Analog Computation by means of Real Recursive Functions

Now we can consider a new problem. Are there different levels of difficulty in a computation carried out in the analog way? A natural measure of the difficulty of a function can be linked with the degree of its (dis)continuity. The above considerations lead us to the concept of the η-hierarchy, which describes the level of nesting of limits in the definition of a given function. Syntactic descriptions of real recursive vectors are needed to formulate a precise definition. Symbols called basic descriptors are introduced for all basic real recursive functions. The combination of such descriptions for given real recursive functions forms a new description of another function. For basic functions we can propose: $i^j_k$ is a $k$-ary description for the projection $I^j_k$ for all $1 \le j \le k$; $1_k$, $\bar{1}_k$, $0_k$ are $k$-ary descriptions for the constants $1$, $-1$, $0$ used with $k$ variables. We must also add operator symbols (descriptors) for all introduced operators: dr for differential recursion, c for composition, and l, ls, li for the respective kinds of limits (lim, lim sup, lim inf).

Definition 5 (Mycka & Costa) The collection of descriptors of real recursive vectors is inductively defined as follows:
• $i^j_n$ is an $n$-ary description of $I^j_n$, $1 \le j \le n$, $n \in \mathbb{N}$; $1_n$ is an $n$-ary description of $f(x_1, \ldots, x_n) = 1$; $\bar{1}_n$ is an $n$-ary description of $f(x_1, \ldots, x_n) = -1$; $0_n$ is an $n$-ary description of $f(x_1, \ldots, x_n) = 0$; for all $(x_1, \ldots, x_n) \in \mathbb{R}^n$, $n \in \mathbb{N}$;
• if $\langle h \rangle = \langle h_1, \ldots, h_m \rangle$ is a $k$-ary description of the real recursive vector $h$ and $\langle g \rangle = \langle g_1, \ldots, g_k \rangle$ is an $n$-ary description of the real recursive vector $g$, then $c(\langle h \rangle, \langle g \rangle)$ is an $n$-ary description of the composition of $h$ and $g$;
• if $\langle h \rangle = \langle h_1, \ldots, h_n \rangle$ is a $k$-ary description of the real recursive vector $h$ and $\langle g \rangle = \langle g_1, \ldots, g_n \rangle$ is a $(k + n + 1)$-ary description of the real recursive vector $g$, then $dr(\langle h \rangle, \langle g \rangle)$ is a $(k + 1)$-ary description of the function defined by differential recursion;
• if $\langle h \rangle = \langle h_1, \ldots, h_m \rangle$ is an $(n + 1)$-ary description of the real recursive vector $h$, then $l(\langle h \rangle)$, $li(\langle h \rangle)$, $ls(\langle h \rangle)$ is an $n$-ary description of the appropriate infinite limit (respectively lim, lim inf, lim sup) of $h$;
• if $\langle f_1 \rangle, \ldots, \langle f_m \rangle$ are n-ary descriptions of real recursive k-ary scalar functions $f_1, \ldots, f_m$, then $v(\langle f_1 \rangle, \ldots, \langle f_m \rangle)$ is a k-ary description of the real recursive vector $f = (f_1, \ldots, f_m)$.

At this point we can define the η-number for a description of a real recursive function $f$.

Definition 6 (Mycka & Costa) For a given $n$-ary description $s$ of a vector function $f$ let $E_i^k(s)$ (the η-number with respect to the $i$-th variable of the $k$-th component) be defined as follows:
1. $E_i^1(0_n) = E_i^1(1_n) = E_i^1(\bar{1}_n) = 0$;
2. $E_i^m(c(\langle h \rangle, \langle g \rangle)) = \max_{1 \le j \le k}\left(E_j^m(\langle h \rangle) + E_i^j(\langle g_j \rangle)\right)$, where $h$ is an $n$-component $k$-ary vector and $g$ is a $k$-component $m$-ary vector;
3. for a differential recursion we distinguish two cases:
• $i \le k$: $E_i^j(dr(\langle f \rangle, \langle g \rangle)) = \max\left(E_i^1(\langle f_1 \rangle), \ldots, E_i^1(\langle f_n \rangle), E_i^1(\langle g_1 \rangle), \ldots, E_i^1(\langle g_n \rangle), E_{k+1}^1(\langle g_1 \rangle), \ldots, E_{k+1}^1(\langle g_n \rangle)\right)$
• $i = k + 1$: $E_i^j(dr(\langle f \rangle, \langle g \rangle)) = \max_{0 \le m \le n}\left(\max\left(E_{k+m+1}^1(\langle g_1 \rangle), \ldots, E_{k+m+1}^1(\langle g_n \rangle)\right)\right)$,
where $f$ is an $n$-component $k$-ary vector and $g$ is an $n$-component $(k + n + 1)$-ary vector;
4. $E_i^k(l(\langle h \rangle)) = E_i^k(li(\langle h \rangle)) = E_i^k(ls(\langle h \rangle)) = \max\left(E_i^k(\langle h \rangle), E_{n+1}^k(\langle h \rangle)\right) + 1$, where $h$ is a $k$-component $(n + 1)$-ary vector.

The main idea of the above definition is to count nested limits in descriptions. In point (3) we distinguish the case $i = k + 1$ (the differential recursion is taken with respect to this variable); in this case $\langle f \rangle$ is not relevant to the counting. For an $n$-ary description $\langle h \rangle$ of $m$ components we can now define $E(\langle h \rangle) = \max_k \max_i E_i^k(\langle h \rangle)$ for $1 \le i \le n$, $1 \le k \le m$. Now we can deal with the η-number of a real recursive function.

Definition 7 (Mycka & Costa) For a given real recursive function $f$, let $\eta(f)$ be defined as the minimum of $E(\langle f \rangle)$ over all possible descriptions of the function $f$.

We are ready to conclude with the definition of the η-hierarchy as the family $H_j = \{f : \eta(f) \le j\}$. It is convenient to think of the η-hierarchy as a measure of the difficulty of real recursive functions. If $f \in H_j$, then at most $j$ nested limits are used to define $f$. However, we can patch functions defined by infinite limits, so $j$ can be seen as the number of nested (non-parallel) η needed to patch the function $f$ to a total function. Hence, as another equivalent definition we can suggest the following: if $f$ is a real recursive function, then $E(f) = j$ if at most $j$ nested η operations are necessary to create $f_{total}$ such that $f_{total}$ is everywhere defined and, if $f(\bar{x}_0)$ is defined, then $f_{total}(\bar{x}_0) = f(\bar{x}_0)$.

Let us illustrate the above with some examples.

Example 8 The functions $+$, $\times$, $-$, $\exp$, $\sin$, $\cos$, $\lambda x.\frac{1}{x}$, $/$, $\ln$, $\lambda xy.x^y$ are real recursive functions from $H_0$. The Kronecker δ function, the signum function, absolute value, the Heaviside function Θ, the binary maximum max, the square-wave function s and the floor function are in $H_1$.
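The counting idea behind Definitions 6 and 7 can be sketched on descriptor trees. The code below is a deliberate simplification and not the definition itself: it ignores the per-variable and per-component bookkeeping of $E_i^k$ and simply adds one for every limit operator, adds the counts across a composition, and takes maxima elsewhere. The tuple encoding of descriptors, the function name `nested_limit_bound`, and the convention that projections count 0 are all assumptions made for this illustration. Under those assumptions the result is intended as a coarse upper bound on $E$ for the given description, not as the η-number of the function.

```python
LIMIT_OPS = ("l", "li", "ls")
BASIC_OPS = ("i", "0", "1", "-1")


def nested_limit_bound(d):
    """Coarse count of nested limits in a descriptor tree.

    Hypothetical encoding (not from the text):
      ("i", j, n), ("0", n), ("1", n), ("-1", n)  basic descriptors
      ("c", h, g)                                 composition of h with g
      ("dr", f, g)                                differential recursion
      ("l", h), ("li", h), ("ls", h)              infinite limits
      ("v", f1, ..., fm)                          assembling a vector
    """
    op = d[0]
    if op in BASIC_OPS:
        return 0
    if op == "c":
        # Limits used inside h are applied on top of those used inside g.
        return nested_limit_bound(d[1]) + nested_limit_bound(d[2])
    if op == "dr":
        return max(nested_limit_bound(d[1]), nested_limit_bound(d[2]))
    if op in LIMIT_OPS:
        return nested_limit_bound(d[1]) + 1
    if op == "v":
        return max(nested_limit_bound(c) for c in d[1:])
    raise ValueError(f"unknown descriptor {op!r}")
```

For instance, a description of the form `("l", ("v", ("i", 1, 2)))` yields 1, and composing two such limit-defined descriptions yields 2, which mirrors the way levels of the η-hierarchy grow with nesting.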
Let us give examples of some functions that are significant in mathematics and can be expressed in terms of real recursiveness.
Example 9 The Bessel functions of the first kind $J_v$ of (integer) order $v$ are real recursive functions of the class $H_0$. Euler’s Γ-function and the Riemann ζ-function are real recursive functions from the class $H_1$.

Let us comment briefly on Euler’s function: $\Gamma(x) = \int_0^{\infty} t^{x-1}\exp(-t)\,dt$. It is simple to observe that $\Gamma(x) = \lim_{s_0 \to \infty} \int_0^{s_0} s^{x-1}\exp(-s)\,ds$. Because $s^{x-1}\exp(-s)$ is a real recursive function and $\int_0^{s_0} s^{x-1}\exp(-s)\,ds$ is in $H_0$, Γ is in $H_1$.

Proposition 10 (Mycka & Costa) The class of real recursive functions is a proper superset of the class of GPAC-computable functions.

The above proposition follows from our results that Euler’s Γ function and the Riemann ζ function are real recursive, together with the result of Marion Pour–El [Pour–El] that these functions are not GPAC-computable.

Let us add that if $f$ is an $(n + 1)$-ary real recursive function, then its derivative $\partial_y f(x_1, \ldots, x_n, y) = \lim_{\omega \to \infty} \omega\left(f(x_1, \ldots, x_n, y + \frac{1}{\omega}) - f(x_1, \ldots, x_n, y)\right)$ is a real recursive function whenever such a limit exists. For example, if we take $\lambda y.1/y$, then $\lim_{\omega \to \infty} \omega\left(1/(y + \frac{1}{\omega}) - 1/y\right) = \lim_{\omega \to \infty} \omega\left(y - y - \frac{1}{\omega}\right)/\left[(y + \frac{1}{\omega})y\right] = -1/y^2$ is a real recursive function. Derivatives are physically realizable: the class of differentially algebraic functions is closed under derivatives, making a large class of derivatives physically realizable.

For a proper analysis of functions it is important to control their domains and singularities. We can postulate new operators which check whether given points are in the domain of a function or not.

Definition 11 (Mycka & Costa) For any function $f : \mathbb{R}^{n+1} \to \mathbb{R}$ let
$$\eta_y f(\bar{x}, y) = \begin{cases} 1 & \text{if } \lim_{y \to \infty} f(\bar{x}, y) \text{ exists}, \\ 0 & \text{otherwise}. \end{cases}$$
The operators $\eta^i_y$ and $\eta^s_y$ are defined by replacing lim by lim inf and lim sup, respectively, in the above equation.
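The two limit constructions discussed before Definition 11 (Γ as a limit of truncated integrals, and the derivative as a limit of difference quotients) can be checked numerically. The following is a minimal sketch under stated assumptions: the function names, the use of composite Simpson quadrature, the restriction to $x \ge 1$ and the finite values of $s_0$ and $\omega$ are choices made for illustration. A finite $s_0$ or $\omega$ only approximates the limit and says nothing about whether the limit exists, which is exactly the question the η operators are meant to answer.

```python
import math


def truncated_gamma(x, s0, n=100_000):
    """Simpson approximation of the integral of s^(x-1) e^(-s) over [0, s0].

    Restricted to x >= 1 so that the integrand is finite at s = 0.
    n is the (even) number of subintervals.
    """
    if x < 1.0:
        raise ValueError("this sketch only handles x >= 1")

    def g(s):
        return s ** (x - 1.0) * math.exp(-s)

    h = s0 / n
    total = g(0.0) + g(s0)
    for j in range(1, n):
        total += (4.0 if j % 2 else 2.0) * g(j * h)
    return total * h / 3.0


def difference_quotient(f, y, omega):
    """omega * (f(y + 1/omega) - f(y)); its limit for growing omega is f'(y)."""
    return omega * (f(y + 1.0 / omega) - f(y))
```

For $x = 3$ the truncated integrals increase with $s_0$ toward $\Gamma(3) = 2$, and `difference_quotient(lambda y: 1.0 / y, 2.0, 1e6)` is close to $-1/y^2 = -0.25$.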
Defined in this way, $\eta_y f(\bar{x}, y)$ is a characteristic function for the set of those $\bar{x}$ for which $\lim_{y \to \infty} f(\bar{x}, y)$ is well defined.¹
¹ Whenever we say that lim, lim sup, lim inf are defined, we mean that they belong to $\mathbb{R}$.
Analogously, $\eta^i_y f(\bar{x}, y)$,
$\eta^s_y f(\bar{x}, y)$ play the same role for $\liminf_{y \to \infty} f(\bar{x}, y)$ and $\limsup_{y \to \infty} f(\bar{x}, y)$, respectively. The question arises whether such operators are real recursive operators. If the answer is “yes”, we may patch any partial function defined in the above way to a total one. For example, let the function $f$ be total and let $F_{total}(\bar{x}) = \lim_{y \to \infty} (\eta_y f(\bar{x}, y)) f(\bar{x}, y)$, $F(\bar{x}) = \lim_{y \to \infty} f(\bar{x}, y)$. The function $F_{total}$ is total and has the property that if $F(\bar{x})$ is defined, then $F_{total}(\bar{x}) = F(\bar{x})$. For points which are not in the domain of $F$ we have $F_{total}(\bar{x}) = 0$.

Proposition 12 (Mycka & Costa) The functions $\eta_y g$, $\eta^i_y g$, $\eta^s_y g$ are total real recursive functions if $g$ is a total real recursive function.

Now we can turn to an application of the η operator. We consider the possibility of simulating Turing machines by real recursive functions. Let us consider a Turing machine given by the following description. It consists of an infinite tape for storing the input, output, and scratch working, and a finite set of internal states. All elements on the tape are strings. Without any loss of generality, we can choose some alphabet for these strings; the binary alphabet is a practical choice. The machine works in steps. In one step it scans the symbol at the current position of the tape (under the head of the machine), changes this symbol according to the current state of the machine, and moves the position of the tape to the left or right, with a change of state. Some states are distinguished as final; when the machine reaches one of them, it stops. Our Turing machine model obeys the classical constraints: (a) the input is finite and (b) the output is finite, regardless of the computation length.

Proposition 13 There are real recursive functions from the class $H_1$ which can simulate any Turing machine, i.e.
for a machine $M$ there exists a real recursive function $f_M : \mathbb{R}^2 \to \mathbb{R}^2$ such that, for the initial tape and state encoded into $(x, y)$, the $n$-th iteration $f_M^n(x, y)$ gives the codes of the tape and state after $n$ steps of the Turing machine.

It can be mentioned that the process of simulation is especially important for universal Turing machines. Results in this area proved in recent years (e.g. [Rogozhin]) give us interesting bounds on the size of such machines (for example, there exists a universal
Turing machine with 5 states and 5 symbols), which leads to a significant simplification of the constructed function. It is worth pointing out that $f_M$ can be analytic (see [Koiran & Moore]).

Let us point out a few important questions concerning Turing machines. The first problem is known as the halting problem: does the machine $M$ reach a final state for input $(x, y)$? There is no recursive characteristic function on the natural numbers for this problem. But for real recursive functions we have the following result.

Proposition 14 (Mycka & Costa) For any Turing machine $M$, there exists a real recursive function which is the characteristic function of the halting problem for $M$.

To obtain the function computed by $M$, it is enough to iterate the steps until the machine reaches a final state. If the machine $M$ ends in a final state for some tape $(x, y)$, then there exists $n_0 \in \mathbb{N}$ such that the sequence $f_M^n(x, y)$ is constant for $n \ge n_0$. Then $F_M$ is defined whenever the limit of this sequence exists and the Turing machine $M$ reaches a final state for the initial tape $(x, y)$; otherwise it is undefined.

Let us turn for a moment to the problems of computation beyond the power of Turing machines. The problem of infinity, which can appear as a consequence of non-terminating computations, introduced difficulties into computability theory and practice. The first step to improve this situation is to change the behavior of a Turing machine. For this purpose we may use an accelerated Turing machine (see [Copeland]). Its description is the same as that of a standard Turing machine, but a temporal pattern of steps is given: each subsequent step is performed in half the time of the step before. Such machines could complete an infinity of steps in only two time units.
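Two of the ideas above lend themselves to a small discrete illustration: the sequence of iterates becomes constant once a final state is reached, and the step durations of an accelerated machine sum to at most two time units. The sketch below is only an analogue under stated assumptions. It keeps configurations as Python tuples instead of encoding them into a pair of reals, so it is not the function $f_M$ of Proposition 13; the example machine, the names `step`, `iterate`, `accelerated_elapsed`, and the convention that the first step takes one time unit are all hypothetical.

```python
def step(config, delta, final_states):
    """One application of the step map; halted configurations are fixed points."""
    state, tape, head = config
    if state in final_states:
        return config
    symbol = tape.get(head, 0)  # blank squares read as 0
    new_state, write, move = delta[(state, symbol)]
    new_tape = dict(tape)
    new_tape[head] = write
    return (new_state, new_tape, head + move)


def iterate(config, delta, final_states, n):
    """Configuration after n steps (a discrete analogue of the n-th iterate)."""
    for _ in range(n):
        config = step(config, delta, final_states)
    return config


def accelerated_elapsed(n_steps):
    """Time used by n_steps steps when step j takes 2**(-j) time units."""
    return sum(2.0 ** (-j) for j in range(n_steps))


# Hypothetical example machine: walk right over 1s, write 1 on the first 0, halt.
DELTA = {("q0", 1): ("q0", 1, +1), ("q0", 0): ("halt", 1, 0)}
FINAL = {"halt"}
START = ("q0", {0: 1, 1: 1, 2: 1}, 0)
```

For this machine `iterate(START, DELTA, FINAL, n)` stops changing from `n = 4` onward, while `accelerated_elapsed(n)` stays below 2 for every finite `n`. A machine that never halts would produce a sequence of iterates that never stabilizes, and no finite number of calls to `iterate` could establish that.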
This feature of accelerated Turing machines gives us the power to solve the halting problem by programming the following algorithm: mark the first square of the tape with 0 and change it to 1 only in the final (last) step; if after 2 time units we have 0 in the distinguished square, then the machine does not halt, otherwise it halts. However, some difficulties arise in this model as well. Let us imagine a machine changing the value of one square from 1 to 0 and back again in all steps, using only one non-final internal state. It is unclear what is on the tape after all steps (at infinity), because in this case the computation diverges. The accelerated Turing machine can be simulated in the same way as the standard Turing machine with only one modification: in the definition of $F_M(x, y)$ it is not necessary to
have the result $(z_x, z_y)$ with a final state $i$ written in $z_x$. Hence, the convergent infinite computations and the finite computations both give the correct result, whereas the divergent computations have an undefined result. As we can observe, real recursive functions are quite a powerful tool for describing this problem of infinite computation. The above remarks show that the η operator adds power to standard models of computation by controlling the domain of computable functions and machines. This possibility is an effect of checking an infinite number of computation elements in a finite amount of time.

At this point we can generalize our considerations from the notion of a Turing machine to a wider characterization of computability. We will proceed with relations on natural numbers taken from the arithmetical hierarchy. The class $\Sigma^0_0 = \Pi^0_0$ contains only those relations which have recursive characteristic functions, i.e. which can be computed by Turing machines. The upper stages of this hierarchy can be constructed from the lower ones in the following way:
$$\Sigma^0_{n+1} = \{P : (\exists P' \in \Pi^0_n)\ P(\bar{m}) \equiv \exists s\, P'(\bar{m}, s)\},$$
$$\Pi^0_{n+1} = \{P : (\exists P' \in \Sigma^0_n)\ P(\bar{m}) \equiv \forall s\, P'(\bar{m}, s)\},$$
where $P \subseteq \mathbb{N}^k$, $P' \subseteq \mathbb{N}^{k+1}$, $k \ge 1$. To complete our hierarchies we can add the equation $\Delta^0_n = \Sigma^0_n \cap \Pi^0_n$, $n \ge 0$.

The importance of the arithmetical hierarchy is connected with many fields. It can be viewed as a kind of formal description of definability (see [Odifreddi]). Its classes can be used to classify the complexity of mathematical notions (e.g. the definition of the limit of a sequence is of class $\Pi^0_3$). From the point of view of computability theory we can see the arithmetical hierarchy as the levels of functions on natural numbers (given by their graphs) which differ in the number of infinite “while” loops necessary for their computation. Linguistic problems of computer science can also be expressed in terms of this hierarchy. The best-known example is that of the classes of recursive ($\Sigma^0_0$) and recursively enumerable ($\Sigma^0_1$) languages. In some sense we can also see this hierarchy as a measure of the noncomputability of functions (or undecidability of relations). In particular, the halting problem is known to be $\Sigma^0_1$. We are interested in the correlation of this infinite hierarchy of sets and relations with the η-hierarchy. Going to infinity with $n$ for the
function $f_M^n$ and using the fact that all recursive sets and relations have Turing computable total characteristic functions, we get the following conclusion.
Corollary 15 (Mycka & Costa) Every recursive set or relation (with arguments from $\mathbb{N}$) is in $H_2$, i.e. $\Sigma^0_0 = \Pi^0_0 \subset H_2$.

For the rest of the arithmetical hierarchy we have the following result.

Proposition 16 (Mycka & Costa) The sets and relations from $\Sigma^0_i$, $\Pi^0_i$ belong to $H_{i+2}$ for $i \ge 0$.

This fact does not mean that a better real recursive bound for the arithmetical hierarchy does not exist.

Let us analyze one aspect of the analytical hierarchy. This hierarchy strongly exceeds the arithmetical one. We deal with the especially important class $\Pi^1_1$. Let us mention that $\Pi^1_1$ sets can be uniformized by sets of the same class. Moreover, for any $n \in \mathbb{N}$ the classes $\Sigma^0_n$ and $\Pi^0_n$ are subsets of $\Pi^1_1$. The class of $\Pi^1_1$ relations is defined by a function quantifier applied to an arithmetical relation: $R \subseteq \mathbb{N}^{k+1}$ is $\Pi^1_1$ if $R(\bar{x}, y) \equiv (\forall f : \mathbb{N} \to \mathbb{N})\, Q(\bar{x}, f(y))$, where $Q$ is from some level of the arithmetical hierarchy.

Proposition 17 Any relation $R \in \Pi^1_1$ is in $H_6$.
Analog Computation versus Turing Machines

Now we introduce a simple concept by analogy with the recursive functions of Kleene: whenever a function is defined only with composition and differential recursion ($f \in H_0$), we call $f$ a primitive real recursive function (there is a slight difference, since classical primitive recursive functions are always total whereas primitive real recursive functions can be partial).

Proposition 18 (Mycka & Costa) Every primitive real recursive function $f$ defined on a closed domain $D \subseteq \mathbb{R}^n$ is GPAC-computable.

However, let us point out that there are functions (like $\lambda x.|x|$ on the interval $[-1, 1]$) which are bounded together with their derivatives
but they, or some of their derivatives, are not continuous (and they are not primitive real recursive). We can observe that the model of analog computation given by real recursive functions includes the GPAC-computable functions.

Proposition 19 (Mycka & Costa) Every GPAC-computable function with real recursive numbers as parameters is a real recursive function.

Now let us take the first step in analyzing when functions generated by analog computation are beyond the Turing limits of computation. Of course, real computable functions described in the previous section are under this limit. But what is the situation of GPAC-computable functions? This question is all the more important because, as we can observe from the above propositions, GPAC-computable functions form the basic level of analog computation, which is a common part of the EAC-computable functions and the real recursive functions. The answer can be found in Rubel’s paper [Rubel 1989], where the following theorem (in a slightly different formulation) is proved.

Proposition 20 (Rubel 1989) For any GPAC with a locally unique solution which is analytic on an open interval $I$ containing 0, with an analytic output $f$, if all the constants are rational numbers, then there is an open subinterval $J \subset I$ with $0 \in J$ on which $f$ is digitally computable.

In [Rubel 1989] the phrase “$f$ is digitally computable” means that there is a uniform algorithm for producing polynomials $p_n$ with rational coefficients such that $|p_n(x) - f(x)| < \frac{1}{n}$ for all $x \in J$.

Let us return to primitive real recursive functions. The basic functions (constants and projections) can be approximated by Turing machines. Of course, the composition of two approximable functions is also approximable by Turing machines. Hence, the only non-obvious case is definition by differential recursion. However, the integration used by this operator can be approximated by some kind of numerical integration with control of precision, which gives us the desired solution.
So we have the following corollary.

Corollary 21 Every primitive real recursive function can be approximated by Turing machines.
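The argument for differential recursion can be sketched as a digital procedure: integrate numerically and refine the step size until the answer stabilizes. The code below is an illustration under explicit assumptions and not the construction behind the corollary. It treats only the scalar case $h(0) = h_0$, $\partial_y h = g(y, h)$, uses floating-point arithmetic rather than exact rational arithmetic, and its stopping rule (two successive refinements agreeing to within `eps`) is a heuristic rather than a rigorous error bound. The names `rk4` and `approximate` are hypothetical.

```python
import math


def rk4(g, h0, y_end, steps):
    """Fourth-order Runge-Kutta for dh/dy = g(y, h), h(0) = h0, on [0, y_end]."""
    dy = y_end / steps
    y, h = 0.0, h0
    for _ in range(steps):
        k1 = g(y, h)
        k2 = g(y + 0.5 * dy, h + 0.5 * dy * k1)
        k3 = g(y + 0.5 * dy, h + 0.5 * dy * k2)
        k4 = g(y + dy, h + dy * k3)
        h += dy * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
        y += dy
    return h


def approximate(g, h0, y_end, eps, max_steps=1 << 20):
    """Halve the step size until two successive answers differ by less than eps.

    The stopping rule is a heuristic, not a proof of accuracy.
    """
    steps = 8
    previous = rk4(g, h0, y_end, steps)
    while steps < max_steps:
        steps *= 2
        current = rk4(g, h0, y_end, steps)
        if abs(current - previous) < eps:
            return current
        previous = current
    raise RuntimeError("requested precision not reached")
```

With $g(y, h) = h$ and $h_0 = 1$ the differential recursion defines $\exp$, so `approximate(lambda y, h: h, 1.0, 1.0, 1e-10)` can be compared with `math.e`.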
The above results teach us that analog computation in some cases does not increase the computational power beyond the Turing limit. It is not sufficient (as is sometimes suggested) to use real numbers in order to cross the restrictions of Turing computability. Of course, a precise comparison of Turing and analog computability depends on a detailed analysis of all operations allowed in the discussed models.

Now, let us try to enumerate some examples of hypercomputational properties of analog computation and to find their common elements. Moore in his paper [Moore] proves that the halting problem is solvable by R-recursive functions if we allow the use of the zero-finding µ-operator. However, we can avoid the use of this operator, which is somewhat unnatural in the analog world of mathematical analysis. Infinite limits seem to be a good alternative.

Proposition 22 (Mycka 2003) If $f(\bar{x}, y) : \mathbb{R}^{n+1} \to \mathbb{R}$ is a real recursive function, then the function $g : \mathbb{R}^n \to \mathbb{R}$, $g(\bar{x}) = \mu_y f(\bar{x}, y)$, is real recursive too.

Hence, we obtain the corollary that the halting problem can be solved by real recursive functions when we use infinite limits in the solution. The same fact, by a different method, is given in our Proposition 14.

Let us briefly discuss another analog device. It is an open problem whether the EAC can be simulated by Turing machines (in general, by digital computers). However, it is a known fact that the EAC is stronger than the GPAC and that it can compute some difficult functions (e.g. the Γ-function and the ζ-function). We can observe that these properties are connected with infinite limits: the proofs of EAC-computability for such functions use this operation. This suggests that the power of the EAC is strongly connected with the limits incorporated in its structure.

In the classical framework we can create uncomputable functions (relations) by extending the set of Turing computable functions to the whole arithmetical hierarchy.
As a consequence of the definition of this hierarchy we find unrestricted quantifiers as tools leading toward noncomputability. Hence, we need to analyze the method of quantifiers usage.
For every real recursive function $f : \mathbb{R}^{n+1} \to \mathbb{R}$ we can construct a real recursive function $\rho_f : \mathbb{R}^{n} \to \mathbb{R}$ such that
\[
\rho_f(\bar{x}) =
\begin{cases}
1 & \exists y \in \mathbb{N}\ f(\bar{x}, y) = 0, \\
0 & \forall y \in \mathbb{N}\ f(\bar{x}, y) \neq 0.
\end{cases}
\]
To this end we start with the function $f_c(\bar{x}, y) = 1 - \delta(f(\bar{x}, y))$, which has the following property:
\[
f_c(\bar{x}, y) = 1 \equiv f(\bar{x}, y) \neq 0, \qquad f_c(\bar{x}, y) = 0 \equiv f(\bar{x}, y) = 0.
\]
It is easy to observe that now
\[
\lim_{z \to \infty} \prod_{j=0}^{\lfloor z \rfloor} f_c(\bar{x}, j) =
\begin{cases}
0 & \exists y \in \mathbb{N}\ f(\bar{x}, y) = 0, \\
1 & \forall y \in \mathbb{N}\ f(\bar{x}, y) \neq 0.
\end{cases}
\]
Hence $\rho_f(\bar{x}) = 1 - \lim_{z \to \infty} \prod_{j=0}^{\lfloor z \rfloor} f_c(\bar{x}, j)$.
We should indicate two points. First, real recursive functions are closed under the product operation. The product can be defined as an iteration of the function $t_f : \mathbb{R}^{n+2} \to \mathbb{R}^{n+2}$, $t_f(\bar{x}, y, i) = (\bar{x}, y f(\bar{x}, i), i + 1)$, because $\prod_{i=0}^{n} f(\bar{x}, i) = I_3^2(t_f^{n+1}(\bar{x}, 1, 0))$. Second, let us determine the stage of the η-hierarchy that contains $\rho_f$ when $f \in H_i$, where $i \in \mathbb{N}$ is a given number. The function $f_c$ is in $H_{i+1}$ and consequently, by the properties of iteration, $\prod_{j=0}^{n} f_c(\bar{x}, j) \in H_{i+1}$. Finally, we can claim that $\rho_f \in H_{i+2}$. The same method is applicable to the universal quantifier.
In this way we can, once more, conclude that the uncomputable character of natural functions and relations given by quantifiers appears in a context that can be explained by real recursive functions with infinite limits.
In the light of the above considerations, hypercomputation for analog devices has a strong connection with infinite limits. The use of infinite limits makes it possible to analyze an infinite number of cases with a finite amount of resources (time, space, energy). Therefore, at this stage of our work we can present an uncontroversial thesis: with infinite limits, some hypercomputational properties are added to analog models of computation.
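The role of the limit in the construction of $\rho_f$ can be made concrete with a small sketch. It is only an illustration under stated assumptions: it works over integers instead of reals, it evaluates finite stages of the product rather than the limit itself, and the names `delta`, `f_c`, `rho_truncated`, `square_minus` and `never_zero` are hypothetical helpers introduced here, not part of the text.

```python
def delta(v):
    """Kronecker-style delta: 1 at zero, 0 everywhere else."""
    return 1 if v == 0 else 0


def f_c(f, x, y):
    """1 - delta(f(x, y)): equals 0 exactly where f(x, y) is zero."""
    return 1 - delta(f(x, y))


def rho_truncated(f, x, z):
    """Finite stage 1 - prod_{j=0}^{z} f_c(x, j) of the limit defining rho_f."""
    product = 1
    for j in range(z + 1):
        product *= f_c(f, x, j)
    return 1 - product


# Hypothetical test functions (assumptions for illustration only).
def square_minus(x, y):
    return y * y - x      # zero at y = sqrt(x) when x is a perfect square


def never_zero(x, y):
    return y * y + 1      # no zero for any natural y
```

For `square_minus` with x = 49 the stage value is 0 up to z = 6 and becomes 1 from z = 7 onward, so the existential case is confirmed after finitely many steps. For `never_zero` every finite stage returns 0, yet no finite stage certifies that the limit is 0; this is exactly the part of the work that the infinite limit is assumed to perform.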
Church’s Thesis and Analog Computation
The results concerning different computability models led to the formulation of Church’s thesis (compare [Copeland]). In this paper we understand this thesis in the way presented below.
Our informal notion of effective computation corresponds to the precise mathematical definition of Turing machines.
The conditions usually associated with effectiveness are the following: a finite number of instructions (rules), represented by finite strings; an action that (in the proper case) is completed in a finite number of steps; and the possibility (in general) of a human carrying out the algorithm without using intuition, creativity, or direct insight into the essence. Let us pay attention to what Church’s thesis does not say: it does not claim that other types of computing are impossible. It only points out that calculations that are effective are related to the above-mentioned models.
Let us notice that the whole process of computing described by Turing can easily be imagined as the work of a man who, by means of a sheet of paper and a pencil, carries out (thoughtlessly) consecutive changes of symbols according to strict rules. R. Soare [Soare] stressed this character of Turing’s computability, calling the computing subject a “computor” to emphasize the idealization of human activity that was used here. An analysis of the capabilities of machines (physical systems) does not seem to have been Turing’s aim. In agreement with the previous remarks, the construction of Turing machines captures rather the capabilities of “an ideal mathematician”.
The above considerations lead us to the conclusion that Church’s thesis does not have any special connection with analog computation. This is a result of the meaning of effectiveness that is used in the typical formulation of Church’s thesis and that is not satisfied by analog devices. However, the current situation of computability theory and practice seems to suggest that a somewhat similar thesis should be considered. Namely, physical devices are closer to our idea of computation than any kind of human activity. With respect to this observation, a modification of Church’s thesis (called the physical Church’s thesis; compare [Odifreddi]) can be proposed.
Everything that can be computed by any physical process can be calculated by a Turing machine as well.
Let us notice that, applying the rule of contraposition to this statement, we obtain an equivalent formulation: a problem that cannot be solved by a Turing machine is not computable by means of any physical process.
From the previous results it seems clear that rejecting the above statement requires showing how some kind of infinite limit can be implemented or observed in physical reality. Hence, for establishing the boundaries of practically realizable models of computability, the nature of the material world becomes essential. It is important, however, to be aware of the obvious fact that we do not possess direct knowledge of this aspect of the Universe. That is why an analysis of its features and limits always takes place by means of physical theories. These theories are the only way to grasp the quantitative relations that occur in the physical world. We therefore face the next boundary of our analysis: we cannot discuss the ultimate boundaries of computability, only the limits of computability that result from a physical theory regarded as given.
It might seem that such a far-reaching claim, namely the postulate of realizing infinity (of energy, time) in a finite sector of physical reality, is unacceptable in any physical theory. To weaken this slightly, one might restrict the considerations at least to commonly accepted (not particularly exotic) physical theories. It turns out, however, that these assumptions are not true. Two examples of physical theories allowing hypercomputability will be presented at this stage.
The first is Newtonian mechanics. In the 19th century P. Painlevé, together with H. Poincaré, proposed a particular analysis of a question in celestial mechanics, the n-body problem. In the n-body problem one seeks a solution of the system of equations of motion for n gravitationally interacting bodies. Painlevé and Poincaré opened a discussion not about how to find a particular solution, but about the qualitative properties of such solutions. A crucial question is whether there exist solutions that contain a singularity, that is, solutions in which some quantity in the equations takes infinite (undefined) values. Obviously, this happens when two of the bodies described by the problem collide. Yet the question arises whether a singularity may appear without any collision. The answer was given by Z. Xia [Xia 1992], who showed that for the five-body problem in three-dimensional space there exist solutions with non-collision singularities. In Xia’s solution one of the bodies is thrown to infinity in finite time. As can be seen, Newtonian mechanics allows finite realizations of infinity and potentially supports the possibility of calculations exceeding the limits of the Turing machine.
An obvious next aim is to relate similar considerations to the physical theories that are regarded as currently valid. For this purpose we will use the theory of general relativity. There are solutions of Einstein’s equations in which there exist a time-like half-curve γ and a point p in spacetime such that the whole of γ is contained in the chronological past of p. Such spacetime structures (e.g. anti-de Sitter spacetimes) have been examined by physicists, who have pointed out possible material systems that fulfill the required properties (compare [Etesi]). Moreover, the use of such systems to build computing systems has been proposed (see [Shagrir & Pitowsky]). In sum, the second of the analyzed theories, the one regarded as valid today, allows an infinite number of operations to be carried out within a finite time as measured by a suitably chosen observer.
The above results do not entitle us to accept the thesis that hypercomputability is possible in our world. They show, however, that the possibility of crossing the Turing machine barrier is, in the light of some physical theories, real. As we can observe, a new cognitive situation arises. The boundaries of computability become valid only relative to a stated physical theory. Moreover, they acquire a provisional and temporary character: when the physical theory regarded as the proper description of the Universe changes, the boundaries of computability may change as well. The theory of computability thus gains an additional relative character, this time in relation to the physical theory assumed as the starting point for the construction of computing systems. It also seems natural to devote more research to the physical Church’s thesis given above (or to similar ones), which can be viewed as the main conjecture connecting the world of physics with computability theory. In this investigation the role of analog computation, and the place of infinite limits within it, may prove significant.
References
Bowles, M.D. [1996], “U.S. Technological Enthusiasm and British Technological Skepticism in the Age of the Analog Brain”, IEEE Annals of the History of Computing 18(4), 5–15.
Copeland, B.J. [1996], The Church–Turing Thesis, in Stanford Encyclopedia of Philosophy (J. Perry and E. Zalta eds.).
Etesi, G. and Németi, I. [2002], “Non-Turing computations via Malament–Hogarth space-times”, Int. J. Theor. Phys. 41, 341–370.
Holst, P.A. [1996], “Svein Rosseland and the Oslo Analyser”, IEEE Annals of the History of Computing 18(4), 16–26.
Thomson, W. (Lord Kelvin) [1876], “On an Instrument for Calculating the Integral of the Product of Two Given Functions”, Proc. Royal Society of London 24, 266–268.
Koiran, P. and Moore, C. [1999], “Closed-Form Analytic Maps in One or Two Dimensions can Simulate Turing Machines”, Theoretical Computer Science 210, 217–223.
Moore, C. [1996], “Recursion Theory on the Reals and Continuous-Time Computation”, Theoretical Computer Science 162, 23–44.
Mycka, J. [2003], “µ-Recursion and Infinite Limits”, Theoretical Computer Science 302, 123–133.
Mycka, J. [2004], “Empirical Aspects of Computability Theory”, Studies in Grammar, Logic and Rhetoric 7(20), 57–67.
Mycka, J. [in print], “Analog Computation beyond the Turing Limit”, Journal of Applied Mathematics and Computation.
Mycka, J. and Costa, J.F. [2004], “Real Recursive Functions and their Hierarchy”, Journal of Complexity 20, 835–857.
Odifreddi, P. [1989], Classical Recursion Theory, North-Holland.
Pour-El, M.B. [1974], “Abstract Computability and Its Application to General Purpose Analog Computer”, Transactions Amer. Math. Soc. 199, 1–28.
Rogozhin, Y. [1996], “Small Universal Turing Machines. Universal Machines and Computations”, Theoretical Computer Science 168(2), 215–240.
Rubel, L.A. [1988], “Some Mathematical Limitations of the General-Purpose Analog Computer”, Advances in Applied Mathematics 9, 22–34.
Rubel, L.A. [1989], “Digital Simulation of Analog Computation and Church’s Thesis”, Journal of Symbolic Logic 54(3), 1011–1017.
Rubel, L.A. [1993], “The Extended Analog Computer”, Advances in Applied Mathematics 14, 39–50.
Shagrir, O. and Pitowsky, I. [2003], “Physical Hypercomputation and the Church–Turing Thesis”, Minds and Machines 13, 87–101.
Shannon, C. [1941], “Mathematical Theory of the Differential Analyzer”, J. Math. Phys. MIT 20, 337–354.
Soare, R. [1999], The History and Concept of Computability, in (E. Griffor ed.), Handbook of Computability Theory, Elsevier.
Xia, Z. [1992], “The Existence of Noncollision Singularities in Newtonian systems”, The Annals of Mathematics 135(3), 411–468.
Piergiorgio Odifreddi∗
Kreisel’s Church
Church’s thesis has, within logic, a similar function to dogmas and doctrines within the Church. The faithful get excited at the cost of being ridiculous to outsiders. [G. Kreisel 24.xii.92]
In Classical Recursion Theory [Odifreddi 1989] I dedicated 20 pages to a discussion of Church’s Thesis. It was Kreisel who alerted me to the subtleties of the subject, and to the insufficient treatments available in print. Suffering from a seemingly widespread illness, I understood only part of what he said or wrote; and even what (I thought) I understood, I used for my own purposes.1 However, his name in boldface in a number of places in that discussion may have created the illusion that I was reporting on his views, making him uncomfortable and others confused. I intend to attempt such a report here, letting Kreisel’s original words and formulations speak for themselves as much as possible; in the hope not of satisfying him (an obviously impossible task), but of dispelling such confusion. Since Kreisel’s range of interests (even in the limited area of concern to this article) is quite wide, the reader is advised to browse among the various topics, looking for ones of his or her own interest.
∗ P.G. Odifreddi, Dipartimento di Matematica, Università di Torino.
1 As a related example of how things may come to be used in a (purposely) distorted way, I quoted on p. 2 of my book a parallel that Kreisel had made in §4 of [1985], between Euclid’s Book X (which classified irrationals by means of a notion of degree) and the theory of Turing degrees. His purpose was to draw attention to the fact that the shift from the classification of Book X to the measures of irrationality in modern number theory (based on diophantine approximations) required an absolute level of imagination, while nothing in (classical) Recursion Theory “approaches the philosophical detachment from the original set-up that was so essential for progress in the parallel from number theory”. My purpose was to claim that Recursion Theory is part of classical mathematics, and I thus presented the parallel as an item of the subject’s pedigree, claiming that “degrees were used for the purpose of a classification of reals already in Euclid’s Book X”. Kreisel has whipped me more than once for “squandering a cute quotation”.
1. General Remarks
Observations about general aspects of Church’s Thesis are scattered in Kreisel’s papers and reviews, especially in:
• Analysis of the general concept of computation (1.(c) and 4.(c) of [1971]),
• Principal distinctions (II.(a) of [1972]),
• Church’s Thesis and the ideal of Informal Rigour ([1987]).
The following unstructured selection somehow reflects the occasional character of those observations.
1.1. Informal Rigour
Kreisel states in [1987] that Church’s Thesis is a candidate for informal rigour, “a venerable [2000-year-old] ideal in the broad tradition of analysing precisely common notions or, as one sometimes says, notions implicit in common reasoning [at a given time]”. He repeatedly warns us, at the end of [1987], about the obvious risks involved in any such enterprise; specifically:
• First, “there is no end in sight to the possibilities of coherent and imaginative analysis in the tradition of informal rigour”. What is in doubt is the adequacy of the common notions to be analysed, not only to the phenomena for which they are intended, but even to our practical knowledge of them.
• Second, “even if the contributions [to informal rigour in general, and Church’s Thesis in particular] were more central than they are, the market would be limited by the background knowledge needed for more than an illusion of understanding. It is a hallmark of philosophical questions that they present themselves to those of us who do not have such knowledge (and even as not requiring any)”.
It is then no surprise that Kreisel sees work about Church’s Thesis, and more generally informal rigour, as an answer to the question: How to talk in the face of ignorance? Specifically, an answer taught by philosophy to those not satisfied with the easiest answer: stay silent.
Be that as it may, the common notion to be analysed here is of course effective computability, and work around it is a candidate not only for the pursuit of the ideal of informal rigour, but also for the examination of the pursuit itself. In other words, not only to show that informal rigour can be achieved, but also to discover if and how it can be used when achieved: and “it’s a sight more difficult to find any use for (the truth of) such a thesis than to decide its truth”.
§2 of [1987] reminds us that there are two opposite principles in the foundational literature: on the one hand, “the words ‘essence of computation’ are a directive to look for one variant or, equivalently, to what is common to all [of them]”; on the other hand, “the familiar homily ‘it all depends’ (on situation, purposes, etc.) suggests the need for an endless array of variants”. Kreisel’s position, supported by experience in mathematics with relatively few so-called basic structures, is to look for “relatively few variants [that] could be adequate for relatively many situations”.
1.2. Variants of Church’s Thesis
Kreisel proposed in 2.7 of [1965] to consider the variants of Church’s Thesis obtained by specifying “effectively calculable” as: mechanically, constructively, humanly, and physically realizable. As noted in 2.(c).(iii).(β) of [1966], “at the time [of Church], one would have been prepared to regard effective, intuitionistic, constructive, mechanical, formal as equivalent when applied to rules! After all, less than ten years before Church formulated his thesis, von Neumann and Herbrand took it for granted that finitist and intuitionistic had the same meaning! But, perhaps, it would be historically more correct not to call it Church’s Thesis; for, once alerted to the difference between intuitionistic and mechanical rules, he would surely have formulated the thesis for the latter”. [1972] however points out that “it seems safe to say that the sensational aura around references in popular philosophy to Turing’s analysis or to Church’s Thesis reflect the – conscious or unconscious – assumption that the humanly effective, not only the mechanically effective definitions are in question”.
In any case, 2.715 of [1965] states that explicit support for Church’s Thesis exists only in the case of mechanically computable functions. Precisely, it “consists above all in the analysis of machinelike behavior and in a number of closure conditions, for example diagonalization”, for which Kreisel refers to the discussion in Kleene [1952].
1.3. Equivalent Characterizations of Recursiveness
In 2.715 of [1965] one finds the first statement of a point that Kreisel will often repeat,2 asserting that support for Church’s Thesis is not to be found in empirical evidence such as the equivalence of different characterizations of recursiveness: “what excludes the case of a systematic error?” For comparison, he quotes “the overwhelming empirical support for: if an arithmetical identity is provable at all, it is provable in classical first order arithmetic; they all overlook the principle involved in, for example, consistency proofs”.
2 Such repetitions were obviously needed, and should be contrasted with the repetitions of the argument of equivalence in practically every textbook of Recursion Theory, to this day.
In 1.(c).(i) of [1971] Kreisel stresses the fact that “the mathematics has here much the same role as in the natural sciences: to state rival hypotheses and to help one deduce from them a consequence, an experimentum crucis which distinguishes between them; one will try to avoid artefacts and systematic error. Equivalence results do not play a special role, simply because one good reason is better than 20 bad ones, which may be all equivalent because of systematic error”. A note adds that “the familiar emphasis on stability or equivalence results is not rooted in some kind of ‘common sense’, but in a positivistic philosophy of research which rejects the objectivity of [informal rigour]. An equivalence result allows one to act in accordance with this doctrine without formally adopting it: the result allows one to evade the issue (for the time being)”.
In II.(a).(v) of [1972] the mantra is chanted again: “equivalence of different notions such as definability in λ-calculus or by Post rules is often said to provide evidence: evidence for what? Such equivalences may indeed provide evidence of some interest. But they cannot provide evidence for equivalence to a notion which is not among those considered! And if the intended notion is explicitly included among the notions considered then there is no need for equivalence proofs – on the principle that one good reason is better than 20 bad ones”.
Finally, in 1.2.1 of [1990a]: “the conventional ‘evidence’ neglects safeguards against systematic errors, and thus the axiom of experimental science (derived from experience, not only doctrine) that the most insidious errors are not at all random, but systematic”.
1.3.1. A Shift of Emphasis
§3 of [1987], while recalling “the curious ‘evidence’ provided by equivalence between various definitions, as if not every notion had many definitions”, introduces a new twist: “those totally absorbed in pursuing such equivalences do not ask whether the schemes are all equally sensible or equally silly. Less obviously, they do not ask whether the details that are left out in the matching are significant; enough to make one scheme practically superior, at least occasionally”.
As adumbrated in the last quotation, in recent years Kreisel’s criticism has taken on a positive side too, well expressed in 3.(b).(ii) of [1987a]: “the drivel about evidence for Church’s Thesis obscures a genuine virtue of having many equivalent definitions or, more simply, descriptions of the same notion (whether or not they define the originally intended matter). When solving problems about the notion, use can be made of knowledge of the different concepts involved in those descriptions. [...] It is an object of research to discover which description suits particular problems, even though it may well be that other descriptions tend to force themselves on us”.
This shift of emphasis (from what equivalent descriptions have in common to their different potentials) is repeated in 1.2.1 of [1990a]: “those different schemes are not viewed as different, so-called informal analyses of a familiar notion, but are simply the logical aspects of different (mechanical) processors”.
1.4. Church’s Superthesis
Mere equivalence of characterizations hides a neglected aspect, for which Kreisel introduced in 4.(c).(i) of [1971] a special name. He called Church’s Superthesis a stronger version of Church’s Thesis, in which one claims not only that certain mathematical tasks are equivalent to recursive ones, but that each such task is equal to some program for an idealized computer: “to each mechanical rule or algorithm is assigned a more or less specific programme, modulo trivial conversions, which can be seen to define the same computation process as the rule”.
Kreisel has been attentive to positive evidence for the superthesis. In particular:
• In [1972] he notes that Turing’s analysis of the notion of computability does establish a version of the superthesis, for the notion of mechanically computable function.
• In [1972a] and §4 of [1987] he reports on work by Barendregt,3 establishing the superthesis for: reduction of terms in the λ-calculus, execution of programs by Turing machines, and evaluations according to the computation diagrams for partial recursive functions given by Kleene. Thus “not only the classes of functions defined by the different familiar schemes are equal, but the definitions themselves match so as to preserve computation steps”.
§5 of [1987] stresses the value of the emphasis on the superthesis in another direction. “Common sense says: If you want to find out about things, for example, processes, don’t hide them in black boxes! Try to look at them. Specifically, in connection with a refutation of Church’s Thesis, don’t rely on the off-chance of some process being grossly non-mechanical; so much that not even its effects that strike the eye, the so-called output, can be computed mechanically from the input.”
1.5. Turing’s Analysis
§3 of [1987] recalls that among the equivalent characterizations of recursiveness, the one in terms of Turing machines has a particular intrinsic value: “Turing’s description of computations, by the rules of his universal machine, is so vivid that it would establish a common notion together with its elementary properties even if it were not present before. [...] This constituted essential progress for informal rigour, and is not changed by the many defects of the notion”, some of which are at issue here.
3 In the supplementary Part II of his unpublished dissertation, briefly discussed on p. 43 of the second edition of his book [1981].
As noted in II.(a).(i) of [1972], the distinction between mechanically and humanly computable functions was clearly presupposed in Turing’s attempt [1936] to establish that “a machine can reproduce all steps that a human computer can perform”.4 However, Gödel [1972] noticed a problem in the details of Turing’s assumptions about distinguishable states of mind (which, however, does not invalidate his analysis of mechanical instructions). Precisely, Turing proposed a compactness argument to establish that the number of such states is finite, but according to Gödel he disregarded the fact that the mind develops, and thus that this number (though finite at any given moment) may tend to infinity. Gödel’s remark is accepted by Kreisel in II.(a).(i) of [1972], Note 4.(c).(ii) of [1987a], and Appendix I of [1990].
In II.(a) of [1972] Kreisel finds that “an even more important error in Turing’s argument consists in a kind of petitio principii assuming that the basic relations between (finite) codes of mental states must themselves be mechanical”. While “in the case of (Turing) machines whose states are finite spatio-temporal configurations it is quite clear how to code states by natural numbers, [...] coding (mental) states of the human computer is a much more delicate matter”. In particular, “even if we assume a coding by finite configurations, [...] what is the arithmetic character of the relation (between codes) induced by meaningful relations between the mental states considered”? In a word, the problem here is that “the human computations are more ‘complicated’ or, better, more abstract than the objects on which they operate (our thoughts may be more complicated than the objects thought about)”. In contrast, “the mechanical computations and their arguments are on a par”.
According to Note 4.(c).(iii) of [1987a], Gödel’s view on Kreisel’s objection that the coding operators may be non-recursive “was that we know so little about the details that only very simple assumptions can be convincing. But here, in contrast to his reaction to other, apparently comparable cases, [...] he rejected the thought that we may know too little for anything convincing”.
4 In II.(b).(ii) of [1972] Kreisel notes how Turing’s introduction of progressions of formal systems (called by him ‘ordinal logics’) may be taken as showing that he did not take his own claim too seriously.
1.6. Perfect Fluids vs. Perfect Computers
In §3 of [1987] Kreisel reminds us that in the 19th century “not only geometric notions [such as area] were analysed with informal rigour, but also those belonging to the aptly named subject of rational mechanics, with notions of uneven scientific value, including the notoriously imperfect notion of perfect liquid”. Experience with such a notion was considered in §2 of [1985] as an object lesson, not to be forgotten in the context of perfect computers: “progress was made by shifts of emphasis away from the original context. The two dimensional motion of such liquids provides a valid description of – not merely, as is sometimes said, a metaphor for – the notion of function of complex variable. The latter is firmly established in mathematics, even used in parts of mathematical physics, but just not primarily in successful hydrodynamics”.
The exceptionality of the idea of perfect fluid is spelled out by Kreisel in unpublished notes of 1989:
• “First, by and large it is a very imperfect idealization compared to, say, celestial mechanics of the planets. There is no area of familiar experience of fluids where the neglected aspects – viscosity, compressibility, turbulence, etc. – are absent to a comparable degree as, say, friction and air resistance in outer space.
• Secondly, the mathematical properties of that idealization belong to function theory, one of the jewels of mathematics. Specifically, the potentials and stream lines of these ideal motions are simply – given by – the real parts of functions of one complex variable. The same theorem is both one of the most useful mathematical tools and, applied to the idealization, one of its severest limitations. It is Cauchy’s Theorem (on the vanishing of integrals round closed curves), which implies, for such ideal flow, that a stream does not exert any drag of any cylinder.”
In the words of §3 of [1987], the example of perfect fluids shows that, when a common notion has been shown to have a mathematical equivalent by means of informal rigour, “there is the possibility of discovering other areas, in pure mathematics or its applications, where the mathematical equivalent is suited to describing the facts”. Tacitly, in the case of recursiveness “other areas” means ones not directly connected to computability.
1.6.1. A First Success: Higman’s Theorem
§3 of [1987] states that, “as is (or should be) well known, the prototype of such a discovery is Higman’s answer to the question: Which finitely generated groups can be embedded in finitely presented groups? It is given in recursion-theoretic terms,5 and is a model of evidence for the use of a notion to tell us what we want to know about (groups)”. §6 adds that “a mere corollary to [Higman’s] positive answer is a finitely presented group with unsolvable word problem; in other words, something of concern to Church’s Thesis. So Higman’s answer shifts attention away from the latter. The answer, in terms of recursiveness, is tested by its contribution to the demands of group theory; not primarily by the validity of Church’s Thesis in any of its versions”.
1.7. Infinitistic Character of Recursiveness
Kreisel has noted in recent years that an obstacle to simpleminded applications of recursion theory to problems such as those discussed below (especially in Sections 3 and 4) lies in the infinitistic character of the notion of recursiveness, which makes all finite sequences of numbers automatically recursive. This was adumbrated in §6 of [1987], where he noted that “it is generally assumed that there [is no experimental consequence of the existence of irrational numbers], and it seems very plausible that there is no single measurement that could be interpreted to establish irrationality; or rationality, for that matter. For the record, I am not persuaded that (ir)rationality results have no experimental implications at all. [...] Be that as it may, problems of similar flavour 5
Namely: exactly the recursively presented ones, i.e. those for which the set of words equal to 1 is recursively enumerable, are embeddable.
362
Piergiorgio Odifreddi
come up with the two demarcations, between rational/irrational and computable/non-computable”. The suggestion becomes explicit in Appendix I.3 of [1990]: “a problem comes from the ordinary separation between observational knowledge and its theoretical interpretation(s): on the one hand, data of the observational kind are (hereditarily) finitely described; on the other hand, any such (necessarily finite) set of data is recursive. Evidently, only the most coarse-minded would conclude from this that the mathematical property [of being recursive] is without any scientific significance. An obvious question is: where, if anywhere, is such a significance? In other words, recursiveness is an infinitistic property, and so its interpretation is more demanding (in imagination)”. The point is reiterated in [1992]: “It is well known that infinitesimal properties like irrationality or (repeated) differentiability have no place in so-called phenomenological interpretations, that is those that strike the naked eye. Now, computability in the logical sense is quite coarsely infinitistic: every finite sequence of (hereditarily finite) data is computable in that sense. This does not exclude a physically suitable interpretation, for example, by reference to some appropriate micro-theory, but this matter is demanding”. For direction and comparison, Kreisel refers in Appendix I.3 of [1990] to experience with partial differential equations, where “conditions on solutions being once or twice differentiable are, often demonstrably, mathematically significant; most simply, for admitting or excluding a particular P.D.E. as (even) a candidate for a theory of (the aspects of) the phenomena considered”; for example, as noted in [1982], “the most visible features of many phenomena obeying the wave equation, such as caustics (images) in optics, occur only with weak solutions”, i.e. their second derivative exists but is not continuous. 
“But, again, every observational set of data is consistent with those conditions and also with their negation.”
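As a minimal illustration of the remark that any finite set of data is recursive, the following Python sketch builds a total decision procedure from a finite table and then exhibits two different computable laws that agree on the same finite observations. All names and the sample data are hypothetical and are not taken from Kreisel's text; the sketch only shows why finite data cannot by themselves tell one law from another.

```python
def make_decider(observations):
    """Return a total decision procedure for membership in a finite data set."""
    table = frozenset(observations)

    def decide(x):
        # A finite table lookup always terminates, so the set is recursive.
        return x in table

    return decide


def extend(data, default):
    """Extend a finite sequence of data to a total computable law on the naturals."""
    def law(n):
        return data[n] if n < len(data) else default

    return law


data = [3, 1, 4, 1, 5]          # hypothetical finite observations
in_data = make_decider(data)
law_a = extend(data, 0)         # one computable law fitting the data
law_b = extend(data, 1)         # a different computable law fitting the same data

# Both laws agree on every observation, yet differ beyond the observed range.
agree = all(law_a(n) == law_b(n) for n in range(len(data)))
differ = law_a(len(data)) != law_b(len(data))
```

Since the observations are matched by both laws, nothing in the data alone selects one of them; this is the sense in which recursiveness, being an infinitistic property, needs a more demanding interpretation.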
2. Constructive Mathematics
Church’s Thesis for constructive mathematics was discussed by Kreisel in:
• Church’s Thesis (2.7 of [1965]),
Kreisel’s Church
• Church’s Thesis: a kind of reducibility axiom for constructive mathematics ([1970]),
• Church’s Thesis for effective definitions of number theoretic functions (Part II of [1972]),
• Laws of thought: this side of the pale (§5 of [1987]).
In its simplest setting, it amounts to saying that “every constructive number theoretic function has an equivalent definition by means of a certain kind of computation procedure”.⁶ The reason to consider constructive mathematics is recalled in §5 of [1987]: “originally Church’s Thesis was intended and understood in the sense [that] effectiveness for the ideal mathematician was meant. The recursive undecidability results were advertized under the slogan: what mathematicians cannot do. [...] In view of how little is known about the outer limits of mathematical imagination, Church’s Thesis in its original sense is simply beyond the pale. If anything remotely like it is to be pursued, some shift of emphasis is required, [...] and the intuitionistic variant presents itself at least as a candidate. [...] The link with the common notion in question is the meaning of intuitionistic logic as originally explained by Brouwer and Heyting: in terms of mental constructions (of the ideal mathematician).”
2.1. Mechanical and Constructive Rules
Kreisel notes in §1 of [1970] that there is an issue here, since constructive and mechanical are not equivalent: “it is almost banal that we understand non-mechanical rules; on the contrary too detailed, that is ‘too’ mechanical rules only confuse the human computer.” This is sharpened in §7 of [1987]: “everyday experience of creative and mechanical thinking shows that the former is simply more congenial to us, less prone to errors, and accordingly more reliable; but also (perhaps disappointingly) the latter can be more efficient. Thus, a modern computer sums $\sum_{1 \le n \le 100} n$ more quickly, not more reliably, by routine addition than Gauss did at the age of 6 by use of a bright idea. (Computers do mechanical work more reliably than people.)” In 2.35 of [1965], elaborated in §1 and §2 of [1970], Kreisel goes as far as proposing the following as a specific example of a constructive but apparently non-mechanical function.⁷ Given a constructively valid formal system F for arithmetic, constructively enumerate its proofs, and associate to n:
• 0 if either the conclusion of the n-th proof is not an existential assertion, or it is but the proof does not provide an explicit witness for it;
• m + 1 if the conclusion of the n-th proof is an existential assertion, and the proof provides m as an explicit witness for it.
Transformation of this (obviously constructive) function into an equivalent mechanical one encounters a number of obstacles: F does not necessarily have the so-called Constructive ∃-Rule (if an existential assertion is provable, then so is one of its numerical instances); even if it does, F does not necessarily admit recursive procedures that associate numerical witnesses to provable existential assertions; even if it does, the problem still remains of knowing whether one of such procedures (which are not all necessarily equivalent, in the sense of providing the same witnesses for the same provable existential assertions, unless the system has the so-called ∃-Stability⁸) is equivalent to the function above, and if so which one.⁹ The main characteristic of this example is isolated in II.(a) of [1972] as “the passage between a formal derivation [...] and the corresponding mental act, namely the proof expressed by the derivation”.
6 Extensions to functions of higher types are also considered in [1965], with the role of the recursive functions variously played by Kreisel’s and Kleene’s continuous functionals, Gödel’s primitive recursive functionals, and Kleene’s recursive functionals. A different extension, to partial functions, is considered in [1972]. We don’t discuss these extensions here.
7 Note 9 of [1970] criticizes Kalmár [1959], an inconclusive paper whose title had attracted some attention, and that claimed to contain an example with similar properties.
8 Warning: the term “stability” in this context does not fit the usual meaning of stability w.r.t. (small) changes of data, and is instead applied to changes in interpretation. This kind of stability is made insignificant, from a proof-theoretical point of view, by the kind of instability (this time in the usual meaning of the word) discovered by Girard in proofs of a theorem by Van der Waerden, and reported in his book [Girard 1986].
9 An example of a system with both the Constructive ∃-Rule and ∃-Stability is Heyting’s Arithmetic.
In particular, because of this reference to mental acts the definition above is not even meaningful from a set-theoretical standpoint! The example was shown in 1.(b) of [1971a] to be mechanically computable (for a large class of formal systems, including Heyting’s Arithmetic) by use of normalization techniques. But the mere fact that relatively advanced work was needed to answer the question establishes that the latter was genuinely problematic.
2.2. Formal Versions of Church’s Thesis
In 2.72 of [1965] Kreisel proposes two formal versions of Church’s Thesis, for systems for constructive mathematics:
CT1   ∀f ∃e ∀x ∃z [T₁(e, x, z) ∧ f(x) = U(z)]
CT2   ∀x ∃y R(x, y) → ∃e ∀x ∃z [T₁(e, x, z) ∧ R(x, U(z))].
The former, suitable for second-order systems with functional variables (for lawlike functions), says directly (via the Normal Form Theorem) that every function is recursive. The latter, suitable also for first-order systems, expresses (via the axiom of choice, extracting a function from a ∀∃ form) the fact that every function is recursive. Naturally, there is no question of provability of CT1 or CT2 in usual systems for constructive mathematics, since in them the corresponding classical systems are interpretable, and in the latter CT1 and CT2 are false.
2.3. Consistency of Church’s Thesis
2.723 of [1965] states that both CT1 and CT2 are consistent with the systems for constructive mathematics considered there (intended for treatments of free choice sequences, generalized inductive definitions, and bar recursion), as well as with those in Kleene [1952]. This consistency result is extended in Kreisel and Troelstra [1970] to the theory of species of natural numbers (an intuitionistic analogue of classical analysis with the comprehension axiom and the axiom of dependent choices), and hence to a number of theories (including the ones just quoted) that can be modelled in it. As Note 9 of [1970] states, “the main purpose of consistency results is to help avoid fruitless lines of research, since our principal interest is the refutation of Church’s Thesis”: “consistency results
exclude even a ‘weak’ refutation in the sense of showing the absurdity of a proof, not only a ‘strong’ one in the sense of exhibiting a counterexample” (e.g. of the kind considered in 2.1 above). In the opposite direction, II.(c).(ii).(β) of [1972] notices that an inconsistency of CT1 or CT2 “would only mean the absurdity of assuming the existence of a proof, and it would not establish a counterexample”. In other words, inconsistency would only show that Church’s Thesis cannot be proved by methods in the system considered, but it would fall short of providing an example of a constructive function that is not recursive.
2.4. Validity of Church’s Thesis
The question of validity of CT1 and CT2 is posed in 2.75 of [1965], with the remark that “there is no reason why the question should not be decidable [in the negative] by means of evident axioms about constructive functions”, whose discovery is described as “one of the really important open problems” (and, in 2.(c).(iii).(β) of [1966], “one of the more feasible problems at the present time”). Obviously, one is not thinking here of axioms stated in the language of constructive mathematics, but justified by explicitly nonconstructive, or otherwise arbitrary interpretations. Here are three examples:
• Spector’s bar recursive functionals.
2.(c).(iii).(β) of [1966] states, and 2.b.(iii) of [1971a] proves, that they are inconsistent with Church’s Thesis. But, “because of excessive extensionality conditions imposed on them, the contradiction is of little interest”.
• Various notions of choice or lawless sequence α.
As noted in 4.b of [1970], all known such notions naturally satisfy the negation of Church’s Thesis: “for dice α (or lawless sequences) you don’t expect to prove that successive values of α will follow a recursive, or for that matter, any law”.
• Brouwer’s thinking subject.
This is an analysis of mathematics into ω stages, and states that every set of natural numbers is constructively enumerable (over the natural numbers). The application one has in mind
here is to the set of all (numbers coding) constructive proofs, and hence to the possibility of enumerating such proofs constructively in an ω-ordering. As noted in 4.c of [1970], “the thinking (freely creating) subject will not convince himself that his (mathematical) behaviour is subject to any law”. The assumption of the thinking subject is actually provably inconsistent with Church’s Thesis.¹⁰
2.5. Church’s Thesis as a Reducibility Axiom
Kreisel points out in 2.74 of [1965] how the consistency results quoted above show that Church’s Thesis plays a role in intuitionistic mathematics somewhat similar to that of Gödel’s constructible sets in set theory:¹¹ “not only is consistent with the known axioms, but it can also be used to show the formal character of interesting open questions”. This is quoted not as a mere possibility in principle, but with explicit examples: in particular, the result that “the rules of intuitionistic predicate logic cannot be proved complete [w.r.t. the intended semantics] by any method consistent with Church’s Thesis” (a result sketched in [1962] and 2.741 of [1965], and fully proved in Technical Note I of [1970]¹²). In particular, as noted in §3 of [1970], this shows that “the notion of constructive validity of first-order formulas depends on problematic properties of the basic notion of constructive function (like second-order validity, but unlike first-order validity in the classical case)”. The parallel with set theory is explored in [1970] and supplemented in (d) of [1971b], where Kreisel compares:
• on the one hand: the abstract notion of set, its basic properties described by the Zermelo–Fraenkel axioms, Gödel’s constructible model, the assumption V = L, and a non-axiomatizability result for infinitary predicate calculus following from it;
• on the other hand: the abstract notion of constructive arithmetical function, its basic properties described by Heyting’s axioms, Kleene’s realizability model, the assumption of Church’s Thesis, and the non-axiomatizability result for intuitionistic predicate calculus following from it, and quoted just above.
In this context, axioms refuting Church’s Thesis would play a role similar to set-theoretical axioms (such as the existence of measurable cardinals) contradicting V = L.
10 The idea of the proof, due to Kripke and reported in Note 10 of [1970], is the following. If Church’s Thesis holds, every constructively enumerable set is recursively enumerable. But in constructive mathematics one can show that there is a set which is not recursively enumerable (for example, the usual complement of the Halting Problem). One thus has a counterexample to the thinking subject assumption.
11 Incidentally, as noted in II.(a).(i) of [1972], the constructible sets were proposed by Gödel as an analysis of humanly effective definitions (and the letter “L” stood for “lawlike”). Later Gödel expanded the analysis to the notion of ordinal-definable sets.
12 The idea of the proof is the following. In [1962] Kreisel had proved that constructive completeness of predicate logic implies (actually, is equivalent to) a constructive version of König’s Lemma. Consider an infinite recursive (hence, constructive) tree with no infinite recursive branch: if König’s Lemma holds constructively, such a tree has an infinite constructive branch, which cannot be recursive. One thus has a counterexample to Church’s Thesis.
2.6. Church’s Rule
For systems for which (consistency of) Church’s Thesis is not known to hold, or it actually fails, one can restrict attention to (consistency of) closure under Church’s Rule, i.e. the assertion that if the premise of CT2 is provable then so is the conclusion. Technical Note II of [1970] warns that Church’s Rule is genuinely problematic: even if CT2 is valid, provability of the premise implies only validity of the conclusion, not necessarily its provability (because of incompleteness of usual systems). As noted in 2.7231 of [1965], the first consistency result of closure under Church’s Rule was obtained by Kleene for his system in [1952] (a result strengthened in 2.723 of [1965] to consistency of Church’s Thesis). Closure under Church’s Rule of the theory of species of natural numbers without choice was proved in Technical Note II of [1970]. The result was extended to the theory with choice in Kreisel and Troelstra [1970] (a result supplemented there by consistency of Church’s Thesis), while the proof was simplified in 2.a.(ii) of [1971a], and (b) of [1971b].
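Assuming the notation of CT2 from 2.2 (Kleene’s predicate T₁ and the result-extracting function U), the rule stated at the start of the preceding paragraph can be written schematically as follows, where ⊢ denotes provability in the system under consideration. This is only a transcription of the prose statement into symbols, not a formula quoted from Kreisel.

```latex
\[
\frac{\vdash\ \forall x\, \exists y\, R(x, y)}
     {\vdash\ \exists e\, \forall x\, \exists z\, [\, T_1(e, x, z) \wedge R(x, U(z)) \,]}
\]
```

Read this way, the difference from CT2 is visible at a glance: CT2 is an implication asserted inside the system, while the rule relates two provability claims about the system.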
In general, II.(c) of [1972] showed that every sound formal system satisfying the Constructive ∃-Rule (quoted in 2.1 above) is closed under Church’s Rule.¹³ This implies that for a refutation of Church’s Rule one can only look at systems that either are not formal or do not satisfy the Constructive ∃-Rule. III.3 of [1974] points out that the insistence on considering formal systems satisfying the Constructive ∃-Rule was a systematic error that precluded the possibility of disproving closure under Church’s Rule. This is balanced by §5 of [1987], where it is noted that, however, “there are no rewarding candidates of systems in sight that can be established with informal rigour to hold for the constructions of the ideal mathematician, but do not have both the two properties [of being formal and satisfying the constructive ∃-Rule]”.¹⁴ The assessment of results about Church’s Rule is a delicate matter, discussed in II.(c).(ii).(α) of [1972]: closure under the rule refers to provability in the system, and thus if true is significant only for systems complete for constructive mathematics (for which however Church’s Thesis would hold), and if false is merely a symptom of incompleteness; on the other hand, inconsistency would instead disprove Church’s Thesis.
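The proof sketched in the footnote to the first sentence above (enumerate the theorems and wait for some R(n, m) to appear) is an unbounded search, and can be illustrated by the following Python sketch. Everything in it is a hypothetical stand-in: the relation `R` plays the role of "R(n, m) is provable", and `theorems` plays the role of an enumeration of the theorems of a formal system. In the toy case the relation is decidable, which is not assumed in the actual argument; only the shape of the search is meant to carry over.

```python
from itertools import count


def R(n, m):
    # Hypothetical relation standing in for "R(n, m) is provable".
    # For every n there is some m with R(n, m), as the premise of the rule requires.
    return m * m >= n


def theorems():
    """Enumerate the 'theorems' of the toy system: all pairs (n, m) with R(n, m)."""
    for s in count(0):              # dovetail over all pairs with n + m == s
        for n in range(s + 1):
            m = s - n
            if R(n, m):
                yield (n, m)


def witness(n):
    """Search the enumeration for the first theorem of the form R(n, m)."""
    for k, m in theorems():
        if k == n:
            return m                # the search halts because some R(n, m) is a theorem
```

For instance, `witness(10)` walks through the enumeration until the pair (10, 4) appears and returns 4. The function defined this way is recursive precisely because the system is formal, so that its theorems can be enumerated mechanically.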
3. Theories of Mathematical Reasoning
The possibility of a theory of mathematical reasoning was touched upon by Kreisel in:
• Mechanistic theories of reasoning (§4 of [1966]),
• Genetic theories of effective definitions (II.(b) of [1972]).
As a general point, the end of §4 of [1966] states that “the use of technically advanced machinery in analysing reasoning is encouraging; after all, Aristotle thought about reasoning; one would like to see clearly what one has that he did not have! (It is no comfort to know that over 2000 years have passed since his time unless one sees just how one has used the experience of these 2000 years.)”
13 The proof is the following. If ∀x∃yR(x, y) is provable, let n be given: then ∃yR(n, y) is provable, and by the Constructive ∃-Rule so is R(n, m) for some m. But the system is formal, and by enumerating its theorems one can find (one such) m. This defines a recursive function f such that, for every n, R(n, f(n)) is provable.
14 Specially concocted intuitionistic formal systems not satisfying the Constructive ∃-Rule do exist by the incompleteness theorems: an explicit example is given in II.(c) of [1972].
3.1. Individual Reasoning
In discussing formalist rules of reasoning, Kreisel notes that (in the terminology introduced in 1.4) what is at stake here is the superthesis for (mathematical) reasoning. According to §5 of [1987], “the pioneers, in particular Frege, had of course a lot to say about the distinction between thought processes and their results. He called the latter ‘objective’ thoughts, [... and] saw a principal use of his objective analysis (ignoring subjective processes) in the greater security it gave to common reasoning”. However, Note 31 of [1965] remarks that “the conviction, probably, is not merely that such rules happen to generate the provable statements in a particular domain of mathematics, but that [...] this is really all that goes on”. Kreisel proposes a parallel with the early days of chemistry, where “one did not merely mean that the particular integral ratios in chemical reactions happen to be formally explained by an atomic hypothesis, but that there were such things as atoms”. And notices that the attraction of formalism “derives at least partly from this: long before electronic computers were thought of, one could see more or less how behavior according to such formal rules could be realized by a mechanism, that is an old fashioned mechanism in the sense of a Turing machine”. This is reiterated in 2.(a).(i) of [1966]: “Probably the major attraction of formalization was that it suggested the possibility of a mechanistic theory of human reasoning, in particular, that [mathematical] propositions not only can be decided by means of formal rules, but that something like repeating application of such rules is all that goes on even if we consciously think of reasoning differently; more precisely, that the higher nervous system consists of a mechanism whose behaviour is given by the formal rules, as an electronic computer is a mechanism whose physical behaviour realizes certain mechanical laws (the ‘instructions’ which it is given)”. 
4.(a).(i) of [1966] notices however that “it is remarkable how little work was done on this even in areas, such as predicate logic, where the set of valid statements is recursively enumerable. The least one would have to do is to show that there is something mechanical about the actual choice of proofs, not only about the set of results”.
The point is taken up again in 4.(c).(i) of [1971], where the notion of superthesis is applied to “Frege’s empirical analysis of logical validity in terms of his formal rules; the superthesis would then correspond to an assignment of specific deductions, modulo trivial conversions, to intuitive logical proofs. Here [...] the theorem proved does not determine the process, that is the proof (a fortiori, not the formal description of the process); in fact, not even in propositional logic: thus we have at least two obviously different proofs of the theorem (p ∧ ¬p) → (p → p), one using p → p and q → (r → q) with q = p → p and r = p ∧ ¬p, the other using (p ∧ ¬p) → s with s = p → p”. Kreisel notes there that what is now called the Curry–Howard isomorphism (between derivations in intuitionistic calculi and terms in corresponding typed λ-calculi) provides an example of work in the direction of the superthesis in the sense just discussed.
3.2. Collective Reasoning
The cooperative phenomenon (in the language of statistical mechanics) of the mathematical community, and its behavior with respect to arithmetic problems, is considered in Note 29 of [1965], and in 4.(a).(ii) of [1966]: “This behavior seems asymptotically stable. We certainly have no better theory at present than this: a statement will be accepted if true”. Now this theory is certainly not recursive and hence not mechanistic, “but the whole issue is whether reasoning is mechanistic, and so it is a petitio principii to require that only mechanistic theories of reasoning are admitted”. In II.(b).(i) of [1972] a shift from truth to provability is made: the possibility of “an all-encompassing formal system F for the whole of mathematics (or even the part dealing with number theoretic predicates)” is considered, and it is noted that such a formal system would establish Church’s Thesis for humanly realizable functions.¹⁵ Equivalently, any example of a humanly realizable, non-recursive function would refute the possibility of such a system.
15 A technical result sketched in [1971c] and proved in Part I of [1972] shows that the same would hold in the weaker hypothesis that “mathematical reasoning is encompassed not by a single formal system, but by a recursive progression on a Π¹₁ path through Kleene’s O”. See Note 4.(b).(iii) of [1987a] for an account of Gödel’s role in prompting this result.
II.(b).(i).(α) shows that Gödel’s incompleteness results prove not the impossibility of such a system, but only that we cannot have mathematical evidence of its adequacy. II.(b).(i).(γ) discusses empirical, non-mathematical evidence, in particular stability of actual practice: systems such as Principia Mathematica are as adequate for number theory or analysis today as they were in the 19th century. Kreisel notices that knowledge of Gödel’s incompleteness proof would, at least naively, be expected to spoil the adequacy of such systems, but in practice it does not.¹⁶
3.3. Mind
According to the end of [1980], “throughout his life Gödel looked for good reasons which would justify the most spectacular conclusion that has been drawn from his first incompleteness theorem: minds are not (Turing) machines. In other words, [...] the laws of thought are not mechanical (that is, cannot be programmed even on an idealized computer)”. As stressed there, “the popular reasons are quite inconclusive. Certainly, by (Matyasevic’s improvement of) the incompleteness theorem, those minds which can settle all diophantine problems are not machines; but we have not found any evidence of such minds. Nor there is the slightest hint of any computer programs which simulate (even in outline) actual proof search; not even for solving problems which do have a mechanical decision procedure (for example, propositional algebra)”. In §4 of [1966] Kreisel proposes a variant to a favourite twist of Gödel’s, brought up in conversation: “either there are mathematical objects external to ourselves or they are our own constructions and mind is not mechanical”. The variant differs from Gödel’s formulation in two respects: first, Kreisel makes no assumption that “if mathematical objects are our own constructions we must be expected to be able to decide their properties”;¹⁷ second, he would “like to use an abstract proof of the non-mechanical nature of mind [...] for the specific purpose of examining particular biological theories”. For the latter purpose, granted a negative result about the mechanical nature of mind, 4.(a).(iii) of [1966] points out that one needs to make specific assumptions about such theories (in addition to the general ones stated below, at the end of 4.1). In particular: “mathematical behavior is regarded as an integral part of the experience to be explained, and not as some corner far removed from the principal activities of the organism” (an assumption whose rejection implies an acceptance of the division between mental and ‘ordinary’ biological phenomena), and it is “to be explained in terms of the basic laws themselves”; moreover, “the basic laws are such that the laws for cooperative phenomena, i.e. interaction of organisms such as involved in mutual teaching of mathematics, are also recursive”. Kreisel proposes the following as a debating point: “compare the place of mathematical behaviour among biological phenomena to the place of astronomical behaviour among mechanical phenomena; the former is far removed from ordinary life, exceptionally predictable, exceptional both in the sense that the predictions are precise, and also that they were the first to be noted; since astronomical phenomena played an important part in building up physical theories, should one not expect the analogue too?” As to the role of Gödel’s incompleteness theorem, Kreisel states in 4.(b) of [1966] that he does not think that the result “establishes the non-mechanistic character of mathematical activity even under [the assumptions] above without [Gödel’s own] assumption that we can decide all properties of our (mental) productions. For, what it establishes is the non-mechanistic character of the laws satisfied by, for instance, the natural numbers: and the theory of the behavior of arithmeticians mentioned above may well be wrong!” Actually, there is the possibility that “the natural tendency of mathematicians to be finitist or predicativist is significant for the psycho-physical nature of reasoning”. And if finitism or predicativism turned out to be the correct description of the behavior of finitist or predicativist mathematicians, one could actually mention problems which neither can decide.
16 He intriguingly remarks that also “knowledge of Freud’s interpretation of the dream symbolism would be expected to produce new symbols (to deceive the superego) but, apparently, it does not”.
17 Kreisel has a gift for provocative comparisons, and displays it in this case by adding: “I do not see why one should expect so much more control over one’s mental products than over one’s bodily products, which are sometimes quite surprising” and, as added at the end of [1980], “can have painfully unexpected properties”. According to the last source, “Gödel remained unsympathetic to [this] admittedly tasteless comparison”. As another example, in §8 of [1985] he attacks the “blithe talk about ‘natural’ notions” by reminding one of “the obvious parallel from botany where perfectly natural and often pretty mushrooms can be addictive or poisonous in other ways”.
4. Physical Theories
The question of whether physically realizable functions are recursive was first raised by Kreisel in 2.714 of [1965], and discussed in:
• Mechanism and materialism (4.(d) of [1966]),
• Analogue versus Turing computers (§3 of [1970a]),
• A notion of mechanistic theory ([1974]),
• Theories in natural science: rational and computable laws (§6 of [1987]).
On the positive side, II.(a).(v) of [1972] states that the (important and neglected) empirical evidence provided by the fact that a large class of patently non-mechanical functions turn out to be equivalent to recursive ones should be taken as a sign of the importance of the notion of recursive function. On the negative side, Kreisel notes in 4.(d) of [1966] and at the end of [1980] that the possibility of physically realizable but nonrecursive functions shows on the one hand that “the hypothesis that reasoning is not mechanistic is by no means anti-materialist or antiphysicalist”, and suggests on the other hand the possibility that “the notion of machine is not adequate ‘in principle’ to separate mind and matter”. To avoid misunderstanding, [1974] stresses the fact that it is not phenomena, but theories about them that are considered here. Accordingly, Kreisel defines a theory as mechanistic if “every sequence of natural numbers or every real number which is well defined (observable) according to theory is recursive or, more generally, recursive in the data (which, according to the theory, determine the observations considered)”. As a corollary to this position, “the reader should not allow himself to be confused [...] by doubts about the validity of a theory with regard to the phenomena for which it is intended”, although obviously “such doubts imply doubts about the relevance (to those
phenomena) of any results about the mechanistic character of the theory”.
4.1. Positive Results
The obvious starting point in the search for mechanistic theories is, as the name implies, classical mechanics. In 2.714 of [1965] Kreisel noticed that “(excepting collisions as in the 3-body problem, which introduce discontinuities) the theory of partial differential equations shows that the behavior of discretely described (finite) systems of classical mechanics is recursive”.¹⁸ This is reiterated in Note 2 of [1970], where the possibility of “(finitely specified) physical systems whose most probable behavior is non-recursive” is reconsidered, and the fact that “the theory of partial differential equations gives a negative answer for a general class of systems in classical mechanics” restated (this time, with the comment that “the result is not trivial since we are dealing with the mechanics of the continua and Turing machines are discrete mechanisms”). In §4 (Footnote 1) of [1966] attention is shifted to probabilistic processes, and a proof is given of the fact that “if in a stochastic process (with a finite number of states) the transition probabilities are recursive, any sequence of states with non-zero probability is automatically recursive”.¹⁹ This is sharpened in 3.(c) of [1970a], where it is shown that the result “can be extended to stochastic processes with an infinite number of discrete states and a recursive table of transition probabilities”.²⁰ Finally, in §4 of [1966] Kreisel touches on biological processes, and claims that “the stable macroscopic properties of organisms would be expected to be recursive” if, as currently assumed, biological theories will be general schemas for the explanation of biological processes, based on “combinatorial basic steps iterated a (large) number of times” (a characteristic of recursive processes). This is supplemented in §5 of [1987], where it is noted that “such characteristic aims of the logical tradition as unity by reduction to a few primitives may be misplaced here. Thoughtful biologists are sensitive to those aims, and tell us that they are not compatible with the process of evolution. It selects from a mass of random mutations those specific elements that are adapted to the surroundings in which they happen to be. Quite simply, the process doesn’t have a logical feel, and so the laws could not be expected to have such a feel either. At most, somewhere on the molecular level the laws might satisfy the idea(l)s of the logical tradition, though often they do not”.
18 “Discrete” means that all relevant parameters take discrete values, not only that the systems are finite.
19 The idea of the proof is the following. If the transition table is recursive, the tree of all possible sequences of states is recursive. If a sequence of states has non-zero probability, it is an isolated branch in such a tree. And any isolated branch of a finitely branching recursive tree is recursive.
20 One has to use an appropriate definition of “sequence of states with non-zero probability” to prove that the latter is recursive, since an argument as in note 19 would only prove it is hyperarithmetic.
4.2. Evaluation of Positive Results
The results quoted in 4.1 provide empirical evidence for the mechanistic character of existing physical theories. III.3 of [1974] draws a parallel with the fact that by the end of the twenties “the huge bulk of the mathematical problems that were regarded as solved had formal, that is, mechanically computable, solutions”, and that “even today we do not have any theorem in ordinary number-theoretic practices which cannot be proved in Principia Mathematica”. Nevertheless, “the non-mechanistic nature of the axiomatic theory of natural numbers was discovered, not by sifting existing applications which accumulated in the course of nature (here: in number-theoretic practice) but by looking for unusual or neglected applications (here: to metamathematical questions): applications specifically chosen for their relevance to questions of mechanization or, equivalently, formalization”. This is, according to Kreisel, “the lesson to be learned from the experience with axiomatic theories of mathematical objects; for use with our present problem concerning the mechanistic character of (other) scientific theories”. The points in 4.3 below have been raised throughout the years with this explicit lesson in mind.
4.3. Where to Look for Negative Results
4.(a).(iii) of [1966] notices that, unlike discrete classical systems, co-operative phenomena are not known to have recursive behavior.
Note 2 of [1970] hints at the possible non-recursiveness of a collision problem related to the 3-body problem,21 and suggests as a possible consequence “an analog computation of a non-recursive function by repeating collision experiments sufficiently often”. A more explicit discussion of this example is in IV.2 of [1974].22 But, as stated in 2.714 of [1965], the natural place to look for non-recursive behavior is “the quantum theory, for example, of large molecules”. Here are two examples proposed by Kreisel:
• 4.(d) of [1966] notes that “it is not known whether there exists a physical system with a Hamiltonian H such that, for instance, σ(n) is the set of possible spins in the n-th energy state, σ(n) finite for each n, and σ(n) is not a recursive function of n”.
• 3.(a) of [1970a] suggests the possibility of “large molecules whose spectrum (or: to have a dimensionless quantity, the ratio of the first spectral line to the second) is not recursive”.

21 The critical question is whether or not the masses collide (during the interval of time considered). If they don’t, it is obvious that their paths can be computed as precisely as one wants. A physically meaningful formulation of the computability of this matter of collision must then refer not to points in phase space (in other words, precise positions and velocities), but to neighbourhoods. More precisely, one does not ask for a recursive decision procedure to determine whether, for arbitrary times t, a collision occurs exactly at time t (or ≤ t). Rather, one asks for such a procedure to determine, for arbitrary t, an interval (t − t0, t + t0) with small t0 (possibly depending on t), such that one of the following happens: either there is no collision before t − t0, or there is a collision at some time after t + t0.
22 Kreisel states in [1976] that the formulation of this problem in [1974] is “distinctly better” than in [1970]. Judgments of this sort, both positive and negative, abound in Kreisel’s (a bit schizophrenic) self-reviews, and the following selection may give a flavour. In [1971b] he describes the arguments and formulations of [1970] as “unnecessary, [...] inconclusive [...] and practically useless”. He mocks himself, by noticing that “the author, who does not usually avoid self-reference, forgets to quote [one of] his own observation[s]”, and that “the author’s objections [...] seem to the reviewer much stronger than the author can have realized”. In [1971c] he complains that “the title [of [1970a]] is misleading”, and that “the discussion trails off feebly instead of referring to the relevant literature”. Moreover, “the author [amazingly] fails to stress what is perhaps the most obvious relevant contribution”, and “is unsympathetic to his own subject”. However, he “has a number of very interesting concrete suggestions”. In [1972a] he depicts the discussions in [1971] as “hesitant and discoursive”, and the explanations as “incompetent but convincing”. Finally, in [1976] he plays on his double role by noticing that “the author gives no reference [...] and the reviewer does not know any reference either”.
However, Kreisel seems to have formed a negative impression of these examples, and in 3.(a) of [1970a] conjectures that “Kato’s theorem could be used to give arbitrarily close recursive approximations”.23 A different example, using sequences of eigenvalues, has been suggested by Pour–El and Richards [1989]: the example has the flavour of a Hamiltonian, but does not seem to satisfy the conditions of Schrödinger’s equation (in particular, Kato’s theorem may not apply).

4.4. Physical Relevance
In [1974], Kreisel considers the step from obtaining constants by empirical (usually approximate) measurements to calculating them theoretically, and suggests the possibility that this extension of the notion of physical theories is “liable to introduce non-mechanistic elements in a perhaps non altogether trivial way”. But Kreisel notes that the exhibition of a problem without recursive solutions would not be enough to show the non-mechanistic character of (related) physical theories. Further work would be needed: in particular, “it would be necessary to describe (an ensemble of) experiments and their statistical analysis for which the most probable outcome of the experiments is determined by the solution to the problem. In other words, if the problem has no recursive solution the most probable outcome of the experiments should be non-recursive too”.

4.4.1. The Wave Equation
§IV of [1974] discusses examples of non-recursive objects with a physical look, but without physical relevance: for example, recursively continuous curves which do not attain their maximum at any recursive point. A step forward in this direction was made by Pour–El and Richards [1981]: they proved that for certain choices of recursive data (initial conditions) the wave equation has a unique, but not recursive solution.

23 Kato’s theorem provides upper and lower bounds for arbitrary Schrödinger equations.
Kreisel’s review [1982] discusses this result: on the positive side, the equation itself is provided by current physical theory; on the negative side, the data are not (known to be) generated by recursively describable phenomena.24 This criticism is reiterated in [1992]: “naturally, some of the operators considered appear in theoretical physics. But not all their formal properties have a physically sensible interpretation!” Thus the further work quoted above, needed to step from mathematical to physical relevance, is still lacking.

4.4.2. Hadamard’s Principle
IV.2 of [1974] discusses Hadamard’s principle, which restricts the class of meaningful (physical) theories to those providing functions continuous in their data. In particular, Kreisel suggests a refinement, requiring functions to be recursive in their data, and presents the collision problem (related to the 3-body problem) quoted in 4.3 as an example satisfying Hadamard’s principle, but not known (even today) to satisfy this refinement. In Note 4.(a).(ii) of [1987a] Kreisel says that he had actually thought for some time that even the refinement had been a tacit assumption for people working on problems in mathematical physics, forcing them, as a consequence, to miss non-recursive solutions. The work of Pour–El and Richards [1981] discussed above refuted this impression, and showed in particular that this had not been the case with Kirchhoff’s solution of the wave equation. Nevertheless, “non-recursive solutions are often indeed unsatisfactory as they stand. But, once recognized, they may be explicitly excluded for physical reasons, [...] or they may suggest new questions that have more manageable solutions”.

24 In passing, Kreisel suggests in [1982] a possible shift of emphasis from recursiveness to subrecursiveness: “the realistic potential of (suitable) analogue computers for cheaply and reliably solving problems that are costly for Turing machines, is at least as significant as that of doing a recursively unsolvable job” (at issue here).
Acknowledgments

I wish to thank Sol Feferman and Georg Kreisel for their comments on a first draft of this paper.
References

Barendregt, H. [1981], The Lambda Calculus, its Syntax and Semantics, North Holland.
Fitting, M.C. [1981], Fundamentals of Generalized Recursion Theory, North Holland.
Girard, J.Y. [1986], Proof Theory, Bibliopolis.
Gödel, K. [1972], Some Remarks on the Undecidability Results, in [1989, pp. 305–306].
Gödel, K. [1986], Collected Works, vol. I, Oxford University Press.
Gödel, K. [1989], Collected Works, vol. II, Oxford University Press.
Kalmár, L. [1952 (1959)], An Argument against the Plausibility of Church’s Thesis, in Constructivity in Mathematics, (Heyting ed.), North Holland, 1959, pp. 72–80.
Kleene, S.C. [1952], Introduction to Metamathematics, North Holland.
Kreisel, G. [1962], “On Weak Completeness of Intuitionistic Predicate Logic”, J. Symb. Log. 27, 139–158.
Kreisel, G. [1965], Mathematical Logic, in Lectures on Modern Mathematics, vol. 3, (Saaty ed.), Wiley, pp. 95–195.
Kreisel, G. [1966], Mathematical Logic: What Has It Done for the Philosophy of Mathematics?, in Bertrand Russell. Philosopher of the century, (Schoemann ed.), Allen and Unwin, pp. 201–272.
Kreisel, G. [1970], Church’s Thesis: a Kind of Reducibility Axiom for Constructive Mathematics, in Intuitionism and Proof Theory, (Kino, et al. eds.), North Holland, pp. 121–150.
Kreisel, G. [1970a], “Hilbert’s Programme and the Search for Automatic Proof Procedures”, Springer Lect. Not. Math. 125, 128–146.
Kreisel, G. [1971], Some Reasons for Generalizing Recursion Theory, in Logic Colloquium ’69, (Gandy and Yates eds.), North Holland, pp. 139–198.
Kreisel, G. [1971a], “A Survey of Proof Theory”, II, in Proceedings of the Second Scandinavian Logic Symposium, (Fenstad ed.), North Holland, pp. 109–170.
Kreisel, G. [1971b], Review of [1970], Zentr. Math. Grenz. 199, 300–301.
Kreisel, G. [1971c], Review of [1970a], Zentr. Math. Grenz. 206, 277–278.
Kreisel, G. [1972], “Which Number-Theoretic Problems can be Solved in Recursive Progressions on Π¹₁ Paths Through O?”, J. Symb. Log. 37, 311–334.
Kreisel, G. [1972a], Review of [1971], Zentr. Math. Grenz. 219, 17–19.
Kreisel, G. [1973], Review of [1972], Zentr. Math. Grenz. 255, 28–29.
Kreisel, G. [1974], “A Notion of Mechanistic Theory”, Synth. 29, 11–26.
Kreisel, G. [1976], Review of [1974], Zentr. Math. Grenz. 307, 18–19.
Kreisel, G. [1980], “Kurt Gödel”, Biogr. Mem. Fell. Royal Soc. 26, 149–224.
Kreisel, G. [1982], Review of Pour–El and Richards [1979] and [1981], J. Symb. Log. 47, 900–902.
Kreisel, G. [1985], Review of Fitting [1981], Bull. Am. Math. Soc. 13, 182–197.
Kreisel, G. [1987], “Church’s Thesis and the Ideal of Informal Rigour”, Notre Dame J. Form. Log. 28, 499–519.
Kreisel, G. [1987a], Gödel’s Excursions into Intuitionistic Logic, in Gödel remembered, (Weingartner and Schmetterer eds.), Bibliopolis, pp. 77–169.
Kreisel, G. [1990], Review of Gödel [1989], Notre Dame J. Form. Log. 31, 602–641.
Kreisel, G. [1990a], Logical Aspects of Computations, in Logic and Computer Science, (P.G. Odifreddi ed.), Academic Press, pp. 205–278.
Kreisel, G. [1992], Review of Pour–El and Richards [1989], Jahres. deutsch. Math. Verein. 94, 53–55.
Kreisel, G. and Troelstra, A.S. [1970], “Formal Systems for Some Branches of Intuitionistic Analysis”, Ann. Math. Log. 1, 229–387.
Odifreddi, P.G. [1989], Classical Recursion Theory, North Holland.
Pour–El, M.B. and Richards, I. [1979], “A Computable Ordinary Differential Equation which Possesses No Computable Solutions”, Ann. Math. Log. 17, 61–90.
Pour–El, M.B. and Richards, I. [1981], “The Wave Equation with Computable Initial Data Such that its Unique Solution is not Computable”, Adv. Math. 39, 215–239.
Pour–El, M.B. and Richards, I. [1989], Computability in Analysis and Physics, Springer Verlag.
Turing, A.M. [1936], “On Computable Numbers with an Application to the Entscheidungsproblem”, Proc. London Math. Soc. 42, 230–265.
Adam Olszewski∗
Church’s Thesis as Formulated by Church — An Interpretation

Church’s Thesis is becoming the subject of increasingly intensive research by academics from various disciplines. The growing number of papers on this theme has been accompanied by a growing number of ambiguities, divergent formulations and misunderstandings. Therefore, a more careful analysis of the history of Church’s Thesis as a phenomenon, and in particular of its origins, is necessary. My task here is to indicate the right terminology that should be employed when examining the Thesis. Although 70 years have passed since Church’s Thesis was first presented, it would appear that little progress has been made in research terms. An analysis of the texts written by Church himself will be of key importance in this paper. I will also consider the philosophical views of Stephen Kleene, which bear some relation to Church’s Thesis.

1. First, I shall consider what the term Church’s Thesis refers to. The term was first introduced by Kleene. In his “Recursive Predicates and Quantifiers” [Kleene 1943] he referred to [Church 1936] and formed the following thesis:

(T1) Every effectively calculable function (effectively decidable predicate) is general recursive.

Kleene [1952, p. 300] cited T1 and mentioned Church’s Thesis on [p. 314].1 Kleene made an explicit reference to [Church 1936]. Thus, it should be assumed that the term Church’s Thesis denotes what Church meant in [Church 1936].

∗ A. Olszewski, Pontifical Academy of Theology, Cracow, Poland.
1 Kleene discussed the Thesis in greater detail in paragraph 62 of [Kleene 1952]. In his [1967, p. 232] Kleene uses “Church’s Thesis” interchangeably with “Church–Turing Thesis”.
When he announced the abovementioned paper in the abstract submitted to the AMS, Church wrote (C1):2

In this paper a definition of recursive function of positive integers which is essentially Gödel’s is adopted. And it is maintained that the notion of an effectively calculable function of positive integers should be identified with that of a recursive function, since other plausible definitions of effective calculability turn out to yield notions which are either equivalent to or weaker than recursiveness.
The same paper of [1936] contains two versions of the Thesis, which I define as CT and CT1, respectively. Text (C2) is used to single out CT: We now define the notion [...] of an effectively calculable function of positive integers by identifying it with the notion of a recursive function of positive integers. [Church 1936].
How should text (C2) be understood? If we interpret it literally, we should assume that the procedure of defining consists in identifying notions (concepts). The term identifying can be interpreted in two different ways:

[i1] According to the first interpretation, identifying would involve ascertaining a certain fact, the relation between the concepts having existed prior to the act of defining. The Thesis would then be a reporting definition and as such could be declared true or false.

[i2] According to the second interpretation, identifying is the procedure used to determine the identity. In such a case, the precise concept of an effectively calculable function as defined in the Thesis may not exist before the identification. Such a precising definition is used to determine the identity by defining the manner in which a defined term is used. In such a case it makes no sense to consider whether it is true or false.

I intentionally use the term concept instead of notion due to the more technical character of the former,3 even though logicians and mathematicians use the latter term more frequently. The initial chapters of many mathematics textbooks tend to refer to concepts.4 Concepts appear as a basic tool for pursuing mathematics. G. Kreisel formed similar opinions, maintaining that intuitive concepts are what we have when we begin to pursue mathematics.5

Ad [i1]. If we accept the interpretation [i1] of identity we would come up with the following definition:

(CT) The concept of an effectively calculable function is identified [i1] with that of a recursive function.6

Schematically, this can be expressed by referring to the concept of an effectively calculable function (defined in the domain of natural numbers) as π.[calculable] and to the general recursive concept as π.[recursive]:7

(CT’) π.[calculable] = π.[recursive].

Footnote 3 in the paper dated 1936 contains the following text (C3) by Church:

As will appear, this definition of effective calculability can be stated in either of two equivalent forms, [...] (2) that a function of positive integers shall be called effectively calculable if it is recursive [...]. [Church 1936, p. 90]

2 [Church 1935, 333].
3 Technical in the sense that if we attempt to create a theory within a particular science we speak of a theory of concepts and not a theory of notions.
Incidentally, we should be aware of the way the word if is used in the text. According to Suppes [1957, p. 8], the implication (P → Q) corresponds, inter alia, to the English expression Q if P. Provided that Suppes is right, Church’s formulation in the abovementioned quotation expresses an obvious implication contained in Church’s Thesis.8 In fact, however, Kleene and others understood the abovementioned formulation to be an equivalence or reverse implication (for instance T1) because it is quite common among mathematicians to express equivalence in such a manner. The abovementioned quotation (C3) leads to the following formulation using T1:

(CT1) A function of positive integers is effectively calculable if and only if it is recursive.9

Church wrote that the CT definition could be shaped in the abovementioned manner. What made him adopt such an opinion? If two concepts πx.[F(x)] and πx.[G(x)] are identical, and we mark it as πx.[F(x)] = πx.[G(x)], there should be some rule which enables the transition to: F(x) ≡ G(x), for any x; where F and G are the concept words (Begriffsworte10) and the variable x runs over the universe of objects which fall under the relevant terms. In general, it can be expressed in the following manner:

(BL) If πx.[F(x)] = πx.[G(x)], then for any x, F(x) ≡ G(x).

However, such reasoning would still require more. If CT is to serve as the definition expressed in CT1 there should be an implication reverse to BL, namely:

(BL1) If for any x, F(x) ≡ G(x), then πx.[F(x)] = πx.[G(x)].

BL together with BL1 yields an equivalence. In this case, Church followed Frege, who believed that the identity of concepts could be brought nearer in this manner, although he never formulated such a law. Frege believed that both the BL and BL1 implications could be reconciled by his understanding of concepts. However, it is possible to imagine an understanding of concepts and a domain where BL1 would be false, for example equiangular triangle and equilateral triangle. It should also be noted that the idea of formulating Church’s Thesis in the form of an identity between two classes of functions, as sometimes happens, refers to Basic Law V which, in Frege’s system, resulted in a contradiction.

4 I looked through ten textbooks relating to different branches of mathematics and each of them made use of concepts. The authors of those textbooks that were written in English usually used the word notion, but I also came across the word concept, for instance [Hale and Koçak 1991, p. 3].
5 [Kreisel 1987, p. 499]; Kreisel uses the term notion.
6 In this paper, the term recursive function means general recursive function.
7 The notation with the π operator refers to lambda calculus notation. It is to play a role similar to the lambda operator. The notion is subordinate to the predicate.
8 Of course, if we understand Church’s Thesis as a form of equivalence, as is usually the case today, Kleene’s T1 is the more difficult implication of such an equivalence.
9 This formulation can be expressed in the following manner: ∀x (E(x) ≡ R(x)), where E is to be an effectively calculable predicate and R is to be a recursive predicate.
10 This is Frege’s terminology. Today we tend to speak of predicates.
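The step from the identity of concepts to CT1 discussed above can be set out schematically. The following LaTeX block is only a compact restatement of BL and BL1 in the text’s own notation; the row labels, the use of the amsmath align* environment, and writing the two concepts of CT’ as πx.[E(x)] and πx.[R(x)] (with E and R the predicates of footnote 9) are presentational assumptions, not part of Church’s or Frege’s formulations.

```latex
% Requires \usepackage{amsmath}. Notation as in the text:
% \pi x.[F(x)] is the concept expressed by the concept word F.
\begin{align*}
\text{(BL)}  \quad & \pi x.[F(x)] = \pi x.[G(x)] \;\Rightarrow\; \forall x\,\bigl(F(x) \equiv G(x)\bigr) \\
\text{(BL1)} \quad & \forall x\,\bigl(F(x) \equiv G(x)\bigr) \;\Rightarrow\; \pi x.[F(x)] = \pi x.[G(x)] \\
\text{(BL + BL1)} \quad & \pi x.[F(x)] = \pi x.[G(x)] \;\Leftrightarrow\; \forall x\,\bigl(F(x) \equiv G(x)\bigr) \\
% Assumed instance: F := E (effectively calculable), G := R (recursive).
\text{(CT$'$ and CT1)} \quad & \pi x.[E(x)] = \pi x.[R(x)] \;\Leftrightarrow\; \forall x\,\bigl(E(x) \equiv R(x)\bigr)
\end{align*}
```

Read this way, CT1 follows from CT’ by BL alone, whereas recovering CT’ from CT1 needs BL1, which is the direction that the example of equiangular and equilateral triangles calls into question.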
In order to bring the identity of the concepts πx.[F(x)] = πx.[G(x)] nearer, the relevant expressions F(x) and G(x) should be synonymous. In such a case, material equivalence is too weak a relation. What did Church have in mind when he wrote that the identification of two concepts “can be stated” in the form of material equivalence? As we can conjecture, the minimal condition he imposed on the statement of the Thesis was the equivalence between CT and CT1. It also seems he wanted to make it susceptible to logical research. At the time, Frege’s achievements offered the only serious approach to researching concepts, but they could not be applied to CT. This was due to the fact that Frege’s concept is essentially set-theoretical. However, we cannot exclude a different formulation of the Thesis, which provides for the development of a new theory of concepts, in fact a non-set-theoretical one. One example of such a theory is the suggestion made by Pavel Tichý and developed by Pavel Materna, according to which concepts that are indefinable in terms of set theory are understood to mean some kind of constructions or procedures (algorithms) used to identify the denotation of terms.11 Church’s Superthesis, as elaborated by G. Kreisel, is a partial step towards regarding CT as a form of identity between concepts.12

Ad [i2]. However, the abovementioned (C1), (C2), (C3) and the following (C4) seem to point to the fact that Church understood the thesis according to [i2]. In the case of the second interpretation of identity between concepts we come up with the following formulation of the Thesis:

(CT2) π.[calculable] =[id2] π.[recursive].

The following text (C4) concludes the first paragraph of Church’s paper of [1936]:

The purpose of the present paper is to propose a definition of effective calculability which is thought to correspond satisfactorily to the somewhat vague intuitive notion [...].

11 Cf. [Tichý 1988] and [Materna 1998]. Perhaps the research recently carried out on algorithms by, for instance, Yuri Gurevich will result in some interesting theory of concepts.
12 As discussed in [Odifreddi 1996, §1.4].
In (C4), the term definition seems to assume a different meaning. It is not only CT that serves as the definition and as such defines the relation between two concepts. In this case, the definition is the specification of recursive function because the specification is supposed to correspond to an intuitive concept. In this case, the definiens contained in CT is to function as the definition and the precise concept of the recursive function is intended to eliminate the vagueness of the concept of intuitive effective calculability.

Let us assume that the concept of an effectively calculable function is vague, as Church wants us to. Then, the ordered pair of sets (X, Y), where U is the set of all functions defined in the set of natural numbers, X ⊂ U is the set of functions which are certainly effectively calculable and Y ⊂ U is the set of functions which are certainly not effectively calculable, is meant as the denotation of the term effectively calculable function. In such a case, the vagueness consists in the fact that [U−(X∪Y)] ≠ ∅. If so, the difficulty in accepting the Thesis when formulated as CT1 (∀x (E(x) ≡ R(x))) would be due to the fact that we hardly know what the truth conditions for CT1 are. Let us take the nonrecursive function f ∈ [U−(X∪Y)]. We do not know how to determine the truth of the following sentence: E(f) → R(f). The consequent of the implication would be false, but there is some doubt concerning the logical value of the antecedent of the implication. On the other hand, if we accept CT2, the sentence E(f) → R(f) is true because it is synonymous with the tautology R(f) → R(f). Church does not seem to be absolutely certain what the definiendum in CT is in its form as a definition.
When he writes about the Thesis in the first paragraph of his paper of [1936], he refers to it as the definition of “effective calculability” and in the seventh paragraph he writes about “the effective calculable function of positive integers.” I mention this fact for form’s sake because it is known that the identification of the notion “effective calculability” with the notion of recursive function is false due to the existence of the effectively calculable function of the Gödel numbering.

2. The two textbooks, namely [Kleene 1952; 1967], contain a small number of Kleene’s statements concerning the philosophy of mathematics. He was extremely careful in shaping his philosophical opinions. For example, in his article on Gödel’s accomplishments Kleene [1976] wrote that he was not competent to report on the philosophical results of the Austrian logician. However, he maintained he had considered choosing mathematics and philosophy as his major field of study. We will nonetheless try to determine Kleene’s views on the existence of abstract objects on the basis of his brief comments on the matter. In [Kleene 1952, §6], in his comment on Kronecker’s statement according to which God created integers, Kleene noted that the discovery of the natural number sequence could not boil down to “[...] anything essentially more primitive than itself” [p. 19] (it refers at least to people). In this case, Kleene’s views appear to assume the form of structuralism. We are not investigating what natural numbers are but “[...] only how they form the natural number sequence” [p. 20]. Numbers are only the objects placed in that sequence. His structuralist viewpoint is even more profoundly felt in paragraph eight. A system of objects, which are not set individually and “[...] among which are established certain relationships” [p. 24], forms the basis for the investigations. In the case of abstract systems, the structure is investigated and it remains undefined what the objects are. The genetic and axiomatic methods of introducing systems of objects are used in mathematics. Thus, undefined, initial concepts are characterized only by axioms. Paragraph 60 of [Kleene 1952] describes a way of formalizing an informal predicate P(x), for instance a numerical predicate. The formal system for such a predicate must have a domain of “formal objects,” which are identified through the sentences P(0), P(1), P(2), etc. With regard to the language of the formal system, the formula A(x), which is used to express the predicate P(x), corresponds to the abovementioned predicate. It seems that the informal predicate is an intuitive concept formalized in this manner. Kleene does not mention the predicate as an abstract entity. It seems that the predicate is the concept a mathematician has at his/her disposal. It appears to denote a conceptualistic view.
In his [1967], Kleene clearly maintains that there is more mathematics than there is formal mathematics. To support his statement he argues that proving the satisfiability of an axiomatic system requires means from outside the system, in particular from another axiomatic system, and the satisfiability of this set of axioms requires, in turn, another system, and so on ad infinitum. He writes: “[...] if we are not to adopt a mathematical nihilism, formally axiomatized mathematics must not be the whole of mathematics. At some places there must be meaning, truth and falsehood [My emphasis AO]”.13 Although the abovementioned view offers some hope regarding the existence of abstract objects, it can also fit in with the conceptualistic view. Probably, it cannot fit in with the nominalistic view if we understand nominalism to mean the doctrine that no abstract objects (concepts, judgments) exist, so that semantics, mathematics and other similar sciences are to be explained without any reference thereto. I believe Kleene’s opinions demonstrate that in his view CT does not have any well-defined mathematical sense. It states the identity of two concepts and the concepts are not a well-defined universe of objects. The concepts can in fact be approached using an axiomatic method, but there is the issue that the relations between them are unclear. If concepts do not exist objectively, research carried out on them using the axiomatic method will always be arbitrary because there is no ‘target’ base for developing research on particular theories. The abovementioned argument also refers to the conceptualistic view of concepts and, in particular, to the CT2 version of the Thesis. Set theory can serve as an example: there, axiomatic strengthening can lead to, let’s say, some (independent) axiom A as well as to its negation. For this reason, Kleene centered his investigations of the Thesis on CT1, the formulation originating from Church. Studies of CT1 over the past seventy years have shown that this path is perhaps wrong given the insignificant progress achieved. The conciseness of Kleene’s statement concerning the existence of abstract objects is also evidence that the Thesis as expressed in the form of CT becomes ‘invisible’ to a mathematician (logician) who has the same attitude as Kleene. Such an investigator simply fails to notice it because it makes no sense to him. It means that we should search for a presupposition; i.e. a sentence that, when accepted, makes CT a sentence that is sensible to a logician.
The following sentence may serve as such a presupposition:

(PCT) There are concepts and they can be investigated using exact methods.

One test to distinguish a presupposition A of a sentence B (symbolically: BIA; A is presupposed by B or B presupposes A) from a consequence thereof is the fact that a consequence of a sentence is not a consequence of its negation, as long as the abovementioned consequence is not a tautology of logic. In our case, we come up with: CTIPCT and ¬CTIPCT. If we can accept CT ⊨ PCT, ¬CT ⊨ PCT does not have to be the case because it may be that concepts do not exist or cannot be investigated in any intersubjective sense. However, the negation of CT can remain true.

Question: Given the above, shouldn’t the paradigm of research on Church’s Thesis be altered by dealing first with PCT?

It seems that in 1935 Church also did not envision any hope of attacking the Thesis in its form as an identity of two concepts. According to Anderson [1998], he was initially a supporter of fictionalism, the doctrine that mathematical objects are fictions, a part of the abstract structure developed to enable an understanding of reality. Later, Church adopted a more realistic view. He seemed to believe that the issue of the existence of abstract objects (including whether general terms had designations) was in general illusory, but only if considered outside some specified theory. It was only on the basis of a theory properly tested by its consequences that the existence of objects stipulated by that theory was to be justified [Anderson 1998, p. 137]. Church in his [1951] tried to create a formal, interpreted system to describe a universe of abstract entities, including, in particular, concepts. I believe that throughout his scientific career, Church conducted logical research on abstract objects and intensional logics and perhaps it was related (at least indirectly) to his research on his Thesis.14 In [1951] and [1956] Church changed his understanding of concepts and officially broke with Frege’s conception. He eliminated Frege’s concepts and adopted the following definition based on Frege’s senses: The meaning of expression E = the concept of its denotation.15

3. Two interpretations of Church’s Thesis were considered in this paper. I have argued against Kleene’s formulation of the Thesis as inadequate. In the last part, Kleene’s implicit philosophical convictions and their impact on the formulation of the Thesis were brought out.

13 [Kleene 1967, p. 193].
14 Unlike Kleene, who did not deal with such matters. The abovementioned paper by Anderson is an excellent demonstration of the development of Church’s opinions on the issues in question.
15 Cf. [Church 1956, pp. 5–6]. I use the expression contained in the paper of Materna [1998, p. 15].
References

Anderson, C.A. [1998], “Alonzo Church’s Contributions to Philosophy and Intensional Logic”, The Bulletin of Symbolic Logic 4, 129–171.
Church, A. [1935], “An Unsolvable Problem of Elementary Number Theory”, Bulletin AMS 41, 332–333.
Church, A. [1936], “An Unsolvable Problem of Elementary Number Theory”, The American Journal of Mathematics 58, 345–363; reprint in [Davis 1965, pp. 88–107].
Church, A. [1951], “The Need for Abstract Entities”, American Academy of Arts and Sciences Proceedings 80, 100–113.
Church, A. [1956], Introduction to Mathematical Logic, Princeton.
Davis, M. (ed.) [1965], The Undecidable, Raven Press.
Hale, J. and Koçak, H. [1991], Dynamics and Bifurcations, Springer Verlag.
Kleene, S. [1943], “Recursive Predicates and Quantifiers”, Transactions of the American Mathematical Society 53, 41–73; reprint in [Davis 1965, pp. 255–287].
Kleene, S. [1952], Introduction to Metamathematics, Van Nostrand.
Kleene, S.C. [1967], Mathematical Logic, Wiley & Sons.
Kleene, S.C. [1976], “The Work of Kurt Gödel”, The Journal of Symbolic Logic 41, 761–778.
Kreisel, G. [1987], “Church’s Thesis and the Ideal of Informal Rigour”, Notre Dame Journal of Formal Logic 28, 499–512.
Materna, P. [1998], “Concepts and Objects”, Acta Philosophica Fennica 63, Helsinki.
Odifreddi, P.G. [1996], “Kreisel’s Church” in Kreiseliana, P.G. Odifreddi (ed.), AK Peters.
Suppes, P. [1957], Introduction to Logic, Van Nostrand.
Tichý, P. [1988], The Foundations of Frege’s Logic, de Gruyter, Berlin-New York.
Oron Shagrir∗
Gödel on Turing on Computability

1. Introduction

In section 9 of his paper, “On Computable Numbers,” Alan Turing outlined an argument for his version of what is now known as the Church–Turing thesis: “the [Turing machine] ‘computable’ numbers include all numbers which would naturally be regarded as computable” [p. 135]. The argument, which relies on an analysis of calculation by an (ideal) human, has attracted much attention in recent years.1 My aim here is to explain Gödel’s puzzling response to Turing’s argument. On the one hand, Gödel emphasizes at every opportunity that “the correct definition of mechanical computability was established beyond any doubt by Turing” [193?, p. 168]. On the other, in a short note entitled “A philosophical error in Turing’s work,” Gödel [1972a] criticizes Turing’s argument for this definition as resting on the dubious assumption that there is a finite number of states of mind. What are we to make of this apparent inconsistency in Gödel’s response to Turing’s argument? How could Gödel praise Turing’s definition of computability, yet maintain that Turing’s argument for this definition was flawed?

This paper is part of a wider project that aims to establish that, contrary to the widespread assumption, there is no unique concept of that which would naturally be regarded as computing. In earlier papers, I showed that the so-called informal, intuitive, or pre-theoretic notion of computing is understood differently in different contexts. What we take computing to be in the context of computer science2

∗ O. Shagrir, Departments of Philosophy and Cognitive Science, The Hebrew University of Jerusalem.
1 See, e.g., [Sieg 1994; 2002].
2 Shagrir [2002].
is different from what we take it to be in the context of physical computation3 or neuroscience.4 Further, the pioneers of computability, namely, Church, Gödel, Kleene, Post and Turing, had no shared conception of computing. In the present paper, I will argue that Gödel and Turing anchor finiteness constraints on computability in different considerations. For Turing, these constraints are imposed by limitations on human cognitive capacities, whereas for Gödel, they arise out of foundational work in mathematics in which they are invoked, principally Hilbert’s finitistic program.

The paper is structured as follows. I start with a brief review of Turing’s analysis of computability (section 2). I consider the question of why Gödel praises Turing’s analysis (section 3), and then address Gödel’s critique of Turing’s argument (section 4). In the last section (section 5), I reconcile this apparent inconsistency, suggesting that Gödel endorsed the finiteness constraints set down by Turing, but rejected Turing’s claim that they arise out of the limitations imposed by human processing capacities.
2. Turing’s Analysis of Computability

Four articles published in 1936 put forward precise mathematical characterizations of, to use Turing’s idiom, that which would naturally be regarded as computable, or what we now call “effective computability.” Church [1936a] characterized the computable in terms of lambda-definability,5 Kleene [1936] characterized the computable in terms of general recursive functions,6 Post [1936] characterized it in terms of combinatory processes, and Turing [1936] in terms of Turing machines.7 Turing wrote the paper when he was a student at Cambridge, and unaware of the other works. He was attempting to solve a problem he had learned of in a course taught by M.H.A. Newman.8 The problem, referred to by Hilbert and his disciples as the Entscheidungsproblem, is now known as the decidability of first-order predicate logic.9 To show that the problem has no solution, it is necessary to provide a precise characterization of that which can be solved by an effective (algorithmic) procedure, of which there are infinitely many. Church [1936b] proved the unsolvability of the Entscheidungsproblem by utilizing the notions of lambda-definability and recursiveness. Turing came up with a different approach: he reduced the concept of an algorithmic procedure to that of a Turing machine, and then proved that no Turing machine can solve the Entscheidungsproblem.10

Turing opens his paper with this statement: “The ‘computable’ numbers may be described briefly as the real numbers whose expressions as a decimal are calculable by finite means” [1936, p. 116]. It quickly becomes apparent that Turing associates the intuitive idea of computability with calculability by humans.11 He sets out to provide an explicit definition of computability in terms of (Turing) machine computability, and asserts that “the justification [for this definition] lies in the fact that the human memory is necessarily limited” [p. 117]. Turing then compares “a man in the process of computing a real number” to “a machine which is only capable of a finite number of conditions” [p. 117]; after an informal exposition of the machine’s operations, he states that “these operations include all those which are used in the computation of a number” [p. 118]. The justification for this contention is not produced until section 9. In the interim, Turing provides a mathematical characterization of his machines, proves that the set of these machines is enumerable, shows that there is a universal (Turing) machine, and describes it in detail. He formulates the halting problem, and proves that it cannot be decided by a Turing machine. On the basis of that proof, Turing arrives, in section 11, at his ultimate goal: proving that the Entscheidungsproblem is unsolvable.

3 Shagrir and Pitowsky [2003].
4 Shagrir [forthcoming].
5 Church started this work in the early 1930’s and completed it with Kleene, who was his student at Princeton; see [Kleene 1981].
6 This characterization is based on the expansion of primitive recursiveness by Herbrand [1931] and Gödel [1931, 1934]; for some of the history see [Kleene 1981].
7 The term ‘Turing machine’ first appears in Church [1937a], a review of Turing [1936].
8 Hodges [1983, pp. 90ff].
In section 9 Turing turns to the task of justifying the identification of what would naturally be regarded as computable with Turing machine computability. He adduces three arguments, with the caveat that “all arguments which can be given are bound to

9 Hilbert and Ackermann [1928].
10 Turing proved the equivalence of Turing machine computability with both lambda-computability and recursiveness in the appendix he added to the 1936 paper after becoming aware of the other work; see [Kleene 1981] for the historical details.
11 See also [Turing 1947, p. 107].
be, fundamentally, appeals to intuition, and for this reason rather unsatisfactory mathematically” [p. 135]. Turing presents the first argument, on which I will focus in this paper, in part I of section 9, with a modification being added in part III. He characterizes it as a “direct appeal to intuition” and “only an elaboration” of the ideas he had put forward in section 1 [p. 135]. The second argument, in part II, is a proof that all Turing machines can be embedded in first-order logic (“restricted Hilbert functional calculus”). The third argument, in section 10, consists of “examples of large classes of numbers which are computable” [p. 135].

Turing’s argument rests on three ingenious ideas. The first is that computation is defined over symbols and carried out by an agent: this agent is a human computer. Thus an adequate characterization of computability should rely on an analysis of human computability. The second idea is that to characterize the computable functions (or numbers, or relations), we should focus on the relevant computational processes: “The real question at issue is ‘What are the possible processes which can be carried out in computing a number?’” [p. 135]. The problem, of course, is to characterize the set of infinitely many different computational processes. Here, Turing’s third idea kicks in. Instead of trying to capture all possible processes, Turing offers a small number of constraints on the process of computation. The pertinent constraints must have two properties. First, they must be sufficiently general that their truth will be virtually self-evident. Second, they must be such that when satisfied, the operations of the computing agent can be mimicked by a Turing machine, in other words, the computed function must be Turing-machine computable. Turing’s argument can be summarized as follows:

Premise 1 (Thesis H): A human computer satisfies certain constraints.
Premise 2 (Turing’s theorem): The functions computable by a computer satisfying these constraints are Turing-machine computable.
Conclusion (Turing’s thesis): The functions computable by a human computer are Turing-machine computable.

What are the constraints on computation? Turing begins his analysis with the observation that “computing is normally done by
writing certain symbols on paper. We may suppose this paper is divided into squares like a child’s arithmetic book” [p. 135]. He remarks that the paper is normally two-dimensional, but we can safely assume, with no loss of generality, that the paper is one-dimensional, i.e., a tape divided into squares. We can also assume that “the number of symbols which may be printed is finite” [p. 135], as we can always use more squares if the string of symbols is lengthy. We can thus assume, with no loss of generality, that we can print at most one symbol in each square.

Turing then sets out a number of more specific constraints. The first is a sort of determinism: “the behavior of the computer at any moment is determined by the symbols which he is observing, and his ‘state of mind’ at that moment” [p. 136]. He then formulates two restrictions on the conditioning states, i.e., the observed symbols on the tape and the computer’s own states of mind, and two on the computer’s operations. The first restriction is that “there is a bound B to the number of symbols or squares which the computer can observe at one moment. If he wishes to observe more, he must use successive observations” [p. 136]. This restriction is motivated by the limitations on our sensory, mainly visual, apparatus, and, in particular, the fact that the visual field contains a limited number of symbols or squares that a human can recognize. The second restriction is that “the number of states of mind which need be taken into account is finite” [p. 136]. Turing justifies the restriction with the obscure statement that “if we admitted an infinity of states of mind, some of them will be ‘arbitrarily close’ and will be confused” [p. 136]. He adds that “the restriction is not one which seriously affects computation, since the use of more complicated states of mind can be avoided by writing more symbols on the tape” [p. 136].
In part III of section 9 Turing takes this suggestion a step further, replacing states of mind with a “more physical and definite counterpart” [p. 139], namely, symbolic expressions that can be written on the tape. A crucial assumption is being made here, the assumption that the externalized expressions are finite, but Turing provides little support for it. He seems to appeal to the second argument, found in part II of section 9, according to which any state of a Turing machine is convertible into a formula in first-order predicate logic (and vice versa), but this is just begging the question, for the
problematic assumption is the finite character of states of mind, not Turing machines. This will be further addressed below. Turing proceeds to specify restrictions on the possible operations of a computer, asking us “to imagine the operations the computer performs are to be split up into ‘simple operations’” [p. 136]. “Every such operation,” he asserts, “consists of some change of the physical system consisting of the computer and his tape.” These changes are of three types: changes in the symbols on the tape, changes in the squares being observed, and changes in the computer’s state of mind. As to changes in the symbols, Turing suggests that “the squares whose symbols are changed are always ‘observed’ squares.” As to changes in the squares being observed, Turing suggests that the distance between new and previously-observed squares “does not exceed a certain fixed amount,” i.e., it is bounded. And regarding changes in the computer’s state of mind, he says only that such changes can be conjoined with either of the other two types of changes [p. 137]. Given these restrictions, Turing undertakes to establish the second premise, namely, that we can “construct a [Turing] machine to do the work of this [human] computer” [p. 137]. Turing argues for this only briefly, but the idea is amazingly simple: the constraints imply that there are only finitely many types of changes, or atomic operations, that a computer can undergo. It is thus not difficult to agree that these operations can be mimicked by a Turing machine. It is true, as Kleene later explains in his discussion of Turing, that “the human computer is less restricted in behavior than the machine” [1952, p. 377]. In particular, Kleene claims, a human can (a) observe more than one token symbol at a time; (b) perform more complicated atomic acts; (c) use tape that is multi-dimensional; and (d) choose a variety of symbolic representations. 
But we can reduce these operations to those that can be executed by a Turing machine. Turing takes this reduction to be non-problematic, stating that every action performed by the human computer can be reduced to a finite number of successive steps by a Turing machine. Kleene [1952, pp. 378–381] shows in detail how this reduction is to be carried out with respect to each of the four features of human computing just mentioned. Gandy [1980] and more explicitly, Sieg [2002], formalize the restrictive conditions and provide a detailed mathematical
proof of the second premise.12 We can thus call this result Turing’s theorem.
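The finiteness constraints reviewed in this section can be made concrete with a small simulation. What follows is only an illustrative sketch in Python, not Turing’s own formalism or the Gandy/Sieg formalization: the names `run_turing_machine`, `SUCCESSOR_RULES`, `BLANK` and the `max_steps` safeguard are assumptions introduced here. The machine has finitely many states and symbols, observes one square at a time, changes only the observed square, and moves at most one square per step.

```python
from collections import defaultdict

BLANK = "_"  # hypothetical blank symbol, chosen for this sketch


def run_turing_machine(rules, tape, state="start", halt="halt", max_steps=10_000):
    """Run a one-tape machine that obeys the finiteness constraints above.

    rules is a finite table mapping (state, observed_symbol) to
    (symbol_to_write, head_move, next_state), with head_move in {-1, 0, +1}.
    """
    # The tape is unbounded in principle, but only finitely many squares
    # are ever non-blank at any stage of the computation.
    cells = defaultdict(lambda: BLANK, enumerate(tape))
    head = 0
    steps = 0
    while state != halt:
        if steps >= max_steps:
            raise RuntimeError("step budget exhausted (the machine may not halt)")
        # Determinism: behavior depends only on the state and the observed symbol.
        write, move, state = rules[(state, cells[head])]
        cells[head] = write  # only the observed square is changed
        head += move         # the head moves a bounded distance
        steps += 1
    lo, hi = min(cells), max(cells)
    return "".join(cells[i] for i in range(lo, hi + 1)).strip(BLANK)


# Illustrative rule table: the successor function on unary numerals.
SUCCESSOR_RULES = {
    ("start", "1"): ("1", +1, "start"),   # skip over the existing strokes
    ("start", BLANK): ("1", 0, "halt"),   # append one stroke and stop
}

print(run_turing_machine(SUCCESSOR_RULES, "111"))
```

The rule table is a finite dictionary, which is the programmatic counterpart of the claim that there are only finitely many types of atomic operations. The step budget is a practical safeguard of this sketch only; it plays no role in Turing’s analysis.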
3. Gödel on Turing’s Analysis of Computability

It is well known that a statement that seems to be much like the Church–Turing thesis appears in the printed version of Gödel’s 1934 Princeton lectures. In the body of the paper, Gödel formulates what is generally taken to be the “easy” part of the Church–Turing thesis: “[primitive] recursive functions have the important property that, for each given set of values of the arguments, the value of the function can be computed by a finite procedure” [p. 348]. In a footnote to this statement, Gödel remarks that “the converse seems to be true if, besides [primitive] recursions [...] recursions of other forms (e.g., with respect to two variables simultaneously) are admitted [i.e., general recursions]. This cannot be proved, since the notion of finite computation is not defined, but it serves as a heuristic principle” [p. 348, note 3]. However, in a letter to Martin Davis (February 15, 1965), Gödel denies that the 1934 paper anticipated the Church–Turing thesis:

It is not true that footnote 3 is a statement of Church’s Thesis. The conjecture stated there only refers to the equivalence of ‘finite (computation) procedure’ and ‘recursive procedure.’ However, I was, at the time of these lectures, not at all convinced that my concept of recursion comprises all possible recursions; and in fact the equivalence between my definition and Kleene [1936] is not quite trivial. [Davis 1982, p. 8]
Indeed, Church, who met with Gödel, apparently early in 1934,13 commented on the encounter in a letter to Kleene [dated November 29, 1935]:

In regard to Gödel and the notions of recursiveness and effective calculability, the history is the following. In discussion with him [sic] the notion of lambda-definability, it developed that there was no good definition of effective calculability. My

12 Gandy does not invoke Turing’s restrictions directly but they can be derived from his conditions on machine computation by adding more restrictions; see [Sieg and Byrnes 1999].
13 Sieg [1997, p. 160, note 9].
proposal that lambda-definability be taken as a definition of it he regarded as thoroughly unsatisfactory.14
But before too long, Gödel’s attitude changed. In an unpublished paper, probably from 1938, he writes:

When I first published my paper about undecidable propositions the result could not be pronounced in this generality, because for the notions of mechanical procedure and of formal system no mathematically satisfactory definition had been given at that time. This gap has since been filled by Herbrand, Church and Turing [193?, p. 166].15
Thus just four years after having rejected it, Gödel embraces Church’s proposal, attributing the “mathematically satisfactory definition” of computability to Herbrand, Church and Turing. Why did Gödel change his mind? It is difficult to say with certainty, but clearly, Turing’s work was a significant factor. Initially, he mentions Turing together with Herbrand and Church, but a few pages later Gödel refers to Turing’s work as having demonstrated the correctness of the mathematical definition(s): “that this really is the correct definition of mechanical computability was established beyond any doubt by Turing” [p. 168]. More specifically:

[Turing] has shown that the computable functions defined in this way are exactly those for which you can construct a machine with a finite number of parts which will do the following thing. If you write down any number n1, ..., nr on a slip of paper and put the slip into the machine and turn the crank, then after a finite number of turns the machine will stop and the value of the function for the argument n1, ..., nr will be printed on the paper [p. 168].
From this point on, Gödel refers to Turing’s work as establishing the “correct definition” of mechanical computability. In his Gibbs Lecture, for instance, he declares:

The greatest improvement was made possible through the precise definition of the concept of finite procedure, which plays a decisive role in these results. There are several different ways

14 [Davis 1982, p. 9].
15 Davis dates the article to 1938; see his introduction to [Gödel 193?].
of arriving at such a definition, which, however, all lead to exactly the same concept. The most satisfactory way, in my opinion, is that of reducing the concept of finite procedure to that of a machine with a finite number of parts, as has been done by the British mathematician Turing [Gödel 1951, pp. 304–305].
And in his 1964 postscript to the 1934 article, Gödel reaffirms:

In consequence of later advances, in particular of the fact that, due to A.M. Turing’s work, a precise and unquestionably adequate definition of the general concept of formal system can now be given, the existence of undecidable arithmetical propositions and the non-demonstrability of the consistency of a system in the same system can now be proved rigorously for every consistent formal system containing a certain amount of finitary number theory. Turing’s work gives an analysis of the concept of ‘mechanical procedure’ (alias ‘algorithm’ or ‘computation procedure’ or ‘finite combinatorial procedure’). This concept is shown to be equivalent with that of a ‘Turing machine’ [pp. 369–370].
Generally speaking, whether we use Turing’s precise definition of computability or other definitions, e.g., general recursiveness, is irrelevant from Gödel’s perspective.16 After all, they have been proven to be extensionally equivalent. Yet it is Turing’s analysis that Gödel consistently appeals to as accounting for the correctness of these equivalent definitions of computability. This is a significant point. Turing’s argument has been fully appreciated only recently.17 Turing’s contemporaries mention the analysis, along with other familiar arguments, e.g., the confluence of different notions, quasi-empirical evidence, and Church’s step-by-step argument, but do not ascribe

16 In his “Remarks before the Princeton bicentennial conference on problems in mathematics” [Gödel 1946], Gödel, again, does not give priority to Turing computability over the other definitions: “Tarski has stressed in his lecture (and I think justly) the great importance of the concept of general recursiveness (or Turing’s computability)” [p. 150].
17 Turing’s argument has been praised by, e.g., Copeland [2002a], Shagrir [2002] and Sieg [2002]; Gandy [1988], Sieg [1994] and Soare [1996] even take it to prove the Church–Turing thesis. The rediscovery of Turing’s analysis is underscored in Martin Davis’s comment [1982, p. 14, note 15] that “this [Turing’s] analysis is still very much worth reading. I regard my having failed to mention this analysis in my introduction to Turing’s paper in Davis [1965] as an embarrassing omission.”
it any special merit.18 Logic and computer science textbooks from the decades following the pioneering work of the 1930s ignore it altogether.19 In light of this, Gödel’s praise of Turing’s analysis as the decisive argument for the definition of computability is noteworthy.20 Nevertheless, as Gödel does not explain just what it is that makes Turing’s conceptual analysis so convincing, let me begin by clarifying this.

First, Gödel favors conceptual analyses over other arguments. He sees the problem of defining computability as “an excellent example [...] of a concept which did not appear sharp to us but has become so as a result of a careful reflection” [Wang 1974, p. 84]. He also says that “Turing’s work gives an analysis of the concept of ‘mechanical procedure.’ [...] This concept is shown to be equivalent with that of a Turing machine” (my emphasis),21 and that it is “absolutely impossible that anybody who understands the question and knows Turing’s definition should decide for a different concept” [Wang 1974, p. 84].

Second, Turing’s analysis is based on an axiomatic approach, the constraints being formulated as basic axioms. This is the approach Gödel recommends in his 1934 conversation with Church, in which he rejects Church’s proposal as “thoroughly unsatisfactory.” Gödel suggests to Church that “it might be possible, in terms of an effective calculability as an undefined notion, to state a set of axioms which would embody the generally accepted

18 Church describes Turing’s identification of effectiveness with Turing-machine computability as “evident immediately” [1937a, p. 43], “an adequate representation of the ordinary notion” [1937b, p. 43], and as having “more immediate intuitive appeal” [1941, p. 41], but does not say it is more convincing than other arguments; see also [Kleene 1952].
19 Turing’s argument is mentioned in the early days of automata theory, e.g., by McCulloch and Pitts [1943], Shannon and McCarthy [1956], in their introduction to Automata Studies, and Minsky [1967, pp. 108–111], who cites it almost in full in his Finite and Infinite Machines. Yet even Minsky asserts that the “strongest argument in favor of Turing’s thesis is the fact that [...] satisfactory definitions of ‘effective procedure’ have turned out to be equivalent” [p. 111]. After Minsky, there is no mention of Turing’s argument in logic and computer science textbooks. The two arguments always given for the Church–Turing thesis are the confluence (equivalence) of definitions, and the lack of counterexamples; see, e.g., [Boolos and Jeffrey 1989, p. 20], and [Lewis and Papadimitriou 1981, pp. 223–224].
20 Gödel doesn’t mention the confluence of definitions and other quasi-empirical evidence as supporting the correctness of the precise definitions in either the Gibbs Lecture or the 1964 postscript to the 1934 paper.
21 In a footnote to this sentence [note 35, p. 370] Gödel refers readers to Turing’s section 9, where the analysis of computability is carried out. He also mentions “the almost simultaneous paper by E.L. Post [1936].”
properties of this notion, and to do something on that basis” [Davis 1982, p. 9].22

Third, Turing’s definition highlights what Gödel takes to be the defining properties of a formal system: being governed by a procedure that is finite and mechanical. Let me elaborate. There is an important difference in the contexts in which Turing and Gödel invoke the definition of computability. While Turing defines computability in the context of the Entscheidungsproblem, Gödel sees the significance of Turing’s definition in the context of the generality of the incompleteness theorems.23 As he says, the incompleteness results “could not be pronounced in this generality,”24 until the “gap” was “filled by Herbrand, Church and Turing.” But how exactly has the gap been filled? When Gödel discusses formal systems, even before 1936, he invokes two essential properties. The property of being finite is stressed in 1934, where Gödel opens his address with a characterization of a “formal mathematical system” [p. 346]:

We require that the rules of inference, and the definitions of meaningful formulas and axioms, be constructive; that is, for each rule of inference there shall be a finite procedure for determining whether a given formula B is an immediate consequence (by that rule) of given formulas A1, ..., An, and there shall be a finite procedure for determining whether a given formula A is a meaningful formula or an axiom. [p. 346]
The property of being mechanical is spelled out in Gödel’s [1933b] address to the Mathematical Association of America (“The Present

22 Sieg [2002, p. 400] has suggested that Turing’s analysis meets Gödel’s desideratum.
23 Turing refers to Gödel’s theorems [1936, p. 145], but does not mention the importance of his work for establishing their generality. At the time, Gödel was interested in classes of formulas he took to be decidable (see Goldfarb’s introductory note to Gödel [1932, 1933a]), but was not concerned with the Entscheidungsproblem; see also [Gandy 1988, pp. 63–64].
24 Gödel’s results were published in 1931 under the title “On formally undecidable propositions of Principia Mathematica and related systems I.” As the title implies, the results apply to “Principia Mathematica and related systems,” and, more precisely, to the formal system P, which is “essentially the system obtained when the logic of PM is superposed upon the Peano axioms” [1931, p. 151], and its extensions, which are the “ω-consistent systems that result from P when [primitive] recursively definable classes of axioms are added” [p. 185, note 53].
Situation in the Foundations of Mathematics”). He opens the address with a rough characterization of formal systems, pointing out that the “outstanding feature of the rules of inference being that they are purely formal, i.e., refer only to the outward structure of the formulas, not to their meaning, so that they could be applied by someone who knew nothing about mathematics, or by a machine” [p. 45].

From Gödel’s perspective, Turing, in defining computability in terms of “a machine with a finite number of parts” (emphasis added), i.e., in terms of a Turing machine, provides a precise mathematical characterization of a procedure that explicitly satisfies both constraints. As such, Turing provides “a precise and unquestionably adequate definition of the general concept of formal system,” hence the incompleteness results are true for “every consistent formal system containing a certain amount of finitary number theory.”25

It is important to note, however, that although Gödel vacillates between the terms ‘mechanical procedure’ and ‘finite procedure’ when referring to formal systems,26 he makes clear that they are not synonymous.27 Moreover, while Gödel takes the finite and mechanical constraints to go hand in hand in the context of a formal system,

25 Or as Gödel puts it in the 1964 postscript to 1934: “Turing’s work gives an analysis of the concept of ‘mechanical procedure’ (alias ‘algorithm’ or ‘computation procedure’ or ‘finite combinatorial procedure’). This concept is shown to be equivalent with that of a ‘Turing machine.’ A formal system can simply be defined to be any mechanical procedure for producing formulas, called provable formulas” [pp. 369–370]. See also his Gibbs Lecture: “This requirement for the rules and axioms is equivalent to the requirement that it should be possible to build a finite machine, in the precise sense of a ‘Turing machine,’ which will write down all the consequences of the axioms one after the other” [1951, p. 308].
26 In 1934 he says “finite procedure” and “finite computation,” in 1938, “mechanical procedure,” in 1951 he reverts to “finite procedure,” and in 1964 to “‘mechanical procedure’ (alias ‘algorithm’ or ‘computation procedure’ or ‘finite combinatorial procedure’).”
27 When explaining, in the postscript to the 1934 paper, the relevance of Turing’s characterization to the (incompleteness) results, Gödel states that “for any formal system in this sense [of mechanical procedure] there exists one in the sense of page 346 above that has the same provable formulas (and likewise vice versa), provided the term ‘finite procedure’ occurring on page 346 is understood to mean ‘mechanical procedure.’ This meaning, however, is required by the concept of formal system, whose essence it is that reasoning is completely replaced by mechanical operations on formulas” [1964, p. 370]. Further down the same page Gödel reiterates the point, saying that “if ‘finite procedure’ is understood to mean ‘mechanical procedure,’ the question raised in footnote 3 can be answered affirmatively.” Footnote 3 states Gödel’s 1934 assertion that finite computation is included in recursiveness.
in the sense that the procedure must satisfy both constraints, he does not maintain that they coincide in other contexts. In particular, he insists that there are finite procedures that are non-mechanical: “the question of whether there exist finite non-mechanical procedures, not equivalent with any algorithm, has nothing whatsoever to do with the adequacy of the definition of ‘formal system’ and of ‘mechanical procedure’” [Gödel 1964, p. 370]. In a footnote he refers to his 1958 article, where he posits procedures that can be construed as fulfilling the finiteness requirement, in that they are constructive, yet are non-mechanical in that they “involve the use of abstract terms on the basis of their meaning” [ibid., note 36].28 As we will now see, these procedures play an important role in Gödel’s critique of Turing.
4. A Philosophical Error in Turing’s Work

In 1972, Gödel publishes three short remarks on the undecidability results [1972a]. The third is entitled “A philosophical error in Turing’s work.” Gödel declares:

Turing in his [1936, section 9] gives an argument which is supposed to show that mental procedures cannot go beyond mechanical procedures. However, this argument is inconclusive. What Turing disregards completely is the fact that mind, in its use, is not static, but constantly developing, i.e., that we understand abstract terms more and more precisely as we go on using them, and that more and more abstract terms enter the sphere of our understanding. There may exist systematic methods of actualizing this development, which could form part of the procedure. Therefore, although at each stage the number and precision of the abstract terms at our disposal may be finite, both (and, therefore, also Turing’s number of distinguishable states of mind) may converge toward infinity in the course of the application of the procedure [p. 306].
What could have motivated Gödel to ascribe this philosophical error to the very analysis he otherwise praises? Fortunately, there is no need to speculate: the answer is explicitly articulated in Gödel’s

28 See also [Gödel 1958], especially [p. 245], and [1972b, pp. 271–275]. These procedures are put forward in the context of expanding Hilbert’s finitary proof methods to include more than just computational procedures; see [Bernays 1935], [Zach 2003].
406
Oron Shagrir
conversation with Wang [1974, pp. 324–326]. G¨odel contends that the incompleteness results imply that “either the human mind surpasses all machines (to be more precise: it can decide more numbertheoretical questions than any machine) or else there exist number theoretical questions undecidable for the human mind” [p. 324].29 He consistently and unambiguously chooses the former option, averring that “Hilbert was right in rejecting the second alternative” [p. 324], namely, the claim that there are unanswerable mathematical questions. The results, he maintains, indeed indicate that some questions are not decidable in the sense of solvable by means of a formalism, or as we can now put it, by means of a finite machine, viz., a Turing machine. But they do not imply that such questions are “absolutely unsolvable,” namely, cannot be answered by other means. In the 1964 postscript, he expresses this view as follows: “the results mentioned in this postscript do not establish any bounds for the powers of human reason, but rather for the potentialities of pure formalism in mathematics” [p. 370].30 The human mind, according to G¨odel, “infinitely surpasses the powers of any finite machine” [1951, p. 310]. This makes it readily apparent what bothers G¨odel about Turing’s argument. The concern is that any system constrained by the finiteness conditions set down by Turing cannot transcend the computable. If these constraints indeed apply to the human mind, it cannot surpass the powers of a Turing machine.31 To avoid this conclusion, G¨ odel rejects the constraint on the number of states of mind, namely, the finitude of human memory, assuming instead that there are finite (constructive) but non-mechanical mental procedures that enable the mind to infinitely surpass the powers of any finite ma29
In his Gibbs Lecture G¨ odel expresses the dilemma thus: either the “human mind (even within the realm of pure mathematics) infinitely surpasses the powers of any finite machine, or else there exist absolutely unsolvable diophantine problems of the type specified ” [1951, p. 310]. 30 In a footnote [1972a, p. 306, note 2], G¨ odel explains that the third remark can be seen as a footnote to this sentence. 31 In a conversation with Wang, G¨ odel says: “Turing’s argument becomes valid under two additional assumptions, which today are generally accepted, namely: 1 There is no mind separate from matter. 2. The brain functions basically like a digital computer. (2 may be replaced by: 2’ The physical laws, in their observable consequences, have a finite limit of precision.)” According to Wang, however, ”while G¨ odel thinks that 2 is very likely and 2’ practically certain, he believes that 1 is a prejudice of our time, which will be disproved scientifically” [Wang 1974, p. 326]. Thus G¨ odel apparently holds that Turing’s constraints, or a version of them, apply to the brain, but not the mind.
¨ del on Turing on Computability Go
407
chine. G¨ odel admits that “construction of a well-defined procedure which could actually be carried out (and would yield a non-recursive number-theoretic function) would require a substantial advance in our understanding of the basic concepts of mathematics” [1972a, p. 306]. He mentions two possible examples of such procedures: “the process of forming stronger and stronger axioms of infinity in set theory,” and “the process of systematically constructing, by their distinguished sequences αn → α, all recursive ordinals α of the second number-class” [1972a, p. 306].32 Whether Turing, in 1936, sought to provide an argument to the effect that mental procedures cannot go beyond mechanical procedures, is open to question, as is the degree to which his analysis of computability motivated this claim, if he indeed made it.33 Also open to question is whether we can avoid this mechanistic conclusion by invoking non-mechanical procedures.34 Our question here, however, is different, and pertains to the duality in G¨odel’s response to Turing. On the one hand, as we saw, G¨odel repeatedly praises Turing’s analysis of computability, saying it produced a “correct and unique” definition of “the concept of mechanical” in terms of “the sharp concept of ‘performable by a Turing machine’” [Wang 1974, p. 84]. On the other, he deems “fallacious” Turing’s “alleged proof” for the “equivalence of minds and machines,” which rests on the very same analysis [Wang 1974, p. 325]. How could G¨odel praise Turing’s 32
For more examples, see [Wang 1974, pp. 325–326]. Although it has been claimed that Turing [1950] endorsed a mechanistic view of the mind, namely, the view that thought consists of finite and mechanical procedures, it is fairly clear that this claim is incorrect. It is also doubtful that G¨ odel attributes this view to Turing. What G¨ odel apparently attributes to Turing is the weaker view that “mental procedures cannot go beyond mechanical procedures” in the more behavioristic sense that the behavior generated by mental procedures would not be distinguishable from that generated by mechanical procedures. G¨ odel might have thought that this weaker view, which Turing apparently advances in his 1950 Mind article, was expressed in the 1936 article, and motivated by the claim (which G¨ odel does attribute to Turing) that the finiteness constraints on mechanical procedures are general limitations on the human mind. Whether Turing really made the latter claim is a matter of controversy, and is briefly discussed in the next section [see note 40]. 34 Webb [1980] argues that even if the mind is capable of infinitely many states “it would still have to be shown that we could make effective use of all these states” [p. 223]. Kleene [1987] argues that since the methods for converging to infinity are not explicitly stated, what G¨ odel is contemplating is just “pie in the sky” [p. 494]. 33
408
Oron Shagrir
definition but not Turing’s argument? How could he invoke Turing’s analysis of computability and reduction of “the concept of finite procedure to that of a machine with a finite number of parts,” while at the same time rejecting what seems to be a key element of this analysis? In his introductory note to G¨odel [1972a], Webb suggests that G¨odel was of the opinion that “all Turing was really analyzing was the concept of ‘mechanical procedure,’ but in his arguments for the adequacy of his analysis he overstepped himself by dragging in the mental life of a human computer” [1990, p. 302]. This, I think, is exactly right, but the question, as Webb notes, remains: how could G¨odel “enjoy the generality conferred on his results by Turing’s work, despite the error of its ways”? [1990, p. 293].
5. Making (More) Sense of Gödel’s Comments

An obvious way to reconcile the disparity is to appeal to the distinction between Turing’s argument in part I of section 9 (henceforth, type I argument), where Turing assumes that “the number of states of mind which need be taken into account is finite” [p. 136], and its modification in part III of section 9 (henceforth, type III), in which “we avoid introducing the ‘state of mind’ by considering a more physical and definite counterpart of it” [p. 139], i.e., written symbolic instructions. Thus, according to Webb, Feferman maintains that “Gödel rejected only Turing’s type I argument, while accepting his ‘more physical’ type III argument” [1990, p. 297], and Webb concurs that “Gödel took issue with [...] [Turing’s] type I argument” [1990, p. 302].

If this suggestion is correct, Gödel rejects the type I argument because it anchors the finite and mechanical nature of computational procedures in the assumption that human memory is necessarily limited, and, in particular, the number of states of mind is bounded. He embraces the type III argument because it does not rest on this assumption, but on a “more physical and definite counterpart of it.” What must still be explained, however, is how the “more physical and definite counterpart” anchors the finite and mechanical nature of the computation procedure. Is it not possible that there is a computation procedure that is either non-finite or non-mechanical? As we saw, Gödel himself alludes to constructive procedures, which can be written down in finitely many symbols, but whose execution is non-mechanical inasmuch as it appeals to the symbols’ meanings. There can also be a mechanical procedure, consisting of infinitely many conditions, that produces a solution for the halting problem.35 So what could justify the finite and mechanical nature of the procedure, if not the fact that human memory is necessarily limited?

35 See, e.g., [Shagrir and Pitowsky 2003]. A simpler example is a machine with infinitely many states, where each state n encodes the self-halting state of the nth Turing machine. Thus given input n, the machine moves n states (in n steps) and produces the self-halting state of the nth Turing machine.

One answer might be that the finite and mechanical nature of the computation procedure is entailed by Turing’s other constraints. Sieg [2002], taking this tack, has reformulated Turing’s constraints, dropping the requirement about states of mind and memory altogether.36 The claim being made, then, is that Turing’s type III argument simply highlights the fact that this requirement is redundant. But this answer is unsatisfactory. First, it has been shown that the other constraints do not alone suffice to establish the identification of effectiveness and Turing-machine computability.37 Second, this answer does not explain Gödel’s response to Turing. For if the requirement on states of mind is redundant, then, from Gödel’s perspective, the other constraints should suffice to establish that the mind cannot surpass the powers of a Turing machine.

36 In his recent presentation of Turing’s analysis, Sieg [2002, p. 396] invokes only two restrictive conditions: boundedness (“there is a fixed bound on the number of configurations a computor [human computer] can immediately recognize”), and locality (“a computor can change only immediately recognizable (sub-) configurations”).
37 See [Shagrir and Pitowsky 2003], who demonstrate that without the two general constraints, finite-and-mechanical procedure, and finitely many steps in a finite time, Turing-machine computability can be surpassed. An infinite (yet mechanical!) procedure that encodes the infinitely many self-halting states of all machines can be used to compute the self-halting problem without violating boundedness and locality [see too note 35].

A second answer might be that although Gödel denies that human memory is necessarily limited, he agrees that the number of states of mind that must be taken into account, when calculating, is finite. The latter assumption is quite weak, and suffices, with the other constraints, to establish the identity of mechanical computability and Turing-machine computability.38 Here, the claim being made is that the type III argument highlights the fact that the weaker assumption suffices, and there is no need for the stronger (and false) assumption that limits the number of states of mind in general. There are two problems with this proposal. One is that Turing himself suggests the weaker assumption, saying that “the number of states of mind which need be taken into account is finite” [p. 136, my emphasis].39 So it is unclear what motivates Gödel to emphasize a stronger reading of Turing.40 The second pertains to the source of the finitude: why is the number of states of mind that must be taken into account bounded? If the number of states of mind in general may be infinite, or at least unbounded, is there any reason to think that with respect to calculation the number of states of mind is bounded? It would seem that an implicit assumption is being made here, to the effect that the process of (human) calculation is associated with a cognitive ‘module’ with a finite number of states of mind. But this assumption is exceedingly contentious, and renders Turing’s analysis, and Gödel’s reasons for extolling it, vulnerable to obvious objections.

38 This was suggested to me by Jonathan Yaari.

Another proposal is that the finite and mechanical nature of the procedure lies in some publicity constraint. Thus Kleene [1987] argues that whether or not the mental states converge to infinity has “no bearing on what number-theoretic functions are effectively calculable” [p. 493]. The idea of ‘effective calculability’ or an ‘algorithm’ involves a set of instructions that is fixed in advance. This condition is motivated by a publicity constraint, namely, that it must be possible “to convey a complete description of the effective procedure or algorithm by a finite communication, in advance of performing computations in accordance with it. My version of the Church–Turing thesis is thus the ‘Public-Processes Version’” [pp. 493–494].
The public character of the procedure is also emphasized by Sieg [2006], who contends that “it was the normative demand of radical intersubjectivity between humans that motivated the steps from axiomatic to formal systems.” It may well be that Gödel accepts the type III argument because it does not rest on dubious assumptions about the human mind, but on the “normative demand of radical intersubjectivity.”

39 This point is emphasized in [Shagrir and Pitowsky 2003]; see also [Sieg 2006], who writes: “Turing argues there that only finitely many different states of mind affect the mechanical calculation of a human computer; he does not make any claim concerning general mental processes as Gödel assumes” [note 12].
40 As we saw, the stronger assumption comes out in Turing’s statement in section 1 that the justification for the identification of effective computability with Turing-machine computability “lies in the fact that the human memory is necessarily limited.” Moreover, Turing clearly does not intend to provide a very different argument in section 9, as he says there that his analysis “is only an elaboration of the ideas of [section 1]” [p. 135]. Gödel’s reading may have been influenced by Turing’s 1950 Mind article [see note 33].

The question, however, is whether this public accessibility constraint must be explicated in terms of finiteness, i.e., “finite communication.” Why is it not possible for the relevant procedures to be conveyed through “infinite” communication? Sieg [2006] argues that the answer lies in the “limitations of [...] [human] processing capacities, when proceeding mechanically,” which is precisely the reason “Turing most appropriately brings in human computers in a crucial way.” Sieg thus concludes that “in a deep sense neither Church nor Gödel recognized the genuinely distinctive character of Turing’s analysis,” i.e., that the calculations are “carried out programmatically by human beings.” If this is right, then, as Webb puts it, there is no essential difference between type I and type III arguments: “Turing has one basic argument, which is presented in Section 1 [...] and whose central premise is ‘the fact that the human memory is necessarily limited’” [1990, p. 302]. Webb thus concludes that on Gödel’s premises, even Turing’s type III argument is unacceptable. Gödel accepts it, Webb argues, only because he mistakenly interprets it to refer to mechanical operations shorn of human aspect; or as Webb puts it, “in reflecting on this argument [type III] in 1972, Gödel forgot that the word ‘computer’ here meant only what that word meant in 1936: a person doing calculations” [1990, p. 302].41

On the picture described by these critics, then, Gödel’s reading of Turing is confused. Gödel, they claim, rejects Turing’s type I argument, yet embraces Turing’s type III argument, though the two arguments are essentially the same.
He rejects the type I argument because he takes it to rest on the questionable assumption that the number of states of mind is bounded, yet ignores Turing’s assertion that this assumption is made solely with respect to the context of mechanical calculation. And Gödel embraces the type III argument even though it too is premised on the very limitations on human processing he rejects.

41 See also [Hodges 1983, p. 105].

I think we can be more charitable to Gödel. My suggestion is that we take him as holding the view that the finite and mechanical character of computation is not a matter of the human condition, but of the epistemic role of computation in the foundations of mathematics, and, in particular, in the finitistic program of Hilbert. Let me explain.

There is a major difference between the historical contexts in which Turing and Gödel worked. Turing tackled the Entscheidungsproblem as an interesting mathematical problem worth solving; he was hardly aware of the fierce foundational debates.42 Gödel, on the other hand, was passionately interested in the foundations of mathematics. Though not a student of Hilbert, his work was nonetheless deeply entrenched in the framework of Hilbert’s finitistic program, whose main goal was to provide a meta-theoretic finitary proof of the consistency of a formal system “containing a certain amount of finitary number theory.”43 In this foundational context, a formal mathematical system is a system governed by a procedure that is finite and mechanical.44 A computation procedure is just another name for this finite and mechanical procedure. Thus the procedure’s finite and mechanical nature is a given and not open to question. Its finite and mechanical nature is underwritten by its role in the foundational project, which is defining a formal mathematical system.45

42 The Entscheidungsproblem, while described by Hilbert and Ackermann [1928] as the most fundamental question in mathematical logic, is peripheral to Hilbert’s program.
43 See [Bernays 1935] and [Zach 2003]. Already in the incompleteness article, Gödel writes that the second result does not “contradict Hilbert’s formalistic viewpoint. For this viewpoint presupposes only the existence of a consistency proof in which nothing but finitary means of proof is used, and it is conceivable that there exist finitary proofs that cannot be expressed in the formalism P (or of M or A)” [1931, p. 195]. See also [Gödel 1933c, 1958 and 1972b], where Gödel produces a consistency proof by means of finite but non-mechanical procedures.
44 See [Sieg 1994] for an historical survey, going back to Leibniz, of the relations between computation and formal systems. Gödel [1933b, pp. 50–52] discusses the constraints Hilbert’s program imposes on the definition of a formal system.
45 Thus in a footnote added to his 1946 remarks for the Davis anthology, Gödel defines a computable function f in a formal system S “if there is in S a computable term representing f” [p. 84]. A similar definition is advanced in [Gödel 1936], [Church 1936a], and [Hilbert and Bernays 1939].

Turing’s error, on this account, is anchoring the procedure’s finite and mechanical character in the human condition, specifically, in the number of states of mind. Turing’s analysis, according to Gödel, does not establish that a computation procedure is a finite and mechanical procedure, for this is not questionable at all. As we saw, Gödel used the notion of a mechanical and finite procedure/computation before he encountered Turing’s analysis, at about the time he rejects Church’s proposal as “thoroughly unsatisfactory.” By Gödel’s lights, even if it turns out that a human has infinite memory or can carry out infinitely many steps in finite time, this would not change either the definition of computability or that of a formal system. Indeed, Gödel thinks that human memory is not limited, and that human thought is more powerful than any finite machine, but this implies nothing about the finite and mechanical character of the computational procedures and formal systems.46

I am not suggesting that Gödel sees no connection between mechanical computability and human computability. Quite the contrary: the epistemic context requires that a human be able, at least in principle, to follow the computation procedure, i.e., to check whether a configuration of symbols constitutes a formal proof or not. Gödel praises Turing precisely for this, for analyzing the concept of a human who follows a finite and mechanical procedure. This analysis indeed invokes a set of constraints that are specific to a human calculator, and are the basis for the definition of Turing-machine computability. The analysis is correct because to establish that definition, it suffices to assume that the human follows a finite and mechanical procedure; i.e., that he or she follows a finite set of instructions, execution of the instructions requires no reference to the meanings of the symbols, and the process itself, if it terminates at all, consists of finitely many steps. Turing’s error, according to Gödel, is to assume, in addition, that the finite and mechanical nature of the procedure lies in the human condition, i.e., in limitations on human processing capacities.47

On the picture I advance here, the finite and mechanical character of computation is rooted in its role in defining a formal system.
But it could be argued that even granting my construal of the characterization of computation as finite and mechanical, no account of the finite character of the epistemic constraints has been provided. We are still in the dark as to why the procedure governing a formal system must be finite and mechanical. This calls to mind Sieg’s notion of the ‘stumbling block’ in the analysis of computation. On Sieg’s account, Turing’s major achievement is bootstrapping from the circularity of the appeal to the finite nature of computation.48 Turing justifies both the restrictive conditions on the computation procedures and the epistemic conditions on formal systems by anchoring the finiteness conditions in human processing capacities. It is precisely at this point, and for this reason, Sieg claims, that “Turing most appropriately brings in human computers in a crucial way and exploits the limitations of their processing capacities, when proceeding mechanically” [Sieg 2006]. Failing to appreciate this reasoning, Church and Gödel failed to recognize the “genuinely distinctive character” of Turing’s analysis.

46 Gödel mentions the possibility of accelerated processing in [Wang 1974, p. 325]. See also [Copeland 2002b] and [Shagrir and Pitowsky 2003], who point out that acceleration increases computational power, without violating the Church–Turing thesis.
47 See [Gandy 1980] and [Sieg 2002], who do away with boundedness in the context of physical machines. See also [Shagrir 2002] for (non-physical) machines that violate locality. However, even here the more general constraints must be preserved; otherwise, non-Turing-machine computability emerges [Shagrir and Pitowsky 2003].

I agree with Sieg that Turing’s analysis provides an elegant account of the finite character of the relevant epistemic constraints. I also agree that Gödel does not provide an alternative account. But I do not see the necessity for such an account. Hilbert’s program is one attempt to secure mathematics; it is not the only way to secure mathematics. Had it succeeded, mathematics would have been “secured.” Because, as it turned out, the program failed, there is still room, at least in principle, for another such project.49 If my explication of Gödel on Turing is on track, then being finite and mechanical is not a condition on any epistemic procedure. We invoke the notion of finite-and-mechanical procedures because we think it has epistemic value. Hence, if a question of justification arises at all, it is not that which Sieg poses, namely, what justifies the finite and mechanical nature of the procedure.
The relevant question is whether the finiteness conditions have epistemic value at all: whether what is done by means of finite and mechanical procedures is thereby to be considered “secured.” This has been debated,50 but Turing’s answer is no more edifying than Gödel’s. Gödel’s stance that the human mind infinitely surpasses any finite machine is consistent with the idea that the finiteness constraints do guarantee knowledge.

48 Sieg [1994; 1997] presents this circularity in the context of Church’s step-by-step argument. Church defines computability in terms of representability in a formal system, but then assumes that the basic operations of the system must be finite, e.g., recursive.
49 In fact, Gödel’s 1958 and 1972b papers can be read as suggesting a renewed and expanded Hilbertian program; see also [Bernays 1935].
50 Thus strict finitists would object that computation procedures can be applied to any numeral whatsoever. It has also been claimed, more recently, that computation procedures are of value to proof theory only if they are of polynomial complexity, “reasonable” length, and so on.

In closing, let me reiterate that my aim here was not to defend Gödel’s interpretation of Turing. Turing’s comments are ambiguous, and subject to conflicting interpretations. It may even be the case that his position is not all that different from the position I ascribe to Gödel. My aim was to make sense of Gödel’s seemingly conflicted response to Turing’s analysis. Gödel praised Turing for his analysis of an ideal human who calculates by means of finite and mechanical procedures. He was critical of what he deemed Turing’s superfluous assumption that the finite and mechanical character of computation is somehow anchored in limitations on human cognitive capacities.51

51 This research was supported by The Israel Science Foundation, grant 857/0307.

References

Bernays, P. [1935], “Hilbert’s Investigations of the Foundations of Arithmetic”, the German text is from Hilbert’s Gesammelte Abhandlungen, vol. 3, pp. 196–216, translated and reprinted in the Bernays Project.
Boolos, G.S. and Jeffrey, R.C. [1989], Computability and Logic, 3rd edition, Cambridge: Cambridge University Press.
Church, A. [1936a], “An Unsolvable Problem of Elementary Number Theory”, American Journal of Mathematics 58, 345–363; reprinted in [Davis 1965, pp. 88–107].
Church, A. [1936b], “A Note on the Entscheidungsproblem”, Journal of Symbolic Logic 1, 40–41; reprinted in [Davis 1965, pp. 108–115].
Church, A. [1937a], Review of Turing [1936], Journal of Symbolic Logic 2, 42–43.
Church, A. [1937b], Review of Post [1936], Journal of Symbolic Logic 2, 43.
Church, A. [1941], The Calculi of Lambda-Conversion, Princeton: Princeton University Press.
Copeland, B.J. [2002a], “The Church–Turing Thesis”, in The Stanford Encyclopedia of Philosophy, (E. Zalta ed.).
Copeland, B.J. [2002b], “Accelerating Turing Machines”, Minds and Machines 12, 281–301.
Davis, M. (ed.) [1965], The Undecidable: Basic Papers on Undecidable Propositions, Unsolvable Problems and Computable Functions, New York: Raven.
Davis, M. [1982], “Why Gödel Didn’t Have Church’s Thesis”, Information and Control 54, 3–24.
Gandy, R. [1980], “Church’s Thesis and Principles of Mechanisms”, in The Kleene Symposium, (J. Barwise, H.J. Keisler, and K. Kunen eds.), Amsterdam: North-Holland, pp. 123–148.
Gandy, R. [1988], “The Confluence of Ideas in 1936”, in [Herken 1988, pp. 51–102].
Gödel, K. [1931], “On Formally Undecidable Propositions of Principia Mathematica and Related Systems”, Monatshefte für Mathematik und Physik 38, 173–198; in Collected Works, I, pp. 144–195.
Gödel, K. [1932], “A Special Case of the Decision Problem for Theoretical Logic”, in Collected Works, I, pp. 231–235.
Gödel, K. [1933a], “On the Decision Problem for the Functional Calculus of Logic”, in Collected Works, I, pp. 307–327.
Gödel, K. [1933b], “The Present Situation in the Foundations of Mathematics”, in Collected Works, III, pp. 45–53.
Gödel, K. [1933c], “On Intuitionistic Arithmetic and Number Theory”, in Collected Works, I, pp. 287–295.
Gödel, K. [1934], “On Undecidable Propositions of Formal Mathematical Systems”, in Collected Works, I, pp. 346–369.
Gödel, K. [193?], “Undecidable Diophantine Propositions”, in Collected Works, III, pp. 164–175.
Gödel, K. [1946], “Remarks before the Princeton Bicentennial Conference on Problems in Mathematics”, in [Davis 1965, pp. 84–88], and in Collected Works, II, pp. 150–153.
Gödel, K. [1951], “Some Basic Theorems on the Foundations of Mathematics and their Implications”, in Collected Works, III, pp. 304–323.
Gödel, K. [1958], “On a Hitherto Unutilized Extension of the Finitary Standpoint”, in Collected Works, II, pp. 241–251.
Gödel, K. [1964], “Postscriptum to Gödel 1934”, in Collected Works, I, pp. 369–371.
Gödel, K. [1972a], “Some Remarks on the Undecidability Results”, in Collected Works, II, pp. 305–306.
Gödel, K. [1972b], “On an Extension of Finitary Mathematics which has not yet been Used”, in Collected Works, II, pp. 271–280.
Gödel, K. [1986–1995], Collected Works, vol. I–III, (S. Feferman, et al. eds.), Oxford: Oxford University Press.
Goldfarb, W.D. [1986], “Introductory Note to Gödel 1932 and 1933[a]”, in Collected Works, I, pp. 226–231.
Herbrand, J. [1931], “On the Consistency of Arithmetic”, in Jacques Herbrand Logical Writings, (W.D. Goldfarb ed.), Cambridge MA: Harvard University Press [1971], pp. 282–298.
Herken, R. (ed.) [1988], The Universal Turing Machine: A Half-Century Survey, Oxford: Oxford University Press.
Hilbert, D. and Ackermann, W. [1928], Grundzüge der Theoretischen Logik, Berlin: Springer-Verlag.
Hilbert, D. and Bernays, P. [1939], Grundlagen der Mathematik II, Berlin: Springer-Verlag.
Hodges, A. [1983], Alan Turing: The Enigma, New York: Simon and Schuster.
Kleene, S.C. [1936], “General Recursive Functions of Natural Numbers”, Mathematische Annalen 112, 727–742; reprinted in [Davis 1965, pp. 236–253].
Kleene, S.C. [1952], Introduction to Metamathematics, Amsterdam: North-Holland.
Kleene, S.C. [1981], “Origins of Recursive Function Theory”, Annals of the History of Computing 3, 52–67.
Kleene, S.C. [1987], “Reflections on Church’s Thesis”, Notre Dame Journal of Formal Logic 28, 490–498.
Lewis, H.R. and Papadimitriou, C.H. [1981], Elements of the Theory of Computation, Englewood Cliffs, NJ: Prentice-Hall.
McCulloch, W.S. and Pitts, W. [1943], “A Logical Calculus of the Ideas Immanent in Nervous Activity”, Bulletin of Mathematical Biophysics 5, 115–133.
Minsky, M.L. [1967], Computation: Finite and Infinite Machines, Englewood Cliffs, NJ: Prentice-Hall.
Post, E.L. [1936], “Finite Combinatory Processes – Formulation I”, Journal of Symbolic Logic 1, 103–105; reprinted in [Davis 1965, pp. 288–291].
Shagrir, O. [2002], “Effective Computation by Humans and Machines”, Minds and Machines 12, 221–240.
Shagrir, O. and Pitowsky, I. [2003], “Physical Hypercomputation and the Church–Turing Thesis”, Minds and Machines 13, 87–101.
Shannon, C.E. and McCarthy, J. (eds.) [1956], Automata Studies, Annals of Mathematics Studies 34, Princeton: Princeton University Press.
Sieg, W. [1994], “Mechanical Procedures and Mathematical Experience”, in Mathematics and Mind, (A. George ed.), Oxford: Oxford University Press, pp. 71–117.
Sieg, W. [1997], “Step by Recursive Step: Church’s Analysis of Effective Calculability”, Bulletin of Symbolic Logic 2, 154–180.
Sieg, W. [2002], “Calculations by Man and Machine: Conceptual Analysis”, in Reflections on the Foundations of Mathematics, (W. Sieg, R. Sommer, and C. Talcott eds.), Lecture Notes in Logic 15, Natick, MA: Association for Symbolic Logic, pp. 390–409.
Sieg, W. [2006], “Gödel on Computability”, Philosophia Mathematica, (forthcoming).
Sieg, W. and Byrnes, J. [1999], “An Abstract Model for Parallel Computations: Gandy’s Thesis”, Monist 82, 150–164.
Soare, R.I. [1996], “Computability and Recursion”, Bulletin of Symbolic Logic 2, 284–321.
Turing, A.M. [1936], “On Computable Numbers, with an Application to the Entscheidungsproblem”, Proceedings of the London Mathematical Society 42, 230–265; correction in 43 (1937), 544–546; reprinted in [Davis 1965, pp. 115–154]; page numbers refer to the [1965] edition.
Gödel on Turing on Computability
Turing, A.M. [1947], “Lecture to the London Mathematical Society on 20 February 1947”, in A.M. Turing’s ACE Report of 1946 and Other Papers, (B.E. Carpenter and R.W. Doran eds.), Cambridge MA: MIT Press [1986], pp. 106–124.
Turing, A.M. [1950], “Computing Machinery and Intelligence”, Mind 59, 433–460.
Wang, H. [1974], From Mathematics to Philosophy, London: Routledge & Kegan Paul.
Webb, J.C. [1980], “Mechanism, Mentalism and Metamathematics”, Dordrecht: D. Reidel.
Webb, J.C. [1990], “Introductory Note to Remark 3 of Gödel 1972[a]”, in Collected Works, II, pp. 292–304.
Zach, R. [2003], “Hilbert’s Program”, in The Stanford Encyclopedia of Philosophy, (E. Zalta ed.).
Stewart Shapiro∗
Computability, Proof, and Open-Texture

Mathematics never proves anything about anything except mathematics, and a piece of rope is a physical object and not a mathematical one. So before worrying about proofs, we must have a mathematical definition of what a knot is [...] This problem [...] arises whenever one applies mathematics to a physical situation. The definition should define mathematical objects that approximate physical objects as closely as possible [...] There is no way to prove [...] that the mathematical definitions describe the physical situation exactly.
Crowell and Fox [1963, p. 3]
1. Church’s thesis and proof. Church’s thesis (CT) is the proposition that all and only recursive functions are effectively computable. Let us recall Alonzo Church’s 1936 proposal: We now define the notion [...] of an effectively calculable function of positive integers by identifying it with the notion of a recursive function of positive integers [...] This definition is thought to be justified by the considerations which follow, so far as positive justification can ever be obtained for the selection of a formal definition to correspond to an intuitive one. [Church 1936, §7]
∗ S. Shapiro, The Ohio State University, University of St. Andrews.

At the time, and for the next half century, there was a near consensus that CT is not subject to mathematical proof or refutation. I take the liberty of using an old paper of mine to articulate this once standard view, along with a more or less standard argument for it:
Computability is a property related to either human abilities or mechanical devices, both of which are at least prima facie non-mathematical. It is therefore widely agreed that the question of Church’s thesis is not a mathematical question, such as the Goldbach conjecture [...] That is to say, mathematicians do not seek to show either that CT follows from accepted laws of number theory or that it contradicts such laws. Nevertheless, both mathematicians and philosophers have offered various non-mathematical arguments either for or against the thesis. Goldbach’s conjecture can be settled, if at all, only by mathematical argument, but CT can be settled, if at all, only by arguments that are, at least in part, philosophical. [Shapiro 1981, pp. 353–354]
Church himself says that he argues in favor of the thesis “so far as positive justification can ever be obtained for the selection of a formal definition to correspond to an intuitive one”. Stephen Cole Kleene [1952, pp. 317, 318–319] is even more explicit: Since our original notion of effective calculability of a function (or of effective decidability of a predicate) is a somewhat vague intuitive one, [CT] cannot be proved [...] While we cannot prove Church’s thesis, since its role is to delimit precisely an hitherto vaguely conceived totality, we require evidence that it cannot conflict with the intuitive notion which it is supposed to complete; i.e., we require evidence that every particular function which our intuitive notion would authenticate as effectively calculable is [...] recursive. The thesis may be considered a hypothesis about the intuitive notion of effective calculability, or a mathematical definition of effective calculability; in the latter case, the evidence is required to give the theory based on the definition the intended significance.
Kleene’s argument here is straightforward. Computability is a vague, intuitive notion from ordinary language. Depending on how it is interpreted, computability concerns the (idealized) abilities of humans following algorithms, or of (idealized) mechanical computing devices, or something similar.1 In contrast, recursiveness is a precisely and rigorously defined property of number-theoretic functions. So there is no sense in proving that those two are extensionally the same. It would be like someone trying to prove that a man is bald if and only if at least 37.4% of his scalp is exposed.

1 Most of the founders, certainly Turing, were thinking about what can be computed by a human following an algorithm (see, for example, Copeland [1997]). Some writers distinguish this human computability from machine computability, thus producing different “theses” for each. We will briefly return to this below.

Of course, things could not be this simple. To complete the case against the provability (or refutability) of CT along these lines, we would need an argument that computability is vague, and this would probably require an account of what it is to be vague.2 Are there any borderline computable functions? Can we construct a sorites series from a clearly computable function to a clearly non-computable function? The mere fact that computability is an “intuitive” notion does not seem to remove it from the sphere of mathematical demonstration. If it did, there would not be much mathematics, and there would not have been any before the advent of formal deductive systems. Stay tuned.

Even at the beginning, it was only a near consensus. Kurt Gödel stands as a notable exception to the trend, one that cannot be ignored. The following appears in a letter that Church wrote to Kleene:3

In regard to Gödel and the notions of recursiveness and effective calculability, the history is the following. In discussion with him [...] it developed that there is no good definition of effective calculability. My proposal that lambda-definability be taken as a definition of it he regarded as thoroughly unsatisfactory. I replied that if he would propose any definition of effective calculability which seemed even partially satisfactory I would undertake to prove that it was included in lambda-definability.
2 During my student days, a famous logician once asked me what we would think if we started with an admittedly vague, intuitive notion, but then showed that there is a unique sharp, rigorously defined notion that underlies it. I have been ruminating over this remark for some time now.
3 The letter, which was dated November 29, 1935, is quoted in [Davis 1982, p. 9].

Gödel would not have found Church’s challenge satisfactory. Suppose that several new “definitions” of computability were proposed, and that in each case, Church rose to the challenge (as surely he would) and delivered a proof that the definition is “included” in lambda-definability and thus in recursiveness. This would have provided more of the same kind of evidence for CT that was already available, and there was already plenty of that. Gödel, it seems, was not satisfied with the state of play. He was not happy with the kinds of evidence cited in defense of CT.4 Church’s letter continues:

His [Gödel’s] only idea at the time was that it might be possible, in terms of effective calculability as an undefined term, to state a set of axioms which would embody the generally accepted properties of this notion, and to do something on that basis.
It is not easy to interpret these remarks, but Gödel seems to have demanded a proof, or something like a proof, of CT. A conceptual analysis of computability would suggest axioms for the notion. Then, perhaps, one could derive that every computable function is recursive, and vice versa, from those axioms. What did convince Gödel was Alan Turing’s landmark [1936] article (see Kleene [1981], [1987], Davis [1982], and my review of those, Shapiro [1990]). Gödel’s much quoted [1946] lecture, published in Davis [1965, pp. 84–88], begins:

Tarski has stressed in his lecture (and I think justly) the great importance of the concept of general recursiveness (or Turing’s computability). It seems to me that this importance is largely due to the fact that with this concept one has for the first time succeeded in giving an absolute definition of an interesting epistemological notion [...]
4 Gödel’s own work yielded a notion now known as Gödel-Herbrand definability. This provides yet another “definition” of computability, which turned out to be equivalent to recursiveness.

Although it is not clear whether, at the time, Gödel regarded Turing’s analysis as a rigorous proof of CT, it seems that he was satisfied with it. In a 1964 postscript to Gödel [1934], he wrote that “Turing’s work gives an analysis of the concept of ‘mechanical procedure’ (alias ‘algorithm’ or ‘computation procedure’ or ‘finite combinatorial procedure’). This concept is shown to be equivalent to that of a Turing machine”.

The near consensus that CT is not subject to mathematical proof (or refutation) was eventually shattered, at least among philosophically inclined logicians. Elliott Mendelson [1990] and Robin Gandy [1988] both claimed that CT is susceptible of rigorous, mathematical proof; and Gandy argued that CT has actually been proved. Turing’s [1936] study of a human following an algorithm is cited as the germ of the proof. In fact, Gandy referred to (a version of) CT as “Turing’s theorem”. Wilfried Sieg [1994] was a bit more guarded, but his conclusion is similar. He defined “Turing’s theorem” to be the proposition that if f is a number-theoretic function that can be computed by a being satisfying certain determinacy and finiteness conditions, then f can be computed by a Turing machine. Turing argues that humans satisfy some of these conditions, but apparently Sieg considered this part of Turing’s text to be less than proof. Through a painstaking, careful analysis, Sieg [2002], [2002a] claims to complete the project:

My strategy [...] is to bypass theses altogether and avoid the fruitless discussion of their (un-)provability. This can be achieved by conceptual analysis, i.e., by sharpening the informal notion, formulating its general features axiomatically, and investigating the axiomatic framework [...] The detailed conceptual analysis of effective calculability yields rigorous characterizations that dispense with theses, reveal human and machine calculability as axiomatically given mathematical concepts, and allow their systematic reduction to Turing computability. [Sieg 2002, pp. 390, 391]
The Mendelson/Gandy/Sieg position(s) generated responses (e.g., Folina [1998], Black [2000]), and the debate continues. The issues here revolve around some of the most fundamental questions in the philosophy of mathematics. What is it to prove something? How do we recognize a proof when we have one? An even more basic question is not far from the surface of the issue: What is mathematics about? If the subject matter of mathematics is a realm of abstracta that are causally separated from the physical world and the mathematician, then what do we mean when we say that a function is, or is not, computable by a human following an algorithm, or by a machine? Why are these empirical elements interfering in the austere and serene realms of mathematics? Maybe this is why some of us had trouble conceiving of CT as subject to proof. How do you mathematically prove things about the empirical world of humans following algorithms or of mechanical computing devices? If Turing’s [1936] paper constitutes a proof, or even the germ of a proof, then why were so many left unconvinced for so long? Some
major, or at least semi-major, figures, such as Jean Porte [1960] and László Kalmár [1959], continued to hold that CT is false as late as twenty years after Turing’s work was published, and Rózsa Péter [1957] maintained some doubts (of which more later). Almost everyone else agreed that CT is (probably, or almost certainly) true, but it is not the sort of thing that can be proved. One would think that if Turing [1936] constitutes the germ of a proof, then the mathematicians of the day, and the next fifty years, would recognize it as such. These folks should be able to detect proofs if anybody can. Mathematicians should be able to recognize their own subject, shouldn’t they?

2. What is a proof, and what does a proof prove? Of course, it would take us far afield to address all of the deep metaphysical and epistemological issues in the philosophy of mathematics that are involved here. But we have to broach those topics. Intuitively, a proof is a valid argument whose premises are all either self-evident or previously established. Of course, one cannot prove everything that is known in mathematics. Proofs must start someplace, and the epistemic status of axioms remains an interesting and troubling question. A common theme, traced to Aristotle, is that validity is a matter of form: if an argument is valid, then so is every argument that has the same logical form. This, of course, raises questions concerning the nature of logical form. The foundational work in the early decades of the twentieth century—the very environment that bred the extensive study of computability—led to several explications or models of deduction and proof. The most common of these—the one that we impose on our students in courses in mathematical logic—construes a deduction to be a sequence of well-formed formulas in a formal language, constructed according to certain rules.
Such a deduction is, or corresponds to, a proof only if the deductive system is sound and if the premises are interpreted as statements that are either self-evident or previously established. Call such a sequence a formal proof. A formal proof of CT would presumably take place within a formalization of number-theory. One would add a predicate for computability, together with some axioms for it. The axioms, presumably, should be unquestionably true. Each line of the formal proof would either be an axiom of number theory or of computability, or
would follow from previous lines by unquestionably valid, formally correct rules of inference. The last line would be CT. Recall Gödel’s early suggestion to Church that they try to “state a set of axioms which would embody the generally accepted properties of [effective calculability] and [...] do something on that basis”. This could be interpreted as a request for a formal proof of CT. The “something” the logicians might do on the “basis” of the envisioned axioms would be to produce a rigorous derivation in a sound deductive system.

What counts as an unformalized proof is a vague matter. It depends on how close the unformalized text is to a formalization, and how plausible the resulting formalized axioms and premises are, with respect to the author’s intentions. This perspective might allow one to think of Turing’s text, or something similar, as a formalizable proof, despite the fact that some were not convinced, and despite the fact that most did not, and many still do not, recognize it as such. I presume that it would not be difficult to formalize the treatment of computability in Turing [1936], or anyplace else for that matter. Formalizing argumentative texts is an exercise we often give to beginning logic students. But how would we guarantee that the stated axioms or premises are necessary for computability? This question cannot be settled by a formal derivation. That would start a regress, at least potentially. We would push our problem to the axioms or premises of that derivation.

In a collection of notes entitled “What does a mathematical proof prove?” (published posthumously in [1978, pp. 61–69]), Imre Lakatos makes a distinction between the pre-formal development, the formal development, and the post-formal development of a branch of mathematics.
Lakatos observes that even after a branch of mathematics has been successfully formalized, there are residual questions concerning the relationship between the formal deductive system and the original, pre-formal mathematical ideas. How can we be sure that the formal system accurately reflects the original mathematical structures? These questions cannot be settled with a derivation in a further formal deductive system, not without begging the question or starting a regress—there would be new, but similar, questions concerning the new deductive system.
The residual questions are perhaps (merely) philosophical or quasi-empirical. But it does not follow that they are in any way non-mathematical. In some cases one can and should regard the questions as settled. Moreover, this is the normal situation in mathematics. There is nothing unusual about CT. A second explication or model of proofhood has it that a proof is a derivation in Zermelo–Fraenkel set theory (ZF), or a sequence of statements that can be “translated” into a derivation in ZF. Call this a ZF-proof. The staunch naturalist Penelope Maddy [1997, p. 26] sums up this foundational role: [I]f you want to know if there is a mathematical object of a certain sort, you ask (ultimately) if there is a set-theoretic surrogate of that sort; if you want to know if a given statement is provable or disprovable, you mean (ultimately), from the axioms of the theory of sets.
Here we have to pay attention to the “translation” into the language of set theory. What should be preserved by a decent translation? Meaning? The only non-logical term in the language of ZF is “∈”, the sign for membership. To echo Crowell and Fox [1963, p. 3] in the epigraph of this paper, before worrying about whether CT can be proved in ZF, we would need formulations of “computability” and “recursiveness” (or “Turing computability”, “λ-definability”, or the like) in the language of set theory. There are, or could be, good formulations of the latter in ZF. That much would be routine, and there would be little room for doubt on that score, at least at this point in history. Formulating “computability” in the language of ZF is another story. How could we assure ourselves that the proposed set-theoretic predicate really is an accurate formulation of the intuitive, pre-theoretic notion of computability? Would we prove that? How? In ZF? In effect, a statement that a proposed predicate in the language of set theory (with its single primitive) does in fact coincide with computability would be the same sort of thing as CT, in that it would propose that a precisely defined—now set-theoretic—property is equivalent to an intuitive one. We would be back where we started, philosophically.5 Clearly, in claiming that CT is capable of proof, Mendelson and Gandy are not asserting the existence of a ZF-proof for CT. Mendelson [1990, p. 223] wrote that the “fact that it is not a proof in ZF or some other axiomatic system is no drawback; it shows that there is more to mathematics than appears in ZF”, echoing Hamlet.

5 If there were a theorem in ZF equating a set-theoretic predicate with (the set-theoretic surrogate for) recursiveness, we would have more evidence for CT, or else evidence that the set-theoretic formulation is correct (or both). The indicated theorem would be the same sort of thing as the equivalence of recursiveness and Turing computability.

The conclusion, so far, is that if there is to be a mathematical proof of CT, the proof cannot be fully captured with a formal proof or a ZF-proof. If one identifies mathematical proof with formal proof or ZF-proof, then one can invoke modus tollens and accept the conclusion that CT is not subject to mathematical proof. There is an essential “quasi-empirical” or “philosophical” side to CT. Against the received views, and with Mendelson and Gandy, I submit that this is a false dilemma. The proper conclusion of the foregoing considerations is only that CT is not entirely a formal (or ZF) matter. This, of course, is not particularly deep, original, or controversial. The problem of evaluating the adequacy of translations of informal arguments into the language of set theory is a wide and deep one, not to be solved here. But clearly, Mendelson is correct that there is more to mathematics than ZF. If nothing else, the adequacy of translations of informal mathematics into the language of ZF are themselves pieces of non-ZF mathematics. So are most informal arguments, derivations, and proofs.

There is a trend in philosophy, traced at least to W.V.O. Quine, that proposes a blurring of the various disciplines, at least on epistemological grounds. Mathematics is not as distinct from empirical science as has been thought. If so, then perhaps our question loses some of its force. What, exactly, is the difference between mathematics, which proceeds via rigorous proof, and other branches of science that rely on inductive arguments, plausible hypotheses, and the like? More importantly, why does the question of the epistemological status of CT matter? R.J. Nelson [1987, p. 581] writes:

Although Church’s thesis (CT) has been central to the theory of effective decidability for fifty years, the question of its epistemological status is still an open one. My own view, which is prompted by a naturalistic attitude toward such questions in mathematics as elsewhere, is that the thesis is an empirical statement of cognitive science, which is open to confirmation, amendment, or discard, and which, on the current evidence, appears to be true ... I wish [...] to advocate the metathesis that CT is empirical, yet mathematical.
In what follows, I hope to help sort out the underlying foundational questions.

3. Theses everywhere. My two earlier papers (Shapiro [1981], [1993]) and Mendelson [1990] present some historical situations which are like CT in the relevant respects, in that they identify an intuitive, pre-theoretic notion with a more precisely defined one. These other “theses” are not subject to doubt anymore, nor is their status as mathematics in question. Of course, this is not to say that they are proved either, but they do serve as premises in informal reasoning, and as central parts of mathematics.

Let us call “Weierstrass’s thesis” the proposition that the pre-theoretic or intuitive notion of a continuous function is extensionally equivalent to the notions yielded by the now standard definition. It is clear that there is such an intuitive notion of continuous function, and that mathematicians worked with it well before the rigorous definition was proposed and accepted. Moreover, the definitions are accepted, almost without opposition (but see Bell [1998]), despite the fact that they have some consequences that conflict with intuition. Prominent among those is the existence of a continuous curve that is nowhere differentiable. It might be noted as well that the pre-theoretic notion has two non-equivalent formal counterparts: pointwise continuity and uniform continuity.

A second example is the identification of the pre-theoretic notion of the area under a curve with the various integrals. Note that here, too, there are a few different formal notions that go with the intuitive, pre-theoretic one. To follow the example indicated in the epigraph of this paper, the topological definition of “knot” provides another example.

Ancient Greek mathematicians did not identify magnitudes like lengths, areas, and volumes with real numbers as we do today, via analytic geometry. Yet the ancient mathematicians worked effectively with a notion of ratios of magnitudes. Euclid proved theorems to the effect that two areas are in the same proportion as two line segments. What of this notion of “proportion”? The following might be called “Eudoxus’s thesis”:
Let x, X be two like magnitudes (i.e., two line segments, two areas, or two volumes), and let y, Y be two like magnitudes. Then x : X = y : Y if and only if for any pair of natural numbers m, n, mx is larger than (resp., smaller than, resp. identical to) nX if and only if my is larger than (resp., smaller than, resp. identical to) nY.
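The three "resp." clauses of the thesis can be unpacked into a single formula. The following is only a sketch of one natural reading, not the author's own formalization: it assumes that m and n range over positive integers, that mx denotes the m-fold multiple of the magnitude x, and that the order comparisons are only ever made between like magnitudes.

```latex
% Sketch of the Eudoxean criterion stated above.
% Assumptions (not in the source): m, n range over positive integers;
% mx is the m-fold multiple of the magnitude x; the comparisons
% <, =, > are made only between like magnitudes.
\[
x : X = y : Y
\;\Longleftrightarrow\;
\forall m, n \in \mathbb{N}^{+}\;
\Bigl[
  (mx > nX \leftrightarrow my > nY) \,\wedge\,
  (mx = nX \leftrightarrow my = nY) \,\wedge\,
  (mx < nX \leftrightarrow my < nY)
\Bigr]
\]
% Worked check with segments of lengths x = 1, X = 2 and y = 3, Y = 6:
% comparing mx with nX compares m with 2n, and comparing my with nY
% compares 3m with 6n. Dividing the second comparison by 3 gives the
% first, so all three biconditionals hold for every m, n,
% and hence 1 : 2 = 3 : 6.
```

On this reading the right-hand side never compares x with y directly, which is what allows the criterion to relate, say, a pair of areas to a pair of line segments.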
Mendelson cites the identification of functions with sets of ordered pairs, Tarski’s definition of truth (in formalized languages not containing a truth predicate, anyway), the model-theoretic definition of validity, and the Cauchy–Weierstrass definition of limit. As far as I know, the epistemological status of such theses, and the extent to which they are and are not subject to proof, has not been settled. The definitions are used nowadays to prove things about the original, pre-theoretic notions, or at least they are advertised that way. Crowell and Fox claim to have proved that a figure-eight knot cannot be transformed into an overhand knot without tying or untying. We have many instances of what Georg Kreisel [1967] calls “informal rigor” (see also Kreisel [1987]).

4. Open-texture and mathematics. I turn now to what I take to be an important insight, traced to Friedrich Waismann, concerning ordinary language. Its application to the language of mathematics sheds some interesting light on propositions like CT and the “theses” just indicated. Waismann introduces the notion of open-texture in an attack on crude phenomenalism, the view that one can understand any cognitively significant statement in terms of sense-data. The failure of this program:

is not, as has been suggested, due to the poverty of our language which lacks the vocabulary for describing all the minute details of sense experience, nor is it due to the difficulties inherent in producing an infinite combination of sense-datum statements, though all these things may contribute to it. In the main it is due to a factor which, though it is very important and really quite obvious, has to my knowledge never been noticed—to the ‘open texture’ of most of our empirical concepts. [Waismann 1968, pp. 118–119]
Here is one of the thought experiments that Waismann uses to characterize the notion of open-texture:
Suppose I have to verify a statement such as ‘There is a cat next door’; suppose I go over to the next room, open the door, look into it and actually see a cat. Is this enough to prove my statement? [...] What [...] should I say when that creature later on grew to a gigantic size? Or if it showed some queer behavior usually not to be found with cats, say, if, under certain conditions it could be revived from death whereas normal cats could not? Shall I, in such a case, say that a new species has come into being? Or that it was a cat with extraordinary properties? [...] The fact that in many cases there is no such thing as a conclusive verification is connected to the fact that most of our empirical concepts are not delimited in all possible directions.
The last observation is the key. As Waismann sees things, language users introduce some terms to apply to certain objects or kinds of objects, and of course the terms are supposed to fail to apply to certain objects or kinds of objects. As we introduce the terms, and use them in practice, we cannot be sure that every possible situation is covered, one way or the other. This applies even in science, to what are now called “natural kind” terms:

The notion of gold seems to be defined with absolute precision, say by the spectrum of gold with its characteristic lines. Now what would you say if a substance was discovered that looked like gold, satisfied all the chemical tests for gold, whilst it emitted a new sort of radiation? ‘But such things do not happen.’ Quite so; but they might happen, and that is enough to show that we can never exclude altogether the possibility of some unforeseen situation arising in which we shall have to modify our definition. Try as we may, no concept is limited in such a way that there is no room for any doubt. We introduce a concept and limit it in some directions; for instance we define gold in contrast to some other metals such as alloys. This suffices for our present needs, and we do not probe any farther. We tend to overlook the fact that there are always other directions in which the concept has not been defined [...] we could easily imagine conditions which would necessitate new limitations. In short, it is not possible to define a concept like gold with absolute precision; i.e., in such a way that every nook and cranny
is blocked against entry of doubt. That is what is meant by the open texture of a concept. [Waismann 1968, p. 120]
Themes from Ludwig Wittgenstein are quite apparent here. The talk of family resemblance fits nicely into this picture. Waismann [1968, p. 122] waxes poetic: “Every description stretches, as it were, into a horizon of open possibilities: However far I go, I shall always carry this horizon with me.”6

The phrase “open texture” does not appear in Waismann’s treatment of the analytic-synthetic distinction in a lengthy article published serially in Analysis ([1949], [1950], [1951], [1951a], [1952], [1953]), but the notion clearly plays a central role there. He observes that language is an evolving phenomenon. As new situations are encountered, and as new scientific theories develop, the extensions of various predicates change. Sometimes the predicates become sharper and, importantly, sometimes the boundaries move. When this happens, there is often no need to decide, on hard metaphysical or semantic grounds, whether the application of a given predicate to a new case represents a change in its meaning or a discovery concerning the term’s old meaning. And even if we focus on a given period of time, language use is not univocal:

Simply [...] to refer to “the” ordinary use [of a term] is naive. There are uses, differing from one another in many ways, e.g. according to geography, taste, social standing, special purpose to be served and so forth. This has long been recognized by linguists [...] [These] particular ways of using language loosely [revolve] around a—not too clearly defined—central body, the standard speech [...] [O]ne may [...] speak of a prevailing use of language, a use, however, which by degrees shades into less established ones. And what is right, appropriate, in the one may be slightly wrong, wrong, or out of place in others. And this whole picture is in a state of flux. One must indeed be blind not to see that there is something unsettled about language; that it is a living and growing thing, adapting itself to new sorts of situations, groping for new sorts of expression, forever changing. [Waismann 1951a, pp. 122–123]

6 Many contemporary accounts of natural kind terms have it that they somehow pick out properties which are sharp, in the sense that they have fixed extensions in all metaphysically possible worlds. On such views, the progress of science (or metaphysics) will tell us whether various hitherto unconsidered cases fall under the kind in question, and will correct any mistakes we now make with the terms. Needless to say, Waismann would reject such accounts. It would perhaps be too much of a distraction to probe the modality that Waismann himself invokes.
In the final installment of the series, he writes: “What lies at the root of this is something of great significance, the fact, namely, that language is never complete for the expression of all ideas, on the contrary, that it has an essential openness” [Waismann 1953, pp. 81–82]. The publication of this series coincides with Quine’s [1951] celebrated attack on the very notion of analyticity. Waismann would agree with Quine that (so-called) analytic truths are not epistemologically sacrosanct, and thus immune to revision. Waismann points out that major advances in science sometimes—indeed usually—demand revisions in the accepted use of common terms: “breaking away from the norm is sometimes the only way of making oneself understood” ([1953, p. 84]). If I may interject my own example, we can wonder if scientists contradicted themselves when they said that atoms have parts. Well, what is it to be an atom? Surely at some point in history, the word “atom” could have been defined to be a particle of matter that has no proper parts. Metaphysicians still use the term that way. Waismann [1952] illustrates the point in some detail with the evolution of the word “simultaneous”. The main innovative theses of the theory of relativity violated the previous meaning of that word. One might think that in cases like these, a new word, with a new meaning, is coined with the same spelling as an old word, or one might think that an old notion has found new applications. Did Einstein discover a hidden and previously unnoticed relativity in the established meaning of the word “simultaneous”? He was not a linguist by training, but surely he knew his own language. Or did Einstein coin a new theoretical term, to replace the old, scientifically misleading one? If so, then strictly speaking, we should introduce a new term, “simultaneous”, to avoid the obvious and perhaps misleading ambiguity.
According to Waismann, there is often no need, and no reason, to decide what counts as a change in meaning and what counts as the extension of an old meaning to new cases—going on as before, as Wittgenstein might put it. Waismann said, in an earlier installment in the series: “there are no precise rules governing the
Stewart Shapiro
use of words like ‘time’, ‘pain’, etc., and that consequently to speak of the ‘meaning’ of a word, and to ask whether it has, or has not changed in meaning, is to operate with too blurred an expression” [Waismann 1951, p. 53]. Waismann only applies his notion of open-texture to empirical terms, either from everyday language or from science. One might think that mathematics is exempt from his account of language as in flux. With mathematics, at least, we have precision in the use of our terms, don’t we? Consider, for example, the notion of a prime (natural) number. Strict finitism aside, can we really envision a possible situation in which we encounter a natural number n for which it is somehow indeterminate whether or not n is prime? Don’t we prove that every natural number (other than zero or one) is either prime or composite?7 I submit that this case is more the exception than the rule. The sort of precision we are used to—and celebrate—in mathematics applies, perhaps, to contemporary formalized mathematics and to branches of mathematics that have been rendered in ZF, provided that no questions about the adequacy of the formalization, or the adequacy of ZF, are raised. But open-texture is indeed found in the informal practice and development of mathematics. The boundaries of the notion of “natural number” are, perhaps, as rigorous and sharp as can be—hard as rails, as Wittgenstein might put it. Harder than rails. What of the more general notion of “number”? Are complex numbers numbers? Surely. But this was once controversial. If it is a matter of proof or of simple definition, why should there ever have been controversy? And what of quaternions? Are those numbers? I presume that the jury is out on that one. Perhaps there is no real need to decide whether quaternions are numbers. As Waismann might put it, in asking this question we operate with too blurred an expression. In this case, the expression is “number”, which is as mathematical as it gets. 
Turning to the matter at hand, it is part of the standard argument for the received view that Church’s thesis cannot be determinately true, much less subject to proof, since computability is
7 To be sure, our inability to conceive of a situation in which we have to recognize open-texture in the notion of “prime natural number” need not be conclusive evidence that such situations are not possible. I have no views on how conceivability bears on the notion of possibility invoked in Waismann’s account of open-texture. Thanks to Graham Priest and Crispin Wright here.
Computability, Proof, and Open-Texture
a vague notion, while recursiveness is sharp. One can perhaps retort that computability is itself sharp. Mendelson [1990, p. 232] takes the opposite retort: The concepts and assumptions that support the notion of partial-recursive function are, in an essential way, no less vague and imprecise than the notion of effectively computable function; the former are just more familiar and are part of a respectable theory with connections to other parts of logic and mathematics [...] Functions are defined in terms of sets, but the concept of set is no clearer than that of function [...] Tarski’s definition of truth is formulated in set-theoretic terms, but the notion of set is no clearer than that of truth.
Mendelson does not elaborate this remark, but he is correct that the notion of set has a long and sometimes troubled history. There is a logical notion, based on unrestricted comprehension—Gottlob Frege’s Basic Law V—which turned out to be inconsistent. And there is a more mundane notion of a collection of previously given entities, which ultimately gave rise to the iterative conception that we have today. There is still some debate over the intuitive underpinning of the iterative conception. George Boolos [1989] argues that there is no single, intuitive notion of set that underlies ZF. The theory is a more or less ad hoc mixture of two notions. The currently unresolved status of propositions like the continuum hypothesis sheds some doubt on the proposition that even now we have hold of a single, sharply delineated notion. For many purposes, the notion of set underlying ZF is precise enough. This precision is due to decades of work with the axiomatic notion. But is it hard as rails? In present terms, then, the conclusion is that the notion of set, and with that, the notion of recursiveness, was at least once subject to open-texture. In that respect, the notion of set is on a par with that of computability.
5. Proofs and refutations. Let us briefly re-examine Lakatos’s favorite example, developed in delightful detail in Proofs and refutations [1976]. He presents a lively dialogue involving a class of rather exceptional mathematics students. The dialogue is a rational reconstruction of the history of what may be called Euler’s theorem:8
8 Much of the actual history is recounted in Lakatos’s footnotes.
Consider any polyhedron. Let V be the number of vertices, E the number of edges, and F the number of faces. Then V − E + F = 2.
The reader is first invited to check that the equation holds for standard polyhedra, such as rectangular solids, pyramids, tetrahedra, icosahedra, and dodecahedra. One would naturally like this inductive evidence confirmed with a proof, or else refuted by a counterexample. Lakatos provides plenty of examples of both. The dialogue opens with the teacher presenting a proof of Euler’s theorem. We are told to think of a given polyhedron as hollow, with its surface made of thin rubber. Remove one face and stretch the remaining figure onto a flat wall. Then add lines to triangulate all of the polygonal faces, noting that in doing so we do not change V − E + F . For example, drawing a line between two vertices of the same polygon adds 1 face and 1 edge, no vertices. When the figure is fully triangulated, start removing the lines one or two at a time, doing so in a way that does not alter V − E + F . For example, if we remove two lines on the outer boundary along with the included vertex, we decrease V by 1, E by 2, and F by 1. At the end, we are left with a single triangle, which, of course, has 3 vertices, 3 edges, and 1 face. So for that figure, V − E + F = 1. If we add back the face we removed at the start, we find that V − E + F = 2 for the original polyhedron. QED. The class then considers a barrage of counterexamples to Euler’s conjecture. These include a picture frame, a cube with a cube-shaped hole in one of its faces, a hollow cube with a cube-shaped hole in its interior, and a “star polyhedron” whose faces protrude from each other in space. One of the students even proposed that a sphere and a torus each qualifies as a polyhedron, and thus counterexamples to Euler’s theorem. Since a sphere and a torus have a single face with no vertices or edges, V − E + F = 1. A careful examination shows that each counterexample violates (or falsifies) at least one of three main “lemmas” of the teacher’s proof.
In some cases, the three-dimensional figure in question cannot be stretched flat onto a surface after the removal of a face. In other cases, the stretched plane figure cannot be triangulated without changing the value of V − E + F (or cannot be triangulated at all), and in still other cases, the triangulated figure cannot be decomposed without altering the value of V − E + F .
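The check the reader is invited to make can be sketched in a few lines of Python. The counts for the standard solids are the familiar ones. The counts for the picture frame are an assumption of this sketch: they describe a square frame with mitred corners, so that each of its four sides contributes four quadrilateral faces, which is one way (not necessarily Lakatos’s) of drawing that figure.

```python
def euler_characteristic(vertices: int, edges: int, faces: int) -> int:
    """Return V - E + F for a figure with the given counts."""
    return vertices - edges + faces


# (V, E, F) for some standard convex polyhedra.
STANDARD_SOLIDS = {
    "tetrahedron": (4, 6, 4),
    "cube": (8, 12, 6),
    "octahedron": (6, 12, 8),
    "dodecahedron": (20, 30, 12),
    "icosahedron": (12, 30, 20),
    "square pyramid": (5, 8, 5),
}

# Assumed counts for a mitred square picture frame:
# 8 outer + 8 inner vertices; 16 quadrilateral faces; 32 edges.
PICTURE_FRAME = (16, 32, 16)

for name, counts in STANDARD_SOLIDS.items():
    print(name, euler_characteristic(*counts))

print("picture frame", euler_characteristic(*PICTURE_FRAME))
```

Under these assumed counts the standard solids all give 2, while the picture frame gives 0, which is why it serves as a counterexample to the unrestricted statement.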
The dialogue then gets most interesting, all the more so given that it more or less follows some threads in the history of mathematics. Some students declare that the counterexamples are “monsters” and do not refute Euler’s theorem. One route is to insist that the figures in question are not really polyhedra. A philosopher in the crowd might argue that a meaning-analysis of the word “polyhedron” would reveal this, and then the class could get entangled in a debate over the meaning of this word in ordinary language (be it English, Greek, Latin, etc.) or, even worse, a metaphysical debate over the proper analysis of the underlying concept that the word picks out. The class briefly considers—and dismisses—a desperate attempt along those lines: one defines a polyhedron to be a figure that can be stretched onto a surface once a face is removed, and then triangulated and decomposed in a certain way. That would make the teacher’s “proof” into a stipulative definition of the word “polyhedron”. A second maneuver is to overly restrict the theorem so that the proof holds: the proper theorem is that for any convex, “simple” polyhedron, V − E + F = 2. An advocate of this maneuver is content to ignore the interesting fact that V − E + F = 2 does hold for some concave, and some non-simple polyhedra. A third line is to take the counterexamples to refute Euler’s theorem, and to declare that the notion of “polyhedron” is too complex and unorderly for decent mathematical treatment. Those inclined this way just lose interest in the notion. A fourth line accepts the counterexamples as refuting Euler’s theorem, and looks for a generalization that covers the Eulerian and non-Eulerian polyhedra in a single theorem. The third and fourth lines, of course, reject the teacher’s (purported) proof, either at all or in the generality in which it was intended. The fourth line takes the proof as pointing toward the proper generalization. This leads to further rounds of the procedure. 
Someone proposes a proof of a generalization of Euler’s theorem, and then counterexamples to that proof are found—some even more bizarre polyhedra are considered. It is straightforward to interpret the situation in Lakatos’s dialogue—or, better, the history it reconstructs—in terms of Waismann’s account of language. The start of the dialogue refers to a period in which the notion of polyhedron had an established use in the mathematical community (or communities). Theorems about polyhedra go back to ancient Greece. Consider, for example, the
standard theorem that there are exactly five platonic solids. Nevertheless, the word, or notion, or concept, had no established, formal definition. Surely, a necessary condition for a figure to be a polyhedron is that it be bounded by plane polygons (i.e. closed networks of edges all of which lie in the same plane and enclose a single area). The above “examples” of a sphere and a torus are thus easily dismissed (and they are not really taken seriously in Lakatos’s dialogue). But what are the sufficient conditions? The mathematicians of the time, and previous generations of mathematicians, were working with a notion governed more by a Wittgensteinian family resemblance than by a rigorous definition that determines every case one way or the other. In other words, the notion of polyhedron exhibited what Waismann calls open-texture. This open-texture did not prevent mathematicians from working with the notion, and proving things about polyhedra. Still, at the time, it simply was not determinate whether a picture frame counts as a polyhedron. Ditto for a cube with a cube-shaped hole in one of the faces, etc. When the case did come up, and threatened to undermine a lovely generalization discovered by the great Euler, a decision had to be made. As Lakatos shows, different decisions were made, or at least proposed. Those who found the teacher’s proof compelling (at least initially) could look to its details—to what Lakatos calls its “hidden lemmas”—to shed light on just what a polyhedron is. Surely it is pure dogma, in the most pejorative sense, to simply declare that, by definition, a polyhedron just is a three-dimensional figure for which the proof works. As noted, an attempt along those lines is quickly dismissed in the dialogue. But one can get some guidance as to what one thinks a polyhedron is by examining the details of what looks like a compelling proof. 
On the other hand, those mathematicians who found the counterexamples compelling can look to the details of the proof, and to the counterexamples, to formulate a more general definition of “polyhedron”, in order to find the characteristics that make some polyhedra, and not others, Eulerian. In this case, at least, both approaches proved fruitful. We can look back on the history and see how much was learned about the geometry of Euclidean space. At the end of the dialogue, a most advanced student proposes a purely set-theoretic definition of “polyhedron”. Accordingly,
a polyhedron just is a set of things called “vertices”, “edges”, and “faces” that satisfy some given formal conditions. The student insists that it really does not matter what the “vertices”, “edges”, and “faces” are, so long as the stated conditions are satisfied. That is, the theorem has been removed from the topic of space altogether. The student then gives a fully formal (or at least easily formalizable) proof of a generalization of Euler’s theorem from these definitions. The only residual question left, it seems to me, is the extent to which the set-theoretic definition captures the essence of the original, pre-theoretic (or at least pre-formal) concept of polyhedron. The orientation here fits nicely into a model-theoretic or algebraic account of mathematics that has been popular since the start of the twentieth century. The perspective was championed by Hilbert [1899]. The following occurs in a letter that Hilbert wrote to Frege, who complained about Hilbert’s orientation to mathematics:9 [...] it is surely obvious that every theory is only a scaffolding or schema of concepts together with their necessary relations to one another, and that the basic elements can be thought of in any way one likes. If in speaking of my points, I think of some system of things, e.g., the system love, law, chimneysweep [...] and then assume all my axioms as relations between these things, then my propositions, e.g., Pythagoras’ theorem, are also valid for these things [...] [A]ny theory can always be applied to infinitely many systems of basic elements.
Frege pointed out that Hilbert’s orientation loses touch with geometry, just as some might complain that the bright student at the end of the Lakatos dialogue has lost touch with the intuitive notion of polyhedron. It has nothing to do with figures in space. One can perhaps claim, now, that the final, austere and rigorous set-theoretic definition of “polyhedron”—as a set of “vertices”, “edges”, and “faces” under certain conditions—is not subject to open-texture. Its boundaries are as determinate as one could wish—assuming that there is no flexibility concerning the logic or the underlying set-theoretic model theory. But this is not to say that the original, pre-theoretic notion of “polyhedron” was similarly determinate, nor is it to say that the pre-theoretic notion (or notions) exactly
9 The correspondence between Frege and Hilbert is published in Frege [1976] and translated in Frege [1980].
matched the formal definition. This last is yet another example of the same sort of thing as Church’s thesis. Lakatos does not rest content with a rational reconstruction of the history of this one case. An appendix to Lakatos [1976] briefly explores the development of the notions of continuity and convergence, including the split between uniform and pointwise versions of these notions. The [1978] volume has essays and notes dealing with this example and a treatment of infinitesimals.
6. Church’s thesis, open-texture, and Turing’s theorem. Let us now examine Church’s thesis, and the impact of Turing’s [1936] analysis (as well as those of Gandy [1980], [1988] and Sieg [2002], [2002a]), in terms of the open-texture of mathematical concepts, and the Lakatos framework of proofs and refutations. Our main question, of course, concerns the so-called “intuitive”, or “pre-theoretic” notion of computability. I do not want to get too hung up in an ordinary-language-style meaning analysis, but the suffix “able” means something like “capable of” or “it is possible to”. To say that an item is edible is to say that it is capable of being eaten—presumably without toxic effects. To say that a concept is definable is to say that it is capable of definition. And to say that a number-theoretic function is computable is to say that it is capable of being computed: it is possible to compute the function. Typically, the extension of the modal construction underlying the suffix is sensitive to the interests of those speaking or writing, and to their background assumptions. Say that a distance and time is “runable” by me if I can cover the distance in the given time on relatively flat ground. Is an eight-minute mile runable by me? It depends on what, exactly, is being asked. If I warm up right now and go out and run a mile as fast as I can, it will take me much longer than eight minutes, probably ten or eleven. So, in that sense, an eight minute mile is not runable by me.
On the other hand, if I were to spend six months on a training regimen, which involves working out diligently with a trainer four or five days each week and losing about 25 pounds (and avoiding injury), then I probably could manage an eight minute mile again. And I am probably capable of executing this regimen. So in a sense, an eight minute mile is runable after all. If I underwent surgery and spent several years on training, and perhaps replaced some of my body parts, I might get it down to seven minutes. So, in some sense, a seven minute mile is runable.
To take a mathematical example, is it possible to trisect an arbitrary angle? This depends on what tools one is allowed to use, and how accurate the result has to be. If complete accuracy is required, and one can use only a compass and unmarked straightedge, then, of course, the answer is “no”. If one is allowed to use a compass and a marked straightedge, then the answer is “yes”. And an angle is easily trisectable, using compass and unmarked straightedge, if one does not care about differences less than, say, one tenth of the width of a line drawn with a sharp pencil. So what are the corresponding parameters of the pre-theoretic notion of computability? What tools and limitations are involved? I would suggest that in the thirties, and probably for some time afterward, this notion was subject to open-texture. The concept was not delineated with enough precision to decide every possible consideration concerning tools and limitations. And just as with Lakatos’s example of a polyhedron, the mathematical work, notably Turing’s [1936] argument and the efforts of the other founders—people like Church, Turing, Emil Post, Kleene, and Péter—sharpened the notion to what we have today. In other words, the mathematical work served to set the parameters and thus sharpen the original pre-theoretic notion. Several scholars, such as Copeland, Sieg, and Gandy (op. cit.), have noted that Turing was not concerned with what (analog or digital) physical machines are capable of. As noted, Turing’s own text makes it clear that he means to discuss computation by a human being, following an algorithm. Let us use the term “computist” to designate a person engaged in such computation. To pursue the question involving trisectability, what tools and abilities is the computist allowed to use? The philosophical issues are illustrated in the so-called “easy” half of Church’s thesis: every recursive function is computable. This raises the more or less standard matter of idealization.
The following double recursion defines an Ackermann function $f$ (in terms of the successor function $s$):
$$\forall y\,\bigl(f(y,0) = s0\bigr), \qquad \forall x\,\bigl(f(0,sx) = ssf(0,x)\bigr), \qquad \forall y\,\forall x\,\bigl(f(sy,sx) = f(y,f(sy,x))\bigr).$$
The defined function is recursive, but not primitive recursive. Boolos [1987] points out that the value of $f(5,5)$ is larger than the number of
particles in the known universe. In a real sense, the Ackermann function cannot be computed, in much the same sense that a two minute mile is not runable (by me or anyone else). No human computist will live long enough to complete this instance of the algorithm. Even if we waive that matter, and allow people to pass on long calculations to subsequent generations, this instance of the function cannot be computed without using more material than is available in the known universe. So here is a recursive function that is not computable. Mendelson [1963, §1] discusses a similar example, due to Porte [1960], noting that “it will be impossible to carry out the computation of” some instances of the defined function “within the lifespan of a human being or probably within the life-span of the human race”. Mendelson goes on to note that from “this fact Porte concludes that the general recursive function is not humanly computable, and therefore not effectively computable”. And we did just note that we are particularly interested in human (as opposed to machine) computability. Of course, this instance is standard, and so is the reply. Mendelson continues: Human computability is not the same as effective computability. A function is considered effectively computable if its value can be computed in an effective way in a finite number of steps, but there is no bound on the number of steps required for any given computation. Thus, the fact that there are effectively computable functions which may not be humanly computable has nothing to do with Church’s thesis.
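The contrast between computability in principle and in practice can be seen by running the double recursion directly. The following Python sketch transcribes the three clauses given above; the function name `f` follows the text, while the choice to unroll the recursion on the second argument into loops is an assumption of this sketch, made only to keep the interpreter’s call stack shallow.

```python
def f(y: int, x: int) -> int:
    """Ackermann-style function defined by the double recursion in the text."""
    if y == 0:
        value = 1                # f(0, 0) = s0
        for _ in range(x):
            value += 2           # f(0, sx) = ss f(0, x)
        return value
    value = 1                    # f(sy, 0) = s0
    for _ in range(x):
        value = f(y - 1, value)  # f(sy, sx) = f(y, f(sy, x))
    return value


for level in range(3):
    print(level, [f(level, x) for x in range(4)])
```

Small arguments return at once, but the values grow so quickly that already `f(2, 4)` equals 2^65536 − 1, and this naive transcription would need on the order of that many loop iterations to reach it. That is the practical side of the observation about `f(5, 5)`.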
So we are not particularly interested in human computists, but idealized counterparts thereof. On the first day of a class on computability, the teacher announces that we are to idealize on such matters as the attention span, lifetime, and available materials. I do not doubt the coherence of this. Since antiquity, this idealization has been standard fare throughout mathematics. Euclid’s first postulate reads “To draw a straight line from any point to any point” and the third is “To describe a circle with any center and distance”. There are no bounds on how close the two points have to be or how small or large the radius can be. My only point here is that the idealization represents a sharpening of the pre-theoretic notion of computability. The intuitive notion, like any of the above modal notions, leaves
open certain interest-relative parameters. What, exactly, should I do to try to run a mile in eight minutes? What, exactly, counts as a computation? In opting for (this half of) Church’s thesis, we fix the parameters in a certain direction. Note that even though we idealize, we do insist that the instructions given to the computist be finite. We could idealize more and allow infinitely long algorithms. But the result is not very interesting: every number-theoretic function would be “computable”. Similarly, every function can be “computed” with a Turing machine with infinitely many states. So there are limits to the idealization. We could idealize more than we in fact do. The fact that we are idealizing somewhat on time, attention span, and materials does not, by itself, sharpen the notion all the way to the hard rails of recursiveness. Some open-texture remains, depending on how far the idealization is to go. It seems reasonable, for example, to set some fixed bound on how fast a function can grow before we will call it “computable”. We might require a function to be computable in polynomial time, or exponential time, or hyperexponential time, or in polynomial space, or whatever. This would disqualify the Ackermann functions, and perhaps with good reason. But, of course, we set no such bounds. We noted that we are interested in (idealized) human computability. But what do we count as part of the computist? What tools do we allow? We must assume, first, that there is some limit to the size of individual symbols that the computist can recognize and distinguish. If an ideal computist could distinguish infinitely many different symbols, then she could be given instructions infinitely long, and thus could compute any number-theoretic function, as above. Turing [1936] provides a compelling argument for the “restriction” to finite alphabets. 
If each of infinitely many different symbols could be written in a given space (say a square centimeter), then some of them would be arbitrarily close to each other, and thus indistinguishable. Turing’s argument here is another instance of what Kreisel calls “informal rigor”, and it serves to sharpen a concept of “algorithm”. The thesis that there are limits to discriminability is perhaps what Lakatos might call a hidden lemma of Turing’s argument. Even an idealized person, considered in isolation, is more like a finite state machine than a Turing machine. As above, we waive the fact that the brain has only so many neurons, and thus only so much
memory. But, as above, we do insist on a finite memory. Otherwise, any given number-theoretic function may be “computable”. Even when we idealize, we may have nothing corresponding to the Turing machine tape. It is customary to allow the computist to use pencil and paper for doing scratch work. Suppose that we limit the computist to a certain, fixed, amount of paper, say one standard sheet for each particle in the universe. Then, given the aforementioned limits on discrimination, the result is at most a (very large) finite state machine, and not a Turing machine. The idealization relevant to Church’s thesis is that the computist has a truly unbounded amount of paper available. We assume that there is no possible circumstance in which she runs out of scratch paper or pencils.10 Again, I do not claim that this conception of computability is incoherent, nor that once it is formulated, it corresponds—exactly—to recursiveness. Church’s thesis is certainly true. But this is not to say that the original, pre-theoretic or intuitive notion of computability was as sharp as this. In his later article Mendelson [1990, p. 233] observes that the so-called “easy half” of CT is established beyond all doubt: The so-called initial functions are clearly [...] computable; we can describe simple procedures to compute them. Moreover, the operations of substitution and recursion and the least-number operator lead from [...] computable functions to [...] computable functions. In each case, we can describe procedures that will compute the new functions.
Mendelson concludes that this “simple argument is as clear a proof as I have seen in mathematics, and it is a proof in spite of the fact that it involves the intuitive notion of [...] computability”. I quite agree.
10 A Mac, PC, or mainframe is also more like a finite state device than a Turing machine, since it does not have unlimited memory. To get something equivalent to a Turing machine, we could consider a PC together with a supply of external disks (tapes, floppies, memory sticks, etc.) and a clerk who keeps track of which disk is inserted into the machine. The clerk responds to instructions like “remove the current disk and insert the next one”. We need not assume that the clerk has access to an actual infinity of disks. It is enough that he live near a disk factory and can obtain more, when needed. Thanks to John Case for this metaphor. We do have to assume that the factory will never run out of materials to make more disks.
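The structure of the argument Mendelson describes can be made concrete by writing each closure step as a procedure that builds a new procedure. The Python sketch below is only an illustration under stated assumptions: the names `zero`, `succ`, `proj`, `compose`, `prim_rec`, and `mu` are hypothetical, arguments are ordinary Python integers, and the final example passes an ordinary Python test to `mu` for brevity rather than building it from the initial functions.

```python
def zero(*args):
    """Initial function: constant zero."""
    return 0


def succ(n):
    """Initial function: successor."""
    return n + 1


def proj(i):
    """Initial function: projection onto the i-th argument (0-based)."""
    return lambda *args: args[i]


def compose(g, *hs):
    """Substitution: (x1..xn) -> g(h1(x1..xn), ..., hk(x1..xn))."""
    return lambda *args: g(*(h(*args) for h in hs))


def prim_rec(base, step):
    """Recursion: f(0, xs) = base(xs); f(n + 1, xs) = step(n, f(n, xs), xs)."""
    def f(n, *xs):
        acc = base(*xs)
        for i in range(n):
            acc = step(i, acc, *xs)
        return acc
    return f


def mu(g):
    """Least-number operator: least y with g(y, xs) == 0.

    The search does not terminate if no such y exists.
    """
    def f(*xs):
        y = 0
        while g(y, *xs) != 0:
            y += 1
        return y
    return f


add = prim_rec(proj(0), compose(succ, proj(1)))
mul = prim_rec(zero, compose(add, proj(1), proj(2)))
ceil_sqrt = mu(lambda y, x: 0 if y * y >= x else 1)

print(add(3, 4), mul(3, 4), ceil_sqrt(10))
```

Nothing in these definitions bounds the number of loop iterations or the size of the intermediate values, which is where the idealization away from time and material limits enters.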
The argument is compelling, and it, or something like it, certainly does or did convince just about all of the folks working in computability. This probably explains why there is not much challenge to this half of CT, once issues of feasibility and bounds on memory and the like are taken off the table. Mendelson’s argument here is thus a rather straightforward example of Kreisel’s informal rigor, and it does refute a thesis that it is not possible to prove things about an intuitive, pre-formal notion (§1 above). Perhaps no one meant to assert anything this strong. A more charitable reading of the once received view on CT is that it is not possible to establish, in mathematics, sharp limits to the intuitive, pre-theoretic notion of computability, giving necessary and sufficient conditions for it. Be that as it may, I submit that both conceptually and in the historical context, the argument for the easy half of CT serves to sharpen the intuitive notion. The reasoning allows us to see where the idealization from actual human or machine abilities comes in, and, indeed, to see what the idealization is. The argument takes no account of the length of the sequence of functions used to define a recursive function. By examining the argument, we see that we are to ignore, or reject, the possibility of a computation failing because the computist runs out of memory or materials. And we do not care about how much time a computation takes, or how much space we need for it. We thus declare the Ackermann functions to be computable, for example. The idealization away from considerations of feasibility, bounded states, and the like, is what Lakatos might call a “hidden lemma” of the proof. Logically, however, it would not be amiss for someone to invoke the Ackermann function as a Lakatos-style refutation, calling for monster-barring, or monster-adjustment, or retreat to safety, or the like.
And someone else could retort to one of these moves that issues of feasibility and limits on space and memory were never part of the pre-theoretic notion. A conceptual analysis would reveal this. A third person could take the situation as impetus to generalize the notion of computability, yielding notions like finite state computability, push-down computability, polynomial space computability, etc. With Waismann, I do not see a strong need to adjudicate disputes concerning which formalized notion gives the true essence of the pre-theoretic notion.
In light of the proof, and with hindsight, it remains eminently reasonable to focus on the idealized notion of computability, just as it was reasonable to focus on the sharply defined notions of polyhedron, continuity, area, and the like. Arguments like the one Mendelson cites serve to help clear away the vagueness and ambiguity in the appropriate direction. We look to see what the proof proves, and thereby gain insight on how the pre-theoretic notion should be sharpened. After the fact, one might think that the argument proves something determinate about a previously sharp notion. I submit that this would be a mistake. I’d say that the proof serves to fix the intuitive notion. Let us now turn to the so-called harder and, at one time, controversial direction of Church’s thesis, the statement that every computable function is recursive. Here, too, we ponder exactly what tools and abilities we allot to our idealized human computist. In a sense, we approach that question from the opposite side. First, we are asking about deterministic computability. The computist should not act randomly, willy nilly, at any point during the computation. She is supposed to execute a fixed algorithm. OK, then what is an algorithm? Presumably, an algorithm is a set of instructions that tells the computist what to do at every stage. Is that notion sufficiently sharp, pre-theoretically? To give Church’s thesis a chance, we have to allow the computist some “intuitive” abilities. At the very least, we assume that at any given moment during the computation, she can reliably detect which symbol she is looking at. She can distinguish a token of the type ‘1’ from a blank, for example. Otherwise, the computist is limited to what can be stored in memory (and even that assumes that memory recall is infallible). Let us consider an idealized person who has an intuitive ability to detect whether a given sentence in the language of arithmetic is true. 
Suppose that we give this idealized person the following instruction: if the string before you is a truth of arithmetic, then output 1; otherwise output 0. Does this count as an algorithm, for our super idiot savant? After all, the instructions tell her what to do at every stage, and by hypothesis, she is capable of executing this instruction. If this does count as an algorithm, then Church’s thesis is false, since the computed function is not recursive.
Computability, Proof, and Open-Texture
Clearly, this does not count as an algorithm. But what in the pre-theoretic notion precludes it? We already noted, several times, that we are to idealize on human limitations. Presumably, the idealized computist is not supposed to use any intuition, other than what is needed to recognize symbols. Do we have a clear, unambiguous concept of what counts as (allowable) intuition? To be sure, actual humans do not have the ability to recognize arithmetic truth. But what of our idealized computists?11

Let us turn from this perhaps silly thought experiment to a published argument. Kalmár [1959, p. 72] accepted the aforementioned near-consensus view that “Church’s thesis is not a mathematical theorem which can be proved or disproved in the exact mathematical sense”. Still, he gives a “plausibility argument” against CT. Let f be a two-place function from natural numbers to natural numbers. Define the improper minimalization of f to be the function f′ such that:

$$
f'(x) =
\begin{cases}
\text{the least natural number } y \text{ such that } f(x, y) = 0, & \text{if there is such a } y, \\
0, & \text{if there is no natural number } y \text{ such that } f(x, y) = 0.
\end{cases}
$$

There are recursive functions φ whose improper minimalization ψ is not recursive.12 Indeed, the halting problem is an improper minimalization of the function whose value is 0 if y is the code of a complete computation of the Turing machine whose code is x, started with x as input; and whose value is 1 otherwise. Kalmár proposes the following “method” to “calculate the value ψ(p) in a finite number of steps”:

Calculate in succession the values φ(p, 0), φ(p, 1), φ(p, 2), ... and simultaneously try to prove by all correct means that none of them equals 0, until we find either a (least) natural number q for which φ(p, q) = 0 or a proof of the proposition stating that no natural number y with φ(p, y) = 0 exists; and consider in the first case this q, in the second case 0 as result of the calculation. [Kalmár 1959, pp. 76–77]
11 See Tennant [1997, Chapter 5] for a defense of the standard idealizations.

12 The proper minimalization of a two-place function f is the partial function whose value at x is the least y such that f (x, y) = 0, if there is such a y, and is undefined otherwise. The proper minimalization of a (total) recursive function is itself partial recursive. The operation plays an important role in the theory of partial recursive functions.
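The contrast between proper and improper minimalization can be sketched in a few lines of Python. This is an illustration only: the names `proper_min`, `bounded_search`, `is_square_root`, and `improper_min_is_square_root` are assumptions introduced here, not anything found in Kalmár or Péter. Proper minimalization is a plain unbounded search that may never return. The improper version also needs some way of establishing that no witness exists; for the toy function below that role is played by the elementary bound y ≤ x.

```python
from typing import Callable, Optional

TwoPlace = Callable[[int, int], int]

def proper_min(f: TwoPlace, x: int) -> int:
    """Least y with f(x, y) == 0; loops forever when no such y exists."""
    y = 0
    while f(x, y) != 0:
        y += 1
    return y

def bounded_search(f: TwoPlace, x: int, bound: int) -> Optional[int]:
    """Least y < bound with f(x, y) == 0, or None if none is found.

    None only means 'no witness below bound', not 'no witness at all'.
    """
    for y in range(bound):
        if f(x, y) == 0:
            return y
    return None

def is_square_root(x: int, y: int) -> int:
    """Toy recursive function (hypothetical example): 0 iff y * y == x."""
    return 0 if y * y == x else 1

def improper_min_is_square_root(x: int) -> int:
    """Improper minimalization of is_square_root.

    Any y with y * y == x satisfies y <= x, so searching y = 0..x settles
    the question. That bound stands in for the 'proof that no y exists'.
    """
    found = bounded_search(is_square_root, x, x + 1)
    return found if found is not None else 0
```

For a function like the halting-style φ described above, no computable bound of this kind is available, since one would decide the halting problem. That gap is exactly where Kalmár’s appeal to proof “by all correct means” enters.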
Kalmár concedes that this “method” will “compute” the function only if for each p, if there is no y such that φ(p, y) = 0, then this fact can be proved by correct means. But he finds this last assumption plausible. It also assumes that we (or our idealized counterparts) can reliably detect proofs, when such go by correct means. These are perhaps “hidden lemmas” of Kalmár’s “proof”.

Do not get me wrong. I am not suggesting that there is a legitimate sense in which Kalmár’s method counts as an algorithm. As noted above, several times, I accept Church’s thesis, without reservation. Even at the time, Mendelson [1963, §3] had no trouble dismissing the example. But is there something unambiguous in the pre-theoretic notion, or notions, of computability, in use in the 1930s and a few decades after, that rules it out, definitively?

The question here is why Kalmár thought that this “method” constitutes an algorithm that is relevant to Church’s thesis. To say the least, he was an intelligent mathematician, and was not prone to deny what is obvious, a mere matter of understanding the meaning of a word in use.

Notice that Kalmár’s instructions do tell the computist what to do at each stage, provided that she is a competent mathematician. She is told to try to prove a certain theorem. Mathematicians know how to do that. Entrance and qualifying examinations test prospective mathematicians for their ability to prove theorems. Those who display this ability are admitted to the profession. If Kalmár’s “plausible” assumption is correct, then his “method” will indeed terminate with the correct answer, if the computist is diligent enough and has unlimited time and materials at her disposal.

The problem, of course, is that the “method” does not tell the computist exactly what to do at each stage. It does not specify what to write down, which proof tricks to try.
Kalmár himself notes that Gödel’s theorem shows that there is no sound, recursive formal system that yields all and only the arithmetic truths. But we cannot identify recursiveness with computability without begging the question. Again, what is an algorithm?

Péter has been called the founding mother of recursive function theory. Chapter 19 of her landmark textbook at least tentatively endorses Kalmár’s “plausibility” argument against Church’s thesis ([1957, §19.2]). The next chapter turns to Church’s thesis. It opens:
Now I should like to quote some of the arguments used in attempts to make plausible the identification of the “calculable functions” with the general recursive functions. The assertion that the values of a function are everywhere calculable in a finite number of steps has meaning only under the condition that this calculation does not depend on some individual arbitrariness but constitutes a procedure capable of being repeated and communicated to other people at any time. Hence it must be a mechanical procedure, and thus one can imagine, in principle, a machine able to carry through the single steps of the calculation.
One would think that Kalmár’s “method” for calculating improper minimalizations is thereby disqualified. Different mathematicians will respond differently to the instruction “try to prove such and such a theorem”, and no one has argued that this “method” can be mechanized. Indeed, it can’t be. Still, Péter does not object, at least not yet. She goes on to give a detailed, painstaking analysis of computation, relating the notion to Turing machines. It is clearly in the spirit of Turing [1936]. The chapter closes thus:

If we assume that in the concept of a function calculable by the aid of a Turing machine we have succeeded in capturing the concept of the most general number-theoretic function whose values are everywhere calculable, then the results obtained above in fact characterize the general recursive functions as the functions calculable in the most general sense; and under this interpretation, the function [defined by Kalmár] [...] is an example of a function not calculable in the most general sense. [Péter 1957, §20.13]
So far, so good. But Péter still hedges:
But, however plausible it may seem that this interpretation correctly reflects real mathematical activity, we are nevertheless dealing here with a certain demarcation of the concept of calculability, and the future evolution of mathematics may bring about methods of calculation completely unexpected nowadays.
Waismann could not put it better. Péter declares that, at the time, the notion of computability is subject to open-texture.13

13 Péter was not quite finished with Kalmár’s function. In the next chapter (§21.8), she calls for a generalization of the notion of Herbrand–Gödel recursiveness, and then cites a theorem, due to Kalmár, that the above improper minimalization function is the unique solution to a certain system of equations. She suggests, tentatively, that this result “will perhaps in the course of time rouse more doubts as to the complete generality of the concept of the general recursive function”. In a review of Péter [1957], Raphael M. Robinson [1958] reads this as expressing doubts about Church’s Thesis. He “does not regard it as surprising that a function determined in a non-effective way from a system of functional equations should fail to be general recursive”. It seems to me, however, that Péter’s remarks here are directed at the intuitive, informal notion of a function defined by recursion, the other notion invoked in CT. Recursiveness, too, may be subject to open-texture.

As the saying goes: that was then, this is now. It seems to me that in the ensuing decades, the community of logicians has come to see the notion of computability as sufficiently sharpened. It is now reasonable to hold that Church’s thesis is established with as much rigor as anything in (informal) mathematics. It takes its place among the other “theses” mentioned in §3 above.

More recently, Turing’s argument has been supplemented and extended by the deep analyses by Gandy and Sieg. In effect, they have fulfilled Gödel’s suggestion above (§1): “to state a set of axioms which would embody the generally accepted properties of this notion, and to do something on that basis”. The axioms have something of the flavor of the final definition of “polyhedron” in Lakatos [1976]. A “computation” is a certain kind of function on hereditarily finite sets. Yet the axioms on computations are perfectly reasonable. Indeed, they are obvious truths about the notion of computability in question, or at least they are now. Gandy and Sieg have also analyzed the concept of machine computability, in painstaking detail, and Sieg has shown that notion to be coextensive with recursiveness, and thus with human computability.

Here, too, we can see the sharpening of a concept subject to open-texture. It is sometimes pointed out that there are models of current physical theories in which a certain quantity, such as the temperature at a given point or the number of electrons in a given area, is not recursive. So, one might think, a machine that detects the value of the function might count as “computing” a non-recursive function.14 After all, its instructions are clear and the device is physically determined, up to quantum effects. For a second example, involving relativity, it may be possible, in some sense, for a device to detect, in a finite time, what would be the result of an infinitely long “computation” of another machine, provided that the structure of space-time cooperates (and an actual infinity of materials were available). In a sense, the first device can determine the result of a “super-task”. Such a device would thus be able to solve the halting problem, in a finite time.

14 Thanks to Peter Clark.

The conditions on the Sieg-machines rule out all such “devices”, since they compute non-recursive functions. Moreover, the conditions in question are eminently reasonable. But can one say that the rogue devices in question are definitively ruled out by the intuitive, pre-theoretic notion(s) of machine computation? Perhaps, but it might be better to say, as Waismann notes, that the question itself operates with too blurred an expression. I submit that here, too, a better interpretation is that the Gandy–Sieg analyses serve to sharpen the notion.

In sum, the notions of (idealized) human computability (and idealized machine computability) are now about as sharp as anything gets in mathematics. Hard as rails. There is not much room for open-texture anymore, or so it seems, anyway. The present conclusion is that the notion of computability in use now is the result of the last seventy years of examining texts like Turing [1936] and of the overwhelming ensuing success of the theory of computability. It is not accurate to think of the historical proofs of CT and related theses as establishing something about absolutely sharp pre-theoretic notions. Rather, the analytical and mathematical work served to sharpen the very notions themselves. Sieg says as much, noting that the analyses result in a “sharpening [of] the informal notion”. Once again, this is the norm in mathematics.
Acknowledgments

Thanks to the workshop on the foundations of mathematics held at Notre Dame in October of 2005 and to the mathematics workshop at the Arché Research Centre at the University of St. Andrews, which devoted sessions to this project.
References

Bell, J. [1998], A Primer of Infinitesimal Analysis, Cambridge, Cambridge University Press.
Black, R. [2000], “Proving Church’s Thesis”, Philosophia Mathematica (III) 8, 244–258.
Boolos, G. [1987], “A Curious Inference”, Journal of Philosophical Logic 16, 1–12.
Boolos, G. [1989], “Iteration Again”, Philosophical Topics 17, 5–21.
Church, A. [1936], “An Unsolvable Problem of Elementary Number Theory”, American Journal of Mathematics 58, 345–363; reprinted in [Davis 1965, pp. 89–107].
Copeland, J. [1997], “The Church–Turing Thesis”, Stanford Encyclopedia of Philosophy.
Crowell, R. and Fox, R. [1963], Introduction to Knot Theory, Boston, Ginn and Company.
Davis, M. [1965], The Undecidable, Hewlett, New York, The Raven Press.
Davis, M. [1982], “Why Gödel Didn’t Have Church’s Thesis”, Information and Control 54, 3–24.
Folina, J. [1998], “Church’s Thesis: Prelude to a Proof”, Philosophia Mathematica (III) 6, 302–323.
Frege, G. [1976], Wissenschaftlicher Briefwechsel, (G. Gabriel, H. Hermes, F. Kambartel, and C. Thiel eds.), Hamburg, Felix Meiner.
Frege, G. [1980], Philosophical and Mathematical Correspondence, Oxford, Basil Blackwell.
Gandy, R. [1980], “Church’s Thesis and Principles of Mechanisms”, in The Kleene Symposium, (J. Barwise, H.J. Keisler, and K. Kunen eds.), Amsterdam, North Holland, pp. 123–148.
Gandy, R. [1988], “The Confluence of Ideas in 1936”, in The Universal Turing Machine, (R. Herken ed.), New York, Oxford University Press, pp. 55–111.
Gödel, K. [1934], “On Undecidable Propositions of Formal Mathematical Systems”, [Davis 1965, pp. 39–74].
Gödel, K. [1946], “Remarks before the Princeton Bicentennial Conference on Problems in Mathematics”, [Davis 1965, pp. 84–88].
Hilbert, D. [1899], Grundlagen der Geometrie, Leipzig, Teubner; Foundations of Geometry, translated by E. Townsend, La Salle, Illinois, Open Court, 1959.
Kalmár, L. [1959], “An Argument against the Plausibility of Church’s Thesis”, Constructivity in Mathematics, Amsterdam, North Holland, pp. 72–80.
Kleene, S. [1952], Introduction to Metamathematics, Amsterdam, North Holland.
Kleene, S. [1981], “Origins of Recursive Function Theory”, Annals of the History of Computing 3(1), 52–67.
Kleene, S. [1987], “Reflections on Church’s Thesis”, Notre Dame Journal of Formal Logic 28, 490–498.
Kreisel, G. [1967], “Informal Rigour and Completeness Proofs”, Problems in the Philosophy of Mathematics, (I. Lakatos ed.), Amsterdam, North Holland, pp. 138–186.
Kreisel, G. [1987], “Church’s Thesis and the Ideal of Informal Rigour”, Notre Dame Journal of Formal Logic 28, 499–519.
Lakatos, I. [1976], Proofs and Refutations, (J. Worrall and E. Zahar eds.), Cambridge, Cambridge University Press.
Lakatos, I. [1978], Mathematics, Science and Epistemology, (J. Worrall and G. Currie eds.), Cambridge, Cambridge University Press.
Maddy, P. [1997], Naturalism in Mathematics, Oxford, Oxford University Press.
Mendelson, E. [1963], “On some Recent Criticisms of Church’s Thesis”, Notre Dame Journal of Formal Logic 4, 201–205.
Mendelson, E. [1990], “Second Thoughts about Church’s Thesis and Mathematical Proofs”, The Journal of Philosophy 87, 225–233.
Nelson, R.J. [1987], “Church’s Thesis and Cognitive Science”, Notre Dame Journal of Formal Logic 28, 581–614.
Péter, R. [1957], Rekursive Funktionen, second enlarged edition, Budapest, Verlag der Ungarischen Akademie der Wissenschaften; translated as Recursive Functions, New York, Academic Press, 1967.
Porte, J. [1960], “Quelques pseudo-paradoxes de la ‘calculabilité effective’”, Actes du 2me Congrès International de Cybernétique, Namur, Belgium, Association Internationale de Cybernétique, pp. 332–334.
Quine, W.V.O. [1951], “Two Dogmas of Empiricism”, Philosophical Review 60, 20–43.
Robinson, R.M. [1958], review of [Péter 1957], Journal of Symbolic Logic 23, 362–363.
Shapiro, S. [1981], “Understanding Church’s Thesis”, Journal of Philosophical Logic 10, 353–365.
Shapiro, S. [1990], review of [Kleene 1981], [Davis 1982], and [Kleene 1987], Journal of Symbolic Logic 55, 348–350.
Shapiro, S. [1993], “Understanding Church’s Thesis, again”, Acta Analytica 11, 59–77.
Sieg, W. [1994], “Mechanical Procedures and Mathematical Experience”, in Mathematics and Mind, (A. George ed.), Oxford, Oxford University Press, pp. 71–140.
Sieg, W. [2002], “Calculations by Man and Machine: Conceptual Analysis”, in Reflections on the Foundations of Mathematics: Essays in Honor of Solomon Feferman, (W. Sieg, R. Sommer, C. Talcott eds.), Natick, Massachusetts, Association for Symbolic Logic, A.K. Peters, Ltd., pp. 390–409.
Sieg, W. [2002a], “Calculations by Man and Machine: Mathematical Presentation”, in In the Scope of Logic, Methodology and Philosophy of Science 1, (P. Gärdenfors, J. Woleński, and K. Kijania–Placek eds.), Dordrecht, Kluwer Academic Publishers, pp. 247–262.
Tennant, N. [1997], The Taming of the True, Oxford, Oxford University Press.
Turing, A. [1936], “On computable numbers, with an application to the Entscheidungsproblem”, Proceedings of the London Mathematical Society 42, 230–265; reprinted in [Davis 1965, pp. 116–153].
Waismann, F. [1949], “Analytic-Synthetic I”, Analysis 10, 25–40.
Waismann, F. [1950], “Analytic-Synthetic II”, Analysis 11, 25–38.
Waismann, F. [1951], “Analytic-Synthetic III”, Analysis 11, 49–61.
Waismann, F. [1951a], “Analytic-Synthetic IV”, Analysis 11, 115–124.
Waismann, F. [1952], “Analytic-Synthetic V”, Analysis 13, 1–14.
Waismann, F. [1953], “Analytic-Synthetic VI”, Analysis 13, 73–89.
Waismann, F. [1968], “Verifiability”, in Logic and Language, (A. Flew ed.), Oxford, Basil Blackwell.
Wilfried Sieg∗
Step by Recursive Step: Church’s Analysis of Effective Calculability

In fact, the only evidence for the freedom from contradiction of Principia Mathematica is the empirical evidence arising from the fact that the system has been in use for some time, many of its consequences have been drawn, and no one has found a contradiction. [Church in a letter to Gödel, July 27, 1932]
Alonzo Church’s mathematical work on computability and undecidability is well-known indeed, and we seem to have an excellent understanding of the context in which it arose. The approach Church took to the underlying conceptual issues, by contrast, is less well understood. Why, for example, was “Church’s Thesis” put forward publicly only in April 1935, when it had been formulated already in February/March 1934? Why did Church choose to formulate it then in terms of Gödel’s general recursiveness, not his own λ-definability as he had done in 1934? A number of letters were exchanged between Church and Paul Bernays during the period from December 1934 to August 1937; they throw light on critical developments in Princeton during that period and reveal novel aspects of Church’s distinctive contribution to the analysis of the informal notion of effective calculability. In particular, they allow me to give informed, though still tentative answers to the questions I raised; the character of my answers is reflected by an alternative title for this paper, Why Church needed Gödel’s recursiveness for his Thesis. In section 5, I contrast Church’s analysis with that of Alan Turing and explore, in the very last section, an analogy with Dedekind’s investigation of continuity.

∗ W. Sieg, Department of Philosophy, Carnegie Mellon University, Pittsburgh. This paper is dedicated to the memory of Alonzo Church. A number of colleagues and friends helped me to improve it significantly: J. Avigad, A. Blass, J. Byrnes, M. Davis, S. Feferman, W.W. Tait, and G. Tamburrini. I am very grateful to J. Dawson for providing me with a most informative and reflective letter from Church that is reproduced in Appendix D.
0. Proem on Church & Gödel

Church’s mathematical work on computability and undecidability is well-known, and its development is described, for example, in informative essays by his students Kleene and Rosser. The study of the Church Nachlaß may provide facts for a fuller grasp of this evolution, but it seems that we have an excellent understanding of the context in which the work arose.1 By contrast, Church’s approach to the underlying conceptual issues is less well understood, even though a careful examination of the published work is already quite revealing. Important material, relevant to both historical and conceptual issues, is contained in the Bernays Nachlaß at the Eidgenössische Technische Hochschule in Zürich. A number of letters were exchanged between Church and Bernays during the period from December 1934 to August 1937; they throw light on critical developments in Princeton2 and reveal novel aspects of Church’s contribution to the analysis of the informal notion of effective calculability. That contribution has been recognized by calling the identification of effective calculability with Gödel’s general recursiveness, or equivalent notions, Church’s Thesis. Church proposed the definitional identification publicly for the first time in a talk to the American Mathematical Society on April 19, 1935; the abstract of the talk had been received by the Society already on March 22. Some of the events leading to this proposal (and to the first undecidability results) are depicted in Martin Davis’s fascinating paper Why Gödel did not have Church’s Thesis: Church formulated a version of his thesis via λ-definability in conversations during the period from late 1933 to early 1934;3 at that time, the reason for proposing the identification was the quasi-empirical fact he expressed also strongly in a letter to Bernays dated January 23, 1935:

The most important results of Kleene’s thesis concern the problem of finding a formula to represent a given intuitively defined function of positive integers (it is required that the formula shall contain no other symbol than λ, variables, and parentheses). The results of Kleene are so general and the possibilities of extending them apparently so unlimited that one is led to the conjecture that a formula can be found to represent any particular constructively defined function of positive integers whatever.

1 For additional background, cf. Appendix 2 in [Sieg 1994] and Church’s letter in Appendix D. It would be of great interest to know more about the earlier interaction with leading logicians and mathematicians: as reported in [Enderton 1995], Church spent part of his time as a National Research Fellow from 1927 to 1929 at Harvard, Göttingen, and Amsterdam.

2 This correspondence shows also how closely Bernays followed and interacted with the work of the Princeton group; this is in striking contrast to the view presented in [Gandy 1988]. It should also be noted that Bernays was in Princeton during the academic year 1935–6, resulting in [Bernays 1935–6]. I assume that Bernays spent only the Fall term, roughly from late September 1935 to around February 1936, in Princeton. In any event, he made the transatlantic voyage, starting from Le Havre on September 20, in the company of Kurt Gödel and Wolfgang Pauli. Due to health reasons, Gödel left Princeton again at the very end of November; cf. [Dawson 1997, pp. 109–110].
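To make the idea of “a formula to represent a function of positive integers” concrete, here is a minimal Python sketch of the now-standard Church-numeral encoding, in which every term is built from single-argument lambdas. It is a modern textbook illustration under stated assumptions, not a reconstruction of the constructions in Kleene’s thesis. The names `one`, `succ`, `add`, `mul`, `from_int`, and `to_int` are chosen for this example only, and the two conversion helpers use ordinary Python integers solely to build and inspect numerals.

```python
# Church numerals for positive integers: the numeral n means "apply f n times".
one = lambda f: lambda x: f(x)
succ = lambda n: lambda f: lambda x: f(n(f)(x))
add = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))
mul = lambda m: lambda n: lambda f: m(n(f))

def from_int(k):
    """Build the numeral for a positive Python integer k (helper, not a lambda term)."""
    if k < 1:
        raise ValueError("positive integers only")
    n = one
    for _ in range(k - 1):
        n = succ(n)
    return n

def to_int(n):
    """Read a numeral back by counting how often it applies its first argument."""
    return n(lambda k: k + 1)(0)
```

A call such as `to_int(add(from_int(2))(from_int(3)))` evaluates an addition entirely through function application; the conjecture in Church’s letter is that every constructively defined function can be represented in this style.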
How strongly such quasi-empirical evidence impressed Church is illustrated by the quotation from his letter to Gödel in the motto of my paper; that was written in response to Gödel’s question concerning Church’s 1932 paper: “If the system is consistent, won’t it then be possible to interpret the basic notions in a system of type theory or in the axiom system of set theory, and is there, apart from such an interpretation, any other way at all to make plausible the consistency?”4 In the 1935 abstract the thesis was formulated, however, in terms of general recursiveness, and the sole stated reason for the identification is that “other plausible definitions of effective calculability turn out to yield notions that are either equivalent to or weaker than recursiveness”. For Davis, this wording “leaves the impression that in the early spring of 1935 Church was not yet certain that λ-definability and Herbrand–Gödel recursiveness were equivalent.”5 Davis’s account continues as follows, specifying a particular order in which central results were obtained:

3 Cf. section 1 and, in particular, Rosser’s remarks quoted there.

4 The German original: “Falls das System widerspruchsfrei ist, wird es dann nicht möglich sein, die Grundbegriffe in einem System mit Typentheorie bzw. im Axiomensystem der Mengenlehre zu interpretieren, und kann man überhaupt auf einem anderen Wege als durch eine solche Interpretation die Widerspruchsfreiheit plausibel machen?”

5 L.c., p. 10.
Meanwhile, Church and Kleene each proved that all λ-definable functions are recursive. Church submitted an abstract of his work on [sic] March 1935, basing himself on recursiveness rather than λ-definability. By the end of June 1935, Kleene had shown that every recursive function is λ-definable, after which Church [1936] was able to put his famous work into its final form. Thus while Gödel hung back because of his reluctance to accept the evidence for Church’s thesis available in 1935 as decisive, Church (who after all was right) was willing to go ahead, and thereby launch the field of recursive function theory. [p. 12]
The accounts in [Kleene 1982] and [Rosser 1984], together with the information provided by Church in his letters to Bernays, make it perfectly clear that the λ-definability of the general recursive functions was known at the very beginning of 1935; it had been established by Rosser and Kleene. The converse was not known when Church wrote his letter of January 23, 1935, but had definitely been established by July. Church wrote on July 15, 1935 his next letter to Bernays and pointed to “a number of developments” that had taken place “in the meantime”; these developments had led to a(n impressive) list of papers, including his own [1935a] and [1936], Kleene’s [1936 and 1936a], Rosser’s [1935], and the joint papers with Kleene, respectively Rosser. Contrary to Davis’s “impression”, the equivalence was known already in March of 1935 when the abstract was submitted: if the inclusion of λ-definability in recursiveness had not also been known by then, the thesis could not have been formulated coherently in terms of recursiveness. The actual sequence of events thus differs in striking ways from Davis’s account (based on more limited historical information); most importantly, the order in which the inclusions between λ-definability and general recursiveness were established is reversed. This is not just of historical interest, but important for an evaluation of the broader conceptual issues. I claim, and will support through the subsequent considerations, that Church was reluctant to put forward the thesis in writing—until the equivalence of λ-definability and general recursiveness had been established. The fact that the thesis was formulated in terms of recursiveness indicates also that λ-definability was at first, even by Church, not viewed as one among “equally natural definitions of effective calculability”: the notion just
did not arise from an analysis of the intuitive understanding of effective calculability. I conclude that Church was cautious in much the same way as Gödel. Davis sees stark contrasts between the two: in the above quotation, for example, he sees Gödel as “hanging back” and Church as “willing to go ahead”; Gödel is described as reluctant to accept the “evidence for Church’s Thesis available in 1935 as decisive”. The conversations on which the comparison between Church and Gödel is based took place, however, already in early 1934.6 Referring to these same conversations, Davis writes (and these remarks immediately precede the above quotation):

The question of the equivalence of the class of these general recursive functions with the effectively calculable functions was [...] explicitly raised by Gödel in conversation with Church. Nevertheless, Gödel was not convinced by the available evidence, and remained unwilling to endorse the equivalence of effective calculability, either with recursiveness or with λ-definability. He insisted [...] that it was ‘thoroughly unsatisfactory’ to define the effectively calculable functions to be some particular class without first showing that ‘the generally accepted properties’ of the notion of effective calculability necessarily lead to this class.
Again, the evidence for the thesis provided by the equivalence, if it is to be taken as such, was not yet available in 1934. Church’s and Gödel’s developed views actually turn out to be much closer than this early opposition might lead one to suspect. That will be clear, I hope, from the detailed further discussion. The next section reviews the steps towards Church’s “conjecture”; then we will look at the equivalence proof and its impact. Church used implicitly and Gödel formulated explicitly an “absoluteness” property for the rigorous concept that is based on an explication of the informal notion as “computation in some logical calculus”. Sections 3 and 4 discuss that explication and absoluteness notion. In the next to last section I contrast their explication with the analysis of Alan Turing and explore, in the final section, an analogy between Turing’s analysis and Dedekind’s investigation of continuity. Let me mention already here that both Church and Gödel recognized and emphasized the special character of Turing’s analysis: Church pointed out that Turing’s notion has the advantage (over recursiveness and λ-definability) “of making the identification with effectiveness in the ordinary (not explicitly defined) sense evident immediately”; Gödel asserted that Turing’s work gives an analysis of the concept of mechanical procedure and that this concept is shown to be equivalent with that of a “Turing machine”.7

6 In footnote 18 of his [1936] Church remarks: “The question of the relationship between effective calculability and recursiveness (which it is here proposed to answer by identifying the two notions) was raised by Gödel in conversation with the author. The corresponding question of the relationship between effective calculability and λ-definability had previously been proposed by the author independently.”
1. Effective Calculability: A Conjecture

The first letter of the extant correspondence between Bernays and Church was mentioned above; it was written by Church on January 23, 1935 and responds to a letter by Bernays from December 24, 1934. Bernays’s letter is not preserved in the Zürich Nachlaß, but it is clear from Church’s response which issues had been raised in it: one issue concerned the applicability of Gödel’s Incompleteness Theorems to Church’s systems in the papers [Church 1932 and 1933], another the broader research program pursued by Church with Kleene and Rosser. Church describes in his letter two “important developments” with respect to the research program. The first development contains Kleene and Rosser’s proof, published in their [1935], that the set of postulates in [Church 1932 and 1933] is contradictory. For the second development Church refers to the proof by Rosser and himself that a certain subsystem is free from contradiction. (Cf. Church’s letter in Appendix D for a description of the broader context.) The second development is for our purposes particularly significant and includes Kleene’s thesis work. Church asserts that the latter provides support for the conjecture that “a formula can be found to represent any particular constructively defined function of positive integers whatever”. He continues:

It is difficult to prove this conjecture, however, or even to state it accurately, because of the difficulty in saying precisely what is meant by “constructively defined”. A vague description can be given by saying that a function is constructively defined if a method is given by which its value could be actually calculated for any particular positive integer whatever. Every recursive definition, of no matter how high an order, is constructive, and as far as I know, every constructive definition is recursive.8

7 Gödel’s brief and enigmatic remark (as to a proof of the equivalence) is elucidated in [Sieg and Byrnes 1996 and 1997].
The last remark is actually reminiscent of part of the discussion in [Church 1934], where Church claims that “[...] it appears to be possible that there should be a system of symbolic logic containing a formula to stand for every definable function of positive integers, and I fully believe that such systems exist”. [p. 358] From the context it is clear that “constructive definability” is intended, and the latter means minimally that the values of the function can be calculated for any argument. It is equally clear that the whole point of the paper is to propose plausible formal systems that, somehow, don’t fall prey to Gödel’s Incompleteness Theorems.
A system of this sort [with levels of different notions of implications, W.S.] not only escapes our unpleasant theorem that it must be either insufficient or oversufficient, but I believe that it escapes the equally unpleasant theorem of Kurt Gödel to the effect that, in the case of any system of symbolic logic which has a claim to adequacy, it is impossible to prove its freedom from contradiction in the way projected in the Hilbert program. This theorem of Gödel is, in fact, more closely related to the foregoing considerations than appears from what has been said. [p. 360]
Then Church refers to a system of postulates whose consistency can be proved and which probably is adequate for elementary number theory; it seems to be inconceivable to Church that all formal theories should fail to allow the “representation” of the constructively definable functions. Indeed, for the λ-calculus, the positive conjecture had been made by Church in conversation with Rosser tentatively late in 1933, with greater conviction in early 1934. Rosser describes matters in his [1984] as follows:
One time, in late 1933, I was telling him [Church, W.S.] about my latest function in the LC [Lambda Calculus, W.S.]. He remarked that perhaps every effectively calculable function from positive integers to positive integers is definable in LC. He did not say it with any firm conviction. Indeed, I had the impression that it had just come into his mind from hearing about my latest function. With the results of Kleene’s thesis and the investigations I had been making that fall, I did not see how Church’s suggestion could possibly fail to be true. [...] After Kleene returned to Princeton on February 7, 1934, Church looked more closely at the relation between λ-definability and effective calculability. Soon he decided they were equivalent, [...] [p. 345]
8 The quotation continues directly the above quotation from this letter. Church’s paper [1934] was given on December 30, 1933 to a meeting of the Mathematical Association; incidentally, Gödel presented his [1933o] in the very same session of that meeting. Cf. also [Gödel 1936b], reviewing [Church 1935].
Kleene put all of these events, except for Church’s very first speculations, after his “return to Princeton on February 7, 1934, and before something like the end of March 1934”; see [Davis 1982, p. 8]. Church also discussed these issues with Gödel, who at that time, early 1934, was not convinced by the proposal to identify effective calculability with λ-definability: he called the proposal “thoroughly unsatisfactory”.9 This must have been discouraging to Church, in particular as Gödel suggested a different direction for supporting such a claim and later made, in his lectures, a different proposal for a broader notion; Church reports in a letter to Kleene of November 29, 1935:
His [Gödel’s, W.S.] only idea at the time was that it might be possible, in terms of effective calculability as an undefined notion, to state a set of axioms which would embody the generally accepted properties of this notion, and to do something on that basis. Evidently it occurred to him later that Herbrand’s definition of recursiveness, which has no regard to effective calculability, could be modified in the direction of effective calculability, and he made this proposal in his lectures. At that time he did specifically raise the question of the connection between recursiveness in this new sense and effective calculability, but said he did not think that the two ideas could be satisfactorily identified “except heuristically”.10
9 Church in a letter to Kleene, dated November 29, 1935, and quoted in [Davis 1982, p. 9]. The conversation took place, according to Davis, “presumably early in 1934”; that is confirmed by Rosser’s account on p. 345 of [Rosser 1984].
This was indeed Gödel’s view and was expressed in Note 3 of his 1934 Princeton lectures. The note is attached to the remark that primitive recursive functions have the important property that their unique value can be computed by a finite procedure, for each set of arguments.
The converse seems to be true if, besides recursions according to the schema (2) [of primitive recursion; W.S.], recursions of other forms (e.g., with respect to two variables simultaneously) are admitted. This cannot be proved, since the notion of finite computation is not defined, but it serves as a heuristic principle.
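The contrast drawn in Note 3 between recursion according to schema (2) and recursion on two variables simultaneously can be made concrete with a small sketch. It is an illustration in modern terms, not a reconstruction of Gödel’s lectures: the names `prim_rec`, `add`, `mul` and `ackermann_peter` are hypothetical, the numbers start at 0 rather than at 1, and the two-variable function is the now-standard Ackermann–Péter variant rather than any function Gödel himself wrote down.

```python
def prim_rec(base, step):
    """Build h with h(0, x) = base(x) and h(n + 1, x) = step(n, h(n, x), x).

    This mirrors the schema of primitive recursion on one variable;
    the loop makes the finite procedure for computing values explicit.
    """
    def h(n, x):
        acc = base(x)
        for i in range(n):
            acc = step(i, acc, x)
        return acc
    return h


# Addition and multiplication obtained from the schema (illustrative names).
add = prim_rec(lambda x: x, lambda i, acc, x: acc + 1)       # add(n, x) = n + x
mul = prim_rec(lambda x: 0, lambda i, acc, x: add(x, acc))   # mul(n, x) = n * x


def ackermann_peter(m, n):
    """Recursion with respect to two variables simultaneously."""
    if m == 0:
        return n + 1
    if n == 0:
        return ackermann_peter(m - 1, 1)
    return ackermann_peter(m - 1, ackermann_peter(m, n - 1))
```

For instance, `add(3, 4)` evaluates to 7 and `ackermann_peter(2, 3)` to 9. Both kinds of definition come with an evident finite procedure for computing values, but only the first falls under schema (2); the Ackermann–Péter function is known not to be primitive recursive.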
To some it seemed that the note expressed a form of Church’s Thesis. However, in a letter of February 15, 1965 to Martin Davis, Gödel emphasized that no formulation of Church’s Thesis is implicit in the conjectured equivalence; he explained:
[...] it is not true that footnote 3 is a statement of Church’s Thesis. The conjecture stated there only refers to the equivalence of “finite (computation) procedure” and “recursive procedure”. However, I was, at the time of these lectures, not at all convinced that my concept of recursion comprises all possible recursions; and in fact the equivalence between my definition and Kleene’s [...] is not quite trivial.11
In the Postscriptum to his [1934] Gödel asserts that the question raised in footnote 3 can now, in 1965, be “answered affirmatively” for his recursiveness “which is equivalent with general recursiveness as defined today”, i.e., with Kleene’s µ-recursiveness. I do not understand how that definition could have convinced Gödel that it captures “all possible recursions”, unless its use in proofs of Kleene’s normal form theorem is also considered. The ease with which “the” normal form theorem allows one to establish equivalences between different formulations makes it plausible that some stable notion has been isolated; however, the question whether that notion corresponds to effective calculability has to be answered independently. The very next section focuses on the equivalence between general recursiveness and λ-definability, and also on the dialectical role this mathematical result played for the first published formulation of Church’s Thesis.
10 This is quoted in [Davis 1982, p. 9], and is clearly in harmony with Gödel’s remark quoted below. As to the relation to Herbrand’s concept, see the critical discussion in my [1994, pp. 83–85].
11 [Davis 1982, p. 8].
2. Two Notions: An Equivalence Proof
In his first letter to Bernays, Church mentions in the discussion of his conjecture two precise mathematical results: all primitive recursive, respectively general recursive functions in Gödel’s sense can be represented, and that means that they are λ-definable. The first result is attributed to Kleene12 and the second to Rosser. The letter’s remaining three and a half pages (out of a total of six pages) are concerned with an extension of the pure λ-calculus for the development of elementary number theory, consonant with the considerations of [Church 1934] described above. The crucial point to note is that the converse of the mathematical result concerning general recursive functions and, thus, the equivalence between λ-definability and general recursiveness is not formulated. Bernays had evidently remarked in his letter of December 24, 1934, that some statements in [Church 1933] about the relation of Gödel’s theorems to Church’s formal systems were not accurate, namely, that the theorems might not be applicable because some very special features of the system of Principia Mathematica seemed to be needed in Gödel’s proof.13 Church responds that Bernays’s remarks are “just” and then describes Gödel’s response to the very same issue:
Gödel has since shown me, that his argument can be modified in such a way as to make the use of this special property of the system of Principia unnecessary. In a series of lectures here at Princeton last spring he presented this generalized form of his argument, and was able to set down a very general set of conditions such that his theorem would hold of any system of logic which satisfied them.
12 This fact is formulated also in [Kleene 1935, Part II on p. 223].
13 Gödel and Church had a brief exchange on this issue already in June and July of 1932. In his letter of July 27, 1932, Church remarks that von Neumann had drawn his attention “last fall” to Gödel’s paper [1931] and continues: “I have been unable to see, however, that your conclusions in §4 of this paper apply to my system. Possibly your argument can be modified so as to make it apply to my system, but I have not been able to find a modification of your argument.” Cf. Appendix D.
The conditions Church alludes to are found in section 6 of Gödel’s lectures; they include one precise condition that, according to Gödel, in practice suffices as a substitute for the imprecise requirement that the class of axioms and the relation of immediate consequence be constructive. The imprecise requirement is formulated at the beginning of Gödel’s lectures to characterize crucial normative features for a formal mathematical system:
We require that the rules of inference, and the definitions of meaningful formulas and axioms, be constructive; that is, for each rule of inference there shall be a finite procedure for determining whether a given formula B is an immediate consequence (by that rule) of given formulas A1, ..., An, and there shall be a finite procedure for determining whether a given formula A is a meaningful formula or an axiom. [p. 346]
The precise condition replaces “constructive” by “primitive recursive”.14 Not every constructive function is primitive recursive, however: Gödel gives in section 9 a function of the Ackermann type, asks what one might mean “by every recursive function”, and defines in response the class of general recursive functions via his equational calculus. Clearly, it is of interest to understand why Church publicly announced the thesis only in his talk of April 19, 1935, and why he formulated it then in terms of general recursiveness, not λ-definability. Here is the full abstract of Church’s talk:
Following a suggestion of Herbrand, but modifying it in an important respect, Gödel has proposed (in a set of lectures at Princeton, N.J., 1934) a definition of the term recursive function, in a very general sense. In this paper a definition of recursive function of positive integers which is essentially Gödel’s is adopted. And it is maintained that the notion of an effectively calculable function of positive integers should be identified with that of a recursive function, since other plausible definitions of effective calculability turn out to yield notions that are either equivalent to or weaker than recursiveness. There are many problems of elementary number theory in which it is required to find an effectively calculable function of positive integers satisfying certain conditions, as well as a large number of problems in other fields which are known to be reducible to problems in number theory of this type. A problem of this class is the problem to find a complete set of invariants of formulas under the operation of conversion (see abstract 41.5.204). It is proved that this problem is unsolvable, in the sense that there is no complete set of effectively calculable invariants.15
14 Here and below I use “primitive recursive” where Gödel just says “recursive” to make explicit the terminological shift that has taken place since [Gödel 1934].
15 [Church 1935]. In the next to last sentence “abstract 41.5.204” refers to [Church and Rosser 1935].
Church’s letter of July 15, 1935, to Bernays explicitly refers to this abstract and mentions the paper [Church 1936] as “in the process of being typewritten”; indeed, Church continues “[...] I will mail you a copy within a week or two. All these papers will eventually be published, but it may be a year or more before they appear.” His mailing included a copy of a joint paper with Rosser, presumably their [1936], and an abstract of a joint paper with Kleene, presumably their [1935]. Of historical interest is furthermore that Kleene’s papers “General recursive functions of natural numbers” and “λ-definability and recursiveness” are characterized as “forthcoming”, i.e., they had been completed already at this time.
The precise connection between recursiveness and λ-definability or, as Church puts it in his abstract, “other plausible definitions of effective calculability” had been discovered in 1935, between the writing of the letters of January 23 and July 15. From the accounts in [Kleene 1981] and [Rosser 1984] it is quite clear that Church, Kleene, and Rosser contributed to the proof of the equivalence of these notions. Notes 3, 16, and 17 in [Church 1936] add detail: consistently with the report in the letter to Bernays, the result that all general recursive functions are λ-definable was first found by Rosser and then by Kleene (for a slightly modified definition of λ-definability); the converse claim was established “independently” by Church and Kleene “at about the same time”. However, neither from Kleene’s or Rosser’s historical accounts nor from Church’s remarks is it clear when the equivalence was actually established. In view of the letter to Bernays and the submission date for the abstract, March 22, 1935, the proof of the converse must have been found after January 23, 1935, but before March 22, 1935. So one can assume with good reason that this result provided to Church the additional bit of evidence for actually publishing the thesis.16
That the thesis was formulated for general recursiveness is not surprising when Rosser’s remark in his [1984] about this period is seriously taken into account: “Church, Kleene, and I each thought that general recursivity seemed to embody the idea of effective calculability, and so each wished to show it equivalent to λ-definability”. [p. 345] There was no independent motivation for λ-definability to serve as a concept to capture effective calculability, as the historical record seems to show: consider the surprise that the predecessor function is actually λ-definable and the continued work in 1933/4 by Kleene and Rosser to establish the λ-definability of more and more constructive functions. In addition, Church argued for the correctness of the thesis when completing the 1936 paper (before July 15, 1935); his argument took the form of an explication of effective calculability with a central appeal to “recursivity”. Kleene referred to Church’s analysis when presenting his [1936b] to the American Mathematical Society on January 1, 1936, and made these introductory remarks [on p. 544]: “The notion of a recursive function, which is familiar in the special cases associated with primitive recursions, Ackermann–Peter multiple recursions, and others, has received a general formulation from Herbrand and Gödel. The resulting notion is of especial interest, since the intuitive notion of a ‘constructive’ or ‘effectively calculable’ function of natural numbers can be identified with it very satisfactorily.” λ-definability was not even mentioned.
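The surprise about the predecessor function can be appreciated from a short sketch of how such a definition may go. The code assumes the modern convention in which numerals start at zero and uses a familiar pairing construction; Church’s numerals represented positive integers and the original constructions of 1933/4 differ in detail, so the names (`zero`, `succ`, `pair`, `shift`, `pred`, `church`, `to_int`) and the encoding are illustrative assumptions only.

```python
# A Church numeral n takes a function f to its n-fold iterate.
zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))

# Pairs encoded as functions, with the two projections.
pair = lambda a: lambda b: lambda s: s(a)(b)
first = lambda p: p(lambda a: lambda b: a)
second = lambda p: p(lambda a: lambda b: b)

# Predecessor: iterate (a, b) -> (b, b + 1) n times starting from (0, 0),
# then read off the first component. By this convention pred(0) = 0.
shift = lambda p: pair(second(p))(succ(second(p)))
pred = lambda n: first(n(shift)(pair(zero)(zero)))


def church(k):
    """Build the numeral for the ordinary integer k >= 0 (helper for testing)."""
    n = zero
    for _ in range(k):
        n = succ(n)
    return n


def to_int(n):
    """Read a numeral back as an ordinary integer (helper for testing)."""
    return n(lambda i: i + 1)(0)
```

With these definitions `to_int(pred(church(5)))` evaluates to 4. The successor is immediate from the encoding, whereas the predecessor needs an auxiliary device such as the pair above, which gives some sense of why its λ-definability was not obvious.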
16 This account should be compared with the more speculative one given in [Davis 1982], for example in the summary on p. 13.
3. Reckonable Functions: An Explication
The paper “An unsolvable problem of elementary number theory” was published, as Church had expected, in (early) 1936. Church restates in it his proposal for identifying the class of effectively calculable functions with a precisely defined class, so that he can give a rigorous mathematical definition of the class of number theoretic problems of the form: “Find an effectively calculable function that is the characteristic function of a number theoretic property or relation.” This and an additional crucial point are described by Church as follows:
[...] The purpose of the present paper is to propose a definition of effective calculability which is thought to correspond satisfactorily to the somewhat vague intuitive notion in terms of which problems of this class are often stated, and to show, by means of an example, that not every problem of this class is solvable.17
In section 7 of his paper, Church presents arguments in support of the proposal to use general recursiveness18 as the precise notion; indeed, the arguments are to justify the identification “so far as positive justification can ever be obtained for the selection of a formal definition to correspond to an intuitive notion”.19 Two methods to characterize effective calculability of number-theoretic functions suggest themselves. The first of these methods makes use of the notion of algorithm, and the second employs the notion of calculability in a logic. Church argues that neither method leads to a definition more general than recursiveness. The two arguments have a very similar structure, and I will discuss only the one pertaining to the second method.20 Church considers a logic L, whose language contains the equality symbol =, a symbol { }( ) for the application of a unary function symbol to an argument, and numerals for the positive integers. For unary functions F he defines:
F is effectively calculable if and only if there is an expression f in the logic L such that: {f}(µ) = ν is a theorem of L iff F(m) = n; here, µ and ν are expressions that stand for the positive integers m and n.21
17 [Church 1936] in [Davis 1965, p. 89 and 90].
18 The fact that λ-definability is an equivalent concept adds for Church “[...] to the strength of the reasons adduced below for believing that they [these precise concepts] constitute as general a characterization of this notion (i.e. effective calculability) as is consistent with the usual intuitive understanding of it”. [Church 1936, footnote 3, p. 90] in [Davis 1965].
19 [Church 1936], in [Davis 1965, p. 100].
20 An argument following quite closely Church’s considerations pertaining to the first method is given in [Shoenfield 1967, p. 120]. For the second argument, Church uses the fact that Gödel’s class of general recursive functions is closed under the µ-operator, then still called Kleene’s p-function. That result is not needed for the first argument on account of the determinacy of algorithms.
Such functions F are recursive if it is assumed that L satisfies conditions that make L’s theorem predicate recursively enumerable. To argue then for the recursive enumerability of the theorem predicate, Church formulates conditions any system of logic has to satisfy if it is “to serve at all the purposes for which a system of symbolic logic is usually intended”.22 These conditions, Church remarks in footnote 21, are “substantially” those from Gödel’s Princeton Lectures for a formal mathematical system: (i) each rule must be an effectively calculable operation, and (ii) the set of rules and axioms (if infinite) must be effectively enumerable. Church supposes that these conditions can be interpreted to mean that, via a suitable Gödel numbering for the expressions of the logic, (iC) each rule must be a recursive operation, (iiC) the set of rules and axioms (if infinite) must be recursively enumerable, and (iiiC) the relation between a positive integer and the expression which stands for it must be recursive. The theorem predicate is thus indeed recursively enumerable, but the crucial interpretative step is not argued for at all! Church’s argument in support of the recursiveness of effectively calculable functions may appear to be viciously circular. However, our understanding of the general concept of calculability is explicated in terms of derivability in a logic, and the conditions (iC)-(iiiC) sharpen the idea that within such a logical formalism one operates with an effective notion of immediate consequence.23 The “thesis” is appealed to in a special and narrower context, and it is precisely here that we encounter the real stumbling block for Church’s analysis. Given the crucial role this observation plays, it is appropriate to formulate it as a normative requirement:
Church’s Central Thesis. The steps of any effective procedure (governing proofs in a system of symbolic logic) must be recursive.
If the central thesis is accepted, the earlier considerations indeed prove that all effectively calculable functions are recursive. Robin Gandy called this Church’s “step-by-step argument”.24 The idea that computations are carried out in a logic or simply in a deductive formalism is also the starting point of the considerations in a supplement to Hilbert and Bernays’s book Grundlagen der Mathematik II. Indeed, Bernays’s letter of December 24, 1938 begins with an apology for not having written to Church in a long time:
I was again so much occupied by the working at the “Grundlagenbuch”. In particular the “Supplemente” that I found desirable to add have become much more extended than expected. By the way: one of them is on the precising [sic!] of the concept of computability. There I had the opportunity of exposing some of the reasonings of yours and Kleene on general recursive functions and the unsolvability of the Entscheidungsproblem.
21 This concept is an extremely natural and fruitful one and is directly related to “Entscheidungsdefinitheit” for relations and classes introduced by Gödel in his 1931 paper and to representability of functions used in his 1934 Princeton Lectures. As to the former, compare Collected Works I, p. 170 and 176; as to the latter, see p. 58 in [Davis 1965].
22 [Church 1936] in [Davis 1965, p. 101]. As to what is intended, namely for L to satisfy epistemologically motivated restrictions, see [Church 1956, section 7, in particular pp. 52–53].
23 Compare footnote 20 on p. 101 in [Davis 1965] where Church remarks: “In any case where the relation of immediate consequence is recursive it is possible to find a set of rules of procedure, equivalent to the original ones, such that each rule is a (one-valued) recursive operation, and the complete set of rules is recursively enumerable.”
Bernays refers to the book’s Supplement II, entitled “Eine Präzisierung des Begriffs der berechenbaren Funktion und der Satz von Church über das Entscheidungsproblem”. A translation of the title, not quite adequate to capture “Präzisierung”, is “A precise explication of the concept of calculable function and Church’s Theorem on the decision problem”.
24 It is most natural and general to take the underlying generating procedures directly as finitary inductive definitions. That is Post’s approach via his production systems; using Church’s central thesis to fix the restricted character of the generating steps guarantees the recursive enumerability of the generated set. Cf. Kleene’s discussion of Church’s argument in [Kleene 1952, pp. 322–323]. Here it might also be good to recall remarks of C.I. Lewis on “inference” as reported in [Davis 1995, p. 273]: “The main thing to be noted about this operation is that it is not so much a piece of reasoning as a mechanical, or strictly mathematical, operation for which a rule has been given. No ‘mental’ operation is involved except that required to recognize a previous proposition followed by the main implication sign, and to set off what follows that sign as a new assertion.”
In this supplement Hilbert and Bernays make the core notion of calculability in a logic directly explicit and define a number theoretic function to be reckonable (in German, regelrecht auswertbar) when it is computable in some deductive formalism satisfying three recursiveness conditions. The crucial condition is an analogue of Church’s Central Thesis and requires that the theorems of the formalism can be enumerated by a primitive recursive function or, equivalently, that the proof predicate is primitive recursive. Then it is shown (1) that a special, very restricted number theoretic formalism suffices to compute the reckonable functions, and (2) that the functions computable in this formalism are exactly the general recursive ones. The analysis provides, in my view, a natural and most satisfactory capping of the development from Entscheidungsdefinitheit of relations in [Gödel 1931] to an “absolute” notion of computability for functions, because it captures directly the informal notion of rule-governed evaluation of effectively calculable functions and isolates appropriate restrictive conditions.
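The step from an enumeration of theorems to the computation of values can be illustrated by a toy sketch. Everything in it is hypothetical: the one-axiom, one-rule “formalism”, the encoding of a theorem {f}(µ) = ν as a triple, and the names `theorems` and `value_by_search`. It is meant only to show the shape of the argument: if each inference step is computable, the theorems can be generated one after another, and the value F(m) is found by searching that enumeration.

```python
from itertools import islice


def theorems():
    """Enumerate the theorems of a toy formalism (an assumption for illustration).

    A theorem is a triple (f, m, n), read as '{f}(m) = n'.
    Axiom: {double}(1) = 2.
    Rule:  from {double}(k) = n infer {double}(k + 1) = n + 2.
    """
    theorem = ("double", 1, 2)
    while True:
        yield theorem
        f, k, n = theorem
        theorem = (f, k + 1, n + 2)  # one computable inference step


def value_by_search(f, m, enumeration=theorems):
    """Return n for the first enumerated theorem of the form '{f}(m) = n'.

    The loop is an unbounded search: it terminates exactly when such a
    theorem exists and runs forever otherwise.
    """
    for g, a, n in enumeration():
        if g == f and a == m:
            return n


first_three = list(islice(theorems(), 3))
```

Here `value_by_search("double", 5)` returns 10. In arguments of this kind the unbounded search is the point at which closure under the µ-operator is used, while the computability of each single step corresponds to what is called Church’s Central Thesis above.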
4. Absoluteness and Formalizability
A technical result of the sort we just discussed was for Gödel in 1935 the first hint that there might be a precise notion capturing the informal concept of effective calculability.25 Gödel defined an absoluteness notion for the specific formal systems of his paper [1936a]. A number theoretic function φ(x) is said to be computable in S just in case for each numeral m there exists a numeral n such that φ(m) = n is provable in S. Clearly, all primitive recursively defined functions, for example, are already computable in the system S1 of classical arithmetic, where Si is number theory of order i, for i finite or transfinite. In the Postscriptum to the paper Gödel observed:
It can, moreover, be shown that a function computable in one of the systems Si, or even in a system of transfinite order, is computable already in S1. Thus the notion ‘computable’ is in a certain sense ‘absolute’, while almost all metamathematical notions otherwise known (for example, provable, definable, and so on) quite essentially depend upon the system adopted. [p. 399]
25 Cf. my [1994, p. 88 and note 52]. The latter asserts that the content of [Gödel 1936a] was presented in a talk in Vienna on June 19, 1935. An interesting question is, certainly, how much Gödel knew then about the ongoing work in Princeton reported in Church’s 1935 letters to Bernays. I could not find any evidence that Gödel communicated with Bernays, Church, or Kleene on these issues at that time.
A broader notion of absoluteness was used in Gödel’s contribution to the Princeton bicentennial conference, i.e., in [Gödel 1946]. Gödel starts out with the following remark:
Tarski has stressed in his lecture (and I think justly) the great importance of the concept of general recursiveness (or Turing’s computability). It seems to me that this importance is largely due to the fact that with this concept one has for the first time succeeded in giving an absolute definition of an interesting epistemological notion, i.e., one not depending on the formalism chosen. [p. 150]
For the publication of the paper in [Davis 1965] Gödel added a footnote to the last sentence:
To be more precise: a function of integers is computable in any formal system containing arithmetic if and only if it is computable in arithmetic, where a function f is called computable in S if there is in S a computable term representing f. [p. 150]
Both in 1936 and in 1946, Gödel took for granted the formal character of the systems and, thus, the elementary character of their inference or calculation steps. Gödel’s claim that “an absolute definition of an interesting epistemological notion” has been given, i.e., a definition that does not depend on the formalism chosen, is only partially correct: the definition does not depend on the details of the formalism, but depends crucially on the fact that we are dealing with a “formalism” in the first place. In that sense absoluteness has been achieved only relative to an un-explicated notion of an elementary formalism. It is in this conceptual context that Church’s letter from June 8, 1937 to the Polish logician Józef Pepis should be seen.26 Church brings out this “relativism” very clearly in an indirect way of defending his thesis; as far as I know, this broadened perspective, though clearly related to his earlier explication, has not been presented in any of Church’s writing on the subject.
26 The letter to Pepis was partially reproduced and analyzed in my [1994]; it is reprinted in Appendix A.
Pepis had described to Church his project of constructing a number theoretic function that is effectively calculable, but not general recursive. In his response Church explains, why he is “extremely skeptical”. There is a minimal condition for a function f to be effectively calculable, and “if we are not agreed on this then our ideas of effective calculability are so different as to leave no common ground for discussion”: for every positive integer a there must exist a positive integer b such that the proposition f (a) = b has a “valid proof” in mathematics. But as all of extant mathematics is formalizable in Principia Mathematica or in one of its known extensions, there actually must be a formal proof of a suitably chosen formal proposition. However, if f is not general recursive then, by the considerations of [Church 1936], for every definition of f within the language of Principia Mathematica there exists a positive integer a such that for no b the proposition f (a) = b is provable in Principia Mathematica; that holds again for all known extensions. Indeed, Church claims this holds for “any system of symbolic logic whatsoever which to my knowledge has ever been proposed”. Thus, to satisfy the above minimal condition and to respect the quasi-empirical fact that all of mathematics is formalizable, one would have to find “an utterly new principle of logic, not only never before formulated, but also never before actually used in a mathematical proof”. Moreover, and here is the indirect appeal to the recursivity of steps, the new principle “must be of so strange, and presumably complicated, a kind that its metamathematical expression as a rule of inference was not general recursive”, and one would have to scrutinize the “alleged effective applicability of the principle with considerable care”. 
The dispute concerning a proposed effectively calculable, non-recursive function would thus for Church center around the required new principle and its effective applicability as a rule of inference, i.e., what I called Church’s Central Thesis. If the latter is taken for granted (implicitly, for example, in Gödel’s absoluteness considerations), then the above minimal understanding of effective calculability and the quasi-empirical fact of formalizability block the construction of such a function. This is not a completely convincing argument; Church is extremely skeptical of Pepis’s project, but mentions that “this [skeptical] attitude is of course subject to the reservation that I may be induced to change my opinion after seeing your work”.
On April 22, 1937, Bernays wrote a letter to Church and remarked that Turing had just sent him the paper [Turing 1936]; there is a detailed discussion of some points concerned with Turing’s proof of the undecidability of the Entscheidungsproblem. As to the general impact of Turing’s paper Bernays writes:
He [Turing] seems to be very talented. His concept of computability is very suggestive and his proof of the equivalence of this notion with your λ-definability gives a stronger conviction of the adequacy of these concepts for expressing the popular meaning of “effective calculability”.
Bernays does not give in this letter (or in subsequent letters to Church and to Turing) a reason why he finds Turing’s concept “suggestive”; strangely enough, in Supplement II of Grundlagen der Mathematik II, Turing’s work is not even mentioned. It is to that work that I’ll turn now to indicate in what way it overcomes the limitations of the earlier analyses (all centering around the concept of “computability in a formal logic”).
5. Computors, Boundedness, and Locality
The earlier detailed reconstruction of Church’s justification for the “selection of a formal definition to correspond to an intuitive notion” and the pinpointing of the crucial difficulty show, first of all, the sophistication of Church’s methodological attitude and, secondly, that at this point in 1935 there is no major opposition to Gödel’s cautious attitude. These points are supported by the directness with which Church recognized in 1937, when writing a review of [Turing 1936] for the Journal of Symbolic Logic, the importance of Turing’s work as making the identification of effectiveness and Turing computability “immediately evident”. That review is quoted now in full:
The author proposes as criterion that an infinite sequence of digits 0 and 1 be “computable” that it is possible to devise a computing machine, occupying a finite space and with working parts of finite size, which will write down a sequence to any desired number of terms if allowed to run for a sufficiently long time. As a matter of convenience, certain further restrictions are imposed on the character of the machine, but these are of such a nature as obviously to cause no loss of generality; in particular, a human calculator, provided with pencil and paper and explicit instructions, can be regarded as a kind of Turing machine. It is thus immediately clear that computability, so defined, can be identified with (especially, is no less general than) the notion of effectiveness as it appears in certain mathematical problems (various forms of the Entscheidungsproblem, various problems to find complete sets of invariants in topology, group theory, etc., and in general any problem which concerns the discovery of an algorithm). The principal result is that there exist sequences (well-defined on classical grounds) which are not computable. In particular the deducibility problem of the functional calculus of first order (Hilbert and Ackermann’s engere Funktionenkalkül) is unsolvable in the sense that, if the formulas of this calculus are enumerated in a straightforward manner, the sequence whose nth term is 0 or 1, according as the nth formula in the enumeration is or is not deducible, is not computable. (The proof here requires some correction in matters of detail.) In an appendix the author sketches a proof of equivalence of “computability” in his sense and “effective calculability” in the sense of the present reviewer [American Journal of Mathematics, vol. 58(1936), pp. 345–363, see review in this JOURNAL, vol. 1, pp. 73–74]. The author’s result concerning the existence of uncomputable sequences was also anticipated, in terms of effective calculability, in the cited paper. His work was, however, done independently, being nearly complete and known in substance to a number of people at the time that the paper appeared. As a matter of fact, there is involved here the equivalence of three different notions: computability by a Turing machine, general recursiveness in the sense of Herbrand–Gödel–Kleene, and the λ-definability in the sense of Kleene and the present reviewer.
Of these, the first has the advantage of making the identification with effectiveness in the ordinary (not explicitly defined) sense evident immediately—i.e. without the necessity of proving preliminary theorems. The second and third have the advantage of suitability for embodiment in a system of symbolic logic.
So, Turing’s notion is presumed to make the identification with effectiveness in the ordinary sense “evident immediately”. How this is to be understood is a little clearer from the first paragraph of the review, where it is claimed to be immediately clear “that computability, so defined, can be identified with [...] the notion of effectiveness as it appears in certain mathematical problems [...]”. This claim
is connected to previous sentences by “thus”: the premises of this “inference” are (1) computability is defined via computing machines (that occupy a finite space and have working parts of finite size), and (2) human calculators, “provided with pencil and paper and explicit instructions”, can be regarded as Turing machines. The review of Turing’s paper is immediately followed by Church’s review of [Post 1936]; the latter is reprinted in Appendix C. Church is sharply critical of Post; this is surprising, perhaps, as Church notices the equivalence of Post’s and Turing’s notions. The reason for the criticism is methodological: Post does not “identify” his formulation of a finite 1-process with effectiveness in the ordinary sense, but rather considers as a “working hypothesis” that wider and wider formulations can be reduced to this formulation; he believes that the working hypothesis is in need of continual verification. Church objects “that effectiveness in the ordinary sense has not been given an exact definition, and hence the working hypothesis in question has not an exact meaning”. The need for a working hypothesis disappears, so Church argues, if effectiveness is defined as “computability by an arbitrary machine, subject to restrictions of finiteness”. The question here is, why does that seem “to be an adequate representation of the ordinary notion”? Referring back to the “inference” isolated in the review of Turing’s paper, we may ask, why do the two premises support the identification of Turing computability with the informal notion of effectiveness as used for example in the formulation of the decision problem? Thus we are driven to ask the more general question, what is the real character of Turing’s analysis?27 Let me emphasize that Turing’s analysis is neither concerned with machine computations nor with general human mental processes. 
Rather, it is human mechanical computability that is being analyzed, and the special character of this intended notion motivates the restrictive conditions that are brought to bear by Turing.28 Turing exploits in a radical way that a human computor is performing mechanical procedures on symbolic configurations: the immediate recognizability of symbolic configurations is demanded so that basic (computation) steps cannot be further subdivided. This demand and the evident limitation of the computor’s sensory apparatus lead to the formulation of boundedness and locality conditions. Turing requires also a determinacy condition (D), i.e., the computor carries out deterministic computations, as his internal state together with the observed configuration fixes uniquely the next computation step. The boundedness conditions can be formulated as follows:
B.1 there is a fixed bound for the number of symbolic configurations a computor can immediately recognize;
B.2 there is a fixed bound for the number of a computor’s internal states that need to be taken into account.29
For a given computor there are consequently only boundedly many different combinations of symbolic configurations and internal states. Since his behavior is, according to (D), uniquely determined by such combinations and associated operations, the computor can carry out at most finitely many different operations. These operations are restricted by the following locality conditions:
L.1 only elements of observed configurations can be changed;
L.2 the distribution of observed squares can be changed, but each of the new observed squares must be within a bounded distance of an immediately previously observed square.30
Turing’s computor proceeds deterministically, must satisfy the boundedness conditions, and the elementary operations he can carry out must be restricted as the locality conditions require. Every number-theoretic function such a computor can calculate, Turing argues, is actually computable by a Turing machine over a two-letter alphabet. Thus, on closer inspection, Turing’s Thesis that the concept “mechanical procedure” can be identified with machine computability is seen as the result of a two-part analysis. The first part yields axioms expressing boundedness conditions for symbolic configurations and locality conditions for mechanical operations on them, together with the central thesis that any mechanical procedure can be carried out by a computor satisfying the axioms. The second part argues for the claim that every number-theoretic function calculable by such a computor is computable by a Turing machine. In Turing’s presentation these quite distinct aspects are intertwined and important steps in arguments are only hinted at.31 Indeed, the claim that is actually established in Turing’s paper is the more modest one that Turing machines operating on strings can be simulated by Turing machines operating on single letters.
In the historical context in which Turing found himself, he asked exactly the right question: What are the elementary processes a computor carries out (when calculating a number)? Turing was concerned with symbolic processes, not, as the other proposed explications were, with processes directly related to the evaluation of (number theoretic) functions. Indeed, the general “problematic” required an analysis of the idealized capabilities of a computor, and it is precisely this feature that makes the analysis epistemologically significant. The separation of conceptual analysis and rigorous proof is essential for clarifying on what the correctness of Turing’s central thesis rests, namely, on recognizing that the boundedness and locality conditions are true for a computor and also for the particular precise, analyzed notion.
27 The following analysis was given in my [1994]; it is also presented in the synoptic [Sieg and Byrnes 1997].
28 This is detailed in my [1994].
29 This condition (and the reference to internal states) can actually be removed and was removed by Turing; nevertheless, it has been a focus of critical attention.
30 This is almost literally Turing’s formulation. Obviously, it takes for granted particular features of the precise model of computation, namely, to express that the computor’s attention can be shifted only to symbolic configurations that are not “too far away” from the currently observed configuration.
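To make the determinacy, boundedness, and locality conditions concrete, here is a minimal sketch in Python of a machine over the two-letter alphabet {0, 1}. It is an illustration only, not Turing’s own formulation: the table format, the names `check_conditions` and `run`, and the successor example are assumptions introduced here. Determinacy (D) is reflected in the fact that the table has exactly one entry per pair of internal state and observed symbol; B.1 and B.2 in the finiteness of the symbol and state sets; L.1 and L.2 in the restriction that only the observed square is changed and that attention moves by at most one square.

```python
BLANK = "0"          # two-letter alphabet: "0" (blank) and "1"
HALT = "halt"

SYMBOLS = {"0", "1"}             # B.1: finitely many recognizable symbols
STATES = {"start"}               # B.2: finitely many internal states

# (D): one entry per (state, observed symbol) fixes the next step uniquely.
# Entry format (an assumption of this sketch): (symbol to write, move, next state).
SUCCESSOR = {
    ("start", "1"): ("1", +1, "start"),   # skip over the unary numeral
    ("start", "0"): ("1", 0, HALT),       # append one stroke and stop
}


def check_conditions(symbols, states, program):
    """Check a machine table against the boundedness and locality conditions."""
    for (state, symbol), (write, move, next_state) in program.items():
        assert symbol in symbols and write in symbols        # B.1
        assert state in states and next_state in states | {HALT}  # B.2
        assert move in (-1, 0, 1)                            # L.2: bounded shift
    return True


def run(program, tape, state="start", position=0, max_steps=10_000):
    """Run the table on `tape` and return the final tape contents."""
    cells = dict(enumerate(tape))
    steps = 0
    while state != HALT:
        if steps >= max_steps:
            raise RuntimeError("step budget exhausted")
        symbol = cells.get(position, BLANK)
        write, move, state = program[(state, symbol)]
        cells[position] = write      # L.1: only the observed square is changed
        position += move
        steps += 1
    return "".join(cells.get(i, BLANK) for i in range(min(cells), max(cells) + 1))


check_conditions(SYMBOLS, STATES, SUCCESSOR)
print(run(SUCCESSOR, "111"))
```

The step budget `max_steps` is a practical guard added for the sketch; nothing in the conditions themselves bounds the length of a computation.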
31 Turing’s considerations are sharpened and generalized in [Sieg and Byrnes 1996].
6. Conceptual Analyses: A Brief Comparison
Church’s way of approaching the problem was at first deeply affected by quasi-empirical considerations. That is true also for his attitude to the consistency problem for the systems in [Church 1932 and 1933]; his letter of July 27, 1932 to Gödel is revealing. His review of Turing’s 1936 paper shows, however, that he moved away from that position; how far is perhaps even better indicated by the very critical review of [Post 1936]. In any event, Turing’s approach provides immediately a detailed conceptual analysis realizing, it seems, what Gödel had suggested in conversation with Church, namely “to state a set of axioms which would embody the generally accepted properties of this notion [effective calculability, W.S.], and to do something on that basis”. The analysis leads convincingly to the conclusion that “effectively calculable” functions can be computed by Turing machines (over a two-letter alphabet). The latter mathematical notion, appropriately, serves as the starting point for Computability Theory; cf. [Soare 1996]. Turing’s analysis divides, as I argued in the last section, into conceptual analysis and rigorous proof. The conceptual analysis leads first to a careful and sharper formulation of the intended informal concept, here, “mechanical procedures carried out by a human computor”, and second to the axiomatic formulation of determinacy, boundedness, and locality conditions. Turing’s central thesis connects the informal notion and the axiomatically restricted one. Rigorous proof allows us then, third, to recognize that all the actions of an axiomatically restricted computor can be simulated by a Turing machine. Thus, the analysis together with the proof allows us to “replace” the boldly claimed thesis, all effectively calculable functions are Turing computable, by a carefully articulated argument that includes a sharpened informal notion and an axiomatically characterized one.
Once such a “meta-analysis” of Turing’s ways is given, one can try and see whether there are other mathematical concepts that have been analyzed in a similar way.32 It seems to me that Dedekind’s recasting of “geometric” continuity in “arithmetic” terms provides a convincing second example; the steps I will describe now are explicitly in [Dedekind 1872]. The intended informal concept, “continuity of the geometric line”, is first sharpened by the requirement that the line must not contain “gaps”. The latter requirement is characterized, second, by the axiomatic condition that any “cut” of the line determines a geometric point. This “completeness” of the line is taken by Dedekind to be the “essence of continuity” and corresponds, as a normative demand, to Turing’s central thesis. What corresponds to the third element in Turing’s analysis, namely the rigorous proof?
Dedekind’s argument that the continuous geometric line and the system of rational cuts are isomorphic does: the rationals can be associated with geometric points by fixing an origin on the line and a unit; the geometric cuts can then be transferred to the arithmetic realm. (To be sure, that requires the consideration of arbitrary partitions of the rationals satisfying the cut conditions and the proof that the system of rational cuts is indeed complete.) It is in this way that Dedekind’s Thesis, or rather Dirichlet’s demand that Dedekind tried to satisfy, is now supported: every statement of algebra and higher analysis can be viewed as a statement concerning natural numbers (and sets of such).33
Hilbert presented considerations concerning the continuum in his lectures from the Winter term 1919, entitled “Natur und mathematisches Erkennen”; he wanted to support the claim that the formation of concepts in mathematics is constantly guided by intuition and experience, so that on the whole mathematics is a non-arbitrary, unified structure.34 Having presented Dedekind’s construction and his own investigation on non-Archimedean extensions of the rationals, he formulated the general point as follows:
The different existing mathematical disciplines are consequently necessary parts in the construction of a systematic development of thought; this development begins with simple, natural questions and proceeds on a path that is essentially traced out by compelling internal reasons. There is no question of arbitrariness. Mathematics is not like a game that determines the tasks by arbitrarily invented rules, but rather a conceptual system of internal necessity that can only be thus and not otherwise.35
32 Mendelson and Soare, for example, draw in their papers parallels between Turing’s or Church’s Thesis and other mathematical “theses”. G. Kreisel has reflected on “informal rigor” generally and on its application to Church’s Thesis in particular; a good presentation of Kreisel’s views and a detailed list of his relevant papers can be found in [Odifreddi 1996].
33 Let me add to the above analogy two further remarks: (i) both concepts are highly idealized: in Dedekind’s case, he is clear about the fact that not all cuts are needed to have a model of Euclidean geometry, i.e., the constructibility of points is not a concern; for Turing, feasibility of computations is not a concern; (ii) both concepts are viewed by me as “abstract” mathematical concepts in the sense of my [1996].
34 L.c., p. 8: “vielmehr zeigt sich, daß die Begriffsbildungen in der Mathematik beständig durch Anschauung und Erfahrung geleitet werden, so daß im großen und ganzen die Mathematik ein willkürfreies, geschlossenes Gebilde darstellt”.
35 L.c., p. 19: “Es bilden also die verschiedenen vorliegenden mathematischen Disziplinen notwendige Glieder im Aufbau einer systematischen Gedankenentwicklung, welche von einfachen, naturgemäß sich bietenden Fragen anhebend, auf einem durch den Zwang innerer Gründe im wesentlichen vorgezeichneten Wege fortschreitet. Von Willkür ist hier keine Rede. Die Mathematik ist nicht wie ein Spiel, bei dem die Aufgaben durch willkürlich erdachte Regeln bestimmt werden, sondern ein begriffliches System von innerer Notwendigkeit, das nur so und nicht anders sein kann”.
Hilbert’s remarks are fitting not only for the theory of the continuum, but also for the theory of computability.
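The notion of a cut used in the comparison with Dedekind can be illustrated by a small sketch in Python. It treats one particular cut, the one that determines the square root of 2, as a test on rational numbers and narrows it down by bisection with exact rational arithmetic. The names `sqrt2_lower` and `approximate` are hypothetical, and the example is a deliberately special case: this cut happens to be decidable, whereas Dedekind’s completeness condition concerns arbitrary cuts, for which no such test need be available.

```python
from fractions import Fraction


def sqrt2_lower(q):
    """Membership test for the lower set of the cut determining sqrt(2)."""
    return q < 0 or q * q < 2


def approximate(lower, lo, hi, steps):
    """Narrow a cut by bisection.

    Assumes `lo` lies in the lower set and `hi` does not; both properties
    are preserved at every step, and the gap halves each time.
    """
    for _ in range(steps):
        mid = (lo + hi) / 2
        if lower(mid):
            lo = mid
        else:
            hi = mid
    return lo, hi


lo, hi = approximate(sqrt2_lower, Fraction(1), Fraction(2), 20)
print(float(lo), float(hi))
```

No rational number sits at the dividing point itself, which is the “gap” in the rationals that the cut fills.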
Appendix
A. Church’s letter of June 8, 1937, to Pepis was enclosed with a letter to Bernays sent on June 14, 1937. Also enclosed were the “card” from Pepis to which Church’s letter is a reply and the manuscript of Pepis’s paper, Ein Verfahren der mathematischen Logik; Church asked Bernays to referee the paper. Church added, “Not because they are relevant to the question of acceptance of this particular paper, but because you may be interested in seeing the discussion of another point, I am sending you also a card received from Pepis and a copy of my reply. Please return Pepis’s card when you write.” In his letter of July 3, 1937, Bernays supported the publication of the paper which appeared in the 1938 edition of the Journal of Symbolic Logic; he also returned the card (which may very well be in the Church Nachlaß).
Dear Mgr. [Monsignore] Pepis:
This is to acknowledge receipt of your manuscript, Ein Verfahren der mathematischen Logik, offered for publication in the Journal of Symbolic Logic. In accordance with our usual procedure we are submitting this to a referee to determine the question of acceptance for publication, and I will write you further about the matter as soon as I have the referee’s report.
In reply to your postal [card] I will say that I am very much interested in your results on general recursiveness, and hope that I may soon be able to see them in detail. In regard to your project to construct an example of a numerical function which is effectively calculable but not general recursive I must confess myself extremely skeptical, although this attitude is of course subject to the reservation that I may be induced to change my opinion after seeing your work. I would say at the present time, however, that I have the impression that you do not fully appreciate the consequences which would follow from the construction of an effectively calculable non-recursive function.
For instance, I think I may assume that we are agreed that if a numerical function f is effectively calculable then for every positive integer a there must exist a positive integer b such that a valid proof can be given of the proposition f (a) = b (at least if we are not agreed on this then our ideas of effective
calculability are so different as to leave no common ground for discussion). But it is proved in my paper in the American Journal of Mathematics that if the system of Principia Mathematica is omega-consistent, and if the numerical function f is not general recursive, then, whatever permissible choice is made of a formal definition of f within the system of Principia, there must exist a positive integer a such that for no positive integer b is the proposition f (a) = b provable within the system of Principia. Moreover this remains true if instead of the system of Principia we substitute any one of the extensions of Principia which have been proposed (e.g. allowing transfinite types), or any one of the forms of the Zermelo set theory, or indeed any system of symbolic logic whatsoever which to my knowledge has ever been proposed. Therefore to discover a function which was effectively calculable but not general recursive would imply discovery of an utterly new principle of logic, not only never before formulated, but never before actually used in a mathematical proof— since all extant mathematics is formalizable within the system of Principia, or at least within one of its known extensions. Moreover this new principle of logic must be of so strange, and presumably complicated, a kind that its metamathematical expression as a rule of inference was not general recursive (for this reason, if such a proposal of a new principle of logic were ever actually made, I should be inclined to scrutinize the alleged effective applicability of the principle with considerable care). Sincerely yours, Alonzo Church
B. This is the part of Bernays’s letter to Church of July 3, 1937, that deals with the latter’s reply to Pepis. Your correspondence with Mr. Pepis on his claimed discovery has much interested me. As to the consequence you draw from your result p. 357 Amer. Journ. Math., it seems to me that you have to use for it the principle of excluded middle. Without this there would remain the possibility that for the expression f it can neither be proved that to every µ standing for a positive integer m, there is a ν standing for a positive integer n such that the formula {f }(µ) = ν is deducible within the logic, nor there can be denoted a positive integer m for which it can be proved that for no positive integer n the formula {f }(µ) = ν, where µ stands for m and ν for n, is deducible within the logic.
C. Church’s review of Turing’s paper in the Journal of Symbolic Logic is followed directly by his review of [Post 1936]: The author proposes a definition of “finite 1-process” which is similar in formulation, and in fact equivalent, to computation by a Turing machine (see the preceding review). He does not, however, regard his formulation as certainly to be identified with effectiveness in the ordinary sense, but takes this identification as a “working hypothesis” in need of continual verification. To this the reviewer would object that effectiveness in the ordinary sense has not been given an exact definition, and hence the working hypothesis in question has not an exact meaning. To define effectiveness as computability by an arbitrary machine, subject to restrictions of finiteness, would seem to be an adequate representation of the ordinary notion, and if this is done the need for a working hypothesis disappears. The present paper was written independently of Turing’s, which was at the time in press but had not yet appeared.
D. On July 25, 1983 Church wrote a letter to John W. Dawson responding to the latter’s inquiry, whether he (Church) had been “among those who thought that the Gödel incompleteness theorem might be found to depend on peculiarities of type theory”. Church’s letter is a rather touching (and informative) reflection on his work in the early thirties.
Dear Dr. Dawson:
In reply to your letter of June eighth, yes I was among those who thought that the Gödel incompleteness theorem might be found to depend on peculiarities of type theory (or, as I might later have added, of set theory) in a way that would show his results to have less universal significance than he was claiming for them. There was a historical reason for this, and that is that even before the Gödel results were published I was working on a project for a radically different formulation of logic which would (as I saw it at the time) escape some of the unfortunate restrictiveness of type theory. In a way I was seeking to do the very thing that Gödel proved impossible, and of course it’s unfortunate that I was slow to recognize that the failure of Gödel’s first proof to apply quite exactly to the sort of formulation of logic I had in mind was of no great significance.
The one thing of lasting importance that came out of my work in the thirties is the calculus of λ-conversion. And indeed this might be claimed as a system of logic to which the Gödel incompleteness theorem does not apply. To ask in what sense this claim is sound and in what sense not is not altogether pointless, as it may give some insight into the question where the boundary lies for applicability of the incompleteness theorem.
In my monograph on the calculus of λ-conversion (Annals of Mathematics Studies), in the section of the calculus of λ-δ-conversion (a minor variation of the λ-calculus) it is pointed out how, after identifying the positive integer 1 with the truth-value falsehood and the positive integer 2 with the truth-value truth, it is possible to introduce by definition, first the connectives of propositional calculus, and then an existential quantifier, but the latter only in the sense that: (∃x)M reduces to truth whenever there is some positive integer such that M reduces to truth after replacing x by the standard name of that positive integer, and in the contrary case (∃x)M has no normal form. The system is complete within its power of expression. But an attempt to introduce a universal quantifier, whether by definition or by added axioms, will give rise to some form of the Gödel incompleteness. I’ll not try to say more, as I am writing from recollection and haven’t the monograph itself before me.
Sincerely,
Alonzo Church
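The behaviour Church describes for the existential quantifier can be mimicked, by way of analogy only, with an unbounded search in Python. The sketch below is not the definition from the monograph on λ-conversion: the names `exists`, `TRUTH` and `FALSEHOOD` are introduced here for illustration, and the predicate `M` is assumed to return a truth-value for every positive integer. The search returns truth as soon as a witness is found and otherwise never returns, which is the programming counterpart of a term without normal form.

```python
from itertools import count

FALSEHOOD, TRUTH = 1, 2   # truth-values identified with positive integers


def exists(M):
    """Return TRUTH if M(n) == TRUTH for some positive integer n.

    If no such n exists the loop runs forever; nothing is returned in that
    case. Assumes M(n) yields a truth-value for every n (an assumption of
    this sketch).
    """
    for n in count(1):
        if M(n) == TRUTH:
            return TRUTH


# A predicate with a witness (n = 7), so the search terminates.
print(exists(lambda n: TRUTH if n * n == 49 else FALSEHOOD))
```

A universal quantifier cannot be obtained in the same way, since confirming it would require the search to finish over all positive integers.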
References
Bernays, P. [1935–6], Logical Calculus; Notes by Prof. Bernays with assistance of Mr. F.A. Ficken, The Institute for Advanced Studies, Princeton, p. 125.
Church, A. [1927], “Alternatives to Zermelo’s Assumption”, Trans. AMS 29, 178–208, Ph.D. thesis; Princeton.
Church, A. [1928], “On the Law of the Excluded Middle”, Bulletin AMS 34, 75–78.
Church, A. [1932], “A Set of Postulates for the Foundation of Logic”, part I, Ann. Math. 33(2), 346–66.
Church, A. [1933], “A Set of Postulates for the Foundation of Logic”, part II, Ann. Math. 34(2), 839–64.
Church, A. [1934], “The Richard Paradox”, Amer. Math. Monthly 41, 356–61.
Church, A. [1935], “A Proof of Freedom from Contradiction”, Proc. Nat. Acad. Sci. 21, 275–81, reviewed in [Gödel 1936b].
Church, A. [1935a], “An Unsolvable Problem of Elementary Number Theory”, Bulletin AMS 41, 332–3.
Church, A. [1936], “An Unsolvable Problem of Elementary Number Theory”, Amer. J. Math. 58, 345–63.
Church, A. [1936a], “A Note on the Entscheidungsproblem”, Journal of Symbolic Logic 1, 40–1, and “Corrections”, ibid., 101–2.
Church, A. [1937], Review of [Turing 1936], Journal of Symbolic Logic 2(1), 42–3.
Church, A. [1937a], Review of [Post 1936], Journal of Symbolic Logic 2(1), 43.
Church, A. [1956], Introduction to Mathematical Logic, vol. I, Princeton University Press, Princeton.
Church, A. and Rosser, J.B. [1935], “Some Properties of Conversion”, Bull. AMS 41, 332.
Church, A. and Rosser, J.B. [1936], “Some Properties of Conversion”, Trans. AMS 39, 472–82.
Church, A. and Kleene, S.C. [1935], “Formal Definitions in the Theory of Ordinal Numbers”, Bull. AMS 41, 627.
Church, A. and Kleene, S.C. [1936], “Formal Definitions in the Theory of Ordinal Numbers”, Fund. Math. 28, 11–21.
Davis, M. [1965], The Undecidable, Raven Press, Hewlett, New York.
Davis, M. [1982], “Why Gödel Didn’t Have Church’s Thesis”, Information and Control 54, 3–24.
Davis, M. [1995], “American Logic in the 1920s”, Bulletin of Symbolic Logic 1(3), 273–8.
Dawson, J.W. [1997], Logical Dilemmas: The Life and Work of Kurt Gödel, A.K. Peters, Wellesley.
Dedekind, R. [1872], Stetigkeit und irrationale Zahlen, Vieweg, Braunschweig.
Dedekind, R. [1888], Was sind und was sollen die Zahlen?, Vieweg, Braunschweig.
Enderton, H. [1995], “In Memoriam: Alonzo Church”, Bulletin of Symbolic Logic 1(4), 486–88.
Feferman, S. [1988], Turing in the Land of O(z), in The Universal Turing Machine, (R. Herken ed.), Oxford University Press, Oxford, pp. 113–47.
Gandy, R. [1988], The Confluence of Ideas, in The Universal Turing Machine, (R. Herken ed.), Oxford University Press, Oxford, pp. 55–111.
Gödel, K. [1933o], The Present Situation in the Foundations of Mathematics, in Collected Works III, 45–53.
Gödel, K. [1934], On Undecidable Propositions of Formal Mathematical Systems, in Collected Works I, 346–371.
Gödel, K. [1934e], Review of [Church 1933], in Collected Works I, 381–3.
Gödel, K. [1936a], On the Length of Proofs, in Collected Works I, 397–9.
Gödel, K. [1936b], Review of [Church 1935], in Collected Works I, 399–401.
Gödel, K. [1946], Remarks before the Princeton Bicentennial Conference on Problems in Mathematics, in Collected Works II, 150–3.
Gödel, K. [1986], Collected Works I, Oxford University Press, Oxford.
Gödel, K. [1990], Collected Works II, Oxford University Press, Oxford.
Gödel, K. [1995], Collected Works III, Oxford University Press, Oxford.
Hilbert, D. [1919 (1989)], Natur und mathematisches Erkennen, Lecture given in the Winter term 1919, [the notes were written by P. Bernays], reprint by the Mathematical Institute of the University Göttingen, 1989.
Hilbert, D. and Bernays, P. [1934], Grundlagen der Mathematik I, Springer Verlag, Berlin.
Hilbert, D. and Bernays, P. [1939], Grundlagen der Mathematik II, Springer Verlag, Berlin.
Kleene, S.C. [1934], “Proof by Cases in Formal Logic”, Ann. Math. 35, 529–44.
Kleene, S.C. [1935], “A Theory of Positive Integers in Formal Logic”, Amer. J. Math. 57, 153–73, 219–44; the story behind this paper is described in [Kleene 1981, pp. 57–58]; it was first submitted on October 9, 1933; its revised version was re-submitted on June 13, 1934.
Kleene, S.C. [1935a], “General Recursive Functions of Natural Numbers”, Bull. AMS 41, 489.
Kleene, S.C. [1935b], “λ-Definability and Recursiveness”, Bull. AMS 41, 490.
Kleene, S.C. [1936], “General Recursive Functions of Natural Numbers”, Math. Ann. 112, 727–42.
Kleene, S.C. [1936a], “λ-Definability and Recursiveness”, Duke Math. J. 2, 340–53 ([1935a] and [b] are abstracts of these two papers and were received by the AMS on July 1, 1935).
Kleene, S.C. [1936b], “A Note on Recursive Functions”, Bull. AMS 42, 544–6.
Kleene, S.C. [1952], Introduction to Metamathematics, Wolters-Noordhoff Publishing, Groningen.
Kleene, S.C. [1981], “Origins of Recursive Function Theory”, Annals Hist. Computing 3(1), 52–66.
Kleene, S.C. and Rosser, J.B. [1935], “The Inconsistency of Certain Formal Logics”, Bull. AMS 41, 24 (this abstract was received by the AMS on November 16, 1934).
Kleene, S.C. and Rosser, J.B. [1935], “The Inconsistency of Certain Formal Logics”, Ann. Math. 36(2), 630–6.
Mendelson, E. [1990], “Second Thoughts about Church’s Thesis and Mathematical Proofs”, The Journal of Philosophy 87(5), 225–33.
Mundici, D. and Sieg, W. [1995], “Paper Machines”, Philosophia Mathematica 3, 5–30.
Odifreddi, P.G. [1996], Kreisel’s Church, in Kreiseliana, (P.G. Odifreddi ed.), A.K. Peters, Wellesley, pp. 389–415.
Pepis, J. [1938], “Ein Verfahren der mathematischen Logik”, Journal of Symbolic Logic 3, 61–76 (reviews of some of his papers appeared in the JSL 2, p. 84; 3, pp. 160–1; 4, p. 93).
Post, E. [1936], “Finite Combinatory Processes. Formulation I”, Journal of Symbolic Logic 1, 103–5.
Rosser, J.B. [1935], “A Mathematical Logic without Variables”, Ann. Math. 36(2), 127–50 and Duke Math. J. 1, 328–55.
Rosser, J.B. [1984], “Highlights of the History of the lambda-Calculus”, Annals Hist. Computing 6(4), 337–49.
Shoenfield, J.R. [1967], Mathematical Logic, Addison-Wesley, Reading [Massachusetts].
Sieg, W. [1994], Mechanical Procedures and Mathematical Experience, in Mathematics and Mind, (A. George ed.), Oxford University Press, Oxford, pp. 71–117.
Sieg, W. [1996], Aspects of Mathematical Experience, in Philosophy of Mathematics Today, (E. Agazzi and G. Darvas eds.), Kluwer, pp. 195–217.
Sieg, W. and Byrnes, J. [1996], K-Graph Machines: Generalizing Turing’s Machines and Arguments, in Gödel ’96, (P. Hájek ed.), Lecture Notes in Logic 6, Springer Verlag, pp. 98–119.
Sieg, W. and Byrnes, J. [1997], “Gödel, Turing, and K-Graph Machines”, to appear in Logic in Florence.
Soare, R. [1996], “Computability and Recursion”, Bulletin of Symbolic Logic 2(3), 284–321.
Turing, A. [1936], “On Computable Numbers, with an Application to the Entscheidungsproblem”, Proc. London Math. Soc., Series 2, 42, pp. 230–265.
Wilfried Sieg
Postscriptum
This essay was published in the Bulletin of Symbolic Logic, vol. 3(2), June 1997 (pp. 154–180). It is reprinted with the permission of the Association for Symbolic Logic. The historical discussion and analysis presented here are informative as to the real difficulties Church and other pioneers faced when fixing the broad conceptual framework for the conclusive formulation of undecidability and incompleteness results. The complexities of interactions and influences are clear, but could now be deepened, I assume, by examining in particular the Nachlaß of both Church and Post; they have become accessible. As to the evolution of Gödel’s thinking, I refer to my essay “Gödel on Computability”, which will appear soon in Philosophia Mathematica. However, I have modified my view on the structure of Turing’s argument that is analyzed towards the end of section 5 (and discussed further in section 6). The modification concerns (i) the “axiomatic” formulation of determinacy, boundedness, and locality conditions, and (ii) the place of Turing’s central thesis. Turing’s argument does lead to the boundedness and locality constraints, but in the form they are given here they are not mathematically rigorous. The central thesis connects this informal notion of computability for computors to the sharp concept of computability of a string machine. This restructuring led ultimately to a mathematically rigorous formulation of axioms for Turing computors and Gandy machines in my “Calculations by Man and Machine: Conceptual Analysis” (Lecture Notes in Logic 15, 2002, pp. 390–409). The need for a central thesis disappears, as its claim is replaced by the recognition that the axioms are correct for the intended notions of human mechanical and discrete machine computability.
Karl Svozil∗
Physics and Metaphysics Look at Computation
As far as algorithmic thinking is bound by symbolic paper-and-pencil operations, the Church–Turing thesis appears to hold. But is physics, and even more so the human mind, bound by symbolic paper-and-pencil operations? What about the powers of the continuum and the quantum, and what about human intuition and human thought? These questions remain unanswered. Under the strong Artificial Intelligence assumption, human consciousness is just a function of the organs (maybe in a very wide sense, and not only restricted to neuronal brain activity), and thus the question is relegated to physics. In dualistic models of the mind, human thought transcends symbolic paper-and-pencil operations.
∗ K. Svozil, Institut für Theoretische Physik, University of Technology Vienna, Wiedner Hauptstraße 8–10/136, A–1040 Vienna, Austria.
Computation is Physical
It is not unreasonable to require from a “useful” theory of computation that any capacity and feature of physical systems (interpretable as “computing machines”) should be reflected therein and vice versa. In this way, the physical realization confers power to the formal method. Conversely, the formalism might “reveal” some “laws” or structure in the physical processes. With the Church–Turing thesis, physics also acquires a definite, formalized concept of “physical determinism” as well as of “undecidability,” which was lacking in pre-Church–Turing times. Indeed, the Church–Turing thesis can be perceived as part of physics proper, and its assumption can be interpreted as an indication that the Universe amounts to a huge computational process; a suspicion already pursued by the Pythagoreans. Such a perception does not fix the lapse of evolution entirely in a Laplacian-type monotony, but still allows for dualism and “miracles” through the influx of information from interfaces, as will be discussed below.
The recognition of the physical aspect of the Church–Turing thesis (the postulated equivalence between the informal notion of “algorithm” and recursive function theory as its formalized counterpart) is not new.¹ In particular, Landauer has pointed out on many occasions that computers are physical systems, that computations are physical processes, and that they are therefore subject to the laws of physics.² As Deutsch puts it,
The reason why we find it possible to construct, say, electronic calculators, and indeed why we can perform mental arithmetic, cannot be found in mathematics or logic. The reason is that the laws of physics ‘happen to’ permit the existence of physical models for the operations of arithmetic such as addition, subtraction and multiplication. If they did not, these familiar operations would be noncomputable functions. We might still know of them and invoke them in mathematical proofs (which would presumably be called ‘nonconstructive’) but we could not perform them. [Deutsch 1985, p. 101]
One may indeed perceive a strong interrelationship between the way we do mathematics, formal logic, the computer sciences and physics. All these sciences have been developed and constructed by us in the context of our (everyday) experiences. The computer sciences are well aware of this connection. See, for instance, Odifreddi’s review [Odifreddi], the articles by [Rosen] and [Kreisel 1974], or Davis’ book [1958, p. 11], where the following question is asked:
[...] how can we ever exclude the possibility of our being presented, some day (perhaps by some extraterrestrial visitors), with a (perhaps extremely complex) device or “oracle” that “computes” a noncomputable function?
¹ [Feld and Szilard], [Brillouin 1962; 1964], [Leff and Rex], [Rogers], [Odifreddi], [Pitowsky], [Galindo and Martin–Delgado].
² [Landauer 1961; 1967; 1982; 1987; 1988; 1989; 1991; 1994].
In what follows, we shall briefly review some aspects of the interrelationship between physics and computation. We believe that it is the nature of the subject itself which prevents a definite answer to many questions, in particular to the question of a “canonical” model of computation which might remain intact as physics and the sciences evolve. So, we perceive this review as a snapshot of the present status of our thinking on feasible computation.
Paper-and-Pencil Operations in Physics
After Alonzo Church [1930; 1936] conceptualized an equivalent notion of “effective computability” with an “Entscheidungsproblem” (decision problem) in mind which was quite similar to the questions Gödel pursued in [1931], Alan Turing [1936] enshrined that part of mathematics which can be “constructed” by paper-and-pencil operations into a Turing machine. Such a machine possesses a potentially unbounded one-dimensional tape divided into cells, some finite memory, and a read-write head which transfers information back and forth between the tape and this memory. A table of transition rules figuring as the “program” steers the machine deterministically. The behaviour of a Turing machine may also be determined by its initial state. Furthermore, a universal Turing machine is capable of simulating all other Turing machines (including itself). According to Turing’s definition stated in [1936], a number is computable if its decimal can be written down by a machine. In view of the “algorithm” created by Chaitin ([Chaitin 1987], [Calude 2002]) to “compute” the halting probability, which is encodable in almost every conceivable programming language such as C or Algol, one should add the proviso that any such Turing computable number should have a computable radius of convergence. It turned out that Turing’s notion of computability, in particular universal computability, is quite robust in the sense that it is equivalent to the recursive functions ([Rogers], [Odifreddi]), abacus machines, or the usual modern digital computer (given “enough” memory) based on the von Neumann architecture, on which for instance this manuscript has been written and processed. It is hardly questionable that Turing’s model can be embedded in physical space-time, at least in principle. A discretization of physical space, accompanied by deterministic evolution rules, presents no conceptual challenge for a physical realization.
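The machine model just described (tape, head, finite state, transition table) can be made concrete in a few lines. The following is only a minimal sketch under stated assumptions: the function name `run_turing_machine`, the rule encoding, the blank symbol `_` and the step budget are illustrative choices, not anything taken from Turing’s paper or from this essay. The example table appends one stroke to a unary numeral.

```python
from collections import defaultdict


def run_turing_machine(rules, tape, state="q0", blank="_", max_steps=10_000):
    """Run a deterministic single-tape Turing machine.

    rules maps (state, symbol) -> (new_state, written_symbol, move),
    where move is -1 (left) or +1 (right). The machine halts as soon as
    no rule applies to the current (state, symbol) pair.
    """
    # Unvisited cells read as blank, so the tape is unbounded in both directions.
    cells = defaultdict(lambda: blank, enumerate(tape))
    head = 0
    for _ in range(max_steps):
        key = (state, cells[head])
        if key not in rules:
            break  # no applicable rule: the machine halts
        state, cells[head], move = rules[key]
        head += move
    else:
        # A finite step budget is a practical stand-in; it cannot decide halting.
        raise RuntimeError("step budget exhausted without halting")
    lo, hi = min(cells), max(cells)
    content = "".join(cells[i] for i in range(lo, hi + 1)).strip(blank)
    return state, content


# Hypothetical example program: walk right over the strokes, then add one.
INCREMENT = {
    ("q0", "1"): ("q0", "1", +1),
    ("q0", "_"): ("halt", "1", +1),
}

final_state, final_tape = run_turing_machine(INCREMENT, "111")
```

The step budget is the one place where the sketch departs from the idealized model: a physical or simulated run can only ever observe finitely many steps.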
After all, Turing’s conceptualization started from the intuitive symbolic handling of the mathematical entities that every pupil is drilled to obey. Even grown-up individuals arguably lack an understanding of those rules imposed upon them and thus lack the semantics; but this ignorance does not stop them from applying the syntax correctly, just as a Turing machine does. There are two problems and two features of any concrete technical realization of Turing machines.
(P1) On all levels of physical realization, errors occur due to malfunctioning of the apparatus. This is unavoidable. As a result, all our realistic models of computation must be essentially probabilistic.
(P2) From an operational perspective [Bridgman 1934; 1952], all physical resources are strictly finite and cannot be unbounded; not even potentially unbounded [Gandy 1980; 1982].
(F1) It comes as no surprise that any embedding of a universal Turing machine, and even more so of less powerful finitistic computational concepts, into a physical system results in physical undecidability. In the case of computational universality, this is due to a reduction to the recursive unsolvability of the halting problem. Ever since Gödel’s and Tarski’s destruction of the finitistic program of Hilbert, Kronecker and others to find a finite set of axioms from which to derive all mathematical truth, there have been attempts to translate these results into some relevant physical form.³
(F2) The recursive undecidability of the rule inference problem [Gold 1967] states that for any mechanistic agent there exists a total recursive function such that the agent cannot infer this function. In more physical terms, there is no systematic way of finding a deterministic law from the input-output analysis of a (universal) mechanistic physical system.
The undecidabilities resulting from (F1) and (F2) should not be confused with another mode of undecidability. Complementarity is a quantum mechanical feature also occurring in finite automata theory⁴ and generalized urn models [Wright 1978; 1990], two models having a common logical, i.e., propositional, structure [Svozil 2005].
³ E.g., see [Popper 1950; 1950a], [Moore], [Svozil 1993], [Casti and Traub], [Casti and Karlquist].
Cantor’s Paradises and Classical Physics
It is reasonable to require from a “useful” theory of computation that any capacity and feature of physical systems (interpretable as “computing machines”) should be reflected therein and vice versa. If one assumes a correspondence between (physical) theory and physical systems [Svozil 1995; 1998], how do the continuum and its associated pandemonium of effects (such as the Banach–Tarski paradox⁵) fit into this picture?
Computational Correspondence between Formal and Physical Entities
According to the standard physics textbooks, physical theory requires “much” richer structures than are provided by universal Turing computability. Physical theories such as (pre-quantum) mechanics [Goldstein] and electrodynamics [Jackson] in various ways assume the continuum, for example configuration space-time, phase space, field observables and the like. Even quantum mechanics is a theory based upon continuous space and time as well as on a continuous wave function, a fact which stimulated Einstein to remark (at the end of [1956]) that maybe we should develop quantum theory radically further into a purely discrete formalism. Note that, with probability one, any element of the continuum is neither Turing computable nor algorithmically compressible, and is thus random ([Chaitin 1987], [Calude 2002]). Thus, assuming that initial values of physical systems are arbitrary elements “drawn” from some “continuum urn” amounts to assuming that in almost all cases they cannot be represented by any constructive, computable method. Worse yet, one has to assume that the physical system has a capacity associated with the axiom of choice in order to even make sure that such a draw is possible. For how could one draw, i.e., select, an initial value which cannot be represented in any conceivable algorithmic way? These issues have become important for the conceptual foundation of chaos theory. In the “deterministic chaos” scenario the deterministic equation of motion appears to “reveal” the randomness, i.e., the algorithmically incompressible information, of the initial value.⁶ Another issue is the question of the preservation of computability in classical analysis, the physical relevance of Specker’s theorems,⁷ as well as the more recent constructions by [Pour–El and Richards].⁸
⁴ [Moore], [Conway], [Svozil 1993], [Schaller and Svozil], [Dvurečenskij et al.], [Calude et al.], [Svozil 1998].
⁵ [Wagon 1986], [Pitowsky 1982], [Svozil and Neufeld]; see also [Siegelmann 1995].
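The remark on “deterministic chaos” can be illustrated with the doubling map x → 2x mod 1, a standard textbook example that is not taken from this essay. The sketch below assumes exact rational arithmetic via `fractions.Fraction`; the function name `doubling_map_digits` is hypothetical. Each step of the deterministic law merely shifts out the next binary digit of the initial value, so whatever irregularity the orbit displays was already present in that value.

```python
from fractions import Fraction


def doubling_map_digits(x0, steps):
    """Iterate x -> 2x mod 1 exactly and record the integer part shifted out each step.

    For x0 in [0, 1) the recorded bits are the leading binary digits of x0.
    """
    x = Fraction(x0)
    digits = []
    for _ in range(steps):
        x *= 2
        bit = int(x >= 1)  # the binary digit revealed by this step
        digits.append(bit)
        x -= bit           # reduce modulo 1
    return digits


# 5/8 is 0.101 in binary; the orbit reads these digits off one by one.
revealed = doubling_map_digits(Fraction(5, 8), 5)
```

A computable initial value such as 5/8 yields a computable digit sequence; the argument in the text concerns the almost-all initial values for which no such finite description exists, and which therefore cannot be fed into any program of this kind.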
⁶ [Shaw], [Schuster], [Pitowsky 1996].
⁷ [Specker 1990], [Wang], [Kreisel 1974].
⁸ Cf. objections raised by [Bridges] and [Penrose 1990]; see also [Calude et al.].
Infinity Machines
For the sake of exposing the problems associated with continuum physics explicitly, an oracle will be introduced whose capacity exceeds and outperforms any universal Turing machine. Hermann Weyl already raised the question of whether it is kinematically feasible for a machine to carry out an infinite sequence of operations in finite time.⁹ Weyl writes,
Yet, if the segment of length 1 really consists of infinitely many sub-segments of length 1/2, 1/4, 1/8, . . ., as of ‘chopped-off’ wholes, then it is incompatible with the character of the infinite as the ‘incompletable’ that Achilles should have been able to traverse them all. If one admits this possibility, then there is no reason why a machine should not be capable of completing an infinite sequence of distinct acts of decision within a finite amount of time; say, by supplying the first result after 1/2 minute, the second after another 1/4 minute, the third 1/8 minute later than the second, etc. In this way it would be possible, provided the receptive power of the brain would function similarly, to achieve a traversal of all natural numbers and thereby a sure yes-or-no decision regarding any existential question about natural numbers! [Weyl, p. 42]
⁹ See also [Grünbaum, p. 630], [Thomson], [Benacerraf], [Rucker], [Pitowsky], [Earman and Norton] and [Hogarth 1993; 1994], as well as [Beth, p. 492] and [López-Escobar 1991], and the author [Svozil 1993, pp. 24–27] for related discussions.
The oracle’s design is based upon a universal computer with “squeezed” cycle times of computation according to a geometric progression. The only difference between universal computation and this type of oracle computation is the speed of execution. In order to achieve the limit, two time scales are introduced: the intrinsic time $t$ of the process of computation, which approaches infinity in finite extrinsic or proper time $\tau$ of some outside observer. The time scales $\tau$ and $t$ are related as follows.
• The proper time $\tau$ measures the physical system time by clocks in a way similar to the usual operationalizations; whereas
• a discrete cycle time $t = 0, 1, 2, 3, \ldots$ characterizes a sort of “intrinsic” time scale for a process running on an otherwise universal machine.
• For some unspecified reason we assume that this machine would allow us to “squeeze” its intrinsic time $t$ with respect to the proper time $\tau$ by a geometric progression. Hence, for $0 < k < 1$, let any time cycle of $t$, if measured in terms of $\tau$, be squeezed by a factor of $k$ with respect to the foregoing time cycle; i.e.,
\[ \tau_0 = 0, \quad \tau_1 = k, \quad \tau_{t+1} - \tau_t = k(\tau_t - \tau_{t-1}), \tag{3} \]
\[ \tau_t = \sum_{n=0}^{t} k^n - 1 = \frac{k(k^t - 1)}{k - 1}. \tag{4} \]
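As a quick numerical check of the recursion and its closed form, the following sketch sums the cycle durations k, k², ..., kᵗ directly and compares the result with the closed-form expression. The function names are hypothetical, and exact rationals are used only to avoid rounding; the choice k = 1/2 corresponds to Weyl’s example of results delivered after 1/2, 1/4, 1/8, ... of a minute.

```python
from fractions import Fraction


def proper_time(k, t):
    """Proper time after t cycles: the t-th cycle lasts k**t units of proper time."""
    return sum(k**n for n in range(1, t + 1))


def proper_time_closed_form(k, t):
    """Closed form k (k^t - 1) / (k - 1) of the same sum."""
    return k * (k**t - 1) / (k - 1)


def proper_time_limit(k):
    """Limit of the proper time for t -> infinity, valid for 0 < k < 1."""
    return k / (1 - k)


k = Fraction(1, 2)
tau_10 = proper_time(k, 10)   # 1 - 2**-10, still short of the limit
tau_inf = proper_time_limit(k)  # one unit of proper time in total
```

For every finite t the proper time stays strictly below the limit, which is why the construction needs the completed infinite sequence and not just arbitrarily many steps.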
Thus, in the limit of infinite cycle time $t \to \infty$, the proper time $\tau_\infty = k/(1 - k)$ remains finite. Note that the oracle model introduced here would require merely a dense space-time. As a consequence, certain tasks which lie beyond the domain of recursive function theory become computable and even tractable. For example, the halting problem and any problem codable into a halting problem would become solvable. It would also be possible to produce an otherwise uncomputable and random output (equivalent to the tossing of a fair coin) such as Chaitin’s halting probability ([Chaitin 1987], [Calude 2002]) in finite proper time. There is no commonly accepted physical principle which would forbid such an oracle a priori. One might argue that any such oracle would require a geometric energy increase resulting in an infinite consumption of energy. Yet, no currently accepted physical principle excludes us from assuming that every geometric decrease in cycle time could be associated with a geometrically decreasing progression in energy consumption, at least up to some limiting (e.g., Planck) scale.
Quantum Oracles
In the light of the quanta, the Church–Turing thesis, and in particular quantum recursion theory, might have to be extended. We first present an algorithmic form of a modified diagonalization procedure in quantum mechanics due to the existence of fixed points of quantum information.¹⁰ Then we briefly discuss quantum computation and mention recent proposals extending the capacity of quantum computation beyond the Church–Turing barrier.
¹⁰ [Deutsch 1991], [Svozil 1995], [Ord and Kieu].
Diagonalization Method in Quantum Recursion Theory
Quantum bits can be physically represented by a coherent superposition of the two classical bit states denoted by $t$ and $f$. The quantum bit states
\[ x_{\alpha,\beta} = \alpha t + \beta f \tag{5} \]
form a continuum, with $|\alpha|^2 + |\beta|^2 = 1$, $\alpha, \beta \in \mathbb{C}$. For the sake of contradiction, consider a universal computer $C$ and an arbitrary algorithm $B(X)$ whose input is a string of symbols $X$. Assume that there exists a “halting algorithm” HALT which is able to decide whether $B$ terminates on $X$ or not. The domain of HALT is the set of legal programs. The range of HALT consists of classical bits (classical case) or quantum bits (quantum mechanical case). Using HALT(B(X)) we shall construct another deterministic computing agent $A$, which has as input any effective program $B$ and which proceeds as follows: Upon reading the program $B$ as input, $A$ makes a copy of it. This can be readily achieved, since the program $B$ is presented to $A$ in some encoded form ⌜B⌝, i.e., as a string of symbols. In the next step, the agent uses the code ⌜B⌝ as input string for $B$ itself; i.e., $A$ forms B(⌜B⌝), henceforth denoted by B(B). The agent now hands B(B) over to its subroutine HALT. Then, $A$ proceeds as follows: if HALT(B(B)) decides that B(B) halts, then the agent $A$ does not halt; this can for instance be realized by an infinite DO-loop; if HALT(B(B)) decides that B(B) does not halt, then $A$ halts. The agent $A$ will now be confronted with the following paradoxical task: take its own code as input and proceed.
Classical Case
Assume that $A$ is restricted to classical bits of information. To be more specific, assume that HALT outputs the code of a classical bit as follows ($\uparrow$ and $\downarrow$ stand for divergence and convergence, respectively):
\[ \mathrm{HALT}(B(X)) = \begin{cases} 0 & \text{if } B(X)\uparrow \\ 1 & \text{if } B(X)\downarrow \end{cases}. \tag{6} \]
Then, whenever A(A) halts, HALT(A(A)) outputs 1 and forces A(A) not to halt. Conversely, whenever A(A) does not halt, then HALT(A(A)) outputs 0 and steers A(A) into the halting mode. In both cases one arrives at a complete contradiction. Classically, this contradiction can only be consistently avoided by assuming the nonexistence of $A$ and, since the only nontrivial feature of $A$ is the use of the peculiar halting algorithm HALT, the impossibility of any such halting algorithm.
Quantum Mechanical Case
As has been argued above, in quantum information theory a quantum bit may be in a coherent superposition of the two classical states $t$ and $f$. Due to this possibility of a coherent superposition of classical bit states, the usual reductio ad absurdum argument breaks down. Instead, diagonalization procedures in quantum information theory yield quantum bit solutions which are fixed points of the associated unitary operators. In what follows it will be demonstrated how the task of the agent $A$ can be performed consistently if $A$ is allowed to process quantum information. To be more specific, assume that the output of the hypothetical “halting algorithm” is a quantum bit
\[ \mathrm{HALT}(B(X)) = x_{\alpha,\beta}. \tag{7} \]
We may think of HALT(B(X)) as a universal computer $C'$ simulating $C$ and containing a dedicated halting bit, which is the output of $C'$ at every (discrete) time cycle. Initially (at time zero), this halting bit is prepared to be a 50 : 50 coherent superposition of the classical halting and non-halting states $t$ and $f$; i.e., $x_{1/\sqrt{2},1/\sqrt{2}}$. If later $C'$ finds that $C$ converges (diverges) on $B(X)$, then the halting bit of $C'$ is set to the classical value $t$ ($f$). The emergence of fixed points can be demonstrated by a simple example. Agent $A$’s diagonalization task can be formalized as follows. Consider for the moment the action of diagonalization on the classical bit states. (Since the quantum bit states are merely a coherent superposition thereof, the action of diagonalization on quantum bits is straightforward.) Diagonalization effectively transforms the classical bit value $t$ into $f$ and vice versa. Recall that in equation (6), the state $t$ has been identified with the halting state and the state $f$ with the non-halting state. Since the halting state and the non-halting state exclude each other, $f$ and $t$ can be identified with orthonormal basis vectors in a two-dimensional vector space. Thus, the standard basis of Cartesian coordinates can be chosen for a representation of $t$ and $f$; i.e.,
\[ t \equiv \begin{pmatrix} 1 \\ 0 \end{pmatrix} \quad \text{and} \quad f \equiv \begin{pmatrix} 0 \\ 1 \end{pmatrix}. \tag{8} \]
The evolution representing diagonalization (effectively, agent $A$’s task) can be expressed by the unitary operator $D$ defined by
\[ D t = f \quad \text{and} \quad D f = t. \tag{9} \]
Thus, $D$ acts essentially as a not-gate. In the above state basis, $D$ can be represented as follows:
\[ D = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}. \tag{10} \]
$D$ will be called the diagonalization operator, despite the fact that its only nonvanishing components are off-diagonal. As has been pointed out earlier, quantum information theory allows a coherent superposition $x_{\alpha,\beta} = \alpha t + \beta f$ of the classical bit states $t$ and $f$. $D$ acts on classical bits by exchanging them. It has a fixed point at the quantum bit state
\[ x^* := x_{1/\sqrt{2},1/\sqrt{2}} = \frac{t + f}{\sqrt{2}} \equiv \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix}. \tag{11} \]
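The contrast between the classical and the quantum case can be checked directly. The sketch below is a minimal illustration, assuming the coordinate convention t ≡ (1, 0) and f ≡ (0, 1) and representing states as plain tuples of amplitudes; the names `apply_D` and `is_fixed_point` are hypothetical. Neither classical bit state is left unchanged by D, which mirrors the classical contradiction, whereas the equal-weight superposition is.

```python
import math

# Assumed coordinate convention for the two classical bit states.
T = (1.0, 0.0)  # halting state t
F = (0.0, 1.0)  # non-halting state f


def apply_D(state):
    """Apply the diagonalization operator D (a not-gate): swap both amplitudes."""
    alpha, beta = state
    return (beta, alpha)


def is_fixed_point(state):
    """True if D maps the state onto itself."""
    return apply_D(state) == state


# Equal-weight coherent superposition x* = (t + f) / sqrt(2).
X_STAR = (1 / math.sqrt(2), 1 / math.sqrt(2))

classical_fixed_points = [s for s in (T, F) if is_fixed_point(s)]
```

Since D only swaps the two amplitudes, any state with equal amplitudes is a fixed point; normalization then singles out x* up to an overall phase.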
$x^*$ does not give rise to inconsistencies [Deutsch 1991; Svozil 1995]. If agent $A$ hands over the fixed point state $x^*$ to the diagonalization operator $D$, the same state $x^*$ is recovered. Stated differently, as long as the output of the “halting algorithm” to input A(A) is $x^*$, diagonalization does not change it. Hence, even if the (classically) “paradoxical” construction of diagonalization is maintained, quantum theory does not give rise to a paradox, because the quantum range of solutions is larger than the classical one. Therefore, standard proofs of the recursive unsolvability of the halting problem do not apply if agent $A$ is allowed a quantum bit. The consequences for quantum recursion theory are discussed below. It should be noted, however, that the fixed point quantum bit “solution” to the above halting problem is not of much practical help. In particular, if one is interested in the “classical” answer whether or not A(A) halts, then one ultimately has to perform an irreversible measurement on the fixed point state. This causes a state reduction into the classical states corresponding to $t$ and $f$. Any single measurement will yield an indeterministic result. There is a 50 : 50 chance that the fixed point state will be found in either $t$ or $f$, since $P_t(x^*) = P_f(x^*) = \frac{1}{2}$. Thereby, classical undecidability is recovered. Another, less abstract, application for quantum information theory is the handling of inconsistent information in databases. Thereby, two contradicting classical bits of information $t$ and $f$ are resolved by the quantum bit $x^* = (t + f)/\sqrt{2}$. Throughout the rest of the computation the coherence is maintained. After the processing, the result is obtained by an irreversible measurement. The processing of quantum bits, however, would require an exponential space overhead on classical computers in classical bit base [Feynman]. Thus, in order to remain tractable, the corresponding quantum bits should be implemented on truly quantum universal computers.
As far as problem solving is concerned, quantum bits are not much of an advance. If classical information is required, then quantum bits are no better than probabilistic knowledge. With regard to the question of whether or not a computer halts, for instance, the “solution” is equivalent to the throwing of a fair coin. Therefore, the advance of quantum recursion theory over classical recursion theory lies not so much in classical problem solving but in the consistent representation of statements which would give rise to classical paradoxes. The above argument used the continuity of quantum bit states, as compared to the two classical bit states, for a construction of fixed points of the diagonalization operator. One could proceed a step further and allow nonclassical diagonalization procedures. Thereby, one could allow the entire range of two-dimensional unitary transformations [Murnaghan]
\[ U_2(\omega, \alpha, \beta, \varphi) = e^{-i\beta} \begin{pmatrix} e^{i\alpha}\cos\omega & -e^{-i\varphi}\sin\omega \\ e^{i\varphi}\sin\omega & e^{-i\alpha}\cos\omega \end{pmatrix}, \tag{12} \]
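where $-\pi \le \beta, \omega \le \pi$ and $-\frac{\pi}{2} \le \alpha, \varphi \le \frac{\pi}{2}$, to act on the quantum bit. A typical example of a nonclassical operation on a quantum bit is the “square root of not” gate ($\sqrt{\mathrm{not}}\,\sqrt{\mathrm{not}} = D$)
\[ \sqrt{\mathrm{not}} = \frac{1}{2} \begin{pmatrix} 1+i & 1-i \\ 1-i & 1+i \end{pmatrix}. \tag{13} \]
Not all these unitary transformations have eigenvectors associated with the eigenvalue 1 and thus fixed points. Indeed, it is not difficult to see that only unitary transformations of the form
\[ [U_2(\omega, \alpha, \beta, \varphi)]^{-1}\, \mathrm{diag}(1, e^{i\lambda})\, U_2(\omega, \alpha, \beta, \varphi) = \begin{pmatrix} \cos^2\omega + e^{i\lambda}\sin^2\omega & \frac{-1+e^{i\lambda}}{2}\, e^{-i(\alpha+\varphi)} \sin(2\omega) \\ \frac{-1+e^{i\lambda}}{2}\, e^{i(\alpha+\varphi)} \sin(2\omega) & e^{i\lambda}\cos^2\omega + \sin^2\omega \end{pmatrix} \tag{14} \]
have fixed points. Applying nonclassical operations on quantum bits with no fixed points,
\[ [U_2(\omega, \alpha, \beta, \varphi)]^{-1}\, \mathrm{diag}(e^{i\mu}, e^{i\lambda})\, U_2(\omega, \alpha, \beta, \varphi) = \begin{pmatrix} e^{i\mu}\cos^2\omega + e^{i\lambda}\sin^2\omega & \frac{e^{-i(\alpha+\varphi)}}{2} \left(e^{i\lambda} - e^{i\mu}\right) \sin(2\omega) \\ \frac{e^{i(\alpha+\varphi)}}{2} \left(e^{i\lambda} - e^{i\mu}\right) \sin(2\omega) & e^{i\lambda}\cos^2\omega + e^{i\mu}\sin^2\omega \end{pmatrix} \tag{15} \]
with $\mu, \lambda \neq n\pi$, $n \in \mathbb{N}_0$, gives rise to eigenvectors which are not fixed points, but which acquire nonvanishing phases $\mu, \lambda$ in the generalized diagonalization process.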
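The statements about the “square root of not” gate and about fixed points can be verified numerically. The sketch below uses plain Python complex numbers for 2×2 matrices; the helper names `matmul` and `has_fixed_point` are hypothetical. It relies on the elementary fact that a matrix U has a nonzero fixed vector exactly when det(U − 1) = 0, and the phase angles 1 and 2 in the last example are arbitrary illustrative values.

```python
import cmath


def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def has_fixed_point(u, tol=1e-12):
    """True if u has eigenvalue 1, i.e. if det(u - identity) vanishes."""
    det = (u[0][0] - 1) * (u[1][1] - 1) - u[0][1] * u[1][0]
    return abs(det) < tol


D = [[0, 1], [1, 0]]
SQRT_NOT = [[(1 + 1j) / 2, (1 - 1j) / 2],
            [(1 - 1j) / 2, (1 + 1j) / 2]]

# Squaring the gate should reproduce the not-gate D entry by entry.
squared = matmul(SQRT_NOT, SQRT_NOT)
squares_to_D = all(abs(squared[i][j] - D[i][j]) < 1e-12
                   for i in range(2) for j in range(2))

# A pure phase transformation diag(e^{i mu}, e^{i lambda}) with phases
# that are not multiples of 2 pi has no eigenvalue 1.
PHASES = [[cmath.exp(1j * 1.0), 0], [0, cmath.exp(1j * 2.0)]]
```

In this check the “square root of not” gate itself turns out to have a fixed point (its eigenvalues are 1 and i), whereas the pure phase transformation has none.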
Quantum Computation
First attempts to quantize Turing machines [Deutsch 1985] failed to identify any possibilities to go beyond Turing computability. Recently, two independent proposals have been put forward: one by [Calude and Pavlov] and [Adamyan et al.], the other by [Kieu 2003; 2003a]. Neither proposal is a mere quantized extension of Turing machines; both attempt to utilize very specific features and capacities of quantum systems. The question as to what might be considered the “essence” of quantum computation, and its possible advantages over classical computation, has been the topic of numerous considerations, both from a physical¹¹ and from a computer science¹² perspective. One advantage of quantum algorithms over classical computation is the possibility to spread out, process, analyse and extract information in multipartite configurations in coherent superpositions of classical states. This can be discussed in terms of quantum state identification problems based on a proper partitioning of mutually orthogonal sets of states [Svozil 2005]. The question arises whether or not it is possible to encode equibalanced decision problems into quantum systems, so that a single invocation of a filter used for state discrimination suffices to obtain the result. Certain kinds of propositions about quantum computers exist which do not correspond to any classical statement. In quantum mechanics information can be coded in entangled multipartite systems in such a way that information about the single quanta is not useful for (and even makes impossible) a decryption of the quantum computation. Alas, not all decision problems have a proper encoding into some quantum mechanical system such that their resource requirements (computation time, memory usage) are bounded by some criterion such as polynomiality or even finiteness. One “hard” problem is the parity of a binary function of $k > 1$ binary arguments:¹³ it is only possible to go from $2^k$ classical queries down to $2^k/2$ quantum queries, thereby gaining a factor of 2. Another example is a type of halting problem: Alice presents Bob with a black box with input and output interfaces. Bob’s task is to find out whether an arbitrary function of $k$ bits encoded in the black box will ever output “0”. As this configuration could essentially get as bad as a busy beaver problem ([Rado], [Chaitin]), the time it takes for Alice’s box to ever output a “0” may grow faster than any recursive function of $k$. Functional recursion and iterations may represent an additional burden on efficiency. Recursions may require a space overhead to keep track of the computational path, in particular if the recursion depth cannot be coded efficiently. From this point of view, quantum implementations of the Ackermann or the Busy Beaver functions, to give just two examples, may even be less efficient than classical implementations, where effective waste management can get rid of many bits; in particular in the presence of a computable radius of convergence.
¹¹ E.g., [Ekert and Jozsa], [Preskill 1998; LN], [Nielsen and Chuang], [Galindo and Martin–Delgado], [Mermin 2002–4], [Eisert].
¹² E.g., [Gruska], [Bennett et al. 1997], [Ozhigov], [Beals et al.], [Cleve 1999], [Fortnow 2003].
¹³ [Farhi et al.], [Beals et al.], [Miao 2001], [Orus et al.], [Stadelhofer et al.].
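The remark that recursions need extra space to keep track of the computational path can be illustrated, on the classical side only, with Ackermann’s function. The sketch below replaces the recursive calls by an explicit stack of pending arguments and reports how large that stack becomes; the function name `ackermann_with_stack_depth` is hypothetical, and nothing here models a quantum implementation.

```python
def ackermann_with_stack_depth(m, n):
    """Evaluate Ackermann's function A(m, n) with an explicit stack.

    Returns (value, peak), where peak is the largest number of pending
    first arguments that had to be remembered at any one time.
    """
    stack = [m]
    peak = 1
    while stack:
        m = stack.pop()
        if m == 0:
            n += 1                 # A(0, n) = n + 1
        elif n == 0:
            stack.append(m - 1)    # A(m, 0) = A(m - 1, 1)
            n = 1
        else:
            stack.append(m - 1)    # A(m, n) = A(m - 1, A(m, n - 1)):
            stack.append(m)        # evaluate the inner call first
            n -= 1
        peak = max(peak, len(stack))
    return n, peak


value_23, peak_23 = ackermann_with_stack_depth(2, 3)
value_33, peak_33 = ackermann_with_stack_depth(3, 3)
```

Only small arguments are feasible; the bookkeeping grows together with the function value, which is the space overhead the text alludes to.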
Dualistic Transcendence It is an entirely different and open question whether or not the human or animal mind can “outperform” any Turing machine. Almost everybody, including eminent researchers, has an opinion on this matter, but not very much empirical evidence has been accumulated. For example, Kurt G¨ odel believed in the capacity of the human mind to comprehend mathematical truth beyond provability ([Kreisel 1980], [Casti and Depauli–Schimanovich]). Why should the mind outpace Church–Turing computability? The question is strongly related to the eternal issue of dualism and the relation of body and soul (if any), of the mind and its brain, and of Artificial Intelligence. Instead of giving a detailed review of the related spiritual, religious and philosophical [Descartes] discussions, we refer to a recent theory based on neurophysiologic processes by Sir John Eccles ([Popper and Eccles], [Eccles 1990]). Even more speculitatively, Jack Sarfatti allegedly (in vain) built an “Eccles Telegraph” in the form of an electric typewriter directed by a stochastic physical process which might be believed to allow communication with spiritual entities. It may not be considered totally unreasonable to base a theory of miracles ([Frank], [Jung]) on the spontaneous occurrence of stochastic processes [Greenberger and Svozil] which individually may be interpreted to be “meaningful”, although their occurrence is statistically insignificant. Dualism has acquired a new model metaphor in virtual realities [Svozil 1995] and the associated artistic expressions which have come
Physics and Metaphysics Look at Computation
505
with it.14 We might even go as far as stating that we are the “dead ´ bout de souffle”], or incarcerated in a Cartesian on vacation” [“A prison (cf. Descartes’ Meditation I, 9 of [Descartes]). Some time ago, I had a dream. I was in an old, possibly medieval, castle. I walked through it. At times I had the feeling that there was something “out there”, something so inconceivable hidden that it was impossible to recognize. Then suddenly I realized that there was something “inside the walls:” another, dual, castle, quite as spacious as the one I was walking in, formed by the inner side of what one would otherwise consider masonry. There was a small opening, and I glanced through it; the inside looked like a three-dimensional maze inhabited by dwarfs. The opening closed again. Computers are exactly such openings; doors of perception to hidden universes. In a computer-generated virtual environment the “physical” laws are deterministic and computable in the Church– Turing sense; and yet this universe may not entirely be determined by the initial values and the deterministic laws alone. Dualism manifests itself in the two “reality layers” of the virtual reality and the Beyond, as well as in the interface between them. Through the interface, there can occur a steady flow of information back and forth from and to the Beyond which is transcendental with respect to the operational means available within the virtual reality. Proofs of the recursive unsolvability of the halting problem or of the rule inference problem, for example, break down due to the nonapplicability of selfreferential diagonal arguments in the transcendental Beyond. This makes necessary a distinction between an extrinsic and an intrinsic representation of the system [Svozil 1994].
Verifiability
Let us, in this final section, take up the thought expressed by Martin Davis in the first section; and let us assume for a moment that some extraterrestrial visitors present us with a device or “oracle” which is purportedly capable of “computing” a non-Church–Turing computable function. In what follows we shall argue that we can do very little to verify such hilarious claims. Indeed, this verification problem can be reduced to the induction problem, which remains unsolved.
¹⁴ See, e.g., [Galouye], [“Total Recall”], [Egan], [“The Matrix”].
Oracles in a Black Box
However polished and suspicious the device looks, for verification purposes one may put it into a black box, whose only interfaces are symbolic input and output devices, such as a keyboard and a digital display or printer. The only important aspect of the black box is its input-output behaviour. One (unrealistic) realization is a black box with an infinity machine stuffed into it. The input and output ports of the infinity machine are directly connected to the input and output interfaces of the black box. The question we would like to clarify is this: how could observers know by finite means that the black box represents an oracle doing something useful for us; in particular, computing a non-Church–Turing computable function?
Induction Problem Unsolved
The question of verifiability of oracle computation can be related to the question of how to differentiate a particular algorithm, or more generally a particular input-output behaviour, from others. In a very broad sense, this is the induction problem plaguing inductive science from its very start. Induction is “bottom-up”. It attempts to reconstruct certain postulated features from events or the input-output performance of black boxes. The induction problem, in particular the search for algorithmic ways and methods to derive certain outcomes or events from other (causally “previous”) events or outcomes via some kind of “narratives” such as physical theories, still remains unsolved. Indeed, in view of powerful formal incompleteness results, such as the unsolvability of the halting problem, the non-computability of the busy beaver function, or the recursive unsolvability of the rule inference problem, the induction problem is provably recursively unsolvable for physical systems which can be reduced to, or at least contain, universal Turing machines. The physical universe as we know it appears to be of that kind (cf. [Svozil 1993; 1996]).
Deduction is not of much help with the oracle identification problem either. It is “top-down” and postulates certain entities such as physical theories. Those theories may just have been provided by another oracle, they may be guesswork or just random pieces of junk data in a computer memory. Deduction then derives empirical
consequences from those theories. But how could one possibly derive a non-computable result if the only verifiable oracles are merely Church–Turing computable?
The Conjecture on Unverifiability beyond NP-Completeness
It is not totally unreasonable to speculate that NP-completeness serves as a kind of boundary, a demarcation line between operationally verifiable oracles and nonverifiable ones. For it makes no sense to consider propositions which cannot even be tractably verified.
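One way to make the conjecture concrete: for a problem in NP, a claimed “yes” answer can be accompanied by a certificate that is checkable in polynomial time, however hard it was to find. The following is only a minimal sketch in Python, assuming a DIMACS-like clause encoding; the helper `verify_assignment` and the sample formula are hypothetical illustrations, not part of the original argument.

```python
# Hypothetical illustration: checking a black box's claimed answer to a SAT instance.
# A formula is a list of clauses; each clause is a list of nonzero integers,
# where k stands for variable k and -k for its negation (an assumed encoding).

def verify_assignment(clauses, assignment):
    """Return True if `assignment` (dict: variable -> bool) satisfies every clause.

    The check visits each literal once, so it is linear in the formula size.
    """
    for clause in clauses:
        if not any(assignment.get(abs(lit), False) == (lit > 0) for lit in clause):
            return False
    return True

# (x1 or not x2) and (x2 or x3) and (not x1 or not x3)
formula = [[1, -2], [2, 3], [-1, -3]]
claimed = {1: True, 2: True, 3: False}   # the answer handed over by the black box
print(verify_assignment(formula, claimed))
```

Note that such a check only covers positive claims that come with a certificate; a bare “unsatisfiable” from the box is not verified by this sketch.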
Outlook
Presently the question of a proper formalization of the informal notion of “algorithm” seems to remain wide open. With regard to discrete finite paper-and-pencil operations, Church–Turing computability seems to be appropriate. But if one takes into account physics, in particular continuum mechanics and quantum physics, the issues become less certain. And if one is willing to include the full capacities of the human mind with all its intuition and thoughtfulness, any formalization appears highly speculative and inappropriate; at least for the time being, but maybe forever.
References
“À bout de souffle” (aka Breathless) [1960], movie, director: Jean–Luc Godard, French, 87 min.
Adamyan, V.A., Calude, C.S., and Pavlov, B. [1998], “Transcending the Limits of Turing Computability”; quant-ph/0304128.
Beals, R., Buhrman, H., Cleve, R., Mosca, M., and de Wolf, R. [2001], “Quantum Lower Bounds by Polynomials”, Journal of the ACM 48, 778–797; quant-ph/9802049.
Benacerraf, P. [1962], “Tasks and Supertasks, and the Modern Eleatics”, Journal of Philosophy LIX(24), 765–784.
Bennett, C.H., Bernstein, E., Brassard, G., and Vazirani, U. [1997], “Strengths and Weaknesses of Quantum Computing”, SIAM Journal on Computing 26, 1510–1523; quant-ph/9701001.
Beth, E.W. [1959], The Foundations of Metamathematics, North-Holland, Amsterdam.
Bridges, D.S. [1999], “Constructive Mathematics and Unbounded Operators: A Reply to Hellman”, Journal of Philosophical Logic 28(5).
Bridgman, P.W. [1934], “A Physicist’s Second Reaction to Mengenlehre”, Scripta Mathematica 2, 101–117, 224–234; cf. [Landauer 1994].
Bridgman, P.W. [1952], The Nature of Some of Our Physical Concepts, Philosophical Library, New York.
Brillouin, L. [1962], Science and Information Theory, Academic Press, New York, 2nd ed.
Brillouin, L. [1964], Scientific Uncertainty and Information, Academic Press, New York.
Calude, C.S. [2002], Information and Randomness: An Algorithmic Perspective, Springer, Berlin, 2nd ed.
Calude, C.S., Calude, E., Svozil, K., and Yu, S. [1997], “Physical versus Computational Complementarity” I, International Journal of Theoretical Physics 36(7), 1495–1523; quant-ph/9412004.
Calude, C.S. and Pavlov, B. [2002], “Coins, Quantum Measurements, and Turing’s Barrier”, Quantum Information Processing 1, 107–127; quant-ph/0112087.
Casti, J.L. and Karlqvist, A. [1996], Boundaries and Barriers. On the Limits to Scientific Knowledge, Addison-Wesley, Reading, MA.
Casti, J.L. and Traub, J.F. [1994], On Limits, Santa Fe Institute, Santa Fe, NM; report 94–10–056.
Casti, J.L. and Depauli–Schimanovich, W. [2000], Gödel: A Life of Logic, Perseus, Cambridge, MA.
Chaitin, G.J. [1987], Algorithmic Information Theory, Cambridge University Press, Cambridge.
Chaitin, G.J. [1987], Computing the Busy Beaver Function, in Open Problems in Communication and Computation, (T.M. Cover and B. Gopinath eds.), Springer, New York, p. 108; reprinted in [Chaitin 1990].
Chaitin, G.J. [1990], Information, Randomness and Incompleteness, World Scientific, Singapore, 2nd ed.
Church, A. [1930], “A Note on the Entscheidungsproblem”, Journal of Symbolic Logic 1, 40–41.
Church, A. [1936], “An Unsolvable Problem of Elementary Number Theory”, American Journal of Mathematics 58, 345–363.
Cleve, R. [2000], An Introduction to Quantum Complexity Theory, in Collected Papers on Quantum Computation and Quantum Information Theory, (C. Macchiavello, G. Palma, and A. Zeilinger eds.), World Scientific, Singapore, pp. 103–127; quant-ph/9906111.
Conway, J.H. [1971], Regular Algebra and Finite Machines, Chapman and Hall Ltd., London.
Davis, M. [1958], Computability and Unsolvability, McGraw-Hill, New York.
Davis, M. [1965], The Undecidable, Raven Press, New York.
Descartes, R. [1641], Meditation on First Philosophy.
Deutsch, D. [1985], “Quantum Theory, the Church–Turing Principle and the Universal Quantum Computer”, Proceedings of the Royal Society (London), A 400, 97–119.
Deutsch, D. [1991], “Quantum Mechanics Near Closed Timelike Lines”, Physical Review D 44, 3197–3217.
Dvurecenskij, A., Pulmannová, S., and Svozil, K. [1995], “Partition Logics, Orthoalgebras and Automata”, Helvetica Physica Acta 68, 407–428.
Earman, J. and Norton, J.D. [1993], “Forever is a Day: Supertasks in Pitowsky and Malament–Hogarth Spacetimes”, Philosophy of Science 60, 22–42.
Eccles, J.C. [1990], The Mind-Brain Problem Revisited: The Microsite Hypothesis, in The Principles of Design and Operation of the Brain, (J.C. Eccles and O. Creutzfeldt eds.), Springer, Berlin, p. 549.
Egan, G. [1994], Permutation City.
Einstein, A. [1956], Grundzüge der Relativitätstheorie, Vieweg, Braunschweig, 1st ed.
Eisert, M.W.J. [2004], Quantum Computing, in Handbook Innovative Computing, (A. Zomaya, G. Milburn, J. Dongarra, D. Bader, R. Brent, M. Eshaghian–Wilner, and F. Seredynski eds.), Springer, Berlin, Heidelberg, New York, pp. 281–283; quant-ph/0401019.
Ekert, A. and Jozsa, R. [1996], “Quantum Computation and Shor’s Factoring Algorithm”, Reviews of Modern Physics 68(3), 733–753.
Farhi, E., Goldstone, J., Gutmann, S., and Sipser, M. [1998], “Limit on the Speed of Quantum Computation in Determining Parity”, Physical Review Letters 81, 5442–5444; quant-ph/9802045.
Feld, B.T. and Szilard, G.W. [1972], The Collected Works of Leo Szilard: Scientific Papers, MIT Press, Cambridge.
Feynman, R.P. [1982], “Simulating Physics with Computers”, International Journal of Theoretical Physics 21, 467–488.
Fortnow, L. [2003], “One Complexity Theorist’s View of Quantum Computing”, Theoretical Computer Science 292, 597–610.
Frank, P. [1932], Das Kausalgesetz und seine Grenzen, Springer, Vienna.
Galindo, A. and Martin–Delgado, M.A. [2002], “Information and Computation: Classical and Quantum Aspects”, Reviews of Modern Physics 74, 347–432; quant-ph/0112105.
Galouye, D.F. [1964], Simulacron 3.
Gandy, R.O. [1980], Church’s Thesis and Principles for Mechanics, in The Kleene Symposium, (J. Barwise, H.J. Keisler, and K. Kunen eds.), North Holland, Amsterdam.
Gandy, R.O. [1982], Limitations to Mathematical Knowledge, in Logic Colloquium ’82, (D. van Dalen, D. Lascar, and J. Smiley eds.), North Holland, Amsterdam.
Gödel, K. [1929–1936 (1986)], in Collected Works. Publications 1929–1936, vol. I, (S. Feferman, J.W. Dawson, S.C. Kleene, G.H. Moore, R.M. Solovay, and J. van Heijenoort eds.), Oxford University Press, Oxford, 1986.
Gödel, K. [1931], “Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme”, Monatshefte für Mathematik und Physik 38, 173–198; English translation in [Gödel 1929–1936] and in [Davis 1965].
Gold, E.M. [1967], “Language Identification in the Limit”, Information and Control 10, 447–474.
Goldstein, H. [1950 (1980)], Classical Mechanics, Addison-Wesley, Reading, MA, 2nd ed.
Greenberger, D.M. and Svozil, K. [2002], A Quantum Mechanical Look at Time Travel and Free Will, in Between Chance and Choice, (H. Atmanspacher and R. Bishop eds.), Imprint Academic, Thorverton, pp. 293–308.
Grünbaum, A. [1974], “Philosophical Problems of Space and Time”, Boston Studies in the Philosophy of Science, vol. 12, D. Reidel, Dordrecht/Boston, 2nd enlarged ed.
Gruska, J. [1999], Quantum Computing, McGraw-Hill, London.
Hogarth, M. [1993], “Predicting the Future in Relativistic Spacetimes”, Studies in History and Philosophy of Science. Studies in History and Philosophy of Modern Physics 24(5), 721–739.
Hogarth, M. [1994], “Non-Turing Computers and non-Turing Computability”, PSA 1, 126–138.
Jackson, J.D. [1999], Classical Electrodynamics, John Wiley & Sons, New York, 3rd ed.
Jung, C.G. [1952], Synchronizität als ein Prinzip akausaler Zusammenhänge, in Naturerklärung und Psyche, (C.G. Jung and W. Pauli eds.), Rascher, Zürich.
Kieu, T.D. [2003], “Quantum Algorithm for Hilbert’s Tenth Problem”, International Journal of Theoretical Physics 42, 1461–1478; quant-ph/0110136.
Kieu, T.D. [2003a], “Computing the Noncomputable”, Contemporary Physics 44, 51–71; quant-ph/0203034.
Kreisel, G. [1974], “A Notion of Mechanistic Theory”, Synthese 29, 11–16.
Kreisel, G. [1980], Biographical Memoirs of Fellows of the Royal Society 26, 148; corrections: ibid. 27, 697; ibid. 28, 718.
Landauer, R. [1961], “Irreversibility and Heat Generation in the Computing Process”, IBM Journal of Research and Development 3, 183–191; reprinted in [Leff and Rex, pp. 188–196].
Landauer, R. [1967], “Wanted: a Physically Possible Theory of Physics”, IEEE Spectrum 4, 105–109.
Landauer, R. [1982], “Uncertainty Principle and Minimal Energy Dissipation in the Computer”, International Journal of Theoretical Physics 21, 283–297.
Landauer, R. [1987], “Fundamental Physical Limitations of the Computational Process; an Informal Commentary”, Cybernetics Machine Group Newsheet (1/1/1987).
Landauer, R. [1988], “Dissipation and Noise Immunity in Computation and Communication”, Nature 335, 779–784.
Landauer, R. [1989], Computation, Measurement, Communication and Energy Dissipation, in Selected Topics in Signal Processing, (S. Haykin ed.), Prentice Hall, Englewood Cliffs, NJ, p. 18.
Landauer, R. [1991], “Information is Physical”, Physics Today 44, 23–29.
Landauer, R. [1994], Advertisement for a Paper I Like, in On Limits, (J.L. Casti and J.F. Traub eds.), Santa Fe Institute Report 94–10–056, Santa Fe, NM, p. 39.
Landauer, R. [1994], “Zig-Zag Path to Understanding”, Proceedings of the Workshop on Physics and Computation PHYSCOMP ’94, IEEE Computer Society Press, Los Alamitos, CA, pp. 54–59.
Leff, H.S. and Rex, A.F. [1990], Maxwell’s Demon, Princeton University Press, Princeton.
López-Escobar, E.G.K. [1991], “Zeno’s paradoxes pre Gödelian incompleteness”, Jahrbuch 1991 der Kurt–Gödel–Gesellschaft, p. 49.
Mermin, N.D. [2002–4], “Lecture Notes on Quantum Computation”.
Miao, X. [2001], “A Polynomial-Time Solution to the Parity Problem on an NMR Quantum Computer”; quant-ph/0108116.
Moore, C.D. [1990], “Unpredictability and Undecidability in Dynamical Systems”, Physical Review Letters 64, 2354–2357; cf. C. Bennett, Nature 346, 606.
Moore, E.F. [1956], Gedanken-Experiments on Sequential Machines, in Automata Studies, (C.E. Shannon and J. McCarthy eds.), Princeton University Press, Princeton.
Murnaghan, F.D. [1962], The Unitary and Rotation Groups, Spartan Books, Washington, D.C.
Nielsen, M.A. and Chuang, I.L. [2000], Quantum Computation and Quantum Information, Cambridge University Press, Cambridge.
Odifreddi, P.G. [1989], Classical Recursion Theory, North-Holland, Amsterdam.
Ord, T. and Kieu, T.D. [2003], “The Diagonal Method and Hypercomputation”, math.LO/0307020.
Orus, R., Latorre, J.I., and Martin–Delgado, M.A. [2004], “Systematic Analysis of Majorization in Quantum Algorithms”, European Physical Journal D 29, 119–132; quant-ph/0212094.
Ozhigov, Y. [1997], “Quantum Computer Can Not Speed Up Iterated Applications of a Black Box”; quant-ph/9712051.
Penrose, R. [1990], The Emperor’s New Mind: Concerning Computers, Minds, and the Laws of Physics, Oxford University Press, Oxford.
Pitowsky, I. [1982], “Resolution of the Einstein–Podolsky–Rosen and Bell paradoxes”, Physical Review Letters 48, 1299–1302; cf. N.D. Mermin, Physical Review Letters 49, 1214; A.L. Macdonald, Physical Review Letters 49, 1215; I. Pitowsky, Physical Review Letters 49, 1216.
Pitowsky, I. [1990], “The Physical Church–Turing Thesis and Physical Computational Complexity”, Iyyun 39, 81–99.
Pitowsky, I. [1996], “Laplace’s Demon Consults an Oracle: The Computational Complexity of Prediction”, Studies in History and Philosophy of Science, Part B: Studies in History and Philosophy of Modern Physics 27, 161–180.
Popper, K.R. [1950a], “Indeterminism in Quantum Physics and in Classical Physics II”, The British Journal for the Philosophy of Science 1, 173–195.
Popper, K.R. [1950], “Indeterminism in Quantum Physics and in Classical Physics” I, The British Journal for the Philosophy of Science 1, 117–133.
Popper, K.R. and Eccles, J.C. [1977], The Self and its Brain, Springer, Berlin, Heidelberg, London, New York.
Pour–El, M.B. and Richards, I. [1989], Computability in Analysis and Physics, Springer, Berlin.
Preskill, J. [1998], “Quantum Computing: Pro and Con”, Proceedings of the Royal Society (London), A 454, 469–486; quant-ph/9705032.
Preskill, J. [LN], “Quantum Computation”, Lecture notes, quant-ph/9705032.
Rado, T. [1962], “On Non-Computable Functions”, The Bell System Technical Journal XLI(41)(3), 877–884.
Rogers, H. Jr. [1967], Theory of Recursive Functions and Effective Computability, McGraw-Hill, New York.
Rosen, R. [1988], Effective Processes and Natural Law, in The Universal Turing Machine. A Half-Century Survey, (R. Herken ed.), Kammerer & Unverzagt, Hamburg, p. 523.
Rucker, R. [1982 (1986)], Infinity and the Mind, Birkhäuser, Boston; reprinted by Bantam Books, 1986.
Schaller, M. and Svozil, K. [1996], “Automaton Logic”, International Journal of Theoretical Physics 35(5), 911–940.
Schuster, H.G. [1984], Deterministic Chaos, Physik Verlag, Weinheim.
Shaw, R. [1981], Zeitschrift für Naturforschung 36a, 80.
Siegelmann, H.T. [1995], “Computation Beyond the Turing Limit”, Science 268, 545–548.
Specker, E. [1990], Selecta, Birkhäuser Verlag, Basel.
Stadelhofer, R., Suter, D., and Banzhaf, W. [2005], “Quantum and Classical Parallelism in Parity Algorithms for Ensemble Quantum Computers”, Physical Review A 71, 032345; quant-ph/0112105.
Svozil, K. [1993], Randomness & Undecidability in Physics, World Scientific, Singapore.
Svozil, K. [1994], Extrinsic-Intrinsic Concept and Complementarity, in Inside versus Outside, (H. Atmanspacher and G.J. Dalenoort eds.), Springer-Verlag, Heidelberg, pp. 273–288.
Svozil, K. [1995], “Consistent Use of Paradoxes in Deriving Constraints on the Dynamics of Physical Systems and of No-Go-Theorems”, Foundations of Physics Letters 8(6), 523–535.
Svozil, K. [1995], “Set Theory and Physics”, Foundations of Physics 25, 1541–1560.
Svozil, K. [1995], A Constructivist Manifesto for the Physical Sciences, in The Foundational Debate, Complexity and Constructivity in Mathematics and Physics, (W.D. Schimanovich, E. Köhler, and F. Stadler eds.), Kluwer, Dordrecht, Boston, London, pp. 65–88.
Svozil, K. [1996], “How Real are Virtual Realities, How Virtual is Reality? The Constructive Re-Interpretation of Physical Undecidability”, Complexity 1(4), 43–54.
Svozil, K. [1996], Undecidability everywhere?, in Boundaries and Barriers. On the Limits to Scientific Knowledge, (J.L. Casti and A. Karlqvist eds.), Addison-Wesley, Reading, MA, pp. 215–237.
Svozil, K. [1998], Quantum Logic, Springer, Singapore.
Svozil, K. [1998], The Church–Turing Thesis as a Guiding Principle for Physics, in Unconventional Models of Computation, (C.S. Calude, J. Casti, and M.J. Dinneen eds.), Springer, Singapore, pp. 371–385.
Svozil, K. [2005], “Logical Equivalence between Generalized Urn Models and Finite Automata”, International Journal of Theoretical Physics, [in print]; quant-ph/0209136.
Svozil, K. [2005], “Characterization of Quantum Computable Decision Problems by State Discrimination”, AIP Conference Proceedings, (A. Khrennikov ed.), American Institute of Physics, Melville, NY, in print; quant-ph/0505129.
Svozil, K. and Neufeld, N. [1996], “‘Linear’ Chaos via Paradoxical Set Decompositions”, Chaos, Solitons & Fractals 7(5), 785–793.
“The Matrix” [1999], movie, directors: Andy Wachowski and Larry Wachowski, USA, 136 min.; English, budget of US$80 million.
Thomson, J.F. [1954], “Tasks and supertasks”, Analysis 15, 1–13.
“Total Recall” [1990], movie, director: Paul Verhoeven, USA, 113 min.; English, budget of US$65 million.
Turing, A.M. [1936–7 and 1937], “On Computable Numbers, with an Application to the Entscheidungsproblem”, Proceedings of the London Mathematical Society, Series 2 42 and 43, 230–265 and 544–546; reprinted in [Davis 1965].
Wagon, S. [1986], The Banach–Tarski Paradox, Cambridge University Press, Cambridge.
Wang, P.S. [1974], “The Undecidability of the Existence of Zeros of Real Elementary Functions”, Journal of the ACM (JACM) 21, 586–589.
Weyl, H. [1949], Philosophy of Mathematics and Natural Science, Princeton University Press, Princeton.
Wright, R. [1978], The State of the Pentagon. A Nonclassical Example, in Mathematical Foundations of Quantum Theory, (A.R. Marlow ed.), Academic Press, New York, pp. 255–274.
Wright, R. [1990], “Generalized Urn Models”, Foundations of Physics 20, 881–903.
David Turner∗
Church’s Thesis and Functional Programming
The earliest statement of Church’s Thesis, from Church [1936, p. 356], is:
“We now define the notion, already discussed, of an effectively calculable function of positive integers by identifying it with the notion of a recursive function of positive integers (or of a lambda-definable function of positive integers).”
The phrase in parentheses refers to the apparatus which Church had developed to investigate this and other problems in the foundations of mathematics: the calculus of lambda conversion. Both the Thesis and the lambda calculus have been of seminal influence on the development of Computing Science. The main subject of this article is the lambda calculus but I will begin with a brief sketch of the emergence of the Thesis.
The epistemological status of Church’s Thesis is not immediately clear from the above quotation and remains a matter of debate, as is explored in other papers of this volume. My own view, which I will state but not elaborate here, is that the thesis is empirical because it relies for its significance on a claim about what can be calculated by mechanisms. This becomes clearer in Church’s restatement of the thesis the following year, after he had seen Turing’s paper (see below). For a fuller discussion see Hodges [this volume].
Three definitions of the effectively computable functions of the natural numbers (non-negative integers, hereafter N), developed nearly contemporaneously in the early to mid 1930s, turned out to be equivalent. Church [1936, quoted above] showed that his own theory of lambda-definable functions yielded the same functions on
∗ D.A. Turner, School of Computing Science, Middlesex University, UK.
N^k → N as the recursive functions of Herbrand and Gödel [Herbrand 1932, Gödel 1934]. This was proved independently by Kleene [1936]. A few months later Turing [1936] introduced his concept of logical computing machine (LCM): a finite automaton with an unbounded tape divided into squares, on which it could move left or right and read or write symbols from a finite alphabet, in accordance with a specified state transition table. A central result of the paper is the existence of a universal LCM, which can emulate the behaviour of any LCM whose description is written on its tape. In an appendix Turing shows that the numeric functions computable by his machines coincide with the lambda-definable ones. In his review of Turing’s paper, Church [1937] writes:
“there is involved here the equivalence of three different notions: computability by a Turing machine, general recursiveness [...] and lambda-definability [...] The first has the advantage of making the identification with effectiveness in the ordinary sense evident immediately [...] The second and third have the advantage of suitability for embodiment in a system of symbolic logic.”
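The machine model just described (finite control, unbounded tape, a state transition table) is small enough to sketch directly. The following Python fragment is only an illustration under assumed conventions: the table format, the state names `start` and `halt`, the blank symbol `_` and the step budget are all choices made here, not Turing’s notation.

```python
from collections import defaultdict

def run_machine(table, tape, state="start", halt="halt", blank="_", max_steps=10_000):
    """Run a one-tape machine given as table[(state, symbol)] = (write, move, next_state).

    `move` is -1 (left) or +1 (right). Returns the non-blank stretch of the tape
    as a string, or None if the step budget runs out before the machine halts.
    """
    cells = defaultdict(lambda: blank, enumerate(tape))  # unbounded tape, blank by default
    head = 0
    steps = 0
    while state != halt:
        if steps == max_steps:
            return None
        write, move, state = table[(state, cells[head])]
        cells[head] = write
        head += move
        steps += 1
    used = [i for i, symbol in cells.items() if symbol != blank]
    if not used:
        return ""
    return "".join(cells[i] for i in range(min(used), max(used) + 1))

# Unary successor: skip over the 1s, write one more 1 on the first blank, halt.
successor = {
    ("start", "1"): ("1", +1, "start"),
    ("start", "_"): ("1", +1, "halt"),
}
# A machine that walks right forever and never halts.
runaway = {("start", "_"): ("_", +1, "start")}

print(run_machine(successor, "111"))
print(run_machine(runaway, "", max_steps=100))
```

The step budget is a practical concession: the simulator can report that it gave up, but it cannot in general tell a slow machine from one that never halts.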
The Turing machine led, about a decade later, to the Turing/von Neumann computer, a realization in electronics of Turing’s universal machine, with the important optimization that (an initial portion of) the tape is replaced by a random access store. The concept of a programming language did not yet exist in 1936, but the second and third notions were eventually to provide the basis of what we now know as functional programming.
The Halting Theorem
All three notions of computability involve partiality in an essential way. General recursion schemas yield the partial recursive functions, which may for some values of their arguments fail to produce a result. We will write their type as N^k → N̄, where N̄ = N ∪ {⊥} and the undefined value ⊥ represents non-termination.¹ The recursive functions are the subset that are everywhere defined. That this subset is not recursively enumerable is shown by a use of Cantor’s
¹ The idea of treating non-termination as a peculiar kind of value, ⊥, is more recent and was not current at the time of Church and Turing’s foundational work.
diagonalization argument.² Since the partial recursive functions are recursively enumerable it follows that the property of being total (for a partial recursive function) is not recursively decidable. By a separate argument it can be shown that the property for a partial recursive function of being defined at a specified value of its input vector is also not in general recursively decidable. Similarly, Turing machines may not halt and lambda-terms may have no normal form; and these properties are not, respectively, Turing-computable or lambda-definable, as is shown in each case by a simple argument involving self-application.
Thus of perhaps equal importance with Church’s Thesis, and emerging from it, is the Halting Theorem: given an arbitrary computation whose result is of type N̄ we cannot in general decide if it is ⊥. What is actually proven, e.g. of the halts predicate on Turing machines, is that it is not Turing-computable (equivalently, not lambda-definable, etc.). It is by an appeal to Church’s Thesis that we pass from this to the assertion that halting is not effectively decidable.
The three convergent notions (to which others have since been added) identify an apparently unique, effectively enumerable, class of functions of type N^k → N̄ corresponding to what is computable by finite but unbounded means. Church’s identification of this class with effective calculability amounts to the conjecture that this is the best we can do. In the case of the Turing machine the unbounded element is the tape (it is initially blank, save for a finite segment, but provides an unlimited working store). In the case of the lambda calculus it is the fact that there is no limit to the intermediate size to which a term may grow during the course of its attempted reduction to normal form. In the case of recursive functions it is the minimalization operation, which searches for the smallest n ∈ N on which a specified recursive function takes the value 0.
The Halting Theorem tells us that unboundedness of the kind needed for computational completeness is effectively inseparable from the possibility of non-termination.
² The proof is purely constructive and doesn’t depend on Church’s Thesis: any effective enumeration, h, of computable functions in N → N is incomplete, since it lacks f(n) = h(n)(n) + 1.
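The diagonal construction in this footnote can be written down almost verbatim. A minimal sketch in Python, assuming a toy enumeration `h` chosen purely for illustration (any enumeration of total functions would do):

```python
def diagonal(h):
    """Given an enumeration h (n -> total function N -> N), return f with f(n) = h(n)(n) + 1."""
    return lambda n: h(n)(n) + 1

# A toy enumeration (an assumption for illustration): h(k) is the function n -> k * n.
def h(k):
    return lambda n: k * n

f = diagonal(h)

# f disagrees with the k-th enumerated function at argument k, so f is not h(k) for any k.
print(all(f(k) != h(k)(k) for k in range(1000)))
```

The construction needs every h(k) to be total; if h(k)(k) may fail to terminate, then f is partial as well and the argument no longer produces a missing total function.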
The Lambda Calculus
Of the various convergent notions of computability Church’s lambda calculus is distinguished by its combination of simplicity with remarkable expressive power. The lambda calculus was conceived as part of a larger theory, including logical operators such as implication, intended as an alternative foundation for mathematics based on functions rather than sets. This gave rise to paradoxes, including a version of the Russell paradox. What remained with the propositional part stripped out is a consistent theory of pure functions, of which the first systematic exposition is Church [1941].³
In the sketch given here we use for variables lower case letters: a, b, c, · · · , x, y, z and as metavariables denoting terms upper case letters: A, B, C, · · · . The abstract syntax of the lambda calculus has three productions. A term is one of
  variable      e.g. x
  application   AB
  abstraction   λx.A
In the last case λx. is a binder and free occurrences of x in A become bound. A term in which all variables are bound is said to be closed, otherwise it is open. The motivating idea is that closed terms represent functions. The intended meaning of AB is the application of function A to argument B while λx.A is the function which for input x returns A. Terms which are the same except for renaming of bound variables are not distinguished, thus λx.x and λy.y are the same identity function. In writing terms we freely use parentheses to remove ambiguity. We further adopt the conventions that application is left-associative and that the scope of a binder extends as far to the right as possible. For example f g h means (f g)h and λx.λy.Ba means λx.(λy.(Ba)).
The calculus has only one essential rule, which shows how to substitute an argument into the body of a function:
  (β)   (λx.A)B →β [B/x]A
³ In his monograph Church defines two slightly differing calculi called λI and λK; of these, λK is now regarded as canonical and is what we sketch here.
Here [B/x]A means substitute B for free occurrences of x in A. The smallest reflexive, symmetric, transitive, substitutive relation on terms including →β, written ⇔, is Church’s notion of λ-conversion. If we omit symmetry from the definition we get an oriented relation, written ⇒, called reduction. An instance of the left hand side of rule β is called a redex. A term containing no redex is said to be in normal form. A term which is convertible to one in normal form is said to be normalizing. There are non-normalizing terms, of which perhaps the simplest is (λx.xx)(λx.xx). We have the cyclic
  (λx.xx)(λx.xx) →β (λx.xx)(λx.xx)
as the only available step.
The two most important technical results are
Church–Rosser Theorem: If A ⇔ B there is a term C such that A ⇒ C and B ⇒ C. An immediate consequence of this is that the normal form of a normalizing term is unique.⁴
Normal Order Theorem (stated informally): the normal form of a normalizing term can be found by repeatedly reducing its leftmost redex.⁵
To see the significance of the normal order theorem consider the term
  (λy.z)((λx.xx)(λx.xx))
We have
  (λy.z)((λx.xx)(λx.xx)) →β z
which is the normal form. But if we try to reduce the argument ((λx.xx)(λx.xx)) to normal form first, we get stuck in an endless loop. In general there are many ways of reducing a term, since it or one of its reducts may contain multiple redexes. The normal order theorem gives a sequential procedure, normal order reduction, which is guaranteed to reach the normal form if there is one.
⁴ This means unique up to changes of bound variable, of course.
⁵ In case of nested redexes, leftmost is usually defined as leftmost-outermost, although the theorem will still hold if we take leftmost-innermost.
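Substitution, rule β and normal order reduction can be made concrete in a few dozen lines. What follows is a sketch only, in Python, with terms represented as tagged tuples; the constructor names, the renaming scheme in `fresh` and the step budget in `normalize` are assumptions of this illustration rather than anything fixed by the calculus.

```python
import itertools

def Var(x): return ("var", x)
def App(a, b): return ("app", a, b)
def Lam(x, a): return ("lam", x, a)

def free_vars(t):
    if t[0] == "var":
        return {t[1]}
    if t[0] == "app":
        return free_vars(t[1]) | free_vars(t[2])
    return free_vars(t[2]) - {t[1]}

def fresh(base, avoid):
    """Pick a variable name derived from `base` that is not in `avoid`."""
    for i in itertools.count():
        name = f"{base}{i}"
        if name not in avoid:
            return name

def subst(b, x, a):
    """[b/x]a: substitute b for free occurrences of x in a, renaming binders to avoid capture."""
    if a[0] == "var":
        return b if a[1] == x else a
    if a[0] == "app":
        return App(subst(b, x, a[1]), subst(b, x, a[2]))
    y, body = a[1], a[2]
    if y == x:
        return a                      # x is rebound here, so nothing to substitute
    if y in free_vars(b):             # rename y so it cannot capture a free y in b
        z = fresh(y, free_vars(b) | free_vars(body) | {x})
        body, y = subst(Var(z), y, body), z
    return Lam(y, subst(b, x, body))

def step(t):
    """Contract the leftmost-outermost redex; return None if t is in normal form."""
    if t[0] == "app":
        f, arg = t[1], t[2]
        if f[0] == "lam":
            return subst(arg, f[1], f[2])   # rule (beta), argument left unevaluated
        r = step(f)
        if r is not None:
            return App(r, arg)
        r = step(arg)
        return None if r is None else App(f, r)
    if t[0] == "lam":
        r = step(t[2])
        return None if r is None else Lam(t[1], r)
    return None

def normalize(t, max_steps=1000):
    """Normal order reduction; returns None if no normal form is found within the budget."""
    for _ in range(max_steps):
        nxt = step(t)
        if nxt is None:
            return t
        t = nxt
    return None

self_apply = Lam("x", App(Var("x"), Var("x")))
omega = App(self_apply, self_apply)             # (λx.xx)(λx.xx)
term = App(Lam("y", Var("z")), omega)           # (λy.z)((λx.xx)(λx.xx))
print(normalize(term))                          # the normal form z
print(normalize(omega, max_steps=50))           # None: the budget runs out
```

Reducing the argument first would never get past `omega`; contracting the outer redex discards it in a single step, which is the point of the example in the text.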
Note that normal order reduction substitutes arguments into function bodies without first reducing any redexes inside the argument, which amounts to lazy evaluation. A closed term of pure⁶ λ-calculus is called a combinator. Note that any normalizing closed term of pure λ-calculus must reduce to an abstraction. Some combinators with their conventional names are:
  S = λx.λy.λz.xz(yz)
  K = λx.λy.x
  I = λx.x
  B = λx.λy.λz.x(yz)
  C = λx.λy.λz.xzy
It is evident that λ-calculus has a rich collection of functions, including functions of higher type, that is whose arguments and/or results are functions, but since (at least closed) terms can denote only functions and never ground objects it remains to show how to represent data such as the natural numbers. Here are the Church numerals
  0 = λa.λb.b
  1 = λa.λb.ab
  2 = λa.λb.a(ab)
  3 = λa.λb.a(a(ab))
  etc. · · ·
To understand this representation for numbers note the effect of applying a Church numeral to function f and object a:
  0 f a ⇔ a
  1 f a ⇔ f a
  2 f a ⇔ f(f a)
  3 f a ⇔ f(f(f a))
⁶ Pure means using only variables and no proper constants, as λ-calculus is presented here.
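The behaviour of the Church numerals as iterators can be tried out in any language with first-class functions. Below is a small Python sketch; the helper names `church` and `to_int` are introduced here for illustration only, and native Python integers and strings are used merely to observe the results.

```python
def church(n):
    """Build the Church numeral for n: it applies its first argument n times to its second."""
    return lambda a: lambda b: b if n == 0 else a(church(n - 1)(a)(b))

def to_int(numeral):
    """Decode a Church numeral by iterating 'add one' starting from 0."""
    return numeral(lambda k: k + 1)(0)

# The numeral 2 written out literally, as in the text: λa.λb.a(ab)
two = lambda a: lambda b: a(a(b))

# Applying 3 to a symbolic f and a shows the iteration 3 f a ⇔ f(f(f a)).
wrap = lambda s: "f(" + s + ")"
print(church(3)(wrap)("a"))
print(to_int(two), to_int(church(7)))
```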
The numbers are thus represented as iterators. It is now straightforward to define the arithmetic operations, for example
  + = λm.λn.λa.λb.ma(nab)
  × = λm.λn.λa.λb.m(na)b
Predecessor and subtraction are a little trickier, see Church [1941]. We also need a way to branch on 0:
  zero = λa.λb.λn.n(Kb)a
We have
  zero A B N ⇔ A,  if N ⇔ 0
  zero A B N ⇔ B,  if N ⇔ n + 1
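These definitions transcribe directly into curried Python lambdas. The sketch below is illustrative only: the names `plus`, `times`, `nought` and `to_int` are chosen here, and the strings "A" and "B" stand in for arbitrary terms.

```python
K = lambda x: lambda y: x

plus = lambda m: lambda n: lambda a: lambda b: m(a)(n(a)(b))    # + = λm.λn.λa.λb.ma(nab)
times = lambda m: lambda n: lambda a: lambda b: m(n(a))(b)      # × = λm.λn.λa.λb.m(na)b
zero = lambda a: lambda b: lambda n: n(K(b))(a)                 # zero = λa.λb.λn.n(Kb)a

nought = lambda a: lambda b: b
two = lambda a: lambda b: a(a(b))
three = lambda a: lambda b: a(a(a(b)))

# Decoder used only to inspect results; not part of the calculus.
to_int = lambda numeral: numeral(lambda k: k + 1)(0)

print(to_int(plus(two)(three)), to_int(times(two)(three)))
print(zero("A")("B")(nought), zero("A")("B")(two))
```

One caveat: Python evaluates both branch arguments of `zero` before choosing between them, so this transcription selects correctly but is not lazy in the way normal order reduction is.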
The master-stroke, which shows every recursive function to be λ-definable, is to find a universal fixpoint operator, that is a term Y with the property that for any term F,
  Y F ⇔ F(Y F)
There are many such terms, of which the simplest is due to H.B. Curry.
  Y = λf.(λx.f(xx))(λx.f(xx))
The reader may satisfy himself that we have Y F ⇔ F(Y F) as required.
The beauty of λ-definability as a theory of computation is that it gives not only (assuming Church’s Thesis) all computable functions of type N → N but also those of higher type of any finite degree, such as (N → N) → N, (N → N) → (N → N) and so on.
Moreover we are not limited to arithmetic. The idea behind the Church numerals is very general and allows any data type (pairs, lists, trees and so on) to be represented in a purely functional way. Each datum is encoded as a function that captures its elimination operation, that is the way in which information is extracted from it during computation. It is also possible to represent codata, such as infinite lists, infinitary trees and so on.
Part of the simplicity of the calculus lies in its considering only functions of a single argument. This is no real restriction since it is a basic result of set theory that for any sets A, B, C, the function spaces
Church’s Thesis and Functional Programming
(A × B) → C and A → (B → C) are isomorphic. Replacing the first by the second is called Currying.7 We have made implicit use of this idea all along, e.g. + is curried addition.

7 After H.B. Curry, although the idea was first used in Schönfinkel [1924].

Solvability and Non-Strictness

A non-normalizing term is by no means necessarily useless. An example is Y, which has no normal form but can produce one when applied to another term. On the other hand (λx.xx)(λx.xx) is irredeemable: there is no term and no sequence of terms to which it can be applied and yield a normal form.

Definition: a term T is SOLVABLE if there are terms A₁, · · · , Aₖ for some k ≥ 0 such that T A₁ · · · Aₖ is normalizing.

Thus Y is solvable because we have for example

Y(λx.λy.y) ⇔ (λy.y)

whereas (λx.xx)(λx.xx) is unsolvable. An important result, due to Corrado Böhm, is that a term is solvable if and only if it can be reduced to head normal form:

λx₁ · · · λxₙ.xₖ A₁ · · · Aₘ

The variable xₖ is called the head and, if the term is closed, must be one of the x₁ · · · xₙ. If a term is solvable, normal order reduction will reduce it to HNF in a finite number of steps. See Barendregt [1984]. All unsolvable terms are equally useless, so we can think of them as being equivalent and introduce a special term ⊥ to represent them. This gives us an extension of ⇔ for which we will use ≡. The two fundamental properties of ⊥, which follow from the definitions of unsolvability and head normal form, are:

⊥ A ≡ ⊥
λx.⊥ ≡ ⊥

Introducing ⊥ allows an ordering relation to be defined on terms with ⊥ as least element and a stronger equivalence relation using limits which is studied in domain theory (see later). We make one further remark here.

Definition: a term A is STRICT if A ⊥ ≡ ⊥ and non-strict otherwise.

A strict function thus has ⊥ for a fixpoint and applying Y to it will produce ⊥. So non-strict functions play an essential role in the theory of λ-definability: without them we could not use Y to encode recursion.
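The role of non-strictness can be observed in a strict language. The sketch below, in Python, is an illustration of ours rather than anything from the text: Python evaluates arguments before calls, so every Python function behaves as a strict one and Curry's Y never returns. The variant named Z here (a common name for the η-expanded fixpoint combinator, used as a substitute for Y) hides the self-application behind a λ and so recovers recursion. The names FACT_STEP and diverges are hypothetical helpers.

```python
# Curry's Y = λf.(λx.f(xx))(λx.f(xx)). Under strict evaluation the inner
# self-application x(x) is evaluated before f is ever called, so Y(F) loops.
Y = lambda f: (lambda x: f(x(x)))(lambda x: f(x(x)))

# Eta-expanded variant: x(x) is wrapped in a lambda, so it is unfolded only
# when the recursive call is actually made.
Z = lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

# One unfolding of factorial; `rec` stands for the recursive call.
FACT_STEP = lambda rec: lambda n: 1 if n == 0 else n * rec(n - 1)

factorial = Z(FACT_STEP)


def diverges(thunk):
    """Report whether running thunk exhausts Python's call stack."""
    try:
        thunk()
    except RecursionError:
        return True
    return False


fact_of_five = factorial(5)
y_loops = diverges(lambda: Y(FACT_STEP))
```

Stack exhaustion is only Python's finite stand-in for non-termination; in the calculus the corresponding reduction sequence simply never ends.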
Combinatory Logic
Closely related to λ-calculus is combinatory logic, originally due to Schönfinkel [1924] and subsequently explored by H.B. Curry. This has meagre apparatus indeed: just application and a small collection of named combinators. These are defined by stating their reduction rules. In the minimal version we have two combinators, defined as follows

S x y z ⇒ x z (y z)
K x y ⇒ x

Here x, y, z are metavariables standing for arbitrary terms and are used to state the reduction rules. Combinatory logic terms have no variables and are built using only constants and application, e.g. K(SKK). A central result, perhaps one of the strangest in all of logic, is that every λ-definable function can be written using only S and K. Here is a start:

I = SKK

The proof is by considering application to an arbitrary term. We have

SKKx ⇒ Kx(Kx) ⇒ x

as required. The definitive study of combinatory logic and its relationship to lambda calculus is Curry & Feys [1958]. There are several algorithms for transcribing λ-terms to combinators and for convenience most of these use, besides S and K, additional combinators such as B, C, I etc.
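The claim that S and K suffice can be spot-checked by writing the two reduction rules as curried functions and building other combinators from them alone. The following Python sketch is illustrative only. The definition I = SKK is the one given above; the definitions B = S(KS)K and C = S(BBS)(KK) are standard identities that we supply, not taken from the text, and the helpers increment, double and subtract are hypothetical.

```python
# The two primitive combinators, written as curried Python functions.
S = lambda x: lambda y: lambda z: x(z)(y(z))
K = lambda x: lambda y: x

# Everything below is built by application of S and K only.
I = S(K)(K)                 # I x     => x
B = S(K(S))(K)              # B x y z => x (y z)
C = S(B(B)(S))(K(K))        # C x y z => x z y

# Ordinary curried functions used to observe the behaviour of B and C.
increment = lambda n: n + 1
double = lambda n: n * 2
subtract = lambda a: lambda b: a - b

identity_result = I(42)
compose_result = B(increment)(double)(5)    # increment(double(5))
flip_result = C(subtract)(3)(10)            # subtract(10)(3)
```

Since Python is strict, this checks the combinators only on arguments whose evaluation terminates; it says nothing about normal order reduction.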
It would seem that only a dedicated cryptologist would choose to write other than very small programs directly in combinatory logic. However, Turner [1979a] describes compilation to combinators as an implementation method for a high-level functional programming language. This required finding a translation algorithm, described in Turner [1979b], that produces compact combinator code when translating expressions containing many nested λ-abstractions. The attraction of the method is that combinator reduction rules are much simpler than β-reduction, each requiring only a few machine instructions, allowing a fast interpreter to be constructed which carries out normal order reduction.

The Paradox
It is easy to see why the original versions of λ-calculus and combinatory logic, which included properly logical notions, led to paradoxes. (Curry calls these theories illative.) The untyped theory is too powerful, because of the fixpoint combinator, Y. Suppose N is a term denoting logical negation. We have

Y N ⇔ N(Y N)

which is the Russell paradox. Even minimal logic, which lacks negation, becomes inconsistent in the presence of Y: implication is sufficient to generate the paradox, see Barendregt [1984, p. 575]. Because of this Y is sometimes called Curry’s paradoxical combinator.

Typed λ-Calculi
The λ-calculus of Church [1941] is untyped: it allows the promiscuous application of any term to any other, so types arise only in the interpretation of terms. In a typed λ-calculus the rules of term formation embody some theory of types. Only terms which are well-typed according to the theory are permitted. The rules for β reduction remain unchanged, as does the Church–Rosser Theorem. Most type systems disallow self-application, as in (λx.xx), preventing the formation of a fixpoint combinator like Curry’s Y. Typed λ-calculi fall into two main groups depending on what is done about this:

(i) Add an explicit fixpoint construction to the calculus, for example a polymorphic constant Y of type schema (α → α) → α, with reduction rule Y H ⇒ H(Y H). This allows general recursion at every type and thus retains the computational completeness of untyped λ.

(ii) In the other kind of typed λ-calculus there is no fixpoint construct and every term is normalizing. This brings into play a fundamental isomorphism between programming and logic: the Propositions-as-Types principle.

This gives two apparently very different models of functional programming, which we discuss in the next two sections.
Lazy Functional Programming

Imperative programming languages, from the earliest such as FORTRAN and COBOL, which emerged in the 1950s, to current “object-oriented” ones such as C++ and Java, have certain features in common. Their basic action is the assignment command, which changes the content of a location in memory, and they have an explicit flow of control by which these state changes are ordered. This reflects more or less directly the structure of the Turing/von Neumann computer, as a central processing unit operating on a passive store. Backus [1978] calls them “von Neumann languages”.

Functional8 programming languages offer a radical alternative: they are descriptive rather than imperative, have no assignment command and no explicit flow of control; sub-computations are ordered only partially, by data dependency. The claimed merits of functional programming (in conciseness, mathematical tractability, potential for parallel execution) have been argued in many places so we will not dwell on them here. Nor will we go into the history of the concept, other than to say that the basic ideas go back over four decades (see in particular the important early papers of McCarthy [1960] and Landin [1966]) and that for a long period functional programming was mainly practised in imperative languages with functional subsets (LISP, Scheme, Standard ML).

8 We here use functional to mean what some call purely functional; an older term for this is applicative; yet another term, which includes other mathematically based models such as logic programming, is declarative.

The disadvantages of functional programming within a language that includes imperative features are two. First, you are not forced to explore the limits of the functional style, since you can escape at will into an imperative idiom. Second, the presence of side effects, exceptions etc., even if they are rarely used, invalidates important theorems on which the benefits of the style rest.

The λ-calculus is the most natural candidate for functional programming: it is computationally complete in the sense of Church’s Thesis, it includes functions of higher type and it comes with a theory of λ-conversion that provides a basis for reasoning about program transformation, correctness of evaluation mechanisms and so on. The notation is a little spartan for most tastes but it was shown long ago by Peter Landin that the dish can be sweetened by adding a sprinkling of syntactic sugar.9

Efficient Normal Order Reduction
The Normal Order Theorem tells us that an implementation of λ-calculus on a sequential machine should use normal order reduction,10 otherwise it may fail to find the normal form of a normalizing term. This requires that arguments be substituted unevaluated into function bodies as we noted earlier. In general this will produce multiple copies of the argument, requiring any redexes it contains to be reduced multiple times. For λ-calculus-based functional programming to be a viable technology it is necessary to have an efficient way of handling this.

A key step was the invention of normal graph reduction, by Wadsworth [1971]. In this scheme the term is held as a directed acyclic graph, and the result of β-reduction is that a single copy of the argument is retained, with the function body containing multiple pointers to it. As a consequence any redexes in the argument are reduced at most once.

Turner adapted this idea to graph reduction on S, K, I, etc. combinators, allowing a much simpler abstract machine. In Turner’s scheme the graph may be cyclic, permitting a more compact representation of recursion. The reduction rule for the Y combinator, Y H ⇒ H(Y H), creates a loop in the graph, increasing the amount of sharing. The combinators are a target code for compilation from a high level functional language. Initially this was SASL [Turner 1976] and in later incarnations of the system, Miranda.

9 The phrase syntactic sugar is due to Strachey, as are other evocative terms and concepts in programming language theory.
10 Except where prior analysis of the program shows it can be avoided, a process known as strictness analysis.

While using a set of combinators fixed in advance is a good solution if graph reduction is to be carried out by an interpreter, if the final target of compilation is to be native code on conventional hardware it is advantageous to use the λ-abstractions present (explicitly or implicitly) in the program source as the combinators whose reduction rules are to be implemented. This requires a source-to-source transformation called λ-lifting, Hughes [1983], Johnsson [1985]. This method was first used in the compiler of LML, a lazy version of the functional subset of ML, written by Lennart Augustsson & Thomas Johnsson at Chalmers University in Sweden, around 1984. Their model for mapping graph reduction onto conventional hardware, the G machine, has since been further refined, leading to the optimized model of Simon Peyton Jones [1992].

Thus over a period of two decades normal order functional languages have been implemented with increasing efficiency.

Miranda
Miranda is a functional language designed by David Turner in 1983–6 and is a sugaring of a typed λ-calculus with a universal fixpoint operator. There are no explicit λ’s; instead we have function definition by equations and local definitions with where. The insight that one can have λ-calculus without λ goes back to Peter Landin [1966] and his ISWIM notation. Neither is the user required to mark recursive definitions as such: the compiler analyses the call graph and inserts Y where it is required.

The use of normal order reduction (aka lazy evaluation) and non-strict functions has a very pervasive effect. It supports a more mathematical style of programming, in which infinite data structures can be described and used and, most importantly, permits communicating processes and input/output to be programmed in a purely functional manner.

Miranda is based on the earlier lazy functional language SASL [Turner 1976] with the addition of the system of polymorphic strong typing of Milner [1978]. For an overview of Miranda see Turner [1986].

Miranda doesn’t use Church numerals for its arithmetic: modern computers have fast fixed and floating point arithmetic units and it would be perverse not to take advantage of them. Arithmetic operations on unbounded size integers and 64-bit floating point numbers are provided as primitives. In place of the second order representation of data used within the pure untyped lambda calculus we have algebraic type definitions. For example

    bool ::= False | True
    nat ::= Zero | Suc nat
    tree ::= Leaf nat | Fork tree tree

Introducing new data types in this way is in fact better than using second order impredicative definitions for two reasons: you get clearer and more specific type error messages if you misuse them, and each algebraic type comes with a principle of induction which can be read off from the definition. The analysis of data is by pattern matching, for example

    flatten :: tree -> [nat]
    flatten (Leaf n) = [n]
    flatten (Fork x y) = flatten x ++ flatten y

The type specification of flatten is optional as the compiler is able to deduce this; ++ is list concatenation. There is a rich vocabulary of standard functions for list processing, map, filter, foldl, foldr, etc. and a notation, called list comprehension, that gives concise expression to a useful class of iterations.

Miranda was widely used for teaching and for about a decade following its initial release by Research Software Ltd in 1985–6 provided a de facto standard for pure functional programming, being taken up by over 200 universities. The fact that it was interpreted rather than compiled limited its use outside education, but several significant industrial projects were successfully undertaken using Miranda, see for example Major et al. [1991] and Page & Moe [1993].

Haskell, a successor language designed by a committee, includes many extensions, of which the most important are type classes and monadic input-output. The language remains purely functional, however. For a detailed description see S.L. Peyton Jones [2003]. Available implementations of Haskell include, besides an interpreter suitable for educational use, native code compilers. This makes Haskell a viable choice for production use in a range of areas.
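For readers without a Miranda system to hand, the tree type and the flatten function shown earlier in this section can be approximated in a mainstream language. The sketch below uses Python data classes; it is our own illustration, not a translation endorsed by the text. Two simplifications are assumptions: leaves hold built-in integers rather than the unary nat type, and case analysis is done with isinstance tests instead of pattern matching on the left of equations.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Leaf:
    value: int              # stands in for Miranda's `Leaf nat`


@dataclass(frozen=True)
class Fork:
    left: "Tree"            # stands in for `Fork tree tree`
    right: "Tree"


Tree = Union[Leaf, Fork]


def flatten(t: Tree) -> List[int]:
    """List the leaves of t from left to right, one case per constructor."""
    if isinstance(t, Leaf):
        return [t.value]
    if isinstance(t, Fork):
        return flatten(t.left) + flatten(t.right)
    raise TypeError("flatten expects a Leaf or a Fork")


example_tree = Fork(Leaf(1), Fork(Leaf(2), Leaf(3)))
flattened = flatten(example_tree)
```

The final raise has no counterpart in the Miranda version, where the type checker rules out any other shape of argument before the program runs.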
The fact that people are able to write large programs for serious applications in a language, like Miranda or Haskell, that is essentially a sugaring of λ-calculus is in itself a vindication of Church’s Thesis.

Domain Theory
The mathematical theory which explains programming languages with general recursion is Scott’s domain theory. The typed λ-calculus looks as though it ought to have a set-theoretic model, in which types denote sets and λ-abstractions denote functions. But the fixpoint operator Y is problematic. It is not the case in set theory that every function f ∈ A → A has a fixpoint in A. There is a second kind of fixpoint to be explained, at the level of types. We can define recursive algebraic data types, like (we are here using Miranda notation):

    big ::= Leaf nat | Node (big -> big)

This appears to require a set with the property

Big ≅ N + (Big → Big)

which is impossible on cardinality grounds. Dana Scott’s domain theory solves both these problems. A domain is a complete partial order: a set with a least element, ⊥, representing non-termination, and limits of ascending chains (or more generally of directed sets). The function space A → B for domains A, B, is defined to contain just the continuous functions from A to B and this is itself a domain. Continuous means preserving limits. The continuous functions are also monotonic (= order preserving). For a complete partial order, D, each continuous function f ∈ D → D has a least fixed point, $\bigsqcup_{n=0}^{\infty} f^{n}\,\bot$.

A plain set, like N, can be turned into a domain by adding ⊥, to get N⊥. Further, domain equations, like D ≅ N + (D × D), D ≅ N + (D → D) and so on, all have solutions. The details can be found in Scott [1976] or Abramsky & Jung [1994]. In particular there is a non-trivial11 domain D∞ with

D∞ ≅ D∞ → D∞

providing a semantic model for Church’s untyped λ-calculus.

11 The one-point domain, with ⊥ for its only element, if allowed, would be a trivial solution.

Domain theory was originally developed to underpin denotational semantics, Christopher Strachey’s project to formalize semantic descriptions of real programming languages using a typed λ-calculus as the metalanguage [see Strachey, 1967; Strachey & Scott 1971]. Strachey’s semantic equations made frequent use of Y to explain control structures such as loops and also required recursive type equations to account for the domains of the various semantic functions. It was during Scott’s collaboration with Strachey in the period around 1970 that domain theory emerged.

Functional programming in non-strict languages like Miranda and Haskell is essentially programming directly in the metalanguage of denotational semantics.

Computability at Higher Types, Revisited
Dana Scott once remarked that λ-calculus is only an algebra, not a calculus. With domain theory and proofs using limits we get a genuine calculus, allowing many new results. Studying a typed functional language with arithmetic, Plotkin [1977] showed that if we consider functions of higher type where we allow inputs as well as outputs to be ⊥, there are computable functions which are not λ-definable. Using the domain B⊥ where B = {True, False}, two examples are:

Or ∈ B⊥ → B⊥ → B⊥ where Or x y is True if either x or y is True
Exists ∈ (N⊥ → B⊥) → B⊥ where Exists f is True when ∃i ∈ N. f i = True

This complete or parallel Or must interleave two computations, since either of its inputs may be ⊥. Exists is a multi-way generalization. What we get from untyped λ-calculus, or a typed calculus with N and general recursion, are the sequential functions. To get all computable partial functions at every type we must add primitives expressing interleaving or concurrency. In fact just the two above are sufficient.

This becomes important for programming with exact real numbers, an active area of research. Martin Escardo [1996] shows that a λ-calculus with a small number of primitives including Exists can express every computable function of analysis, including those of higher type, e.g. differentiation and integration.
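The need for interleaving in parallel Or and Exists can be made concrete by modelling a computation as something that is advanced one step at a time. The Python sketch below is an illustration under assumptions of our own: a computation is a generator that yields None for each step that produces no answer and then yields its boolean result once, and a generator that yields None forever plays the role of ⊥. The function names are hypothetical, and exists covers only the True case described in the text.

```python
from itertools import count


def diverge():
    """A computation that never delivers a result (stands for ⊥)."""
    while True:
        yield None


def after(steps, value):
    """A computation that delivers value after the given number of steps."""
    for _ in range(steps):
        yield None
    yield value


def parallel_or(x, y):
    """Advance x and y alternately, one step each per round.

    Returns True as soon as either delivers True, and False once both
    have delivered False; otherwise it keeps running.
    """
    comps = [x, y]
    results = [None, None]
    while True:
        for i, comp in enumerate(comps):
            if results[i] is None:
                results[i] = next(comp)
                if results[i] is True:
                    return True
        if results[0] is False and results[1] is False:
            return False


def exists(f):
    """Dovetail f(0), f(1), ...; return True once some f(i) delivers True."""
    running = []
    for i in count():
        running.append(f(i))            # start one new computation per round
        unfinished = []
        for comp in running:
            result = next(comp)
            if result is True:
                return True
            if result is None:
                unfinished.append(comp)  # a False result is simply dropped
        running = unfinished


left_diverges = parallel_or(diverge(), after(3, True))
right_diverges = parallel_or(after(2, True), diverge())
both_false = parallel_or(after(1, False), after(2, False))
found = exists(lambda i: after(i, True) if i == 4 else diverge())
```

A sequential Or would have to commit to running one argument to completion first, and so could not return True in both of the first two calls.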
Strong Functional Programming

There is an extended family of typed λ-calculi, all without Y or any other method of expressing general recursion, in which every term is normalizing. The family includes

simply typed λ-calculus (this is a family in itself)
Girard’s system F [1971], also known as the second order λ-calculus (we consider here the Church-style or explicitly typed version)
Coquand & Huet’s calculus of constructions [1988]
Martin-Löf’s intuitionist theory of types [1973]

In a change of convention we will use upper case letters A, B, C · · · for types and lower case letters a, b, c · · · for terms, reserving x, y, z for λ-calculus variables (this somewhat makeshift convention will be adequate for a short discussion). In addition to the usual conversion and reduction relations, ⇔, ⇒, these theories have a judgement of well-typing, written

a : A

which says that term a has type A (which may or may not be unique). All the theories share the following properties:

Church–Rosser: If a ⇔ b there is a term c such that a ⇒ c and b ⇒ c.

Decidability of well-typing: This is what is meant by saying that a programming language or formalism is strongly typed (aka statically typed).

Strongly normalizing: Every well-typed term is normalizing and every reduction sequence terminates in a normal form.

Uniqueness of normal forms: Immediate from Church–Rosser.

Decidability of ⇔ on well-typed terms: From the two previous properties: reduce both sides to normal form and see if they are equal.
Note that decidability of the well-typing judgment, a : A, is not the same as type inference. The latter means that given an a we can find an A with a : A, or determine that there isn’t one. The simply typed λ-calculus has type inference (in fact with most general types) but none of the stronger theories do.

The first two properties in the list are shared with other well-behaved typed functional calculi, including those with general recursion. So the distinguishing property here is strong normalization.

Programming in a language of this kind has important differences from the more familiar kind of functional programming. For want of any hitherto agreed name, we can call it strong functional programming.12 An obvious difference is that all evaluations terminate,13 so we do not have to worry about ⊥. It is clear that such a language cannot be computationally complete: there will be always-terminating computable functions it cannot express (and one of these will be the interpreter for the language itself). It should not be inferred that a strongly normalizing language must therefore be computationally weak. Even simply typed lambda calculus, equipped with N as a base type and primitive recursion, can express every recursive function of arithmetic whose totality is provable in first order number theory (a result due to Gödel [1958]). A proposed elementary functional programming system along these lines, but including codata as well as data, is discussed in Turner [2004].

A less obvious but most striking consequence of strong normalization is a new and unexpected interface between λ-calculus and logic. We show how this works by considering the simplest calculus of this class.

12 Another possible term is “total functional programming”, although this has the disadvantage of encouraging the unfortunate term “total function” (redundant because it is part of the definition of function that it is everywhere defined on its domain).
13 This seems to rule out indefinitely proceeding processes, such as an operating system, but we can include these by allowing codata and corecursion, see e.g. Turner [2004].

Propositions-as-Types

The simply typed λ-calculus (STLC) has for its types the closure under → of a set of base types, which we will leave unspecified. As before we use A, B, C · · · as variables ranging over types. We can associate with each closed term a type schema, for example

λx.x : A → A

The function λx.x has many types but they are all instances of A → A, which is its most general type.

A congruence first noticed by Curry in the 1950s is that the types of closed terms in STLC correspond to tautologies of intuitionist propositional logic, if we read → as implication, e.g. A → A is a tautology. The correspondence is exact: for example A → B is not a tautology and neither can we make any closed term of this type. Further, the most general types of the combinators s = λx.λy.λz.xz(yz) and k = λx.λy.x are

s : (A → (B → C)) → ((A → B) → (A → C))
k : A → (B → A)

and these formulae are the two standard axioms for the intuitionist theory of implication: every other tautology in → can be derived from them by modus ponens. What is going on here? Let us look at the rules for forming well-typed terms of simply typed λ.

      (x : A)
       b : B                    c : A → B     a : A
    --------------              -------------------
    λx.b : A → B                     c a : B

On the left14 we have the rule for abstraction, on the right that for application. If we look only at the types and ignore the terms, these are the introduction and elimination rules for implication in a natural deduction system. So naturally, the formulae we can derive using these rules are all and only the tautologies of the intuitionist theory of implication.15

14 The left hand rule says that if from assumption x : A we can derive b : B then we can derive what is under the line.
15 The classical theory of implication includes additional tautologies dependent on the law of the excluded middle; the leading example is ((A → B) → A) → A, Peirce’s law.
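The two formation rules can be read off directly as a type checking procedure. The following Python sketch is a minimal illustration of ours, under the assumption that λ-bound variables carry explicit type annotations (the Church-style presentation); it therefore checks terms at particular instances of their types and does not perform inference of most general types. All class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Union


@dataclass(frozen=True)
class Base:
    name: str


@dataclass(frozen=True)
class Arrow:
    source: "Type"
    target: "Type"


Type = Union[Base, Arrow]


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:
    param: str
    param_type: Type
    body: "Term"


@dataclass(frozen=True)
class App:
    func: "Term"
    arg: "Term"


Term = Union[Var, Lam, App]


def type_of(term: Term, env: Optional[Dict[str, Type]] = None) -> Type:
    """Return the type of term under assumptions env, or raise TypeError."""
    env = env or {}
    if isinstance(term, Var):
        if term.name not in env:
            raise TypeError("unbound variable " + term.name)
        return env[term.name]
    if isinstance(term, Lam):
        # Abstraction rule: from x : A derive b : B, conclude λx.b : A → B.
        body_type = type_of(term.body, {**env, term.param: term.param_type})
        return Arrow(term.param_type, body_type)
    if isinstance(term, App):
        # Application rule: from c : A → B and a : A conclude c a : B.
        func_type = type_of(term.func, env)
        arg_type = type_of(term.arg, env)
        if not isinstance(func_type, Arrow) or func_type.source != arg_type:
            raise TypeError("ill-typed application")
        return func_type.target
    raise TypeError("not a term")


A, B, C = Base("A"), Base("B"), Base("C")
x, y, z = Var("x"), Var("y"), Var("z")

k_term = Lam("x", A, Lam("y", B, x))
s_term = Lam("x", Arrow(A, Arrow(B, C)),
             Lam("y", Arrow(A, B),
                 Lam("z", A, App(App(x, z), App(y, z)))))
self_application = Lam("x", A, App(x, x))


def is_well_typed(term: Term) -> bool:
    try:
        type_of(term)
    except TypeError:
        return False
    return True
```

With these annotations the checker assigns k and s exactly the two axiom formulae quoted above, and it rejects the self-application λx.xx, in line with the earlier remark that most type systems disallow it.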
In the logical reading, the terms on the left of the colons provide witnessing information: they record how the formula on the right was proved. The judgement a : A thus has two readings: that term a has type A, but also that proof-object or witness a proves proposition A.

The correspondence readily extends to the other connectives of propositional logic by adding some more type constructors to STLC besides →. The type of pairs, cartesian product, A × B, corresponds to the conjunction A ∧ B. The disjoint union type, A ⊕ B, corresponds to the disjunction A ∨ B. The empty type corresponds to the absurd (or False) proposition, which has no proof.

This Curry–Howard isomorphism between types and propositions is jointly attributed to Curry [1958] and to W. Howard [1969], who showed how it extended to all the connectives of intuitionist logic including the quantifiers. It is at the same time an isomorphism between terminating programs and constructive (or intuitionistic) proofs.

The Constructive Theory of Types
Per Martin-Löf [1973] formalizes a proposed foundational language for constructive mathematics based on the isomorphism. The Intuitionist (or Constructive) Theory of Types is at one and the same time a higher order logic and a theory of types, providing for constructive mathematics what for classical mathematics is done by set theory. It provides a unified notation in which to write functions, types, propositions and proofs.

Unlike the constructive set theory of Myhill [1975], Martin-Löf type theory includes a principle of choice (not as an axiom, it is provable within the theory). It seems that the source of the nonconstructivities of set theory is not the choice principle, which for Martin-Löf is constructively valid, but the axiom of separation, a principle which is noticeably absent from type theory.16 17

16 Note that Goodman & Myhill’s [1978] proof that Choice implies Excluded Middle makes use of an instance of the Axiom of Separation. The title should be Choice + Separation implies Excluded Middle.
17 The frequent proposals to “improve” CTT by adding a subtyping constructor should therefore be viewed with suspicion.

Constructive type theory is both a theory of constructive mathematics and a strongly typed functional programming language. Verifying the validity of proofs is the same process as type checking. Martin-Löf [1982] writes

    I do not think that the search for high level programming languages that are more and more satisfactory from a logical point of view can stop short of anything but a language in which all of constructive mathematics can be expressed.
There exist by now a number of different versions of the theory, including several computer-based implementations, of which perhaps the longest established is NuPRL [Constable et al. 1986]. An alternative impredicative theory, also based on the Curry–Howard isomorphism, is Coquand and Huet’s Calculus of Constructions [1988] which provides the basis for the COQ proof system developed at INRIA.
Type Theory with Partial Types

Being strongly normalizing, constructive type theory cannot be computationally complete. Moreover we might like to reason about partial functions and general recursion using this powerful logic. Is it possible to somehow unify type theory with a constructive version of Dana Scott’s domain theory?

In his PhD thesis Scott F. Smith [1989] investigated adding partial types to the type theory of NuPRL. The idea can be sketched briefly as follows. For each ordinary type T there is a partial type T̄ of T-computations, whose elements include those of T and a divergent element, ⊥. For partial types [only] there is a fixpoint operator, fix : (T̄ → T̄) → T̄. This allows the definition of general recursive functions.

The constructive account of partial types is significantly different from the classical account given by domain theory. For example we cannot assert

∀x : T̄. x ∈ T ∨ x = ⊥

because constructively this implies an effective solution to the halting problem for T̄. A number of intriguing theorems emerge. Certain non-computability results can be established absolutely, that is independently of Church’s Thesis, see Constable & Smith [1988].18 Further, the logic of the host type theory is altered so that it is no longer compatible with classical logic: some instances of the law of the excluded middle, of the form ∀x.P(x) ∨ ¬P(x), can be disproved.

18 The paper misleadingly claims that among these is the Halting Theorem, which would be remarkable. What is in fact proved is the extensional halting theorem, which is already provable in domain theory, trivially from monotonicity. The real Halting Theorem is intensional, in that the halting function whose existence is to be disproved is allowed access to the internal structure of the term, by being given its Gödel number.

To recapture domain theory requires something more than T̄ and fix, namely a second order fixpoint operator, FIX, that solves recursive equations in partial types. As far as the present author is aware, no one has yet shown how to do this within the logic of type theory. This would unify the two theories of functional programming. Among other benefits it would allow us to give within type theory a constructive account of the denotational semantics of recursive programming languages.

Almost certainly relevant here is Paul Taylor’s Abstract Stone Duality [2002], a computational approach to topology. The simplest partial type is Sierpiński space, Σ, which has only one point other than ⊥. This plays a special role in Taylor’s theory: the open sets of a space X are the functions in X → Σ and can be written as λ-terms. ASD embraces both traditional spaces like the reals and Scott domains (topologically these are non-Hausdorff spaces).

Conclusion

Church’s Thesis played a founding role in computing theory by providing a single notion of effective computability. Without this foundation we might have been stuck with a plethora of notions of computability depending on computer architecture, programming language etc.: we might have Motorola-computable versus Intel-computable, Java-computable versus C-computable and so on.

The λ-calculus, which Church developed during the period of convergence from which the Thesis emerged, has influenced almost every aspect of the development of programming and programming languages. It is the basis of functional programming, which after a long infancy is entering adulthood as a practical alternative to traditional ad-hoc imperative programming languages. Many important ideas in mainstream programming languages (recursion, procedures as parameters, linked lists and trees, garbage collectors) came by cross-fertilization from functional programming. Moreover the main schools of both operational and denotational semantics are λ-calculus based and amount to using functional programming to explain other programming systems.

The original project from whose wreckage by paradox λ-calculus survived, to unify logic with an account of computable functions, appears to have been reborn in unexpected form, via the propositions-as-types paradigm. Further exciting developments undoubtedly lie ahead and ideas from Church’s λ-calculus will continue to be central to them.