Logic, Language, and the Structure of Scientific Theories
Proceedings of the Carnap-Reichenbach Centennial, University of Konstanz, 21-24 May 1991

Edited by Wesley Salmon and Gereon Wolters
University of Pittsburgh Press / Universitätsverlag Konstanz
Published in the U.S.A. by the University of Pittsburgh Press, Pittsburgh, Pa. 15260. Published in Germany by Universitätsverlag Konstanz GmbH.
Copyright © 1994, University of Pittsburgh Press. All rights reserved.
Printed in the United States of America. Printed on acid-free paper.

To Rudolf Carnap, Carl G. Hempel, and Hans Reichenbach, three great philosophers who together form the nucleus of twentieth-century scientific philosophy.
Descriptive and Inductive Simplicity
Ilkka Niiniluoto

… p > 0, is found within time t, then a1 is acceptable over a2 iff

u(T) > (1 − p)u(T) + p(u(T) + b) − l(t), or pb < l(t),

that is, the loss l(t) due to the extra time t is larger than the expected excess utility pb.
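The criterion can be rendered schematically as follows; this is a minimal sketch, and the probability, utility, and loss figures are invented for illustration:

    def accept_now(p, b, loss):
        # Accept a1 (stop and use the current theory) over a2 (search
        # for time t) iff the expected excess utility p*b of searching
        # is smaller than the loss l(t) of the delay.
        return p * b < loss

    print(accept_now(p=0.1, b=50.0, loss=8.0))  # True: 5.0 < 8.0, accept now
    print(accept_now(p=0.3, b=50.0, loss=8.0))  # False: 15.0 > 8.0, keep searching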
This analysis is applicable to cases where a theory is accepted for practical purposes, for example, for prediction or action. But the special case of IS,

(ES2) Of theories equivalent relative to the observed facts, accept for practical purposes the one that is the simplest,

is clearly problematic, as Shrader-Frechette (1990) has illustrated with examples of applied science in public policy. Preference for ontological simplicity, she points out, may lead to "dangerous consequences" (ibid., 12).
Thus, a rule such as ES2 is unsatisfactory, since it does not take into account the losses and risks of alternative choices (as Bayesian decision theory does), and it does not allow for the possibility of withholding the decision and searching for better solutions or more data. On the other hand, simplicity as manageability or applicability has a role in applied science. If a theory is defined by equations which cannot be solved for the relevant case, it is normal practice to introduce some "simplifications" or "approximations" into the theory, even when we know that this will lead us away from the truth. If we have to choose between a calculated prediction and no prediction at all, simplicity may be favored even at the expense of accuracy and truth (Niiniluoto 1984, 262, and Sintonen 1984).

Reichenbach's own interpretation of IS, unlike the economic formulations ES1 and ES2, applies to the context of cognitive justification:

(IS') Of theories equivalent relative to the observed facts, accept the simplest one as the best candidate for truth.

In Reichenbach's view, the simplest theory is the most probable candidate for truth, and it will also "furnish the best predictions" (1938, 276). He admits that in some cases the simplicity ordering indicates several theories with the same weight. But Rule IS' is pragmatically justified in the sense that its continued application, with corrections due to new evidence, "must lead to success, if success is possible at all" (Reichenbach 1938, 377; for criticism of this view, see Ackermann 1961). The rest of this essay is restricted to the evaluation of the Principle of Inductive Simplicity, IS', and its modifications.

Theory Choice: Information, Systematic Power, and Relative Simplicity

The general problem of theory choice is the task of identifying the solution to a cognitive problem which best satisfies the standards of a good scientific theory. Such quantitative criteria we call epistemic utilities. They may include truth, information, content, explanatory and predictive power, simplicity, beauty, and so on. (See Hempel 1965; Levi 1967a,b; Hintikka and Suppes 1970; Niiniluoto 1987, 1990. Note that the existence of several different epistemic utilities already suffices to disprove Schlesinger's 1974 claim that maximizing simplicity is "the only usable methodological principle"; he compares it only to other rules formulated in terms of admissibility and simplicity alone.)

Let e be the total available evidence. Let u(h, t) be the epistemic utility of accepting h as true on evidence e, if h is true, and u(h, f) the corresponding utility, if h is false. Let P(h/e) be the epistemic probability (degree of belief or Reichenbachian "weight") of h being true given evidence e. Then the expected utility of h given e is
U(h/e) = P(h/e)u(h, t) + (1 − P(h/e))u(h, f)   (5.1)
The Bayes Rule for optimization then tells us that the cognitively best hypothesis h is the one that maximizes the expected utility:

(EU) Of rival hypotheses, accept on evidence e as cognitively best the one h which maximizes U(h/e).

Well-known special cases of EU include the following (see, e.g., Levi 1967a,b and Niiniluoto 1987, 1990; note that Levi 1967a does not accept the measure cont(h) = 1 − P(h) for information). If our aim is truth, and nothing but the truth, then theory choice is determined by posterior probability:

If u(h, t) = 1 and u(h, f) = 0, then U(h/e) = P(h/e)   (5.2)

If our aim is information, measured by cont(h) = 1 − P(h), independently of truth-value, then cognitive preference covaries with boldness, improbability, or logical strength:

If u(h, t) = u(h, f) = cont(h), then U(h/e) = cont(h)   (5.3)

Both (5.2) and (5.3) lead to unsatisfactory special cases of EU. But if information content is used as a truth-dependent utility, then theory choice depends upon a combination of posterior probability and content:

If u(h, t) = cont(h) and u(h, f) = −cont(¬h), then U(h/e) = P(h/e) − P(h) = P(h/e) + cont(h) − 1   (5.4)
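A small numerical check that these utility assignments reduce (5.1) to (5.2)-(5.4); the probabilities P(h) = 0.3 and P(h/e) = 0.6 are invented:

    def U(post, u_t, u_f):
        # Expected utility (5.1).
        return post * u_t + (1 - post) * u_f

    prior, post = 0.3, 0.6
    cont = 1 - prior      # cont(h) = 1 - P(h)
    cont_neg = prior      # cont(not-h) = 1 - P(not-h) = P(h)

    print(U(post, 1, 0))              # (5.2): P(h/e) = 0.6
    print(U(post, cont, cont))        # (5.3): cont(h) = 0.7
    print(U(post, cont, -cont_neg))   # (5.4): P(h/e) - P(h) = 0.3 (up to rounding)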
Note that here U(h/e) equals what Carnap in his later work called "the increase of firmness" (1962, xvi) of h by e. More generally, the same result is obtained by measuring utility with a weighted average of the truth-value tv(h) (1 for true, 0 for false) and content, since then

U(h/e) = aP(h/e) + (1 − a)cont(h),   (5.5)

where the constant a (0 < a < 1) indicates the relative value of truth and information. Similar results are obtained if the information content cont(h) is replaced by the systematic power of h relative to e (for measures of systematic power, see Hempel 1965, Hintikka and Suppes 1970, Niiniluoto and Tuomela 1973, and Niiniluoto 1990). The crudest measure of this sort is the number of empirical statements (conjuncts of e) entailed by h, that is, the cardinality of the known empirical content of h:

syst(h, e) = |Cn(h) ∩ e|   (5.6)
This is also the number of empirical problems solved by h in Laudan's (1977) sense. A refinement of (5.6) was given by Hempel and Oppenheim in 1948 (reprinted in Hempel 1965).

How could we integrate the concept of simplicity into this framework of theory choice? One possibility is to treat the simplicity S(h) of a hypothesis h, or its complexity K(h), as an additional factor which is independent of other desiderata of theory formation. E. Kaila (1935) defined the relative simplicity of a theory h as the ratio between the multitude of empirical data derivable from h and the number of logically independent basic assumptions of h. (See Kaila 1935, 1939, and Lindfors 1978. Kaila 1929 distinguished between "descriptive" and "explanatory" simplicity, corresponding to Reichenbach's distinction between descriptive and inductive simplicity.) Thus, the relative simplicity RS(h, e) of h given e is defined by

RS(h, e) = syst(h, e)/K(h)   (5.7)
This measure, Kaila observed, is proportional to the "explanatory power" of a theory. If it could be exactly defined and measured, relative simplicity RS(h, e) would also define the "inductive probability" of h on e. An interesting feature of Kaila's concept of relative simplicity is that it immediately justifies DS1 and DS2: If theories h and h' are equivalent or empirically equivalent, then syst(h, e) = syst(h', e), and formula (5.7) recommends us to prefer h to h' iff K(h) < K(h') or S(h) > S(h').
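A toy rendering of (5.7), with invented counts standing in for syst(h, e) and K(h):

    def RS(syst, K):
        # Relative simplicity (5.7): systematic power per basic assumption.
        return syst / K

    # Two empirically equivalent theories: equal syst(h, e), different K(h).
    print(RS(syst=12, K=3))  # 4.0
    print(RS(syst=12, K=4))  # 3.0: prefer the first theory, as DS1/DS2 recommend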
Kaila's proposal is also related to Laudan's definition of scientific progress in terms of the difference (instead of the ratio) between the empirical problem-solving capacity of a theory (see syst(h, e)) and the number of "conceptual problems" generated by h (this can be viewed as a variant of K(h)). The concept of relative simplicity has, further, an interesting connection to measures of beauty in "algorithmic aesthetics": Stiny and Gips (1978) propose that the aesthetic value of a text x (sequence, figure, picture, composition, and so on) is defined by L(x)/K(x), where L(x) is the length of x and K(x) is the Kolmogorov complexity of x, a ratio directly analogous to (5.7). However, this is not a good definition of beauty, since the aesthetic value should start to decrease if the regularity of x, or 1/K(x), becomes too large. (The same observation holds of such information-theoretic measures of aesthetic value as redundancy R per entropy H, i.e., R/H = 1/H − 1/Hmax [Gunzenhauser 1975].) On the other hand, Kaila does not have arguments to show that RS(h, e) would behave exactly like probability. It is nevertheless plausible to assume that posterior probability P(h/e), or increase of probability P(h/e) − P(h), is proportional to syst(h, e): A hypothesis is confirmed by the empirical data it explains. (Most measures of syst(h, e) are positive iff e is positively relevant to h [see Niiniluoto and Tuomela 1973]. For Solomonoff's not very attractive attempt to define probability directly in terms of Kolmogorov complexity, see Fine 1973, 149-52.) If a connection of this kind is established, relative simplicity has a motivation similar to that of E. Sober's (1975) account, where theory choice is determined by two "irreducibly distinct goals" (p. 168), namely, support and simplicity. As Sober's explication of simplicity is in terms of informativeness, his formula of support and simplicity is related to the formula (5.5), that is, posterior probability and content. (Sober does not measure support by posterior probability, or simplicity by content, however. In my view, Sober's definition is not an explicate of simplicity, but rather of the degree of completeness of an answer to a question.)

If simplicity is taken to be a new independent criterion in theory choice, we may follow Levi's (1967a) suggestion and introduce it as an additional factor in the formula U(h/e) of expected utility; for example, (5.5) might be extended to
aP(h/e) + (1 − a)cont(h) + βS(h)   (5.8)

The simplicity factor might take into account here the economy of research (see the previous section). However, if S(h) is not truth-indicative in any way, then the application of EU with (5.8) to find the "best" theory need not pick out the strongest candidate for truth. (See also Sintonen 1984.) Alternatively, we may return to Reichenbach's strategy of descriptive simplicity and treat S(h) as a secondary criterion for breaking "ties":

If the maximum value U(h/e) is reached by several hypotheses h, choose among them the one which maximizes S(h),   (5.9)

where U(h/e) is independent of simplicity considerations. Rule (5.9) has the advantage over (5.8) that it does not presuppose the possibility of measuring simplicity with the same dimension as probability and content.

Probability and Inductive Simplicity

The second strategy for introducing simplicity within theory choice is to argue that this notion is already built into rational probabilities and, therefore, needs no separate treatment. According to Bayes's Theorem, that is,

P(h/e) = P(h)P(e/h)/P(e),   (5.10)

posterior probability depends upon evidence e only via the likelihood P(e/h). In addition to this empirical factor, which essentially tells how h has passed the test of experience, P(h/e) depends on the prior probability P(h). Therefore, it is natural to think, with Salmon (1966), that methodologically relevant plausibility considerations should be built into the prior P(h). Among such considerations we may include the simplicity of h. Assume that h and h' are powerful consistent theories which both entail all the evidence e. Then P(e/h) = 1 and P(e/h') = 1. Hence, by (5.10),

P(h/e) > P(h'/e) iff P(h) > P(h')   (5.11)

Similarly,

P(h/e) − P(h) > P(h'/e) − P(h') iff P(h) > P(h')   (5.12)
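A numerical check of (5.11); the figures for P(e) and the priors are invented, and both likelihoods are set to one as in the hypothetico-deductive case:

    def posterior(prior, likelihood, p_e):
        # Bayes's Theorem (5.10).
        return prior * likelihood / p_e

    p_e = 0.5                         # invented P(e)
    print(posterior(0.10, 1.0, p_e))  # 0.2
    print(posterior(0.05, 1.0, p_e))  # 0.1: the ordering of the priors survives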
Thus, if our ordering of theories is by posterior probability (see (5.2)) or by increase of probability (see (5.4) and (5.5)), then the conditions h ⊢ e and h' ⊢ e guarantee that our ranking of hypotheses h and h' depends only upon their prior probabilities. It is now tempting to assume, as Jeffreys and Wrinch first suggested in 1921, that greater simplicity is always associated with higher prior probability:

(SP) Among rival theories, h is simpler than h' iff P(h) > P(h').

(See Jeffreys 1973, Hesse 1974, and Quine 1966.) This assumption with (5.11) guarantees that, in the context of hypothetico-deductive theorizing, greater simplicity entails larger posterior probability:

Assuming SP, h ⊢ e, and h' ⊢ e, it holds that P(h/e) > P(h'/e) iff h is simpler than h'   (5.13)

This is one possible formulation of Reichenbach's thesis of Inductive Simplicity. In spite of contrary suggestions that simplicity should always be associated with prior improbability, the assumption SP seems to hold in some special cases. As a general principle, however, it expresses the old dictum: simplex sigillum veri.
This is a highly dubious metaphysical doctrine: Why on earth should nature be simple? Why should there be any a priori reasons to regard simpler theories as more probable than their rivals?

The problem of inductive simplicity appears in a new light if we replace the strong HD-condition h ⊢ e with 0 < P(e/h) < 1. When our theories make the evidence e only probable to some degree, results (5.11) and (5.12) are no longer valid. Instead of the questionable SP, we may then study whether a posteriori preference for simple theories could be a consequence of likelihoods. In other words, simple theories need not be assumed to be probable a priori, since they turn out to be probable a posteriori. I will investigate this idea in five different types of cases. (I use the standard notation and concepts of inductive logic developed by Carnap 1952 and Hintikka 1966. See also Niiniluoto 1977, 1987; and Niiniluoto and Tuomela 1973.)

(i) State descriptions. Let Z be the set of state descriptions of a monadic first-order language L_k^N with k primitive one-place predicates and N individual constants a_1, ..., a_N. If our cognitive problem is to find the (unknown) true element s* of Z, then each s in Z represents a complete answer. Hence, in Sober's (1975) sense, all state descriptions in Z are equally simple, since none of them requires any extra information to yield an answer. But already in 1958, Kiesow proposed that the simplicity of a state description s depends on the length of the shortest sentence logically equivalent to s, which is essentially the same idea as in the Kolmogorov definition of complexity. This definition implies that simple state descriptions distribute the N individuals nonuniformly (with frequencies k_1, ..., k_K) over the K (= 2^k) Q-predicates Q_1, ..., Q_K of L_k^N. Walk (1966) proposes that the complexity of a state description s is measured by log N_s, where N_s = N!/(k_1! ··· k_K!) is the number of state descriptions isomorphic to s. Here N_s is also the number of state descriptions belonging to the same structure description as s; the more uneven the distribution of k_1, ..., k_K, the smaller is N_s. Walk further suggests that prior probabilities should be defined for state descriptions in Z in the order of their simplicity. This is satisfied, in a simple way, by Carnap's one-time favorite measure c*, which distributes probability first evenly to structure descriptions and then, within each structure description, evenly to state descriptions. As Hesse (1974) observes, here simplicity goes together with clustering in the spirit of principle SP.
This assumption about prior probabilities justifies reasonable predictive inferences: Given a sample e of n individuals, it is most probable that the next individual is found in that cell which is most frequently occupied in e.

(ii) Structure descriptions. Let S(f_1, ..., f_K) be a structure description in language L_k^N which says that the individuals are distributed into the K Q-predicates with relative frequencies f_1, ..., f_K. Walk (1966) suggests that the complexity of S(f_1, ..., f_K) is defined by the entropy H of the distribution f_1, ..., f_K, that is,

H(f_1, ..., f_K) = −∑ f_i log f_i   (5.14)

Then, choosing λ = K in Carnap's λ-continuum, so that P = c*, all structure descriptions have the same prior probability. By choosing λ < K, simpler structure descriptions (with smaller H and thus with more uniform populations) will have larger prior probabilities, in agreement with SP. The strongest prior belief in simplicity is obtained with λ = 0, which corresponds to Reichenbach's Straight Rule. But when λ > K, simple structure descriptions will have lower priors than more complex ones. In the limit λ → ∞, that is, for c†, all state descriptions become equally probable, and prior probability is concentrated on the structure description with maximum entropy Hmax (and, hence, minimum redundancy R = 1 − H/Hmax). This measure c†, which ignores all simplicity considerations, is unsatisfactory since it does not allow for learning from evidence. When K < λ < ∞, the Carnapian prior probabilities are ordered by their degree of complexity, not by simplicity as SP would require. Nevertheless, as the λ-continuum satisfies the so-called Reichenbach's axiom, that is, the probability P(Q_i(a_{n+1})/e) approaches the observed relative frequency n_i/n of cell Q_i when n → ∞, we know that eventually evidence will confirm structure descriptions which are among the simplest compatible with e.
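For a finite λ the predictive probability of the λ-continuum has the standard representation P(Q_i(a_{n+1})/e) = (n_i + λ/K)/(n + λ); the sketch below, with invented sample frequencies, shows the Straight Rule as the case λ = 0 and the convergence to n_i/n required by Reichenbach's axiom:

    def predictive(n_i, n, K, lam):
        # Predictive probability of cell Q_i in Carnap's lambda-continuum.
        return (n_i + lam / K) / (n + lam)

    print(predictive(8, 10, 4, lam=0))    # 0.8: Straight Rule (lambda = 0)
    print(predictive(8, 10, 4, lam=4))    # ~0.643: c* (lambda = K)
    print(predictive(800, 1000, 4, 4))    # ~0.798: approaches n_i/n as n grows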
However, we face one serious limitation with this result if we are dealing with an indefinitely large universe (N → ∞). Suppose, for example, that all observed individuals belong to cells Q_1 and Q_2 with frequencies f and 1 − f. Then the simplest structure description compatible with this evidence e is S(f, 1 − f, 0, ..., 0). But in Carnap's λ-continuum, for an infinite domain, this structure description has prior probability zero and, hence, posterior probability zero on any evidence. This means that Carnap's λ-continuum (and Reichenbach's rule of enumerative induction as its special case) incorporates a Principle of Inductive Simplicity only in a weak sense: Some inferences to the simplest unrefuted hypothesis are ruled out a priori.

(iii) Constituents. In Hintikka's (1966) system of inductive logic for language L_k^N (N → ∞), universal generalizations may receive nonzero probabilities. A constituent C_i of L_k^N is a sentence which says that certain cells CT_i are exemplified and the others are empty:
C_i = ∧_{Q_j ∈ CT_i} (∃x)Q_j(x) & (x)[∨_{Q_j ∈ CT_i} Q_j(x)]   (5.15)

The likelihood P(e/C_i), when |CT_i| = w, is defined by

P(e/C_i) = [Γ(λ)/Γ(n + λ)] ∏_{Q_j ∈ CT_i} [Γ(n_j + λ/w)/Γ(λ/w)],   (5.16)

where n_1, ..., n_K are the observed frequencies of the Q-predicates Q_1, ..., Q_K in sample e and Γ is the gamma function (note that Γ(n + 1) = n!).
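Formula (5.16) can be transcribed directly with log-gamma functions; in this sketch the sample frequencies are invented, and freqs lists the observed frequencies of the w cells claimed by the constituent:

    import math

    def likelihood(freqs, lam, w):
        # P(e/C_i) from (5.16), computed in log space for stability.
        n = sum(freqs)
        log_p = math.lgamma(lam) - math.lgamma(n + lam)
        for n_j in freqs:
            log_p += math.lgamma(n_j + lam / w) - math.lgamma(lam / w)
        return math.exp(log_p)

    # Evidence occupying two cells favors the two-cell constituent
    # over one that claims four cells to be exemplified:
    print(likelihood([6, 4], lam=2, w=2))        # ~4.3e-4, larger
    print(likelihood([6, 4, 0, 0], lam=2, w=4))  # ~2.7e-5, smaller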
In Carnap's λ-continuum, the whole prior probability one is given to the "atomistic" constituent C_K which says that all K Q-predicates are exemplified. This constituent C_K is obviously the most complex in the ontological sense. In Hintikka's α-λ-continuum, all constituents are equally probable if α = 0. When 0 < α < ∞, we have
P(C_j) < P(C_i) iff |CT_j| < |CT_i|   (5.17)
In other words, the prior probabilities of constituents are ordered by their ontological complexity: A simpler constituent, which postulates fewer kinds of individuals in the universe, is less probable. This is intuitively acceptable, since a simpler constituent in this sense entails stronger universal laws. When the size n of evidence e grows larger than the parameter α, but the number c of different kinds of individuals observed in e remains smaller than K, the ordering (5.17) will be changed. When n is large enough, the most probable constituent a posteriori will be C_c, which claims that the universe is similar to the sample. In the limit n → ∞, the probability P(C_c/e) approaches one.⁵ Hence,
them. This rule is "not to be regarded as a matter of convenience" (Reichenbach 1938, 375), but it "depends on our inductive assumption: we believe that the simplest curve gives the best predictions" (ibid.). Most philosophers agree that the simplicity of a curve depends on the degree of its equation: A linear function y = ax + b has degree 2, since it is determined by two points, a quadratic function y = ax 2 + bx + c has degree 3, and so on. Some explain their preference for simple curves by prior probability (Jeffreys), some by falsifiability (Popper) or testability (Friedman). Reichenbach said that the right curve is reached in the limit when we apply rule CF and make corrections with new observations if the function f exists. Kaila (1939) pointed out that a simple curve of degree m through tIll' points of /