English Pages 176 Year 1973
JANUA LINGUARUM
STUDIA MEMORIAE NICOLAI VAN WIJK DEDICATA
Series Maior, 69
LEXICOSTATISTICS IN GENETIC LINGUISTICS Proceedings of the Yale Conference Yale University, April 3-4, 1971
Edited by ISIDORE DYEN
1973
MOUTON THE HAGUE • PARIS
© Copyright 1973 in The Netherlands. Mouton & Co. N.V., Publishers, The Hague. No part of this book may be translated or reproduced in any form, by print, photoprint, microfilm, or any other means, without written permission from the publishers.
LIBRARY OF CONGRESS CATALOG CARD NUMBER: 72-94460
Printed in Belgium by NICI, Ghent.
FOREWORD
It was a great pleasure to me to have had the honor of opening the sessions called together as the First Conference on Genetic Lexicostatistics. The Conference took place in the Hall of Graduate Studies, Yale University, on April 3-4, 1971 as a result of consultations conducted mainly by mail among the three who finally issued the call as the Organizing Committee: Dr. Joseph Kruskal, Professor David Sankoff, and myself. The sessions were well-attended, the discussions were lively and productive, and above all the participants received the encouragement and intellectual stimulation that is necessary to growth and development of this wide-ranging new approach in linguistic research. It was fitting that the First Conference was at Yale, for it was here that Morris Swadesh, the founder of genetic lexicostatistics, took his degree under the great Edward Sapir. It was an attempt to subgroup the Salishan languages that led Swadesh to a study of cognates in the basic vocabulary and this to a statistical approach to subgrouping. From this followed almost logically the great step of drawing glottochronological implications from the lexicostatistical percentages. It was the discovery that Indoeuropean languages with their relatively ancient records offered a means of calibrating the rate of replacement, viewed as a constant, that added impetus to the tendency to view languages as changing in some degree according to laws analogous to those observable in natural non-social phenomena. It is not so important in the history of lexicostatistics or linguistic science that the nature of the law involved need not be as Swadesh saw it. What is important is that he attempted to formulate a general law about the rate of replacement. His formulation was accurate enough as a first approximation to be applicable for some purposes even now and in any case made it clear that a linguistic analog of radio-carbon dating was not only conceivable, but practicable. 
It was my misfortune to have arrived at Yale University after Swadesh had left. I nevertheless had the good fortune of meeting with Morris Swadesh a number of times in the forties and fifties and of exchanging ideas with him. I should say that like many classical comparatists today, I resisted very strongly the notion that basic vocabulary could be used as it is in lexicostatistics. The use of statistics is not part of the training of a linguist, let alone a traditional comparatist. It is all the more surprising that by a series of steps that now appear reasonable Swadesh was led into an interesting use of statistics in the solution of a linguistic problem. It is to be expected that other interesting uses of statistics will be developed.
Swadesh was an anthropological linguist who developed an application of statistics in linguistics. He developed this in consultation with a statistician. It is thus only natural that we had among our participants not only linguists and anthropologists, but also statisticians. This mixture of very varied specialties lies at the basis of the high scientific interest in these sessions.
It is unfortunate that the discussions of the papers at the conference could not be recorded. Contributions sometimes appear in comments that outrank those in papers. We can only hope that such comments will find their way into future papers.
Isidore Dyen
For the Organizing Committee
PARTICIPANTS IN THE FIRST CONFERENCE ON GENETIC LEXICOSTATISTICS
  Patricia Afable, Yale University
+ Paul Black, Yale University
  Harold Conklin, Yale University
  Frederick Damerau, IBM at White Plains
* Annette Dobson, James Cook University of North Queensland
* Isidore Dyen, Yale University
* William Welcome Elmendorf, University of Wisconsin
* Harold C. Fleming, Boston University
+ Dell Hathaway Hymes, University of Pennsylvania
* Martin Joos, University of Toronto
* Joseph B. Kruskal, Bell Telephone Laboratories, Murray Hill, New Jersey
  Floyd Lounsbury, Yale University
  Curtis McFarland, Yale University
  Paul Newman, Yale University
  Roxana Newman, New Haven
  Dove Pierce, New Haven
* David Sankoff, University of Montreal
  Gillian Sankoff, University of Montreal
  Leonard James Savage, Yale University (now deceased)
* Louisa R. Stark, University of Wisconsin
  Nathan Smith, Yale University
  Shigeru Tsuchida, Yale University
  Andrew Weiss, Yale University
* Rulon Wells, Yale University
+ Henri Gontran Wittmann, The ITM Corporation
  Hanni Woodbury, Yale University

* Presented a paper at the Conference. All but one are published herewith.
+ Offered a paper, but could not attend.
CONTENTS

Foreword

ISIDORE DYEN
The Validity of the Mathematical Model of Glottochronology

JOSEPH B. KRUSKAL, ISIDORE DYEN, and PAUL BLACK
Some Results from the Vocabulary Method of Reconstructing Language Trees
    0. Introduction
    1. The data
    2. Statistical Assumptions
    3. Maximum Likelihood Method
    4. Some Results
    5. Validity of the Key Approximation

ANNETTE J. DOBSON
Estimating Time Separation for Languages

DAVID SANKOFF
Parallels between Genetics and Lexicostatistics

ISIDORE DYEN
The Impact of Lexicostatistics on Comparative Linguistics

HAROLD C. FLEMING
Sub-classification in Hamito-Semitic

HENRI GONTRAN WITTMANN
The Lexicostatistical Classification of the French-Based Creole Languages

LOUISA R. STARK
Glottochronology and the Prehistory of Western South America

WILLIAM W. ELMENDORF
Lexical Determination of Subgrouping

RULON WELLS
Lexicostatistics in the Regency Period

DELL HYMES
Lexicostatistics and Glottochronology in the Nineteenth Century (with Notes toward a General History)
THE VALIDITY OF THE MATHEMATICAL MODEL OF GLOTTOCHRONOLOGY
ISIDORE DYEN
When first proposed by Swadesh, glottochronology was regarded as a fresh contribution to the expanding role of linguistics, but because it failed to meet certain tests that were applied, not all of them fair, glottochronology in the manner proposed by Swadesh has come to be suspect.¹ Chrétien's "Mathematical Models of Glottochronology" (1962) can reasonably be regarded as the article which has convinced many linguists that the mathematical model of Swadesh's glottochronology led to error. Their conviction is undoubtedly based on an acceptance of his claim that the probabilities involved did not behave according to the model. The conclusion that the model had to be abandoned seemed inescapable. As Grace (1964: 365) puts it: "In fact, as Chrétien (1962a) has shown, the mathematical models of glottochronology involve assumptions which are now known to be false." Diebold (1964: 989) refers to "mechanical misfunctions pointed out by Chrétien". Fodor (1965: 9) believes that "Chrétien ... disproved the correctness of the mathematical method of lexicostatistical glottochronology". Chrétien's view has furthermore gained in circulation because it has been reprinted commercially.² It is therefore worthwhile pointing out that Tables IV and V in his article, on which Chrétien bases his claim, evince an unexpressed and undoubtedly unintended bias that invalidates Chrétien's conclusions. It is unfortunate that this bias has escaped notice. These two tables and Tables I, II, and III on which they are based are reproduced in the Appendix.

¹ This article represents partial results of work supported by NSF Grant GS-2398. It has profited greatly from discussions with Alan T. James, formerly at Yale University and now at the University of Adelaide, and with Leonard J. Savage of Yale University; any errors, however, are the author's responsibility. As an aftermath of the Yale Conference, four participants concerned with the mathematical theory of statistics (A.J. Dobson, J.B. Kruskal, D. Sankoff, and L.J. Savage, now deceased) have since published their own critique of Chrétien's article under the title "The Mathematics of Glottochronology Revisited", Anthropological Linguistics 14.205-212 (1972); their authoritative treatment, which makes reference to this article, presents essentially the same views as those here.
² Bobbs-Merrill Reprint Series in Language and Linguistics L-16 (June 1964).

In his presentation Chrétien exhibits his unfamiliarity with the fundamental notions of statistical variation and statistical error. Thus for example he makes the rigid assumption that exactly 14 words are replaced in each language per millennium (15), whereas it is implicit in glottochronological theory that 14 is the center of a distribution of the number of words replaced at the end of one millennium in a 100-word Swadesh list. The difficulty lies in the notion of a 'constant'. Even if one believes that there is an exact number of replacements in the basic vocabulary that a language will show at the end of a given time period, provided it has had a normal history in that period, it would not follow that the current 'constant' was that exact number; it would merely represent the best approximation. This is essentially due to the fact that one cannot guarantee that all of the languages used in calibrating the constant have had a normal history for the whole of the period studied. The hope would rather be that the individual variations would play an increasingly minor role as the number of languages used to establish the 'constant' was increased. There are other examples illustrating this weakness in Chrétien's background. However, such a weakness is perhaps to be forgiven in one whose field is not statistics.

The major defect in Chrétien's view of glottochronology is his claim that the usual procedure of converting a lexicostatistical percentage into an estimate of time elapsed usually selects a time interval which has a substantially lower probability of being associated with that percentage than another time interval. This is surprising since the usual procedure simply involves calculating the expected loss of cognate pairs in two related languages from the observed replacements of basic vocabulary in the histories of languages. This calculation is accomplished by the use of the 'product law' which says that the probability of combined independent events is equal to the product of the probabilities of the separate events.
It is essentially Chrétien's claim that if one follows this calculation out, the resultant calculations do NOT permit one to select the combined independent event of the highest probability in the usual way that has been prescribed. One should be suspicious of something unreasonable here, for it implies that the event which is a combination of two events of highest probability is somehow to be less likely than a combination of events which are respectively nearly similar, but of lower probability, where all other things are equal. I concentrate on this claim of Chrétien's, because it concerns a point that is fundamental to the view that glottochronology is a reasonable inference from simple cognate percentages in basic vocabulary lists, provided, of course, that one is prepared to take the risks involved in the error still present in the whole procedure.³ It is not unreasonable, however, to hope that all error can eventually be reduced to reasonable proportions as the study advances.

³ Both generally, and because it does not take into account the different retention rates involved in each meaning. In the procedure proposed by Swadesh the retention rates of the different meanings are treated as equal. It takes very little experience with basic vocabulary lists to realize that the words for 'two' are nearly always cognate no matter which pair of related languages one examines, whereas the words for 'play' are much less commonly cognate. There is reason to believe that these differences in the retention rates for the different meanings are generally not so great as to affect seriously the use of cognate percentages for subgrouping, nor even glottochronological percentages in the middle ranges (i.e. roughly those in the middle half of the range). Dyen, James, and Cole (1967) have proposed a way of taking into account the different retention rates of each meaning. This refinement corrects some of the error in the glottochronological inferences drawn by the earlier procedure.

It is perhaps worthwhile stressing at this point that a great deal is at stake in glottochronology for the history of humanity simply because for inferred languages of the past glottochronology is probably the only dating mechanism that will ever become available. Furthermore, the reasonableness of an association of a prehistoric language with a prehistoric culture can be expected to depend in some cases on an agreement of their respective dates independently determined.

The glottochronological procedure as commonly practiced is based on standard lists of basic vocabulary. Chrétien employs a hypothetical list modeled on the Swadesh list with 100 items. Swadesh (1955: 127) found that about 14 words on the average are replaced in 1000 years in the languages whose 100-word lists have been compared at different stages. The assumption is that this average replacement represents a constant rate of replacement for all languages. If so, it follows that the percentage of cognate pairs in a pair of lists of related languages is a function of the amount of time that they have been diverging since the lists were identical. There are thus two sets of percentages in glottochronology. A c percentage is the percentage of cognate pairs out of 100 in the basic vocabulary lists of two languages. A k percentage is the percentage of cognate pairs out of 100 in the basic vocabulary lists for different stages of the same language. The method is calibrated on a collection of k percentages for different languages which offer basic vocabulary lists at different stages. It is assumed that a given c percentage implies that each of the two languages would show approximately the same k percentage with their common starting point and that, on the assumption of independent development, c = k² or √c = k, approximately.
Chrétien claims to prove that this assumption is untenable on mathematical grounds. Chrétien uses capital C for the number of cognate pairs in two lists and capital K for the number of words retained in the later of two stages of the same language, and small c for the percentage of cognate pairs in two lists and small k for the percentage of words retained in the later of two stages of the same language. This difference between the number of words and the percentage of words is only important if the cognate pairs in a pair of lists are correlated to the retentions in a single language and thus to time elapsed. For unless these be considered as percentages, the relation c = k² will not hold. However, Chrétien arranges his tables in terms of the numbers C and K and the difference involved is only a percent sign (i.e. a relation of C or K as numerator to 100 as denominator). For this reason, wherever it is possible to do so, I shall follow Chrétien in talking about C and K.
On the assumption of a constant rate of replacement we can calculate approximately the amount of time implied by the K implied by a C. If the constant rate of replacement is 14% per thousand years for the 100-word list, the time value of c = 81% is approximately the same as that of k = 90%, the square root of 81%: i.e., about 700 years. In the tables Chrétien gives t (elapsed time) as the decimal fraction of 1000. Thus .70 means 700 years.
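The conversion just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: retention is taken to decay exponentially at 86% per millennium (the complement of the 14% replacement figure in the text), and the function names are hypothetical, not taken from Swadesh or Chrétien.

```python
import math

# Assumed retention per millennium: 100% minus the 14% replacement rate in the text.
RETENTION_PER_MILLENNIUM = 0.86


def k_from_c(c):
    """Common retention k implied by a cognate proportion c, using c = k**2."""
    return math.sqrt(c)


def millennia_from_k(k, r=RETENTION_PER_MILLENNIUM):
    """Elapsed time t (in millennia) solving k = r**t."""
    return math.log(k) / math.log(r)


def millennia_from_c(c, r=RETENTION_PER_MILLENNIUM):
    """Elapsed time t (in millennia) implied by a cognate proportion c."""
    return millennia_from_k(k_from_c(c), r)


t_for_c81 = millennia_from_c(0.81)   # about 0.70, i.e. roughly 700 years
t_for_k87 = millennia_from_k(0.87)   # about 0.92, i.e. roughly 920 years
```

With these assumptions c = 81% gives k = 90% and t of about .70, and k = 87% gives t of about .92, which are the two time values discussed in connection with Chrétien's Table V.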
In speaking of the formula by which a given c percentage is converted into units of time Chrétien speaks of the c-function. Similarly there is a k-function. His objection is expressed as follows:

... what the c-function really does is to indicate for a given t the value of C in the range for that t which has the highest probability. BUT THIS DOES NOT WORK IN REVERSE: it does not indicate for a given C the t ... which has the highest probability, as Tables IV and V clearly show. Yet in actual practice it is C which is observed and t which is required, not the reverse. If we convert an observed C ..., we get a t which is useless to us. This t is merely one of the t's ... for the given C, and not the most probable one either; very likely it is one of the less probable t's. The c-function clearly does not give us what it claims to give. (31)

This is the heart of Chrétien's indictment. There is however no further need to speak of the c-function and k-function since these are strictly dependent on the values of C and K respectively.
If we turn to Chrétien's Table V and find the row opposite C = 81, we find the italicized percentage 0072. The italicization indicates that if C = 81, the standard calculation selects the K of that column as the common K of the two languages. The K is 90 in this case and its t is 700 years. The K would of course not ordinarily be quite the same for the two languages, but to follow Chrétien's discussion we must do as he does and take this approximation literally as though it were exact. The percentage 0072 represents Chrétien's calculation of the probability that if C = 81, then K = 90. But he calculates a probability of 3609, or 50 times greater, that if C = 81 the common K = 87, in which case t is 920 years. Chrétien's assertion is simply that the scheme used in converting a c percentage into an estimated k percentage and thus a time interval of divergence yields highly erroneous results.
We will ultimately see that in effect he is suggesting that the old way of keeping the books is wrong regardless of the fact that they balance. Under the improved scheme which he recommends, he can show that the books do not balance.
Before entering into a discussion of Chrétien's tables, let us consider the following situation. Suppose someone has equal access to two unmarked opaque cups, one containing one die and the other six dice. Suppose he picks one of the two cups and he casts a sum of six spots. The dice and everybody involved are honest. Suppose we have not seen whether our man had cast one die or six dice. The fact that a point of six had been cast would be good evidence that one die and not six dice had been cast. With one die there is one chance in 6 of throwing a 6, whereas with six dice the chances of throwing a six are one in 6⁶. The likelihood ratio in this case is:

\[
L = \frac{P(D \mid H_1)}{P(D \mid H_2)} = \frac{1/6}{1/6^{6}}
\]

where L = likelihood, P = probability, D = the casting of a point of six, H1 = the selection of the cup with one die, and H2 = the choice of the other cup. The formula P(D|H1), for example, is read as follows: 'the probability of casting a point of six given the selection of the cup with one die'.
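The two probabilities entering this ratio can be checked by brute-force enumeration. The following Python sketch is illustrative only; the helper name `prob_of_sum` is hypothetical, and fair six-sided dice are assumed, as in the text.

```python
from fractions import Fraction
from itertools import product


def prob_of_sum(num_dice, target):
    """Exact P(sum of spots == target) for num_dice fair dice, by enumeration."""
    hits = sum(
        1
        for roll in product(range(1, 7), repeat=num_dice)
        if sum(roll) == target
    )
    return Fraction(hits, 6 ** num_dice)


p_one_die = prob_of_sum(1, 6)    # P(D | H1): one way out of 6
p_six_dice = prob_of_sum(6, 6)   # P(D | H2): one way (all ones) out of 6**6
likelihood_ratio = p_one_die / p_six_dice
```

Under these assumptions the ratio works out to 6⁵ = 7,776 in favor of the single die.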
The final odds in favor of the single die are:

\[
\frac{P(H_1 \mid D)}{P(H_2 \mid D)}
= \frac{P(D \mid H_1)}{P(D \mid H_2)} \cdot \frac{P(H_1)}{P(H_2)}
= \frac{1/6}{1/6^{6}} \cdot \frac{P(H_1)}{P(H_2)}
\]

if the initial odds are P(H1)/P(H2). If the chances that the man would pick up the cup with one die are the same as the chances that he would pick up the cup with the six dice, these chances constituting the initial odds, then we should have to conclude that the odds in favor of one die were 6⁵ = 7,776. In any event, the odds in favor of one die rather than six have been increased by the large factor 7,776. Even statisticians who question the meaningfulness of odds in the scientific counterpart of such problems would agree that the data here speak strongly for one die as opposed to six.
Now let us see how this applies to Chrétien's Tables. Table I is essentially the source of Tables II and III. Tables II and III are the source of his Tables IV and V. If we look in Table III at the row opposite C = 81 we find the following under the descending numbers from 90 to 81 respectively for K: 4078 0823 0048 0001 and a string of zeros. We are told (26) that a zero is to be interpreted as a number less than .00005. According to Chrétien this row of figures represents the probabilities that a pair of related languages will score 81% cognation if they have been diverging for a time interval of between 700 and 1400 years, assuming the constant retention rate of 86% per millennium to be rigidly correct. He stipulates that the time lapse is in strict correlation with the calibration and therefore the heading of the table refers to values of K, the number of words remaining in a single list at given intervals as it passes through time.
Suppose we decide to use these probabilities as the basis for determining which K and its time lapse has the best chances of being associated with a given cognation percentage C. We could compare each of the pairs of probabilities as we did in the likelihood ratio for the two sets of dice.
Thus we would conclude that if C = 81, and if the chances that K would assume each of the possible values were not very different, the evidence in Chrétien's tables was about five times stronger that K = 90 than that K = 89 and so on. Now this conclusion, somewhat exaggerated by Chrétien's over-rigid assumptions, forms a reasonable basis for what glottochronologists have regularly done.
Another way of putting it is to consider the Maximum Likelihood Estimate. The Maximum Likelihood Estimate of the parameter K is the value of K which maximizes the probability of C given K. This too would lead us to the highest probability of a C given K as the strongest evidence for a K given C.
We need, however, remember one thing in all of this. The relatively high concentration of probability in one K such as 90 compared to a value close to it like 89 is due to the fact that we are accepting Chrétien's calculations as a starting point. His calculations are somewhat too rigid and tend to exaggerate differences which actually form part of a smoother, somewhat less determinate distribution. This exaggeration is due to the highly artificial assumption made by Chrétien (not by glottochronologists) that exactly 14 words are replaced in each language per millennium. He makes this assumption in order to test the validity of inferring a k percentage from a c percentage and thus the time t implied by the k percentage (17).
Chrétien does not reach the conclusion we have indicated above. Instead he constructs a table of probabilities in Tables IV and V which are intended to present the probability of K given C; that is, that given a value of C, the K of the lists will have a particular value. An examination of the probabilities in Chrétien's Table V would show that if C = 81, the most probable values of K would be 88 or 87 (and thus not 90). Now this is a small difference since the difference of time involved is 150 to 200 years. However the difference here is brought about in exactly the same way as much greater differences elsewhere.
Why did Chrétien construct Tables IV and V? He says: "As I have said before, we cannot compare the probabilities of Tables II and III horizontally, but only vertically" (23 f.). He has earlier said concerning Table II: "We cannot compare probabilities horizontally, however; we cannot say that for C = 7 a t = 9 is more probable than a t = 8, because we would then be comparing probabilities which do not belong to the same fundamental probability set" (19). We are now at the crux of the matter.
Instead of comparing the probabilities, Chrétien proposes to follow a different plan. In order to understand his plan one must first understand his term "a joint set". Chrétien desires to test the constant rate postulate against its implications. He assumes for this purpose that the constant rate of 86% retention per thousand years will operate without fail. It follows that two related languages A and B will each have replaced exactly 14 words at the end of 1000 years.
The number of cognate pairs after 1000 years will therefore vary from a maximum of 86, if all the same words are replaced, to a minimum of 72 if none of the same words are replaced. A joint set is defined as one possible way in which two lists with the same number of replacements can be matched up or combined (17). Therefore, for example, in the hypothetical case of two languages diverging for exactly 1000 years he finds the number of joint sets if there are 86 cognates, if there are 85 cognates, and so on, and then develops a table which associates 'probabilities' with each K for each C. To do this he considers the number of joint sets associated with 86 cognate pairs, etc., as numerators over the total number of joint sets possible when each language has retained just 86 words. This is then repeated for other K's. As Chrétien himself points out, this is analogous to calculating the probabilities in casting two distinguishable dice. He says, Suppose that we require to know the probability of throwing a seven ... The probability of throwing a seven is the ratio of the number of possible throws of seven to the total of possible throws, or 6/36 = 1/6. What we did was to count the number of possible outcomes of a desired sort and then divide it by the total number of possible outcomes. This method I shall apply to the problem at hand (17).
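The counting just described can be sketched in Python. This is an illustration under an assumption not spelled out in the text: a joint set is taken to be an ordered pair of retained-word subsets of a 100-item list, each of size K, sharing exactly C items. The function names are hypothetical.

```python
from math import comb

LIST_SIZE = 100  # the 100-item list used in the text


def joint_sets(k, c, n=LIST_SIZE):
    """Assumed count of joint sets: pairs of k-word retained sets sharing exactly c words."""
    if c > k or k - c > n - k:
        return 0  # such an overlap is impossible
    # Choose list A's retained words, then which c of them B also keeps,
    # then B's remaining k - c retained words from those A replaced.
    return comb(n, k) * comb(k, c) * comb(n - k, k - c)


def prob_c_given_k(k, c, n=LIST_SIZE):
    """P(C = c | K = k): joint sets for this c over all joint sets for this k."""
    return joint_sets(k, c, n) / comb(n, k) ** 2


possible_c_for_k86 = [c for c in range(0, 101) if joint_sets(86, c) > 0]
p_81_given_90 = prob_c_given_k(90, 81)
```

On this assumption the possible values of C for K = 86 run from 72 to 86, as stated above, and P(C = 81 | K = 90) comes out near 0.408, close to the figure 4078 quoted from Table III.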
The procedure Chrétien follows does essentially yield the probabilities of C given K. Nevertheless, we must observe that as K decreases, the number of joint sets increases. The RATE OF INCREASE is at first enormous, but decreases until K is 50, that is, until half of the list is gone. As K decreases from 50, the number of joint sets decreases at a rapidly increasing rate until 0 is reached. I have indicated in Table A some of the data on the relative numbers of joint sets for different K's in Chretien's tables (The way in which these relative numbers are extracted from the tables is also exemplified in Table A). This variation in the number of joint sets for different K's is a crucial Table A Relative numbers of joint sets in columns of K = 99, K = 98, K — 97, etc. x 9 8 — 2500
Xgg — 50 Xgg =
X97 = 1020*98
Xg2
x95 =
X73 =
371 x 9 g
12*83 7 X74
X47 —
2X55
*47 ~
^ X40
X40 = 102X 3 5
Here x_i = the sum of joint sets in a column for which K = i (i.e. x_98 is the sum for the column of K = 98). The manner in which the relative number of joint sets is determined for two K columns is based on the following additional definitions:
y_j = the sum of joint sets in a row for which C = j.
x_{i,j} = the joint sets in x_i that are the same as those in y_j.
y_{j,i} = the joint sets in y_j that are the same as those in x_i.
x_{i,j} = y_{j,i}.
Then for the columns of K = 98, K = 97, K = 96:

\[
\begin{aligned}
x_{98,96} &= y_{96,98} \\
x_{98,96} &= .9603\, x_{98} \quad \text{[Table III]} \\
y_{96,98} &= .3158\, y_{96} \quad \text{[Table V]} \\
.9603\, x_{98} &= .3158\, y_{96} \\
x_{98} &= .33\, y_{96} \\
3\, x_{98} &= y_{96}
\end{aligned}
\]

And:

\[
\begin{aligned}
x_{97,96} &= y_{96,97} \\
x_{97,96} &= .0018\, x_{97} \quad \text{[Table III]} \\
y_{96,97} &= .6317\, y_{96} \quad \text{[Table V]} \\
.0018\, x_{97} &= .6317\, y_{96} \\
x_{97} &= 340\, y_{96} \\
x_{97} &= 340 \cdot 3\, x_{98} = 1020\, x_{98}
\end{aligned}
\]
matter. It does not affect the calculations of C given K, the probabilities in the columns, but does seriously affect the probabilities of K given C in Chrétien's plan.
To go on with Chrétien's plan. He says:

What we want now are fundamental probability sets which give probabilities ACROSS the ranges of C [i.e. horizontally - I.D.]. These probabilities can be found; the method may most conveniently be explained by an example. If we look in Table II at the ranges for which C = 74 is possible, we find that the first on the left is that for which t = .92 and K = 87. We count the number of joint sets of this range for which C = 74. We then take the next range to the right and again count the number of joint sets for which C = 74, and so on through the rest of the 14 ranges. We then sum the quantities thus obtained and get the total number of joint sets for which C = 74. Turning back to the first column that we used (where t = .92, K = 87), we divide the quantity we determined there by the total just found and get the probability that C = 74 reflects t = .92, K = 87. Continuing this through the remaining 13 ranges, we get a new fundamental probability set which gives us the probabilities of the various possible values of K and t ... for which C may be 74 (26).

Thus we see that Chrétien plans (1) to consider the number of COMBINATIONS in all K's which produce a particular C, (2) to sum them, and (3) to use the sum as the denominator of the probability fraction under the joint sets of each K for that C as numerator. This is tantamount to assuming that each joint set is equiprobable with any other. But if the number of joint sets is increasing as K decreases, as it does until half of the list is gone, his plan is not reasonable.
Suppose we follow Chrétien's lead in the dice example in attempting to infer which set of dice exhibited the six.
We would consider the number of COMBINATIONS which produce six in each case, that is, 1 in the case of one die and 1 in the case of 6 dice. There are thus a total of two combinations which could produce a score of 6. We must then reach the conclusion according to Chrétien's plan that the probability that a six had been cast with one die in the instance described before is exactly one-half, being exactly equal to the probability that it had been cast with six dice. It is hard to believe that anyone will find this statement reasonable. In fact, one might reasonably consider this statement as implying that one should consider events as equiprobable regardless of whether they normally occur only once a day, only once a week, only once a month, or only once a year.
It is sometimes easier to follow a bit of reasoning in an analog than in the actual case. It is particularly advantageous here to work with an analog that uses relatively small numbers because the vast numbers involved in Chrétien's joint sets tend to be obfuscating. The following shows the results of applying Chrétien's reasoning to the case of numbers of dice (D) and sums of spots (S) on the dice. It turns out that D here is analogous to Chrétien's K, and that S here is analogous to Chrétien's C.
If one plays with dice, it may be advantageous to know the chances of casting a particular sum of spots with a particular number of dice. To calculate these chances one would wish to know the number of ways in which one could obtain a particular sum of spots considering each die to be distinguishable from the others. Thus there
are six ways of obtaining a sum of five spots with three dice, the dice regarded as distinguishable (3:1:1, 1:3:1, 1:1:3, 2:2:1, 2:1:2, 1:2:2). Each such way of obtaining a sum of spots is analogous to one of Chrétien's joint sets, defined as a way of obtaining a particular value for C given a particular value of K. We shall therefore use the term 'joint set' for each combination of dice and spots. Table B presents a matrix of the number of joint sets for a particular sum of spots (listed in the left-hand column) given a particular number of dice (listed in the top row). Table B is analogous to Chrétien's Table I.

TABLE B

     S       1       2       3       4       5       6   Totals
     1       1       0       0       0       0       0        1
     2       1       1       0       0       0       0        2
     3       1       2       1       0       0       0        4
     4       1       3       3       1       0       0        8
     5       1       4       6       4       1       0       16
     6       1       5      10      10       5       1       32
     7       0       6      15      20      15       6       62
     8       0       5      21      35      35      21      117
     9       0       4      25      56      70      56      211
    10       0       3      27      80     126     126      362
    11       0       2      27     104     205     252      590
    12       0       1      25     125     305     456      912
    13       0       0      21     140     420     756     1337
    14       0       0      15     146     540    1161     1862
    15       0       0      10     140     651    1666     2467
    16       0       0       6     125     735    2247     3113
    17       0       0       3     104     780    2856     3743
    18       0       0       1      80     780    3431     4292
    19       0       0       0      56     735    3906     4697
    20       0       0       0      35     651    4221     4907
    21       0       0       0      20     540    4332     4892
    22       0       0       0      10     420    4221     4651
    23       0       0       0       4     305    3906     4215
    24       0       0       0       1     205    3431     3637
    25       0       0       0       0     126    2856     2982
    26       0       0       0       0      70    2247     2317
    27       0       0       0       0      35    1666     1701
    28       0       0       0       0      15    1161     1176
    29       0       0       0       0       5     756      761
    30       0       0       0       0       1     456      457
    31       0       0       0       0       0     252      252
    32       0       0       0       0       0     126      126
    33       0       0       0       0       0      56       56
    34       0       0       0       0       0      21       21
    35       0       0       0       0       0       6        6
    36       0       0       0       0       0       1        1
Totals       6      36     216    1296    7776   46656
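The counts in Table B can be reproduced mechanically. The following Python sketch is an illustration added here, not part of the original calculation; the helper name `joint_sets` and the variable `table_b` are hypothetical. It builds the number of joint sets for D distinguishable dice by adding one die at a time.

```python
from collections import Counter


def joint_sets(num_dice, faces=6):
    """Count ordered outcomes (joint sets) for every sum of spots.

    Returns a Counter mapping each sum S to the number of ways of
    obtaining it with `num_dice` distinguishable dice.
    """
    counts = Counter({0: 1})  # zero dice: one way to reach a sum of 0
    for _ in range(num_dice):
        extended = Counter()
        for total, ways in counts.items():
            for face in range(1, faces + 1):
                extended[total + face] += ways
        counts = extended
    return counts


# Columns of Table B for D = 1..6 (hypothetical variable name).
table_b = {d: joint_sets(d) for d in range(1, 7)}

print(table_b[3][5])                            # joint sets for S = 5, D = 3
print(sum(table_b[3].values()))                 # column total for D = 3
print(sum(table_b[d][6] for d in range(1, 7)))  # row total for S = 6
```

The three printed values are 6, 216, and 32, in agreement with the corresponding entries and totals of Table B.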
Given the numbers of joint sets that have a particular S (sum of spots) given D (number of dice), one can calculate the probability that a particular joint set will appear if a given number of dice are used. These probabilities are given in Table C. Table C is analogous to Chrétien's Tables II and III.

TABLE C

     S        1        2        3        4        5        6
     1   0.1667   0.0000   0.0000   0.0000   0.0000   0.0000
     2   0.1667   0.0278   0.0000   0.0000   0.0000   0.0000
     3   0.1667   0.0556   0.0046   0.0000   0.0000   0.0000
     4   0.1667   0.0833   0.0139   0.0008   0.0000   0.0000
     5   0.1667   0.1111   0.0278   0.0031   0.0001   0.0000
     6   0.1667   0.1389   0.0463   0.0077   0.0006   0.0000
     7   0.0000   0.1667   0.0694   0.0154   0.0019   0.0001
     8   0.0000   0.1389   0.0972   0.0270   0.0045   0.0005
     9   0.0000   0.1111   0.1157   0.0432   0.0090   0.0012
    10   0.0000   0.0833   0.1250   0.0617   0.0162   0.0027
    11   0.0000   0.0556   0.1250   0.0802   0.0264   0.0054
    12   0.0000   0.0278   0.1157   0.0965   0.0392   0.0098
    13   0.0000   0.0000   0.0972   0.1080   0.0540   0.0162
    14   0.0000   0.0000   0.0694   0.1127   0.0694   0.0249
    15   0.0000   0.0000   0.0463   0.1080   0.0837   0.0357
    16   0.0000   0.0000   0.0278   0.0965   0.0945   0.0482
    17   0.0000   0.0000   0.0139   0.0802   0.1003   0.0612
    18   0.0000   0.0000   0.0046   0.0617   0.1003   0.0735
    19   0.0000   0.0000   0.0000   0.0432   0.0945   0.0837
    20   0.0000   0.0000   0.0000   0.0270   0.0837   0.0905
    21   0.0000   0.0000   0.0000   0.0154   0.0694   0.0928
    22   0.0000   0.0000   0.0000   0.0077   0.0540   0.0905
    23   0.0000   0.0000   0.0000   0.0031   0.0392   0.0837
    24   0.0000   0.0000   0.0000   0.0008   0.0264   0.0735
    25   0.0000   0.0000   0.0000   0.0000   0.0162   0.0612
    26   0.0000   0.0000   0.0000   0.0000   0.0090   0.0482
    27   0.0000   0.0000   0.0000   0.0000   0.0045   0.0357
    28   0.0000   0.0000   0.0000   0.0000   0.0019   0.0249
    29   0.0000   0.0000   0.0000   0.0000   0.0006   0.0162
    30   0.0000   0.0000   0.0000   0.0000   0.0001   0.0098
    31   0.0000   0.0000   0.0000   0.0000   0.0000   0.0054
    32   0.0000   0.0000   0.0000   0.0000   0.0000   0.0027
    33   0.0000   0.0000   0.0000   0.0000   0.0000   0.0012
    34   0.0000   0.0000   0.0000   0.0000   0.0000   0.0005
    35   0.0000   0.0000   0.0000   0.0000   0.0000   0.0001
    36   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000
We have here considered only the cases of the number of dice up to six, but this number is actually open-ended. One finds the probability of a particular S given D by making the number of joint sets for that S the numerator of a fraction whose denominator is the sum of the joint sets that can occur with the given D. For example, the probability of casting a sum of
six spots with three dice is .0463 (10/216), since the sum of all the joint sets associated with (i.e. all the different combinations which can appear if) D = 3 is 216 and there are 10 joint sets associated with (i.e. ways of obtaining) a six with three dice.
If one is given a particular sum of spots, one may wish to consider the likelihood that it was produced by a particular number of dice. The way that naturally suggests itself is to compare the probabilities associated with the various numbers of dice. Thus if the given number of spots is six, the probabilities are .1667 for one die, .1389 for two dice, .0463 for three, .0077 for four, .0006 for five, and for six a number, 1/46656, that is too small to have a digit in the fourth decimal place.

TABLE D

     S        1        2        3        4        5        6
     1   1.0000   0.0000   0.0000   0.0000   0.0000   0.0000
     2   0.5000   0.5000   0.0000   0.0000   0.0000   0.0000
     3   0.2500   0.5000   0.2500   0.0000   0.0000   0.0000
     4   0.1250   0.3750   0.3750   0.1250   0.0000   0.0000
     5   0.0625   0.2500   0.3750   0.2500   0.0625   0.0000
     6   0.0313   0.1563   0.3125   0.3125   0.1563   0.0313
     7   0.0000   0.0968   0.2419   0.3226   0.2419   0.0968
     8   0.0000   0.0427   0.1795   0.2991   0.2991   0.1795
     9   0.0000   0.0190   0.1185   0.2654   0.3318   0.2654
    10   0.0000   0.0083   0.0746   0.2210   0.3481   0.3481
    11   0.0000   0.0034   0.0458   0.1763   0.3475   0.4271
    12   0.0000   0.0011   0.0274   0.1371   0.3344   0.5000
    13   0.0000   0.0000   0.0157   0.1047   0.3141   0.5654
    14   0.0000   0.0000   0.0081   0.0784   0.2900   0.6235
    15   0.0000   0.0000   0.0041   0.0567   0.2639   0.6753
    16   0.0000   0.0000   0.0019   0.0402   0.2361   0.7218
    17   0.0000   0.0000   0.0008   0.0278   0.2084   0.7630
    18   0.0000   0.0000   0.0002   0.0186   0.1817   0.7994
    19   0.0000   0.0000   0.0000   0.0119   0.1565   0.8316
    20   0.0000   0.0000   0.0000   0.0071   0.1327   0.8602
    21   0.0000   0.0000   0.0000   0.0041   0.1104   0.8855
    22   0.0000   0.0000   0.0000   0.0022   0.0903   0.9075
    23   0.0000   0.0000   0.0000   0.0009   0.0724   0.9267
    24   0.0000   0.0000   0.0000   0.0003   0.0564   0.9434
    25   0.0000   0.0000   0.0000   0.0000   0.0423   0.9577
    26   0.0000   0.0000   0.0000   0.0000   0.0302   0.9698
    27   0.0000   0.0000   0.0000   0.0000   0.0206   0.9794
    28   0.0000   0.0000   0.0000   0.0000   0.0128   0.9872
    29   0.0000   0.0000   0.0000   0.0000   0.0066   0.9934
    30   0.0000   0.0000   0.0000   0.0000   0.0022   0.9978
    31   0.0000   0.0000   0.0000   0.0000   0.0000   1.0000
    32   0.0000   0.0000   0.0000   0.0000   0.0000   1.0000
    33   0.0000   0.0000   0.0000   0.0000   0.0000   1.0000
    34   0.0000   0.0000   0.0000   0.0000   0.0000   1.0000
    35   0.0000   0.0000   0.0000   0.0000   0.0000   1.0000
    36   0.0000   0.0000   0.0000   0.0000   0.0000   1.0000

Now given that S = 6, it would be reasonable to believe that the chances were best that one die had
been used, but only slightly better than those that two dice had been used. The chances that either one die or two dice had been used would be overwhelming. In this way of calculating, which deals with the likelihoods, if one considers only the case where either one die or six dice and no other number of dice had been used, the choice between the two possibilities overwhelmingly favors one die, and this corresponds to our intuition.
If one follows Chrétien's suggestion, one would compare the number of JOINT SETS associated with a given number of dice for a given S relative to the sum of all JOINT SETS ASSOCIATED WITH THAT S (analogous to Chrétien's C). One thus obtains Table D, which is analogous to Chrétien's Tables IV and V. Thus, for example, one would establish a fraction .0313 (1/32, where 32 represents the sum of the joint sets associated with the sum of six spots) as representing the chances that, if the sum of the spots is six, one die had been thrown. Similarly .1563 (5/32) would represent the chances that it was two dice that had been used, .3125 (10/32) that it was three, the same that it was four, .1563 that it was five, and .0313 that it was six (see Table D). Thus if the sum of the spots was six and the choice lay between from one up to six dice, one would conclude that the chances were best that either three or four dice had been thrown. It would thus also follow that if the sum of the spots was six and the choice was restricted to one die and six dice (with no other number of dice possible), it would be an even bet, since the chances of either one are equal at .0313. One could easily lose one's faith in Chrétien's proposal by a little experimenting with betting. Instead of regarding Chrétien's joint sets as equiprobable for a given C, we in effect regard only time lapses which are not very different from each other as close to equiprobable.
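The contrast between the two ways of calculating can be checked for S = 6 with a short Python sketch. It is an illustration added here under stated assumptions, not a reproduction of Chrétien's or the author's own computation; the helper `joint_sets` and the variable names are hypothetical. The likelihood divides each joint-set count by 6**D (as in Table C), while the Chrétien-style figure divides it by the row total for that S (as in Table D).

```python
from fractions import Fraction
from itertools import product


def joint_sets(num_dice, total, faces=6):
    """Count ordered rolls of `num_dice` dice whose spots sum to `total`."""
    return sum(
        1
        for roll in product(range(1, faces + 1), repeat=num_dice)
        if sum(roll) == total
    )


S = 6
counts = {d: joint_sets(d, S) for d in range(1, 7)}

# Likelihood of S = 6 for each D: joint sets divided by 6**D (Table C).
likelihood = {d: Fraction(counts[d], 6 ** d) for d in counts}

# Chrétien-style figure: joint sets divided by the row total for S = 6 (Table D).
row_total = sum(counts.values())
table_d_row = {d: Fraction(counts[d], row_total) for d in counts}

for d in counts:
    print(d, float(likelihood[d]), float(table_d_row[d]))

# One die versus six dice, with no other number of dice allowed.
print(likelihood[1] / likelihood[6])     # ratio of the two likelihoods
print(table_d_row[1] == table_d_row[6])  # the Table D figures tie
```

Under the likelihood comparison one die is favored over six dice by a factor of 7776, whereas the row-normalized figures make the two cases an even bet.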
Thus Chrétien's probabilities in Tables IV and V deserve the name of probabilities only in that they add up to one in K for each fixed C, but this does not mean that what is probable according to the table is really probable. Tables IV and V are of the type that might be constructed if one had a strong presumption that all values of K tend toward 50. The highest probabilities there for a value of K other than 50, given C, are always closer to 50 than those given by the likelihood ratio or the maximum likelihood estimate. But there seems to be no reason, or even intention on Chrétien's part, to make this a priori assumption at all. On this basis it seems reasonable to characterize the probability Tables IV and V as at best an inappropriate way of calculating the probability of K, given C. On this basis too, Chrétien's argument should cease to be a road-block to glottochronology.

BIBLIOGRAPHY
Chang, K. C., G. W. Grace, W. G. Solheim II
1964 "Movement of the Malayo-Polynesians: 1500 B.C. to A.D. 500", Current Anthropology 5: 359-406.
Chrétien, C. Douglas
1962 "The Mathematical Models of Glottochronology", Language 38: 11-37.
Diebold, A. R.
1964 "A Control Case for Glottochronology", American Anthropologist 66: 987-1006.
Dyen, I., A. T. James, and J. W. L. Cole
1967 "Language Divergence and Estimated Word Retention Rate", Language 43: 150-71.
Fodor, Istvan
1965 The Rate of Linguistic Change (The Hague: Mouton and Co.).
Swadesh, M.
1955 "Towards Greater Accuracy in Lexicostatistical Dating", IJAL 21: 121-37.
APPENDIX
CHRÉTIEN'S TABLES TABLE I RANGE OF C
t 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 to 22 23 24 25 27 28 35 36 if.
K'
Range of C
C'
Range of t'e
86 74 64 55 47 40 35 30 26 22 19 16 14 12 10 9 8 7 6 5 4 3 2 1 0
86 to 72 74 48 28 64 55 10 47 0 40 0 0 35 0 30 26 0 0 22 0 19 16 0 14 0 12 0 10 0 0 9 8 0 7 0 6 0 5 0 0 4 0 3 2 0 1 0 0 0
74 55 40 30 22 16 12 9 7 5 4 3 2 1 1 1 1 0 0 0 0 0 0 0 0
.50 to 1.09 1.00 2.43 1.48 4.22 1.98 7.63 2.50 3.04 3.48 3.99 4.47 5.02 5.51 6.08 6.52 7.03 7.63 7.98 8.37 8.81 9.33 9.93 10.67 11.63 12.97 15.27
[TABLES II and III: contents not recoverable from the scanned pages.]
TABLE IV
Probabilities of t and K for Selected Values of C
t'K   K   C
.77 89 74
74 55 40
55 40 30 22
55 40 30 22 16 12 9 7 5
40 30 22 16 12 9 7 5
.85 88
1
1 2 3
2 3 4 5
2 3 4 5 6 7 8 9 10
3 4 5 6 7 8 9 10
.92 87
1.00 86
1.08 85
1.16 84
1.24 83
1.32 82
1.40 81
1.48 80
0000
0037
0447
1804
2967
2899
1406
0379
2.09 73
2.18 72
2.27 71
2.37 70
1.56 79
1.65 78
1.73 77
1.82 76
1.91 75
2.00 74
0057
0005
0000 0000
0000 0000
0000 0008
0000 0075
0375
1105
2047
2496 0000
2.46 69
2.56 68
2.66 67
2.76 66
2.86 65
2.96 64
3.06 63
3.17 62
3.28 61
3.39 60
2064 0000
1183 0000
0476 0000
0136 0007
0027 0037 0000
0004 0000
0000 0549 0000
0000 0959 0000
0000 1914 0000 0000
0000 2222 0003 0000
4.09 54
4.21 53
4.34 52
4.47 51
4.60 50
0016 1899 0021 0000 0000 0000
0003 1982 0088 0000 0000 0000 0000
0000 1612 0277 0001 0000 0000 0000 0000
0000 1032 0656 0006 0000 0000 0000 0000
0000 0523 1195 0029 0000 0000 0000 0000
0174
3.50 59
3.61 58
3.73 57
3.85 56
3.97 55
0000 1924 0022 0000
0000 1258 0101 0000 0000
0000 0627 0331 0000 0000
0000 0240 0789 0000 0000 0000
0000 0071
4.73 49
4.87 48
5.01 47
5.15 46
5.30 45
5.45 44
5.60 43
5.75 42
5.91 41
6.07 40
0000 0211 1697 0108 0002 0000 0000 0000
0000 0068 1902 0310 0013 0000 0000 0000
0000 0017
0000 0004 1222 1195 0174 0015 0002 0000
0000 0001 0710 1655 0434 0060 0010 0001
0000 0000 0335 1840 0856 0185 0041 0007
0000 0000 0129 1658 1349 0448 0135 0029
0000 0000 0032 1219 1716 0863 0345 0101
0000 0000 0010 0735 1775 1340 0887 0273
0000 0000 0009
1701
0686 0054 0003 0000 0000
1403
0003 0000 0000
0364
1505 1689 1161 0589
30 22 16 12 9 7 5
4 5 6 7 8 9 10
6.24 39
6.42 38
6.59 37
6.78 36
6.96 35
7.15 34
7.35 33
7.55 32
7.77 31
7.98 30
0000 0000 0149 1051 1743 1557 1028
0000 0000 0051 0608 1495 1714 1464
0000 0000 0014 0292 1051 1560 1713
0000 0000 0003 0117 0619 1180 1661
0000 0000 0001 0039 0304 0745 1342
0000 0000 0000 0011 0126 0394 0884
0000 0000 0000 0003 0043 0175 0515
0000 0000 0000 0001 0013 0066 0246
0000 0000 0000 0000 0003 0021 0099
0000 0000 0000 0000 0001 0006 0034
.55 92
.63 91
TABLE V
Probabilities of t and K for Values of C from 100 to 72
t'K   K   C
t'c 0 100
100 99 98 97 96 95 94 93 92 91 90 89 88 87 86 85 84 83 82
0 .03 .07 .11 .14 .17 .21 .24 .28 .31 .35 .39 .42 .46 .50 .54 .58 .62 .66
.07 99
.13 98
.20 97
.27 96
.34 95
.41 94
.48 93
1.00 1.00 6667
3333 8570 3158
1429 6317 5882 1419
0526 3921 6381 3562 0632
0196 2128 5343 5058 2007 0281
0074 1069 3794 5353 3519 1081 0125
0025 0506 2408 4690 4501 2254 0564 0056
0009 0229 1407 3601 4696 3385 1364 0288 0025
0003 0101 0772 2505 4231 4094 2352 0793 0145 0011
.70 90 90 89 88 87 86 85 84 83 82 81 80 79 78 77 76 75 74 73 72
80 79 78 77 76 75 74 73 72
.35 .39 .42 .46 .50 .54 .58 .62 .66 .70 .74 .78 .82 .87 .91 .95 1.00 1.04 1.09
.74 .78 .82 .87 .91 .95 1.00 1.04 1.09
0001 0043 0402 1611 3410 4233 3235 1541 0446 0072 0005
.77 89
0000 0018 0202 0975 2521 3884 3777 2378 0967 0245 0035 0002
.85 88
0000 0007 0097 0560 1734 3238 3886 3095 1653 0585 0132 0017 0001
.92 87
0000 0003 0046 0308 1124 2497 3609 3525 2369 1097 0344 0070 0008 0000
1.00 86
1.08 85
1.16 84
1.24 83
1.32 82
1.40 81
0000 0001 0021 0164 0694 1804 3084 3611 2963 1719 0703 0203 0037 0004 0000
0000 0000 0009 0084 0410 1233 2457 3387 3317 2342 1226 0447 0113 0019
0000 0000 0004 0042 0234 0804 1844 2948 3387 2917 1804 0815 0267
0000 0000 0002 0021 0129 0503 1313 2409 3023 2967 2162 1179
0000 0000 0001 0010 0069 0303 0894 1909 2899 3260 2766
0000 0000 0000 0005 0036 0177 0600 1406 2401 3073
1.48 80
1.56 79
1.65 78
1.73 77
1.82 76
1.91 75
2.00 74
2.09 73
2.18 72
0000 0000 0000 0002 0018 0103 0379 0985 1882
0000 0000 0000 0001 0009 0057 0230 0663
0000 0000 0000 0000 0005 0030 0135
0000 0000 0000 0000 0002 0016
0000 0000 0000 0000 0001
0000 0000 0000 0000
0000 0000 0000
0000 0000
0000
SOME RESULTS FROM THE VOCABULARY METHOD OF RECONSTRUCTING LANGUAGE TREES
JOSEPH B. KRUSKAL ISIDORE DYEN and PAUL BLACK
0. INTRODUCTION
This paper is a sequel to Kruskal, Dyen, and Black (1971). It presents further results and further statistical analysis. We refer the reader to the earlier paper for a more elementary introduction to our work, and a critical discussion of the linguistic and statistical assumptions. However, we repeat in this paper enough to make it comprehensible in itself and to make the significance of the results clear. Also, we make an effort in this paper to explain the statistical methods to linguistically trained readers. For more information about the Austronesian ( = Malayopolynesian) data, we refer the reader to Dyen (1965).
1. THE DATA
Very briefly, the original linguistic data are based on four language families, as shown in Fig. 1. For each family, word-lists have been assembled for many languages and dialects. (The word-lists have come from a variety of sources, such as native speakers, experts on the language, and dictionaries.) Each word-list contains the form for each meaning in a test-list originally prepared by Swadesh. Different versions of this test-list have M = 200, 196, or 100 items. Occasionally a form is missing, or a single meaning is represented by two (or even more) forms.

Family                             Meanings M   Lists L   List-Pairs L2 = L(L-1)/2
Austronesian (Malayopolynesian)    196 (5)      371       68,635
Indo-European                      200          95        4,465
Philippine                         196          107       5,671
Cushitic                           100          63        1,953

Philippine and Austronesian share 40 lists. See text for explanation.
Fig. 1. The Four Families.
The lists were all collected and the judgments described below were made by Dyen, except for the Cushitic, which was all done by Black. (Black describes his judgments as tentative opinions, made very rapidly.)
However, the raw data for STATISTICAL analysis do not consist of these forms. Instead, they consist of expert judgments on the cognation among these forms. For a given meaning in the test-list, and for each PAIR of languages, there are four possibilities:
(a) the forms are judged 'cognate' (symbol C);
(b) the forms are judged 'not cognate' (symbol N);
(c) occasionally, the decision is so difficult that the opinion is 'doubtfully cognate' (symbol D);
(d) no form is present in one or both lists, so that no judgment is possible, and the pair is recorded as 'missing' (symbol G).
For L languages, there are L2 = L(L-1)/2 pairs of languages. Hence the raw data for statistical analysis consist in principle of a rectangular COGNATION MATRIX, having M rows and L2 columns. Each entry consists of one of the four possibilities just mentioned. Each column of the matrix corresponds to a pair of lists, say i and j. We refer to this pair of lists more briefly as 'list-pair ij' or 'column ij'. Each row of the matrix corresponds to some meaning m. We use (m, ij) to indicate the judgment for meaning m in list-pair ij, that is, the matrix entry in row m and column ij.

2. STATISTICAL ASSUMPTIONS
There are three statistical assumptions on which our analysis is based.
Assumption 1: We assume that the word-lists came to their present form by evolution from a single original list, through a series of splits occurring at definite times which can be described in the usual way as a family tree (see Fig. 2).

[Fig. 2. Family tree with the present-day languages at the tips, time's arrow, and a scale showing divergence times (0 = the present).]
Assumption 2: We regard replacement of one form by another (for a given meaning) as a random process, which is statistically independent for distinct branches of the tree, and for distinct meanings (even on a single branch). We assume that for a single meaning along a single branch of duration t, the probability of nonreplacement is
$p = e^{-rt}$.
Assumption 3: We assume that, in a single family, the replacement rate r depends only on which meaning of the test-list is involved, and remains the same over the entire family tree. (Notice that this is considerably less restrictive than the usual assumption made in most earlier work, which is that r is equal for all meanings in the test-list.)
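As a numerical illustration of Assumption 2, the following Python fragment evaluates the nonreplacement probability p = exp(-r t) and checks that survival over two consecutive stretches of a branch multiplies to survival over the whole stretch. It is a sketch added here, not taken from the paper's own computations; the function name and the sample values of r and t are hypothetical.

```python
import math


def nonreplacement_probability(rate, duration):
    """Probability that a form survives unreplaced: p = exp(-r * t)."""
    return math.exp(-rate * duration)


r = 1.5            # hypothetical replacement rate for one meaning
t1, t2 = 0.4, 0.7  # hypothetical durations, in the same arbitrary time unit

whole = nonreplacement_probability(r, t1 + t2)
pieces = nonreplacement_probability(r, t1) * nonreplacement_probability(r, t2)
print(whole, pieces)  # the two values agree up to rounding
```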
3. MAXIMUM LIKELIHOOD METHOD
The analyses we present here are all based on what statisticians call 'maximum likelihood estimation'. Maximum likelihood is one of the most widely used methods of estimation, particularly for complex or novel models. Using the mean of a sample to estimate the true underlying value is often cited as an elementary example of maximum likelihood estimation in statistics textbooks. We explain here how maximum likelihood estimation works in this context.
The method we explain here is already moderately complex. However, it falls considerably short of what we would like to accomplish, due to many difficulties. One simplification is fairly innocuous: we estimate the divergence time $t_{ij}$ for each pair of languages, ignoring the fact that the languages form a tree. The tree form would greatly limit the times, in that a tree requires many divergence times to equal one another. The hope here is that our estimates without using the tree constraint will nevertheless form themselves nearly enough into a tree that we can recover it. If so, we can use the maximum likelihood times as the input to some further procedure to form the tree and to estimate the times associated with the nodes. The other simplification is far from innocuous. We call it the key approximation, and describe it a little later. In another section, we present quantitative evidence for its validity.
Consider the forms used to denote meaning m in lists i and j. What is the probability that they are cognate? In practice, cognation occurs only if no replacement (for this meaning) has taken place in the two paths from the most recent common ancestor of i and j down to the present-day languages i and j. (In principle there are other possibilities, but their probability is negligible.) Thus under our assumptions the probability of cognation, given $r_m$ and $t_{ij}$, is
$p_{mij} = e^{-2 r_m t_{ij}}$.
Of course, the probability of noncognation is $1 - p_{mij}$. Now we define the 'likelihood' of observation (m, ij) to mean the probability, given $r_m$ and $t_{ij}$, that the actual observation would occur. Thus $L_{mij}$, the likelihood, is $p_{mij}$ if the observation is 'cognate' and $1 - p_{mij}$ if the observation is 'not cognate'.
At this point, the basic idea of maximum likelihood can already be stated: we must choose the parameter values $r_m$ and $t_{ij}$ to make the ACTUAL event be PLAUSIBLE under the model. In other words, we must choose the parameter values so that the likelihood is relatively large. Maximum likelihood means choosing the parameters so as to maximize the likelihood. Of course, it is not enough to consider observation (m, ij) all by itself: we must consider the likelihood of all the observations jointly.
Before we form the joint likelihood of the whole cognation matrix, however, we note that a JUDGMENT of cognation or noncognation is not quite the same as the corresponding EVENT, due to the possibility of indecision (judgment of 'doubtfully cognate'), failure to record a form (judgment of 'missing'), or actual error. We explain in the earlier paper why occasional errors in judgment, and other deviations from perfection, can be tolerated, and we simply define
$L_{mij} = \begin{cases} p_{mij}, & \text{if the judgment is 'cognate';} \\ 1 - p_{mij}, & \text{if the judgment is 'not cognate';} \\ \text{undefined}, & \text{otherwise.} \end{cases}$
Next we form the likelihood, $L_{*ij}$, of the entire ij column of the matrix of judgments. Since cognation of distinct meanings has, quite reasonably, been assumed to be statistically independent, and since the likelihood of independent events is the product of the separate likelihoods, we have
$L_{*ij}$ = the product of all $L_{mij}$ except those which are undefined.
At this point we ignore the relationship among the columns, which are certainly NOT independent, and treat them as though they were independent. This is the key approximation mentioned above. We present quantitative evidence for the validity of this approximation in a later section. Then, based on independence, we can write the likelihood of the entire cognation matrix as
L = the product of all the $L_{*ij}$.
This is the likelihood (or probability) that the observed table of cognation judgments would occur, given all the $r_m$ and all the $t_{ij}$. Ultimately L depends only on the $r_m$ and the $t_{ij}$ (and the data). L varies as we vary the $r_m$ and the $t_{ij}$.
Values for the $r_m$ and $t_{ij}$ which make L very small are implausible, since they correspond to the assignment of low likelihood to the actually observed event. Conversely, values of $r_m$ and $t_{ij}$ which make L large seem much more plausible, since they correspond to the assignment of relatively high probability to the actual event. Hence, it is natural to seek those values of $r_m$ and $t_{ij}$ which make L as large as possible, that is, to maximize L. The values of $r_m$ and $t_{ij}$ which give this maximum value to L are called the 'maximum likelihood' estimates of the unknown true values.
In view of the large size of M and L2, calculating the maximum likelihood values of the $r_m$ and the $t_{ij}$ is a substantial computational job. For the Austronesian data, L2 is so very much larger than in the other three cases that this calculation did not, at the time, seem feasible at all. Hence a further reduction was made in this case.
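The construction of L can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' program: the function name, the toy matrix, and the parameter values are all hypothetical, the cognation probability is taken as exp(-2 r_m t_ij), and the code works with log L (whose maximum occurs at the same parameter values as that of L) to avoid multiplying many small numbers.

```python
import math


def log_likelihood(judgments, rates, times):
    """Log of L under the key approximation (columns treated as independent).

    judgments[m][k] is 'C', 'N', 'D' or 'G' for meaning m and list-pair k;
    rates[m] is r_m (assumed > 0); times[k] is t_ij for list-pair k (assumed > 0).
    """
    total = 0.0
    for m, row in enumerate(judgments):
        for k, judgment in enumerate(row):
            p = math.exp(-2.0 * rates[m] * times[k])  # probability of cognation
            if judgment == "C":
                total += math.log(p)
            elif judgment == "N":
                total += math.log(1.0 - p)
            # 'D' (doubtful) and 'G' (missing) are undefined and left out
    return total


# Hypothetical toy cognation matrix: 3 meanings by 2 list-pairs.
judgments = [["C", "N"],
             ["C", "D"],
             ["N", "G"]]
rates = [0.5, 1.0, 2.0]

near = log_likelihood(judgments, rates, [0.3, 1.2])
far = log_likelihood(judgments, rates, [3.0, 1.2])
print(near, far)
```

For this toy matrix the first list-pair is mostly cognate, so a short divergence time (0.3) yields a larger log-likelihood than a long one (3.0). A full maximum likelihood fit would search over all the rates and times at once, which this sketch does not attempt.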
(Though we describe the Austronesian calculation last, this calculation was done long before the others.)
To reduce the size of the Austronesian data, we arranged the 196 meanings in increasing order according to a crude estimate of replacement rate, and then broke them into 5 groups consisting respectively of 12, 34, 40, 60, and 50 meanings. (We omit the rationale for these sizes.) Within each group of meanings, we assume that the replacement rates are equal. This reduces the number of replacement rates to be estimated from 196 to 5. From the matrix of cognation judgments, which has M = 196 rows and L2 = 68,635 columns, we form a reduced matrix having only 5 rows (but still having 68,635 columns). Each entry $f_{mij}$ is the fraction of meanings in group m for which lists i and j are cognate. More precisely, for m = 1, 2, 3, 4, and 5, let
$C_{mij}$ = the number of meanings in group m for which lists i and j have COGNATE forms,
$N_{mij}$ = the number of meanings in group m for which lists i and j have NONCOGNATE forms,
$D_{mij}$ = the number of meanings in group m for which lists i and j have DOUBTFULLY COGNATE forms,
$G_{mij}$ = the number of meanings in group m for which either list i or list j or both have missing forms.
Then
$f_{mij} = \frac{C_{mij}}{C_{mij} + N_{mij}}$.
Under mild assumptions concerning the 'doubtful' and 'missing' situations, $f_{mij}$ has the binomial distribution with
number of observations = $C_{mij} + N_{mij}$,
probability = $p_{mij} = e^{-2 r_m t_{ij}}$.
In practice it is clear that the binomial distribution is an excellent approximation to the truth. Therefore the likelihood (= probability) of $f_{mij}$ occurring is given by
$L_{mij} = \binom{C+N}{C} p^{C} (1-p)^{N}$,
where $C = C_{mij}$, $N = N_{mij}$, $p = p_{mij}$, and
$\binom{n}{k} = \frac{n!}{k!\,(n-k)!} = \frac{n(n-1)(n-2)\cdots 2\cdot 1}{k(k-1)\cdots 2\cdot 1\cdot(n-k)\cdots 2\cdot 1}$
is called 'n-choose-k' or 'the binomial coefficient'. From the $L_{mij}$ we build the $L_{*ij}$ and L as before, and the maximum likelihood estimation procedure consists of finding values for the 5 $r_m$ and the 68,635 $t_{ij}$ so as to maximize L.
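The grouped likelihood can be sketched in Python, assuming it has the binomial form (C+N choose C) p^C (1-p)^N with p = exp(-2 r_m t_ij). The function name and the sample rate, time, and counts are hypothetical and serve only as illustration.

```python
import math


def grouped_likelihood(c, n, rate, time):
    """Binomial likelihood for one group of meanings and one list-pair.

    c = number of cognate judgments, n = number of noncognate judgments;
    doubtful and missing judgments are simply not counted.
    """
    p = math.exp(-2.0 * rate * time)
    return math.comb(c + n, c) * p ** c * (1.0 - p) ** n


rate, time = 0.8, 0.5  # hypothetical values
p = math.exp(-2.0 * rate * time)

print(grouped_likelihood(9, 3, rate, time))           # group with 9 cognate, 3 noncognate
print(grouped_likelihood(1, 0, rate, time), p)        # single cognate meaning
print(grouped_likelihood(0, 1, rate, time), 1.0 - p)  # single noncognate meaning
print(grouped_likelihood(0, 0, rate, time))           # only doubtful or missing
```

When a group holds a single meaning, the value is p for a cognate judgment, 1 - p for a noncognate judgment, and 1 when the judgment is doubtful or missing, so such entries drop out of the product.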
Actually, the approach used for Austronesian can be thought of as a direct generalization of the approach used with the other families. Suppose we use the Austronesian approach in the case where each group contains only one meaning. Then C + N + D + G = 1, where the subscript mij has been dropped from all symbols. From this we see that the Austronesian formula reduces to
$L_{mij} = \begin{cases} p, & \text{if } C = 1, \\ 1 - p, & \text{if } N = 1, \\ 1, & \text{otherwise.} \end{cases}$
From this it is easy to see that the Austronesian expression for L reduces to the previous expression for L if every group has only one meaning.
4. SOME RESULTS
The primary result which we would like to obtain from this work is the family tree for each family, together with the estimated divergence time for each node of the tree. This goal is the central motivating force behind this work. The tree for Indo-European is of special interest, because it will permit the first severe test of our methodology, and hopefully provide validation. However we are not yet ready to present such final results. For one thing, to make a family tree with TIMES AT THE NODES is by no means a routine matter. The statistical methodology is still in its infancy. For another thing, a lot of 'data analysis' is still needed. This includes finding and giving special treatment to peculiarities in the data. For example, there may well be columns of the cognation matrix which, on the basis of internal consistency, are abnormal, possibly due to a special relationship between the two languages, such as intimate borrowing. There may also be lists which are peculiar: for example, languages which do not belong to the family, or which have a very special history. There may be individual cognation judgments which can be questioned, based on statistical assessment of internal consistency. We wish to do such data analysis so that linguistic judgment can be brought to bear on the peculiarities, which may reflect either interesting linguistic phenomena, such as the effects of word taboo, or weakness in the data. Such data analysis may significantly affect many of the divergence times. When Dyen creates a classification (that is, a family tree) based on lexicostatistical percentages, he includes a great deal of data analysis and special handling of the percentages, integrated with his method for forming the classification. We wish to do such analysis on a more formal, systematic basis, and to expand it greatly, using computer methods. We have not yet done this work, and it is probably necessary before reaching our main goal. 
While several of the following results are repeated from Kruskal, Dyen, and Black (1971), the following items are new: Figure 4 (the Indo-European scatter diagram); Figures 5 and 6 (the individual replacement rates for three families); and the distributional information about the replacement rates (Figures 7 and 8).

[Fig. 3. Lexicostatistical percentage versus divergence time for the Austronesian family (see text for cautionary remarks about very long divergence times). Based on long Swadesh list, 1 unit ≈ 3500 years; based on short Swadesh list, 1 unit ≈ 4900 years.]

One result of interest is shown in Figures 3 and 4. Each of these is a scatter diagram showing lexicostatistical percentages (on a logarithmic scale) versus divergence times. The Austronesian (= Malayopolynesian) figure has L2 = 68,635 points; the Indo-European has L2 = 4465. In a moment we shall discuss their use to convert percentages into divergence times, and the size of the arbitrary time units in years. First, however, we discuss the relationship between them. Suppose they were replotted on a single graph using a common scale for log P, and with the time scales adjusted so that the upper left-hand portions are in coincidence. Then the time units would have a ratio of about 1.44. (This has been checked by measurement and arithmetic calculation from these plots.) Also, the lower right-hand part of the Indo-European scatter would be centered somewhat to the right of the Austronesian plot, with the discrepancy increasingly great for smaller percentages. At P = 15% the shift is quite slight (about 8%), but at P = 10% it has risen to roughly 30%.

[Fig. 4. Lexicostatistical percentage versus divergence time for the Indo-European family (see text for cautionary remarks about very long divergence times). Based on long Swadesh list, 1 unit ≈ 2400 years; based on short Swadesh list, 1 unit ≈ 3400 years.]

Both the ratio of 1.44 and the shift have theoretical explanations. The ratio is due to a minor technical distinction in how the rates and times are normalized. In the Austronesian data, we set the geometric mean of the $r_m$ equal to 1. This is the same as setting the product $r_1 r_2 r_3 r_4 r_5 = 1$. However, $r_1$ is being used as the rate for 12 meanings, $r_2$ for 34, $r_3$ for 40, $r_4$ for 60, and $r_5$ for 50. Thus to make the Austronesian units correspond to the Indo-European, where each meaning has a separate rate, it is necessary to renormalize the Austronesian rates and times so that
$r_1^{12}\, r_2^{34}\, r_3^{40}\, r_4^{60}\, r_5^{50} = 1$.
To do this, we divide the 5 rates by 1.4409 each, and multiply the divergence times by 1.4409 each (all this merely represents a change of the arbitrary time unit).
The relatively larger times for Indo-European compared to Austronesian at the smaller percentages are also due to the use of individual versus group rates. Intuitively speaking, this permits the few smallest rates to have more of an effect than they
would have if grouped together with larger ones. Though the discrepancy is clearly visible, it only becomes substantial where the horizontal width of the bands is large, and hence is not bothersome (see below). The shape of the Indo-European scatter diagram is probably closer to the truth than that of the Austronesian.
To understand this phenomenon more clearly, note that the expected value E(P(t)) within our model can easily be shown to be the following:
$E(P(t)) = \frac{100}{196}\left[12e^{-r_1 t} + 34e^{-r_2 t} + 40e^{-r_3 t} + 60e^{-r_4 t} + 50e^{-r_5 t}\right]$ for Austronesian,
$E(P(t)) = \frac{100}{200}\sum_{m=1}^{200} e^{-r_m t}$ for Indo-European.

REPLACEMENT RATES FOR MEANINGS IN THREE LANGUAGE FAMILIES: INDO-EUROPEAN, PHILIPPINE, AND CUSHITIC
MEANING
1 ALL, 2 AND, 3 ANIMAL, 4 ASHES, 5 AT, 6 BACK, 7 BAD, 8 BARK/N, 9 BECAUSE, 10 BELLY, 11 BIG, 12 BIRD, 13 BITE, 14 BLACK, 15 BLOOD, 16 BLOW, 17 BONE, 18 BREATHE, 19 BURN, 20 CHILD, 21 CLOUD, 22 COLD, 23 COME, 24 COUNT, 25 CUT, 26 DAY, 27 DIE, 28 DIG, 29 DIRTY, 30 DOG, 31 DRINK, 32 DRY, 33 DULL, 34 DUST, 35 EAR, 36 EARTH, 37 EAT, 38 EGG, 39 EYE, 40 FALL, 41 FAR, 42 FAT, 43 FATHER, 44 FEAR
IE
1.35 2.16 1.72 1.59 1.50 1.76 2.73 1.11 4.93 2.38 1.36 1.19 2.22 1.63 1.61 1.35 0.86 1.34 1.49 2.76 1.95 2.00 1.10 2.39 1.57 0.35 0.30 1.37 6.23 0.71 0.32 0.79 2.61 2.09 0.35 1.18 0.85 0.30 0.26 1.79 1.28 2.54 0.67 1.36
PH
1.38 2.99 1.31 0.78 1.26 1.56 1.87 2.27 1.91 1.69 1.18 1.10 0.99 1.10 0.78 2.06 0.75 1.26 2.29 1.10 2.18 2.69 2.22 0.67 2.89 0.20 [illegible] 1.74 3.38 0.77 0.02 0.95 2.20 1.77 0.27 1.60 0.02 0.45 0.01 1.92 0.25 0.52 0.14 1.80
CU
2.18 1.15
2.03 0.92 1.60 0.74 1.00 2.10 0.66 0.66 1.49 1.74 2.12 1.85
1.44 0.92 0.94 0.66 1.95 1.47 1.32 0.07 1.37 [illegible]
MEANING
45 FEATHER, 46 FEW, 47 FIGHT, 48 FIRE, 49 FISH, 50 FIVE, 51 FLOAT, 52 FLOW, 53 FLOWER, 54 FLY/V, 55 FOG, 56 FOOT, 57 FOUR, 58 FREEZE, 59 FRUIT, 60 GIVE, 61 GOOD, 62 GRASS, 63 GREEN, 64 GUTS, 65 HAIR, 66 HAND, 67 HE, 68 HEAD, 69 HEAR, 70 HEART, 71 HEAVY, 72 HERE, 73 HIT, 74 HOLD, 75 HOW, 76 HUNT, 77 HUSBAND, 78 I, 79 ICE, 80 IF, 81 IN, 82 KILL, 83 KNOW, 84 LAKE, 85 LAUGH, 86 LEAF, 87 LEFT, 88 LEG
IE
0.87 1.45 2.96 1.05 0.94 0.01 1.12 1.73 0.88 1.68 1.05 0.52 0.01 1.57 1.86 0.22 1.94 1.76 1.01 2.63 1.79 1.17 1.02 1.26 0.84 0.32 1.27 1.31 2.96 1.65 0.03 1.66 1.27 0.01 1.65 2.32 0.26 1.73 0.61 1.53 1.60 1.47 2.44 1.60
PH
1.41 3.48 2.86 0.43 1.16 0.02 1.35 1.63 1.23 1.92 2.12 1.72 0.01 0.70 2.01 1.64 2.61 2.80 1.43 0.25 1.14 0.67 0.35 0.71 0.52 0.68 1.71 1.80 2.39 2.22 1.31 0.68 0.05 1.30 1.51 0.37 1.76 1.45 1.80 0.69 1.42 2.12
CU
1.27 0.60 0.75
0.45 0.67 1.63 2.50 1.29 1.77 0.75 0.47 0.91 1.17 1.01
0.07
1.11 1.83 2.42
MEANING
89 LIE DOWN, 90 LIVE, 91 LIVER, 92 LONG, 93 LOUSE, 94 MAN, 95 MANY, 96 MEAT, 97 MOTHER, 98 MOUNTAIN, 99 MOUTH, 100 NAME, 101 NARROW, 102 NEAR, 103 NECK, 104 NEW, 105 NIGHT, 106 NOSE, 107 NOT, 108 OLD, 109 ONE, 110 OTHER, 111 PERSON, 112 PLAY, 113 PULL, 114 PUSH, 115 RAIN/V, 116 RED, 117 RIGHT, 118 RIGHT SIDE, 119 RIVER, 120 ROAD, 121 ROOT, 122 ROPE, 123 ROTTEN, 124 RUB, 125 SALT, 126 SAND, 127 SAY, 128 SCRATCH, 129 SEA, 130 SEE, 131 SEED, 132 SEW, 133 SHARP, 134 SHORT, 135 SING, 136 SIT, 137 SKIN, 138 SKY, 139 SLEEP, 140 SMALL, 141 SMELL, 142 SMOKE, 143 SMOOTH, 144 SNAKE, 145 SNOW, 146 SOME, 147 SPIT
IE
0.94 0.32 1.65 0.41 1.50 1.07 1.58 0.70 0.32 1.36 1.72 0.05 1.26 2.34 1.39 0.13 0.22 0.34 0.17 1.63 0.08 1.59 1.83 2.10 2.62 2.72 1.91 0.99 1.18 0.76 1.86 3.61 0.54 2.91 2.08 1.33 0.30 1.99 2.02 1.51 0.70 0.87 0.61 0.46 1.11 0.86 1.29 0.37 1.58 1.57 0.65 2.10 3.63 0.43 1.14 1.53 0.51 0.67 0.50
PH
2.64 1.60 0.38 1.62 0.15 0.60 1.72 2.64 0.13 2.06 1.80 0.11 2.22 1.71 0.98 0.22 0.69 0.21 1.21 1.52 0.09 2.49 0.68 2.25 1.39 1.29 0.19 2.18 3.24 0.50 2.30 0.25 0.32 0.92 1.82 2.65 0.12 2.51 2.56 2.39 0.94 1.59 1.64 1.43 0.46 2.14 5.96 2.68 1.90 0.39 1.04 3.38 2.48 0.69 3.61 1.02 2.69 1.47
CU
0.73 0.77 1.04 1.98 0.81 1.93 0.34 0.41 3.34 1.03 1.82 0.19 0.99 2.22 1.33
1.06 1.19
2.16 1.10
4.35 0.78 2.34 2.10
2.33 1.41 2.14 [illegible] 1.83 [illegible]
MEANING
148 SPLIT, 149 SQUEEZE, 150 STAB, 151 STAND, 152 STAR, 153 STICK/N, 154 STONE, 155 STRAIGHT, 156 SUCK, 157 SUN, 158 SWELL, 159 SWIM, 160 TAIL, 161 THAT, 162 THERE, 163 THEY, 164 THICK, 165 THIN, 166 THINK, 167 THIS, 168 THOU, 169 THREE, 170 THROW, 171 TIE, 172 TONGUE, 173 TOOTH, 174 TREE, 175 TURN, 176 TWO, 177 VOMIT, 178 WALK, 179 WARM, 180 WASH, 181 WATER, 182 WE, 183 WET, 184 WHAT, 185 WHEN, 186 WHERE, 187 WHITE, 188 WHO, 189 WIDE, 190 WIFE, 191 WIND, 192 WING, 193 WIPE, 194 WITH, 195 WOMAN, 196 WOODS, 197 WORM, 198 YE, 199 YEAR, 200 YELLOW, 201 BREAST, 202 CLAW/NAIL, 203 FLY/N, 204 HORN, 205 KNEE, 206 MOON
IE
2.03 5.28 5.05 0.38 0.35 3.28 1.67 1.00 0.58 0.19 2.46 1.40 2.55 0.46 0.62 0.86 1.80 0.70 2.28 0.46 0.08 0.01 2.49 0.97 0.09 0.34 1.34 2.32 0.01 1.65 2.64 0.71 2.13 0.54 0.01 2.18 0.06 0.16 0.07 1.30 0.01 1.47 1.14 0.34 1.81 3.71 1.04 0.78 2.08 0.68 0.34 2.12 0.65
PH
3.19 2.10 2.71 1.36 0.43 1.26 0.04 1.72 0.63 0.82 2.12 1.02 0.53 1.84 0.26 1.73 0.87 1.64 1.96 0.06 0.01 4.41 1.68 0.06 0.63 0.11 2.66 0.01 0.78 1.63 1.86 1.60 0.82 0.01 0.65 2.41 1.41 2.91 1.08 1.68 2.42 0.41 2.26 1.24 1.13 2.03 0.42 1.40 1.54 0.70 0.93 1.68
CU
1.33 1.41 1.11 1.17 1.97 1.69 0.35
0.51 0.82 0.36 0.16 1.50 0.28 1.75 2.34 0.90 0.03 2.45 0.32 1.06 0.50
2.65 0.42 1.42 1.51 2.20 0.56 0.09 1.19
Fig. 5.
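As a rough illustration of the renormalization and of the expected-value formulas given before the table, the following Python sketch may help. The group sizes (12, 34, 40, 60, 50) come from the text; the five group rates are hypothetical values chosen only so that their plain product is 1, not the rates estimated in this paper, and the function names are illustrative.

```python
import math

# Group sizes from the text: 12, 34, 40, 60 and 50 meanings (196 in all).
GROUP_SIZES = [12, 34, 40, 60, 50]


def renormalization_factor(rates, sizes=GROUP_SIZES):
    """Return c such that the rates divided by c satisfy
    prod(r_i ** n_i) = 1, i.e. a size-weighted geometric mean of 1."""
    log_c = sum(n * math.log(r) for n, r in zip(sizes, rates)) / sum(sizes)
    return math.exp(log_c)


def expected_fraction_grouped(t, rates, sizes=GROUP_SIZES):
    """E(P(t)) when each group of meanings shares a single rate."""
    weighted = sum(n * math.exp(-r * t) for n, r in zip(sizes, rates))
    return weighted / sum(sizes)


def expected_fraction_individual(t, rates):
    """E(P(t)) when every meaning has its own rate."""
    return sum(math.exp(-r * t) for r in rates) / len(rates)


# HYPOTHETICAL group rates (assumption): their plain product is 1.
rates = [0.25, 0.5, 1.0, 2.0, 4.0]

c = renormalization_factor(rates)
new_rates = [r / c for r in rates]  # divide every rate by c
t = 1.0
new_t = t * c                       # multiply every time by c

# Each product r * t is unchanged, so the model's predictions are unchanged:
# only the arbitrary time unit has been rescaled.
same = math.isclose(
    expected_fraction_grouped(t, rates),
    expected_fraction_grouped(new_t, new_rates),
)
print(round(c, 4), same)
```

With the paper's own fitted group rates, which are not reproduced here, the same computation would presumably give the factor 1.4409 quoted in the text.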
When plots of these two expected-value functions E(P(t)) (not displayed in this paper) are examined, we find that they go down through the bands in Figures 3 and 4, as they should, and that they display the same phenomenon very clearly. We note that our divergence times are still subject to correction and uncertainty of several types. For one thing, as we point out shortly, the divergence time estimates are subject to increasingly large uncertainty, even relatively, as the times increase. For another, a lot of data analysis remains to be done, and it could have quite significant effects on the times. Thus the simple fact that the Indo-European diagram extends to 6 time units, at an estimated duration of 2400 years per unit, does NOT indicate that we are making such absurd estimates as 6 × 2400 = 14,400 years for Indo-European, or
REPLACEMENT RATES FOR THREE FAMILIES
GRAPHICAL DISPLAY OF REPLACEMENT RATES FOR INDO-EUROPEAN (IE), PHILIPPINE (Ph), AND CUSHITIC (Cu). EACH MARK INDICATES ONE RATE. A FEW RATES LARGER THAN 3 ARE NOT SHOWN.
[Figure: one row of marks per family (Cu, Ph, IE) along a horizontal rate axis with ticks at 0, 0.3, 0.6, ..., 2.7.]
Fig. 6.
DISTRIBUTION FUNCTION OF REPLACEMENT RATES
THE CUMULATIVE DISTRIBUTION FUNCTION OF REPLACEMENT RATES FOR THREE FAMILIES: INDO-EUROPEAN (IE), PHILIPPINE (Ph), AND CUSHITIC (Cu). EACH CURVE SHOWS THE FRACTION OF RATES ≤ r FOR THAT FAMILY.