
CURRENT TRENDS IN LINGUISTICS

Edited by THOMAS A. SEBEOK

VOLUME III

Theoretical Foundations

Associate Editor: CHARLES A. FERGUSON
Assistant Editor: ALBERT VALDMAN
Assistant to the Editor: LUCIA SAUER

Second Printing

1970

MOUTON · THE HAGUE · PARIS

© Copyright 1966 by Mouton & Co., Publishers, The Hague, The Netherlands. No part of this book may be translated or reproduced in any form, by print, photoprint, microfilm, or any other means, without written permission from the publishers.

First Printing: 1966

LIBRARY OF CONGRESS CATALOG CARD NUMBER: 64.3663

Printed in The Netherlands by Mouton & Co., Printers, The Hague.

EDITOR'S INTRODUCTION

The Committee on Linguistic Information - the origins and some early activities of which were mentioned in the Editor's Introduction to the initial volume of this series - now functions as an advisory body to the Center for Applied Linguistics. Current Trends in Linguistics is, of course, only one of several coordinated avenues through which our Committee attempts to carry out its basic mandate: to facilitate the spontaneous circulation of information among linguists throughout the world, as well as between linguists on the one hand and consumers both from related segments of the scholarly community and, more widely, from among the interested lay public on the other.

In her generous and gratifying review of Vol. 1, Soviet and East European Linguistics (1963), Olga Akhmanova underlined "the fact that international scholarly communication is a bilateral affair. It implies cross-communication and cross-fertilization" (Word 21.180f. [1965]). This is a just characterization of the direction of our efforts and points to a primary impulse for launching the series. In an informal talk "On Linguistic Information", delivered at Georgetown University, Charles A. Ferguson, Director of the Center and, since 1963, also Chairman of the Committee, discussed the implications of this expression, within the scope of which he explicitly included "general statements of theory, how language works, or formalized or semi-formalized theories that have to do with one language or part of a language or all of human behavior" (Monograph Series on Languages and Linguistics 17.203 [1964]). The contents of this third volume are in keeping with this emphasis on Theoretical Foundations.

The following is an outline of the master plan for the Current Trends in Linguistics series, given here to indicate the position of the present volume in the cycle as this has been developed so far. Vol. 2, concurrently in press, is devoted to Linguistics in East Asia and South East Asia, and has been prepared with the collaboration of Associate Editors Y. R. Chao (University of California, Berkeley), Richard B. Noss (Foreign Service Institute), and Joseph K. Yamagiwa (University of Michigan); John R. Krueger (Indiana University) served as the Assistant Editor. Vol. 4, to appear next year, is devoted to Ibero-American and Caribbean Linguistics and is being prepared in collaboration with Associate Editors Robert Lado (Georgetown University), Norman A. McQuown (University of Chicago), and Sol Saporta (University of Washington); Yolanda Lastra (Georgetown University) serves as the Assistant Editor. Vol. 5, scheduled for publication in 1967, will deal with Linguistics in South Asia. The Associate Editors are Murray B. Emeneau (University of California, Berkeley) and Charles A. Ferguson; the Assistant Editors are Gerald B. Kelley (Cornell University) and Norman H. Zide (University of Chicago). Vol. 6, scheduled to appear in 1968, will deal with Linguistics in South West Asia and North Africa. The Associate Editors are Charles A. Ferguson, Carleton T. Hodge (Indiana University), and Herbert H. Paper (University of Michigan); the Assistant Editors are John R. Krueger and Gene M. Schramm (University of Michigan). Vols. 7 and 8 are expected to cover, respectively, Linguistics in Sub-Saharan Africa and Linguistics in Oceania; the editorial boards for these two have not yet been selected. Future volumes under consideration include one on Linguistics of North America, another on Linguistics of Western Europe, and perhaps two more, afterwards, to complete the first cycle: one assessing the scholarly literature on extinct languages not discussed in previous volumes, and another examining the impact of linguistics on related fields and vice versa (i.e., psycholinguistics and sociolinguistics, stylistics and metrics, the information sciences and documentation, language learning and teaching, animal communication, and the like).

Although the fundamental organizing principle of this series has been geopolitical - chiefly for reasons of convenience in arranging and displaying the data - several volumes with a theoretical instead of an areal orientation have been envisaged from the beginning, as was announced in the first volume and is exemplified in this third. The seven principal chapters which follow are, in fact, consolidated and expanded versions of talks delivered in the Trends in Linguistics Lecture Series at the 1964 Linguistic Institute of the Linguistic Society of America, held at Indiana University. During that summer, seven visiting linguists each spent one week lecturing at the Institute and informally discussing with students and faculty the content of their formal presentations: Noam Chomsky spoke on "Topics in the Theory of Generative Grammar", June 22, 23, 25, 26; Kenneth L. Pike on "Recent Developments in Tagmemic and Matrix Theory", June 29, 30, July 2, 3; Yakov Malkiel on "Problems in the Diachronic Analysis of Word Formation", July 6, 7, 9, 10; Uriel Weinreich on "Explorations in Semantic Theory", July 13, 14, 16, 17; C. F. Hockett on "The Stratificational View", July 23, 24, 27, 28; Joseph H. Greenberg on "Problems in the Study of Universals", August 2, 4, 6, 7; and Mary R. Haas on "The Genetic Relationship of Languages", August 10, 11, 12. On June 23, Robert Godel spoke on "F. de Saussure's Theory of Language", now published as Appendix I; and, on July 31, Edward Stankiewicz gave the Collitz Lecture, on "Slavic Morphophonemics in its Typological and Diachronic Aspects", included here as Appendix II. (For a complete listing of the 1964 Linguistic Institute special lectures - virtually all of which are appearing in several books now in press - see the Director's Report in Bulletin 38 of the Linguistic Society.)

The 1964 Linguistic Institute's Trends in Linguistics Lecture Series and, therefore, the production of this book were financed by a grant (GN313) from the National Science Foundation to the Indiana University Research Center in Anthropology, Folklore, and Linguistics, through the Indiana University Foundation. This assistance is acknowledged with pleasure.

The undersigned - in consultation with Ferguson (who was also Associate Director of the Linguistic Institute) and Albert Valdman (who served as its Assistant Director) - was responsible for the selection of the seven principal contributors to Theoretical Foundations, each of whom, however, had complete freedom of choice as to topic and the manner of its treatment. The coverage aimed at was, of course, in no sense meant to be exhaustive; obviously, there are other important currents in linguistic theory that are hardly alluded to in this volume and will need to be presented subsequently. The Editor is much obliged to Professors Godel and Stankiewicz for their permission to append the written versions of their respective lectures to this book and the double opportunity thus afforded to expand its horizons beyond its original limits.

The Index of Names and the Index of Terms were prepared under Ferguson's supervision, the latter by Rose Nash. Lucia Sauer facilitated the transformation of manuscripts into print with such extraordinary success that she herself underwent, in the process, a metamorphosis from Assistant to the Editor into employee of the publisher; Indiana's loss is The Netherlands' gain. The suggestion to make available selected chapters of this volume in separate, inexpensive booklets - particularly for the use of students - was first broached with the Editor by M. A. K. Halliday, and realized with the publishers through the cooperation of Cornelis H. van Schooneveld. Inquiries concerning this alternative format may be directed to Mouton & Co.

Bloomington, November 1, 1965

THOMAS A. SEBEOK

CONTENTS

EDITOR'S INTRODUCTION . . . . v

TOPICS IN THE THEORY OF GENERATIVE GRAMMAR, BY NOAM CHOMSKY
Topics in the Theory of Generative Grammar . . . . 1
Bibliography . . . . 58

LANGUAGE UNIVERSALS, BY JOSEPH H. GREENBERG
Language Universals . . . . 61
Appendix on Word Association . . . . 111

HISTORICAL LINGUISTICS AND THE GENETIC RELATIONSHIP OF LANGUAGES, BY MARY R. HAAS
1. Introductory Remarks . . . . 113
2. Protolanguages and Problems of Reconstruction . . . . 123
3. The Ranking of Protolanguages and Problems of Comparison at Deeper Levels . . . . 138
4. Problems of Classification . . . . 145
5. Supplemental Methods . . . . 152

LANGUAGE, MATHEMATICS, AND LINGUISTICS, BY CHARLES F. HOCKETT
0. Introduction . . . . 155
1. Mathematical Background . . . . 157
2. Linear Generative Grammars . . . . 182
3. Stepmatrices . . . . 203
4. From Phonons to the Speech Signal . . . . 233
5. Binary Tree Grammars . . . . 241
6. Conversion Grammars . . . . 256
7. Ensembles of Grammars . . . . 285

GENETIC ANALYSIS OF WORD FORMATION, BY YAKOV MALKIEL
Prefatory Note . . . . 305
I. The Place of Affixation in the Edifice of Language . . . . 307
II. The Genesis of a Derivational Suffix . . . . 323
III. Diffusion of Derivational Suffixes . . . . 333
IV. The Esthetic Dimension of Derivation . . . . 346
V. Prospects of Future Research . . . . 355

A GUIDE TO PUBLICATIONS RELATED TO TAGMEMIC THEORY, BY KENNETH L. PIKE
A Guide to Publications related to Tagmemic Theory . . . . 365
1. Relativity of Description and Theory to the Observer . . . . 366
2. Complementarity of Perspective . . . . 367
3. Field as Dimensional Matrix . . . . 367
4. Wave as Nucleus plus Margin . . . . 376
   A. In Phonology . . . . 377
   B. In Grammar . . . . 379
   C. In Lexicon and Culture . . . . 380
5. Particle as Structured Unit . . . . 380
   A. In Grammar Units . . . . 381
   B. In Phonology . . . . 389
   C. In Lexicon and Culture . . . . 390
Index . . . . 391
Appendix . . . . 393

EXPLORATIONS IN SEMANTIC THEORY, BY URIEL WEINREICH
1. Introduction . . . . 395
2. The Semantic Theory KF: A Critical Analysis . . . . 397
3. A New Semantic Theory . . . . 417
4. Concluding Remarks . . . . 467
Bibliography . . . . 474

APPENDIX I: F. de Saussure's Theory of Language, by Robert Godel . . . . 479

APPENDIX II: Slavic Morphophonemics in its Typological and Diachronic Aspects, by Edward Stankiewicz . . . . 495

BIOGRAPHICAL NOTES . . . . 521

INDEX OF TERMS . . . . 525

INDEX OF NAMES . . . . 533

TOPICS IN THE THEORY OF GENERATIVE GRAMMAR*

NOAM CHOMSKY

I

My original intention was to use these lectures to present some recent work on general linguistic theory and on the structure of English, within the general framework of transformational generative grammar. However, a sequence of recent publications has indicated that many points that I had hoped to take for granted are widely regarded as controversial, and has also indicated misunderstanding, on a rather substantial scale, of the general framework I had expected to presuppose — in particular, a misunderstanding as to which elements of this framework express substantive assumptions about the nature of language and are, therefore, matters of legitimate controversy and rational discussion, and which, on the other hand, relate only to questions of goals and interests and are therefore no more subject to debate than the question: is chemistry right or wrong? In the light of this, it seems advisable to change my original plan and to spend much more time on background assumptions and general questions of various sorts than I had at first intended. I still hope to be able to incorporate an exposition (much abbreviated) of some recent work, but I will lead up to it more slowly, in the following steps:

1. discussion of general background assumptions and goals that underlie and motivate much of the work in generative grammar of the past decade;
2. discussion of various objections to this general point of view that seem to me to be based on error, misunderstanding, or equivocation of one sort or another;
3. presentation of a theory of generative grammar of a sort exemplified, for example, in N. Chomsky, Syntactic structures (The Hague, 1957), R. B. Lees, The Grammar of English nominalizations (Bloomington, 1960), M. Halle, "Phonology in a generative grammar", Word 18.54-72 (1962), and J. Katz and J. Fodor, "The Structure of a semantic theory", Lg. 39.170-210 (1963);
4. discussion of various real inadequacies that have been exposed in this position in work of the past half-dozen years; and
5. sketch of a refined and improved version of this theory, designed to overcome these difficulties.

* This work was supported in part by a grant from the National Institutes of Health, No. MH-05129-04, to Harvard University, Center for Cognitive Studies, and in part by a grant from the American Council of Learned Societies.

I will try to cover these points in the first three sections, concentrating largely on syntax. Section I will deal with the first point, section II with the second, and section III with the third, fourth and fifth. In the final section I will discuss an approach to the study of sound structure that has been gradually evolving since Chomsky, Halle, and F. Lukoff, "On accent and juncture in English", For Roman Jakobson, eds. M. Halle, H. Lunt, and H. MacLean 65-80 (The Hague, 1956), and has been presented in various stages of development in publications of Halle's and mine (listed in the bibliography below) since then, and will, hopefully, soon emerge to full light of day in a book that is now in active preparation. In the course of this presentation, I will also discuss a few criticisms of this approach. The discussion of criticisms will be very brief, however, since Halle and I have discussed most of them, insofar as they are known to us, in considerable detail elsewhere.1

1 In particular, see Chomsky, Current issues in linguistic theory 31, 105-7 (The Hague, 1964), which deals with criticisms in C. A. Ferguson's review of Halle, The sound pattern of Russian (The Hague, 1959); and in Chomsky and Halle, "Some controversial questions in phonological theory", Journal of Linguistics 1.97-138 (1965), which deals with objections raised by F. W. Householder jr., "On some recent claims in phonological theory", Journal of Linguistics 1/1 (1965).

In general, this article contains no new or original material. It is intended only as an informal guide to other books and papers,2 in which questions touched on here are dealt with more thoroughly, and as an attempt to clarify issues that have been raised in critical discussion. In the course of this paper I will also make a few remarks about historical backgrounds for the position that will be outlined.3

2 E.g. Katz and P. Postal, An integrated theory of linguistic description (Cambridge, Mass., 1964); Chomsky, Current issues in linguistic theory, and Aspects of the theory of syntax (Cambridge, Mass., 1965).

3 This matter is discussed in more detail in Chomsky, Current issues in linguistic theory, § 1, in Aspects of the theory of syntax, Ch. 1, § 8, and in Cartesian linguistics (to appear).

Quite a few commentators have assumed that recent work in generative grammar is somehow an outgrowth of an interest in the use of computers for one or another purpose, or that it has some other engineering motivation, or that it perhaps constitutes some obscure branch of mathematics. This view is incomprehensible to me, and it is, in any event, entirely false. Much more perceptive are those critics who have described this work as in large measure a return to the concerns and often even the specific doctrines of traditional linguistic theory. This is true — apparently to an extent that many critics do not realize.4 I differ from them only in regarding this observation not as a criticism, but rather as a definite merit of this work. That is, it seems to me that it is the modern study of language prior to the explicit study of generative grammar that is seriously defective in its failure to deal with traditional questions and, furthermore, to recognize the essential correctness of many of the traditional answers and the extent to which they provide a fruitful basis for current research.

4 To cite just one example, consider A. Reichling, "Principles and methods of syntax: cryptanalytical formalism", Lingua 10.1-17 (1961), who asserts that obviously I could not 'be said to sympathize with such a "mentalistic monster" as the "innere Sprachform"'. But in fact the work that he is discussing is quite explicitly and selfconsciously mentalistic (in the traditional, not the Bloomfieldian, sense of this word — that is, it is an attempt to construct a theory of mental processes), and it can, furthermore, be quite accurately described as an attempt to develop further the Humboldtian notion of 'form of language' and its implications for cognitive psychology, as will surely be evident to anyone familiar both with Humboldt and with recent work in generative grammar (for explicit discussion, see the references cited above). I will not consider Reichling's criticisms of generative grammar here. The cited remark is just one illustration of his complete lack of comprehension of the goals, concerns, and specific content of the work that he was discussing, and his discussion is based on such gross misrepresentation of this work that comment is hardly called for.

A distinction must be made between what the speaker of a language knows implicitly (what we may call his competence) and what he does (his performance). A grammar, in the traditional view, is an account of competence. It describes and attempts to account for the ability of a speaker to understand an arbitrary sentence of his language and to produce an appropriate sentence on a given occasion. If it is a pedagogic grammar, it attempts to provide the student with this ability; if a linguistic grammar, it aims to discover and exhibit the mechanisms that make this achievement possible. The competence of the speaker-hearer can, ideally, be expressed as a system of rules that relate signals to semantic interpretations of these signals. The problem for the grammarian is to discover this system of rules; the problem for linguistic theory is to discover general properties of any system of rules that may serve as the basis for a human language, that is, to elaborate in detail what we may call, in traditional terms, the general form of language that underlies each particular realization, each particular natural language.

Performance provides evidence for the investigation of competence. At the same time, a primary interest in competence entails no disregard for the facts of performance and the problem of explaining these facts. On the contrary, it is difficult to see how performance can be seriously studied except on the basis of an explicit theory of the competence that underlies it, and, in fact, contributions to the understanding of performance have largely been by-products of the study of grammars that represent competence.5

5 For discussion, see G. A. Miller and Chomsky, "Finitary models of language users", Handbook of mathematical psychology, Vol. II, eds. R. D. Luce, R. Bush, and E. Galanter (New York, 1963); Chomsky, Aspects of the theory of syntax, Ch. 1, § 2.

Notice, incidentally, that a person is not generally aware of the rules that govern sentence-interpretation in the language that he knows; nor, in fact, is there any reason to suppose that the rules can be brought to consciousness. Furthermore, there is no reason to expect him to be fully aware even of the empirical consequences of these internalized rules — that is, of the way in which signals are assigned semantic interpretations by the rules of the language that he knows (and, by definition, knows perfectly). On the difficulties of becoming aware of one's own linguistic intuitions, see the discussion in Chomsky, Aspects of the theory of syntax, Ch. 1, § 4. It is important to realize that there is no paradox in this; in fact, it is precisely what should be expected.

Current work in generative grammar has adopted this traditional framework of interests and concerns. It attempts to go beyond traditional grammar in a fundamental way, however. As has repeatedly been emphasized, traditional grammars make an essential appeal to the intelligence of the reader. They do not actually formulate the rules of the grammar, but rather give examples and hints that enable the intelligent reader to determine the grammar, in some way that is not at all understood. They do not provide an analysis of the 'faculté de langage' that makes this achievement possible. To carry the study of language beyond its traditional bounds, it is necessary to recognize this limitation and to develop means to transcend it. This is the fundamental problem to which all work in generative grammar has been addressed.

The most striking aspect of linguistic competence is what we may call the 'creativity of language', that is, the speaker's ability to produce new sentences, sentences that are immediately understood by other speakers although they bear no physical resemblance to sentences which are 'familiar'. The fundamental importance of this creative aspect of normal language use has been recognized since the seventeenth century at least, and it was at the core of Humboldtian general linguistics. Modern linguistics, however, is seriously at fault in its failure to come to grips with this central problem. In fact, even to speak of the hearer's 'familiarity with sentences' is an absurdity. Normal use of language involves the production and interpretation of sentences that are similar to sentences that have been heard before only in that they are generated by the rules of the same grammar, and thus the only sentences that can in any serious sense be called 'familiar' are clichés or fixed formulas of one sort or another. The extent to which this is true has been seriously underestimated even by those linguists (e.g. O. Jespersen) who have given some attention to the problem of creativity. This is evident from the common description of language use as a matter of 'grammatical habit' [e.g. O. Jespersen, Philosophy of grammar (London, 1924)]. It is important to recognize that there is no sense of 'habit' known to psychology in which this characterization of language use is true (just as there is no notion of 'generalization' known to psychology or philosophy that entitles us to characterize the new sentences of ordinary linguistic usage as generalizations of previous performance). The familiarity of the reference to normal language use as a matter of 'habit' or as based on 'generalization' in some fundamental way must not blind one to the realization that these characterizations are simply untrue if terms are used in any technical or well-defined sense, and that they can be accepted only as metaphors — highly misleading metaphors, since they tend to lull the linguist into the entirely erroneous belief that the problem of accounting for the creative aspect of normal language use is not after all a very serious one.

Returning now to the central topic, a generative grammar (that is, an explicit grammar that makes no appeal to the reader's 'faculté de langage' but rather attempts to incorporate the mechanisms of this faculty) is a system of rules that relate signals to semantic interpretations of these signals. It is descriptively adequate to the extent that this pairing corresponds to the competence of the idealized speaker-hearer. The idealization is (in particular) that in the study of grammar we abstract away from the many other factors (e.g., memory limitations, distractions, changes of intention in the course of speaking, etc.) that interact with underlying competence to produce actual performance.

If a generative grammar is to pair signals with semantic interpretations, then the theory of generative grammar must provide a general, language-independent means for representing the signals and semantic interpretations that are interrelated by the grammars of particular languages. This fact has been recognized since the origins of linguistic theory, and traditional linguistics made various attempts to develop theories of universal phonetics and universal semantics that might meet this requirement. Without going into any detail, I think it would be widely agreed that the general problem of universal phonetics is fairly well-understood (and has been, in fact, for several centuries), whereas the problems of universal semantics still remain veiled in their traditional obscurity. We have fairly reasonable techniques of phonetic representation that seem to approach adequacy for all known languages, though, of course, there is much to learn in this domain. In contrast, the immediate prospects for universal semantics seem much more dim, though surely this is no reason for the study to be neglected (quite the opposite conclusion should, obviously, be drawn). In fact, recent work of Katz, Fodor, and Postal, to which I return in the third section, seems to me to suggest new and interesting ways to reopen these traditional questions.

The fact that universal semantics is in a highly unsatisfactory state does not imply that we must abandon the program of constructing grammars that pair signals and semantic interpretations. For although there is little that one can say about the language-independent system of semantic representation, a great deal is known about conditions that semantic representations must meet, in particular cases. Let us then introduce the neutral technical notion of 'syntactic description', and take a syntactic description of a sentence to be an (abstract) object of some sort, associated with the sentence, that uniquely determines its semantic interpretation (the latter notion being left unspecified pending further insights into semantic theory)6 as well as its phonetic form. A particular linguistic theory must specify the set of possible syntactic descriptions for sentences of a natural language. The extent to which these syntactic descriptions meet the conditions that we know must apply to semantic interpretations provides one measure of the success and sophistication of the grammatical theory in question. As the theory of generative grammar has progressed, the notion of syntactic description has been clarified and extended. I will discuss below some recent ideas on just what should constitute the syntactic description of a sentence, if the theory of generative grammar is to provide descriptively adequate grammars.

6 Working in this framework then, we would regard a semantically ambiguous minimal element as constituting two distinct lexical entries; hence two syntactic descriptions might differ only in that they contain different members of a pair of homonymous morphemes.
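The point of note 6 can be pictured as a property of the lexicon. The following is a minimal sketch (in Python, purely as an illustration; the form of the entries and the example 'bank', which is not discussed in the text, are invented for the purpose):

# Hypothetical lexicon fragment: a semantically ambiguous minimal element
# is entered twice, so that two syntactic descriptions of one signal can
# differ solely in which of the homonymous entries they contain.
lexicon = {
    "bank/1": {"category": "N", "sense": "financial institution"},
    "bank/2": {"category": "N", "sense": "edge of a river"},
}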

Notice that a syntactic description (henceforth, SD) may convey information about a sentence beyond its phonetic form and semantic interpretation. Thus we should expect a descriptively adequate grammar of English to express the fact that the expressions (1)-(3) are ranked in the order given in terms of 'degree of deviation' from English, quite apart from the question of how interpretations can be imposed on them [in the case of (2) and (3)]:

(1) the dog looks terrifying
(2) the dog looks barking
(3) the dog looks lamb

A generative grammar, then, must at least determine a pairing of signals with SD's; and a theory of generative grammar must provide a general characterization of the class of possible signals (a theory of phonetic representation) and the class of possible SD's. A grammar is descriptively adequate to the extent that it is factually correct in a variety of respects, in particular, to the extent that it pairs signals with SD's that do in fact meet empirically given conditions on the semantic interpretations that they support. For example, if a signal has two intrinsic semantic interpretations in a particular language [e.g., (4) or (5), in English], a grammar of this language will approach descriptive adequacy if it assigns two SD's to the sentence, and, beyond this, it will approach descriptive adequacy to the extent that these SD's succeed in expressing the basis for the ambiguity.

(4) they don't know how good meat tastes
(5) what disturbed John was being disregarded by everyone

In the case of (4), for example, a descriptively adequate grammar must not only assign two SD's to the sentence but must also do so in such a way that in one of these the grammatical relations of good, meat, and taste are as in 'meat tastes good', while in the other they are as in 'meat which is good tastes Adjective' (where the notion 'grammatical relation' is to be defined in a general way within the linguistic theory in question), this being the basis for the alternative semantic interpretations that may be assigned to this sentence. Similarly, in the case of (5), it must assign to the pair disregard-John the same grammatical relation as in 'everyone disregards John', in one SD; whereas in the other it must assign this very same relation to the pair disregard-what (disturbed John), and must assign no semantically functional grammatical relation at all to disregard-John. On the other hand, in the case of (6) and (7) only one SD should be assigned by a descriptively adequate grammar. This SD should, in the case of (6), indicate that John is related to incompetent as it is in 'John is incompetent' and that John is related to regard (as incompetent) as it is in 'everyone regards John as incompetent'. In the case of (7), the SD must indicate that our is related to regard (as incompetent) as us is related to regard (as incompetent) in 'everyone regards us as incompetent'.

(6) what disturbed John was being regarded as incompetent by everyone.
(7) what disturbed John was our being regarded as incompetent by everyone.

Similarly, in the case of (8), the grammar must assign four distinct SD's, each of which specifies the system of grammatical relations that underlies one of the distinct semantic interpretations of this sentence:

(8) the police were ordered to stop drinking after midnight.

Examples such as these should suffice to illustrate what is involved in the problem of constructing descriptively adequate generative grammars and developing a theory of grammar that analyzes and studies in full generality the concepts that appear in these particular grammars. It is quite evident from innumerable examples of this sort that the conditions on semantic interpretations are sufficiently clear and rich so that the problem of defining the notion 'syntactic description' and developing descriptively adequate grammars (relative to this notion of SD) can be made quite concrete, despite the fact that the notion 'semantic interpretation' itself still resists any deep analysis. We return to some recent ideas on semantic interpretation of SD's in section III.

A grammar, once again, must pair signals and SD's. The SD assigned to a signal must determine the semantic interpretation of the signal, in some way which, in detail, remains unclear. Furthermore, each SD must uniquely determine the signal of which it is the SD (uniquely, that is, up to free variation). Hence the SD must (i) determine a semantic interpretation and (ii) determine a phonetic representation. Let us define the 'deep structure of a sentence' as that aspect of the SD that determines its semantic interpretation, and the 'surface structure of a sentence' as that aspect of the SD that determines its phonetic form. A grammar, then, must consist of three components: a syntactic component, which generates SD's each of which consists of a surface structure and a deep structure; a semantic component, which assigns a semantic interpretation to a deep structure; and a phonological component, which assigns a phonetic interpretation to a surface structure. Thus the grammar as a whole will associate phonetic representations and semantic interpretations, as required, this association being mediated by the syntactic component that generates deep and surface structures as elements of SD's.

The notions 'deep structure' and 'surface structure' are intended as explications of the Humboldtian notions 'inner form of a sentence' and 'outer form of a sentence' (the general notion 'form' is probably more properly to be related to the notion 'generative grammar' itself — cf. Chomsky, Current issues in linguistic theory, for discussion). The terminology is suggested by the usage familiar in contemporary analytic philosophy [cf., for example, Wittgenstein, Philosophical investigations 168 (Oxford, 1953)]. C. F. Hockett has also used these terms [A course in modern linguistics, Ch. 29 (New York, 1958)] in roughly the same sense.

There is good reason (see below, section IV) to suppose that the surface structure of a sentence is a labeled bracketing that segments it into its continuous constituents, categorizes these, segments the constituents into further categorized constituents, etc. Thus underlying (6), for example, is a surface structure that analyzes it into its constituents (perhaps, 'what disturbed John', 'was', 'being regarded as incompetent by everyone'), assigning each of these to a certain category indicated by the labeling, then further segmenting each of these into its constituents (e.g., perhaps, 'what disturbed John' into 'what' and 'disturbed John'), each of these being assigned to a category indicated by the labeling, etc., until ultimate constituents are reached. Information of this sort is, in fact, necessary to determine the phonetic representation of this sentence. The labeled bracketing can be presented in a tree-diagram, or in other familiar notations.

It is clear, however, that the deep structure must be quite different from this surface structure. For one thing, the surface representation in no way expresses the grammatical relations that are, as we have just observed, crucial for semantic interpretation. Secondly, in the case of an ambiguous sentence such as, for example, (5), only a single surface structure may be assigned, but the deep structures must obviously differ. Such examples as these are sufficient to indicate that the deep structure underlying a sentence cannot be simply a labeled bracketing of it. Since there is good evidence that the surface structure should, in fact, simply be a labeled bracketing, we conclude that deep structures cannot be identified with surface structures. The inability of surface structure to indicate semantically significant grammatical relations (i.e., to serve as deep structure) is one fundamental fact that motivated the development of transformational generative grammar, in both its classical and modern varieties.

In summary, a full generative grammar must consist of a syntactic, semantic, and phonological component. The syntactic component generates SD's each of which contains a deep structure and a surface structure. The semantic component assigns a semantic interpretation to the deep structure and the phonological component assigns a phonetic interpretation to the surface structure. An ambiguous sentence has several SD's, differing in the deep structures that they contain (though the converse need not be true).

So far I have said little that is in any way controversial. This discussion has so far simply delimited a certain domain of interest and a certain class of problems, and has suggested a natural framework for dealing with these problems. The only substantive comments (i.e. factual assertions) that I have so far made within this framework are that the surface structure is a labeled bracketing and that deep structures must be distinct from surface structures. The first of these assertions is well-supported (see below), and would probably be widely accepted. The second is surely much too obvious to require elaborate defense. To go on from here to develop a substantive linguistic theory we must provide:

(9) (i) theories of phonetic and semantic representation
    (ii) a general account of the notion 'syntactic description'
    (iii) a specification of the class of potential generative grammars
    (iv) a general account of how these grammars function, that is, how they generate SD's and assign to them phonetic and semantic interpretations, thus pairing phonetically represented signals with semantic interpretations.
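To make (9ii), and the notion of labeled bracketing discussed above, concrete: such a bracketing can be written out as a nested structure. The following is a minimal sketch in Python (an illustration only; the category labels S, NP, VP, etc. are assumptions supplied for the example, not an analysis defended in the text), applied to sentence (6) with the segmentation suggested above:

# Hypothetical labeled bracketing for (6): each node is a tuple whose first
# member is a category label and whose remaining members are constituents;
# ultimate constituents are bare words.

def words(node):
    """Recover the terminal string that a labeled bracketing analyzes."""
    if isinstance(node, str):
        return [node]
    label, *children = node
    return [w for child in children for w in words(child)]

surface_6 = ("S",
             ("NP", ("Wh", "what"),
                    ("VP", ("V", "disturbed"), ("NP", "John"))),
             ("Aux", "was"),
             ("Pred", "being", "regarded", "as", "incompetent",
                      "by", "everyone"))

assert " ".join(words(surface_6)) == \
    "what disturbed John was being regarded as incompetent by everyone"

The sketch also makes the limitation visible: such an object records only segmentation and categorization, and nowhere encodes the grammatical relations that, as noted in connection with (5)-(7), the deep structure must supply.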

Before going on to discuss these substantive questions, let us reassure ourselves about the uncontroversial character of what has preceded. Is there, in fact, anything in this account to which exception can be taken? Surely there is no conceivable question about the necessity for distinguishing competence from performance in the way suggested above. Having made this distinction, one may or may not choose to be interested in the general question of accounting for linguistic competence. If one chooses to concern himself with this question, he must immediately face the fact of 'creativity' and must therefore focus attention on the problem of constructing generative grammars. It is difficult to see how a full generative grammar can be regarded, ultimately, as anything other than a system of rules that relate signals to semantic interpretations; and, having set this goal, one is immediately faced with the problem of developing a rich enough notion of 'syntactic description' to support phonetic interpretation, on the one side, and semantic interpretation, on the other. The distinction between deep and surface structure emerges from even the most superficial examination of real linguistic material. Hence the conclusions outlined so far seem inescapable if the problem of studying linguistic competence is taken up.

Notice that a substantive linguistic theory involves a specification of (9iv) as well as (9iii). For example, an essential part of the theory of phrase structure grammar is a particular specification of how categories and relations are determined for generated strings (see Chomsky, Logical structure of linguistic theory, Cambridge, 1955, chapter VI), and such a specification has been presupposed whenever this theory has been investigated. A change in this specification is as much a revision of the theory as a change in the specification of the class (9iii) of potential grammars. Failure to understand this leads to immediate absurdities. Thus if one thinks of the theory of 'phrase structure grammar' with the technique of interpretation (9iv) left free, one can easily prove that a phrase structure grammar of the language L assigns to sentences of L the structural descriptions assigned by some transformational grammar of L, etc. This point should be obvious without further discussion.

Suppose that one chooses not to study linguistic competence (and, concomitantly, linguistic performance within the framework of a theory of competence). One might, alternatively, choose to limit attention to performance, or to surface structures, or to sound patterns in isolation from syntactic structure, or to voiced fricatives, or the first halves of sentences. The only question that arises, if any of these proposals is adopted, is whether any interesting result is likely to be attainable under such arbitrary limitations of subject matter. In each of the cited cases it seems quite unlikely. It is, in general, unclear why anyone should insist on studying an isolated aspect of the general problem of grammatical description unless there is some reason to believe that this is not affected by the character of other aspects of grammar.7

7 Perhaps this matter can be clarified by considering examples of the latter sort. Thus, for example, it is quite reasonable to study semantics in isolation from phonology or phonology in isolation from semantics, since, at the moment, there seems to be no non-trivial relation between the systems of phonological and semantic interpretation and no significant way in which semantic considerations can play a role in phonology or phonological considerations in semantics. Similarly, it seems quite reasonable to develop a theory of syntactic structure with no primitive notions of an essentially semantic nature, since, at the moment, there is no reason to assume that a priori semantic concepts play a role in determining the organization of the syntactic component of a grammar. On the other hand, it would be absurd to study semantics (and similarly, it seems to me, phonology) in isolation from syntax, since the semantic interpretation of a sentence (similarly, its phonetic interpretation) depends in an essential way on its deep (respectively, surface) structure. And it would be absurd to develop general syntactic theory without assigning an absolutely crucial role to semantic considerations, since obviously the necessity to support semantic interpretation is one of the primary requirements that the structures generated by the syntactic component of a grammar must meet. For discussion of these points, see Chomsky (Syntactic structures; Current issues in linguistic theory), Lees, Review of Chomsky, Syntactic structures, Lg. 33.375-408 (1957), Katz and Postal (An integrated theory of linguistic description), and many other references. Far too little care has been taken in the discussion of these questions in modern linguistics. As a result, there has been much confusion about them, and many dogmatic claims have been voiced and repeatedly echoed with no attempt to justify or support them by serious argument. The issues are important; while no answers to any of these questions can be given with any certainty, the tentative position that the linguist accepts may have an important influence on the character of the work that he does.

I have been discussing so far only the question of descriptive adequacy of grammars and the problem of developing a linguistic theory that will provide the basis for the construction of descriptively adequate grammars. As has been repeatedly emphasized, however [see, e.g. Chomsky, Syntactic structures; "Explanatory models in linguistics", Logic, methodology, and philosophy of science, eds. E. Nagel, P. Suppes, and A. Tarski 528-50 (Stanford, 1962); Current issues in linguistic theory, and Aspects of the theory of syntax], the goals of linguistic theory can be set much higher than this; and, in fact, it is a prerequisite even for the study of descriptive adequacy that they be set higher than this. It is essential also to raise the question of 'explanatory adequacy' of linguistic theory. The nature of this question can be appreciated readily in terms of the problem of constructing a hypothetical language-acquisition device AD that can provide as 'output' a descriptively adequate grammar G for the language L on the basis of certain primary linguistic data from L as an input; that is, a device represented schematically as (10):

(10) primary linguistic data -> |AD| -> G

We naturally want the device AD to be language-independent — that is, capable of learning any human language and only these. We want it, in other words, to provide an implicit definition of the notion 'human language'. Were we able to develop the specifications for a language-acquisition device of this sort, we could realistically claim to be able to provide an explanation for the linguistic intuition — the tacit competence — of the speaker of a language. This explanation would be based on the assumption that the specifications of the device AD provide the basis for language-acquisition, primary linguistic data from some language providing the empirical conditions under which the development of a generative grammar takes place. The difficulties of developing an empirically adequate language-independent specification of AD are too obvious to require extended discussion; the vital importance of raising this problem and pursuing it intensively at every stage of linguistic investigation also seems to me entirely beyond the possibility of debate (cf. the references cited above for elaboration of this point).

To pursue the study of explanatory adequacy, we may proceed in two parallel ways. First, we must attempt to provide as narrow a specification of the aspects of linguistic theory listed in (9) as is compatible with the known diversity of languages — we must, in other words, develop as rich a hypothesis concerning linguistic universals as can be supported by available evidence. This specification can then be attributed to the system AD as an intrinsic property. Second, we may attempt to develop a general evaluation procedure, as an intrinsic property of AD, which will enable it to select a particular member of the class of grammars that meet the specifications (9) (or, conceivably, to select a small set of alternatives, though this abstract possibility is hardly worth discussing for the present) on the basis of the presented primary linguistic data. This procedure will then enable the device to select one of the a priori possible hypotheses - one of the permitted grammars - that is compatible with the empirically given data from a given language. Having selected such a hypothesis, it has 'mastered' the language described by this grammar (and it thus knows a great deal beyond what it has explicitly 'learned'). Given a linguistic theory that specifies (9) and an evaluation procedure, we can explain some aspect of the speaker's competence whenever we can show with some plausibility that this aspect of his competence is determined by the most highly valued grammar of the permitted sort that is compatible with data of the kind to which he has actually been exposed.

Notice that an evaluation procedure (simplicity measure, as it is often called in technical discussion) is itself an empirical hypothesis concerning universal properties of language; it is, in other words, a hypothesis, true or false, about the prerequisites for language-acquisition. To support or refute this hypothesis, we must consider evidence as to the factual relation between primary linguistic data and descriptively adequate grammars. We must ask whether the proposed evaluation procedure in fact can mediate this empirically given relation. An evaluation procedure, therefore, has much the status of a physical constant; in particular, it is impossible to support or reject a specific proposal on the basis of a priori argument.

Once again, it is important to recognize that there is nothing controversial in what has just been said. One may or may not choose to deal with the problem of explanatory adequacy. One who chooses to overlook this problem may (and, in my opinion, surely will) find that he has eliminated from consideration one of the most important sources of evidence bearing on the problems that remain (in particular, the problem of descriptive adequacy).8 His situation, then, may be quite analogous to that of the person who has decided to limit his attention to surface structures (to the exclusion of deep structures) or to first halves of sentences. He must show that the delimitation of interest leaves him with a viable subject. But, in any event, he surely has no basis for objecting to the attempt on the part of other linguists to study the general question of which he has (artificially, in my opinion) delimited one facet.

8 The reason for this is quite simple. Choice of a descriptively adequate grammar for the language L is always much underdetermined (for the linguist, that is) by data from L. Other relevant data can be adduced from study of descriptively adequate grammars of other languages, but only if the linguist has an explanatory theory of the sort just sketched. Such a theory can receive empirical support from its success in providing descriptively adequate grammars for other languages. Furthermore, it prescribes, in advance, the form of the grammar of L and the evaluation procedure that leads to the selection of this grammar, given data. In this way, it permits data from other languages to play a role in justifying the grammar selected as an empirical hypothesis concerning the speakers of L. This approach is quite natural. Following it, the linguist comes to a conclusion about the speakers of L on the basis of an independently supported assumption about the nature of language in general — an assumption, that is, concerning the general 'faculté de langage' that makes language-acquisition possible. The general explanatory theory of language and the specific theory of a particular language that results from application of the general theory to data each have psychological content, the first as a hypothesis about innate mental structure, the second as a hypothesis about the tacit knowledge that emerges with exposure to appropriate experience.

I hope that these remarks will be sufficient to show the complete pointlessness of much of the debate over the specific evaluation procedures (simplicity measures) that have been proposed as empirical hypotheses concerning the form of language in the course of work in generative grammar. To mention just one example, consider Householder's criticism (Householder, "On some recent claims in phonological theory") of several proposals of Halle's regarding an appropriate evaluation procedure for phonology. Halle presented a certain theory of phonological processes, including, as an essential part, a certain empirical hypothesis regarding a simplicity measure. A crucial aspect of this theory was its complete reliance on distinctive features in the formulation of phonological rules, to the exclusion of any 'segmental' notation (e.g., phonemic notation) except as an informal expository device. His evaluation measure involved minimization of features in the lexicon and the phonological rules. In support of this theory he showed that a variety of facts can be explained on these assumptions. He also discussed alternative theories that use segmental notation along with or instead of feature notation and gave several arguments to show that under these assumptions it is difficult to see how any empirically valid evaluation measure can be formulated — in particular, he showed how various rather natural measures involving minimization or maximization fail on empirical grounds. Householder makes no attempt to refute these arguments but simply objects to them because they fail to meet certain a priori conditions that he arbitrarily imposes on any notion of 'evaluation procedure', in particular, the requirement that such a procedure must favor grammars that use fewer symbols and that are easy for the linguist to read. Since the grammars that Halle proposes, with their consistent reliance on feature representation, require more symbols than grammars that use auxiliary symbols as abbreviations for feature sets, and since Halle's grammars are (Householder claims) not easy to read, he concludes that the theory on which they are based must be mistaken. But clearly a priori arguments of this sort have no bearing on an empirical hypothesis about the nature of language (i.e. about the structure of a general language-acquisition device of the sort described above). Consequently, Householder's critique has no relevance to any issue that Halle discusses. Unfortunately, much of the criticism of recent attempts to develop valid evaluation measures is based on similar presuppositions.

Notice, incidentally, that there is an interesting but poorly understood sense in which one can talk of the 'simplicity' or 'elegance' or 'naturalness' of a theory (of language, of the chemical bond, etc.), but this 'absolute' sense of simplicity has no clear relevance to the attempt to develop an evaluation measure (a simplicity measure) as a part of a theory of grammar. Such a theory is an empirical hypothesis, true or false, proposed to account for some domain of linguistic fact. The 'simplicity measure' that it contains is a constituent part of this empirical hypothesis. This distinction between 'simplicity' as an absolute notion of general epistemology and 'simplicity' as a part of a theory of grammar has been repeatedly emphasized; confusion regarding this point has, nevertheless, been quite widespread. Failure to make this distinction vitiates most of the criticism of evaluation procedures that has appeared in recent years.
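The logic of (10) and of an evaluation procedure can be given a schematic rendering. The following sketch (in Python, as an illustration only; the encoding of grammars, the compatibility test, and the counting convention are assumptions for the example, not Halle's actual formulation) shows the shape of a device that selects, among candidate grammars compatible with the primary data, the one most highly valued by a feature-minimizing measure:

# Schematic acquisition device of (10): primary linguistic data -> AD -> G.
# A 'grammar' is assumed here to be a dict with a lexicon and a list of
# rules, each entry and rule given as a collection of feature specifications.

def feature_count(grammar):
    """Evaluation measure in the spirit described above: total number of
    feature specifications in the lexicon and the rules (fewer = more
    highly valued)."""
    lexicon_cost = sum(len(feats) for feats in grammar["lexicon"].values())
    rule_cost = sum(len(rule) for rule in grammar["rules"])
    return lexicon_cost + rule_cost

def acquisition_device(primary_data, candidate_grammars, is_compatible):
    """Select the most highly valued grammar compatible with the data."""
    viable = [g for g in candidate_grammars
              if is_compatible(g, primary_data)]
    return min(viable, key=feature_count)

The sketch makes the argument of the preceding paragraphs visible: the measure is an empirical hypothesis about language acquisition, to be judged by whether it selects descriptively adequate grammars from realistic primary data, not by a priori standards such as brevity or readability on the page.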

II

I have concluded part 1 of the outline presented in the introductory remarks to this paper and would now like to turn to part 2, namely, to various objections that have been raised against the position sketched above. I have tried to indicate why I think any such objections must be mistaken, by attempting to show that the position is really quite uncontroversial. Perhaps further clarification can be achieved through a more detailed examination of some of these objections.

Consider first the Reichling-Uhlenbeck criticisms.9 Their view is, apparently, that the linguist must limit himself to what I called above 'surface structure', in fact, to certain restricted aspects of surface structure. They observe that a sentence is a linear sequence of elements which are grouped into units and then into larger units 'according to certain rules'.10 The only clues to the relational structure of an utterance are intonation, arrangement, and phonetic signals (by 'arrangement', I presume they mean linear ordering and grouping of units into larger units). This grouping of units into larger units, these into larger units, etc., defines the 'fundamental aspects of the utterance'. 'It is impossible to conceive of other types of syntagmatic indications.' One must avoid the making of distinctions that are not present (presumably, this means 'formally marked') in the linguistic data. Thus, for example, transitive and intransitive verbs cannot be distinguished in English (cf. Uhlenbeck, 17) — it is, therefore, impossible to distinguish 'John compelled' or 'Bill elapsed John' from 'John compelled Bill to leave' or 'a week elapsed', on any syntactic grounds, in their view. Restricting ourselves to the consideration of surface structure, such expressions as (11) can correctly be described as syntactically ambiguous, but not the expressions of (12) (Uhlenbeck, 9):

(11) (i) old men and women
     (ii) they are flying planes

(12) (i) the shooting of the hunters
     (ii) John was frightened by the new methods.

9 Reichling, op. cit.; E. M. Uhlenbeck, "An appraisal of transformational theory", Lingua 12.1-18 (1963). I will concentrate on the latter, since, as noted above, Reichling's remarks are based on an account of 'generative grammar' that has little identifiable relation to any of the actual work in generative grammar. However, Uhlenbeck asserts that their views as to the nature of syntactic description are essentially the same, so perhaps nothing is lost by restricting the discussion largely to his paper.

10 No examples of such rules are given. In his discussion of linguistic rules, Uhlenbeck limits himself to rules governing morphological processes (e.g. formation of plurals). He gives no examples of recursive rules, and therefore does not touch upon what for syntax is the central problem, namely, the problem of creativity mentioned above. We are therefore left to guess from his examples what kind of system of rules he has in mind. That he is in any position to deal with the problem of creativity is doubtful, given his view that the linguist must consider only 'regularities ... observed in speech', and must limit himself to the 'set of habits by which ... [native speakers] ... know how to proceed in speech'. But there is no reason to believe that syntactic rules represent 'observable regularities of speech', beyond the simplest cases; and few aspects of the normal formation and use of sentences appear to involve 'habits', in any reasonably well-defined sense of this notion. Consequently, the study of observable regularities and habit can hardly be expected to have much bearing on syntactic theory or syntactic description, or on the normal non-stereotyped use of language.

The ambiguity of the latter two must be explained as in some way based on 'extralinguistic data'. 11 Uhlenbeck does not elaborate on the difference between the two cases (11) and (12). From what he does say, we might suppose that he is willing to agree to the constructional homonymity of (11) because the alternative interpretations can be represented by a difference of bracketing (hence in surface structure); but in the case of (12i) and (12ii) he insists that there is only one syntactic structure and one system of grammatical relations because the alternative interpretations cannot be represented in surface structure (this, of course, being the reason for the choice of these examples in the exposition of transformational grammar that he is discussing). Uhlenbeck concludes that it is his conviction (and Reichling's) that the connections that can be established in the terms he allows and the rules that express these conneclarities of speech', beyond the simplest cases; and few aspects of the normal formation and use of sentences appear to involve 'habits', in any reasonably well-defined sense of this notion. Consequently, the study of observable regularities and habit can hardly be expected to have much bearing on syntactic theory or syntactic description, or on the normal non-stereotyped use of language. 11 Uhlenbeck is, incidentally, in error in assuming that in a transformational grammar (12i) would be derived from either the phrase 'shoot the hunters' or the phrase 'the hunters shoot' and that this is the proposed explanation for the ambiguity. Sentences are never derived from kernel sentences, as he asserts, but rather from the abstract structures that underlie kernel sentences (and all others). The difference is fundamental. What is claimed is not that the hearer first converts (12i) into one of two other expressions, and that (12i) is understood in these terms, but rather that (12i) is analyzed in terms of one of two abstract underlying systems of grammatical relations, one of which happens also to underlie 'the hunters shoot' and the other of which happens to underlie 'shoot the hunters'. That is, in one case the phrase (12i) is interpreted with the Subject-Verb relation holding of hunters-shoot, and, in the other, with the Object-Verb relation holding of this pair. One may choose to define 'linguistics' in such a way as to exclude this observation from its domain, but it is hard for me to believe that Uhlenbeck means to deny the facts that are represented in these terms in a transformational grammar (but not, of course, represented in surface structure).
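Uhlenbeck's contrast between (11) and (12) is easy to render concrete. The sketch below (the representation and the category labels are my own, chosen only for illustration) encodes a surface structure as a nested labeled bracketing: (11i) receives two distinct bracketings, one per reading, while (12i) receives just one, so that nothing at this level distinguishes its two interpretations.

    # A minimal sketch: a surface structure as a nested (category, constituents)
    # pair, i.e. a labeled bracketing. The labels are illustrative only.

    # (11i) 'old men and women': two distinct surface bracketings, one per reading.
    reading_a = ('NP', [('A', 'old'),
                        ('NP', [('N', 'men'), ('Conj', 'and'), ('N', 'women')])])
    reading_b = ('NP', [('NP', [('A', 'old'), ('N', 'men')]),
                        ('Conj', 'and'),
                        ('N', 'women')])

    # (12i) 'the shooting of the hunters': only ONE surface bracketing, although
    # the phrase has two interpretations; that ambiguity is not representable here.
    surface_12i = ('NP', [('Det', 'the'), ('N', 'shooting'),
                          ('PP', [('P', 'of'),
                                  ('NP', [('Det', 'the'), ('N', 'hunters')])])])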

Though he gives no examples of such rules, I do not doubt that his conviction will turn out to be correct. That is, if one limits oneself to surface structure (pure labeled bracketing) and to formally marked relations, then so little of the structure of sentences is expressible that what remains is, no doubt, likely to be quite simple.

In brief, Uhlenbeck proposes that syntactic analysis be restricted to surface structure. What he proposes is indistinguishable from the several varieties of taxonomic analysis (immediate constituent analysis) that have been developed, though with much greater clarity and detail, in the 'neo-Bloomfieldian' linguistics of the past several decades [for discussion of these, and their inadequacies, see Postal, Constituent structure: a study of contemporary models of syntactic description (Bloomington, Indiana, and The Hague, 1964), Chomsky, Current issues in linguistic theory, and many other references]. If he has something else in mind, it does not appear either from his examples or his exposition.12 Similarly, Reichling wishes to restrict syntactic investigation to IC analysis and to formally marked relations between words, that is, to connections effected by concord, rection, categorization into units, or intonation (Reichling, 2). No other 'connections between meanings' are allowed. It therefore follows that the facts about the 'connections of meaning' in such examples as (4), (5), (6), (7), (8), (11), (12) are inexpressible within syntax, and presumably relegated to some (for the moment, nonexistent) theory of 'extra-linguistic context'.

12 His examples, however, deserve further discussion. Uhlenbeck differs from the tradition in that he analyzes 'Subject-Verb-Object' constructions as (Subject-Verb) (Object) — thus, for example, 'John hit Bill' has the immediate constituent analysis 'John hit — Bill'. No argument is offered for this analysis, which runs counter to all known syntactic, phonological and semantic considerations that are relevant to such examples. For discussion, see Lunt [Proceedings of the Ninth International Congress of Linguists 983 (The Hague, 1964)]; Chomsky (Aspects of the theory of syntax p. 194); and Katz and Postal (An integrated theory of linguistic description). Uhlenbeck apparently feels that his and Reichling's approach falls neither within the bounds of "American descriptivism" nor of "the traditional, more or less antediluvian approach of language description" (Uhlenbeck, 5). That their approach has little to do with traditional linguistics is quite clear, since it systematically avoids the traditionally central problems of deep structure, semantic interpretation, and creativity. But why he feels that it escapes the limitations of American descriptivism is not at all obvious. There is no aspect of his views, as described here, that is not strictly formulable in terms of the varieties of taxonomic linguistics developed by American descriptivists; and he does not suggest or indicate any respect in which his views diverge from these formulations.

The reader of these papers will observe that the only criticism made of work in generative grammar (other than criticisms based on misstatement — cf., e.g. note 11) is that it does not remain within the limits set by the critics. The only argument for remaining within these limits is that they are the only ones conceivable. However, other more abstract representations of 'connections of meaning' are not only conceivable but have in fact been conceived and developed in considerable detail both in traditional grammar and in the modern work in transformational grammar that continues and extends traditional grammar. Thus the limitation to surface structure is quite arbitrary. There is no reason to accept it. There has been no indication that a viable domain of linguistic processes or linguistic structure is delimited by this arbitrary restriction. There is not the slightest reason why one should not investigate the mass of problems about interpretation of sentences that transcend these entirely arbitrary limits (e.g. the problems posed by the examples cited above).

Reichling-Uhlenbeck might be interpreted as making something more than a terminological proposal about the limitations of the term 'linguistics' in their remarks about the role of 'extra-linguistic information' in the interpretation of sentences — in particular, in the interpretation of the sentences for which the surface structure does not represent the semantically significant grammatical relations. But these remarks are based on a simple confusion. A sentence has an inherent grammatical structure; this structure provides it with a certain range of potential semantic interpretations. In particular, the rules of English grammar that constitute the competence of the native speaker provide the sentences (4), (5), (11), (12) with alternative syntactic descriptions, each expressing its network of semantically significant grammatical relations. In a particular situation, the hearer may use information that goes well beyond grammar to determine which of the potential interpretations was intended (or whether, perhaps, something was intended that goes beyond the explicit semantic content of the utterance that was actually used). Absolutely nothing of any significance is known about this use of extra-grammatical information in interpretation of sentences, beyond the fact that it exists and is an important characteristic of performance. If a person enters a room and produces, for example, sentence (12ii), we know that the content of his assertion is (roughly) either that John was frightened by the existence of the new methods or that new methods of frightening people were used to frighten John. If he produces (4), we know that the content of his assertion is either that they don't know how good the taste of meat is, or that they know little about the taste of good meat. The 'situational context' is the same in both cases, and does not affect the range of possible contents that these sentences may have, in accordance with the linguistic rules of English. The determination of what the speaker actually intended, of course, involves extra-grammatical considerations and other knowledge well beyond knowledge of language.13 Surely this is quite obvious, and there is hardly much point in discussing it in further detail.

13 Perhaps some of the confusion about this matter results from a failure to distinguish 'meaning' in the sense of 'linguistic meaning' from 'meaning' in the sense roughly of 'intention'. In the former sense we say that the meaning of (4) or (12ii) is as just roughly described; in the latter, we may ask what someone meant by saying (4) or (12ii), or what he meant by slamming the door, rejecting the invitation, or any other act.

There is not a little irony in the fact that Reichling-Uhlenbeck seem to feel that they are somehow defending the study of meaning against the 'positivistic' attacks of 'cryptanalysts' and 'formalists'. The fact is quite the opposite. By arbitrarily limiting themselves to surface structure and to formally marked relations, they have simply excluded from linguistics, by fiat, just those aspects of grammatical structure that can lead to an account and explanation of semantic interpretation; and what they are opposing is, precisely, the attempt to develop linguistic theory and grammatical description to the point where it can deal with deep structures and the general problem of semantic interpretation.

A second critique of work in generative grammar is that presented in R. M. W. Dixon, Linguistic science and logic (The Hague, 1963). At first glance, this criticism seems to reject the framework described in Section I in a much more radical way than the Reichling-Uhlenbeck objections. Generative grammar follows the tradition in attempting to account for competence. Dixon, on the other hand, insists that a grammar must simply deal with performance. It must restrict itself to regularities observed in a corpus.14 So far, this is simply an arbitrary restriction of interest (motivated, apparently, by some curious terminological proposals about 'science' which would exclude from science just about everything since, perhaps, Babylonian astronomy). But Dixon goes on to deny the existence of the most elementary and familiar aspects of competence. In particular, he asserts that people with no formal education "will certainly have no intuitive 'grammatical sense'" and no "intuitive grammatical ideas" (p. 78), and he regards the failure to recognize this fact as one of the major flaws in generative grammar. Assuming now that words have their normal use, Dixon is apparently claiming that children will not distinguish in any way between 'look at the dog' and 'the at dog look'; that uneducated adults will note no distinction between (1)-(3), above; that the method by which they interpret the sentences (1), (4)-(8), (11)-(12) is precisely the same as the mechanism by which they would (with equal facility) interpret any arbitrary permutation of the words of these sentences, etc. Since "linguistic research amongst aborigines ... has confirmed that ... those that have not been exposed to the European tradition of grammatical teaching have no recognizable intuitions of 'grammaticalness'" [Dixon, "'A trend in semantics': rejoinder", Linguistics 4.17 (1964)], we may conclude, presumably, that aborigines are unable to distinguish between sentences and right-left inversions of sentences in their language and (by analogy) that English-speaking children or uneducated adults are in the same position.

14 And in the 'situational contexts' in which speech occurs. I will make no attempt to deal with this topic. Dixon has no more to say about context of situation than anyone else; that is, he has nothing of substance to say about it. I notice only one concrete example of reference to context of situation, namely on p. 101, where 'British Culture' is referred to as "the wider situation ... [within which] ... the lexical items milk and white will have situational correlation (one of the several factors demonstrating this will be a fairly high probability of their mutual collocation)". The high probability of the phrase 'white milk' in normal British English was a fact unknown to me before reading this. On the other hand, the high 'situational correlation' between whiteness and milk (though not white and milk) seems beyond dispute. Exactly what sense it makes to regard British Culture as 'a situation' is unclear, however. Perhaps citation of this example is sufficient to indicate why I will discuss this aspect of Dixon's proposals no further.

Before attributing to Dixon any such absurd views, however, we must read a bit further. The quote from Dixon (1964) given above reads more fully as follows: "linguistic research amongst aborigines ... has confirmed that _although speakers have firm ideas concerning what is in their language_, those that have not been exposed to the European tradition of grammatical teaching have no recognizable intuitions of 'grammaticalness'" (italics mine).

The italicized remark implies that Dixon's aborigines have firm intuitions about grammaticalness, in the only sense in which this term has, to my knowledge, ever been used. Therefore Dixon is not denying the obvious, as assumed in my comments above, but is rather proposing some new usage for the term 'grammatical' (a usage which he does not go on to explain, except to point out that in this usage, people have no intuitions of 'grammaticalness'). Further reading only confirms this supposition. Thus he seems to imply (1963, 76-7) that the distinction between (1), (2), (3) can be accounted for in terms of 'lexis and lexical patternings'.15 Clearly there is little point in accounting for the distinction if it does not exist.

15 Whatever this may be. Dixon points out (1963, p. 64) that theories of generative grammar have "no equivalent to our [i.e. his and Halliday's] lexis". But all that he says about lexis is that it somehow has to do with the probability of phrases or the probability with which an item will occur in certain contexts, no further specification of these notions being given. This 'theory of lexis' is too vague to discuss and, for reasons noted below, it is unlikely that it can be clarified in a way that may be of some linguistic significance. Compounding the confusion is the fact that Dixon accepts the frequently expressed but quite incredible view that the relation of synonymy (or degree of synonymy) holds between two expressions if (or to the extent that) they have the same probability of occurrence in particular contexts (Dixon, 1963, p. 43-4). Taken literally, this implies that in the context 'my God, the baby has just fallen down the —', the two expressions 'stairs' and 'series of steps for passing from one level to another' must have the same probability of occurrence (courtesy of Webster's dictionary). Perhaps something else is intended, but it is not easy to invent a coherent interpretation of this claim.

Evidently, then, Dixon's rejection of the notion of 'grammaticalness' on which all grammatical description, traditional, structuralist, or generative, is based,16 is merely terminological and therefore of no importance.

16 This truism does not seem to be widely recognized. But obviously, if such a distinction is not assumed, there is nothing for a grammar to describe except 'regularities in a corpus'. Since no one has ever shown that anything at all can be said about 'regularities in a corpus' (on the syntactic level, at least), we may discard the latter possibility, noting also that anyone who has looked at a record of actual speech will be disinclined to pursue this purely 'naturalistic' study. In any event, all known grammatical descriptions are based on an assumed delimitation of grammatical and ungrammatical sentences, whatever terminology they may employ. For example, although A. A. Hill regards the distinction as a special and controversial feature of generative grammar [cf. Hill, "Grammaticality", Word 17.1-10 (1961)], he relies on it at every turn in his own descriptive work, for example, when he asserts that the expression all the ten pretty young American children's twenty little old china dolls "reaches the theoretical limit of complexity" [Hill, Introduction to linguistic structures: From sound to sentence in English 186 (New York, 1958)]. Similarly, in all descriptive syntax such a distinction is presupposed. It is true that the distinction becomes more crucial as one approaches explicitness in the formulation of grammatical rules, that is, as one approaches the construction of a generative grammar. This is only to say that empirical evidence is more relevant to the truth or falsity of hypotheses that have consequences than to the 'correctness' of sets of examples, which have no consequences. In the same connection, it is important to be clear about the relevance of operational or behavioral tests to the grammatical-nongrammatical distinction (or scale, or set of scales). Certain experimental procedures have been proposed [cf., e.g. G. A. Miller and S. Isard, "Some perceptual consequences of linguistic rules", Journal of Verbal Learning and Verbal Behavior 2.217-28 (1963)] that appear to define a useful notion related quite closely to this distinction. On the other hand, there are obviously innumerable experimental procedures that will fail totally to characterize this distinction (for example, Hill has invented various tests of this sort, and has found — cf. Hill, "Grammaticality" — that they delimit no interesting sense of 'grammaticalness'). When an operational test is proposed, we may test it for significance by applying it. If it has no significance, it can be discarded. If it has significance, as indicated by the fact that it corresponds in some way to the notion it is intended to characterize, we may be able to rely on it to provide some evidence about unclear cases. But the intuitively given distinction is not called into question by the fact that some investigator is unable to develop a reasonable test, just as it would not be called into question by his inability to develop a theory — in this case, a generative grammar — that characterized the intended notion. Tests, as theories, are of interest in this connection only if they shed some light on tacit competence. There is, incidentally, no reason to take for granted that any simple, necessary and sufficient operational criterion can be invented for a theoretical notion like 'grammaticalness', although one would expect that some aspects of this notion can be clarified indirectly by operational tests (e.g., of the sort studied by Miller and Isard).

Let us now turn to what seems an equally radical objection to the position outlined above. Dixon appears to object to the assumption that a language is infinite (1963, 82f.). He asserts that this assumption is a fundamental error of a generative theory. On the contrary, from his point of view, "that of language as it is, every language is synchronically finite". Again, a superficial reading might lead one to think that Dixon is making an utterly fantastic proposal, namely, that a grammar should contain no recursions in its system of rules, and that the competence of the speaker must be represented as some vast list of sentences or category sequences, or something of this sort.17 However, before considering this and other possible interpretations, we must, once again, read a bit further. Doing so, we find that Dixon is merely suggesting some new meaning for the word 'infinite'. Thus he asserts that "in the case of sentences which each consist of a conjunction of clauses we are clearly unable to say that there is any definite number, N, such that no sentence contains more than N clauses". This quoted remark simply states that the language is infinite, in the only known technical sense of the word 'infinite'. Since Dixon goes on to deny that the language is infinite, we conclude that he must be using the term 'infinite' in some new and private sense. What this sense may be is suggested by his discussion of the new notion 'synchronically finite', introduced in the quotation given above. He asserts that "to decide upon the size of a language at a particular time it is necessary not just to count the number of sentences allowable, but to sum the probabilities assigned to each allowable sentence". He goes on to point out that when we do this, we will find that this sum is a finite number, so that the language is 'synchronically finite'. Apparently, then, Dixon is suggesting the new term 'synchronically finite' which applies, trivially, to any set over which a probability measure is defined (since by definition, the sum of the probabilities is finite, in fact, unity) — in particular, which holds of any language if probability is somehow defined for the sentences of the language. Since he regards this as refuting the fallacious assumption that a language is infinite, he must be taking the term 'infinite' to mean 'not synchronically finite'. That is, a language is infinite just in case the sum of the probabilities assigned (somehow) to its sentences is infinite; by definition, then, a language is never infinite. What is unclear is simply why these vacuous concepts should be defined and why one should trouble to compute the 'size of a language' in the way he suggests, since we know in advance, by definition, that it will always be unity.

17 It is a useful exercise to calculate the vastness of the lists that would be required. See G. A. Miller, E. Galanter, and K. H. Pribram, Plans and the structure of behavior (New York, 1960) and Miller and Chomsky ("Finitary models of language users") for some highly conservative estimates which, however, suffice to show the utter absurdity of pursuing any such approach to syntax.
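The triviality of the computation is simply the normalization axiom of probability theory. To illustrate with an assignment of my own (purely hypothetical, not Dixon's): enumerate the sentences of a denumerably infinite language as s_1, s_2, s_3, ..., and let p(s_n) = 2^{-n}. Then

    \sum_{n=1}^{\infty} p(s_n) = \sum_{n=1}^{\infty} 2^{-n} = 1,

so this language is 'synchronically finite' in Dixon's sense although it contains infinitely many sentences; the sum tells us nothing whatever about the size of the language.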

Of course, this remark is based on the assumption that Dixon is using the term 'probability' in its normal sense. Some obscure remarks on p. 83 suggest that this assumption may be fallacious. I see little point in carrying this discussion any further, except to make one final remark.

Dixon speaks freely throughout about the 'probability of a sentence' as though this were an empirically meaningful notion. The few hints that he gives about the linguistic theory that he has in mind are based critically on the assumption that some sense can be given to this notion. But this is not at all clear. We might take 'probability' to be an estimate of relative frequency, and some of Dixon's remarks suggest that he thinks of it in this way. This has the advantages of clarity and objectivity, and the compensating disadvantage that almost no 'normal' sentence can be shown empirically to have a probability distinct from zero. That is, as the size of a real corpus (e.g. the set of sentences in the New York Public Library, or the Congressional Record, or a person's total experience, etc.) grows, the relative frequency of any given sentence diminishes, presumably without limit. Furthermore, since most of the 'normal sentences' of daily life are uttered for the first time in the experience of the speaker-hearer (or in the history of the language, the distinction hardly being important at this level of frequency of occurrence), they will have had probability zero before this utterance was produced, and the fact that they are produced and understood will be incomprehensible in terms of a 'probabilistic theory of grammar' — the reader who is suspicious of this remark may convince himself by searching for repetitions of sentences or for the occurrence of an arbitrarily selected 'normal sentence' in a real corpus.

Dixon completely overlooks these obvious facts. Thus he asserts that the distinction between 'colorless green ideas sleep furiously' and some 'normal sentence' (say, 'revolutionary new ideas appear infrequently') is a matter of 'formal lexical meaning', i.e. frequency of occurrence: "That is to say: the collocation of 'colorless', 'green', 'ideas', 'sleep', and 'furiously' does not occur (or rather, has only a very small probability of occurring)" (Dixon, 1963, 75). That this sequence has a small probability of occurring is quite true (or was true, until a few years ago; actually, by now this sequence is one of the more frequent ones — i.e. it has occurred a handful of times — in the linguistic experience of some people, at least, without this sharp rise in frequency having affected its status as a semi-grammatical sentence in the least, obviously). Precisely the same is true of 'revolutionary new ideas appear infrequently' and a host of others with an entirely different linguistic status. None of these sentences has a probability of occurrence detectably different from zero. The frequency of the 'collocations', in each case, is so ridiculously low that the attempt to account for fundamental linguistic distinctions in these terms is a complete absurdity. Similarly, there is no hope of distinguishing (1), (2), and (3) in terms of probability of occurrence, as Dixon seems to believe (1963, 76-7).
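The observation about relative frequency can be verified mechanically. In the sketch below (the corpus and the test sentences are invented for illustration), a perfectly normal sentence and Dixon's deviant example receive exactly the same relative-frequency estimate, namely zero.

    # A minimal sketch (invented corpus): relative-frequency estimates assign
    # probability zero to almost every sentence, grammatical or deviant alike.
    from collections import Counter

    corpus = [
        'the committee approved the budget',
        'a week elapsed',
        'the committee approved the budget',
    ]
    frequency = Counter(corpus)

    for sentence in ('revolutionary new ideas appear infrequently',
                     'colorless green ideas sleep furiously'):
        estimate = frequency[sentence] / len(corpus)
        print(sentence, '->', estimate)    # 0.0 in both cases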

Putting aside Dixon's specific interpretation of 'probability of a sentence' as (apparently) literal frequency of occurrence of the sentence in an actual corpus, it is important to note that other probabilistic bases for grammatical or lexical properties seem equally out of the question. The problems are not overcome if we take probability to be something other than an estimate of relative frequency. The vastness of the set of sentences from which normal discourse draws will yield precisely the same conclusions; the probability of 'normal sentences' will not be significantly different from zero. Nor does it help to consider, rather, probability of sentence forms (if a sentence form is a sequence of categories, the categories containing elements that are 'mutually substitutable everywhere' in some fairly narrow sense — notice that if this is not a narrow sense, then the notion defined will not meet the demands placed on it). The numbers are still too vast for the notion to be taken seriously. Nor can we take the probability of a sentence to be an estimate based on the probability of its parts, in some way; such a decision would, as has repeatedly been observed, give a probability scale that has no useful relation to the intuitive scale of grammaticalness on which all linguistic description is based. Nor does it make any sense at all to talk about 'probability relative to a situation', since no one has ever given the slightest hint as to how 'situations' can be individuated or identified in any terms that have any use for this discussion. In fact, no sense has ever been given to the notion of 'probabilities or continuum type scales' (the fundamental importance of which Dixon repeatedly stresses) in the domain of grammatical description; and every concrete proposal that has been made has been shown to lead to absurdity. Perhaps the time has come for linguists who insist on the importance of such notions to face this simple fact.

In outlining the general framework for generative grammar at the outset of these lectures, I distinguished between, on the one hand, those aspects of this framework that serve merely to define goals and research problems (pointing out that these are largely derived from the traditional study of language and mind), and, on the other hand, substantive assertions that go beyond delimitation of problems. Two substantive assertions were made in this account, namely:

(13) (I) the surface structure of a sentence is a proper bracketing of the linear, temporally given sequence of elements, with the paired brackets labeled by category names (that is, a labeled tree diagram, with such categories as Sentence, Noun Phrase, Verb Phrase, Noun, and a small number of others serving as labels);
     (II) the deep structure of a sentence is in general not identical to its surface structure, but is a much more abstract representation of grammatical relations and syntactic organization.
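A concrete (and hypothetical) rendering of the two assertions, using the earlier example (12i): a single surface bracketing pairs with two distinct deep structures, each an abstract network of grammatical relations. The representation is mine, intended only to fix ideas.

    # A minimal sketch of (13I) versus (13II): one surface structure for
    # 'the shooting of the hunters', two deep structures (representation mine).
    surface = '[NP the shooting [PP of [NP the hunters]]]'    # same for both readings

    deep_subject_reading = {'Verb': 'shoot', 'Subject': 'the hunters'}   # the hunters shoot
    deep_object_reading = {'Verb': 'shoot', 'Object': 'the hunters'}     # one shoots the hunters

    for deep in (deep_subject_reading, deep_object_reading):
        print(surface, '<->', deep)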

The criticisms of generative grammar that I have so far discussed do not bear on these substantive proposals; rather, I have suggested that they amount to no more than a proposal to limit 'linguistics' so as to exclude the mass of 'antediluvian' traditional questions,18 for example, questions of competence, semantic interpretation, 'creativity', the nature of grammatical rules, etc. But no reasons have been offered for abandoning these topics, and no alternatives have been suggested that might lead to more fruitful study. Consequently, I think that these criticisms have no force.

18 It is interesting to note that Dixon, like Uhlenbeck (see above, p. 15n.), regards traditional linguistics as essentially without value. But, like Uhlenbeck, he gives no reasons for this judgment, referring only to the fact that traditional grammars have been "long condemned by professional linguists" (Dixon, 1963, p. 78). Though true, this remark is hardly sufficient to prove the point. The fact is that traditional linguistics is little known today, and where there is discussion of it, this is often quite distorted and inaccurate (cf. the reference in Chomsky, Current issues in linguistic theory, 67, to some of Bloomfield's comments on traditional linguistic theory). Furthermore, much of the critique of traditional grammatical descriptions is little more than a reflection of the methodological limitations within which the critic has chosen to work. For some examples, see Chomsky (Current issues in linguistic theory 29-30, 108).

It still, however, remains to consider objections to the substantive assertions (13I) and (13II). As mentioned above, (13I) has never been questioned and, in any event, is well-supported (see below). Let us assume it then, for the purposes of present discussion, and turn briefly to (13II). This is an extremely important claim, and it is worthwhile to consider it with a bit more care.19

19 The reason why the claim is important is that it seems totally incompatible with the empiricist assumptions that have dominated the study of language and the study of cognitive processes in modern times. For discussion, see Chomsky, Aspects of the theory of syntax, ch. 1 § 8, and references given there; also, Katz, Innate ideas (forthcoming).

Given the assumption (13I) concerning surface structure, let us define 'taxonomic syntactic theory' as the view that such a representation exhausts the syntactic structure of an utterance, i.e. serves as the deep structure as well. There are, then, many varieties of taxonomic syntax: in particular, the few remarks that de Saussure offers concerning syntax indicate that he accepts this position; it was elaborated in various ways by American descriptivists in recent decades; it apparently subsumes the views of Reichling and Uhlenbeck (see above); it covers many models of language structure that have developed in the study of artificial languages and in computational linguistics; it includes the models of language structure that have provided most of the substance of the mathematical study of language structure, etc. Taxonomic syntactic theory is what is rejected by assertion (13II). The theory of transformational generative grammar was developed as a specific alternative to taxonomic syntax, an alternative that incorporates assertion (13II). We return to it below, limiting our attention now to assertion (13II) itself, that is, to the question whether surface structure, as defined in (13I), expresses all grammatical structure that is relevant to semantic interpretation.

In many publications of recent years20 the validation of assertion (13II) has been attempted in the following way. First, an attempt was made to formulate a theory of generative grammar broad enough to comprehend all varieties of taxonomic linguistics, insofar as they are clear and express substantive hypotheses concerning the nature of language.

20 E.g. Chomsky, "Three models for the description of language", I.R.E. Transactions on Information Theory IT-2/3.113-24 (1956); Syntactic structures; Current issues in linguistic theory; Katz and Postal, An integrated theory of linguistic description; Lees, Review of Chomsky, Syntactic structures; Postal, "On the limitations of context-free phrase structure description", Quart. Progr. Rep. No. 64 231-7 (Cambridge, Mass., 1961); Constituent structure (The Hague, 1964); "Limitations of phrase structure grammars", Structure of language: readings in the philosophy of language, eds. Katz and Fodor 131-51 (Englewood Cliffs, N.J., 1964).

This was one motivation for the development of the theory of phrase structure grammar (context-free or context-sensitive — I presuppose familiarity with this).21 Second, an attempt was made to show that no theory formalizable within the framework of phrase structure grammar can succeed in giving the required notion of deep structure,22 if these grammars are required to generate the actual sentences of the language. Consequently, taxonomic syntactic theory cannot be correct, the fundamental reason for this being that the deep structures that are required for semantic interpretation cannot be identified with the structures that are assigned to sentences by a phrase structure grammar that generates actual sentences directly.

Both steps of this argument seem to me to have been firmly established. In particular, every variety of syntactic theory that falls within the general range of taxonomic syntax seems to me formalizable (insofar as it is clear) within the framework of phrase structure grammar (in fact, with rare exceptions, its context-free variety),23 whether these are formulated in terms of procedures of analysis, as in many descriptivist studies, or in terms of conditions on any possible syntactic description of a sentence, as in the Reichling-Uhlenbeck view mentioned above and many others. Furthermore, the inadequacy of a categorially labeled bracketing of an actual sentence (a phrase structure tree) as a representation of deep structure seems to me to have been established beyond any doubt, by consideration of examples of the sort mentioned above, pp. 6-7, 14. See the cited references for further discussion. Consequently, it seems to me that assertion (13II) and the consequence that taxonomic syntax is fundamentally inadequate are among the best supported conclusions of linguistic theory, in its present state.

Both of the steps of the argument have been questioned, for example, in the Uhlenbeck paper discussed above.

21 There is another and much more important empirical motivation for the theory of phrase structure grammar. Namely, it seems that although taxonomic syntax is inadequate (i.e. although assertion (13II) is true), nevertheless phrase structure grammar plays a fundamental role in a more adequate theory constructed in the light of (13II).
22 Or, for that matter, the required notion of surface structure; cf., e.g. § 3 in Chomsky, "On the notion 'rule of grammar'", Structure of language and its mathematical aspects, Proceedings of the 12th Symposium in Applied Mathematics, ed. Roman Jakobson 6-24 (Providence, 1961).
23 An exception is the matter of discontinuous constituents, as has been noted in most studies of this topic since its inception. However, as has also been pointed out in these studies, this has little bearing on the argument concerning (13II), since the basic objections to taxonomic theory are not overcome by introducing devices of one sort or another to permit certain types of discontinuity. It is important to be clear about just what is asserted here. What is asserted is that a phrase structure grammar that generates the actual linear, temporally given string of elements will not assign to this string the required deep structure (= surface structure, in this case). On the other hand, work in transformational grammar has always assumed that a good deal of what constitutes deep structure is provided by a phrase structure grammar with a much more abstract function, that is, one that generates only underlying structures, which are mapped into sentences by grammatical transformations. Recent work (discussed in the third section) suggests that in fact the full set of grammatical relations involved in deep structure can be characterized by a system which is a phrase structure grammar. This proposal must be clearly distinguished from the viewpoint of taxonomic syntax.
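Since familiarity with the notion is presupposed above, a minimal illustration may still be useful. The sketch below (the toy grammar and all names are mine, not drawn from any of the works cited) shows a context-free phrase structure grammar as a finite set of rewriting rules A → X, applied top-down to yield a sentence together with its labeled bracketing.

    # A minimal sketch of a context-free phrase structure grammar;
    # the grammar itself is a toy of my own devising.
    import random

    RULES = {
        'S':  [['NP', 'VP']],
        'NP': [['John'], ['Bill']],
        'VP': [['V', 'NP']],
        'V':  [['saw']],
    }

    def generate(category):
        """Rewrite a category, returning its labeled bracketing."""
        if category not in RULES:
            return category                           # terminal symbol
        expansion = random.choice(RULES[category])    # apply one rule A -> X
        return '[' + category + ' ' + ' '.join(generate(s) for s in expansion) + ']'

    print(generate('S'))    # e.g. [S [NP John] [VP [V saw] [NP Bill]]]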

Thus Uhlenbeck seems to imply that his approach does not fall within the framework of phrase structure grammar and, at the same time, he claims that linguistics should not study deep structure. I have tried to show that neither objection can be justified. I know of no similar objections that have any more substance than this.

There is one instance of what on the surface appears to be an objection to this two-step argument of a rather different sort, namely, a paper by G. E. Harman, "Generative grammars without transformation rules", Lg. 39.597-616 (1963). This is subtitled "A defense of phrase structure", and it appears to have been directed against step two of the argument, namely, against the claim that no theory formalizable within the theory of phrase structure grammars can be empirically adequate. But a more careful study of this paper shows that it is entirely irrelevant to the whole issue. Harman surveys several of the arguments against phrase structure grammar and accepts them as valid. However, he proposes that instead of regarding these as arguments against phrase structure grammar, they be restated as arguments against using the term 'phrase structure grammar' in the way in which it has always been used. Since terminological points of this sort are clearly of no consequence in themselves, let us accept Harman's proposal (for the purpose of discussion of his paper), using the term 'restricted phrase structure grammar' in the sense of 'phrase structure grammar', as previously defined, and 'extended phrase structure grammar' for the theory he proposes (keeping in mind, however, that extended phrase structure grammar has no more connection with phrase structure grammar than antelopes have with ants). Having made this terminological change, we observe that the argument against taxonomic linguistics and in favor of assertion (13II) stands exactly as before, naturally, since no mere terminological change can affect the substance of this argument. This observation is sufficient to allow us to dismiss this paper as a defense of taxonomic syntax, and as a critique of (13II). However, a more extended discussion of it will perhaps be useful, in that it offers a good opportunity to examine some issues that must be clearly understood if syntactic work is to proceed in any useful way.

Harman proposes that 'extended phrase structure grammar' is in the spirit of phrase structure grammar, and points out that there is no a priori necessity to use the term 'phrase structure grammar' as a designation for 'restricted phrase structure grammar'. The first point is unarguable. I know of no way of determining whether 'extended phrase structure grammar' is more in the spirit of phrase structure grammar than, for example, transformational grammar of the usual sort, or whether it is less so (arguments might be offered both ways); and, furthermore, I see no point in trying to establish this, one way or the other. The second point is correct. There is no a priori necessity to use the term 'phrase structure grammar' in the way in which it has always been used; any other term would, in principle, have done as well. The significance of this observation and its role in the 'defense of phrase structure' are obscure. By similar logic, one can prove the baselessness of the charge that baboons cannot speak, for there is, after all, a creature rather like a baboon which can speak perfectly well, and there is no a priori necessity to use the term 'baboon' so as to exclude this creature.24 We observe once again, however, that the notion 'phrase structure grammar' in the familiar sense (i.e. 'restricted phrase structure grammar') is a well-motivated one, quite apart from terminology, having arisen not only in the formalization of taxonomic linguistics and in the theory of transformational grammar, but also, independently, in the study of the structure of artificial languages and in the study of mathematical models, where both context-free and context-sensitive phrase structure grammars occupy a central position.

Let us now consider 'extended phrase structure grammar' in some further detail. Such a system contains rules of the form A → X where A and X are not strings of category and terminal symbols, as in the theory of (restricted) phrase structure grammar, but strings of what we may call 'complex symbols' (A being a string of length one, and X possibly being null), where a complex symbol is a pair consisting of a category symbol and a set of indices. This is a modified form of a system proposed originally by G. H. Matthews seven or eight years ago, and elaborated in the Matthews-Yngve COMIT programming system. The rules are designed so that indices can be carried along in the course of a derivation. The indices can be used, in effect, to code many 'global operations' on strings (e.g. certain grammatical transformations) and to code context-restrictions of various sorts. The motivation for the development of the original system, it should be noted, was to simplify the coding of transformational processes.

Notice that an extended phrase structure grammar can be regarded as a (restricted) phrase structure grammar, namely, if we take each complex symbol to represent a single category. Under this interpretation, what Harman has proposed is a new evaluation procedure for phrase structure grammar. Such a proposal is an empirical hypothesis about language. This hypothesis is immediately refuted, on grounds of descriptive inadequacy. Under this interpretation, if we consider the structures assigned to such sentences as 'John saw Bill', 'did John see Bill', 'look at Bill', etc., we find that there is no category to which both occurrences of 'John' belong; there is no category to which all occurrences of 'Bill' belong; there is no category to which all five Nouns belong; etc. For such reasons as these neither the requirements on deep nor surface structure begin to be met, and this interpretation must be rejected. Therefore a prerequisite for the serious consideration of the theory of extended phrase structure grammar is that the condition (9iv) be met in some other way, that is, by some different proposal as to how a syntactic structure can be associated with a string generated by such a grammar, this structure in some way representing syntactically and semantically relevant information about the grammatical relations, sentence type, etc. But there is no suggestion in Harman's paper as to how this should be done,25 so the matter remains quite open. What we have, then, is a theory of grammar which, though surely far richer than (restricted) phrase structure grammar, is, for the moment, not well-defined in the only linguistically relevant sense. Cf. p. 9.

24 The analogy is not exact, however: an 'extended baboon' can indeed speak; there is little reason to suppose, however, that an 'extended phrase structure grammar' can describe syntactic structure.
25 Strictly speaking, there are suggestions, but they are far too vague to fill the need. For example, on p. 608 a definition of 'transformational relation between sentences' is proposed which is so loose that it makes no distinction between the pair 'John saw Bill' — 'Bill was seen by John' and 'John saw Bill' — 'John was seen by Bill'. The crucial notions (e.g. grammatical relation) are not discussed at all. Thus there is no answer even suggested to the question of how to define the single grammatical relation that connects the italicized items in 'John saw Bill', 'John reads books', 'Bill was seen by John', 'Bill is easy for John to find', 'John ordered Bill to be examined', 'the proposal is expected to be brought to the floor', 'the book reads easily', etc.; there is no indication as to how one can express the fact that Bill and buy are interconnected in the same way in 'they like to buy presents for Bill', 'Bill is easy to buy presents for', 'it is Bill that I like to buy presents for', 'presents were bought for Bill', etc.; there is no way suggested to express the grammatical relations that determine the semantic interpretation of the examples cited in section I, etc. In short, none of the questions that have motivated research in transformational syntax are faced.

What possible reason, then, can there be for considering the theory of 'extended phrase structure grammar'? One suggestion that comes to mind is that it may, in some sense, be a less powerful theory than the theory of transformational grammar, even though it is clearly much richer in expressive power than the theory of (restricted) phrase structure grammar. If true, this would be an interesting curiosity; it would become an interesting observation for linguistics if the theory could be shown to approach descriptive adequacy in nontrivial respects. But it is not true. The theory of extended phrase structure grammar is incomparable in expressive power to the theory of grammatical transformations; in each, certain hypothetical grammatical processes can be formulated that are not formulable in the other.26

Consider, for example, a hypothetical language with a grammatical process P that forms one class of sentences from another by, let us say, deleting a word belonging to the lexical category A if this occurrence of A is immediately dominated by the phrase category B. Within the framework of extended phrase structure grammar, this process is easily formulable (far more easily, for example, than the system of rules for generating passives of simple N V N sentences). It is only necessary to code P as an index of the initial symbol S, then formulating each rule B → ...A... so that it adds a new index Q to the introduced occurrence of A (Q appearing nowhere else). We then give the rule deleting the category A when it contains the two indices P, Q. Such a process is, however, not formulable as a grammatical transformation; it would require a radical extension of transformational theory to permit use of quantifiers in structural descriptions of transformations far beyond anything permitted in this theory (apparently, the theory can in fact require that these structural descriptions be quantifier free); and, furthermore, it would violate the general conditions on deletion. Within the theory of transformational grammar, as this has been explicitly formulated, there is no way to identify an occurrence of a category that is immediately dominated by a given category.

26 Obviously, we are interested in developing a linguistic theory the expressive power of which matches precisely the range of formal devices that actually function in natural language. If a theory is too weak in expressive power to accommodate such devices, or so strong in expressive power as to accommodate devices not to be found in natural language, this counts as a defect of the theory. As we shall see in the next section, earlier versions of the theory of transformational grammar exhibit both kinds of defect; and it would hardly be surprising to discover that the revised version that will be suggested there also suffers from defects of both kinds. This question is, of course, closely related to the question of explanatory adequacy touched on briefly above.

Similarly, consider a rule that deletes (or iterates, or adds a morpheme to) a word of the category A just in case this occurrence of A belongs simultaneously to two distinct phrases, one of which is the left-most category in some expansion and the other the right-most category. The indexing conventions of extended phrase structure grammar permit straightforward formulation of this process,27 but it is unformulable as a grammatical transformation.

On the other hand, although certain grammatical transformations can be coded into the format of extended phrase structure grammar, others cannot. Consider, for example, the transformations that form (14) from the pair of abstract structures underlying 'John read the book' and 'Bill read the book', or that form (15i, ii) from an abstract structure that includes the phrase-marker underlying 'Bill Auxiliary leave':

(14) John read the book and so did Bill.

(15) (i) John persuaded Bill to leave.
     (ii) Bill was persuaded to leave by John.

Obviously, the information provided by the deep structures underlying these sentences, as this is formulated by the familiar transformational analysis of them [e.g. Chomsky, Syntactic structures; "A transformational approach to syntax", Third (1958) Texas conference on problems of linguistic analysis in English, ed. A. A. Hill 124-58 (Austin, 1962)], is essential to their semantic interpretation. Thus (14) implies that Bill read the book, and it is evident that the semantic interpretation of (15) requires that Bill be specified as the Subject of leave. But the necessary information concerning (14) cannot be presented in terms of any structures assigned by an extended phrase structure grammar, and there is no indication of how the information required for (15) can be represented (it is not, as we shall note below, in Harman's derivation of these sentences, in contradiction to his claim (p. 610) that his coding of certain transformational grammars preserves grammatical information). But in any event, the processes which are directly formulable as grammatical transformations either cannot be stated at all, or cannot be formulated without very involved and elaborate mechanisms, within the framework of extended phrase structure grammar. In short, the theory of extended phrase structure grammar is incomparable with transformational grammar in expressive power, and (insofar as it is well-defined) not empirically adequate for natural language.

27 To illustrate with the simplest example, consider a grammar with a single category S and a single word X, and with the rules S → SSS, S → X. In this language, then, the only grammatical information is that certain strings of X's belong to the category S. Suppose that we add to this grammar a grammatical process P that converts a certain occurrence of X to Y if X is dominated by an occurrence of S which is left-most in some expansion and also by an occurrence of S which is right-most in some expansion. Though unformulable as a grammatical transformation, this process is easily expressible in extended phrase structure grammar, for example, by allowing P to be an optional index of the initial symbol and indexing by L and R, respectively, the left-most and right-most occurrences of S in the recursive rule S → SSS. To express P, then, we simply add a rule converting X to Y when it has the indices L, R and P.
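The toy grammar of note 27 is small enough to simulate directly. In the sketch below (my own rendering; the depth bound and the function names are artifacts of the simulation, not part of the grammar) each complex symbol is a (category, index-set) pair, indices are carried down through the derivation, and the rule expressing P rewrites an X as Y exactly when the indices L, R and P are all present.

    # A minimal sketch of the extended phrase structure grammar of note 27.
    # Rules: S -> S S S (left-most daughter indexed L, right-most indexed R)
    #        S -> X
    # P, an optional index of the initial symbol, is expressed by a rule
    # converting X to Y when the X carries all three indices L, R and P.
    import random

    def expand(symbol, depth):
        """Rewrite one complex symbol (category, indices) into terminal leaves."""
        _category, indices = symbol
        if depth == 0 or random.random() < 0.5:
            return [('X', indices)]                   # rule S -> X; indices persist
        daughters = [('S', indices | {'L'}),          # left-most in this expansion
                     ('S', indices),
                     ('S', indices | {'R'})]          # right-most in this expansion
        return [leaf for d in daughters for leaf in expand(d, depth - 1)]

    def derive(apply_p, depth=4):
        start = ('S', {'P'} if apply_p else set())    # P as an optional index of S
        leaves = expand(start, depth)
        return ''.join('Y' if {'L', 'R', 'P'} <= ix else 'X' for _, ix in leaves)

    print(derive(apply_p=True))    # Y appears only where L, R and P co-occur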

Harman's defense of phrase structure grammar is based on the claim that he has constructed a phrase structure grammar that generates exactly the set of sentences of a certain transformational grammar (namely, that presented in Chomsky, "A transformational approach to syntax"). The first part of this claim, as we have noted above, is based on nothing more than terminological equivocation. The second part is false. In fact, one of the central topics of the transformational grammar he was recoding is the system of Verb-Complement constructions illustrated in (15). In the transformational grammar, sentences such as (15) are based in part on an underlying structure 'Bill Auxiliary leave'. In particular, the constraints on Subject and Verb Phrase in sentences are carried over to the complement construction (15). Such strings as (16) are excluded from direct generation and thus marked as deviant, as a direct consequence of the exclusion of (17):

(16) John {persuaded, ordered, expected} Bill to {accumulate, elapse, be numerous, be abundant, be parsed}

(17) Bill {accumulated, elapsed, is numerous, is abundant, is parsed}

(exclusion of the last being a consequence of the deviance of 'X parsed Bill'). But in Harman's recoding, no relation is expressed between the positions occupied by 'Bill' and 'leave' in such sentences as (15) — that is, the fact that in both (15i) and (15ii) 'Bill' functions as the Subject of 'leave' fas well as being the object of 'persuade (to leave)', in this case] is not indicated. This failure of descriptive adequacy is reflected in the failure to express the deviant character of (16) by excluding these from direct generation, although the analogous property of (17) is correctly expressed by exclusion of these from direct generation by the grammar. Whether the defect illustrated by these examples can be overcome by more elaborate mechanisms I have no idea, but the point is hardly worth pursuing. Suppose that it were indeed possible to recode a transformational grammar within the framework of extended phrase structure grammar in such a way as to meet the requirement of weak generative equivalence (that is, in such a way that the coded version generated the same set of strings, though not the same set of structures, as the original). Suppose, in fact, that it were possible to construct even a restricted phrase structure grammar that generates exactly the set of sentences of English or, in fact, any natural language. Exactly what linguistic significance would this demonstration have? The

TOPICS IN THE THEORY OF GENERATIVE GRAMMAR

29

answer is : none at all. It would not in the least support the view that the theory of structure (just as the ability of a transformational grammar to weakly generate — that is, to generate the sentences of — a natural language would not, if a fact, demonstrate the adequacy of the transformational theory). Such a demonstration would simply support the conjecture in the earliest studies of transformational generative grammar (e.g. Chomsky, Syntactic structures, 34) that English and other languages may very well be within the weak generative capacity of phrase structure grammar. In fact, there is little reason to doubt that all natural languages can be weakly generated by context-sensitive phrase structure grammars, although it is now known (cf. Postal, "On the limitations of context-free phrase structure description", "Limitations of phrase structure grammar") that context-free grammars fail to meet even the condition of weak generative adequacy, and thus fail the test of descriptive adequacy in a particularly surprising way. 28 The point is that consideration of weak generative adequacy is only of interest if it provides negative evidence, with respect to a particular linguistic theory — if it shows, in other words, that the theory is so mismatched to language that not even the set of sentences can be correctly generated. But, as has been clear since the earliest studies of generative grammar, considerations of descriptive and explanatory adequacy are the only ones that can be used in support of a proposed grammar or theory of language. The fact that a grammar weakly generates a language is hardly of any interest. What is important is that it should do so in such a way as to assign to each sentence the correct deep and surface structure, and beyond that, that it succeed in this task of strong generation in an internally motivated way. 29 Returning to a concrete example, the fact that the extended phrase structure grammar that Harman presents fails in weak generative capacity by allowing (16) [while exluding (17)] is not in itself important. What is important is the underlying reason for this, namely, the failure of this grammar to express the grammatical relations that bind Bill in (15) to the Complement of the Verb, on the one hand (as its grammatical Subject), and to the full Verb phrase, on the other (as its Object). This is the important fact about such sentences as (15); considerations of weak generative capacity, in contrast, are of marginal linguistic interest unless they yield negative conclusions. Of course, questions of descriptive adequacy cannot even be raised in connection with a theory until at least the conditions (9) are met. Cf. p. 9. Returning to the main theme, it seems to me that both assertions of (13) have been 28 In particular, Postal's conclusions apply to extended phrase structure grammar, since despite its richness of expressive power, it has the weak generative capacity of context-free grammar. To achieve weak generative adequacy, then, the theory must be extended in one way or another, perhaps by permitting rules to be restricted with respect to context. 
The important point, however, is that such an extension of weak generative capacity would not overcome the defects of descriptive inadequacy, just as the extension of context-free phrase structure grammar to context-sensitive phrase structure grammar, though it may very well overcome the defect of weak generative inadequacy, nevertheless does not eliminate the linguistically most significant defects of phrase structure grammar.
29 Cf. Chomsky, "Three models for the description of language", Syntactic structures, Current issues in linguistic theory, Aspects of the theory of syntax, and many other references, for discussion.
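To make the distinction between weak and strong generation concrete, consider the following minimal sketch (in Python, with two invented toy grammars that are not drawn from the text). The two grammars generate exactly the same set of strings, so they are weakly equivalent; but they assign different phrase-markers to those strings, so they are not strongly equivalent.

    # Two toy grammars that are weakly equivalent (same generated strings)
    # but not strongly equivalent (different structural descriptions).
    G1 = {'S': ['X', 'c'], 'X': ['a', 'b']}   # groups 'a b' as a constituent
    G2 = {'S': ['a', 'Y'], 'Y': ['b', 'c']}   # groups 'b c' as a constituent

    def derive(grammar, symbol='S'):
        """Expand a symbol into a tree; symbols without rules are terminal."""
        if symbol not in grammar:
            return symbol
        return (symbol, *(derive(grammar, part) for part in grammar[symbol]))

    def terminal_string(tree):
        if isinstance(tree, str):
            return [tree]
        return [w for child in tree[1:] for w in terminal_string(child)]

    t1, t2 = derive(G1), derive(G2)
    print(terminal_string(t1) == terminal_string(t2))  # True: weak equivalence
    print(t1)   # ('S', ('X', 'a', 'b'), 'c')
    print(t2)   # ('S', 'a', ('Y', 'b', 'c'))

Only the strong notion, the set of structural descriptions assigned, bears on descriptive adequacy, which is the point of the argument above.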


Returning to the main theme, it seems to me that both assertions of (13) have been established beyond reasonable doubt, and that there are, furthermore, no known alternatives to transformational grammar that begin to meet conditions of descriptive or explanatory adequacy.30

III

Having now covered the first two parts of the outline given in the introductory section, I would like to turn, much more briefly, to parts three, four, and five. These are discussed in much more detail in Chomsky, Aspects of the theory of syntax, and in the references cited there.

The earliest versions of transformational generative grammar made the following general assumptions concerning syntactic structure.

The syntactic component of a grammar consists of two sorts of rules: rewriting rules and transformational rules. The rewriting rules constitute a phrase structure grammar (with, perhaps, a condition of linear ordering imposed). Each rule is, in other words, of the form A → X (with a possible restriction to the context Z — W), where A is a category symbol and X, Z, W are strings of category or terminal symbols. The strings generated by this system we may call base strings (an alternative term is C-terminal strings).
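To illustrate rules of this form, the following sketch (a toy grammar fragment of my own invention, not one proposed in the text) applies rewriting rules, two of them restricted to a context, until no rule applies and a base string remains.

    # Rewriting rules of the form A -> X in the context Z -- W.
    # Empty contexts make a rule context-free.
    rules = [
        ('S',  ['NP', 'VP'], [],    []),
        ('VP', ['V', 'NP'],  [],    []),
        ('NP', ['John'],     [],    ['V']),  # NP -> John / -- V
        ('NP', ['Bill'],     ['V'], []),     # NP -> Bill / V --
        ('V',  ['saw'],      [],    []),
    ]

    def rewrite_once(string):
        """Apply the first rule (in order) at the leftmost position where it fits."""
        for a, x, z, w in rules:
            for i, sym in enumerate(string):
                if sym != a:
                    continue
                if z and string[max(0, i - len(z)):i] != z:
                    continue
                if w and string[i + 1:i + 1 + len(w)] != w:
                    continue
                return string[:i] + x + string[i + 1:]
        return None   # no rule applies: the string is terminal

    line = ['S']
    print(line)
    while True:
        nxt = rewrite_once(line)
        if nxt is None:
            break
        line = nxt
        print(line)
    # final line: ['John', 'saw', 'Bill'], a base (C-terminal) string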


30 I might mention at this point that the recent literature contains many references to purportedly equally effective or even more adequate theories. For example, Longacre [Review of Z. S. Harris, String analysis of sentence structure (The Hague, 1962), in Lg. 39.478 (1963)] asserts that the approaches of Harris, M. A. K. Halliday and K. L. Pike escape the limitations of phrase structure grammar and go well beyond immediate constituent analysis (presumably, he refers here to IC analysis of the type studied by Harris, B. Bloch, R. Wells and others in the '40's). This point, he claims, "is persistently missed by critics of 'phrase structure' grammars, who indiscriminately lump together immediate-constituent analysis and the more recent (and more satisfactory) approaches". Unfortunately, Longacre makes no attempt to provide any evidence in support of his claim; he does not, for example, attempt to show just how critics of phrase structure grammars have misinterpreted the theories he is advocating, in formulating them within the framework of phrase structure grammar. As to the advantages of these more recent theories over 'old-fashioned' IC analysis, Postal, in his comprehensive review of the subject (Postal, Constituent Structure), demonstrates quite the opposite. That is, he shows that the theories of Pike and Halliday, at least (along with many others), are if anything more defective than the IC theories worked out in the '40's. Of course, he was referring to material that is on the printed page. It is unfortunate that Longacre gives neither evidence nor documentation to show that something crucial was omitted in that account. — Much more far-reaching claims are made by H. A. Gleason ["The organization of language: a stratificational view", Report of the Fifteenth Annual (First International) Round Table Meeting on Linguistics and Language Studies, ed. C.I.J.M. Stuart, Monograph Series on Languages and Linguistics 17 (1964)], who mentions the existence of a linguistic theory (called 'stratificational grammar', but apparently critically different from the only published version of this theory, namely, that which Postal, Constituent Structure, showed to be a variant of context-free grammar insofar as it was well-defined) that not only escapes the limitations of phrase structure grammar, but is demonstrably superior in descriptive adequacy to transformational grammar as well. It is disappointing, then, to discover that Gleason presents no linguistic evidence at all in support of his many claims, and, even more so, that he makes no attempt to sketch even the general outlines of the approach to language for which he claims these many virtues, though he does list some of the terminology that he expects will find a place in it. Some of these terms are explained, namely, those that are merely new terms for familiar notions (e.g. 'phonon' for distinctive feature); the rest, so far as this paper indicates, are devoid of any content.


In the course of generating a string, the system of rewriting rules (let us call this the base component of the syntax) assigns to it a Phrase-marker which we can call a base phrase-marker, this being representable as a labeled bracketing or a tree diagram with categories labeling the nodes. The transformational rules map phrase-markers into new, derived phrase-markers. Each transformational rule is defined by a structural analysis stating a condition on the class of phrase-markers to which it applies and specifying an analysis of the terminal string of this phrase-marker into successive parts. The specification of the transformation is completed by associating with this structural analysis a certain elementary transformation, which is a formal operation on strings, of a certain narrow class. For details, see the references cited above. By defining the 'product' of two phrase-markers as the new phrase-marker derived essentially by concatenation of the labeled bracketings,31 we can apply what have been called generalized (or double-base, triple-base, etc.) transformations to a phrase-marker representing a sequence of phrase-markers, mapping such a product into a new phrase-marker by the same apparatus as is required in the singulary case. The transformations meet certain ordering conditions (I return to these below), which must be stated in a separate part of the grammar. These conditions include a specification of certain transformations as obligatory, or obligatory relative to certain sequences of transformations. To generate a sentence, we select a sequence of (one or more) base phrase-markers and apply singulary and generalized transformations to them, observing the ordering and obligatoriness requirements, until the result is a single phrase-marker dominated by S (the initial category, representing 'sentence'). If we select a single base phrase-marker and apply only obligatory transformations, we call the resulting sentence a kernel sentence (a kernel sentence is not to be confused with the base string that underlies it, as well as possibly many other more complex sentences). We can represent the system of transformations that apply in the process of derivation as a transformation-marker (T-marker). To illustrate, consider the sentence (18):

(18) I expected the man who quit work to be fired.

The transformational derivation of (18) might be represented by the T-marker (19):

(19)
      B1 -----------------------------------
                                            \
                                             TEmb --- Tto
      B2 --------                           /
                 \                         /
                  TEmb --- Tpass --- TDel
                 /
      B3 --- TRel

31 Precise definitions of the notions mentioned here are provided in Chomsky [The logical structure of linguistic theory. Unpublished manuscript, Microfilm, M.I.T. Library (Cambridge, Mass., 1955)], and descriptions of varying degrees of informality appear throughout the literature. In particular, a phrase-marker is representable as a set of strings, and the 'product' of two phrase-markers is then the complex product of the two sets (i.e. the set of all strings XY such that X is in the first set and Y in the second).


In this representation, B1, B2 and B3 are the three base phrase-markers that underlie the (kernel) sentences (20i)-(20iii):32

(20)  (i)   I expected it
      (ii)  someone fired the man
      (iii) the man quit work

The interpretation of (19) is straightforward. It represents the fact that to form (18) we take the three base structures underlying (20i-iii) and proceed as follows. First, apply to B3 the relative transformation TRel that converts it to 'wh-(the man) quit work' (rather, to the abstract string that underlies this — cf. n. 32), with its derived phrase-marker. Call this new structure K1. At this point, apply the generalized embedding transformation TEmb to the pair of structures (B2, K1), deleting the occurrence of 'the man' in the embedded sentence in the process, giving the string 'someone fired the man who quit work' with its derived phrase-marker K2. To K2, apply the passive transformation Tpass to give 'the man who quit work was fired by someone', with the phrase-marker K3. To this apply the deletion transformation TDel to give 'the man who quit work was fired', with the derived phrase-marker K4. Now apply to the pair of structures (B1, K4) the generalized embedding transformation TEmb, giving 'I expected the man who quit work was fired' with the derived phrase-marker K5. To K5, apply the singulary transformation Tto, giving the sentence (18) with its derived phrase-marker K6. I emphasize once again that only after all the transformations have been completed do we have an actual 'sentence' — that is, a string of elements that constitutes an 'output' of the syntactic component of the grammar and an 'input' to the phonological component. Perhaps this example suffices to convey the content of the notion 'T-marker' (for further elaboration, see Chomsky, The logical structure of linguistic theory; Katz and Postal, An integrated theory of linguistic description).
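The derivation just traced can be followed mechanically. The sketch below is a deliberately crude, string-level simulation: true transformations operate on phrase-markers under a structural analysis, whereas here each K is represented only by its terminal string, and the inversion of the Noun and the relative clause is folded into the embedding step.

    B1 = 'I expected it'
    B2 = 'someone fired the man'
    B3 = 'the man quit work'

    def t_rel(s):                          # relative transformation
        return s.replace('the man', 'wh-(the man)', 1)

    def t_emb(matrix, transform, site):    # generalized embedding at a marked site
        return matrix.replace(site, transform, 1)

    def t_pass(s):                         # passive: 'someone fired X' -> ...
        subj, verb, obj = s.split(' ', 2)
        return f'{obj} was {verb} by {subj}'

    def t_del(s):                          # deletion of the unspecified agent
        return s.replace(' by someone', '')

    def t_to(s):                           # singulary Tto
        return s.replace('was fired', 'to be fired')

    K1 = t_rel(B3)                                    # wh-(the man) quit work
    relative = K1.replace('wh-(the man)', 'who')      # spell-out of wh-
    K2 = t_emb(B2, 'the man ' + relative, 'the man')  # someone fired the man who quit work
    K3 = t_pass(K2)                                   # ... was fired by someone
    K4 = t_del(K3)                                    # ... was fired
    K5 = t_emb(B1, K4, 'it')
    K6 = t_to(K5)
    print(K6)   # I expected the man who quit work to be fired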

32 Since I am presenting this merely as the basis for some revisions to be proposed below, I skip many details. In particular, I am completely overlooking the question of how to describe the Auxiliary system, and I have also supposed, for simplicity of exposition, that each of B1-B3 underlies a kernel sentence. Actually, this is not necessary, and in the transformational grammars presented in Chomsky (The logical structure of linguistic theory; Syntactic structures; "A transformational approach to syntax"), Lees (The grammar of English nominalizations), and others, many of the base strings contain 'dummy symbols' [e.g., Comp, in the case of the analysis of such sentences as (15)] which are either deleted or filled in by sentence transforms in one way or another. Thus B1 might have a dummy symbol as Object, B2 might have an unspecified Subject, etc. I am also assuming here a simpler analysis of the main (matrix) structure than was postulated in earlier work. The reasons for this go well beyond anything considered here. See P. Rosenbaum (A grammar of English Predicate Complement Constructions. Unpublished Ph.D. dissertation, M.I.T., 1965) and, for further related discussion, Chomsky (Aspects of the theory of syntax, ch. 1, § 4). Throughout the description of these structures, I cite sentences as examples, inaccurately, instead of the abstract strings that underlie them. It should be kept in mind that this is only an expository device.


It should be clear, from this, how any transformational derivation can be represented as a T-marker which gives the full 'transformational history' of the derived sentence, including, in particular, a specification of the base phrase-markers from which it is derived.

In Chomsky (The logical structure of linguistic theory) a general theory of linguistic levels is developed in an abstract and uniform way, with phrase structure and transformations each constituting a linguistic level. On each level, markers are constructed that represent a sentence. In particular, derived phrase-markers and T-markers fill this function on the phrase-structure and transformational levels, respectively. Each level is a system of representation in terms of certain primes (elementary atomic symbols of this level). On the level of phrase structure, the primes are category and terminal symbols. On the level of transformations, the primes are base phrase-markers and transformations. A marker is a string of primes or a set of such strings. Both phrase-markers and transformation-markers can be represented in this way. Levels are organized in a hierarchy, and we may think of the markers of each level as being mapped into the markers of the next lowest level and as representing the lowest-level marker (that is, the phonetic representation which is the marker on the lowest, phonetic level — the primes of this level being sets of features), which is associated directly with an actual signal. We limit the discussion here to the levels of phrase structure and transformational structure.

The general requirement on a syntactic theory is that it define the notions 'deep structure' and 'surface structure', representing the inputs to the semantic and phonological components of a grammar, respectively (see above), and state precisely how a syntactic description consisting of a deep and surface structure is generated by the syntactic rules. These requirements are met by the theory outlined above in the following way. The rewriting rules of the base component and the rules governing ordering and arrangement of transformations generate an infinite class of T-markers, in the manner just sketched. We take a T-marker to be the deep structure; we take the derived phrase-marker that is the final output of the operations represented in the T-marker to be the surface structure. Thus in the case of (18), the deep structure is the T-marker represented as (19), and the surface structure is what we designated as K6. The phrase-marker K6, then, must contain all information relevant to determination of the form of the signal corresponding to (18) (i.e., it is to be mapped into a phonetic representation of (18) by rules of the phonological component); the T-marker (19) is to contain all information relevant to the semantic interpretation of (18).

To complete the theory, we must add a description of the phonological and semantic components that interpret surface and deep structures, respectively. I will discuss the phonological component briefly in the fourth section, along lines suggested by R. Jakobson, G. Fant and Halle (Preliminaries to speech analysis [Cambridge, Mass., 1952]); Chomsky, Halle and Lukoff ("On accent and juncture in English"); Halle (The sound pattern of Russian; "Phonology in a generative grammar"; "On the bases of phonology", Structure of language, eds. Fodor and Katz, 324-33); Chomsky ("Explanatory models in linguistics"); and other related publications. The theory of semantic interpretation is in a much less developed state, as noted above, although


recent work of Katz, Fodor and Postal has been quite encouraging and, as we shall note directly, has had important consequences for the theory of syntax as well.

A theory of semantic interpretation based on the syntactic model outlined above would have to provide, first, a characterization of the notion 'semantic interpretation of a sentence', and, second, a system of rules for assigning such an object to a deep structure, that is, a T-marker. Analogously, a theory of phonetic interpretation must specify the notion 'phonetic interpretation of a sentence' — it must, in other words, specify a universal phonetic alphabet — and must provide a system of rules for assigning such an object to a surface structure, that is, the final derived phrase-marker of a sentence. The notion 'semantic interpretation of a sentence' remains in a rather primitive state, for the moment. Several important steps have been taken towards the study of rules that assign semantic interpretations to deep structures, however. First of all, it is evident that the grammatical relations among the elements of the string representing a sentence and the grammatical functions (i.e. Subject, Object, etc.) that these elements fulfill provide information that is fundamental for semantic interpretation. Furthermore, it has been evident since the beginnings of recent work on transformational grammar that it is the grammatical relations and grammatical functions represented in the base phrase-markers underlying a sentence that are critical for its semantic interpretation (for example, it is not the 'grammatical Subject' of the passive but rather its 'logical Subject' that is the Subject in the sense relevant to semantic interpretation). This is evident from consideration of the examples discussed throughout this paper. These examples were chosen primarily to illustrate this fact, as is characteristic of expository papers in transformational grammar. As emphasized above, it is examples of grammatical relations and functions that are obscured in the surface representation (the IC analysis) that provide the primary motivation for the rejection of all versions of taxonomic syntax, and for the development of the theory of transformational grammar.

To my knowledge, the first fairly explicit discussion of grammatical relations of the deep structure that are not represented in the actual physical form and organization of the sentence, and the first discussion of the importance of these for semantic interpretation, is in the Grammaire générale et raisonnée of Port-Royal (1660). For some brief references, see Chomsky (Current issues in linguistic theory, § 1), and for some further discussion, Chomsky (Cartesian linguistics). In modern linguistics, the same insight was expressed by Harris, in somewhat different terms, in his early work on transformations,33 and the point is also emphasized in Chomsky (The logical structure of linguistic theory, Syntactic structures), and in all subsequent work on transformational grammar.


To go beyond this observation, it is necessary to define grammatical relations and grammatical functions, and to show how the relations and functions of the base phrase-markers play a role in determining the semantic interpretation of the sentence that they underlie.

33 E.g. Harris, "Discourse analysis", Lg. 28.18-23 (1952); "Distributional structure", Word 10.146-62 (1954); "Co-occurrence and transformation in linguistic structure", Lg. 33.283-340 (1957).


A phrase structure grammar is, in fact, a very natural device for assigning a system of grammatical relations and functions to a generated string. These notions are represented directly in the phrase-marker assigned to a string generated by such rules, as has been frequently pointed out. Various ways of defining these notions are discussed in Chomsky (The logical structure of linguistic theory; Current issues in linguistic theory; Aspects of the theory of syntax) and Postal (Constituent Structure). For concreteness, consider a highly oversimplified phrase structure grammar with the rules (21):

(21)  S → NP VP
      VP → V NP
      NP → John, Bill
      V → saw

This grammar generates the string 'John saw Bill' with the phrase-marker (22):

(22)          S
            /   \
          NP     VP
          |     /  \
        John   V    NP
               |    |
              saw  Bill

To the grammatical rule A → XBY, we can associate the grammatical function [B, A]. Thus associated with the rules of (21) we have the grammatical functions [NP, S], [VP, S], [V, VP], [NP, VP]. We may give these the conventional names Subject-of, Predicate-of, Main-Verb-of, Object-of, respectively. Using the obvious definitions of these notions, we can say, then, that with respect to the phrase-marker (22), 'John' is the Subject-of the sentence, 'saw Bill' is the Predicate-of the sentence, 'saw' is the Main-Verb-of the Verb Phrase, and 'Bill' is the Object-of the Verb Phrase. We can go on to define grammatical relations (Subject-Verb, etc.) in terms of these and other notions, and there are various ways in which one can attempt to formulate language-independent definitions for the central concepts (for details, see the cited references). The important point is that a phrase structure grammar need not be supplemented in any way for it to assign these properties to the strings it generates. Once we recognize the relational character of these notions, we see at once that they are already assigned, in the appropriate way, with no further elaboration of the rules. Notice that we might define the grammatical functions not in terms of the generating rules, but in terms of the phrase-marker itself, in an obvious way. If we do this, we will have a more general notion of 'grammatical function' that will apply to derived phrase-markers as well as to base phrase-markers. I do not go into this here, since, in any event, it is only the functions in the base phrase-markers that are significant for semantic interpretation (but see Chomsky, Aspects of the theory of syntax, pp. 220, 221, for some discussion of the role of 'surface functions', so defined).
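Because the functions are relational, they can be computed directly from the phrase-marker, with no supplementation of the grammar, which is just the point made above. The sketch below (trees encoded as tuples, an encoding of my own convenience) reads the four functions off the phrase-marker (22).

    # Grammatical functions as configurations [B, A] in the phrase-marker.
    tree = ('S',
            ('NP', 'John'),
            ('VP',
             ('V', 'saw'),
             ('NP', 'Bill')))

    FUNCTIONS = {('NP', 'S'):  'Subject-of',
                 ('VP', 'S'):  'Predicate-of',
                 ('V',  'VP'): 'Main-Verb-of',
                 ('NP', 'VP'): 'Object-of'}

    def terminals(node):
        if isinstance(node, str):
            return [node]
        return [w for child in node[1:] for w in terminals(child)]

    def functions(node, parent=None):
        if isinstance(node, str):
            return
        name = FUNCTIONS.get((node[0], parent))
        if name:
            print(' '.join(terminals(node)), '=', name, parent)
        for child in node[1:]:
            functions(child, node[0])

    functions(tree)
    # John = Subject-of S
    # saw Bill = Predicate-of S
    # saw = Main-Verb-of VP
    # Bill = Object-of VP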


The first attempt to develop a theory of semantic interpretation as an integral part of an explicit (i.e. generative) grammar is in Katz and Fodor ("The structure of a semantic theory"). This is the first study that goes beyond the assertion that the base phrase-markers underlying a sentence are, in some sense, the basic content elements that determine its semantic interpretation. Basing themselves on the account of syntactic structure outlined above, Katz and Fodor argue that the semantic component of a grammar should be a purely interpretive system of rules that maps a deep structure (a T-marker) into a semantic interpretation, utilizing in the process three sorts of information: (i) intrinsic semantic features of lexical items; (ii) the grammatical functions defined by the base rules; (iii) the structure of the T-marker. The semantic component should have two sorts of 'projection rules'. The first type assigns semantic interpretations ('readings') to categories of the base phrase-markers in terms of the readings previously assigned to the elements dominated by (belonging to) these categories, beginning with the intrinsic readings of the lexical items and using the grammatical functions defined by the configurations of the base phrase-markers to determine how the higher-level readings are assigned, and, ultimately, assigning a reading to the dominant category S. The projection rules of the second type utilize the readings assigned in this way to base phrase-markers and, in terms of the elements and configurations represented in the T-marker, determine the semantic interpretation of the full sentence. Not much is said about type 2 rules; as we shall see below, this is not a serious gap in their theory.

With this brief survey, we conclude part three of the outline of the introductory section, having now sketched a certain theory of generative grammar that in part overcomes the fundamental inability of taxonomic syntax to provide an adequate notion of deep structure. Turning now to part four of the outline, I would like to consider some of the defects that have been exposed in the theory just sketched as it has been applied to linguistic material.

In Lees (The grammar of English nominalizations), it is shown that the negation transformation of Chomsky (Syntactic structures, "A transformational approach to syntax")34 is incorrectly formulated. He shows that there are syntactic arguments in favor of an alternative formulation in which the negation element is not introduced by a transformation but is, rather, an optional element introduced by rewriting rules of the base, the transformation serving simply to place it in the correct position in the sentence. At about the same time, E. S. Klima pointed out that the same is true of the question transformations of Chomsky (Syntactic structures, "A transformational approach to syntax"). There are syntactic arguments in favor of assuming an abstract 'question marker' as an element introduced by base rules, the question transformations then being conditional on the presence of this marker (i.e. obligatory when it appears in a string, and inapplicable otherwise).

34 Publication delays account for the discrepancy in dates, here, and in several other places.
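A minimal sketch of the marker analysis just described, with an invented auxiliary and placement rule: the base (optionally) introduces the abstract element 'neg', and the transformation is obligatory exactly when the marker is present, serving only to position the negation element.

    def t_neg(string):
        """Obligatory when the base-generated marker 'neg' is present;
        inapplicable otherwise."""
        if string[0] != 'neg':
            return string                      # no marker: nothing to do
        rest = string[1:]
        aux = rest.index('will')               # place 'not' after the auxiliary
        return rest[:aux + 1] + ['not'] + rest[aux + 1:]

    print(t_neg(['neg', 'John', 'will', 'leave']))  # ['John', 'will', 'not', 'leave']
    print(t_neg(['John', 'will', 'leave']))         # unchanged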


Further arguments in support of this view, and further elaboration of it, are presented in Katz and Postal (An integrated theory of linguistic description). See now also Klima ("Negation in English", Structure of language: Readings in the philosophy of language, eds. Fodor and Katz, 246-323). In Katz and Postal, it is further observed that the same is true of the imperative transformation of earlier work. In the light of this and other observations, Katz and Postal then conclude that all singulary transformations which affect meaning are conditional upon the presence of markers of this sort; in other words, the singulary transformations in themselves need not be referred to by the rules of the semantic component, since whatever contribution they appear to make to the meaning of the sentence can be regarded as an intrinsic property of the marker that determines their applicability, and can therefore be handled in base structures by type 1 projection rules. It follows, then, that the function of type 2 projection rules is much more restricted than Katz and Fodor were forced to assume, since they need not take into account the presence of singulary transformations in a T-marker.

Turning then to generalized transformations, Katz and Postal carry out a detailed analysis of many examples described in earlier studies that seem to demonstrate a contribution of generalized transformations to the semantic interpretation of the generated sentence in some way that goes beyond mere 'amalgamation'. They argue (quite convincingly, it seems to me) that in each such case there are syntactic grounds for regarding the description as in error; furthermore, that in each such case the only function of the generalized transformation is to embed a sentence transform in a position that is already specified in the underlying structure (let us say, by the presence of a dummy symbol). Generalizing upon these various observations, they conclude that the only function of generalized transformations, so far as semantic interpretation is concerned, is to interrelate the semantic interpretations of the phrase-markers on which they operate; in other words, to insert the reading for the embedded phrase-marker in the position already marked (by a dummy element) in the phrase-marker in which it is inserted. Thus the only aspect of the T-marker that need be considered in semantic interpretation is the interrelation specified by the nodes where generalized transformations appear in the representation. Beyond this, transformations appear to play no role in semantic interpretation. Thus the function of type 2 rules is still further restricted.

This principle obviously simplifies very considerably the theory of the semantic component as this was presented in Katz and Fodor ("The structure of a semantic theory"). It is therefore important to observe that there is no question-begging in the Katz-Postal argument. That is, the justification for the principle is not that it simplifies semantic theory, but rather that in each case in which it was apparently violated, syntactic arguments can be produced to show that the analysis was in error on internal, syntactic grounds. In the light of this observation, it is reasonable to formulate the principle tentatively as a general property of grammar.
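Schematically, the principle means that the semantic component can read the contribution of such a transformation directly off the base marker, never consulting the transformation itself. A toy rendering, with representations invented for the illustration:

    # The Katz-Postal point: since applicability of the negation
    # transformation is fully determined by the base marker 'neg', semantic
    # interpretation can proceed from the base structure alone.
    def interpret_base(base_string):
        negated = 'neg' in base_string
        core = [w for w in base_string if w != 'neg']
        return {'proposition': ' '.join(core), 'negated': negated}

    print(interpret_base(['neg', 'John', 'will', 'leave']))
    # {'proposition': 'John will leave', 'negated': True}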


Furthermore, it seems that there are good reasons for regarding even the passive transformation as conditional upon the presence of an abstract marker in the underlying string (see Chomsky, Aspects of the theory of syntax, for a survey of syntactic arguments in support of this), rather than as optional, as assumed in earlier work. Consequently, it seems that all singulary transformations other than those that are 'purely stylistic' (cf. Chomsky, Aspects of the theory of syntax, pp. 221, 223, for some discussion of this distinction — discussion, incidentally, which is far from satisfactory, although it seems to me that a real and important distinction is involved) are conditional upon markers in base strings, whether or not these transformations affect semantic interpretation.

Independently of these developments, C. J. Fillmore pointed out that there are many restrictions on the organization of T-markers beyond those that were assumed in earlier attempts to formulate a theory of transformational grammar [Fillmore, "The position of embedding transformations in a grammar", Word 19.208-31 (1963)]. What his observations come to is essentially this: there is no ordering among generalized transformations, although singulary transformations are ordered (apparently linearly); there are no singulary transformations that must apply to a matrix sentence before a constituent sentence is embedded in it by a generalized embedding transformation,35 although there are many instances of singulary transformations that must apply to a matrix sentence after embedding of a constituent structure within it and to a constituent sentence before it is embedded; embedding should be regarded as substitution of a sentence transform for a 'dummy symbol' rather than as insertion of this transform in a categorially unspecified position. The last observation is further elaborated by Katz and Postal (An integrated theory of linguistic description), as noted above.

Returning now to the T-marker (19) used as an example above, we observe that it has just the properties that Fillmore outlines. That is, singulary transformations are applied to a matrix sentence only after embedding, and the only ordering is among singularies. But the earlier theory of T-markers left open the possibility of ordering of a much more complex sort. It is therefore quite natural to generalize from these empirical observations, and to propose as a general condition on T-markers that they must always meet Fillmore's conditions and have the form illustrated in (19). As just formulated, this principle appears to be quite ad hoc, but there is another way of saying exactly the same thing that makes it seem entirely natural.

35 The terms 'matrix sentence' and 'constituent sentence' are due to Lees (The grammar of English nominalizations); the matrix sentence is the one into which a constituent sentence is inserted by a generalized transformation. The same notion appears in the analysis of transformational processes in the Grammaire générale et raisonnée, where the terms 'proposition essentielle' and 'proposition incidente' are used for 'matrix sentence' and 'constituent sentence', respectively. Actually, 'matrix proposition' and 'constituent proposition' would, in any event, be preferable terms, since what is involved here is not an operation on sentences but rather on the abstract structures that underlie them and determine their semantic interpretation. This is the way in which these operations are interpreted, correctly, in the Grammaire générale et raisonnée.


Notice that if no singulary transformations apply to a matrix phrase-marker before embedding, and if, furthermore, all embedding involves the insertion of a constituent phrase-marker in a position marked by a dummy element in the matrix structure, then we can, in fact, dispense with generalized transformations entirely. Instead of introducing constituent phrase-markers by embedding transformations, we can permit the rewriting rules of the base to introduce the initial category symbol S, i.e. we can permit rewriting rules of the form A → ...S.... Wherever such a symbol is introduced, we can allow it to head a new base derivation. In short, we can apply the linearly ordered system of base rewriting rules in a cyclic fashion, returning to the beginning of the sequence each time we come upon a new occurrence of S, introduced by a rewriting rule. Proceeding in this way, we construct what we can call a generalized phrase-marker. We now apply the linear sequence of singulary transformations in the following manner. First, apply the sequence to the most deeply embedded structure dominated by S in the generalized phrase-marker. Having completed the application of the rules to each such structure, reapply the sequence to the 'next-higher' structure dominated by S in the generalized phrase-marker. Continue in this way, until, finally, the sequence of transformations is applied to the structure dominated by the occurrence of S which initiated the first application of base rules, i.e., to the generalized phrase-marker as a whole. Notice that with this formulation, we have, in effect, established the particular formal properties of the T-marker (19) as general properties of any transformational derivation.
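The cyclic convention is easy to state exactly. In the sketch below (trees again as tuples, with a single placeholder transformation that merely records the domain it applies to), the ordered sequence of singulary transformations is reapplied at every node labeled S, most deeply embedded first:

    applied = []

    def leaves(node):
        if isinstance(node, str):
            return [node]
        return [w for c in node[1:] for w in leaves(c)]

    def t_record(node):
        # placeholder singulary transformation: records its domain of application
        applied.append(' '.join(leaves(node)))
        return node

    def cycle(node, transformations):
        if isinstance(node, str):
            return node
        label, *children = node
        children = [cycle(c, transformations) for c in children]  # lower cycles first
        node = (label, *children)
        if label == 'S':
            for t in transformations:     # reapply the ordered sequence at each S
                node = t(node)
        return node

    gpm = ('S', ('NP', 'I'),
                ('VP', ('V', 'expected'),
                       ('NP', ('S', ('NP', 'someone'),
                                    ('VP', ('V', 'fired'), ('NP', 'the man'))))))

    cycle(gpm, [t_record])
    print(applied)
    # ['someone fired the man', 'I expected someone fired the man']:
    # the embedded S is cycled through before the phrase-marker as a whole.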


Let us now return to the example (18)-(20) in the light of these suggested revisions of the theory of transformational grammar. By the application of the rewriting rules of the base, we construct the generalized phrase-marker (23) (omitting all but the central configurations, and many details):

(23) [tree diagram: the base structure underlying 'I expected it', with the structure underlying 'someone fired the man' — containing the dummy elements passive and Δ — embedded at the position of 'it', and the structure underlying 'the man quit work' embedded, as # S #, under the Determiner-Noun 'the man'.]

The transformations indicated in (19) now apply, obligatorily, in the following order. First, TRel applies to the most deeply embedded structure. We then turn to the next higher structure, i.e. the one dominated by the next higher occurrence of S in (23). At this point, an inversion rule (not indicated in (19), though in fact also needed in the earlier formulation) inverts the relative clause and the following N. Next we apply the passive transformation and the subsequent deletion of the unspecified Subject, these operations now being obligatorily marked by the dummy elements passive and Δ (standing for an unspecified category) in (23). Since no further transformational rules apply at this point, we turn to the next higher structure dominated by S — in this case, the full generalized phrase-marker. To this we apply Tto, as before, giving (18). The transformations indicated in the T-marker (19) are now obligatory, and the structure of the T-marker (19) is fully determined by (23) itself, given the general convention for cyclic application of transformations.

Notice now that all of the information relevant to the semantic interpretation of (18) is contained in the generalized phrase-marker (23) that underlies (18). Furthermore, the same will be true in all other cases, if the modifications suggested above are correct. By the principle suggested by Katz and Postal, the singulary transformations will not make an intrinsic contribution to meaning, and the generalized transformations will
do so only insofar as they interrelate base phrase-markers. But we have now eliminated generalized transformations in favor of a recursive operation in the base. Consequently, all information relevant to the operation of the interpretive semantic component should be contained in the generalized phrase-marker generated by base rules.

The advantages of this modification are obvious. It provides a more highly structured theory which is weaker in expressive power; in other words, it excludes in principle certain kinds of derivational pattern that were permitted by the earlier version of transformational theory, but never actually found. Since the primary goal of linguistic theory is to account for specific properties of particular languages in terms of hypotheses about language structure in general, any such strengthening of the constraints on a general theory is an important advance. Furthermore, there is good internal motivation for enriching the structure (and hence decreasing the expressive power) of transformational theory in this way, namely, in that this modification permits us to eliminate the notion of 'generalized transformation' (and with it, the notion 'T-marker') from the theory of syntax. Hence the theory is conceptually simpler.


Finally, the theory of the semantic component can be simplified in that type 2 projection rules are no longer necessary at all.

Recapitulating, we are proposing that the syntactic component of a grammar consists of rewriting rules and transformational rules. The rewriting rules are permitted to introduce the initial symbol S. These rules apply in a linear sequence; if the initial symbol S appears in a derivation, then the sequence of rules reapplies to form a subderivation dominated by this symbol, in the usual manner. The recursive property of the grammar (its 'creative aspect', to return to terminology used above) is restricted to the base component. In fact, the restriction may be still heavier than this, since recursion may be limited to introduction of the symbol S, that is, to introduction of 'propositional content'. This is not a necessary property of a phrase structure grammar. The base rules, applying in the way just outlined, form generalized phrase-markers. The function of the transformational rules is to map generalized phrase-markers into derived phrase-markers. If the transformational rules map the generalized phrase-marker MD into the final derived phrase-marker MS of the sentence X, then MD is the deep structure of X and MS is its surface structure.

This approach to syntax formalizes, in one specific way, the view that the phonetic form of a sentence is determined by its actual labeled bracketing, whereas its semantic interpretation is determined by the intrinsic semantic properties of its lexical items and by a network of grammatical relations, not necessarily represented in the surface structure, that interconnect these items [cf. (13)]. The underlying grammatical relations are determined by the base rules. This abstract system of categories and relations is related to a labeled bracketing of the actual sentence by transformational rules and the interpretive rules of the phonological component. There is fairly good reason to suppose that the base rules are rather narrowly constrained both in terms of the symbols that may appear in them and in terms of the configurations of these symbols, but I will not go into this further question here (see Chomsky, Aspects of the theory of syntax, for some discussion). Insofar as information is presently available about syntactic structure, and about the relation of signals to semantic interpretations of these signals, this view seems compatible with it. It is worth mentioning that a view very much like this is expressed in the Grammaire générale et raisonnée, to which we have now had occasion to refer several times.

We might ask why a natural language should be constructed in this way; why, in particular, it should not identify deep and surface structures and thus dispense with the transformations that interrelate them. One would naturally attempt to find an answer to this question on perceptual grounds. For some speculations that seem to me worth pursuing further, see Miller and Chomsky ("Finitary models of language users", part II).

Observe that the base rules may form generalized phrase-markers that cannot be mapped by any sequence of transformations into a surface structure. For example, suppose that we had chosen the phrase 'the boy' instead of 'the man' in the most deeply embedded structure of (23).


In this case, the generalized phrase-marker would evidently constitute the deep structure of no sentence — there is no sentence for which this structure provides the semantic interpretation. And in fact, the relative transformation would block when applying to this structure, because of the lack of identity between the Noun Phrases of the matrix and constituent sentences.36 Hence not all generalized phrase-markers underlie sentences and thus count as deep structures. The deep structures are the generalized phrase-markers that are mapped into well-formed surface structures by transformational rules. Thus the transformations serve a 'filtering' function; in effect, they supply certain global constraints that a deep structure must meet, constraints that are, in fact, entirely unstatable within the framework of elementary rewriting rules that seem perfectly adequate for the generation of base structures with the grammatical functions and relations that they express. For further discussion of this property of transformations, see Chomsky (Aspects of the theory of syntax, Ch. 3).
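A minimal sketch of this filtering effect, assuming (as in n. 36) an identity condition on the deleted Noun Phrase; the list representations are invented for the illustration.

    class Blocked(Exception):
        pass

    def t_rel(matrix_np, embedded):
        """Relative transformation: blocks unless the embedded sentence
        contains an NP identical to the matrix NP (recoverability of deletion)."""
        emb_np, *rest = embedded
        if emb_np != matrix_np:
            raise Blocked(f'{emb_np!r} not identical to {matrix_np!r}')
        return [matrix_np, 'who'] + rest

    print(t_rel('the man', ['the man', 'quit', 'work']))
    # ['the man', 'who', 'quit', 'work']
    try:
        t_rel('the man', ['the boy', 'quit', 'work'])
    except Blocked as e:
        print('blocked:', e)   # this generalized phrase-marker underlies no sentence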

In this way, we can construct a theory of grammatical transformations that is conceptually simpler than the earlier version, described above, but still apparently empirically adequate. In this modified formulation, the functions of the base rules and the transformational rules are more clearly expressed, as are also the notions of deep and surface structure. We have, correspondingly, a simplification of semantic theory.37

I began this section by presenting a certain theory of grammar in outline. I have now tried to show that this theory was too broad and rich in expressive power, and that a much more restricted version of it (which is, furthermore, conceptually well-motivated) will suffice to account for what empirical data is now available. I would now like to turn to an inadequacy in the earlier theory of the opposite sort, that is, to a class of problems that show this theory to be too poor in expressive power, in a certain way.

Let us limit our attention now to the base component of the syntax. The theory outlined followed structuralist assumptions in supposing that the relation of lexical items to the categories to which they belong is fundamentally the same as the relation of phrases to the categories of which they are members. Formally speaking, it was assumed that a lexical item X is introduced by rewriting rules of the form A → X, where A is a lexical category, exactly in the way that phrases are introduced.38 However,

36 What is involved here is a set of very general conventions on recoverability of deletion in the transformational component of a grammar. For discussion, see Chomsky (Current issues in linguistic theory; Aspects of the theory of syntax); Katz and Postal (An integrated theory of linguistic description).
37 Incidentally, only embedding transformations were considered here. It is also necessary to show how various transformations that introduce coordinate structures (e.g. conjunction) can be developed within this framework. For some remarks on this question, see Chomsky (Aspects of the theory of syntax) and the references cited there.
38 Notice that although this has been the view of all work in modern syntactic theory that has gone beyond mere elaboration of terminology, the incorrectness of this view became obvious only when it was formalized within the framework of an explicit theory of grammar. An essential reason for formalization and explicitness is, of course, that it immediately exposes inadequacies that may otherwise be far from evident.


difficulties in this view quickly emerged. Shortly after the publication of the earliest work in transformational generative grammar, it was pointed out by G. H. Matthews that whereas the categorization of phrases is typically hierarchic, and therefore within the bounds of phrase structure grammar, lexical categorization typically involves cross-classification, and therefore goes beyond these bounds. For example, a Noun may be either Proper or Common, and, independently of this, may be either Animate or Inanimate; a Verb may be Transitive or non-Transitive, and independently of this, may or may not take non-Animate Subjects; etc. This fact is unstatable within the framework of phrase structure grammar. Consequently, the theory of the base must be extended in some way so as to provide an analysis of lexical categorization that is different in fundamental respects from the analysis in terms of rewriting rules that seems quite adequate above the level of lexical category. Similar observations were made independently by Stockwell, Anderson, Schachter and Bach, and various proposals have been made as to how to remedy this defect of the base component. The general problem is studied in some detail in Chomsky (Aspects of the theory of syntax, Ch. 2), where reference is also made to the earlier work just noted. I will sketch briefly the proposals offered there for modification of the theory of the base component.

Notice that the problem of lexical cross-classification is formally analogous to the problem of phonological classification. Thus phonological elements are also typically cross-classified with respect to the operation of various phonological rules. Certain rules apply to the category of Voiced segments; others to the category of Continuants; membership of a segment in one of these categories is independent of its membership in the other. This is, furthermore, the typical situation. This, in fact, is one major reason for the view that segments (e.g. phonemes or morphophonemes) have no independent linguistic status and are simply to be regarded as sets of features. More generally, a lexical item can be represented phonologically as a certain set of features, indexed as to position. Thus the lexical item 'bee' can be represented by the feature set [Consonantal1, Voiced1, non-Continuant1, ..., Vocalic2, non-Grave2, ...], indicating that its first 'segment' is consonantal, voiced, a non-continuant, ..., and that its second 'segment' is vocalic, non-grave, .... Such a representation can be given in matrix form, in an obvious and familiar way. It provides a perfectly satisfactory solution to the cross-classification problem on the phonological level (and, furthermore, relates very nicely to what seems to me to be, for the present, the most satisfactory theory of universal phonetics, namely, Jakobson's theory of distinctive features — I will presuppose acquaintance with this, in the form recently given to it by Halle, for the remainder of this paper). Observe also that the semantic analysis of lexical items also apparently requires a kind of feature theory, and that these features typically cross-classify lexical entries. Thus Katz and Fodor ("The structure of a semantic theory") and Katz and Postal (An integrated theory of linguistic description) are led to the conclusion, essentially, that a lexical entry in its semantic aspect should consist of a set of semantic features.
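The position-indexed feature set for 'bee' described above, and its arrangement as a matrix, can be pictured as follows (only a few features are shown, and the values are illustrative):

    # The lexical item 'bee' as a set of position-indexed phonological
    # features, printed as a feature-by-segment matrix.
    bee = {('Consonantal', 1): '+', ('Vocalic', 1): '-',
           ('Voiced', 1): '+', ('Continuant', 1): '-',
           ('Consonantal', 2): '-', ('Vocalic', 2): '+',
           ('Grave', 2): '-'}

    features = sorted({f for f, _ in bee})
    segments = sorted({n for _, n in bee})

    print('feature'.ljust(12) + ''.join(f'seg{n}'.rjust(6) for n in segments))
    for f in features:
        print(f.ljust(12) + ''.join(bee.get((f, n), ' ').rjust(6) for n in segments))
    # blank cells are entries left unspecified, to be filled in by rule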


These observations suggest that the problem of syntactic cross-classification be dealt with in the same way, particularly since it apparently involves only lexical items and not phrase types. Adopting this rather natural proposal, let us revise the theory of the base in the following way. The base consists of a system (presumably, a linear sequence) of rewriting rules which we may call its categorial component. Beyond this, it contains a lexicon. The lexicon is an unordered set of lexical entries. Each lexical entry is simply a set of specified features. The features constituting the lexical entry may be phonological (e.g. [±Voicedn], where n is an integer indicating position), semantic (e.g. [±Artifact]), or syntactic (e.g. [±Proper]). We limit our attention here to the syntactic features.

The categorial component of the base generates no lexical items in strings (though it may introduce grammatical morphemes). As a first approximation, we may think of each lexical category A (e.g. Noun, Verb, etc.) as being involved only in rewriting rules of the form A → Δ, where Δ is a fixed dummy symbol. Thus the final strings generated by the categorial component (let us call these pre-terminal strings) are based on a 'vocabulary' (i.e. a set of primes — see above, p. 33) consisting of grammatical morphemes and the symbol Δ. The latter will occupy the position in which items from the lexicon will be inserted, in a manner which we will describe directly. A pre-terminal string is converted to a terminal string by insertion of an appropriate lexical item in each position marked by Δ.

Recall that the deep structures that determine semantic interpretation are generalized phrase-markers generated by the base component. As we noted above, it seems plausible to develop semantic theory in terms of projection rules that assign readings to successively higher nodes of the deep structure, basing this assignment on the readings assigned to already interpreted lower nodes and the grammatical relations represented by the configuration in question. The grammatical relations and the order of application of the interpretive projection rules are determined completely by the categorial component of the base. The intrinsic semantic properties that provide the initial readings for this process of semantic interpretation (i.e. the readings of the lexical items that are the terminal elements of the generalized phrase-marker) are provided completely by the lexicon. Thus the two separate aspects of the semantic theory are mirrored in the subdivision of the base into a categorial and a lexical component.

The functioning of the categorial component is clear; let us, therefore, consider the lexicon in some further detail. The lexical entry for a certain item should contain all information about idiosyncratic features of this lexical item, features that cannot be predicted by general rule. Thus the fact that 'buy' begins with a Voiced non-Continuant, that it is a transitive Verb, that it has irregular inflections, that it involves transfer of ownership, etc., must all be represented by features of the lexical entry. Other properties (for example, that the initial non-Continuant is non-Aspirated) can be predicted by rule — in this case, a phonological rule. But there may be redundancy rules of various kinds that operate on phonological, semantic, and syntactic features, and that specify interrelations among features of the various types.
Insofar as regularities concerning feature composition can be expressed by rule, the features in question can be extracted from the lexical entry (for discussion of redundancy rules, see Chomsky, Aspects of the theory of syntax, particularly Ch. 4, § 2.1).


Normally, a lexical item will be idiosyncratic in many respects. Since these can now be specified in the lexical entry, they need no longer be represented in the rewriting rules. This leads to an enormous simplification of the base component, as will be evident to anyone who has ever attempted to construct a detailed grammatical description.

Let us now consider the rule that inserts lexical items in pre-terminal strings. Notice that this rule must take account of the structure of the phrase-marker in which the item is being inserted. For example, when we say that a Verb is Transitive, we are asserting that it can appear in the position —NP in a Verb Phrase. Therefore the syntactic feature [+Transitive] must specify some aspect of the phrase-marker in which the item can be inserted. Let us call a feature of this sort a contextual feature. In contrast, we will call such features of Nouns as [±Human] non-contextual. The degenerate case of a contextual feature is the feature [±Noun] itself, which indicates a minimal aspect of the phrase-marker, namely, the category dominating the occurrence of Δ for which the item in question may be substituted. These degenerate contextual features we may call category features. For the category features, the obvious notation is [±A], where A is a lexical category. By convention, then, we assert that an item with the category feature [+A] can only replace an occurrence of Δ dominated by the category symbol A.

Consider now the problem of a proper notation for the other contextual features, e.g. transitivity. Clearly the best notation is simply an indication of the context in which the item can occur. Thus the feature [+Transitive] can be represented simply [+ —NP]. Similarly, the fact that 'persuade' can be followed by a Noun Phrase and a following Prepositional Phrase (e.g. 'I persuaded John of the pointlessness of his actions') can be indicated by assigning the contextual feature [+ —NP PP] to the lexical entry for 'persuade' (in fact, this is apparently the only contextual feature needed to specify the frame in which 'persuade' can appear, all other forms being derived by transformation — for discussion, see Chomsky, Aspects of the theory of syntax). Contextual features of this sort, which specify the frame in which an item can be substituted, we will call strict subcategorization features.

Alongside of strict subcategorization features, there are contextual features of a radically different sort that we will call selectional features. Whereas the strict subcategorization features specify categorial frames in which an item may appear, the selectional features of a lexical item X specify lexical features of the items with which X enters into grammatical relations. Thus the selectional features for 'frighten' will indicate that its Object must be specified as [+Animate], the selectional features for 'elapse' will indicate that its Subject cannot be [+Human] (and for true descriptive adequacy, must obviously give a much narrower specification than this), etc. Similarly, the selectional features for 'abundant' must indicate that it can be predicated of 'harvest' but not 'boy', whereas the selectional features for 'clever' must contain the opposite specification. We can represent selectional features by a notation very much like that suggested above for strict subcategorization features.
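How the three kinds of feature interact in lexical insertion can be sketched as follows; the entries are an invented fragment, loosely following the 'frighten' and 'elapse' examples (a real lexicon would, of course, state far narrower selectional conditions).

    LEXICON = {
        'frighten': {'category': 'V', 'frame': ['--', 'NP'],   # [+ -- NP]
                     'selects': {'Object': '+Animate'}},
        'elapse':   {'category': 'V', 'frame': ['--'],         # intransitive
                     'selects': {'Subject': '-Human'}},
    }
    NOUN_FEATURES = {'boy': '+Animate', 'sincerity': '-Animate'}

    def insertable(item, category, object_noun):
        entry = LEXICON[item]
        if entry['category'] != category:            # category feature [+V]
            return False
        transitive = 'NP' in entry['frame']          # strict subcategorization
        if transitive != (object_noun is not None):
            return False
        wanted = entry['selects'].get('Object')      # selectional feature
        if wanted and NOUN_FEATURES[object_noun] != wanted:
            return False
        return True

    print(insertable('frighten', 'V', 'boy'))        # True
    print(insertable('frighten', 'V', 'sincerity'))  # False: Object not [+Animate]
    print(insertable('frighten', 'V', None))         # False: frame requires -- NP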


Contextual features can be regarded as specifying certain substitution transformations. The context stated in the contextual feature specifies the condition that must be met by the phrase-marker to which the transformation in question applies, and the manner in which this phrase-marker must be analyzed for the purposes of this transformation. Thus it defines the structural analysis of the transformation (see above, p. 31). The elementary transformation that completes the definition of the transformation states that the lexical item in question (i.e. the set of specified features that constitutes the lexical entry) substitutes for the occurrence of Δ that appears in the position indicated in the structural analysis.

It is clear from the examples cited that there are many restrictions on the form of the substitution transformations defined by contextual features. Thus the strict subcategorization features only involve 'local contexts' — i.e. contexts dominated by the phrase category that immediately dominates the lexical category for which the lexical item is substituted. On the other hand, selectional features refer only to 'heads' of grammatically related constructions. These restrictions can be made precise, and can be shown to lead to certain interesting consequences concerning the possible constraints that may appear in a grammar. For discussion, see again Chomsky (Aspects of the theory of syntax).

I have not discussed the problem of deviation from grammaticalness here. However, it is clear that whenever a grammatical rule exists, we may ask how a sentence is interpreted that deviates from this rule. It seems that sentences deviating from selectional rules are interpreted quite differently from those deviating from strict subcategorization rules. Deviation from selectional rules gives such examples as 'colorless green ideas sleep furiously', 'sincerity admires John', etc.; deviation from strict subcategorization rules gives such examples as 'John persuaded to leave', 'John found sad', etc. Sentences of the former type are often interpreted as somehow metaphorical; sentences of the latter type, if interpretable at all, must be looked at in an entirely different way. Deviations from contextual rules involving category features (see above, p. 45) are still different in interpretive potential. Thus the various types of contextual feature are rather different in the conditions that they impose on sentence structures. Notice, incidentally, that the ease with which sentences deviating from selectional rules can be interpreted is not simply a result of the fact that 'low-level' syntactic features such as [±Human] or [takes Animate Object] are involved. These features can participate in rules that are not at all violable in the way in which selectional rules may be (consider, for example, such expressions as 'the table who I scratched with a knife', 'who I saw was John', 'a very barking dog', etc.). There is much to say about this general problem; it is clear, however, that a nontrivial study of it demands a rich and detailed understanding of the various types of grammatical process.

We assumed, in this brief account of syntactic features, that the features of a Noun are inherent to it and that the features that selectionally relate Nouns to Verbs or Nouns to Adjectives appear as contextual (selectional) features of the Verbs and


This was not an arbitrary decision; it can easily be justified on syntactic grounds. For discussion of this question, and many of the other topics mentioned briefly here, see Chomsky (Aspects of the theory of syntax, Ch. 2).

With this, I conclude part 5 of the introductory outline. I have now briefly sketched two major respects in which the first modern attempts to formulate a theory of grammatical transformations were shown to be defective by later work. The first defect was one of excessive richness in expressive power. We have now been discussing a defect of the opposite kind, namely, an inability to express certain aspects of grammatical structure, and have suggested a way to modify the theory so as to overcome this. The theory of transformational generative grammar that results from these modifications is conceptually quite simple, and is reasonably well-supported by what empirical evidence is presently available. Each component of the theory has a well-defined function; I see no way in which any of the postulated mechanisms can be eliminated without sacrifice of descriptive adequacy, and I know of no justification for postulating a more complex structure and organization of the theory of the syntactic component than what has been sketched in outline here. For the present, then, this theory seems to me to constitute the most satisfactory hypothesis as to the form of the syntactic component of a grammar.

IV

I would like to conclude this paper with a few remarks about sound structure, more specifically, about the organization of the phonological component of a generative grammar. The phonological component is a system of rules that relate a surface structure to the phonetic representation of a string. We have been assuming that a surface structure is a labeled bracketing of a sequence of minimal elements which we may call formatives. In the last lecture, we distinguished between two types of formatives — grammatical and lexical. Each lexical formative is a complex symbol, that is, a set of features. Among these are phonological features, which can be represented in matrix form with rows corresponding to features and columns to 'segments'. Thus if a formative contains the phonological feature [αFn], where α is + or —, F is a feature, and n is an integer, then the matrix will have the entry α in the nth column in the row corresponding to the feature F. This matrix is essentially a classificatory device; it determines the phonological rules that will apply to the item in question. In effect, then, the phonological matrices represent a classification induced by the system of phonological rules. The rules of the phonological component apply to such matrices, adding entries or revising them, and perhaps adding or deleting columns. In the course of this operation, entries that are blank in the surface structure may be filled in, and we may think of all entries (which are initially either blank, or marked + or —) as being replaced by integers indicating the position of the segment in question in the scale defined by the feature in question. The rules will also delete grammatical formatives


or 'spell' them in terms of feature matrices. The final output of the system of phonological rules will be a phonetic matrix for the sentence as a whole, in which columns stand for successive segments (phones) and rows define phonetic distinctive features, regarded now as scales, the entry indicating where a segment falls along a scale. The character of such rules has been described elsewhere in detail.39 I presuppose familiarity with this topic in the following remarks. In particular, it should be noted that the phonetic distinctive features are proposed as linguistic universals, which meet the requirement (9i) of section I for general linguistic theory.

The input to the phonological component I will call a phonological representation.40 The output produced by the phonological component I will call a phonetic representation. I will not attempt to investigate the relation of phonological to phonetic representations in any detail, but I would like to mention a few crucial issues that arise when this problem is considered.

Before continuing, I would like to remark that there can hardly be any question as to the linguistic significance of phonological and phonetic representation, in the sense defined above. A generative grammar that does not provide representations of these two types is unimaginable, in the present state of our knowledge. Furthermore, I do not believe that there is any serious controversy about this question.

The first question that arises, in connection with the phonological component of a grammar, is whether there is any other linguistically significant system of representation intermediate between phonological and phonetic. In particular, we may ask whether there is an intermediate level meeting the conditions that have been imposed on the notion 'phonemic representation' in modern (taxonomic) linguistic theory. I will not state these here; for a detailed discussion, see Chomsky (Current issues in linguistic theory). This is a substantive question, and cannot be settled by terminological decision. It is a question of fact. Furthermore, it is clear that the burden of proof is on the linguist who believes that the answer is positive — that there is, in other words, a linguistically significant level of representation meeting the conditions on taxonomic phonemics and provided by the phonological rules of the grammar. The claim that taxonomic phonemics exists as a part of linguistic structure seems to me without justification. There are various ways in which one might try to establish the existence of this level; none of them, so far as I can see, succeeds in establishing this conclusion. A detailed argument is presented in Chomsky (Current issues in linguistic theory). I will survey briefly some of the main points.

39 E.g. Halle, The sound pattern of Russian, "Phonology in a generative grammar", "On the bases of phonology"; Chomsky and Miller, "Finitary models of language users"; Chomsky, Current issues in linguistic theory.
40 Alternatively, we might restrict the term 'phonological representation' to the representation that we have at the point at which all grammatical formatives other than boundary symbols are eliminated in favor of matrices, so that what we have is a string of phonological matrices and boundary symbols (which, incidentally, also require feature analysis), with IC structure (i.e. labeled bracketing) marked.
This is what is called "systematic phonemic representation" in Chomsky (Current issues in linguistic theory), where the topics now under discussion are elaborated in much more detail.
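Since the matrix formalism is easiest to grasp concretely, here is a small illustrative sketch (my own, with invented feature names and values, not an implementation of any rule system cited above) of a lexical matrix whose rows are features and whose columns are segments, with a rule filling in blank entries:

    # Illustrative only: a phonological matrix with rows = features and
    # columns = segments; entries are '+', '-', or None (blank in the lexicon).

    matrix = {
        "consonantal": ["+", "-", "+"],
        "voiced":      ["+", "-", None],   # final entry left blank, to be filled by rule
    }

    def fill_blank_entries(m, feature, value):
        """A trivial 'rule' that supplies a value for blank entries of a feature."""
        m[feature] = [value if entry is None else entry for entry in m[feature]]
        return m

    fill_blank_entries(matrix, "voiced", "-")
    print(matrix["voiced"])    # ['+', '-', '-']

A fuller model would let rules revise entries, add or delete columns, and finally replace the binary marks by integers along phonetic scales, as described in the text.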


Certain versions of taxonomic phonemics are based on distributional analytic procedures of various sorts, the notion of 'complementary distribution' being central to these. But this notion is fundamentally defective. It permits analyses that are not acceptable to the taxonomic phonemicist (or to anyone else); it excludes the optimal taxonomic phonemic analysis in certain cases. Similarly, the other procedures of phonemic analysis fail to provide the intended results.

But there is another consideration that cuts much deeper than this. It might, after all, be shown that in some way the procedures can be improved to the point where they provide an analysis of the type postulated as essential by the taxonomic phonemicist, and exclude all other analyses. It is therefore necessary to shift our attention to the postulated analysis itself, and to ask whether an analysis of this sort can be provided by a system of phonological rules. This question was considered by Halle several years ago, and he produced an argument, which has since been repeated in the literature many times,41 to show that if a taxonomic phonemic representation is provided by the phonological component, then important generalizations of obvious and unchallenged linguistic significance must be given up. It seems to me that this argument is unanswerable, but since its force has not been fully appreciated, it will perhaps be useful to present it schematically once again.

Reducing the argument to its essentials, suppose that we have a language with a phonological asymmetry, for example, a language with [t], [d], [č], and [ǰ] phonetically, and with a phonological contrast between /t/ and /d/ but none between [č] and [ǰ]. Thus there are, let us say, morphemes /Xt/, /Xd/, but no comparable pair or near-pair for č—ǰ. Suppose that there is, furthermore, a general rule of voicing assimilation in the language. This rule can be stated as (24), using customary conventions:

(24) Consonant → [+Voiced] in the environment: — [+Voiced].

Thus the morpheme /Xt/ will appear as [Xd] in the context — [+Voiced] and as [Xt] in the context — [—Voiced]; and a morpheme /Yč/ will appear as [Yǰ] in the context — [+Voiced] and as [Yč] in the context — [—Voiced]. The rule (24) converts phonological representations directly to phonetic representations in both cases. Let us suppose that the only occurrences of [ǰ] are those produced by rule (24). But observe that this grammar does not provide phonemic representations. For the taxonomic phonemicist, of any school, the lexical representations /Xt/, /Yč/ are not phonemic but 'morphophonemic' (what we have been calling 'phonological', following essentially Sapir's usage of terms). The morphophonemic, phonemic, and phonetic representations would be as given in table (25). The first column gives the phonological (= morphophonemic) representation of the forms in the third column; the second gives their phonemic representations. The first column does not qualify as phonemic because it fails biuniqueness; the third does not qualify as phonemic because [č] and [ǰ] belong to the same phoneme, under the circumstances we have described, in accordance with any phonemic theory that has ever been produced.

41 Cf. Lees (The grammar of English nominalizations); Halle (The sound pattern of Russian); Chomsky ("A transformational approach to syntax", Current issues in linguistic theory).

(25)

    phonological            phonemic    phonetic    in the environment
    (= morphophonemic)

    Xt                      Xd          Xd          — [+Voiced]
                            Xt          Xt          — [—Voiced]

    Yč                      Yč          Yǰ          — [+Voiced]
                                        Yč          — [—Voiced]

The grammar containing rule (24) thus converts phonological to phonetic representations without providing phonemic representations as a linguistic level. That is, if we mean by the phrase 'level of representation' a system of representations that appears at some well-defined point in the process of sentence-generation, then the grammar provides no level of phonemic representation (it is difficult to imagine what other sense might be given to this expression). To provide a level of phonemic representation, the grammar would have to replace (24) by the two rules (26), (27):

(26) t → [+Voiced] in the environment: — [+Voiced]
(27) č → [+Voiced] in the environment: — [+Voiced].

Rule (26) is now a 'morphophonemic rule' and rule (27) a 'phonemic rule'. The rule (24), which expresses the linguistic facts in the most general form, is now not expressed in the grammar. Real examples of this sort are easy to find. They show that a level of phonemic representation can be included in a grammar only if certain generalizations are lost as now inexpressible. Since obviously the point of grammar construction is to provide general rules governing the organization of the phonetic facts, this observation constitutes a strong argument against the assumption that there exists a linguistic level of phonemic representation.

To my knowledge, the only defense that has been offered for taxonomic phonemics against this argument is due to Sydney Lamb, in a paper delivered before the Linguistic Society of America in December, 1963. Lamb attempted to refute Halle's argument against taxonomic phonemics in the following way. Let us take the symbol h to be a 'devoicing element', so that t is represented dh and č is represented ǰh. Let us now construct a grammar for the example described above, using this notation. The morpheme /Xt/ will be represented /Xdh/, the morpheme /Yč/ will be represented /Yǰh/, in morphophonemic representation. We now give just the single rule (28):

(28) h → ∅ in the environment: — [+Voiced].42

42 Technically, the environment should be stated as follows: '— CZ, where C is any consonant and Z is any segment other than h, or is a boundary symbol.'


The rule (28) converts the representations /Xdh/ and /Yǰh/ to [Xd] and [Yǰ], respectively, before Voiced elements, and leaves them in the form [Xdh] = [Xt] and [Yǰh] = [Yč] everywhere else. Thus this grammar produces the correct phonetic forms, and does so without losing the generalization, since the full set of phonetic facts is summarized in the single rule (28). Thus Halle's argument is refuted, since it is not necessary to replace the generalization (28) by two special cases (26), (27), when the grammar is formulated this way.

But it is evident that this attempt to defend taxonomic phonemics against Halle's criticism is simply an exact restatement of this criticism in a new notation. Instead of writing features in columns, as Halle does, Lamb writes them sequentially; instead of using the feature [Voicing], Lamb marks the unvoiced case with h, and leaves the voiced case with no explicit indication.43 But the crucial point is that Lamb's notational reformulation of Halle's argument gives a grammar in which there are no phonemic representations, just as Halle's optimal grammar provides no phonemic representations, and for the same reason; namely, the generalization is incompatible with the presence of a phonemic level. There is nothing corresponding to the middle column of (25) in the Halle-Lamb analysis, and the only point at issue is the status of this column. Thus Lamb has simply presented a notational variant of Halle's argument against taxonomic phonemics.

From this and many other similar considerations, one must conclude that there is no internal linguistic justification for phonemics in its modern (post-Sapir) sense. The only way to demonstrate the linguistic significance of this concept, then, is by an argument based on some other grounds, e.g. perceptual or methodological grounds. However, no such approach seems feasible (for discussion, see Chomsky, Current issues in linguistic theory). For the time being, then, there is no reason to assume that the phonemic level is anything other than an artifact.

Taxonomic phonemics developed from the assumption that sound structure should be studied either in complete isolation from syntax, or, in the case of approaches such as those of Pike and Harris, in partial isolation, with consideration of syntactic structure only insofar as this can be introduced in accordance with certain specified analytical procedures. Unless the arguments that have been directed against taxonomic phonemics can be met, one can only conclude that these assumptions are incorrect. So far as I know, there has never been any attempt to offer any justification for them beyond the observation that they lead to a theory that is consistent and reasonably well-defined. These conditions are surely necessary, but hardly sufficient to guarantee linguistic significance.

" Lamb's conventions actually seem to provide a restricted form of feature theory in which a feature cannot be left unspecified, and in which direct reference can be made in the rules to only one of the two possible values of a binary feature. [-Voice] corresponds to presence of h, [ + Voice] to its absence. Why the single feature voicing (in Lamb's Russian example, actually Voicing and Palatalization) should be singled out for this special treatment is unclear; and analysis of Russian sound structure that went beyond the one example would show that other features must also be extracted in this way. Such a restricted form of feature theory can be studied on its own merits, quite apart from any consideration of taxonomic phonemics, if it is presented in a clear enough form.


Let us now return to the question of how the rules of the phonological component apply to surface structures to determine, ultimately, the phonetic form of the utterance that they represent. It was proposed several years ago (Chomsky, Halle, and Lukoff, "On accent and juncture in English") that one part of the phonology may be a sequence of transformational rules that apply to surface structures in the following way: first, they apply in a prescribed sequence to the expressions within innermost brackets, erasing these at the termination of the sequence; then, they reapply in exactly the same way; etc., until the maximum domain of phonological processes is reached. These rules are 'transformational' in the technical sense of linguistic theory, in that they take into account the phrase structure of the string to which they apply, and not just its linear form as a sequence of symbols. These transformational rules thus in effect determine the phonetic form of larger units on the basis of the phonetic form of the smaller units that constitute them. Notice that in their manner of operation they are quite analogous to the projection rules of the semantic component. Once the transformational cycle has completed its operation, surface structure is completely erased (up to the maximal domain of phonological processes).

In support of this hypothesis, we showed how the multilevel stress contours of American English that had been observed by many phoneticians and phonemicists, involving at least five perceptual levels of stress, could be accounted for by postulation of only a single Accented-Unaccented distinction and a few simple rules operating in a transformational cycle (this is summarized and extended in various other publications, e.g. Chomsky, "Explanatory models in linguistics"; Chomsky and Miller, "Finitary models of language users"). Since that time, several studies have shown that other complex phonetic data can be explained on the same assumption (see note 6, p. 14, of Chomsky, Current issues in linguistic theory, for references). In the light of what information is now available, it seems reasonable to maintain that insofar as syntactic structure plays a role in determining phonetic form, it is the surface structure in the sense described in previous lectures that provides the relevant syntactic information, and (except for pre-cyclic rules that involve only lexical categories — i.e. syntactic features, in the sense of the preceding section) the rules that determine the phonetic form in this way apply in a transformational cycle, as just indicated. This seems a very natural assumption, as well as one that is, for the present, well supported empirically by its ability to provide an explanation for what is in any other terms merely a collection of data.

Our theory of the transformational cycle, as presented in Chomsky, Halle, and Lukoff ("On accent and juncture in English"), was discussed at length in the Second University of Texas Conference in 1957, first in a critical paper by A. H. Marckwardt and then in an extended discussion [all of which appears in Hill, ed., Second (1957) Texas Conference on problems of linguistic analysis in English (Austin, 1962)].
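The cyclic principle itself is easy to state schematically. The following is my own drastically simplified sketch (not the rule system of "On accent and juncture in English") in which 1 marks primary stress, the rules apply first to innermost constituents, and reassignment of primary stress weakens every other stress in the domain by one:

    # Toy transformational cycle for stress (my simplification, illustrative only).
    # A leaf is a word bearing primary stress (1); a node is (label, children).

    def cycle(node):
        if isinstance(node, str):
            return [(node, 1)]
        label, children = node
        seq = [unit for child in children for unit in cycle(child)]
        primaries = [i for i, (_, s) in enumerate(seq) if s == 1]
        # Compound Rule in Nouns: leftmost primary remains primary;
        # Nuclear Stress Rule elsewhere: rightmost primary remains primary.
        keep = primaries[0] if label == "N" else primaries[-1]
        return [(w, s if i == keep else s + 1) for i, (w, s) in enumerate(seq)]

    print(cycle(("N", ["black", "bird"])))     # blackbird:  [('black', 1), ('bird', 2)]
    print(cycle(("NP", ["black", "bird"])))    # black bird: [('black', 2), ('bird', 1)]
    # A nested compound yields a multi-leveled contour from one binary distinction:
    print(cycle(("N", [("N", ["black", "board"]), "eraser"])))
    # [('black', 1), ('board', 3), ('eraser', 2)]

Even this caricature shows how a single accentual distinction, surface structure, and the cyclic principle jointly predict multi-leveled contours.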


The participants felt that the theory was completely demolished by this discussion (cf. e.g. Hill, p. 95, Twaddell, p. 104). Therefore, it is worthwhile to take up the criticisms that were presented point by point.

Marckwardt's primary objection is that we were postulating new, complex items (namely, the surface structure of the utterance)44 in order to account for the phonetic facts. Thus the Trager-Smith phonemic (we would call them 'phonetic') representations utilize four levels of 'phonemic' stress of the five or more that must appear in accurate perceptual representations, leaving the others to be predicted by rule. Although our transcription reduced this to a single 'phonemic' distinction, it did this at the cost of introducing the entire surface structure of the utterance, and therefore was not really more economical at all.

This objection is a very curious one. Obviously, there would be no point in eliminating a set of stresses from a transcription by introducing a new set of arbitrary distinctions from which the stresses can be deduced. But this is not what we proposed. In fact, we showed that no new distinctions are needed at all to account for the stress distinctions represented in, for example, Trager-Smith transcriptions. In fact, the data presented in these transcriptions can be accounted for by assuming a single Accent distinction and relying, beyond this, only on surface structure and on the general theoretical principle of the transformational cycle. But surface structure is not some new, arbitrary construction invented for the purpose of accounting for stress. It is present in a grammatical description quite apart from any consideration of phonetic form. Thus if one wanted to generate, let us say, conventional orthographic representations instead of phonetic ones, in a full grammar, he would arrive at exactly the same conclusions regarding the derived phrase-markers of strings of words as we assumed in our attempt to account for the phonetic form. The surface structure is provided by the syntax; it is not based on phonetic considerations.

This criticism shows that Marckwardt, as well as the other discussants, completely missed the entire point of our paper. Our purpose was to show how it is possible to go beyond mere listing of data by providing an explanation for this data on the basis of a general hypothesis about linguistic structure (the theory of the transformational cycle) and other, independently established facts about the language (namely, the surface structure of utterances). There is no question here of comparing two phonetic transcriptions in terms of their relative complexity. The distinction is rather between the development of a transcription to record data (e.g. the Trager-Smith system, which we adopted in our paper), and the attempt to account for data that is recorded in this way.45

44 In this paper, we did not use labeled bracketing to represent surface structure, but rather a system of abstract 'junctures', indexed by integers to define the hierarchic structure. No lexical categories and only two phrase categories were considered in that analysis, namely Noun and 'everything else'; corresponding to these were the junctures — and =, respectively. Thus, for example, in this notation the full labeled bracketing [NP [DET [N John]N 's]DET [N [N [A black]A [N board]N]N [N [V erase]V er]N]N]NP would be represented John's =3 black —1 board —2 eraser.


Marckwardt's first argument thus simply reduces to an objection to any attempt to account for certain data by bringing to bear other facts and general theoretical principles — that is, it amounts to nothing more than an insistence that one should not go beyond mere recording of data in linguistic description.

A second argument presented in the discussion is that junctures had been proven (in Joos' paper in the same conference) to be phonetically detectable, whereas we had argued that the delimitation of phrases given in surface structure is not represented by phonetically detectable junctures. There is surely no one who would be willing to claim today that the IC structure of an utterance is indicated by phonetically detectable junctures, so I will omit any further discussion of this point.

A third objection to our paper was that by assuming a surface structure which is not merely a projection of the phonetics (e.g. of phonetically detectable junctures), we had given up the hope of using taxonomic discovery procedures (i.e. procedures of segmentation and classification of various sorts) to establish syntactic structure. This we cheerfully admit, taking for granted that the discussion in the intervening years has shown conclusively that the attempt to discover syntactic structure in this way is hopeless (and, furthermore, entirely unmotivated).46

A fourth objection is that we invent syntactic analyses arbitrarily so as to produce the correct phonetic results. To support this, Marckwardt cites our example 'excess profits tax', which we had assumed to have the meaning 'tax on excess profits', and had analyzed accordingly, showing that this analysis (with the phrase 'excess profits' embedded within the Noun 'excess profits tax') accounts for the stress contour. This syntactic analysis of the expression 'excess profits tax' Marckwardt sees no reason to accept. Since no other examples are given, I pass over this objection in silence.

The fifth objection is that our analysis is not complete. Thus there are many constructions that we did not account for in our explanatory scheme, whereas, in contrast, the Trager-Smith system of transcription can, presumably, provide a representation for any utterance that is likely to be produced. It is quite true that our attempt to explain the facts of English was far from complete; at the same time, we quite agree that it is possible to construct a system of phonetic transcription that will be complete. But it is entirely senseless to compare the 'completeness' of an explanatory theory with the 'completeness' of a scheme for recording data. It is, for example, no criticism of physical theory to point out that its 'coverage' is far less than that of a system for recording meter readings. Within linguistics, obviously, the discrepancy between what can be recorded and what can be explained in any serious way is enormously greater than in physics. This sort of objection in principle to attempts to explain phenomena in terms of other facts and postulated theoretical principles can only have the effect of guaranteeing that linguistics will remain indefinitely at the stage of data-collection and data-arrangement.

45 A system for recording data is itself a 'theory' in a certain weak sense, insofar as it embodies a hypothesis about the system of distinctions (and the fineness of distinction) that must be represented to account for some mass of data. Thus a universal phonetic alphabet makes such an assumption about language in general, just as, for example, the Trager-Smith phonemic analysis makes the assumption that for all dialects of English, the actual phonetic form can be determined by rules of some sort from a representation containing no more than four stress levels, four pitch levels, etc.
46 To support his criticism of our paper for its abandoning of the use of discovery procedures, Marckwardt offers only the following quotation from Trager and Smith: "The application of this [i.e. the representation of an utterance in terms of stress levels, phonetically detectable junctures, etc.] to the problem of determining immediate constituents is obvious."


The final objection presented in Marckwardt's critique is based on two assertions in our paper, namely, that surface structure is not in general represented by physically defined junctures, and that the studies of stress by Trager-Smith and others are purely impressionistic. Marckwardt does not question either assertion, and surely neither can possibly be questioned. But he concludes that it is pointless to attempt to explain impressionistically recorded stress contours in terms of IC analysis that is not marked by phonetic juncture. This conclusion is apparently based on his belief that if surface structure is not represented by phonetic junctures then it is hardly better than a figment of the imagination; more generally, that linguistics should not concern itself with phenomena that have only vague physical concomitants (90). This is a fantastic proposal, going far beyond the limitations on linguistic research proposed by Reichling, Uhlenbeck, Dixon, etc. (see Section II). There is not the slightest reason to suppose that phonetics provides the only evidence in support of conclusions about syntactic structure (or, for that matter, that it provides any significant evidence). Neither the traditional nor the modern study of syntax has accepted the restriction to phonetically marked aspects of utterance (except for a brief moment when it was hoped, apparently vainly, that 'phonological syntax' might yield some useful conclusions). As to the fact that representation of stress is purely impressionistic and not, so far as is known, determinable by physical measurement, this supports the conclusion of our paper and remains as an embarrassing difficulty for those who insist, for some reason, on limiting grammar-construction to what they can discover on the basis of phonetic observation alone. From our point of view, there would be nothing at all surprising about the discovery that the stress levels heard by the careful phonetician have no physical basis at all.47 Since in any event the phonetic contour is largely an automatic reflection of the syntactic structure, it follows that anyone who understands an utterance and thus, in particular, has determined its surface structure, should be able to predict the phonetic contour by rules that constitute part of his linguistic competence. He will, then, 'hear' what these rules predict, as long as this is not in too violent disagreement with the physical facts.48 However, the fact that stress judgements may not have a purely physical basis seems quite impossible to reconcile with a belief in the potentialities of 'phonological syntax', or, for that matter, with a belief in the usefulness of the study of sound in isolation from syntactic structure.

I am unable to extract any other specific points that merit discussion from the remainder of the recorded proceedings of the Texas conference. In summary, I find nothing in Marckwardt's paper or in the discussions that suggests any error or defect either in our proposals concerning rules of stress assignment in English or in the principle that we suggested to account for the data that had been recorded. Insofar as they are not based on confusion or misreading, the criticisms agreed on by all participants simply reduce to the observation that within the narrow bounds that they impose on linguistic research, there is no possibility of studying the principles that determine phonetic form and no point in doing this, where it is possible. We agree with this observation, and feel that it can be generalized well beyond the domain of fact that was discussed in this conference. This, in fact, seems to us the primary reason for rejecting the approach to the study of language accepted by the participants in this conference. I think that a careful study of the proceedings of this conference should prove quite rewarding in the light that it sheds on the limitations of one particularly influential form of modern taxonomic linguistics.

There is, however, one very real criticism of our paper. As we have since discovered, it did not go far enough in eliminating stress from phonological transcription. In fact, more recent work shows that even the Accented-Unaccented distinction is so marginal in English that it hardly makes any sense to regard Stress or Accent as a phonologically functional category (i.e. as constituting a row which is filled by +'s and —'s in lexical matrices), except for a class of examples comparable, essentially, to strong verbs or irregular plurals. Furthermore, we were in error in assuming that the reduced vowel [ɨ] must be represented in phonological matrices. Its occurrence, too, can largely be predicted by extension of the transformational cycle, along lines described in Halle and Chomsky ("The morphophonemics of English", Quart. Progr. Rep. No. 58, 275-81, Cambridge, Mass.), Chomsky ("Explanatory models in linguistics"), and Chomsky and Miller ("Introduction to the formal analysis of natural languages").

Apart from the iterating rules of the transformational cycle, there are also many nontransformational phonological rules that apply only at one stage of the process of derivation. We can express this fact by incorporating these rules into the transformational cycle, limiting their application to the level of word boundary.

47 I do not assert that this strong statement is true, but only that we would not find such a conclusion incompatible with what we have found and presented. In fact, it seems not at all unlikely that something like this may be true — that is, that what the phonetician 'hears' is largely a reflection of what he knows rather than just of what is present in the physical signal itself. Such a conclusion would hardly surprise either psychologists or acoustic phoneticians, and it is entirely in accord with what little is known in linguistics about the nature of phonetic representation.
48 The point is worth pursuing. Our theory of English stress is compatible with the assumption that the only 'functional' distinction is a single differentiation of relative stress — for example, the differentiation that distinguishes 'black bird' from 'blackbird'. We showed that this single differentiation (which is itself predictable in many syntactic contexts, though not, obviously, in isolation) is sufficient to give a mass of well-attested stress contours, with many levels, given the facts of syntactic organization and the principle of the transformational cycle. Consequently, we would expect that these many-leveled contours would be heard, whether physically present or not, by the phonetician who, as native speaker, has internalized the more-less differentiation that is functional and who understands the utterances he is transcribing. The question of the acoustic reality of these well-attested contours (which we do not challenge, as perceptual facts) deserves much closer investigation. For evidence bearing on this question, see Lieberman, "On the acoustic basis of the perception of intonation by linguists" (forthcoming).


These nontransformational rules effect modifications of phonological matrices that are not determined by surface (IC) structure. There is a great deal to be said about the nature of both transformational and nontransformational phonological rules. In particular, it is important to observe that the phonological matrix postulated as an underlying form may undergo significant modification in the course of derivation, and in fact, it is not unusual for the postulated underlying form to appear in none of the actual phonetic realizations. Furthermore, it has been widely observed that the underlying forms are extremely resistant to historical change, whereas late phonetic rules are much less resistant to modification. Thus it is commonly found that the underlying forms for a language are barely different from those of much earlier stages of this language, and that dialects that differ greatly in phonetic realization and in surface organization of the phonetic data may be quite similar or identical in the phonological representations assigned to formatives and sentences. Recent work in English supports these general conclusions strongly; it shows, in fact, that phonological representation is for the most part rather like conventional orthography, for the dialects that have so far been studied.

SUMMARY

In this paper I have been discussing topics in linguistic theory from a point of view which is in most respects quite traditional, but which has been given new life and scope in recent work. I have also tried to show that this traditional view must be adopted, in its essentials, if linguistic research is to progress and to provide understanding of significant questions. There are value judgments here, of course; I have tried, here and in the references mentioned previously, to justify those that underlie the work I have been reviewing.

This work has been based on the assumption that competence must be distinguished from performance if either is to be seriously studied. It has, beyond this, attempted to provide an explanatory theory of competence, and to use this as a basis for constructing an account of performance. The theory of competence is mentalistic, naturally, in that it can at the present stage of knowledge draw no evidence from and make no direct contribution towards the study of the mechanisms that may realize the mental structures that form the subject matter for this theory, or that carry out the mental processes that it studies. Thus the theory of competence (i.e. the theory of grammar) deals with abstract structures, postulated to account for and explain linguistic data.

Certain aspects of the theory of grammar seem reasonably well-established today. The abstract character of underlying (deep) structure in both syntax and phonology is hardly open to question, and there are interesting general conclusions that can be drawn from this fact (see p. 22, n. 19). The role of grammatical transformations in syntax and phonology seems hardly disputable, in the light of present information, and the role of distinctive features in syntax and phonology also seems to be firmly established.


There is also little doubt that the rules relating abstract underlying structures to surface forms, in syntax and phonology, are ordered either linearly or cyclically in many or perhaps all parts of the grammar. Nevertheless, it goes without saying that any theory of grammar that can be formulated today must be highly tentative. Many questions remain totally open, many partially so. In general, the empirical assumptions about the form of language that can currently be formulated will undoubtedly be refined and improved, and, no doubt, revised in essential ways as new critical evidence accumulates and deeper theoretical insights are achieved. Changes in linguistic theory are inevitable in coming years. In short, linguistics is a living subject.

BIBLIOGRAPHY

N. Chomsky, The logical structure of linguistic theory. Unpublished manuscript, microfilm, M.I.T. Library (Cambridge, Mass., 1955).
——, "Three models for the description of language", I.R.E. Transactions on Information Theory IT-2/3.113-24 (1956). Reprinted with corrections in Readings in mathematical psychology, Vol. II, eds. R. D. Luce, R. Bush, and E. Galanter (New York, 1965).
——, Syntactic structures (The Hague, 1957).
——, "On the notion 'rule of grammar'", Structure of language and its mathematical aspects, Proceedings of the 12th Symposium in Applied Mathematics, ed. R. Jakobson, 6-24 (Providence, 1961). Reprinted in Fodor and Katz 119-36 (1964).
——, "A transformational approach to syntax", Third (1958) Texas conference on problems of linguistic analysis in English, ed. A. A. Hill, 124-58 (Austin, 1962). Reprinted in Fodor and Katz 211-45 (1964).
——, "Explanatory models in linguistics", Logic, methodology, and philosophy of science, eds. E. Nagel, P. Suppes, and A. Tarski, 528-50 (Stanford, 1962).
——, Current issues in linguistic theory (The Hague, 1964). A slightly earlier version appears in Fodor and Katz 50-118 (1964). This is a revised and expanded version of a paper presented to the session "The logical basis of linguistic theory", Ninth International Congress of Linguists, Cambridge, Mass., 1962, and it is published in the Proceedings of the Ninth International Congress of Linguists, ed. H. Lunt (The Hague, 1964) and, in an earlier version, in the Preprints of the Congress (Cambridge, Mass., 1962).
——, Aspects of the theory of syntax (Cambridge, Mass., 1965).
——, Cartesian linguistics (forthcoming).
——, and M. Halle, "Some controversial questions in phonological theory", Theory of language 1/2 (1965).
——, M. Halle, and F. Lukoff, "On accent and juncture in English", For Roman Jakobson, eds. M. Halle, H. Lunt, and H. MacLean, 65-80 (The Hague, 1956).


——, and G. A. Miller, "Introduction to the formal analysis of natural languages", Handbook of mathematical psychology, Vol. II, eds. R. D. Luce, R. Bush, and E. Galanter (New York, 1963).
R. M. W. Dixon, Linguistic science and logic (The Hague, 1963).
——, "'A trend in semantics': rejoinder", Linguistics 4.14-8 (1964).
C. A. Ferguson, Review of M. Halle, The sound pattern of Russian (1959), in Lg. 38.284-97 (1962).
C. J. Fillmore, "The position of embedding transformations in a grammar", Word 19.208-31 (1963).
J. Fodor and J. Katz, eds., Structure of language: readings in the philosophy of language (Englewood Cliffs, N.J., 1964).
H. A. Gleason, "The organization of language: a stratificational view", Report of the Fifteenth Annual (First International) Round Table Meeting on Linguistics and Language Studies, Monograph Series on Languages and Linguistics 17, ed. C. I. J. M. Stuart (1964).
M. Halle, The sound pattern of Russian (The Hague, 1959).
——, "Phonology in a generative grammar", Word 18.54-72 (1962). Reprinted in Fodor and Katz 334-54 (1964).
——, "On the bases of phonology", Structure of language: readings in the philosophy of language, eds. J. Fodor and J. Katz, 324-33 (Englewood Cliffs, N.J., 1964).
——, and N. Chomsky, "The morphophonemics of English", Quart. Progr. Rep. No. 58, 275-81, Research Laboratory of Electronics, M.I.T. (Cambridge, Mass., 1960).
G. H. Harman, "Generative grammars without transformation rules: a defense of phrase structure", Lg. 39.597-616 (1963).
Z. S. Harris, "Discourse analysis", Lg. 28.18-23 (1952). Reprinted in Fodor and Katz 33-49 (1964).
——, "Distributional structure", Word 10.146-62 (1954).
——, "Co-occurrence and transformation in linguistic structure", Lg. 33.283-340 (1957). Reprinted in Fodor and Katz 155-210 (1964).
A. A. Hill, Introduction to linguistic structures: from sound to sentence in English (New York, 1958).
——, ed., Second (1957) Texas Conference on problems of linguistic analysis in English (Austin, 1962).
——, "Grammaticality", Word 17.1-10 (1961).
F. W. Householder, jr., "On some recent claims in phonological theory", Theory of language 1/1 (1965).
R. Jakobson, G. Fant, and M. Halle, Preliminaries to speech analysis (Cambridge, Mass., 1952, 1961).
O. Jespersen, Philosophy of grammar (London, 1924).
J. Katz, Innate ideas (forthcoming).
——, and J. Fodor, "The structure of a semantic theory", Lg. 39.170-210 (1963). Reprinted in Fodor and Katz 479-518 (1964).


——, and P. Postal, An integrated theory of linguistic description (Cambridge, Mass., 1964).
E. S. Klima, "Negation in English", Structure of language: readings in the philosophy of language, eds. J. Fodor and J. Katz, 246-323 (Englewood Cliffs, N.J., 1964).
R. B. Lees, Review of N. Chomsky, Syntactic structures (1957), in Lg. 33.375-408 (1957).
——, The grammar of English nominalizations (Bloomington, Ind., 1960).
P. Lieberman, "On the acoustic basis of the perception of intonation by linguists", Word (forthcoming).
R. E. Longacre, Review of Z. S. Harris, String analysis of sentence structure (The Hague, 1962), in Lg. 39.473-7 (1963).
H. Lunt, ed., Proceedings of the Ninth International Congress of Linguists (The Hague, 1964).
A. H. Marckwardt, "'On accent and juncture in English' — a critique", Second (1957) Texas Conference on problems of linguistic analysis in English, ed. A. A. Hill, 87-93 (Austin, 1962).
G. A. Miller and N. Chomsky, "Finitary models of language users", Handbook of mathematical psychology, Vol. II, eds. R. D. Luce, R. Bush, and E. Galanter (New York, 1963).
——, E. Galanter, and K. H. Pribram, Plans and the structure of behavior (New York, 1960).
——, and S. Isard, "Some perceptual consequences of linguistic rules", J. Verb. Learn. Verb. Behav. 2.217-28 (1963).
P. M. Postal, "On the limitations of context-free phrase structure description", Quart. Progr. Rep. No. 64, 231-7, Research Laboratory of Electronics, M.I.T. (Cambridge, Mass., 1961).
——, Constituent structure: a study of contemporary models of syntactic description (Bloomington, Ind., and The Hague, 1964).
——, "Limitations of phrase structure grammars", Structure of language: readings in the philosophy of language, eds. J. Fodor and J. Katz, 137-51 (Englewood Cliffs, N.J., 1964).
A. Reichling, "Principles and methods of syntax: cryptanalytical formalism", Lingua 10.1-17 (1961).
P. Rosenbaum, A grammar of English Predicate Complement constructions. Unpublished Ph.D. dissertation, M.I.T., 1965.
E. M. Uhlenbeck, "An appraisal of transformation theory", Lingua 12.1-18 (1963).

LANGUAGE UNIVERSALS

JOSEPH H. GREENBERG

The problem of universals in the study of human language, as in that of human culture in general, concerns the possibility of generalizations which have as their scope all languages or all cultures. The question is whether, underlying the diversities which are observable with relative ease, there exist valid general principles. Such invariants would serve to specify in a precise manner the notion of 'human nature', whether in language or in other aspects of human behavior. They would, in effect, on the lowest level correspond to the 'empirical generalization' of the natural sciences. On higher levels they might be dignified by the name of laws. The search for universals, therefore, coincides on this view with the search for laws of human behavior, in the present context more specifically those of linguistic behavior.

It was pointed out in an earlier study that for a statement about language to be considered fully general it is sufficient that it have as its logical scope the set of all languages.1 The logical form may vary. It is typically, though not invariably, implicational. For all values of X, if X is a language, then, if it contains some feature α, it always contains some further feature β, but not necessarily vice versa. Statements of this form, it is maintained, satisfy all of the usual requirements for fully general statements. The logical equivalence of such statements to certain typological ones has also been indicated.2 Thus if all languages with the feature α also have β, then a typology defined by the four logically possible types produced by the combinations of α and not-α with β and not-β (i.e. 1. languages with both α and β; 2. languages with α and with not-β; 3. languages with not-α and β; 4. languages with not-α and not-β), when applied to the empirically existing languages, will give the following result. One of the types, namely α and not-β, will have no members, since if a language has α, ex hypothesi it always has β also, and thus never not-β.

Though in previous studies all of the generalizations stated have been synchronic in nature, it has been proposed that some connections between diachronic process and synchronic regularities must exist, since no change can produce a synchronically unlawful state and all synchronic states are the outcome of diachronic processes.

1 J. H. Greenberg, J. J. Jenkins, and C. E. Osgood, "Memorandum concerning language universals", in ed. J. H. Greenberg, Universals of language 258 (Cambridge, Mass., 1963).
2 J. H. Greenberg, ed., "Introduction", Universals of language x (Cambridge, Mass., 1963).
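The equivalence between implicational universals and typologies with empty types can be put in a form concrete enough to test mechanically. The sketch below is an invented miniature (the 'languages' and features are hypothetical, introduced only for illustration):

    # Hypothetical data: the universal "alpha implies beta" holds just in case
    # the typological class (alpha, not-beta) is empty.

    languages = {
        "L1": {"alpha", "beta"},    # type 1: alpha and beta
        "L2": {"beta"},             # type 3: not-alpha and beta
        "L3": set(),                # type 4: not-alpha and not-beta
    }

    def universal_holds(langs):
        return all("beta" in feats for feats in langs.values() if "alpha" in feats)

    def typology(langs):
        classes = {(a, b): [] for a in (True, False) for b in (True, False)}
        for name, feats in langs.items():
            classes[("alpha" in feats, "beta" in feats)].append(name)
        return classes

    assert universal_holds(languages)
    assert typology(languages)[(True, False)] == []   # type 2 (alpha, not-beta) is empty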


In the present study, which is frankly speculative and exploratory, the questions just mentioned are the subject of further investigation. The topic of universals is here approached through the consideration of a single, but as it will turn out, rich and complex set of notions, those pertaining to marked and unmarked categories. What at first might seem very limited subject matter in relation to the more general one of universals will in fact lead to the proposing of a considerable number of specific universals. The concept of the marked and unmarked will be shown to possess a high degree of generality in that it is applicable to the phonological, the grammatical, and the semantic aspects of language. Moreover, the topic is of such a nature that it will afford the opportunity of illustrating from concrete materials a number of the general methodological problems already mentioned: the relation between typology and universals; the relation of synchronic regularities to diachronic processes; and the problems of levels of generalization. In particular, it will be shown that the concept of marked and unmarked categories provides the possibility of formulating higher level hypotheses with deductive consequences in the form of more specific universals commonly arrived at by a more purely empirical consideration of the evidence. Moreover, as is usual in such cases, it will in certain instances suggest hypotheses which might not have occurred to the investigator outside of the more inclusive theory. In the final section, a specific application of this kind is made to the highly organized semantic area of kinship terminologies. Although this subject matter no doubt presents a more systematic semantic structure than is to be found by and large in language, nevertheless it is reasonable to suppose that the principles to be found here are operative elsewhere to the extent that a similar, even if usually lesser, degree of such organization is to be found.

The idea of marked and unmarked categories is chiefly familiar to linguists from Prague school phonology. The best known instance is doubtless Trubetzkoy's classic Grundzüge der Phonologie, in which the notion plays an important role.3 In "Signe zéro" and other writings Jakobson showed that these ideas could be applied to the study of grammatical categories and to semantics.4 In the present study we shall be chiefly concerned with the following problems: what, if any, are the common features which would justify the equating of the concept of unmarked and marked categories in fields as diverse as phonology, grammar, and semantics? Is it possible to isolate some one characteristic which might serve as definitional for this notion which tends to take on Protean shapes? What is the connection between marked and unmarked categories and universals?

3 N. S. Trubetzkoy, Grundzüge der Phonologie (Prague, 1939). It is not the purpose here to give a detailed historical account. The first occurrence of the terminology marked and unmarked (in phonology) appears to be by Trubetzkoy in 1931, "Die phonologischen Systeme", TCLP 4.96-116, especially p. 97. The first explicit use of this terminology for grammatical categories is probably by Jakobson in "Zur Struktur des russischen Verbums", Charisteria Guilelmo Mathesio... 74-84 (Prague, 1932). Cf. also with a different terminology Hjelmslev in La Catégorie des Cas (Aarhus, 1935), particularly p. 113. Earlier adumbrations of these ideas in reference to inflectional categories are to be found in certain Russian grammarians, e.g. Peshkovskij, Karcevskij.
4 R. Jakobson, "Signe zéro", in Mélanges Bally 143ff. (Geneva, 1939).


In the discussion of these subjects, applications will first be considered in phonology and then in grammar and semantics. The treatment will be at least partly in terms of the history of the subject, but it should be understood that the historical material is purely illustrative and merely incidental to the main purpose.

As was mentioned earlier, the first use of the concept of marked and unmarked categories was in Prague school phonology. It arose in the context of the problem of neutralization and the archiphoneme. It was noted that in certain environments the contrast between correlative sets was neutralized in that both could not occur. By correlative set is meant a group of phonemes, usually two in number, which differ only in a single feature of the same category (e.g. voice, when one is unvoiced and the other voiced) and whose remaining shared features are not found in any other set. Thus, in English b and p are a correlative pair since they differ in voicing only, and in regard to their remaining features they are the only non-nasal bilabial stops. In environments in which they do not contrast, the representative of the so-called archiphoneme, that is, the unit defined by the common features, may either be externally determined, that is, be conditioned by adjacent phonemes, or be internally determined. This last case is one in which a single phoneme always appears regardless of the environing sounds. A good example of external determination, found in many languages, is the neutralization of the contrast among nasals before stops, where the choice is determined by the following homorganic consonant. A commonly cited example of internal conditioning is the neutralization of voice in final position for obstruents in German. Here it is always the unvoiced phoneme which appears regardless of the environment. The choice is thus internally conditioned. Another well-known example is classical Sanskrit where, in sentence-final position, the opposition among voiced and unvoiced stops and aspirated and non-aspirated stops is neutralized and the unvoiced, unaspirated phoneme appears as the representative of the archiphoneme.

Although, in principle, no doubt, neutralization is viewed as a phenomenon specific to each language, one cannot help noting that in different languages it is generally the same category which appears in the position of neutralization. Thus in both German and Sanskrit it is the unvoiced member of the unvoiced/voiced opposition which is found. The feature which occurs in such instances is called the unmarked feature and the other the marked. Thus voicing is a marked feature; unvoicing an unmarked feature in German. Again for Sanskrit the unaspirated feature and the unvoiced feature are both unmarked as against the aspirated and voiced which are marked features. It may be noted in passing that in both these instances the unmarked feature is described phonetically by a term itself having a negative prefix un- while the marked feature lacks it. This turns out to be generally true. Thus nasality is a marked feature while non-nasality is the corresponding unmarked feature. It is as though the marked feature is a positive something, e.g. nasality, aspiration, while the unmarked feature is merely its lack. This aspect, not explicitly noted by Trubetzkoy, will reappear very importantly in our later consideration of marked and unmarked features in grammar and lexicon.
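The notion of internally determined neutralization can be indicated in a few lines. The sketch below is my own toy rendering of German-style final devoicing, with an invented mini-lexicon; it shows the unmarked member appearing as the representative of the archiphoneme:

    # Toy final devoicing (invented forms): in word-final position the
    # voiced/unvoiced contrast is neutralized, and the unmarked (unvoiced)
    # member represents the archiphoneme.

    DEVOICE = {"b": "p", "d": "t", "g": "k"}

    def surface(phonemes):
        out = list(phonemes)
        if out and out[-1] in DEVOICE:
            out[-1] = DEVOICE[out[-1]]
        return "".join(out)

    print(surface("rad"))   # 'rat' -- final /d/ and /t/ both surface as [t]
    print(surface("rat"))   # 'rat'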


will reappear very importantly in our later consideration of marked and unmarked features in grammar and lexicon. Another important characteristic of unmarked and marked categories noted by Trubetzkoy is that of text frequency. In general the unmarked category has higher frequency than the marked. It is of some interest to note that George K. Zipf, in his pioneering studies of language frequency phenomena, had arrived at the same hypotheses by a different, but, as can be shown, ultimately related route, and some of his results are quoted by Trubetzkoy. For, if the marked feature contains something which is absent from the unmarked, it is relatively more complex and by Zipf's wellknown principle of least effort the more complex should be used less frequently. 5 Most of Zipf's data refer to the categories of voiced and unvoiced consonants and aspirated and unaspirated consonants. He also cites data regarding vowel length from Icelandic on the assumption that long vowels are more complex than the corresponding short vowels, that is, that length is the marked feature. In general Zipf's hypotheses regarding aspirated and voiced consonants hold although there are a few exceptions. It may be noted that Ferguson's hypotheses regarding the relatively greater text frequency of non-nasal over nasal vowels is consonant with the general thesis of the greater frequency of unmarked features. 6 Additional data on the less frequently considered cases of marked and unmarked phonologic features compiled by myself are presented here, along with some evidence already published in other sources and cited here for purposes of comparison. My own data are to be considered tentative insofar as the samples are small, usually 1000 phonemes. The results, nevertheless, are obviously significant and unlikely to be seriously modified by subsequent work. The following are examples of counts, all done by myself, on the relative frequency of glottalic and non-glottalic consonants in the following languages: Hausa, in West Africa, and the Amerind languages Klamath, Coos, Yurok, Chiricahua Apache, and Maidu. In the case of Hausa, voiced implosives contrast with ordinary voiced consonants in the pairs b/B and d/cf, and glottalized consonants contrast with nonglottalized in the pairs k/k', s/s' (usually ts'), and y/'y. In Maidu voiced implosives as well as glottalized contrast with ordinary unvoiced consonants in certain positions. In the other languages a single series of unvoiced, unglottalized consonants occurs, but for Chiricahua I have counted the three series unaspirated, aspirated, and glottalized. The results for each language are found in Tables I, II, III, IV, and V, and the results for the six languages are summarized and compared in Table VII. 7 6 G. K. Zipf, especially Psychobiology of language (Boston, 1935) and Human behavior and the principle of least effort (Cambridge, Mass., 1949). 6 C. A. Ferguson, "Assumptions about nasals; a sample study in phonological universals", in ed. J. H. Greenberg Universals of language 46 (Cambridge, Mass., 1963). 7 The Hausa count consists of the first 1000 phonemes on pages 1, 5 and 9 of R. C. Abraham Hausa literature and the Hausa sound system (London, 1959) [Greenberg]; Klamath from M. A. R. Barker, Klamath texts (Berkeley and Los Angeles, 1963), first 1000 on pages 6,16 and 26 [Greenberg]; Coos from L. Frachtenberg, Coos texts (Leyden, 1913) first 1000 from pages 5, 7, 14, 17, 20 and 24 (14 from commencement of new story on middle of page) [Greenberg]; Yurok from R. H. 
5 G. K. Zipf, especially Psychobiology of language (Boston, 1935) and Human behavior and the principle of least effort (Cambridge, Mass., 1949).
6 C. A. Ferguson, "Assumptions about nasals; a sample study in phonological universals", in J. H. Greenberg (ed.), Universals of language 46 (Cambridge, Mass., 1963).
7 The Hausa count consists of the first 1000 phonemes on pages 1, 5 and 9 of R. C. Abraham, Hausa literature and the Hausa sound system (London, 1959) [Greenberg]; Klamath from M. A. R. Barker, Klamath texts (Berkeley and Los Angeles, 1963), first 1000 on pages 6, 16 and 26 [Greenberg]; Coos from L. Frachtenberg, Coos texts (Leyden, 1913), first 1000 from pages 5, 7, 14, 17, 20 and 24 (14 from commencement of new story on middle of page) [Greenberg]; Yurok from R. H. Robins, The Yurok language (Berkeley and Los Angeles, 1958), first 1000 on pages 162, 164 and 166 [Greenberg]; Chiricahua from H. Hoijer, Chiricahua and Mescalero Apache texts (Chicago, 1938), first 1000 on pages 5, 10, 15, 20, 23 and 25 [Greenberg]; Maidu from W. F. Shipley, Maidu texts and dictionary (Berkeley and Los Angeles, 1963), first 1000 on pages 10, 20, 30 and 40 [Greenberg].

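The counting procedure behind Tables I-VI is mechanical and easily replicated. The sketch below is an editorial illustration only, not Greenberg's own tabulation: the sample string, the plain/glottalized pairings, and all names in it are invented for the example.

from collections import Counter

# Hypothetical phonemic transcription: one symbol per phoneme, with an
# apostrophe marking the glottalized (marked) partner of a plain consonant.
sample = "k a t' i k' u s a t i ts' o k a s".split()

counts = Counter(sample)
pairs = [("k", "k'"), ("t", "t'"), ("s", "ts'")]   # assumed inventory pairing

total = sum(counts.values())
for plain, marked in pairs:
    # Frequencies per 1000 phonemes, in the format of Tables I-VI.
    print(f"{plain:3} {1000 * counts[plain] / total:6.1f}   "
          f"{marked:4} {1000 * counts[marked] / total:6.1f}")

On any text of reasonable size the plain member of each pair should, by the thesis advanced here, show the higher figure.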

TABLE I

Hausa (1000 phonemes)

b   17.0      ɓ    00.2
d   19.8      ɗ    03.7
k   21.9      k'   02.8
s   14.2      ts'  00.3
y   19.3      'y   00.8

TABLE II

Klamath (1000 phonemes)

p   02.8      b   01.8      p'   00.3
t   07.6      d   04.7      t'   01.9
č   08.7      j   00.2      č'   01.5
k   10.4      g   06.1      k'   02.1
q   02.4      ǧ   02.4      q'   01.9
l   05.4      L   00.5      l'   00.7
m   04.0      M   00.4      m'   01.3
n   13.9      N   00.2      n'   00.8
w   08.3      W   00.0      w'   00.4
y   08.6      Y   00.1      y'   00.6

TABLE III

Coos (1000 phonemes)

p    02.9      p'    00.0
t    23.9      t'    01.1
ts   12.8      ts'   00.0
č    15.8      č'    01.9
k    03.8      k'    02.0
k·   07.7      k·'   01.0
q    09.9      q'    00.6
l    11.4      l'    05.2

TABLE IV

Yurok (1000 phonemes)

p    08.9      p'    01.0
t    14.3      t'    00.2
c    12.9      c'    00.8
k    38.3      k'    10.1
kw   11.4      kw'   02.1



TABLE V

Chiricahua (1000 phonemes)

unaspirated/unglottalized      aspirated          glottalized

d    28.2                      t    05.3          t'    03.3
z    03.0                      c    05.8          c'    00.0
ǰ    07.7                      č    01.8          č'    02.8
λ    02.3                      ƛ    00.1          ƛ'    01.2
g    21.4                      k    13.4          k'    03.7

TABLE VI

Maidu (1000 phonemes)

p    09.3      p'    00.5
t    19.6      t'    01.4
ts   00.2      ts'   19.9
k    19.2      k'    11.4

m > n is not uncommon, but b > d or p > t is practically unheard of.

J. H. Greenberg and J. J. Jenkins, "Studies in the Psychological Correlates of the Sound System of American English", Word 20.157-177 [esp. 177] (1964).



The greater frequency of the unmarked would then be a resultant of certain common diachronic factors. Where other diachronic factors are at work, however, discrepancies may arise. Thus, as was pointed out, some languages have a larger number of long vowel phonemes than short vowels because of the common monophthongization of the diphthongs aj and au. Of course, e and o, having no short partner, may be expected to become shorter, but various morphological or canonical form factors may serve to maintain length. For these reasons, while there is a far better than chance tendency not only for the total text frequency of an unmarked set to be greater than that of the corresponding marked set, but even for each individual pair, there are occasional exceptions.

While frequency is thus merely a resultant, though a very important one, of overall diachronic tendencies in phonology, it is tempting to adjudge its role in grammar-semantics as primary. There is a real difference between frequency phenomena in phonology and in the grammatical-semantic sphere. In the former we do not choose our expression in terms of sounds, except perhaps marginally in poetry, so that phonologic frequency is an incidental characteristic which bears the marks of past diachronic changes. But we make grammatical and semantic choices based on the momentary situation. It is therefore plausible, insofar as there are constants in the human situation, that, for example, everywhere the singular should be more frequent than the plural, and that this remains quite constant over time in spite of changes in the means of expression. De Saussure here, perhaps anachronistically interpreted, had a real insight where he has sometimes been judged to be obviously wrong; namely, in his identification of the diachronic with the phonological and the synchronic with the grammatical.

The important phenomena of zero and facultative expression can be understood in terms of frequency phenomena based on the situation in the world with which the users of language must deal. In fact there is here no real difference between semantic and grammatical phenomena. For example, what matters is not so much that in English male is in general the unmarked category in relation to female, but rather the frequency of association of things in the real world. 'Author' means facultatively a writer of either sex, but, par excellence, a male one, because in fact most authors are male. We see this if we compare the term 'nurse'. Since nurses are usually female, 'nurse' takes on the meaning of nurse in general, or, par excellence, female nurse. To express the maleness of the nurse, when relevant, we use the marked expression 'male nurse'. Just so we may compare the ordinary semantic interpretation of words with or without syntactic modifiers with the morphological expression of corresponding categories. In a language without a grammatical category of diminutives and augmentatives, where size is indicated by modifying adjectives, if we use 'house' in a sentence without modifiers, the size is unspecified, but the house may in fact be unusually large or unusually small. We will usually assume that it is of normal size because most houses are of normal size. On the other hand, 'small house' or 'large house' explicitly exclude the interpretation of normal size. The frequent assimilates the ambiguous, save for contrary indications.



There are other advantages to a frequency interpretation of marked and unmarked in grammar and semantics, by which marked simply means definitionally less frequent and unmarked means more frequent. To begin with, there is the obvious methodological advantage that frequency phenomena can be explored for every language, whereas the other criteria are more limited in this respect, e.g. neutralization of certain subcategories may not exist in a given language. Frequency data will allow of degrees of marked and unmarked, by which the associated phenomena will be expected to be most common and least subject to exception where the frequency disparity is the greatest. This indeed seems to be the case insofar as, for example, the hierarchy of persons is both less certain and less overwhelming in regard to frequency and also less clear in other matters, whereas the hierarchy of numbers shows almost no exception in non-frequency phenomena and great constancy, together with large frequency disparity, for singular, plural, and dual. In addition to gradualizing and quantifying the scale, the frequency interpretation also allows the construction of a much more subtle and manifold hierarchy, for example, for the cardinal and ordinal numbers.

In addition, the frequency definition will cover at least one case in which none of the other criteria is present but which has been considered as an example of the marked/unmarked distinction by Jakobson; namely, normal (unmarked) versus emphatic (marked) word order. The so-called normal order, it would seem, is necessarily the most frequent. We may refer here to the well-known story of the boy who cried wolf.

Finally, it may help to overcome the problem of the lack of interlinguistic comparability of categories. Thus, for gender categories, we may at least conjecture that the associated phenomena such as zero expression and neutralization will be present to the degree that frequency differences exist among the genders. Since these are largely or completely conventional semantically and differ in size of membership, it is entirely plausible that the gender labelled 'masculine' in one language will be of much greater text frequency than the feminine in that language, while in another language the relationship is reversed. We may hypothesize that in the first language the masculine will display the other characteristics of the unmarked category, while in the second it will rather be the feminine. Where the categories are not 'conventional', e.g. for cases, the way lies open to explain the frequencies of specific cases as a summation of a number of discrete uses, each substantially similar in frequency among languages but differently combined in different languages. For example, traditional grammar describes the uses of the ablative in Latin under such rubrics as the ablative of personal agent, separation, instrument, etc. If we had the frequencies of each of these, we could then, for example, compare it with the Russian cases by equating a component of separation with the genitive with the prepositions ot and iz, while agent and instrument would be equated with the Russian instrumental.

There is at the moment a great practical difficulty here, of course, as well as the theoretic problems of sampling. It is rare to have frequency studies of grammatical categories, and even these do not specify the separate uses of the categories.



But this can in principle, of course, be overcome in order to test the hypotheses presented here.

The connection between frequency and the phenomena of grammatical or semantic neutralizations and morphological irregularities has not yet been discussed. It has often been noted that the most frequent forms are the most irregular. These are indeed now, by our definition, the unmarked forms. Where there is a complex set of intersecting categories, the frequency differences between combinations of unmarked categories and of marked categories are very great. For example, in Avery's study of the Rigvedic verb, the form which involves all of the most unmarked categories (singular, third person, present, active, indicative) has 1404 occurrences, while the dual, second person, medio-passive perfect optative has zero frequency. Such enormous disparities must surely have an effect, in that such a highly infrequent formation must follow analogically other parts of the system, while only a fairly frequent form can preserve irregularities. Hence also syncretisms produced by the accidents of sound change will in such cases not lead immediately or inevitably to new formations to reintroduce the lost distinctions. Thus the general course of the reduction of the case system in Indo-European languages leads to the coalescence of the marked oblique cases, and where the whole structure finally collapses, it seems to be one of the direct cases, nominative or accusative, which is the historical source of the nouns now undifferentiated for case. Thus in phonology, diachronic process explains frequency, while in grammar, frequency explains diachronic process. Frequency, definitionally not included in la langue, is in fact an ever-present and powerful factor in the evolution of grammatical categories and thus helps in explaining the types of synchronic states actually found.

A particular type of connection between marked categories in phonology and grammar may be pointed out, and its explanation will now be clear on the basis of the above considerations. Sometimes the marked category in phonology is the expression of a marked category in grammar. Thus certain Amerind languages use the marked feature of glottalization to express the marked grammatical category of the diminutive. In German umlauted vowels may be considered a marked phonetic category as against their non-umlauted partners. Rounded front vowels always imply rounded back vowels in a particular language; their number is never greater, and their text frequency is generally less. Umlaut is used in German as a grammatical process to express the marked categories of plurality in the noun, comparative and superlative in the adjective, and past subjunctive in the verb. These phenomena result from zero expression of the unmarked where a phoneme involved in the expression of the marked disappears after having modified the simple preceding sound to produce a marked complex sound, e.g. umlauting produced by a former i or glottalization from a former glottal stop. Another example of phonological-grammatical connection is the widespread use of the marked category of final rising pitch for the expression of interrogation. Here the problem is somewhat different: since the intonational pattern has this meaning directly, we may seem to be tautologous in asserting that the less usual



intonation expresses the less frequent category. However, there is further independent evidence for the 'normality' of tonal descent in that phonemes of pitch often have progressively lower allophones the later they occur in the sentence, whereas the phenomenon of allophonic raising never seems to occur.

If it turns out that in fact frequency is an adequate unifying principle for the domain of the marked and unmarked in semantics and grammar, a great over-all simplification will have been achieved. But frequency is itself a symptom, and the consistent relative frequency relations which appear to hold for lexical items and grammatical categories are themselves in need of explanation. Such explanations will not, in all probability, arise from a single principle. Thus it may be noted that in adjectival opposites the term designating a theoretical scale with an implied zero point, e.g. heavy, large, wide, deep, etc., is unmarked; there is obviously a unifying principle here, but it will not even apply to all adjectival opposites, e.g. good/bad, and it is irrelevant in a host of other examples. Again, the center of a normal frequency distribution is unmarked in relation to the extremes, e.g. normal size as against diminutive or augmentative. This topic is left for future exploration.

In the discussion of universals of kinship terminology to which we now turn, the attempt will be made to apply these principles in a particular semantic domain. In this connection it will be possible to illustrate from concrete materials the relationship between the over-all theory of the marked and unmarked and the typologies which accompany the specific universals derived from the theory. It will then appear that such a theory is of a higher level in that it binds together within a common deductive structure various typologies which lack overt interconnections.

In the foregoing discussion several examples were adduced from the kinship terminology of speakers of English as an illustration of the principle of marked and unmarked categories. Thus in the English term 'cousin' there is neutralization of sex reference as against 'brother' and 'sister'. Again, there is zero expression of the consanguineal as against the affinal relation in such pairs as 'father' vs. 'father-in-law' and 'brother' vs. 'brother-in-law'. As a further example we might cite the absence of a term 'cousin-in-law', concocted here for illustrative purposes, which exemplifies defectivation of the marked category 'affinal', which lacks, in ordinary usage, a term corresponding to 'cousin' among the consanguineal terms. Of course all of these examples are taken from English. But as will be shown later, the specific hierarchies of categories in English kinship terminology, such as lineal (unmarked) vs. collateral (marked) and consanguineal (unmarked) vs. affinal (marked), are very widespread, and in fact for these, and others to be shortly mentioned, no significant exceptions have been found as yet.

Let us then pursue the matter further, confining ourselves for the moment to English. In addition to the evidence we have already found for the marked or unmarked nature of the lineal/collateral and consanguineal/affinal categories, we have certain other evidence. In the direct descent line, i.e. among lineal ascendants and descendants, we see zero expression for the first ascending as against the second ascending generation in the pairs father/grandfather,



mother/grandmother. A corresponding contrast exists between G-1 and G-2 in the pairs son/grandson, daughter/granddaughter. This system, of course, extends further, since G+3 is marked as against G+2 by the prefix 'great-' and G+4 as against G+3 by an additional occurrence of 'great-', and correspondingly for the descending generations. We have then, in English, a recursive device by which a more remote generation is always marked as against a less remote generation. These additional data already suggest a tentative hypothesis of the third level as defined in the previous section: of two categories, it is the more remote from the speaker which is always marked in relation to the less remote. In fact it can be shown formally, by counting the number of occurrences in definitions reduced to a chain of successive applications of the relation 'parent' and its converse 'child' (abstracting from qualifiers such as sex, relative age, etc.), that collateral and affinal relatives are more remote than lineal and consanguineal relatives respectively.

In testing further these and similar hypotheses, I have not set up a formal sample. As the basic set, the Gifford study of California kinship terminologies, which contains the kinship terminologies of approximately 80 California Indian groups, was utilized.29 This was supplemented by approximately 40 additional terminologies from various other parts of the world. While it cannot, of course, be guaranteed that exceptions to the conclusions described here do not exist, their absence in the set examined gives reasonable assurance of at least statistical predominance. In what follows, therefore, I will illustrate with examples from this sample but without giving all the supporting instances from the sample.

29 E. W. Gifford, "California Kinship Terminologies", University of California Publications in American Archaeology and Ethnology 18.1-285 (1922).

In the terminology of English speakers it was seen that the less remote generation has zero expression as against the more remote marked category. Neutralization of sex reference, not found in English, is fairly common elsewhere. Thus for the Bavenda, a South African Bantu-speaking group, a single term makhulu includes all four grandparents, father's father, father's mother, mother's father, and mother's mother, in its reference, whereas there are separate terms for the male parent, khotsi 'father', and the female parent, mme 'mother'. It is indeed a probable 'factual universal' that all systems distinguish male and female parent by separate terms, even though very frequently other kin types are included in the referents of both, e.g. father's brother is often designated by the same term as father. The Bavenda example also involves neutralization of the distinction between lineal and collateral in the marked second ascending generation as against the unmarked first ascending. The just quoted term makhulu also comprehends siblings of grandparents, e.g. mother's mother's brother. In the first ascending generation there are separate terms for the mother's brother and the father's sister, malume and makhadzi, respectively.

The Venda system is an example of the widespread bifurcate merging type in which the father and father's brother are referred to by the same term, while there is



a separate term for the mother's brother. Similarly, mother's sister and mother have a single designation, while mother's brother has a separate term. A similar neutralization of the lineal-collateral distinction in the second ascending generation is found also in some systems which, like English, have a single term for both father's brother and mother's brother and another for father's sister and mother's sister. Thus Hanunoo in the Philippines has qamaq for 'father' and bapaq, which designates all the kin types to which we apply the term 'uncle'. Likewise there is qinaq 'mother' and bayih 'aunt'. But for the second ascending generation a single term lakih includes both grandfathers as well as either grandmother's or grandfather's brothers and either grandmother's or grandfather's sister's husband. The term qiduh 'grandmother' has a corresponding extension for females. The same Hanunoo system exhibits still further neutralization in the third generation in that the word qumput, in addition to covering both lineals and collaterals as does the second generation term, makes no distinction in the sex of the referent. It covers, therefore, all lineal and collateral relatives of the third ascending generation.

Returning to the Bavenda, we find here also evidence of the relatively marked character of the third ascending as against the second ascending generation, in that the former is makhulukuku, formed from the grandparental term by the addition of a suffix -kuku. Similarly there are numerous evidences for a corresponding hierarchy in the descending generations. Since the Hanunoo terms for aunts, uncles, grandparents, and great-grandparents are all self-reciprocal, that is, whenever A calls B by one of these terms, the same term is the appropriate one for B to call A, neutralizations of successively increasing scope are found in the first, second, and third descending generations as in the first, second, and third ascending generations. A further example is the Sara dialect of Ainu, in which the sex difference found in the first descending generation terms po 'son' and matne-po 'daughter' is neutralized in the second descending generation in the sex-undifferentiated term mitpo 'grandchild', which is, incidentally, also marked by a prefix added to the term for son. The third generation appellation is likewise not distinguished for sex and has an additional prefix to the second generation term, i.e. mitpo 'grandchild', san-mitpo 'great-grandchild'.

Similarly for the other categories already mentioned for which there is evidence in English, such as lineal/collateral, consanguineal/affinal and, we may add, step-relatives as against non-step-relatives, examples of neutralization and non-zero expression in the marked members are not difficult to find. Thus for ego's generation in the Malay of Singapore we have three terms: abang 'older brother', kakak 'older sister', and adik 'younger sibling of either sex'. For cousins these distinctions of sex and relative age are all obliterated in the single term sa-pupu.

As an example of neutralization in affinal as against consanguineal relatives we may cite Umbundu from Angola, in which, as everywhere, father and mother are distinguished in the terminology. Here the term tata 'father' also includes male collaterals of the first and second degree, i.e. father's brothers and father's cousins,



and mai includes 'mother' as well as female collateral relatives of the mother. However, there is a single parent-in-law term ndatembo embracing parents of either sex of either husband or wife, and with collateral extensions like those of the consanguineal parental terms. There is thus neutralization for sex of the person referred to.

Further observation of the generational hierarchy shows that an additional factor to that of remoteness from ego must be taken into consideration. There are many examples which show that ascending generations are unmarked as against descending generations of equal genealogical distance from ego. An example is from the Logoli, a Bantu-speaking people of Kenya, where we have guga 'grandfather', guku 'grandmother', and omwitjuxulu 'grandchild' for a lineal descendant of the second generation and of either sex. For the first ascending as against the first descending generation it is fairly common to find systems in which the marked character of the latter is evidenced by neutralization of sex reference, whereas, as has been seen, the distinction of father and mother terms is universal. Thus in Bantu languages generally there is a single 'child' term. The same situation usually holds in Austronesian languages. Thus in Malay we have both bapa 'father' and emak 'mother', but anak 'child' without distinction of gender. Of course, both here and in the Bantu cases a qualifier can be added when necessary to specify the sex of the child, but this is not usual. At any rate, there are distinct morphemes for father and mother, while a monomorphemic term of the first descending generation designates the child regardless of sex.

That seniority is involved as an additional factor distinct from genealogical distance from ego is also shown in sibling terms. When relative age is indicated in the terminology, which is quite frequent, there are often indications that the terms designating older siblings are unmarked whereas those indicating younger siblings are marked. It may be noted that in the example of sibling terms from the Malay of Singapore, cited earlier in a different connection, older siblings are distinguished for sex while younger are not. The terms are abang 'older brother', kakak 'older sister', and adik 'younger sibling of either sex'. Further evidence for the factor of seniority is the unmarked character of ego's own generation G° as against the first ascending generation. For example, it is not uncommon to find systems in which father is distinguished from father's sister and mother from mother's brother but in which their respective offspring, the siblings and cousins of the speaker, are all merged in a single term, thus eliminating the lineal-collateral distinction. In this particular instance, however, it might be claimed that kinship distance to a sibling is greater than to a parent. This would follow from the uniform procedure of reckoning kinship distance by the number of occurrences of the relation 'parent' or its converse 'child' in the relational product required to define the terms. Thus for either father or mother of ego the relation 'parent' obviously occurs once, whereas for brother or sister it occurs twice, since my sibling is my parent's child.
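The reckoning procedure just described can be stated schematically. The sketch below is an editorial illustration (the kin-type encodings and names are supplied here, not taken from the text): remoteness is simply the length of the defining chain of 'parent' (P) and 'child' (C) relations.

# Kin types defined as chains of the primitive relations P ('parent')
# and C ('child'), with sex and other qualifiers abstracted away.
KIN_TYPES = {
    "father":      "P",       # ego's parent
    "brother":     "PC",      # ego's parent's child
    "grandfather": "PP",      # ego's parent's parent
    "uncle":       "PPC",     # ego's parent's parent's child
    "cousin":      "PPCC",    # ego's parent's parent's child's child
}

def distance(chain):
    # Kinship distance = number of applications of 'parent' or 'child'.
    return len(chain)

for term, chain in KIN_TYPES.items():
    print(f"{term:12} {chain:5} distance {distance(chain)}")

# Collaterals come out more remote than lineals, as claimed above:
assert distance(KIN_TYPES["brother"]) > distance(KIN_TYPES["father"])
assert distance(KIN_TYPES["uncle"]) > distance(KIN_TYPES["grandfather"])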

Taking the two factors of seniority and genealogical distance from ego together, then, the hierarchy of generations will begin with the first ascending generation as unmarked in relation to all others, followed by ego's generation and the first descending generation as about equal, the first descending being lesser in seniority but closer genealogically. After these we have, successively, the second ascending generation, the second descending generation, the third ascending generation, the third descending generation, etc.

The marked character of descending generations in relation to the corresponding ascending generations is also shown in the phenomenon of reciprocal terms. Two terms may be defined as reciprocals if whenever x refers to y by the first term, y refers to x by the second. If the two terms are identical, then we say the term is self-reciprocal. Our English system of terminology has only one true reciprocal term, 'cousin', and it is self-reciprocal. Take, for example, grandfather and grandson. If x calls y grandfather, then y calls x grandson only if x is male. Therefore these terms are only partial reciprocals. On the other hand, uncle and grandfather are non-reciprocal, since if x calls y uncle, y never calls x grandfather. The reason that complete reciprocity fails in the case of the English terms grandfather and grandson is obviously that for both the speaker may be of either sex while the person referred to is distinguished for sex.

Where reciprocity holds, the following cases are possible. Both speaker and addressee may be of either sex, as with English 'cousin'. In such instances the term may be self-reciprocal, as is the case with 'cousin'. Both speaker and addressee may be of the same sex. Here also self-reciprocity is possible. Thus many Bantu languages have a sibling term commonly glossed as 'sibling of the same sex'. This word may be used by males to refer to males and by females to refer to females. Such terms are necessarily self-reciprocal. Finally, the sex of speaker and addressee may be different by the definitional requirement of the term. The same Bantu languages which have a term 'sibling of the same sex' normally have also a term meaning 'sibling of the opposite sex', naturally also self-reciprocal.

Now very many kinship systems, of which our own is an example, contain only terms which do not involve the sex of the speaker in their definition. Other systems contain some terms in which the sex of the speaker is involved, but only along with terms of the former type, which are thus universal. In the present connection what is significant is that commonly, though not always, terms involving the sex of the speaker are reciprocal or self-reciprocal terms. They are, as it were, secondary, arising from the reciprocal use of the extremely common type of term in which the sex of the speaker is not specified but the sex of the addressee is, as with all English terms except cousin. Thus the true reciprocal of the term grandfather will be child's child with the speaker necessarily male, and that of grandmother will be child's child with the speaker specified as female. In such instances we will gloss the terms as man's child's child and woman's child's child.

Logically we could have reciprocals or self-reciprocals either of the type grandfather with its reciprocal man's child's child, or of the type grandson with its reciprocal man's parent's parent. The remarkable fact seems to be that examples occur of the first type, in



which the so-to-speak normal situation, that the sex of the speaker is not specified, holds for the ascending generation term, but never of the second type, in which it would hold for the descending generation term. This tentative universal may be stated as follows: whenever there are two terms differing in generation which are true reciprocals, or there is one which is a self-reciprocal term with two referents, and one involves the sex of the speaker in its definition and the other does not, it is always the term of lower generational reference which contains the sex of the speaker in its definition.

It may at first seem rather far-fetched to interpret the association of the normal situation, lack of reference to speaker's sex, with the higher generation as further evidence for the unmarked status of higher generation as against lower generation terms. However, there are cases in which two distinct words are used, that is, the terminology is not self-reciprocal, and the ascending generation term, here interpreted as unmarked, has zero expression, while the lower generation term, with sex of speaker specified, has an affix. Kawaiisu, a Shoshonean language of California, may serve as an example. We have a whole series of paired terms of the following type: sinu- 'mother's brother', sinuci- 'man's sister's child'; togo- 'mother's father', 'spouse's mother's father', togoci- 'man's daughter's child', 'man's daughter's child's spouse', etc. Such reciprocals are most common in grandparental and uncle-aunt terms but are also found in great-grandparental and parental terms and for in-laws.
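The reciprocity relations defined above admit of a compact mechanical test. The following sketch is an editorial illustration; the miniature usage data and the personal names are invented, not drawn from the sample.

# Toy data: who refers to whom by what term.
calls = {
    ("ana", "ben"):  "cousin",
    ("ben", "ana"):  "cousin",
    ("ana", "carl"): "grandfather",           # carl is ana's grandfather
    ("carl", "ana"): "man's child's child",   # sex of speaker specified
}

def are_reciprocals(term_a, term_b):
    # term_a and term_b are reciprocals iff whenever x refers to y by
    # term_a, y refers to x by term_b.
    uses = [(x, y) for (x, y), t in calls.items() if t == term_a]
    return bool(uses) and all(calls.get((y, x)) == term_b for x, y in uses)

def is_self_reciprocal(term):
    return are_reciprocals(term, term)

print(is_self_reciprocal("cousin"))                            # True
print(are_reciprocals("grandfather", "man's child's child"))   # True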

The generalizations thus far offered have all been based on the concept of marked and unmarked categories. It may be observed that at least one very important category, sex, has not been considered from this point of view. It may well be that neither male nor female can be described as the unmarked category on a universal basis. In a number of instances the male term has zero expression where the corresponding female term has an additional morpheme, but the data on neutralizations give conflicting evidence. Further, Lounsbury, in a pioneering contribution on the subject, describes the feminine as unmarked in Iroquois, in consonance with purely linguistic facts concerning Iroquois sex gender.

In view of the earlier observations regarding the higher text frequencies of unmarked forms, it will be of interest to consider the data from English, based once more on the Lorge magazine count, and data from Spanish by Bou. We approach these data with the following expectations: among lineal terms the generational hierarchy leads to the predicted ordering 1. parental terms; 2. sibling terms and first descending generation terms; 3. grandparental; 4. grandchildren; 5. great-grandparental; 6. great-grandchildren. On the basis of the discussion of the sex category we will not expect a consistent preponderance of either male or female terms. The results conform fully to these expectations. In category two, the children terms are more frequent than the sibling terms, except for Spanish hija and hermana, suggesting that generational remoteness is here more important than seniority as a factor. The results are subsumed in Table XXX.


TABLE XXX

G+1   father                3235      padre        5631
      mother                3993      madre        5598
G-1   son                    993      hijo         3765
      daughter               865      hija         1749
      child                 1574
G°    brother                659      hermano      3120
      sister                 590      hermana      1811
G+2   grandfather            173      abuelo       1234
      grandmother            346      abuela       1540
G-2   grandson                32      nieto          94
      granddaughter           33      nieta          58
G+3   great-grandfather        8      bisabuelo      83
      great-grandmother       19      bisabuela      10
G-3   great-grandson           0      bisnieto  }
      great-granddaughter      0      bisnieta  }     4

Data from English, Spanish, French, German, and Russian from the earlier mentioned sources, in which terms of the same generation with different sex reference are consolidated, show that the predicted hierarchy holds for these languages without exception.30

30 English, E. L. Thorndike and I. Lorge, op. cit.; Spanish, I. R. Bou, op. cit.; German, Kaeding, op. cit.; Russian, H. H. Josselson, op. cit. Blanks indicate items not included in the count. In Russian both 'father' and 'mother', otets and mat', are in Josselson's group of words (Group I) whose frequency was so great that they were not counted after a certain point. The figures are therefore not comparable with the rest but are necessarily greater than any of the others in the first sources counted.

TABLE XXXI

        English    Spanish    French    German    Russian
G+1       7,228     11,229     1,260     9,428       +721
G-1       1,858      5,514     1,030     6,047        721
G°        1,249      4,931       419     3,449        703
G+2         519      2,774        83       614        293
G-2          65        152        31       242         20
G+3          27         93        —         31         —
G-3           0          4        —         29         —
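The claim that the predicted hierarchy holds without exception can be verified directly from the consolidated figures. The sketch below is an editorial check of Table XXXI, not part of the original text; the Russian G+1 entry is taken at its printed floor of 721, and None marks items absent from a count.

# Consolidated counts in the generational order G+1, G-1, G°, G+2, G-2, G+3, G-3.
counts = {
    "English": [7228, 1858, 1249, 519, 65, 27, 0],
    "Spanish": [11229, 5514, 4931, 2774, 152, 93, 4],
    "French":  [1260, 1030, 419, 83, 31, None, None],
    "German":  [9428, 6047, 3449, 614, 242, 31, 29],
    "Russian": [721, 721, 703, 293, 20, None, None],
}

for lang, row in counts.items():
    present = [n for n in row if n is not None]
    # The hierarchy holds if the figures never increase down the ordering.
    ok = all(a >= b for a, b in zip(present, present[1:]))
    print(f"{lang:8} hierarchy holds: {ok}")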

A second set of hypotheses predicts greater frequency for lineal than for corresponding collateral terms. This is also verified in the figures of Table XXXII.


TABLE XXXII

                     French     Spanish    English    German
G+1   lineal          1,260      11,229      7,228     9,428
G+1   collateral        511       4,717      1,504     1,219
G-1   lineal          1,030       5,514      1,858     6,047
G-1   collateral        140         361        148       464
G°    lineal            419       4,931      1,249     3,449
G°    collateral        151         867        316       427
G+2   lineal             83       2,774        519       614
G+2   collateral         —            0          0         6
G-2   lineal             31         152         65       242
G-2   collateral         —            0          0         6

Finally, as would be expected, there is overwhelmingly greater frequency for consanguineal terms over the corresponding affinal ones.

TABLE XXXIII

father      3235      father-in-law       17      padre      5631      suegro    15
mother      3993      mother-in-law       53      madre      5598      suegra    37
brother      659      brother-in-law      23      hermano    3120      cuñado    50
sister       590      sister-in-law       18      hermana    1811      cuñada    16
son          993      son-in-law          27      hijo       3765      yerno     17
daughter     865      daughter-in-law     19      hija       1749      nuera     16

The approach to universals of kinship has thus far been through the concept of marked and unmarked categories. There has been no overt mention of typologies. Yet it is easy to see that implicit typologies are involved. Thus, to take one example among many, the neutralization of sex of referent in the marked category of second descending generation lineal terms, as against the second ascending generation, can be restated in terms of a typology. We classify kinship terminologies into those which distinguish sex of referent in second ascending generation lineal terms and those which do not. We similarly classify systems into two types for second descending generation lineal terms. The operation of these two sets of criteria simultaneously produces four logically possible types of language. Type one, in which both ascending and descending generations distinguish sex, is exemplified by English. Type two, in which neither the second ascending nor the second descending generation distinguishes sex of referent, is represented by Lunda, a Bantu group. Type three, with sex distinguished in the ascending but not the descending generation, has as one of its members Sara Ainu. The fourth type, however, with sex distinction in the second descending generation but not in the second ascending generation, apparently has no members. From this we restate our universal in the common implicational form: distinction of sex in the second descending generation implies the same distinction in the second ascending generation, but not vice versa.
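Implicational universals of this form can be checked mechanically against coded terminologies. The sketch below is an editorial illustration over the three systems just named; the coding is inferred from the discussion above, not from the original data sheets.

# (sex of referent distinguished in G+2 lineal terms,
#  sex of referent distinguished in G-2 lineal terms)
systems = {
    "English":   (True,  True),    # type one
    "Lunda":     (False, False),   # type two
    "Sara Ainu": (True,  False),   # type three
}

def satisfies_universal(up, down):
    # A G-2 sex distinction implies a G+2 sex distinction.
    return (not down) or up

for name, (up, down) in systems.items():
    assert satisfies_universal(up, down), f"{name} would be a counterexample"
print("no counterexamples in this sample")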



The approach through typologies in these and similar instances is clumsy and rather unrevealing, because a separate typology is required for almost every universal and because the connections among these universals through the master principle of marked and unmarked categories do not appear. In other instances the sheer number of possible typologies makes this approach inadvisable. Consider, for example, the question of sex of speaker and addressee discussed earlier. A full typology will be based on the existence of nine possible classes of terms according to whether the speaker is male, female, or either sex and the addressee male, female, or either sex. Of these 9 types of terms, any system might theoretically contain a single type only, some combination of two types, and so on up to the use of all 9. Of course, some of these combinations are excluded by certain considerations. For example, a system consisting exclusively of terms with addressees of male sex only would lack all designations for female kin. The theoretical possibilities are 2⁹ or 512 types, and even the exclusion of some of these for reasons such as those just described will leave several hundred types. On a pure sampling basis some of these will be expected to be lacking. A large variety of unenlightening implications will be possible.

However, there are some instances in which a typological approach is useful. There was earlier current a typology of kinship systems, the main lines of which continue to be followed in more recent work.31 Kinship systems were classified on the basis of parental and parents' sibling terms, in other words, those of the first ascending generation. The key terms here, for males, are father, father's brother, and mother's brother. We may distinguish four types of kinship terminologies. In the generational type all three of these relatives are referred to by the same term. In the lineal type, to which our system belongs, the father is distinguished from the two collateral relatives, which are merged in a single uncle term. In the bifurcate collateral type all three (father, paternal uncle, and maternal uncle) are designated by separate terms. Finally, in the bifurcate merging type the paternal line relatives, father and father's brother, receive the same appellation, while a second term is used for the mother's brother.

31 R. H. Lowie, "Kinship Terminology", Encyclopaedia Britannica (the date of the first edition in which this article appeared was not obtainable).

There are thus four types, generational, lineal, bifurcate collateral, and bifurcate merging, and no other type is even considered. But, in fact, there are five logical possibilities. For we can have either one, two, or three terms for these three kin types. Obviously the use of a single term or of three separate terms each gives one type. But for systems with two terms, any one of the three relatives can receive a unique designation, while the other two fall under a second term. There are therefore three additional types, producing a total of five, not four. The missing type is the one in which the father and mother's brother are covered by a single kin term, while the father's brother is given a separate name. The fact that this type is not even mentioned is sufficient evidence of its extreme rarity or non-existence. In fact, I do not know of a single instance of this type. Its usual absence leads to the following implicational




universal: whenever the father and mother's brother are designated by the same term, the father's brother is likewise designated by that term. Note that the father and the mother's brother are the two most divergent, as it were, of the three relatives, in that they differ both in the lineal/collateral dimension and in the line of descent, paternal/maternal.

Analogous typologies can be constructed in other cases, and their complexity, in the sense of the number of possible types, will depend of course on the size of the basic set of relatives. The earlier observation that all languages distinguish father from mother was an example of the simplest possible case. Here there are only two kin types, father and mother, and therefore only two logically possible types: those which use two terms and those which use one, that is, have no separate father and mother terms. Of these two types, apparently all languages belong to the first and none to the second. An example of a more complex typology is one based on grandparent terms, for here there are four kin types to be considered: father's father, mother's father, father's mother, and mother's mother. In this instance there are fifteen logically possible classifications. With one term there is one possibility. With two terms either each term covers two relatives, or one covers three and the other a single kin type. The former occurs in three ways, the latter in four, making a total of seven. For three terms the only possible division is two, one, one, and this can occur in six ways. There is only one possible way of applying four terms. This gives us a total of 1 + 7 + 6 + 1 or fifteen types. Of these fifteen types, only six occur in Gifford's survey of California kinship systems. Two other types occur in my material, one being common elsewhere but not found in California. In the following table each type is listed, together with a judgement as to whether it is frequent, uncommon, or, at least to my present knowledge, non-existent:

 1.  A. FaFa, FaMo, MoFa, MoMo                      common
 2.  A. FaFa, FaMo   B. MoFa, MoMo                  occurs
 3.  A. FaFa, MoFa   B. FaMo, MoMo                  common
 4.  A. FaFa, MoMo   B. FaMo, MoFa                  not found
 5.  A. FaFa   B. FaMo, MoFa, MoMo                  not found
 6.  A. FaMo   B. FaFa, MoFa, MoMo                  not found
 7.  A. MoFa   B. FaFa, FaMo, MoMo                  not found
 8.  A. MoMo   B. FaFa, FaMo, MoFa                  occurs
 9.  A. FaFa, FaMo   B. MoFa   C. MoMo              occurs
10.  A. FaFa, MoFa   B. FaMo   C. MoMo              occurs
11.  A. FaFa, MoMo   B. FaMo   C. MoFa              not found
12.  A. FaMo, MoFa   B. FaFa   C. MoMo              occurs
13.  A. FaMo, MoMo   B. FaFa   C. MoFa              not found
14.  A. MoFa, MoMo   B. FaFa   C. FaMo              not found
15.  A. FaFa   B. FaMo   C. MoFa   D. MoMo          common
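The count of fifteen, like the count of five for the three first-ascending-generation kin types, can be confirmed by brute-force enumeration, since the types are exactly the partitions of the kin-type set into non-empty classes. The sketch below is an editorial verification, not part of the original argument.

def partitions(items):
    # Enumerate all set partitions of a list.
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        # Place `first` in each existing class, or in a class of its own.
        for i in range(len(part)):
            yield part[:i] + [part[i] + [first]] + part[i + 1:]
        yield [[first]] + part

grandparents = ["FaFa", "FaMo", "MoFa", "MoMo"]
print(sum(1 for _ in partitions(grandparents)))            # 15
print(sum(1 for _ in partitions(["Fa", "FaBr", "MoBr"])))  # 5

# Of the fourteen classificatory types (all but type 1), those which
# keep FaFa and MoMo in different classes:
apart = [p for p in partitions(grandparents)
         if len(p) > 1 and not any("FaFa" in c and "MoMo" in c for c in p)]
print(len(apart))                                          # 10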

A certain order is brought into this multiplicity of types if we search for those



combinations of kin types which are never classified together in occurrent types, except in type 1, which involves a single term for all four relatives. In fact, there is only one such set, consisting of FaFa and MoMo. Among all terminologies which involve a classification, that is, all outside of type 1, those for which I have found examples, namely 2, 3, 8, 9, 10, 12, and 15, put FaFa and MoMo in different classes. For the converse there are only two types, 5 and 14, which put FaFa and MoMo in different classes but which do not occur in my material. The explanation for the non-occurrence of these types is probably that both involve some terms in which the sex of the referent is specified and some in which it is not. This result is obviously consonant with the conclusion derived earlier from the consideration of first ascending generation terms. It will be recalled that the only theoretically possible type which did not occur was that in which father and mother's brother are classified together as against father's brother. Here, similarly, father's father differs from mother's mother in the two coordinates of sex of connecting relative and sex of referent. This is true of one other pair, mother's father and father's mother, and indeed there is only one occurrent type in which these are classified together, outside of type 1. Of course, this is type 12, for which I have thus far found only a single example, Wikmunkan in Australia.

It may be noted that throughout this discussion free use has been made of a number of categories, e.g. consanguineal vs. affinal, lineal vs. collateral, generation, etc. It is worth noting that these categories play much the same role in the analysis of kinship terminologies as features do in phonological comparison. It is this analogy which underlies the current development of componential analysis. Like the features, they are a finite set of categories, usually binary, in terms of which any kin term in any system may be adequately specified, and they provide the indispensable analytic framework for comparative analyses such as the present one. As with the phonological features, certain ones are utilized in all systems and certain ones are more restricted in their distribution. A consideration of these facts leads to universals of a kind analogous to those of Jakobsonian phonology, e.g. that all languages exhibit an opposition of vocalic and non-vocalic.

This set of categories was first described in a fundamental study of A. L. Kroeber, in which he proposed eight categories: 1. generation; 2. lineal vs. collateral; 3. relative age within generation; 4. consanguineal vs. affinal; 5. sex of relative; 6. sex of speaker; 7. sex of connecting relative; 8. condition of life of connecting relative, i.e. whether living or dead.32 A category may be said to be used in a system if it enters into the definition of at least one kinship term. Thus sex of relative (i.e. of referent) is present in the English system because of terms like 'brother' and 'sister', even though it is neutralized in 'cousin'. On this basis three of Kroeber's eight categories seem to be universal. All systems make some use of (1) generation, (4) the consanguineal vs. affinal distinction, and (5) sex of relative.

32 A. L. Kroeber, "Classificatory Systems of Relationship", Journal of the Royal Anthropological Institute 39.77-84 (1909).



Conclusions such as those proposed here may seem of lesser interest to students of social structure than the current typological approaches, in which differences among kinship types are the objects of attention because they lend themselves to the framing of hypotheses connecting terminology with social institutions. In fact, of course, the two enterprises are complementary. In seeking differences we by the same token discover similarities as a negative result, and in seeking similarities we uncover differences as a negative result. It may also be pointed out that in the wider context of the social sciences in general, among which linguistics must be numbered, a correlation involving kinship terminology and social institutions is a universal connecting linguistic and non-linguistic social data, while a universal within terminologies connects linguistic with other linguistic data, and these are also in the broad sense social. It is to be hoped that the present results, which will certainly both be amplified in scope and rectified in details by subsequent studies, will serve to show that at least one area of semantics can be treated with as great exactitude, and can be as fruitful in conclusions of universal scope, as the study of the more formal aspects of language such as phonology and grammar.

APPENDIX ON WORD ASSOCIATION

A further intuitive manifestation of the marked-unmarked hierarchy discussed in the body of this paper is shown in word association, where the stimulus words selected by psychologists have been exclusively drawn from the unmarked categories, e.g. singular nouns, positive adjectives. However, a recent set of norms by D. S. Palermo and J. J. Jenkins [Word association norms, grade school through college (Minneapolis, 1963)] employs some stimulus words from grammatically marked categories.

In formulating an hypothesis in advance of an examination of these data, a further factor independent of the marked-unmarked relationship has to be considered. There is a well-attested tendency for stimulus words of a particular grammatical category to elicit responses of the same category. This is most completely documented in a study of L. V. Jones and S. Fillenbaum, Grammatically classified word associations (Chapel Hill, 1964), in which stimuli were classified on a part of speech basis into categories, for each one of which the response frequency was greatest for the same part of speech. If we hypothesize on the basis that, for example, singular nouns ceteris paribus will elicit singular nouns and plural nouns will elicit plural nouns, we will make a set of predictions of the following form. A stimulus of an unmarked category will have responses of the same unmarked category almost exclusively, since both factors, the tendency toward responses of the same category and the marked-unmarked hierarchy, are working in the same direction. A marked stimulus will have responses of the corresponding marked category, but to a substantially smaller degree.

It was possible to test this general hypothesis from the Palermo-Jenkins material in the following instances, with consistently favorable results. For nouns there were



64 singulars as stimuli, 11 plurals, and one ambiguous form ('sheep'). The noun responses to each noun were classified as singular or plural, with the following results:

              Singular R    Plural R    Total R
Singular S      .940          .060       41456
Plural S        .367          .633        7058
Ambiguous S     .897          .103         817

For adjectives some comparatives were included along with the usual positives, but no superlatives. The number of comparative responses to positive stimuli was so small (4 in 15,353) that it does not figure in the percentage summary. Superlative responses to comparative stimuli were exclusively with the same adjective base, e.g. 'hottest' to 'hotter' as stimulus. There were 29 positive and 9 comparative adjective forms in the study, with these results:

                Positive R    Comparative R    Superlative R    Total R
Positive S        1.000           .000             .000          15353
Comparative S      .294           .689             .017           6018

For verbs the data included only the 'general' (i.e. infinitive) form and the present participle in utilizable form. In two instances, 'come' and 'become', the stimulus was ambiguous as between the general form and the past participle, but the results were tabulated with the other examples of the general form. Practically all participle responses involved the same base as the general form stimulus. There were 22 verb stimuli of the general category and 5 present participles. The results are once more summarized in the following table:

               General R    Participle R    Total R
General S        .997           .003          7686
Participle S     .194           .806          1749
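Proportion tables of the kind just given are straightforward to derive from tagged stimulus-response pairs. The sketch below is an editorial illustration; the miniature data set is invented and is not the Palermo-Jenkins material.

from collections import Counter, defaultdict

# Hypothetical (stimulus category, response category) pairs.
pairs = [("sg", "sg")] * 94 + [("sg", "pl")] * 6 \
      + [("pl", "pl")] * 63 + [("pl", "sg")] * 37

table = defaultdict(Counter)
for stim, resp in pairs:
    table[stim][resp] += 1

for stim, resps in table.items():
    total = sum(resps.values())
    row = "  ".join(f"{r} {n / total:.3f}" for r, n in sorted(resps.items()))
    print(f"{stim} S:  {row}  (total {total})")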

HISTORICAL LINGUISTICS AND THE GENETIC RELATIONSHIP OF LANGUAGES

MARY R. HAAS

1. INTRODUCTORY REMARKS

1.1 The most widely acclaimed theoretical approaches in the field of linguistics during the past thirty years have been directed toward the development of methodologies for dealing with the structures of languages in a nonhistorical sense. We have had, among many others, Sapir, Bloomfield, Hjelmslev, Jakobson, Martinet, Harris, and Chomsky. Whatever differences may exist between their approaches (especially as developed by their followers), they have in common the desire to expose, in the most precise and elegant fashion, the structure of language. In the view of many students today linguistics can lay claim to being a science only to the extent that it can demonstrate success in this aim.

For a contrast to this view it is instructive to take a look at the predominant attitude of the nineteenth century. In that period it was COMPARATIVE AND HISTORICAL linguistics that was held up for admiration as being 'scientific'. In a late essay Bloomfield1 made no bones about this fact:

    ... a new mastery of historical perspective brought about, at the beginning of the nineteenth century, the development of comparative and historical linguistics. The method of this study may fairly be called one of the triumphs of nineteenth century science. In a survey of scientific method it should serve as a model of one type of investigation, since no other historical discipline has equalled it. (p. 2) [Emphasis mine.]

Indeed the very term 'linguistic science', so commonly used in this period, seems more often than not to have meant 'historical and comparative linguistics'. Moreover, the discipline was generally acknowledged to be the most rigorous and hence most 'scientific' of all those branches of knowledge commonly subsumed under such terms as the humanities and the social sciences, or, as Sapir2 has pointed out: "In the course of their detailed researches Indo-European linguists have gradually developed a technique which is more nearly perfect than that of any other science dealing with man's institutions." (p. 207) [Emphasis mine.]

1 Leonard Bloomfield, "Linguistic aspects of science", Foundations of the unity of science 1.4.1-59 (1939).
2 Edward Sapir, "The status of linguistics as a science", Lg. 5.207-14 (1929).



In that same period anthropology, for example, was struggling to achieve the status of a scientific discipline, whereas linguistics, even though being claimed as a branch of anthropology,3 had already achieved recognition as a scientific discipline of the highest order. The success of linguistics thus served as a spur to many other disciplines, particularly those concerned with 'man's institutions'.

1.2 What was it that linguistics had that other disciplines sought to emulate? Clyde Kluckhohn4 has expressed it this way:

    In a period when even some natural scientists considered the systematic study of humanity as fruitless because of the complexities involved or actually denounced it as contravening the conception of God-given free will, the success of comparative philology, perhaps more than any other single fact, encouraged students of man to seek for regularities in human behavior. (p. 110) [Emphasis mine.]

Linguistics had a rigorous method of demonstrating the genetic relationship of languages. Moreover, it had amassed a great amount of material that was more than sufficient to prove the genetic relationship of what is now known as the Indo-European family of languages. The key to success in this demonstration can be summed up in two simple statements: (1) phonetic 'laws'5 are regular, provided it is recognized that (2) certain seemingly aberrant forms can be shown to be the results of analogy. The discovery of these truths was crucial in establishing linguistics as a scientific discipline. Though they may seem simple enough now, as all great truths do once they are formulated, they did not take form overnight, and they were not arrived at without many a false start and wrong assumption. Moreover, and this may come as a surprise to many, their power has not yet been fully exploited. There are dozens and dozens of linguistic families in the world, but few indeed can lay claim to having been as thoroughly studied and as adequately reconstructed as Indo-European.6

3 The latter part of the nineteenth century was characterized by an almost feverish desire to classify everything, including scientific disciplines, which were subdivided into a variety of branches and subbranches. Among many others, Daniel G. Brinton proposed a scheme 'for the nomenclature and classification of the anthropological sciences' which included four main branches: 'I. Somatology: Physical and Experimental Anthropology'; 'II. Ethnology: Historic and Analytic Anthropology'; 'III. Ethnography: Geographic and Descriptive Anthropology'; and 'IV. Archeology: Prehistoric and Reconstructive Anthropology'. 'Linguistics' finds its place as item (e) under Ethnology. See Daniel G. Brinton, "The nomenclature and teaching of anthropology", American Anthropologist, o.s. 5.263-71 (1892), particularly pp. 265-6.
4 Clyde Kluckhohn, "Patterning as exemplified in Navaho culture", Language, culture and personality, eds. Leslie Spier, A. Irving Hallowell, and Stanley S. Newman, 110 (Menasha, Wisconsin, 1941).
5 The term 'law' is a misnomer if interpreted as a universal term. A 'phonetic law' simply states what phone is found in a particular language at a particular time in terms of its correspondent in an earlier language (attested in writing or reconstructed), or vice versa.
6 This is not, of course, intended to imply that the work on Indo-European is now complete. On the contrary, a reassessment is urgently needed in order to integrate the vast amount of new material that has become available to scholars in the twentieth century.



If we can convince ourselves of the necessity of applying the rigorous methodology already developed for Indo-European to as many other families as possible, we can hope to achieve many highly rewarding advances in our knowledge of far-flung genetic relationships among the languages of the world. But if we are unable to convince ourselves of this necessity, our handbooks will continue to be filled with highly speculative and all too often plainly dubious or misleading information.

1.3 Although scholars in the eighteenth century were already fumbling around with notions of language relationship, their efforts were on the whole crude. It was not until Sanskrit became known to scholars of the West that real progress began to be made. Sanskrit was much older than the oldest languages of Europe then known, and the transparency of much of its structure soon revealed answers to problems that had previously vexed scholars. Nevertheless, even this great treasure house did not provide ready-made solutions to all problems. The proper evaluation and interpretation of the material was acquired only gradually. For example, it was thought at first that, since Sanskrit was older than the other then known Indo-European languages, everything about it was to be considered a more accurate reflection of an earlier state of affairs than anything found in more recent languages. Scholars tended to feel that if Sanskrit was not itself the 'ancestor' of Greek, Latin, and most other languages of Europe, it was nonetheless chronologically so much closer to it that its testimony should take precedence over the testimony of the younger languages.7 The numerous errors that were engendered by this approach were eventually corrected, however, and this in itself was one of the triumphs of Indo-European scholarship. That much the same kinds of problems had to be tackled all over again with the discovery of the still older Hittite in the early part of the twentieth century means only that it takes time to assess the evidence from a previously unknown cognate language and that chronological readjustments are not easy to make on short notice.

7 William Dwight Whitney, in an article originally published in 1867, eloquently expresses the situation in the following words: "The temptation is well-nigh irresistible to set up unduly as an infallible norm a language [Sanskrit] which casts so much light and explains so many difficulties; to exaggerate all its merits and overlook its defects; to defer to its authority in cases where it does not apply; to accept as of universal value its features of local and special growth; to treat it, in short, as if it were the mother of the Indo-European dialects, instead of the eldest sister in the family." [Emphasis mine.] See Whitney, "Indo-European philology and ethnology", Oriental and linguistic studies 1.198-238 (New York, 1874), pp. 203-4. Of course Sanskrit can no longer even be considered the 'eldest sister'.

Today it is commonplace for students to take 'field methods' courses, and it is taken for granted that a well-trained student will be able to cope with any language in the world whether or not it has ever been written down. Indeed, one of the most beneficial aspects of modern applied linguistics is the devising of alphabets for unwritten languages as an aid in combatting illiteracy. In view of our present sophistication in this regard, it is hard to realize how enslaved the minds of scholars of only a few decades ago were to writing and to the written forms of language. It comes as something of a shock to realize that most of the great advances in Indo-European studies were made under the illusion that the written language was the language.

7 William Dwight Whitney, in an article originally published in 1867, eloquently expresses the situation in the following words: "The temptation is well-nigh irresistible to set up unduly as an infallible norm a language [Sanskrit] which casts so much light and explains so many difficulties; to exaggerate all its merits and overlook its defects; to defer to its authority in cases where it does not apply; to accept as of universal value its features of local and special growth; to treat it, in short, as if it were the mother of the Indo-European dialects, instead of the eldest sister in the family." [Emphasis mine.] See Whitney, "Indo-European philology and ethnology", Oriental and linguistic studies, 1.198-238 (New York, 1874); pp. 203-4. Of course Sanskrit can no longer even be considered the 'eldest sister'.


this unquestioned assumption is not hard to find. The fact that Sanskrit, for example, was a written language is the reason that we know it well today. If it had not been written we should certainly never be able to know what we do know about it. Even with all our hard-earned skill in the reconstruction of protolanguages, we would not quite be able to 'reconstruct' Sanskrit by comparing the modern Indic vernaculars. So even though we are no longer dependent upon the discovery of written documents in advancing our knowledge of linguistic relationship (in the case of unwritten languages, for example), it would be a serious mistake not to recognize the great value of written languages. In particular there is the historical consideration that we might never have arrived at the point of being able to reconstruct great numbers of the morphs of an unwritten language we call Proto-Indo-European if we had not had written documents of many Indo-European languages at different time levels to help us verify our results and thus give us confidence in our methods. With written languages of different time levels scholars can check their hypotheses in two directions because they have documented verification which provides relative chronology. Scholars who work with unwritten languages cannot do this in quite the same way since they have only one DOCUMENTED time-point, namely the present. But the earlier reliance on written languages needs to be noted in order to see how it eventually threatened to become an impediment to the further development of linguistic science. Since the existence of written languages, particularly those long extinct whose age can be calculated not only in centuries but millennia, was of great strategic importance in the development of our knowledge of Indo-European, some scholars came to believe that the historical and comparative study of languages was impossible without written records of earlier stages of the same or related languages. This view, as expressed in the first edition of Les langues du monde,⁸ aroused the ire of Leonard Bloomfield and thus gave rise to an interesting and important chapter in the development of comparative linguistics.

1.4 Bloomfield, usually celebrated for the prominent role he played in the development of DESCRIPTIVE (as opposed to historical) linguistics after 1933, the year of the appearance of his epoch-making book Language,⁹ was actually one of the greatest historical linguists of this century. A fine Germanic and Indo-European scholar, he also became interested in the Algonkian languages of North America and soon recognized the feasibility of reconstructing Proto-Algonkian. That the task imposed problems and difficulties of a type not likely to be encountered by the Indo-European comparativist made it all the more intriguing.

8 A. Meillet and M. Cohen, Les langues du monde¹ (Paris, 1924).
9 Even though the book devotes almost as much space to historical and comparative linguistics as to descriptive linguistics, the strong 'behavioristic' and 'nonmentalistic' orientation of the descriptive section gave it a greater initial impact. Such orientation was in accord with the dominant scientific trends of the time and was motivated, in part at least, by the continuing desire to demonstrate the scientific nature of linguistics as a discipline. Sapir, on the other hand, was never seduced by this particular kind of scientism and that is the reason why his book Language (New York, 1921) still strikes new readers of today as a fresh and lively piece of work.


The Algonkian languages were of course not 'written' languages in the ordinary sense of the term, and of course there were no written records of any earlier stages of any of the languages. On the other hand, many of them had been written down, in one fashion or another, by nonnatives of several nationalities, particularly missionaries and travelers, and there was a far greater amount of material in existence on these languages than on any other language family of North America.¹⁰ Brief vocabularies and other materials on one or another Algonkian form of speech began appearing as early as 1609,¹¹ and by 1663 Eliot had completed his monumental task of translating the Bible into Natick (or Massachusetts).¹² A few years later Eliot published The Indian grammar begun, or an essay to bring the Indian language into rules,¹³ but this language was unfortunately one of those which became extinct before modern firsthand studies could be made of it.

From these beginnings the stream of materials on Algonkian languages became a virtual flood. There were many dictionaries, some bilingual for French, English, or German, some for more than one of these. There were grammars¹⁴ and etymological studies and numerous other works. Moreover, the resemblances among the languages were such that it had long been recognized that they were genetically related and that this remarkable family had a geographical spread greater than that of any other family in North America. So there were even scholars who had commenced comparative work on these languages, the most notable of whom was Truman Michelson,¹⁵ but the results, though considerable, had been only haphazardly presented when Bloomfield entered the field. In order to give a rigorous demonstration of the genetic relationship of these languages, it was obvious to Bloomfield that he would have to reconstruct the protolanguage, and he proposed to do so by using exactly the same techniques that had been so successfully applied by the neogrammarians in the reconstruction of Proto-Indo-European.¹⁶

10 The remarkable Bibliography of the Algonquian languages by James C. Pilling (Washington, 1891) is by far the largest (614 pages) of the several bibliographies of important American Indian linguistic families compiled by the same author.
11 Fide Pilling, op. cit., p. 577. According to this source the earliest material published was a list of numerals of Souriquois, or Etchemin, which appeared in Histoire de la nouvelle France contenant les navigations, découvertes, et habitations faites par les François ..., by Marc Lescarbot (Paris, 1609).
12 John Eliot, The holy Bible, containing the Old Testament and the New (Cambridge, 1663).
13 Cambridge, 1666.
14 One of the most famous of these is a nineteenth century one, A grammar of the Cree language, with which is combined an analysis of the Chippeway dialect, by Joseph Howse (London, 1844). The most recent grammar is Bloomfield's own study of Menominee, never entirely completed and published posthumously: The Menomini language (New Haven-London, 1962).
15 His first important work on Algonkian was "Preliminary report on the linguistic classification of Algonquian tribes", Annual Report of the Bureau of [American] Ethnology 1906-07, 221-90b (Washington, 1912). Many others followed since the study of these languages remained his principal preoccupation throughout his life. Sapir and Kroeber also made early contributions to the study of comparative Algonkian, e.g. Edward Sapir, "Algonkin p and s in Cheyenne", American Anthropologist, n.s., 15.538-9 (1913); A. L. Kroeber, "Arapaho dialects", Univ. of Calif. Publ. in Amer. Arch. and Ethn. 12/3.71-138 (1916), especially pp. 77-80.


Furthermore, since many Indo-European scholars thought such a task could not be successfully accomplished in the absence of written records of earlier stages of the languages, Bloomfield set out quite deliberately to disprove this thesis. The result was his masterly paper "On the sound-system of Central Algonquian"¹⁷ which paved the way for all future work in comparative Algonkian. Furthermore, to make sure that the nature of his accomplishment, with its important implications for similar work on all unwritten languages, would not be lost on his Indo-European confreres in Europe, and especially the editors of Les langues du monde, he appended the following footnote:

I hope, also, to help dispose of the notion that the usual processes of linguistic change are suspended on the American continent (Meillet and Cohen, Les langues du monde, Paris, 1924, p. 9). If there exists anywhere a language in which these processes do not occur (sound-change independent of meaning, analogic change, etc.), then they will not explain the history of Indo-European or any other language. A principle such as the regularity of phonetic change is not part of the specific tradition handed on to each new speaker of a given language, but is either a universal trait of human speech or nothing at all, an error. (p. 130)

1.5 Bloomfield's success in reconstructing Proto-Algonkian is so significant in the development of historical linguistics that it deserves closer scrutiny and will be discussed in somewhat more detail later. Bloomfield's intention was to demonstrate that the 'sounds' of the protolanguage of a set of unwritten related languages could be reconstructed with the same degree of rigor and reliability as had been achieved for the Indo-European languages. Before he could do this, however, he had to have a completely accurate and reliable 'description' of each unwritten language that was to be used in the demonstration. All too frequently nonnatively written materials were entirely inadequate for his purposes.¹⁸ He therefore seems to have developed his theories of descriptive linguistics in large part in order to tackle his problems of historical linguistics. But this had a result which he perhaps did not anticipate, namely, that in many quarters a schism developed between descriptive linguistics and historical linguistics. Although other factors and other persons were also involved in the movement, it remains true that Bloomfield helped to crystallize a theory of descriptive linguistics, particularly the theory of the phoneme, at exactly the moment in history at which it was most needed in the rapidly developing studies of American Indian and other unwritten languages. The repercussions quickly affected even the study of written European languages which came more and more to be studied as if they were

16 Holger Pedersen, The discovery of language (Linguistic science in the nineteenth century) 277-310 (Bloomington, 1962). This is a reprinting of John W. Spargo's translation (Cambridge, 1931).
17 Lg. 1.130-56 (1925).
18 However, he used such materials when he had no other choice. For example, in the paper just cited he says: "... for Cree I use Lacombe, Dictionnaire et grammaire de la langue des Cris, Montreal 1874, correcting the forms where necessary, from observations made last summer for the Canadian Bureau of Mines" (130). Later on, when he had more materials of his own on Cree, he relied most heavily on these.


unwritten languages. This was, needless to say, a shocking approach at the time and one which did not, of course, go unchallenged. The important point here is that the development of a rigorous methodology in comparative linguistics in the nineteenth century led to the development of a rigorous methodology of descriptive linguistics in the twentieth century. The thread of continuity is therefore rigor. Moreover, Bloomfield is probably properly credited with being the first American linguist to exercise the utmost scrupulousness in maintaining the separateness of history and description. His deep concern with reconstructing the phonological and grammatical structure of Proto-Algonkian made him realize he would first have to have adequate descriptions of the languages to be used in making the reconstruction. And it was to this end he was so interested in sharpening the tools of descriptive linguistics. This was no idle or sterile interest. It culminated in the tightly-knit paper entitled simply "Algonquian" which was the only comparative sketch in Linguistic structures of native America.19

1.6 The greatness of Bloomfield's treatment of Algonkian can be ascribed to a variety of reasons.²⁰ A very important point of methodology lies in his LIMITATION OF THE PROBLEM²¹ to the comparison of four languages for which he had adequate (though often less than abundant) descriptive materials, namely Fox, Cree, Menominee, and Ojibwa. This method had both advantages and disadvantages, though for an INITIAL comprehensive statement of Algonkian comparative grammar the advantages seem far to outweigh the disadvantages. The languages chosen are all so-called Central languages and do not comprise even these in toto. To have attempted to use all available materials on the dozens of Algonkian languages for which some kind of information was available²² would have rendered the task so unwieldy and unmanageable that he would not have been able to complete it in a lifetime. By limiting himself to those languages over which he had good control he was able (1) to work out the phonological system for Proto-Algonkian as reflected in those particular four Central languages,²³ and (2) to reconstruct large numbers of fully inflected words rather than being confined solely to the reconstruction of roots.²⁴ The second result had the

19 By Harry Hoijer and others (= Viking Fund Publications in Anthropology 6) (New York, 1946).
20 Some of these reasons are set forth in an appreciation of Bloomfield's work on Algonkian written some years ago by C. F. Hockett; see "Implications of Bloomfield's Algonquian studies", Lg. 24.117-38 (1948) (reprinted in Readings in Linguistics, ed. Martin Joos 281-289, Washington, D.C., 1957).
21 Otto Dempwolff, working at about the same time on comparative Austronesian, also effectively limited his problem. The principal languages used in his Vergleichende Lautlehre des Austronesischen Wortschatzes (Hamburg, 1934) are Javanese, Toba-Batak, and Tagalog.
22 Pilling's 614-page bibliography of these languages, already referred to, was published in 1891. In the half century following the Pilling publication a great deal more material had been printed on these languages.
23 The phonological system reflected in the Central languages will probably turn out to be ALMOST adequate to account for the systems of all the Algonkian languages, and not just the Central ones. However, some minor changes will almost certainly have to be made.
24 It is interesting to observe that, for reasons extraneous to the present discussion, the work on Indo-European led to the reconstruction of roots. This led at least one famous American linguist of the nineteenth century, William Dwight Whitney, to assume that the protolanguage had no inflection; see Language and the study of language⁶ (New York, 1875), especially pp. 357 and 279.


further advantage of enabling him to write what was essentially an outline descriptive grammar of the protolanguage. He had also begun work on a comparative dictionary of his four selected Algonkian languages, and to this end had assembled extensive slip files on each of them. He did not live to complete the task himself but progress toward achieving this goal is now being made by a younger scholar.²⁵

1.7 It would be a great satisfaction to be able to say that the work of Bloomfield on Algonkian, together with that of Sapir on Uto-Aztecan,²⁶ Athapaskan,²⁷ and other linguistic families, not to speak of the vast amount of work currently being done on the reconstruction and classification of unwritten languages, has succeeded in allaying the doubts of all Indo-Europeanists about the type of results that can be achieved without the aid of documentary materials from earlier periods when these are by definition unobtainable. But in a recent textbook on historical linguistics we find that the old prejudices, though slightly modified perhaps, have not been entirely banished. The following quotation, for example, has a familiar ring:²⁸

Genealogical classification was admirably suited to determine the interrelationships of languages such as the Indo-European for which we have many records from several millennia. For languages attested only today we may be limited to classification based on typology. (p. 49)

Typological classifications are of value in their own right, and can, needless to say, be applied to anciently recorded languages as well as to 'languages attested only today'. To imply that genealogical classification is possible only for linguistic families having written records of varying chronology while typological classification belongs to contemporary languages is to sell both types of classification short. Languages are languages, whether written or unwritten, living or dead, and whatever type of classification can be applied to one can also be applied to any other. The best answer, I think, is to paraphrase a statement of Sapir's²⁹ about the discovery of phonetic laws in unwritten languages:

If these laws are more difficult to discover in primitive [unwritten] languages, this is not due to any special characteristic which these languages possess but merely to the inadequate technique of some who have tried to study them. (p. 74)

25 C. F. Hockett, who has been given charge of all of Bloomfield's Algonkian materials by Bloomfield's literary executor, published the first installment of such a dictionary in 1957; see "Central Algonquian vocabulary: stems in /k-/", IJAL 23.247-68 (1957).
26 Edward Sapir, "Southern Paiute and Nahuatl, a study in Uto-Aztekan", Pt. I and Pt. II, Journal de la Société des Américanistes de Paris, n.s. 10.379-425 (1913) and 11.443-88 (1914).
27 A discussion of some of his results in comparing the Athapaskan languages is included by Sapir in "The concept of phonetic law as tested in primitive languages by Leonard Bloomfield" 297-306 in Methods in social science: a case book, ed. Stuart A. Rice (Chicago, 1931); reprinted 73-82 in Selected writings of Edward Sapir, ed. David G. Mandelbaum (Berkeley-Los Angeles, 1949).
28 Winfred P. Lehmann, Historical linguistics (New York, 1962).
29 "The concept of phonetic law ..." 74 (of reprint).

HISTORICAL LINGUISTICS A N D GENETIC RELATIONSHIP

121

In the same way, if genealogical classification is more difficult of achievement in unwritten languages, this is again due to the 'inadequate technique of some who have tried to study them' [emphasis mine]. Indeed we might better paraphrase Bloomfield's famous footnote, quoted earlier, and say that the possibility of both genealogical and typological classification 'is either a universal trait of human speech or nothing at all, an error' [emphasis mine]. There is also still current an even more flagrant misunderstanding of the nature of unwritten languages. Although the error has been refuted innumerable times in the literature, there are still some who believe that unwritten languages change with a rapidity that soon renders reconstruction so tenuous as to be meaningless. A recently expressed version³⁰ of this view is seen in the following fantastic statement:

In some linguistic families, notably Amerindian and African, prehistory is but a few decades distant. Any thrust into the past will involve the linguist in reconstruction... By the time the Amerindian or African linguist has reached, speaking in terms of the genealogical tree ..., the third or fourth generation, which perhaps carries him backward no farther than a century [!], he faces a proto-language of his own making that has an exceedingly small degree of verisimilitude... (p. 32) [Emphasis mine.]

Indeed one might almost say, would that it WERE true! For if among American Indian and African linguistic families, prehistory were 'but a few decades distant', comparative linguists would have a field day. They could take a large 'live' sample every ten years and thus have an actual check on linguistic change that would bid fair to equal the short-lived fruit fly in studying genetic change in biology. Unfortunately for the prospects of any such check, languages change in much the same ways the world over, and writing per se neither retards nor accelerates the change.³¹ When sister languages,

30 Ernst Pulgram, "The nature and use of proto-languages", Lingua 10.18-37 (1961).
31 This is not to be taken to imply that change in language is never retarded or accelerated; rather it is claimed that (1) writing in itself is NOT NECESSARILY accompanied by significant retardation and, conversely, (2) lack of writing in itself is NOT NECESSARILY accompanied by great acceleration. The study of the possible retardation of replacement of items in the lexicon (especially the so-called 'basic list') is receiving some attention from persons interested in glottochronology (e.g., A. Richard Diebold, Jr., "A control case for glottochronology", American Anthropologist, n.s., 66.987-1006 [1964]). An important paper describing one type of condition that could accompany conservatism in lexical replacement is Charles A. Ferguson's "Diglossia", Word 15.325-40 (1959) (reprinted in Language in culture and society, ed. Dell Hymes 429-39, New York, Evanston, and London, 1964). He discusses the not uncommon situation in which a superposed or 'high' (H) variety of a language is used, especially by adults, in addition to the regional or 'low' (L) variety. He limits discussion of the problem, however, only to instances in which the 'high' variety also has 'a sizeable body of written literature' (330). In the event that this written literature in H is an older variety of L, the borrowing of lexemes from H into L is bound to show up as a seeming retardation in lexical replacement. Even here, however, I do not believe that writing is a NECESSARY condition for such a situation. A highly venerated oral literature which is passed from generation to generation by memorization provides an entirely comparable situation. Similarly a high form of speech used by privileged persons can exist side by side with a 'lower' form of speech spoken by common persons. For the Natchez Indians of Mississippi it has been reported that 'the speech of the Nobles differed from that of the lower orders'; John R. Swanton, Indian tribes of the lower Mississippi Valley ... 182 (= Bulletin 43, Bureau of American Ethnology) (Washington, 1911). Swanton's information is taken from Le Page du Pratz, Histoire de la Louisiane, 3 vols. (Paris, 1758) and he adds that "Du Pratz says 'this difference in language exists only in what concerns the persons of the Suns and Nobles in distinction from the people'". [Emphasis mine.] The decimation of the tribe at the hands of the French in the early eighteenth century brought on the breakdown of most of the old culture and it was impossible to confirm this in the 1930s when I worked with the last fluent speaker (now deceased). There are, however, a few ideas still expressed by two distinct terms and it is my guess that these may be all that is left of this example of preliterate diglossia in North America. Preliterate diglossia can also exist when the speech of men differs from that of women, since both sexes usually know both types of speech in order that male children can be taught by the mother as well as the father and that men as well as women can speak the proper forms when imitating female characters in telling myths; see, for example, my paper "Men's and women's speech in Koasati", Lg. 20.142-9 (1944) (reprinted in Language in culture and society, ed. Dell Hymes 228-233, New York, Evanston and London, 1964).


both written and unwritten, are seen again and again to have diverged in phonology, morphology, and lexicology in remarkably similar ways, then we can be sure that the lapse of time needed to accomplish this has been comparable too. No, among unwritten as well as written languages PREHISTORY IS WRITTEN IN MILLENNIA, not decades. We have already seen in section 1.4 that the Algonkian languages (whose divergence is comparable to that of the Romance or Germanic languages) have been written down since the early 1600's, i.e. starting over three and a half centuries ago. If Pulgram's thesis had a grain of truth, then Eliot's Natick Bible of 1663, now three centuries old, would be written in a language far more archaic than Proto-Algonkian itself; indeed it would be a kind of 'Hittite' of Algonkian. Instead Natick is as close to Penobscot and other nearby Algonkian languages as Swedish is to Danish, and Proto-Algonkian cannot by any manner of means be reckoned as any less ancient than Proto-Germanic. In innumerable instances where American Indian linguists have checked words in early vocabularies (100-300 or more years old) with the same words in languages still spoken — Natick is no longer spoken — they have discovered no appreciable change whatsoever. In other instances, minor sound changes appear to have taken place, but in no instance are these more drastic than those known to have taken place in European (written) languages in a comparable period of time. One very interesting example of such minor sound changes has been called to my attention by my student, Allan Taylor, who collected a vocabulary of Atsina (a Plains Algonkian language varying only dialectally from Arapaho) and compared this with a vocabulary of the same language (same dialect) written down nearly 200 years ago, in 1790 to be exact. The first four numerals are sufficient to illustrate the nature of these changes. Moreover, we also know the reconstructions of the Proto-Algonkian forms of these words and can thus compare both varieties of Atsina with these much older forms:³²

32 The Atsina forms quoted are taken from a comparative study of Arapaho and Atsina presented by Allan R. Taylor at the Conference on Algonquian Linguistics held at the National Museum of Canada, August 24-28, 1964. The bracketed interpretation of the 1790 Atsina forms is my own. The 1790 forms of Atsina are taken from Edward Umfreville, The present state of Hudson's Bay 202 (London, 1790). The Proto-Algonkian forms are taken from Bloomfield's "Algonquian" 116-117, with the omission of the numeral suffix *-wi and with the following changes in symbols: ʔ replaces q and vowel length (e.g. i·) replaces double vowels (e.g. ii). In many languages, e.g. Fox, Menominee, Shawnee, Miami, Delaware, Powhatan, and Natick, the usual word for 'one' is a descendant of PA *nekot-, but several other languages, e.g. Ojibwa, Abenaki, Passamaquoddy, and Arapaho-Atsina, have words descended from PA *pe·šik-.


           Atsina (1960)    Atsina (1790)         Proto-Algon.
one        cesθiy           kar-ci [keesay]       *pe·šik-
two        niiθ             neece [niis]          *ni·š-
three      nesθ             nar-ce [nees]         *neʔθ-
four       yeen             ne-an [ni(y)ɛɛn]      *nye·w-

Early Atsina k > modern c (before front vowels), early Ats. s > modern θ, and the initial syllable of 'four' has dropped. But the changes that have taken place between PA and early Atsina are far more drastic: PA *p > early Ats. k everywhere (modern k before back vowels, c before front vowels); PA *š > early Ats. s (> θ); PA *-ʔθ- > early Ats. s (> θ); PA *w > n (in 'four' and many other words).
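Stated as ordered rewrite rules, correspondences of this kind can be applied mechanically. The following short Python sketch is purely illustrative: the transcriptions are simplified ASCII stand-ins ('sh' for the PA sibilant, '?' for the glottal stop, '0' for θ), and only the laws just stated are applied (the dropped initial syllable of 'four', for instance, is not modeled), so the outputs are schematic rather than the attested forms.

```python
import re

# The sound laws stated above, applied as ordered rewrite rules.
PA_TO_EARLY = [
    (r"\?0", "s"),   # PA *-?0- > early Atsina s
    (r"sh", "s"),    # PA *sh   > early Atsina s
    (r"p", "k"),     # PA *p    > early Atsina k (everywhere)
    (r"w", "n"),     # PA *w    > n (in 'four' and many other words)
]
EARLY_TO_MODERN = [
    (r"k(?=[ie])", "c"),   # k > c before front vowels
    (r"s", "0"),           # s > 0 (theta)
]

def apply_rules(form, rules):
    for pattern, replacement in rules:
        form = re.sub(pattern, replacement, form)
    return form

for pa in ["nish", "ne?0", "nyew"]:   # schematic PA stems 'two', 'three', 'four'
    early = apply_rules(pa, PA_TO_EARLY)
    modern = apply_rules(early, EARLY_TO_MODERN)
    print(f"PA *{pa}- > early Atsina {early} > modern {modern}")
```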

2. PROTOLANGUAGES AND PROBLEMS OF RECONSTRUCTION

2.1 What is a protolanguage? The answer, quite simply, is that any language is an actual or potential protolanguage. Two or three thousand years hence — barring catastrophic changes — a variety of languages stemming from English will be spoken in wide areas of the globe, areas more or less delineating those in which English is now spoken as a first language. The same can be said, with appropriate modifications in regard to size of area, of Spanish, Portuguese, Russian, and Chinese, not to speak of Hindi, Tamil, Kikuyu, and a host of others. Given the lapse of a sufficient amount of time, this means (1) that the descendants of these languages, if still spoken, will have diverged enough to be compared and used in the reconstruction of their respective parent languages, and (2) that the result of such reconstruction will provide a body of material that is recognizably like the English, Spanish, Portuguese, etc. spoken today. It will not, however, be identical with what appears in written records. Linguistic change takes place in the SPOKEN language and the written language always lags far behind in recording this change, in large part, of course, to retain the tremendous advantages of the fiction of cohesiveness in the linguistic community as long as possible, in spite of ever-increasing differences in the spoken language. The word 'mayor' is spelled m-a-y-o-r from London to Vancouver and from Atlanta to Brisbane, even though the actual pronunciation of the word may vary so widely as to be unrecognizable out of context (or possibly even in context) if persons of different areas chance to meet. Furthermore, dozens of words and turns of expression having only local provenience may never chance to be recorded anywhere but will persist in their own areas completely uninfluenced by the fact that they did not find a place in the written language of today. Still more important, there are many things which will inevitably be lost in all the daughter languages and thus be unrecoverable by the comparative method. These are the reasons why Proto-Romance is not in all details identical with recorded Latin. But this fact, far from being an


indictment of the comparative method is an elegant example of its tremendous power. Every protolanguage was in the same way once a real language, whether or not we are fortunate enough to have written records of it. Furthermore, even when we do have written records, we find that what we are able to reconstruct of a given protolanguage always falls short of giving us the full picture of the real language it stands for. But written records fall short, too, as we have seen in the case of local pronunciation variations, lexical items, and turns of expression, and reconstruction methods can and do, in fact, give us information about parent languages not to be found in written records. We are of course twice blessed when we have both, as in the case of Proto-Romance and Latin. When we have only the reconstructed protolanguage, however, we still have a glorious artifact, one which is far more precious than anything an archeologist can ever hope to unearth.

A protolanguage, then, is reconstructed out of the evidence that is acquired by the careful comparison of the daughter languages and, in the beginning of the work, what is reconstructed reflects what can be discovered by working backwards in those cases where all or most of the daughter languages point to the same conclusion. This provides the initial framework. Once this is established, the principle of analogy can be drawn upon, and by its use instances in which there are aberrations, statistically speaking, can often also be plausibly accounted for. Deductive as well as inductive hypotheses must be constructed and checked. Then when all the comparisons that can reasonably be made have been made, and when all the reconstructions that can reasonably be made have been made, the result is a PROTOTYPICAL MODEL OF THE DAUGHTER LANGUAGES, or, what we normally call a protolanguage.

If we turn the whole thing round and look at it from the other direction we see that the daughter languages are not only different from each other but also from the protolanguage. We describe this differentiation by calling it 'linguistic change'. In phonology, linguistic change normally shows such regularity that it is possible to formulate what the nineteenth century linguists proudly called 'phonetic laws' on the analogy of what their fellow natural scientists were with equal pride referring to as 'laws of nature',³³ even though, as Sapir once remarked, "... phonetic laws are by no means comparable to the laws of physics or chemistry or any other of the natural sciences. They are merely general statements of a series of changes characteristic of a given language at a particular time."³⁴ The most impressive characteristic of phonetic laws, or statements of phonetic correspondences, is their power to predict.³⁵ Other types of linguistic change do not operate with the same kind of predictable regularity; perhaps it would be better to say that we have not yet arrived at the point of being able to make statements about other types of linguistic change in such a way as to reveal such a power. Change can occur in inflection and in other parts of morphology. It can occur in meaning and in vocabulary. But since it has not yet become possible to make predictable statements about any of these kinds of changes, the verifiable regularity of sound correspondences is seen to be even more precious than it may have seemed at first. In the sections which follow problems of both phonological and morphological reconstruction are discussed, with examples taken largely from the Muskogean family of languages.

33 Present-day scientific philosophers are also quick to point out the imprecision, though not necessarily the uselessness, of the term 'law of nature'. Thus Ernest Nagel, in The structure of science (New York-Burlingame, 1960), says: "The label 'law of nature' (or similar labels such as 'scientific law', 'natural law', or simply 'law') is not a technical term defined in any empirical science..." (49). [Emphasis mine.]
34 "The concept of phonetic law ..." 73 (of reprint).
35 Sapir stresses this in "The concept of phonetic law ..." when he says that Bloomfield's "setting-up of phonetic law No. 6 was, by implication, a theoretically possible prediction of a distinct and discoverable phonetic pattern. The prediction was based essentially on the assumption of the regularity of sound change in language" 78 (of reprint). [Emphasis mine.]
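The predictive power of such statements can itself be put in procedural terms: once the correspondences between two sister languages are tabulated, the expected cognate of a given form is computable segment by segment. A minimal Python sketch, using deliberately invented toy correspondences rather than data from any real pair of languages:

```python
# Deliberately invented toy correspondences between two hypothetical sister
# languages A and B; each entry reads "A has X where B has Y".
CORRESPONDENCES = {"p": "f", "t": "t", "k": "h", "a": "a", "i": "e"}

def predict_cognate(form_in_a):
    """Predict the regular (expected) cognate in B, segment by segment."""
    return "".join(CORRESPONDENCES[segment] for segment in form_in_a)

# An attested form that differed from the prediction would be exactly the
# kind of residue that calls for a special explanation, e.g. analogy.
print(predict_cognate("pati"))   # 'fate'
```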


2.20 The Muskogean family of languages³⁶ formerly flourished in the southeastern part of what is now the United States. There are four distinct languages still extant, so by accident we have a neat workable set of languages without being constrained to choose among them. They are Choctaw (Ch), Koasati (K), Hitchiti (H), and Creek (C). As an aid to the understanding of the illustrative sound changes, Table I

TABLE I

Combined chart of the consonants and vowels of four Muskogean languages

                              Bilabial   Dental   Alveolar   Palatal   Velar   Faucal
Voiceless Stops                p   p      t   t                c   c     k   k
                               p   p      t   t                c   c     k   k
Voiced Stops                   b   b
                               b   —
Voiceless Spirants             f   f      ł   ł     s   —      š   s              h   h
                               f   f      ł   ł     —   —      s   s              h   h
Voiced Nasal Continuants       m   m      n   n
                               m   m      n   n
Voiced Nonnasal Continuants    w   w      l   l                y   y              ·   ·
                               w   w      l   l                y   y              ·   ·
Vowels                         i   i      a   a     u   u
                               i   i      a   a     u   u
36 See Mary R. Haas, "The classification of the Muskogean languages", Language, culture, and personality, eds. Leslie Spier, A. Irving Hallowell, and Stanley S. Newman 41-56 (Menasha, Wisconsin, 1941). The paper sets forth most of the sound correspondences with examples, but full reconstructions are seldom given. Other extant 'languages' of the Muskogean family are but dialect variants of these, viz. Chickasaw (almost identical with Choctaw, but spoken by a separate political body), Alabama (very close to Koasati but spoken by a separate political body), Mikasuki (very close to Hitchiti but spoken by a separate group), and Seminole (almost identical with Creek but spoken by the descendants of those who fled to the Everglades during the Indian wars).


shows the combined consonant and vowel charts³⁷ of the four languages arranged in quadrant form (in each box) for convenience, viz.

    Ch   K
    H    C

In Table II³⁸ several sets of items having the same gloss are taken from these four

TABLE II

Sample sets of cognates in four Muskogean languages

        1. sun      2. sleep       3. arrow       4. night              5. day
Ch      haši        nusi           naki           ninak                 nittak
K       hasi        nuci           łaki           niła(hasi) 'moon'     nihta
H       ha·s(i)     nu·c(i·ki)     (in)łak(i)     ni·łak(i)             nihtak(i)
C       hàsi        nuc(ita)       łi·            niłi·                 nittà·
PM      *hasi       *nuci          *Naki          *niNaki               *nihtaka

        6. mulberry          7. fish     8. squirrel   9. two          10. go through   11. snake   12. wide
Ch      bihi                 nani        fani          tuklu           lupul(li)        sinti       patha
K       bihi(cuba) 'fig'     łałi        ipłu          tuklu           luput(li)        cintu       patha
H       bi[h]- (Sn)          łał(i)      hił(i)        tukl(an)        —                cint(i)     —
C       ki·                  łáłu        iłu           hukkul(ita)     luput(t-)        cittu       táph(i·)
PM      *kʷihi               *NaNi/u     *ixʷaNi/u     *hutukulu       *lupu(-)t-       *cinti/u    *patha

Muskogean languages and the illustrations used in the sections which follow are based on this material.

2.21 Sound change is often characterized as being of two types, (1) regular and (2) sporadic, and the first is sometimes further described as 'gradual' and the second as 'sudden'. Since the second type is usually said to include such phenomena as assimilation and dissimilation as well as metathesis, it is clear that the regular vs. sporadic dichotomy cannot be fitted exactly with the gradual vs. sudden one, for some

37 A dash is used in Table I when a given language lacks a certain sound. Choctaw is the only one of the four languages which distinguishes an alveolar and a palatal sibilant. The other languages have a single sibilant which may range from alveolar to palatal but is frequently more palatal and is therefore placed in that column. Vowel length, symbolized by a raised dot (·), configurates like a voiced nonnasal continuant in all the languages and for that reason is placed in that row on the chart.
38 The following conventions are used in Table II. A linguistic item placed in parentheses is a separate morph, not entering into the comparison, which often or usually co-occurs with the morph being compared. In Koasati niła- is not the usual word for 'night' but occurs only in certain special combinations, e.g. with hasi 'sun' in the word quoted; cuba 'big' is combined with bihi-, not recorded separately, in the quoted word for 'fig'. In Hitchiti -i is a suffix used with all nouns; -i·ki is the infinitive marker for verbs; in- in the quoted word for 'arrow' is the third person possessive marker in alienable possession; and -an in 'two' is a numeral suffix. In Creek -ita is the infinitive marker for verbs. An item placed in square brackets is inferred or 'reconstituted'; thus bi[h]- occurs only in bi hasi 'mulberry month', taken from ms. materials of John R. Swanton (Sn), and it is assumed that the single h actually should be two: bi[h]hasi.


types of assimilation may very well take place gradually while metathesis cannot occur in any fashion other than suddenly. It seems to me that it might be more revealing to show phonological change on two axes, the syntagmatic (horizontal) and the paradigmatic (vertical). According to this model, assimilation, dissimilation, and metathesis are arranged on the syntagmatic axis while so-called vowel and consonant 'shifts' or correspondences are placed on the paradigmatic axis. Illustrations of Muskogean sound correspondences, i.e. the paradigmatic axis, are taken up first.

The items for 'fish', no. 7 (Table II), comprise a perfect set of cognates. The sound correspondences are as follows:

(1) Ch n : K, H, C ł. See also 'arrow', no. 3; 'night', no. 4 (second cons.); and 'squirrel', no. 8. The symbol *N has been chosen as the reconstruction for the n : ł correspondence. The symbol *n is needed when all the languages show n, e.g. 'sleep', no. 2; 'night', no. 4 (first cons.). Similarly, *l is needed when all have l, e.g. 'go through', no. 10.

(2) Ch, K, and C short vowel in initial open syllable : H long vowel. See also 'sun', no. 1; 'sleep', no. 2; and 'night', no. 4. This is reconstructed as a short vowel.

(3) Ch final i : K, H, C final u. See also 'squirrel', no. 8; 'snake', no. 11. This correspondence is found only in final syllables and is symbolized in reconstruction as *i/u. Some final syllables show i in all languages and for these *i is reconstructed; see 'sun', no. 1. Similarly for *u; see, in part, 'two', no. 9, where Ch and K both have u.

In terms of these correspondences and the symbols chosen to represent them, the full PM reconstruction for 'fish' is *NaNi/u. Other sets of cognates shown in Table II illustrate still other regularities in correspondences. Choctaw has two sibilant spirants, s and š, where the other principal languages have only s (ranging from [s] to [š] but often closer to [š]; see Table I). The usual correspondences for these two Choctaw sounds are: Ch š : K, H, C s; see 'sun', no. 1. Ch s : K, H, C c (affricate); see 'sleep', no. 2; 'snake', no. 11.

2.22 Turning now to the sound changes that are best shown on the syntagmatic axis, we can illustrate assimilation, dissimilation and metathesis. Assimilation is shown horizontally by the use of a straight arrow; it is directed to the right (→) for assimilation to what follows and to the left (←) for assimilation to what precedes. Dissimilation is shown by a bar; it is placed on the right (—|) or on the left (|—) in accordance with the same principle. Finally, metathesis is shown by a looped arrow, to the right or to the left. [...]

LANGUAGE, MATHEMATICS, AND LINGUISTICS

CHARLES F. HOCKETT

[...] 'a', 'b', and 'c'. The members of the set are the (unknown) denotations of the three symbols. To fall into the second error is to assume that the denotations of the symbols are known, and that those denotations are exactly the lower-case italic letters 'a', 'b', and 'c': that is, that the symbols denote themselves. This is obviously one of the possibilities, but there is no reason to assume that it is the correct one. There is still a third misinterpretation, rather more subtle than the two just described. This is to think that the symbols stand for 'imaginary mathematical objects' or something of this sort, and that if we happen to be using the mathematical symbolism to deal with the real physical world this is accomplished by pairing off these 'imaginary' objects with features of the real world.
The linguist will recognize this as much like the traditional lay assumption that words stand for mental 'ideas' or 'concepts', which in turn are related to the real (physical) world. Now, of course, human brains may be so structured that there actually are things, events, or states inside our heads that can be the denotations of the words and symbols we manipulate in public view. If so, then they are parts of the real physical world, not of some other and more ethereal realm. This is a problem in psychology and physiology, not in mathematics or linguistics. Nothing is gained for mathematics by assuming that such entities exist; indeed, such an assumption is a flagrant violation of the principle of economy (Occam's razor). If such internal states indeed exist, then they become possible denotations of our mathematical symbols; but, just as before, we do not care.

More general than a notation like '{a, b, c}' is one such as '{x₁, x₂, ..., xₘ}'. Here, to start with, m must be interpreted as some nonnegative integer: that is, m may be 0, or 1, or 2, and so on. We use 'm' because we do not care just what nonnegative integer is chosen. If we cared, we could specify a constraint, such as 'for m ≥ 4'. Having settled on a value for m, we next let i be some positive integer

LANGUAGE, MATHEMATICS, AND LINGUISTICS

161

not greater than m. If we have chosen m = 0, then, of course, there is no possible value for i; otherwise there is. The notation tells us that, for any choice of a value for i allowed by the restrictions just described, xt is a member of the set. Beyond this specification, we may not have the vaguest idea what 'xt denotes, and may not care. If we were to choose m = 4, then we could expand the condensed notation given above into x2, x 3 , x 4 }'. If we let m = 3000, then it would take a lot of time and paper to write out the complete notation for the set of 3000 elements. Nothing would be gained, since a completely unambiguous abbreviation is available: '{.*!, x2,..., x 3000 }'. The nonterminal three dots in this notation mark the fact that it is an abbreviation. They are an etcetera symbol: they constitute an instruction to the reader that he can go on inventing names by the pattern that has been set, until he reaches the one overtly given after the three dots, and a guarantee to the reader that every name he invents in this way will be a name for an element of the set. With this instruction and guarantee, there is, of course, no reason why the reader should take the time and trouble actually to invent all the names, since the set and its elements can be talked about just as securely without doing so. We have here an example of the fundamental principle of mathematicizing, which can be expressed as follows: If you know exactly how to, you don't have to. It is this principle that differentiates between mathematics and mere computation. In computing, the goal is the answer. In mathematics, the goal is to demonstrate that the answer can be obtained — or that it cannot. In the somewhat more abstract notation x2, ..., xmy with which we started, the meaning of the three dots is slightly different, since the reader cannot know when he must stop inventing names for elements until he knows what value is assigned to m. Indeed, if it turns out that m = 2, then no new names may be invented and the symbols 'x 2 and 'xm in the notation refer to the same element; if m = 1, one of the specific names given in the notation is disallowed; and, as already indicated, if m = 0, then no names are acceptable because there are no elements in the set. We shall use a somewhat more compact notation for listing the elements of a set, though it is not standard. I assert that a notation of the type displayed on the left is to mean the same thing as one of the type displayed on the right: m

X2,

...,

Xm).

Thus, the members of the set {Z3}4 are Z l 5 Z 2 , Z 3 , and Z 4 (whatever they may be). Exactly the same convention will be used for an ordered set (a sequence): m

( j i i , X2,

Xm).

Now, what sort of set would be denoted by '{x^ x2, ...}' (or, in the more condensed equivalent, ' {.*«}')? The etcetera symbol, which is here terminal in the notation, tells us to go on inventing names by the pattern established by those given, but there is nothing to tell us when to stop. That is just the point. We may go on as long as we like, and every name we invent will be legal. Whenever we choose to stop, there will remain elements in the set for which we have not yet provided names. Yet every element of the set does have a name, of the form 'xi where i is some positive integer

162

CHARLES F. HOCKETT

For a set of this sort, the etcetera symbol is not just a convenience, but a necessity. However awkward or time-consuming, it would be theoretically possible to expand the notation '{jcx, x2, ..., xm}' into full form for m = 3000 or even for m — 3,000,000,000. But there is in principle no unabbreviated notation equivalent to '{xj, x2, ...}'. We shall see (§ 1.7) that sets of this sort are called infinite. Whether infinite sets actually exist in the universe is an unsolved problem of physics. 7 But in mathematics we can speak quite consistently about such sets whether they exist or not, through the careful manipulation of terminal etcetera symbols; indeed, the mathematical discussion of infinity consists exactly of such manipulation. To remember this is to avoid the confusion and mysticism about the 'notion' of infinity into which many mathematical laymen (and some philosophers of mathematics) have fallen. 8 1.4. Variables and Domains Suppose that we neither know nor care anything about a particular element except that it is a member of a particular set. The symbol used for the element in this case is a variable, and the set is its domain. Let x be such that x e {Sapir, Beethoven, Einstein}. Then x was born in Germany, and x was a genius. We say things about every member of a set (whether what we say is true or not) by using a name that refers ambiguously to any member of the set. The symbol 'p' in a phonemic transcription of English is a variable whose domain is a certain set of allophones. When we describe English /p/ as bilabial, voiceless, and a stop, we are ascribing these properties to all allophones of the set. Let E be the class of all even integers, and O the class of all odd integers. Then for any xe E and y e E, xyeE; for any xe E and yeO, xyeE; for any xeO and y e 0, xyeO. If we make such assertions in ordinary words instead of special symbols, we are still using variables and referring to domains. If we say 'The product of any two even integers, or of any even integer and any odd integer, is even, while the product of any two odd integers is odd', the phrase 'any even integer' is a variable whose domain is the class of all even integers, and so on. The linguist notes an obvious kinship here: between 'variable' and any; between 'set' and all or every. There are examples of variables and domains in § 1.3. In '{x^ x2, ..., xm}\ and in the equivalent notation '{x>)'• For ordinary numbers, as we have just seen, sS is a binary relation: 2 sS 3, 2 ^ 2, but ~ ( 3 2). The symbol 'e' of § 1.2 stands for a binary relation; in this particular case we adopted the convention of writing 'x e y" instead of l ~ ( x e y ) ' , though it would not really matter. The symbols ' c ' , ' 2 ' , '' of§ 1.5 represent binary relations: if A and B are sets, then either A £ B or ~ (A s B), and so on. An equally general notation for a binary operation is 'xOy = z': this means that if the ordered pair of elements (x, y) is subjected to the operation O, the result is the element z. If a relation can be vaguely compared to a finite verb, then an operation is rather like a preposition, prepositional phrase, or gerund: '.vOj' is not a sentence, but a subject requiring a verb ( ' = ' ) and a predicate complement ('z') to complete it.

LANGUAGE, MATHEMATICS, AND LINGUISTICS

169

In everyday arithmetic, addition and multiplication are operations (basically binary, though by extension n-ary for n ^ 2 because of a property we shall point out later): if x and y are numbers, then x+y and xxy are also numbers. The symbols ' n ' and ' u ' of § 1.5 represent binary operations. That is, if A and B are any two sets, then A n B is also a set (possibly A), as is A u B. Neither a relation nor an operation need be binary. A relation can be n-ary for any value of n greater than 1; an operation can be n-ary for any value of n greater than 0. But if fewer or more than two elements are involved, notation of the form 'xRy' and 'xQy — z' is obviously impossible. Instead, we use functional notation, which, it will be noticed, has already been slyly slipped in: the general n-ary relation can be symbolized as R(xu x2, ...,

Xn)

(n Sg 2)

and the general n-ary operation by 0(*i,

x2, ..., xn) = y

(n ^ 1).

For example, let R be the ternary relation of betweenness: then ^(window, chair, door) just in case the chair is between the window and the door. Let O be the singulary operation (on numbers) of multiplying by — 1: then, for any number x, 0(x) = — x, and x + O W = 0. Or let O be the ternary operation on numbers defined as follows: 0 ( x , y, z) = xxyz. Then, for example, 0(1» 2, 3) = 8; 0(2, 1, 3) = 2; 0(4, 3, 2) = 36. Operations, relations, and sets are close kin. To show this, let us first note that an n-ary operation can alternatively be regarded as an (n+l)-ary relation. The general symbolization for a binary operation presented above involves three variable names for elements: V , a n d 'z'. Now suppose we have a particular binary operation OWe can define a ternary relation Rr, by saying that this relation holds for a particular ordered triple of elements (x, y, z) just in case xQy = z. Suppose the operation is ordinary arithmetical addition. Then, since 2 + 5 = 7, we assert that R+(2, 5, 7); similarly, R+(5, 2, 7), R+(3, 81, 84); but ~R+(2, 5, 8), since 2 + 5 # 8. In a sense, all we have here is a change of notation; but that is just the point. Whether we speak of an 'operation' or of a 'relation' depends on notation and on attitude, rather than on the abstract mathematical nature of what we are dealing with. In general, given an n-ary operation OC*i, x2, ..., xn) = y, we can define an equivalent (n+l)-ary relation -RoC^, x2, ..., xn, y). Next, we note that an n-ary relation can be viewed as a set whose members are ordered n-ads of elements rather than single elements. A binary relation, in this view, is a set of ordered pairs of elements. Let R be the relation 'less than or equal to' for numbers. This relation holds for the ordered pairs (1,3), (2,3), (2,2), (1,2), (99,3000), and so on, but not, say, for (2,1). Or, since R is a class, we can say the same thing with class-membership notation: (1,3) e R, (2,3) e R, ..., (2,1) e R. We can say that a particular ordered n-ad belongs to or is a member of a particular

170

CHARLES F. HOCKETT

relation (or operation) just as we say that an element belongs to a set. A function (or, indeed, any association) can always be reinterpreted as an operation and hence, indirectly, as a set. Suppose we consider the function of D4 (or M4) in § 1.6. We have / ( a j = blyf(a2) = b2, f(a3) = b2, / f a ) = ¿>4, and / f a ) = b3. Not only is / a function; it is also, with no change of notation, a singulary operation. Hence, by the procedure described just above, we can reinterpret it as a set of ordered pairs: fa, e f , fa, b2) e f , fa, b2) e f , fa, ¿ 4 ) e f , and fa, b3) e f . By an argument that is approximately the reverse of the first part of this, one can show that an n-ary operation can alternatively be viewed as an association on n variables which may, under certain circumstances, be a function of n variables. Thus, clearly, instead of writing '3 + 2 = 5' and the like, we could use functional notation and write ' + ( 3 , 2) = 5'. It would now seem that our definition of 'mathematical system', given at the beginning of § 1.8, is more complicated than necessary: instead of referring to one or more classes together with one or more relations or operations, we need only refer to one or more classes. But this is not quite true. There is one relation that resists the reduction. The relation in question is that denoted by 'e': the relation that holds between an element and a class to which the element belongs. Surely, we could rewrite 'a e A' in functional notation as ' e f a A)'. Either notation means the same thing. But now try to take the next step. Note the parallelism: x ^ y same as x, j ) same as {x, y) e ^ a e A same as e f a A) same as (a, A) e e. In trying to eliminate the relation e we find that we must use that very relation. Hence the elimination is impossible. The most we could say, then, is that any mathematical system is definable in terms of one or more sets of elements, the relation e, and (for reasons spelled out in § 1.1) the notion of order. This is logically so; in practice, however, it is much more convenient, and more stimulating to the mathematical imagination, to use relations, operations, functions, and so on, reducing them to appropriate classes of ordered «-ads only under special circumstances. I wish now to add a point on which perhaps very few mathematicians would agree; obviously, therefore, it should not be taken too seriously. To me, a mathematical system in which the primary emphasis is on relations feels like a geometry, whereas when the major emphasis is on operations it is, instead, an algebra. Formally, this difference clearly amounts to no difference at all. But there is such a thing as the 'psychology' of mathematics (though I am not sure exactly what it is), and unless the difference between geometry and algebra resides here it ceases to have any reality at all. And mathematicians persistently continue to use both of these words, in ways that seem to fit my impressionistic definition. 12 " A more elegant approach, which I believe is approximately equivalent, is to say that geometry (as over against algebra) deals with spaces, and to define a space as a set furnished (at least) with a


1.9. Properties of Binary Relations and Operations

Relations and operations can be classed in terms of properties of a quite abstract sort. A binary relation R is reflexive if, for any element x, xRx. The relation is symmetric if, for any x and y such that xRy, then also yRx. It is transitive if, for any x, y, and z, xRy and yRz imply xRz. A relation that has all three of the properties just defined is an equivalence relation.

Let A be a set over which an equivalence relation ≡ is defined. Then A consists of pairwise disjunct (§ 1.5) subclasses {Bᵢ}, such that x and y belong to the same subclass Bᵢ if and only if x ≡ y. The subclasses {Bᵢ} are called equivalence classes. For example, let A be the set of all ordered pairs (m, n) of positive integers, and let (m₁, n₁) ≡ (m₂, n₂) just if m₁ + n₁ = m₂ + n₂. Then one of the equivalence classes, say B₆, contains the ordered pairs (1, 5), (2, 4), (3, 3), (4, 2), and (5, 1), and no others; nor does any of these belong to any other equivalence class. This is so because 1 + 5 = 2 + 4 = 3 + 3 = 4 + 2 = 5 + 1 = 6, and there are no other ordered pairs of positive integers whose sum is 6. Or let P be the set of all people in a certain village, and let xRy just if x and y are children of exactly the same two parents. Each equivalence class is then a set of full siblings. No equivalence class is empty, but it can be a unit class if a particular individual has no full brothers or sisters.

A binary relation R is irreflexive if there is no x such that xRx. The relation 'is brother of' is irreflexive. The relation is nonreflexive if it is neither reflexive nor irreflexive. The relation 'is best friend of' is nonreflexive if we think that some people are their own best friends and some are not. A relation R is unsymmetric if, for any pair of elements x and y for which xRy, it is necessarily the case that ~(yRx). 'Is father of' is unsymmetric. A relation R is antisymmetric if xRy and yRx can both be true just when x = y. The relation 'is less than or equal to', for numbers, is antisymmetric. A relation which is not symmetric, not unsymmetric, and not antisymmetric is nonsymmetric. 'Is best friend of' seems to be nonsymmetric as well as nonreflexive. A relation R is intransitive if, for any three elements x, y, and z such that xRy and yRz, it is necessarily the case that ~(xRz). The relation 'is father of' is intransitive. A relation that is neither transitive nor intransitive is nontransitive. Once again, 'is best friend of' seems to be an example.

An irreflexive, unsymmetric, and transitive relation is a proper inequality relation. Such a relation holds among the members of any simply ordered set (§ 1.8), although it is not the relation used in the definition of that class of mathematical systems. Let S(K, ≦) be any simply ordered set, and define x < y to mean that x ≦ y but that x ≠ y (or, what amounts to the same thing, define x < y to mean that it is false that y ≦ x). Then < is a proper inequality relation, which either holds or does not hold for any pair of elements x, y ∈ K.13

13 This notation, 'x, y ∈ K', is shorthand for 'x ∈ K and y ∈ K'.
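For a finite set, the three defining properties, and the resulting partition into equivalence classes, can be checked by brute force. A sketch (in Python, with illustrative names; the sample relation is the equal-sums relation on ordered pairs used above):

```python
from itertools import product

# A: the ordered pairs (m, n) of positive integers up to 5.
A = list(product(range(1, 6), repeat=2))

def equiv(p, q):          # (m1, n1) == (m2, n2) just if m1 + n1 = m2 + n2
    return sum(p) == sum(q)

reflexive  = all(equiv(x, x) for x in A)
symmetric  = all(equiv(y, x) for x in A for y in A if equiv(x, y))
transitive = all(equiv(x, z) for x in A for y in A for z in A
                 if equiv(x, y) and equiv(y, z))
assert reflexive and symmetric and transitive

classes = {}              # the equivalence classes, indexed by the sum
for p in A:
    classes.setdefault(sum(p), []).append(p)
print(classes[6])         # [(1, 5), (2, 4), (3, 3), (4, 2), (5, 1)]
```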



If we did not already know what ordering is, we could use a proper inequality relation to define it — except that we have to know what ordering is before we can define relations in the first place. A reflexive, antisymmetric, and transitive relation is an improper inequality relation. For numbers, 'less than' is a proper inequality relation, while 'less than or equal to' is an improper one. It is a relation of this sort that we used in the definition of a simply ordered set (§ 1.8).

The relation ∈, it is interesting to note, is the exact 'opposite' of an equivalence relation. Whereas an equivalence relation is reflexive, ∈ is by definition irreflexive: for any x whatsoever, ~(x ∈ x). Whereas an equivalence relation is symmetric, ∈ is unsymmetric: for any x and y whatsoever, if x ∈ y then ~(y ∈ x). Whereas an equivalence relation is transitive, ∈ is intransitive: for any x, y, and z whatsoever, if x ∈ y and y ∈ z then ~(x ∈ z).

Since a binary relation can be of any of three sorts relative to reflexivity, any of four relative to symmetry, and any of three relative to transitivity, it would seem that there should be 3 × 4 × 3 = 36 abstractly different kinds of relations. It is interesting to list all 36 combinations and to look for, or to invent, examples. Some of them turn out to be rather dull mathematically: for example, a nonreflexive, nonsymmetric, and nontransitive relation such as 'is best friend of' is dull because it affords one no toehold for drawing any conclusions from any given facts. A few combinations are not possible. For example, if a relation R is symmetric and transitive, it cannot be irreflexive or nonreflexive, but must be reflexive (provided that every element bears R to at least one element). For, since the relation is symmetric, from xRy we can infer yRx; but, since the relation is also transitive, from xRy and yRx we can infer xRx, so that it is also reflexive.

Now let us turn to binary operations. A binary operation O is commutative (or Abelian) if, for any x and y, xOy = yOx. Ordinary arithmetical addition and multiplication are commutative. Subtraction, however, is noncommutative, since x−y = y−x only in the special case where x = y. A binary operation O is associative if, for any x, y, and z, (xOy)Oz = xO(yOz). Here the parentheses indicate the order in which the operation is to be applied. Thus, in the left-hand expression we first compute xOy; let us say that xOy = u. Then we compute uOz. That is, (xOy)Oz = uOz. In the right-hand expression, we first compute yOz; let us say that yOz = v. Then we compute xOv. That is, xO(yOz) = xOv. The operation is associative if, for every choice of x, y, and z, uOz = xOv. Ordinary arithmetical addition and multiplication are associative; hence we need not bother to include parentheses to indicate the order in which the operation is applied, and can, if we wish, think of the operations as n-ary rather than merely binary. We can freely write '3 × 4 × 3 = 36' (as we did a few paragraphs back), because either (3 × 4) × 3 or 3 × (4 × 3) will give the same result. Arithmetical subtraction, on the other hand, is not associative, and parentheses cannot be omitted: (9−4)−3 = 2, but 9−(4−3) = 8.
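Finite spot-checks of this kind work for operations too — enough to refute commutativity or associativity, though never to establish them over an infinite domain. A sketch:

```python
def commutative(op, domain):
    return all(op(x, y) == op(y, x) for x in domain for y in domain)

def associative(op, domain):
    return all(op(op(x, y), z) == op(x, op(y, z))
               for x in domain for y in domain for z in domain)

nums = range(-5, 6)
add = lambda x, y: x + y
sub = lambda x, y: x - y

assert commutative(add, nums) and associative(add, nums)
assert not commutative(sub, nums)     # x - y = y - x only when x = y
assert not associative(sub, nums)     # (9-4)-3 = 2, but 9-(4-3) = 8
```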



A set is closed under a binary operation O if, for every choice of elements x, y in the set, xOy is also in the set. Arithmetical addition and multiplication of positive integers always yield positive integers: the set of all positive integers is closed under these two operations. But the positive integers are not closed under subtraction (for example, 7−9 is undefined for the positive integers) nor under division (7/9 is similarly undefined). The definition of the closure of a set under a singulary (or n-ary) operation is entirely analogous.

1.10. Isomorphism

Let S be a mathematical system; let K be any set of elements involved in S; let R be any relation involved in S; and let O be any operation involved in S. Then let S′ be another mathematical system, with K′, R′, and O′ defined in the same way. For simplicity of notation we shall pretend that the relations and operations are binary, but the conditions we are about to set forth must in fact be met by all n-ary relations and operations, for any n, and must hold for all paired sets K and K′ in the two systems. Suppose a one-to-one correspondence can be established between K and K′, so that to any element x in K there corresponds a unique element x′ in K′ and conversely. Suppose, furthermore, that under this correspondence: (1) if, in S, xRy, then, in S′, x′R′y′, and conversely; (2) if, in S, xOy = z, then, in S′, x′O′y′ = z′, and conversely. Under these conditions, systems S and S′ are isomorphic.

The relation 'is isomorphic to' is an equivalence relation (§ 1.9): obviously, any system is isomorphic to itself; if S is isomorphic to S′ then S′ is isomorphic to S; and if S is isomorphic to S′ and S′ to S″, then S is isomorphic to S″. Thus, isomorphism assigns all possible mathematical systems to equivalence classes.

Let S be the simply ordered set of three elements (Paul, John, Luke) of § 1.1, where the ordering principle is priority of birth. Let S′ be the simply ordered set (John, Luke, Paul) where the ordering principle is priority of graduation from high school. Here is the one-to-one correspondence under which these two systems are isomorphic:

(Paul, John, Luke)
  ↕     ↕     ↕
(John, Luke, Paul)

        Paul  John  Luke
Paul      0     1     0
John      0     0     1
Luke      1     0     0

Again, let S involve the familiar elements {1, 2, ...}, for which are defined such operations as addition (e.g., 3 + 79 = 82) and multiplication (3 × 79 = 237). Let S′ involve the elements {I, II, III, ...}, for which are defined what we will call 'Roman addition' (III & LXXIX = LXXXII) and 'Roman multiplication' (III @ LXXIX = CCXXXVII). The isomorphism is obvious:

{1,   2,   3,   4,   5,   ...}        +    ×
 ↕    ↕    ↕    ↕    ↕                ↕    ↕
{I,   II,  III, IV,  V,   ...}        &    @
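Condition (2) of the definition can be spot-checked for this example. In the sketch below (Python), the converter phi stands in for the one-to-one correspondence; note that 'Roman addition' is here defined through the correspondence itself, so the assertions check the coherence of the example rather than anything independent of it:

```python
PAIRS = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"), (100, "C"),
         (90, "XC"), (50, "L"), (40, "XL"), (10, "X"), (9, "IX"),
         (5, "V"), (4, "IV"), (1, "I")]

def phi(n):                      # the correspondence: Arabic -> Roman
    out = ""
    for value, numeral in PAIRS:
        while n >= value:
            out += numeral
            n -= value
    return out

inverse = {phi(n): n for n in range(1, 200)}    # phi is one-to-one here

def roman_add(x, y):             # 'Roman addition' &, via the correspondence
    return phi(inverse[x] + inverse[y])

assert roman_add("III", "LXXIX") == "LXXXII"    # III & LXXIX = LXXXII
# Condition (2): if x + y = z, then phi(x) & phi(y) = phi(z).
assert all(roman_add(phi(x), phi(y)) == phi(x + y)
           for x in range(1, 50) for y in range(1, 50))
```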

Indeed, most people would say that the Arabic numerals and the Roman numerals are just two different sets of symbols for the same meanings. Of course this is true; but how do we know that it is? We know precisely because of the isomorphism. And this is exactly the point whenever an isomorphism is discovered. If two systems are isomorphic, then, abstractly speaking, they are the same system. This might seem to vitiate the notion of isomorphism, but it does not, for the following reason. Two mathematical systems may have been developed from totally different points of departure, with differing symbols and terminology, and perhaps with totally different applications in view. Their similarity may thus be obscured, and a proof of their isomorphism may come as a surprise.

In order to demonstrate an isomorphism, it is always necessary to ignore something. Whatever is ignored is, by definition, being classed as what linguists would call nondistinctive, relative to the given abstract system. Thus, in our second example, we ignore the different appearances of Arabic and Roman numerals, as well as the differing mechanics of construction of composite numerals (the place notation for Arabic numerals, the subtractive versus additive significance of the relative order of an 'I' and a 'V', and the like, for Roman numerals). It would seem that by ignoring different things we might well obtain different and cross-cutting isomorphisms. This is true, but becomes more significant if we turn the statement around and say that a specified isomorphism tells us exactly what to ignore about any particular notation or exemplification that may confront us. That is, isomorphisms define irrelevance, rather than the reverse.14

A formal definition of a mathematical system never does more than to define an indefinitely large class of isomorphic systems, but we then proceed to ignore anything that the definition sets aside as irrelevant. Sometimes a formal definition does not do that much. For example, our postulates for a simply ordered set (§ 1.8) only define a class of systems many pairs of which are not isomorphic. If we add, to the postulates already given, a specification that K is to contain exactly three elements, then any two systems that meet the specifications are isomorphic and, hence, differ only as to irrelevancies. Abstractly, it does not matter whether the system is (Paul, John, Luke) or (John, Luke, Paul) or (1, 2, 3) or (x₁, x₂, x₃) or (x, y, z) or something else — it is still the same system in every respect that, by definition, can interest us.

14 There is, of course, no metamathematical principle requiring the mathematician to 'identify' (that is, to ignore the differences between) two systems that have been shown to be isomorphic. He is free merely to say that the systems are isomorphic (relative to certain criteria), and to search for inferences from that fact, but also to seek crosscutting isomorphisms for either system obtained by distinguishing differently between the distinctive and the nondistinctive. Isomorphism is more powerful and more general than more naive principles of 'equivalence' or 'identity': isomorphism leaves the mathematician in command.

As another example, let us consider Peano's postulates for the natural


numbers (or 'positive integers'; for convenience just called 'numbers' in the statements). These postulates make use of a singulary operation s:

Postulate N1. 1 is a number.
Postulate N2. If n is a number, then s(n) is a unique number called the successor of n.
Postulate N3. There is no number n such that s(n) = 1.
Postulate N4. If m and n are numbers and if s(m) = s(n), then m = n.
Postulate N5. Let P(n) be any statement about a number n. If P(1) is true, and if the truth of P(s(m)) follows from the truth of P(m) for any number m, then P(n) is true of all numbers.

(By 'any statement about a number n' we mean just that: for example, 'n is blue', or 'n² > n', or 'n is prime'.) Any set of elements that satisfies these five postulates is just the set of natural numbers. If we find two different sets, they are isomorphic and hence differ at most in nondistinctive ways, such as appearance. Any notation for the natural numbers is a set of numerals, one for each natural number. Two numeral systems are isomorphic just if they satisfy postulates N1–5. Beyond this, choice of notation is purely a matter of mnemonic and typographical convenience. If Mr. and Mrs. Jones would guarantee to go on having and naming sons forever, we could use {'Paul', 'John', 'Luke', ...} as numerals.
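The postulates are easy to model. In the sketch below (Python; the representation is our own), numbers are tally strings, '|' plays the part of 1, and the successor operation appends one stroke; N2–N4 are then spot-checked over a finite stock of numbers. N5, being a schema quantifying over all statements, admits no such finite check.

```python
def s(n):                         # the successor operation: append a stroke
    return n + "|"

one = "|"                         # N1: '|' is a number

numbers = [one]                   # the first ten numbers, by repeated succession
for _ in range(9):
    numbers.append(s(numbers[-1]))

# N3: no number has '|' as its successor.
assert all(s(n) != one for n in numbers)

# N4: s(m) = s(n) implies m = n.
assert all(not (s(m) == s(n) and m != n) for m in numbers for n in numbers)

print(numbers[4])                 # '|||||' -- the numeral for 5
```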

1.11. Recursive and Recursively Enumerable Sets

We return, as promised, to the topic of § 1.2, since we now have the background for some rather crucial points having to do with the conditions under which we can proclaim that a set is defined.15

First let us define an algorithm (in a somewhat stricter sense than is often given that term). An algorithm is a mechanical and determinate way of computing something. For example, given any positive integer n, there is an algorithm for computing the nth even number. We learn a form of this algorithm appropriate for Arabic numerals in grade school. Suppose n = 3,281,956,481. We all know how to compute 2n:

  3,281,956,481
×             2
  6,563,912,962

The 3,281,956,481st even number, then, is 6,563,912,962. It is easy to imagine an n so large that the application of the algorithm would lie beyond all available resources:

15 For this section see particularly Martin Davis, op. cit. fn. 2; I have also been helped by E. Mark Gold, Language identification in the limit (= The RAND Corporation Memorandum RM-4136-PR) (1964).


e.g., one so great that, in Arabic notation, with digits the size of those on this page, it would stretch across the galaxy. This does not matter — the algorithm still fits our specifications. (Recall the principle 'If you know exactly how to, you don't have to.')

Now let us consider any denumerably infinite set X. From the definition of 'denumerably infinite', there exists a one-to-one correspondence between the set X and the set {1, 2, ...} of all positive integers. We are therefore entitled to imagine that we have arranged the elements of X in the order implied by this correspondence, and have labelled them according to their position. Having done so, we have an ordered set or sequence (x₁, x₂, ...), where xᵢ is the particular element of set X that corresponds to the positive integer i under the one-to-one correspondence. Thus, by definition, any denumerably infinite set can be ordered; we can therefore continue our discussion in terms of infinite sequences instead of directly in terms of denumerably infinite sets.

A sequence (finite or denumerable) differs from a mere set in that the members of a set must all be different, whereas two terms of a sequence can be occurrences of one and the same element. The set {John, Bill, John} is the same as the set {John, Bill}, since in the former notation we have merely, as though by oversight, named one element twice. But the ordered set (John, Bill, John) is not the same as the ordered set (John, Bill) — there must be some reason for the repetition. (For example, perhaps we are tallying who passes through a certain door: John comes in, then Bill comes in, and then John, having meanwhile climbed out the window, comes in again.) Thus (1, 0, 1, 0, ...), in which the terms are alternately 1 and 0, or (1, 1, 1, ...), in which all terms are 1, is just as good an infinite sequence as is (1, 2, 3, ...) or (1, 4, 9, 16, 25, ...).

We want to consider only infinite sequences in which all terms are nonnegative integers. There is really no loss of generality in this restriction, because, as is easily shown, any infinite sequence, whatever the nature of its elements, is isomorphic to one in which the elements meet our requirements. We merely need number the elements, perhaps using the number 0 or 1 first, and repeating a number whenever we come to a recurrence of the element to which that number has already been assigned. For example, (John, Bill, John, Bill, ...), in which the terms John and Bill alternate, is isomorphic to (1, 0, 1, 0, ...), with 1 ↔ John and 0 ↔ Bill. Since the set of all elements that occur as terms in an infinite sequence is, by definition, either finite or denumerable, it is clear that in this process of substitution we will never run out of nonnegative numbers.

For our further consideration, then, we have a sequence (p₁, p₂, ...) = (pᵢ), in which every term is some nonnegative integer. The set of elements that occur as terms in this sequence is clearly the image of some surjective function f (§ 1.6) whose domain is the set of all positive integers: that is, for any positive integer i, f(i) = pᵢ. In the first example of this section, we have f(i) = 2i. In the case of the sequence (1, 0, 1, 0, ...), we have

f(i) = 1   if i is odd,
     = 0   if i is even.

A set {pᵢ} is said to be recursively enumerable if (1) it is the image of some surjective function f whose domain is the set of all positive integers, and (2) there exists an algorithm by which f(i) can be computed for any choice of i. A recursively enumerable set may be either finite or (denumerably) infinite, and we have already had an example of each. The function that generates the sequence (1, 0, 1, 0, ...) has as its image the finite set {1, 0}; the function f(i) = 2i has as its image the denumerable set {2, 4, ...}. An algorithm may be such that, in order to compute f(i) for i > 1, we must first know f(i−1), hence also f(i−2) and so on all the way back to f(1). It is still an algorithm, and the image of the function is still recursively enumerable.

Suppose, however, we are supplied with a mechanical procedure for trying to compute f(i) for any positive integer i, but have no guarantee that, for every choice of i, the computation will eventually be completed. For example, imagine that the procedure consists of several steps. The first step gives us, for any i, a number x which is not necessarily an integer. The next step requires us to determine the square root of x. Now there is a mechanical procedure for extracting square roots — one which we all learn in high school and which most of us then forget. But for a great many numbers this mechanical procedure never ends: we can carry it on as far as we wish, but whenever we get tired and stop we still have only an approximation to the square root. If, in the computation of f(i) for a given i, the x we get is of this sort, then we can never move on to the remaining steps of the computation. The whole mechanical procedure for the computation of f(i) is then not an algorithm in our sense, and the image of f is not a recursively enumerable set.

Now, again, consider any denumerably infinite set X whose members are nonnegative integers. Suppose that, when confronted by any positive integer n, there exists an algorithm by which one can decide in a finite amount of time whether n ∈ X or n ∉ X. As before, we may imagine (because of the definition of 'denumerably infinite') that the elements of X are ordered and appropriately labelled, so that from the set X we get the infinite sequence (p₁, p₂, ...). Then the supposition just stated can be rephrased as that of being able to decide, in a finite amount of time, whether or not there is a positive integer i such that pᵢ = n. If we can, then X is a recursive set.

Referring back to the discussion of § 1.2, we see that a set is recursive just if (1) its elements are nonnegative integers, (2) it is denumerably infinite, and (3) it is 'defined', in the sense given to that term in the earlier discussion. We remove the first of these three restrictions, however, by asserting that a set is recursive, regardless of the nature of its elements, if it stands in one-to-one correspondence with a set of nonnegative integers which is recursive by the definition given above. Thus, as in the case of recursive enumerability, the restriction to nonnegative integers actually loses no generality.
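In programming terms, a recursively enumerable set is one whose members can be produced, with repetitions, by a generator whose ith output is computed algorithmically. A sketch of the two examples just mentioned:

```python
from itertools import islice

def enumerate_image(f):
    """Yield f(1), f(2), f(3), ...: the terms of the sequence, hence
    the members of the image (with repetitions)."""
    i = 1
    while True:
        yield f(i)
        i += 1

evens = enumerate_image(lambda i: 2 * i)         # image: {2, 4, 6, ...}
alternating = enumerate_image(lambda i: i % 2)   # image: {1, 0}

print(list(islice(evens, 5)))        # [2, 4, 6, 8, 10]
print(list(islice(alternating, 6)))  # [1, 0, 1, 0, 1, 0]
```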


Any recursive set is also recursively enumerable. We shall illustrate with the set of all primes, and then generalize. There is a simple algorithm for determining whether a given integer n is prime or not: one attempts to divide it by 2, by 3, and so on, and if it is not divisible by any integer (except 1) up to and including n−1, then it is prime. Hence the set of all primes is recursive. Now suppose that we know that n is the ith prime. To find the (i+1)st prime, we apply the algorithm just described first to n+1, then to n+2, and so on, until we find the next integer after n that is a prime; that integer is the (i+1)st prime. This mechanical procedure will always work. It will never go on forever without yielding a result, since (as has been known from Euclid's day) there is no largest prime. Hence it is an algorithm, and the set of all primes is recursively enumerable.

The generalization is immediate. Let X be a recursive set. Then there is an algorithm by which we can decide whether any given nonnegative integer is in X or not. Using that algorithm, we test in succession the nonnegative integers 0, 1, and so on, until we find the first integer that meets the test. Call that integer p₁. We proceed to test p₁+1, p₁+2, and so on until we encounter the next integer that meets the test; that integer is p₂. In this way, we can determine pᵢ for any positive integer i. Since X is denumerably infinite, there can be no largest element in it — our procedure will never go on forever without discovering the next larger element. Thus, X is recursively enumerable.

Although, as we have just shown, all recursive sets are recursively enumerable, the reverse is not true: there are recursively enumerable sets that are not recursive. To prove this we merely need exhibit one case. For our example we shall make use of the decimal expansion of π, which can be computed to any desired number of decimal places by a mechanical procedure: to five places, π = 3.14159. It is known that, no matter how far we carry the computation of π, the decimal portion will never end and will never repeat itself. Yet we can carry it as far as we want to. Now we shall throw away the digit to the left of the decimal place, and use the digits to the right as the source of the successive integers pᵢ of an infinite sequence (p₁, p₂, ...), as suggested by this display:

.14159...
p₁ = 1
p₂ = 4
p₃ = 14
p₄ = 1
p₅ = 41
p₆ = 141
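The display can be continued mechanically. A sketch (Python; for illustration the digits of π are simply written in, though a genuine implementation would compute them to any desired length):

```python
# The digits of pi after the decimal point (first twenty only).
PI_DIGITS = "14159265358979323846"

def p(i):
    """The ith term of (p1, p2, ...): the terms ending at digit
    position k are the k substrings of digits 1..k that end at
    position k, shortest first."""
    k = 1
    while i > k:                 # locate the group of terms ending at k
        i -= k
        k += 1
    return int(PI_DIGITS[k - i:k])

print([p(i) for i in range(1, 12)])
# [1, 4, 14, 1, 41, 141, 5, 15, 415, 1415, 9]
```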

The next four terms, all of which end in the fourth digit of the decimal part of π, are 5, 15, 415, and 1415; the next five are 9, 59, 159, 4159, and 14159. It is clear that the sequence (pᵢ) is infinite, and that pᵢ can be computed for any integer i by the


algorithm we have described, since the decimal expansion of π can be carried out as far as we wish. Some integers will occur more than once: we have already seen p₁ = p₄ = 1. The sequence cannot contain any largest term, since an integer of n digits is larger than one of m < n digits, and the sequence contains integers of every finite number of digits. Therefore the set P of all nonnegative integers that occur as terms in the sequence (pᵢ) is denumerably infinite.

The set P is recursively enumerable, but it is not recursive. Suppose we are given some particular integer, say 3,827,561,422, and are asked whether that integer is in P. All we can do is proceed to compute successive terms of the sequence (pᵢ) and watch for the given number. If it turns up, we know that it is in P. But if, after any given amount of computation, it has not yet turned up, we do not know whether or not it will turn up later. Thus, for an arbitrary integer n, we can sometimes prove that n is in P, but can never prove that it is not in P.

Are there denumerably infinite sets of nonnegative integers that are not recursive and not even recursively enumerable? Suppose that from the sequence (pᵢ) just described we generate a sequence (qᵢ) by the following procedure. We consider in turn each term of (pᵢ), and decide whether or not to assign it to (qᵢ) by tossing a coin. If the coin comes up heads, then we let qᵢ = pᵢ; if it comes up tails, then, regardless of the value of pᵢ, we set qᵢ = 0. Clearly, a number (other than 0) cannot occur in (qᵢ) unless it occurs in (pᵢ). But since we cannot know, except as it were by accident, that a given number belongs to P, we certainly cannot know whether or not it is in Q, the set of all integers that occur as terms in (qᵢ). So Q is not recursive. Nor is it even recursively enumerable. Two different people (or machines), independently following the instructions for generating the members of a recursively enumerable set, will, after the same amounts of computation, have generated exactly the same members. But the instructions we have given for (qᵢ), though simple to follow, guarantee no such identity of result. The procedure is not determinate, because of the coin-tossing requirement, and hence is not an algorithm in our sense.

However, we cannot simply conclude from the foregoing that Q shows the existence of a denumerably infinite set that is not even recursively enumerable. In order to reach that conclusion, we should have to agree that our description of Q defines a set; and that issue is problematic. To be sure, if we follow the instructions and list n terms of the sequence (qᵢ), we have generated a perfectly respectable finite set: the set of all numbers that occur as members of the finite sequence. But the instructions we have given constitute a sort of etcetera symbol, and the problem is whether that etcetera symbol can be viewed as clearly enough defined to count as naming a set. There is disagreement as to the answer.

In any case, we see that there are reasonably respectable and manipulable sets (namely, those that are recursively enumerable but not recursive) that are not defined in the sense of § 1.2. The implication is that our definition of 'defined' was not sufficiently precise. For denumerably infinite sets, we must replace the simple notion of 'defined' by the two notions developed in this section. Rather obviously, wherever in pure or applied mathematics we encounter what purports to be a denumerably infinite set, it is of importance to determine whether it is recursive or only recursively enumerable, or perhaps neither; if it is neither, then we may not even have a set in the mathematical sense; if it is only recursively enumerable, there are strict limitations on possible mathematical manipulations.
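The asymmetry in the case of P is easily made concrete: membership can be confirmed by watching the enumeration, but no amount of watching refutes it. A sketch (Python, assuming the function p(i) of the earlier sketch is in scope; with only twenty stored digits, the sequence there has 210 computable terms):

```python
def confirm_membership(n, terms, budget):
    """Semi-decision: a True answer is definitive, but exhausting the
    budget proves nothing, since n may still turn up later.  'Not in
    the set' can never be concluded this way."""
    for i, t in enumerate(terms, start=1):
        if t == n:
            return True
        if i >= budget:
            return None              # no verdict yet

print(confirm_membership(1415, (p(i) for i in range(1, 211)), 200))  # True: p_10
print(confirm_membership(999,  (p(i) for i in range(1, 211)), 200))  # None
```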


The example we shall confront in the sequel is the so-called 'set of all sentences' of some language. Some of the assertions often casually made about this 'set' are not at all obviously true empirically. Yet we cannot put mathematics to work unless we make some precise assumption about the nature of this 'set', even if the assumption goes against some of the empirical evidence and requires a willful suspension of disbelief.

1.12. Model and Exemplification

When a particular mathematical system is used to talk about, or to make inferences and computations that bear on, something outside itself, the system is called a model of what it is being used to discuss, and the latter is called an exemplification of the system. Children learn model and exemplification together, and only gradually pull the model out of context so that it can be manipulated independently of any particular exemplification. Thus they start by noting that two oranges and two more oranges make four oranges, that two pencils and two more pencils make four pencils, and come to the conclusion that two and two are four whether the exemplification involves oranges, pencils, or anything else.

A mathematical system, such as arithmetic, is a model of anything it can be a model of. In manipulating a system, we do not care what its exemplification is; but this (as we have insisted before) is very different from saying that it has no exemplification. A formal system that by definition could have no exemplification — if it can even be imagined — would not be mathematics but nonsense. However, it is perfectly acceptable for a formal system to find its exemplification in some other formal system. For example, one exemplification of simply ordered sets (§ 1.8) is the natural numbers as defined by Peano's postulates (§ 1.11). To show that systems of type S₁ find exemplification in systems of another type, S₂, one must show that the class of all systems that meet the postulates for S₂ is a non-null subclass of the class of all systems that meet the postulates for S₁. In the case mentioned, we define a relation ≦ for natural numbers by saying that, for any natural numbers m, n, and p, m ≦ n if and only if one of these two is true: (1) m = n; (2) m ≦ p and n = s(p). It is not hard to prove that, with this definition, the natural numbers are a simply ordered set. Of course, there are many other simply ordered sets; but if it happened that the natural numbers were the only example, the notion of a simply ordered set would nevertheless be validated by the one exemplification — provided, of course, that the natural numbers themselves be taken as valid.

Another way to express the point of the preceding paragraph is this. If systems of type S₁ find exemplification in systems of type S₂, then any system of type S₂ is isomorphic to some system of type S₁.
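The recursive definition of ≦ transcribes directly. A sketch (Python; natural numbers are modelled as the integers 1, 2, 3, ..., with s(n) = n + 1):

```python
def s(n):                    # the successor operation
    return n + 1

def leq(m, n):
    """m <= n by the recursive definition: (1) m = n, or (2) there is
    a p with m <= p and n = s(p) -- here p is n - 1."""
    if m == n:
        return True
    if n == 1:               # 1 is no number's successor, so recursion stops
        return False
    return leq(m, n - 1)

assert leq(3, 7) and leq(5, 5) and not leq(7, 3)
```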


Suppose we define a particular simply ordered set by adding the following requirements to those which define the class of all such systems: (1) the set K is denumerably infinite; (2) K contains a unique element J such that, for any element x, [...]

[...] → Bxbb;
R₃. Bx → x.

This is more compact than the earlier notation, but is by definition absolutely equivalent. In the sequel, the preceding example will be referred to as Simple Sample One.

2.2. Kinds of Rules

We must now ask whether the class of harps characterizable by linear grammars contains any that are usefully similar to languages. The answer is neither a definite yes nor a definite no, partly because it depends on how close a match is judged useful. Suppose, however, that the answer were an unqualified yes: that we knew that a suitable linear grammar would yield a harp so similar to a language, say to English, that we would be tempted simply to call it a grammar of English. That such a grammar was linear would tell us very little. For, as illustrated at the end of § 2.1, linear grammars also characterize harps that are very unlike languages. We should therefore want to know, in exact formal terms, the differences between linear grammars that yield languagelike harps and those that do not. This is the problem to which generative grammarians have been devoting their attention for a number of years. Roughly, their program has been to seek the strongest constraints on a linear grammar (or on a grammar of some other formally defined class) within which it seems to generate a languagelike harp.21

Our argument will take a different turn: we shall be more concerned with the shortcomings of linear grammars for linguistic purposes than with their excessive lack of specificity. First, however, we shall show that most of the kinds of rewrite rules that have been proposed by generative grammarians are acceptable within the framework of a linear grammar.

21 This is made clear in many places: for example, in Emmon Bach, An introduction to transformational grammars (New York, 1964), especially chapters 1, 2; or Paul Postal, Constituent structure: a study of contemporary models of syntactic description (= Publication 30 of the Indiana University Research Center in Anthropology, Folklore, and Linguistics, Supplement to International Journal of American Linguistics 30:1) (1964). This and similar aims of formal grammatical theory should help explain (to those of us who have been largely occupied with producing 'practical' descriptions intelligible to our colleagues) why so many items in the transformationalist literature seem to devote so much time and machinery to such 'trivial' bits of data. The aim is not just to subsume the data in any old fashion, but to subsume it within specified formal constraints that impose stringent requirements of explicitness and simplicity. The search for simplicity, we should recognize, is enormously difficult.

(1) A context-free phrase-structure rule is usually stated in the form

a → b,

where a and b are non-null strings over A and a ≠ b; it may also be required that


b not be shorter than nor a permutation of a. A string s is acceptable as instring for the rule just if it can be deconcatenated into xay, where x and y are any strings over A; the corresponding outstring is then xby. This formulation, as it stands, does not guarantee that the outstring corresponding to any given instring will be unique; hence the 'rule' does not fit our postulates. For suppose that a occurs more than once in a string s. Let us say that it occurs twice. Then s can be deconcatenated into x₁ax₂ax₃, where the strings xᵢ involve no occurrences of a. The 'rule' will yield, as outstring, either x₁bx₂ax₃ or x₁ax₂bx₃; the 'rule' is therefore an association but not, as required, a function (§ 1.6, § 2.1).

Several adjustments are possible. One is to specify that a single application of the rule a → b is to rewrite all occurrences of a in the instring. Thus, if s = x₁ax₂a...axₙ, where the xᵢ are a-free, then R(s) = x₁bx₂b...bxₙ. Such a rule is truly context-free, but it is hardly a 'phrase-structure' rule. If we relax the requirements given first above so that we may set b = 0, we have an erasure rule, for which we will later find good use.

Any other adjustment of a context-free rule to make it fit the postulates of a linear grammar turns out to be a conversion from context-free to context-sensitive. Suppose we want the rule to rewrite only the first occurrence of a in the instring. Then a string s is acceptable if it can be deconcatenated into xay where y can be any string over A but x must be a-free. As soon as we specify required properties for either x or y in the deconcatenation of s into xay, we have what is by definition a context-sensitive rule. The one just described could be expressed in the currently most prevalent format as follows:

a → b in the environment x ____ , where x is a-free.

Again, suppose we want a rule that will rewrite a as b only if the a is initial in the instring. We introduce a supplementary symbol '©', which is not a character of the alphabet A and also not the name of any string over that alphabet, but which appears in statements of environments with the meaning boundary of string, so that '© ____' means 'initial in the string' and '____ ©' means 'final in the string'. The desired rule is then of the form:

a → b in the environment © ____ .

The three rules of Simple Sample One (§ 2.1, end) can be reformulated:

R₁. I → Bb;
R₂. b → bb in the environment Bx ____ ©;
R₃. B → 0.

Here the first and third rules are formulated as context-free because we know it is safe to do so: no string will ever involve more than one occurrence of I, so that there can be no ambiguity about R₁; no string will ever involve more than one occurrence of B, so that there can be no ambiguity about the erasure rule R₃. When, in the environmental description for R₂, we use 'x' without any expressed restrictions, it is meant that x may be any string at all.
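Simple Sample One is small enough to run. In the sketch below (Python), each rule is a partial function on strings; the boundary condition of R₂ is rendered by a test that the operand b is string-final:

```python
def R1(s):                           # I -> Bb
    return "Bb" if s == "I" else None

def R2(s):                           # b -> bb when preceded by Bx and
    if s.startswith("B") and s.endswith("b"):   # final in the string
        return s + "b"               # rewrite the string-final b
    return None

def R3(s):                           # B -> 0 (erasure)
    return s[1:] if s.startswith("B") else None

for n in range(4):                   # the rule chains R1 R2^n R3
    string = R1("I")
    for _ in range(n):
        string = R2(string)
    print(R3(string))                # b, bb, bbb, bbbb, ...
```

The terminal strings printed are b, bb, bbb, and so on: the harp this grammar characterizes.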


(2) Many rewrite rules, as ordinarily formulated, are compact statements of whole (finite) sets of so-called 'minimal' rules; only the minimal rules are rules in our sense of the term. For example, a 'selection rule'

a → b₁, b₂, ..., bₙ in the environment x ____ y

states n minimal rules, the ith of which is

a → bᵢ in the environment x ____ y.

Other sorts of composite rules achieve compactness by describing many environments at once, using cover symbols of various kinds. The rules of a linear grammar are not inherently ordered: the three rules of Simple Sample One could be listed in any order whatsoever without in any way modifying the yield. However, economy of statement can often be achieved by ordering some subset of the rules of the system: the specification of environments for each rule can, by convention, ignore what has been specified for the preceding rules of the ordered subset, so that for the last of the set the simple assertion 'otherwise' or 'in all other environments' suffices. The conventions for abbreviating composite rules are not always spelled out as clearly as they should be in the literature, but it seems likely that any composite rule, or ordered set of rules, incapable of being expanded so as to fit our postulates also violates the intentions of the grammarian who has formulated it.

The postulates for a linear grammar in no way preclude rule-grouping for compactness. But the fact that rule-grouping of some sort is possible is relevant information about a particular system; indeed, such differences may play a part in distinguishing formally between languagelike and unlanguagelike harps. For a fixed G, and a specific way of grouping minimal rules into composite ones, R will consist entirely of pairwise disjunct subsets Rᵢ, such that two minimal rules R₁ and R₂ are subsumed by the same composite rule if and only if they are members of the same subset. Two systems would obviously be strikingly different if, in one, the subsets Rᵢ were unit classes while, in the other, some of them contained many rules.

(3) A single-based transformation requires an operand with constituent structure, rather than merely a string, and yields a transform that also has constituent structure. We are accustomed to thinking of a string with constituent structure as represented by a tree, where the labels on the terminal nodes constitute the string and the rest of the tree marks the structure. Thus, the following tree represents the string cfh with a particular phrase structure (reflecting a particular 'generative history'):

          I
         / \
        B   D
        |  / \
        c E   G
          |   |
          f   h

Although a tree of this sort is the most vivid way of exhibiting constituent structure, it is well known that all the information conveyed by a tree can equally well be given by a bracketed string with labelled brackets. The bracketed string that corresponds exactly to the above tree is

I(B(c)D(E(f)G(h))) .

If a character appears in the tree as the label of a terminal node (one of the nodes at the bottom), it appears here enclosed by an innermost pair of brackets: that is, by a pair of brackets that does not enclose any other pair. If a character appears in the tree as the label of a non-terminal node, it appears here immediately before an opening bracket, and constitutes the labelling of that bracket and of its paired closing bracket. It is not necessary to label the closing brackets separately since, given any legal sequence of opening and closing brackets, there is only one way for them to pair off.22

22 A sequence of opening and closing brackets (interrupted or not by other characters) is legal if (1) there are the same number of opening brackets and of closing brackets and (2) the nth closing bracket is preceded by at least n opening brackets. The pairing is then as follows. The first closing bracket must pair with the nearest preceding opening bracket. One can then delete this pair, and again have a legal sequence, in which, once again, the first closing bracket must be paired with the nearest preceding opening bracket. This operation is repeated until all brackets have been paired. This is the only procedure that guarantees that no two pairs will define intersecting spans.
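The legality test and the pairing procedure of the footnote transcribe directly into code. A sketch (Python):

```python
def legal(seq):
    """Conditions (1) and (2) of the footnote: equal numbers of '(' and
    ')', and the nth ')' preceded by at least n '('s."""
    opens = closes = 0
    for ch in seq:
        if ch == "(":
            opens += 1
        elif ch == ")":
            closes += 1
            if closes > opens:
                return False
    return opens == closes

def pair_off(seq):
    """Pair each ')' with the nearest preceding unpaired '(' -- the
    repeated-deletion procedure of the footnote; returns index pairs."""
    stack, pairs = [], []
    for i, ch in enumerate(seq):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.append((stack.pop(), i))
    return pairs

s = "I(B(c)D(E(f)G(h)))"
assert legal(s)
print(pair_off(s))    # the pair (3, 5), for instance, encloses 'c'
```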

Now, if there is some way in which a bracketed string of this sort can be interpreted as a string over the alphabet A of a linear grammar G, then we should be able to accommodate single-based transformations within the grammar. This is indeed possible, if two requirements are met. The first requirement is very simple: the brackets '(' and ')' must be included as characters of the alphabet A, though not of the terminal subalphabet T. The earlier rules of a rule chain will introduce these two characters; hence some late rule or rules of any chain must erase them after they have served their purpose. The second requirement is really just as simple, but necessitates a formulation of 'phrase-structure' rules somewhat different from the most customary one. The customary way would set up such rules as these:

R₁. I → BD
R₂. B → c
R₃. D → EG
R₄. E → f
R₅. G → h

(We may suppose, if we wish, that the system includes also rules that rewrite B, E, and G in other ways.) One then applies these rules in any possible order, writing the steps down as follows:

       I                   I
R₁.    BD            R₁.   BD
R₂.    cD            R₃.   BEG
R₃.    cEG           R₄.   BfG
R₄.    cfG           R₂.   cfG
R₅.    cfh           R₅.   cfh

or in any of several other possible orders. From any of these displays, we can construct a tree: (a) connect each character in each row to the character in the row above it from which it derives; (b) along any resulting line from I to a character in the bottom row, erase all but one occurrence of any character that appears more than once. The result, of course, regardless of which display is used, is just the tree given earlier. It is then to this tree, rather than to any single string, that a transformation is applied. Suppose the transformation is this:

T₀.       I                         I
         / \                       / \
        B   D           →         K   L
           / \                   / \   / \
          E   G                 G   B M   E
                                      |
                                      j

Here j is a 'transformational constant' — a terminal character introduced by the transformation itself. If we apply T₀ to the tree already displayed, the terminal string cfh is converted to hcjf, with the phrase structure specified to the right of the arrow in the statement of the transformation.

Instead of the customary procedure just described, suppose we formulate the phrase-structure rules as follows (environmental specifications might be necessary, but are omitted here for simplicity as they were in the customary formulation):

R′₁. I → I(BD)
R′₂. B → B(c)
R′₃. D → D(EG)
R′₄. E → E(f)
R′₅. G → G(h)


The transformation becomes:

T′₀. I(BD(EG)) → I(K(GB)L(M(j)E)) .

A string s is acceptable to T′₀ if it can be deconcatenated into I(Bx₁D(Ex₂Gx₃)), where each xᵢ has the form (yᵢ), in which the brackets are paired and yᵢ is any non-null string over A. T′₀(s) is then I(K(Gx₃Bx₁)L(M(j)Ex₂)). Finally, we need an erasure rule RE: a string s is acceptable to this rule just if it contains no nonterminal character — other than a bracket — that is not immediately followed by an opening bracket; the rule erases, in a single application, all occurrences of all nonterminal characters in the string. With the rules so formulated, all we need, in order to tell whether a given rule will accept a particular string, is that string itself. We do not need a separate overt record of the 'generative history' of the string, in the form of a tree, since everything relevant of that generative history is included in the string. For example, consider the following initial rule-row and its yield:

       I
R′₁.   I(BD)
R′₂.   I(B(c)D)
R′₃.   I(B(c)D(EG))
R′₄.   I(B(c)D(E(f)G))
R′₅.   I(B(c)D(E(f)G(h)))

To the last string of those shown, we may apply either (a) RE, to obtain the terminal string cfh, or else (b) first T′₀, yielding I(K(G(h)B(c))L(M(j)E(f))), and then RE, to obtain the terminal string hcjf.

For a reason to be discussed later, a further adjustment is desirable. The rules R′₁ through R′₅ can be applied in a number of different orders to generate exactly the same string I(B(c)D(E(f)G(h))). Rule R′₁ has to come first; but after that there are the following possible orderings, all equivalent: 2 3 4 5; 2 3 5 4; 3 2 4 5; 3 2 5 4; 3 4 2 5; 3 4 5 2; 3 5 2 4; 3 5 4 2. We wish to eliminate this irrelevant freedom of ordering, so that if a particular initial rule row generates a certain string, no permutation of the rules of that rule row will generate the same string. This aim can be achieved by the imposition of appropriate context-sensitivity, probably in a number of different ways. For example, let us define a bracketless label as a nonterminal character (other than a bracket) occurring not immediately followed by an opening bracket; then let us say that a phrase-structure rule (any of the set R′₁ through R′₅ in our example) will accept a string only if the operand of the rule, in that string, is not preceded by any bracketless labels. Thus the string I(BD) is acceptable to R′₂, since B is not preceded by a bracketless label, but not to R′₃, since D is preceded by the bracketless label B. With this constraint, the only valid ordering of the five rules is 1 2 3 4 5, as displayed above: the 'expansion', so to speak, of nonterminal characters proceeds from left to right. The system, with seven rules that we know of (it may


have others also), generates two terminal strings, with just one rule chain for each terminal string:

rule chain                          terminal string
R′₁ R′₂ R′₃ R′₄ R′₅ RE               cfh
R′₁ R′₂ R′₃ R′₄ R′₅ T′₀ RE           hcjf

Hereafter we shall refer to the above example as Simple Sample Two.
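The whole of Simple Sample Two can be run as plain string rewriting. In the sketch below (Python), each primed rule rewrites the leftmost bracketless occurrence of its operand (the context-sensitivity just adopted), T′₀ is implemented only for operands of the one shape it accepts, and RE erases everything but terminal characters:

```python
import re

RULES = {"I": "I(BD)", "B": "B(c)", "D": "D(EG)", "E": "E(f)", "G": "G(h)"}

def apply_rule(string, label):
    # Rewrite the leftmost bracketless occurrence of `label`, i.e. one
    # not immediately followed by an opening bracket.
    return re.sub(re.escape(label) + r"(?!\()", RULES[label], string, count=1)

def T0(string):
    # T'0, defined only for operands of the shape I(Bx1D(Ex2Gx3)).
    m = re.fullmatch(r"I\(B(\(.*?\))D\(E(\(.*?\))G(\(.*?\))\)\)", string)
    x1, x2, x3 = m.groups()
    return "I(K(G" + x3 + "B" + x1 + ")L(M(j)E" + x2 + "))"

def RE(string):
    # Erasure: delete every nonterminal character, brackets included;
    # the terminal characters of this system are the lower-case letters.
    return "".join(ch for ch in string if ch.islower())

s = "I"
for label in "IBDEG":            # the rule chain R'1 R'2 R'3 R'4 R'5
    s = apply_rule(s, label)
print(s)                         # I(B(c)D(E(f)G(h)))
print(RE(s))                     # cfh  -- via R'1 R'2 R'3 R'4 R'5 RE
print(RE(T0(s)))                 # hcjf -- via R'1 R'2 R'3 R'4 R'5 T'0 RE
```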

(4) We come now to two sorts of rules proposed by generative grammarians that cannot be adapted to fit the postulates of linear grammars. A context-sensitive rule of the sort proposed by Halle:23

a → b in the environment  x ____ y
                          z

neither accepts nor yields strings, nor is the environment proposed in the rule specifiable as a string or ordered set of strings. A double-based (embedding or conjoining) transformation requires as operand not a string but an ordered pair of strings. Since rules of these two sorts seem to be linguistically useful, the inability of a linear grammar to accommodate them constitutes a defect. The implications of this will concern us in subsequent chapters.

23 Morris Halle, "Phonology in generative grammar", Word 18.54-72 (1962).

24 Of course, this is not what Chomsky says a structural description is. But we are here engaged in the very process of showing how 'structural description' and 'input to a generative grammar' can be identified. So far as I know, the proposal is new, or at least independent; but see fn. 5 of Paul M. Postal, "Underlying and superficial linguistic structure", Harvard Educational Review 34.246-66 (1964) for possible parallelism of thinking.

2.3. Source and Transducer; Effective Rule Chains

In the face of the doubts just expressed, and more general ones stated earlier, for the balance of § 2 we shall make a wildly improbable assumption: not only that there exists a linear grammar G whose harp H(G) is indistinguishable from some language (for convenience, English), but that we have the grammar at hand in full detail. With such a grammar, we could program a computing machine to behave in the following way: for each successive value of k, beginning with k = 1, it is to form all rule chains of length k, and is to print out each rule chain together with the terminal string generated thereby. The rule chain, of course, is in effect what Chomsky calls a structural description of the terminal string.24 For sufficiently large values of k, the computer will run out of capacity, but we can imagine adding more capacity as needed, expense being no consideration in a Gedankenexperiment. Since G is (to all intents and purposes) a grammar for English, every terminal string printed out by the machine will be an English sentence. Furthermore, given any English sentence, no matter how long or complicated, we shall have to wait only a finite length of time


until the computer prints out that very sentence. Programmed in accordance with G, the computer will produce only, but any of, the sentences of English.

It is easy to see that the computer operation just described accords with our requirement that a grammar characterize a recursively enumerable harp; indeed, what the computer is doing is exactly to recursively enumerate the harp. If the harp is not also recursive, then we might be surprised by some of the terminal strings that turn up, which we would not have known were English until they were produced by the grammar that we have accepted, by definition, to be correct. At the moment, this possibility need not concern us.

Suppose, now, that we do not want merely to wait around until the computer grinds out a particular sentence. Suppose we want the computer to produce a particular sentence now. To make this realistic, pretend that we plan to use our computer as the second of a pair whose joint task will be to translate from Russian to English. A Russian sentence is fed into the first computer. It is there digested and analyzed (I have no idea how), and a series of impulses is transmitted to the second computer. These impulses are input to the second computer: they are supposed to govern its functioning in such a way that it will produce, not just any English sentence, but a sentence that corresponds to the Russian original. It is clear that this requires a very different sort of program from the one described above. That was a program without input: it made the computer function as a source. Now we want a program that will convert input into output in accordance with G; we want to make the computer function as a transducer.

What is there about a generative grammar that could be regarded as input for a computer programmed to function as a transducer? Obviously not the vacuous initial character I, with which the generation of every terminal string begins. Some set of nonterminal strings? No; there is only one possible answer. If we select a particular rule chain, the grammar, and hence the computer, will generate exactly the terminal string determined by that rule chain.

We have said that a linear grammar G characterizes, or is a grammar of, its harp H(G). What has not been noticed is that a linear generative grammar characterizes not just one harp, but two. One is H(G), a subset of the free monoid F(T) over the terminal subalphabet T; this is the one that everyone has always talked about, the one that, in linguistic application, we seek to have match the sentences of a language in a usefully close way. But consider also the set R of rules. There is nothing to keep us from interpreting this as a finite alphabet of characters. The free monoid over this alphabet is F(R). The set C(G) of all rule chains of G is a subset of F(R), so that C(G) is also a harp. Thus, the two harps characterized by G are C(G) and H(G). In addition to characterizing these two harps, G also specifies a surjective function g: C(G) → H(G), since, by definition, any rule chain of G generates exactly one terminal string. In § 2.1 we designated the terminal string generated by a rule chain C by C(I). This is perfectly valid, but since it is C rather than I that can vary to yield different terminal strings, it seems a bit peculiar. We remove the peculiarity


by defining a function g: for any rule chain C, g(C) = C(I). The function g is not necessarily injective, since there is nothing in the postulates to preclude the case in which g(C₁) = g(C₂) even if C₁ ≠ C₂. Thus, G has a potential built-in asymmetry. Empirically, for languages, this asymmetry might allow great inefficiency. Surely we wish so to tailor G as to eliminate superfluous instances of multiple generation of a single terminal string; this is why, towards the end of § 2.2 (3), we proposed a certain sort of context-sensitivity for the phrase-structure rules of Simple Sample Two. But some instances are not superfluous. If 'The sons raise meat' and 'The sun's rays meet' are a single terminal string in spoken English because they are phonologically identical, we still want at least two rule chains for the one terminal string. The same is true of 'Flying planes can be dangerous' or of 'How do you find the brakes in this car?'

Let us return to our computing machine. We can imagine, for simplicity's sake, a long row of switches on a control console, each of which except the first has as many settings as there are rules in G, plus one 'out-of-line' setting. (Putting perhaps hundreds of thousands of settings on a single switch is a purely technical difficulty.) The first switch has, in addition to the out-of-line position, only as many settings as there are initial rules in G. By turning the first k of these switches to the positions that correspond to the rules of a rule chain of length k, and the rest to the out-of-line position, we control the internal circuitry in such a way that the machine will write out the terminal string determined by that rule chain. Everything else about G can be viewed as built into the permanent wiring. The input to our transducer from a Russian analyzer is then simply a series of switch-settings.25

25 Although I disavow any concern here with the problem of mechanical (or other) translation, one should note an agreement between the machine example given in the text and Eugene Pendergraft's proposal (made, as far as I know, only orally) that one should attempt to translate not a Russian sentence into an English sentence, but a description of the former into a description of the latter.

Of course, the machine can have only a finite capacity — the row of switches cannot be infinitely long. But this limitation is not imposed by G: the grammar allows us to make the finite capacity of the machine as great as we wish. Despite the enormous triviality of Simple Sample Two, it may be useful to concretize in terms thereof what has just been said. The first switch would need only one on-line setting, for R′₁, the only initial rule in the system. The remaining switches would need seven on-line settings. The only two rule chains we know of for this system are the two listed at the end of § 2.2 (3), one of length six, the other of length seven.

We can turn over even more of the total task to the internal wiring. We could so arrange matters that any legal combination of settings for the first i switches would activate interlocks, preventing any but a legal setting for the (i+1)st switch. This is possible because, although in selecting input we can choose among rule chains, the choice of individual rules for a chain is not independent. Indeed, G specifies exactly what choices are compatible. The appearance of a particular rule at a particular position may be obligatory, in the sense that if certain other rules and their positions


The appearance of a particular rule at a particular position may be obligatory, in the sense that if certain other rules and their positions in the chain have been decided, there is no longer any option at the position in question. Whether a particular rule at a particular position in a particular chain is obligatory or not does not depend, however, entirely on the structure of the grammar G. If we think of choosing rules for a chain — setting switches on the computer — from left to right beginning with an initial rule for the first switch, then a particular rule R will be obligatory at position i just if we find that the ith switch cannot be turned to any other setting. But suppose, for a rule chain of length n, that we were to begin at the nth switch and work backwards, again with appropriate interlocks, and intending to set up exactly the same rule chain. We might find that the ith switch (from the left-hand end) could be turned to any of various settings; further, by setting it for rule R we might find that some earlier switch was locked in a particular setting. Or, indeed, we might begin with the ith switch itself, activating interlocks in both directions; in this case, clearly, we should not find rule R at position i obligatory. To get around this indeterminacy, I shall arbitrarily choose to define a rule R as locally obligatory (at a particular position in a particular rule chain C) if it is obligatory under the convention of left-to-right selection.

Now consider any rule chain C. Construct a sequence of rules C' by deleting from C all locally obligatory rules. Clearly, the set of all sequences of rules constructed in this way stands in one-to-one correspondence with the set of all rule chains. A sequence of rules C' is not a rule chain, under our definition, except just when it corresponds to a C that contains no locally obligatory rules, so that C' = C. However, we shall call these sequences C' effective rule chains.

By properly adjusting the internal circuitry, we can adapt our computer so that it will accept effective rule chains, rather than rule chains, as inputs. Without the adjustment, if the first locally obligatory rule of a chain is the ith (and the next one is not also locally obligatory), we find when we reach the ith switch that it has already been set for us. With the adjustment, setting the first i-1 switches provides internally for the workings not just of the first i-1 rules but also for the ith; the ith switch is therefore free to be used for the (i+1)st rule of the rule chain, which is only the ith rule of the effective rule chain. The difference, of course, can be described in similar terms if the rule chain involves any row of rules all of which are locally obligatory.

Thus, the input harp for our transducer need not be the set C(G) of all rule chains of G. Instead, it can be the set E(G) of all effective rule chains of G. This set, like C(G), is a subset of the free monoid F(R) over the set of rules R; hence we are correct in calling it a harp. In what follows, we shall assume this adjustment.

In order to illustrate with Simple Sample One, it is best to assume that R2 is only one of two or more rules that expand B, that R4 is only one of two or more for E, and that R5 is only one of two or more for G. Then we have:

rule chain          effective rule chain
R1R2R3R4R5R6        R2R4R5
R1R2R3R4R5T0R6      R2R4R5T0
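The deletion of locally obligatory rules can itself be computed, once the set of legal rule chains is known. The sketch below is a toy under stated assumptions: hypothetical alternatives R2a, R4a, R5a stand in for the 'two or more' competing rules just posited, all chains are kept the same length so that left-to-right selection is straightforward, and the optional T0 is omitted.

# Effective rule chains: delete rules that are locally obligatory under
# left-to-right selection. R2a, R4a, R5a are hypothetical competitors
# standing in for the 'two or more' rules assumed in the text.
from itertools import product

CHAINS = {('R1', x, 'R3', y, z, 'R6')
          for x, y, z in product(('R2', 'R2a'), ('R4', 'R4a'), ('R5', 'R5a'))}

def effective(chain):
    kept = []
    for i, rule in enumerate(chain):
        # settings the (i+1)th switch could legally take, the first i
        # switches having been set; a singleton means 'locally obligatory'
        options = {c[i] for c in CHAINS if c[:i] == chain[:i]}
        if len(options) > 1:
            kept.append(rule)            # a genuine choice: keep it as input
    return tuple(kept)

print(effective(('R1', 'R2', 'R3', 'R4', 'R5', 'R6')))    # -> ('R2', 'R4', 'R5')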


2.4. Generation and Duality

Our hypothesis (the improbable one set forth at the beginning of § 2.3) guarantees that, for the grammar G currently under consideration, H(G) is indistinguishable from the set of all English sentences. But what is E(G)? What is there about E(G), as over against H(G), that might lead us (for example) to want to try to translate from Russian to English indirectly via E(G) rather than directly?

Formally, all we can say about E(G) is that it is the input harp for the grammar G. Empirically, however, we can say what we should like E(G) to be. There are two points. (1) English sentences contain all manner of trivial irregularities that must be provided for in any grammar but that seem very superficial. It certainly seems accidental rather than essential that, although the plural of 'boy' is 'boys', that of 'man' is 'men'. It would be pleasing if our grammar G of English could take care of as many as possible of these trivial irregularities by locally obligatory rules, which are thus not in the effective rule chains that constitute input. (2) It would be pleasing if G were to provide two or more effective rule chains for a single terminal string just in those cases in which the terminal string is ambiguous.

If G can have the properties just described, then E(G) will be largely free of the superficial irregularities and the ambiguities of H(G). By virtue of this, E(G) ought to differ from H(G) in that there are fewer limitations on the co-occurrence of rules in effective rule chains than there are of terminal characters in terminal strings; hence, also, the structure of E(G) should correlate more directly with meaning than does that of H(G). Inspect the following two pairs of strings of characters:

pair one
ABCDZEFGZHIJJ
HIJJZGFEZEKKDZHLZABCD

pair two
ABCDZEFGZHIJJ
ABCDZEFGZMFNL

Everyone would agree, I believe, that the strings of the first pair are more different from each other than are those of the second pair. Now, the strings of this display are equivalent, under a simple substitution code, to the everyday English sentences 'John saw Bill' (the first of each pair), 'Bill was seen by John' (the second of the first pair), and 'John saw Mary'. The coding is merely a device by which we can inspect the sentences without regard to our practical control of English, which is hard to set aside when we see or hear English in uncoded form.

In terms of the underlying effective rule chains of a grammar G of English, the situation is different. We should like the first sentence of the first pair to differ from the second of that pair only by the exclusion versus the inclusion of a single rule, and the two sentences of the second pair to differ only by the inclusion of one rule rather than another. This surely agrees both with our feeling for English pattern and our awareness of meanings.

To put this in another way, if G is a good grammar, then E(G) highlights and pinpoints the choices open to a speaker of the language.
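The substitution code used in the display can be reconstructed letter by letter from the three sentences; the table below (with Z for word space, and case ignored) reproduces all three coded strings exactly. The program is only a convenience for checking the reconstruction.

# The simple substitution code behind the display, reconstructed by
# matching the coded strings against the three sentences (Z = space).
CODE = str.maketrans({'j': 'A', 'o': 'B', 'h': 'C', 'n': 'D', 's': 'E',
                      'a': 'F', 'w': 'G', 'b': 'H', 'i': 'I', 'l': 'J',
                      'e': 'K', 'y': 'L', 'm': 'M', 'r': 'N', ' ': 'Z'})

def encode(sentence):
    return sentence.lower().translate(CODE)

assert encode('John saw Bill') == 'ABCDZEFGZHIJJ'
assert encode('Bill was seen by John') == 'HIJJZGFEZEKKDZHLZABCD'
assert encode('John saw Mary') == 'ABCDZEFGZMFNL'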


Now the transmission of information in any communicative system, and hence the differentiation of meanings, is totally dependent on choices made by the transmitter and not known in advance to the receiver. A speaker of English has virtually no option about pluralizing 'man': that he pluralizes it as 'men' rather than in some other way thus conveys virtually no information. But he can choose between 'Bill' and 'Mary'; he can choose between active and passive; he can choose between present and past tense. Any such choice, relative to a system G, is the choice of a rule. This is what is meant by saying that E(G) correlates more directly with meaning than does H(G). This is why we prefer E(G), the set of all effective rule chains, to C(G), the set of all rule chains: E(G) maximizes the independence of choice of rules for an input. Thus, if G has the properties we want it to have, it is the choice of rules that differentiates meanings.

We are thus led to discover, between generative grammar and certain older views, a much closer kinship than has been discerned heretofore. Let us not modify our definition of linear grammars in any way, but, for a moment, let us replace two technical terms. Instead of 'terminal character', let us say 'phoneme', and instead of 'rule' let us say 'morpheme'. We then have, in effect, the two-stratum model for a natural spoken language, incorporating the design feature of duality of patterning, as proposed variously by Saussure, Hjelmslev, and many others, including me.26 H(G) becomes the set of all strings of phonemes; E(G) becomes the set of all strings of morphemes. The function g maps morpheme strings into phoneme strings. We could say, using another traditional term, that a phoneme string realizes a morpheme string — sometimes, indeed, ambiguously.

Our program in subsequent chapters is to investigate the shortcomings of linear grammars for the characterization of languages and to try to define types of grammars that are free of these shortcomings. We shall not again use the terms 'morpheme' and 'phoneme' in the way they were used in the preceding paragraph, since both words have by now been assigned such a large variety of conflicting meanings in the literature that they are worn out — much like old slang, and for a similar reason. But the point of the preceding paragraph will remain with us.

Let us put this in another way, since the point is worth emphasizing. We said in § 2.0 that the task of algebraic grammar, narrowly defined, is to establish a significant and, if possible, complete formal classification of all harps.

26 Ferdinand de Saussure, Cours de linguistique générale (Lausanne, 1916; 2nd edition Paris, 1922); Louis Hjelmslev, Omkring Sprogteoriens Grundlæggelse (Copenhagen, 1943), tr. by Francis J. Whitfield as Prolegomena to a theory of language (= Indiana University Publications in Anthropology and Linguistics, 7) (1953); André Martinet, "La double articulation linguistique", Travaux du Cercle Linguistique de Copenhague 5.30-7 (1949); Charles F. Hockett, "Linguistic elements and their relations", Lg. 37.29-53 (1961); also, of course, the work of many other scholars influenced by Saussure. However, this view must be sharply distinguished from the 'one-stratum' view expressed by Leonard Bloomfield (e.g. in Language, New York, 1933), and adhered to by most so-called 'structural' linguists in this country in the 1930's, 1940's, and 1950's, whereby a morpheme is taken to consist of (an arrangement of) phonemes. The use of the term 'morpheme' in the transformationalist literature is essentially this latter use, and must not be permitted to confuse understanding of what I am asserting in the text.


If this is the orientation we bring to linguistics, we are led to define a language purely as an infinite collection of sentences. A language and a grammar for it then have only a single interface (§ 1.12): the grammar is a model of the language, and the language an exemplification of the grammar, just if the terminal strings of the grammar match in a sufficiently close way the sentences of the language. This is obviously not enough. It establishes a necessary, but by no means sufficient, condition for the acceptability of a grammar for any particular language. Even in their most recent publications, the transformationalists stick to the narrow definition of 'language' given above.27 They are not trapped by it — they clearly realize that the condition for acceptability implied by it is only minimal. Even so, this use of the term 'language' seems misleading. Since we have the technical term 'harp' for a subset of a free monoid (essentially the transformationalist definition of 'language'), we can afford to use the term 'language' in a more natural and traditional way.

Let us view a language, then, as a system manifested in the behavior of human beings, a system that itself has two interfaces with the universe as a whole. The interfaces are those suggested by the traditional pair of terms 'sound' and 'meaning' — or by various fancier equivalents, such as Hjelmslev's 'expression' and 'content'. At or near the sound interface, we find a language manifested by the potentially infinite collection of sentences that might be our sole concern if we were thinking purely as algebraic grammarians. Empirical observations can be made of a language at this interface: we can observe and record what people say. Empirical observations can also be made at the other interface: we can observe the conditions under which people say various things, we can observe their responses to what others say, and — probably most important — we can ask them what things mean. But the two interfaces are the only places where observations are possible. The machinery that links them can only be inferred.

The two interfaces between a language and the rest of the universe are also the only places where a language and a formal grammar can meet. Instead of requiring a match at only one of these interfaces, let us require a match at both. This gives us a necessary and sufficient condition for the empirical acceptability of a grammar. A formal grammar for a language thus becomes a black box hypothesis about the machinery that links the two interfaces within the language. No matter how close a match we obtain, we must never think of the details of our formal system as anything more than the roughest sort of analog for what goes on in the central nervous system of a speaker or hearer — control flow charts are not wiring diagrams. But, since the actual machinery within the language (or within the speaker or hearer) is not directly observable, a black box hypothesis is the best we can hope for.

27 See, for example, N. Chomsky and G. A. Miller, op. cit. fn. 19, p. 283. It is this terminological identification of 'language' with 'output harp' that necessitates Chomsky's distinction between weak equivalence and strong equivalence of grammars; we here handle the matter in a different way, but will return to 'weak equivalence' in Chapter 7.
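The requirement of a match at both interfaces can be caricatured in a few lines of code. In the toy below (all data invented for illustration, with meanings abbreviated to English glosses), two 'grammars' are reduced to their mappings from effective rule chains to terminal strings. Both match the observed language at the sound interface alone; only the first, which supplies two chains for the homophonous sentence, matches at both.

# Matching a grammar to a language at one interface versus both.
# All names and data here are invented for illustration.
OBSERVED = {('the sons raise meat', 'progeny rear livestock'),
            ('the sons raise meat', 'solar beams converge')}

GRAMMAR_A = {('Ra1', 'Ra2'): ('the sons raise meat', 'progeny rear livestock'),
             ('Ra1', 'Ra3'): ('the sons raise meat', 'solar beams converge')}

GRAMMAR_B = {('Rb1',): ('the sons raise meat', 'progeny rear livestock')}

def matches_sound(grammar):                # match at the sound interface only
    return {s for s, _ in grammar.values()} == {s for s, _ in OBSERVED}

def matches_both(grammar):                 # match at sound and meaning alike
    return set(grammar.values()) == OBSERVED

print(matches_sound(GRAMMAR_A), matches_both(GRAMMAR_A))   # True True
print(matches_sound(GRAMMAR_B), matches_both(GRAMMAR_B))   # True False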


3. STEPMATRICES

3.0. The First Inadequacy of Linear Grammars

In § 2.2 (4) we gave two reasons why linear grammars are not as useful for linguistic application as we should like. The first reason, which has to do with spoken languages, will concern us in this chapter.

Classical phonological theory rested on several assumptions, held with varying degrees of steadfastness by different investigators. They may be itemized as follows:

(1) Quantization. Two distinct sentences of a language cannot be indefinitely similar. They must differ at least by the presence of a particular element at a particular position in one versus its absence, or the presence of a different element, at the same position in the other. If they do not differ at least in this minimal way, then they are not two sentences but one.

(2) Audibility. The difference between any two sentences of a language is audible to a native speaker in the absence of channel noise, even if there is no defining context.

(3) Finiteness. Every sentence of a language consists of an arrangement of a finite number of occurrences of elements, and the total stock of elements for all sentences is finite.

(4) Patterning. Some arrangements of elements occur more frequently than others, and some combinatorially possible arrangements do not occur at all.

(5) Constancy. Recurrences of the same form (e.g. of the same 'morpheme') are recurrences of the same (phonological) elements in the same arrangement.

(6) Linearity. The arrangements in which elements occur in sentences are exclusively linear.

Although the empirical evidence for some of these assumptions is vast and varied, when any language is observed in detail it becomes obvious that two of them are incompatible and that one is superfluous. The two mutually incompatible assumptions are those of audibility and of constancy.28 'Wives' rhymes with 'hives', and 'wife' with 'fife', but not 'fife' and 'hive', and not 'fifes' and 'hives'. We can insist on audibility, in which case 'wife' and 'fife' are phonologically alike at the end, and similarly 'wives' and 'hives', but this means that the form 'wife' appears in two phonologically distinct shapes, violating constancy. Or we can insist on constancy, in which case 'wife' and 'fife' do not end the same way phonologically even though they are an exact rhyme, thus violating audibility. But we cannot maintain both principles at once.

This was realized many decades ago, and much of the history of phonological theory has been that of a search for a motivated compromise. But there is no single non-arbitrary compromise between the two. Instead, there are two.
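The incompatibility just argued can be verified mechanically on the forms cited. In the sketch below the shapes are written in a rough ASCII notation (an expository convenience, not a claim about English phonology), with F standing, in the constancy-honoring representation, for the unit that alternates between f and v.

# Audibility versus constancy, checked on the forms cited in the text.
# Shapes are in a rough ASCII notation (an expository assumption).
PHONEMIC = {'wife': 'waif', 'fife': 'faif', 'hive': 'haiv',        # audibility
            'wives': 'waivz', 'fifes': 'faifs', 'hives': 'haivz'}

MORPHOPHONEMIC = {'wife': 'waiF', 'fife': 'faif', 'hive': 'haiv',  # constancy;
                  'wives': 'waiFz', 'fifes': 'faifs',              # F = f ~ v
                  'hives': 'haivz'}

def rhymes(rep, a, b):
    return rep[a][-3:] == rep[b][-3:]   # crude: compare the last three symbols

# Audibility honored: the rhymes come out right ...
assert rhymes(PHONEMIC, 'wives', 'hives') and rhymes(PHONEMIC, 'wife', 'fife')
assert not rhymes(PHONEMIC, 'fifes', 'hives')
# ... but 'wife' has two shapes, waif and waiv-: constancy is violated.
assert not PHONEMIC['wives'].startswith(PHONEMIC['wife'])

# Constancy honored: one shape for 'wife' ...
assert MORPHOPHONEMIC['wives'].startswith(MORPHOPHONEMIC['wife'])
# ... but the exact rhyme 'wife' : 'fife' is no longer represented.
assert not rhymes(MORPHOPHONEMIC, 'wife', 'fife')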

28 The argument is given in extenso in Charles F. Hockett, op. cit. fn. 26, and therefore need not be given in detail here. The current transformationalist view (e.g. Chomsky and Miller, op. cit. fn. 19, esp. § 6) virtually discards audibility for any purpose, by ignoring the bulk of the evidence of rhyme, assonance, patterning in slips of the tongue, and the like, all of which surely reflect language habits. Formally, however, the transformationalist view and that developed in this chapter and the next are very similar. The disagreement is about the empirical facts.


By forgetting about audibility, we arrive at a representation of the sentences of a language in terms of elements for which various terms have been used: I shall use Lamb's convenient term morphon.29 By forgetting about constancy and insisting on audibility, we arrive at what has been called a phonemic representation of sentences. It is this that we must discuss further here.

In this context (that of 'phonemic' representation), the linearity assumption does not stand up under scrutiny. It is suggested by the linearity of most writing systems, and by the obvious possibility of devising an unambiguous linear notation for any phonological system. But these are extrinsic considerations. The principle is quite regularly ignored when dealing with accentual and intonational phenomena. Apart from these, if one insists on the assumption, then such a pair as English 'pit' and 'bit' must be taken as evidence for minimal elements /p/ and /b/. Bloomfield, among others, was willing to accept the consequences of this insistence, asserting that the phonetic similarity of English /p/ and /b/ has no linguistic significance.30 The Prague School disagreed, and undertook systematic dissection of such linear 'phonemes'; yet even for the Prague School linear elements stayed in the forefront, their decomposition being somehow secondary.31

The linearity assumption, with its orthographic overtones, has created pseudoproblems. For example, the argument as to whether the syllable nuclei of English 'beat', 'bait', 'boot', 'boat', and the like are single phonemes or clusters becomes meaningless if there are no such things as phonemes in one's frame of reference. Obviously, problems of assigning allophones to phonemes vanish in just the same way. I do not mean that all phonological problems are automatically resolved upon the abandonment of linearity. But those that remain can be attacked more directly and realistically.

I therefore propose, as have various investigators in recent years, that we abandon the linearity assumption altogether. The pair 'pit' and 'bit' attests to a minimal difference, true enough, but that difference is between voicelessness and voicing, not between /p/ and /b/. We shall follow Lamb in calling the terms of such minimal differences phonons.32

29 The transformationalists use phoneme; the traditional term has been morphophoneme, of which 'morphon' can be viewed as a sort of abbreviation. Similarly, the transformationalists call the 'component' G" (see below) of a generative grammar the phonology, while Lamb and I use the traditional term morphophonemics. Of course the choice of terms does not matter, and in linguistics it would be an idle dream to hope for terminological agreement; it is just unfortunate that each of the terms just mentioned is used also in various other ways, by various other 'schools' or 'trends' within our field. For Lamb's terminology, see Sydney M. Lamb, "On alternation, transformation, realization, and stratification", Report of the 15th Annual Round Table Meeting on Linguistics and Language Studies (= Monograph Series on Languages and Linguistics, No. 17), 105-22 (1964); "The sememic approach to structural semantics", American Anthropologist (to appear).
30 Leonard Bloomfield, "A set of postulates for the science of language", Lg. 2.153-64 (1926).
31 See Travaux du Cercle Linguistique de Prague vols. 1-10 (1929-39), passim; now also Josef Vachek, A Prague school reader in linguistics (Bloomington, 1964).
32 See op. cit. fn. 29. In § 4.5 it will become clear why I do not at this point simply use the term 'distinctive features', even though, ideally, phonons and distinctive features should be identical.


Every sentence of a language, then, consists of an arrangement of (occurrences of) phonons, of which the language has a finite stock; but the arrangements are not exclusively linear. Two phonons may occur in succession, or they may occur simultaneously. For instance, 'pit' begins with a simultaneous bundle of three phonons: voicelessness, labiality, and stopness.

We now see why a linear grammar, as defined by the postulates of § 2.1, cannot be a satisfactory grammar of a spoken language. Such a grammar generates terminal strings in which, by definition, simultaneity is impossible. What we must have for a spoken language is a kind of grammar that may be very similar to a linear one, but that yields outputs of some sort in which simultaneity is possible.

Let us suppose that a satisfactory grammar G for some spoken language can take the form of a coupled pair of partial grammars G'G". Our assumption about G' will be that it generates terminal strings; in other respects it may or may not conform to our postulates for a linear grammar, though in this chapter we shall speak as though it did. Our assumption about G" will be that it generates arrays of the sort that are empirically required; more on this in a moment. Our assumption about the coupling of G' and G" is that the terminal strings from G' govern the operation of G". Underlying the whole supposition is the conjecture that, even if sentences are not strings in the simple sense of chapter 1, much of the internal workings of G can be regarded as linear. This means that G" is being assigned a very special responsibility: that of delinearizing material that comes to it, from G', in strictly linear form. (A toy version of such delinearization is sketched just after the list of tasks below.)

This approach sets before us the following tasks:
(1) The exact mathematical formulation of the possible structures of outputs from G".
(2) The analysis of the possible ways of coupling G' and G", together with possible formats for G".
(3) The determination of the optimum location of the 'break' between G' and G".
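A minimal delinearizer, keyed to the one example given above: G' is imagined to hand over the linear string 'pit', and G" replaces each character by its simultaneous bundle. The bundle for p follows the text; those for i and t are crude assumptions of convenience.

# A toy G": delinearize a terminal string into simultaneous bundles of
# phonons. The bundle for 'p' follows the text; the bundles for 'i' and
# 't' are invented placeholders.
BUNDLES = {'p': frozenset({'voiceless', 'labial', 'stop'}),
           'i': frozenset({'vocalic', 'high', 'front'}),
           't': frozenset({'voiceless', 'apical', 'stop'})}

def delinearize(terminal_string):
    """Map each character of the output of G' to its bundle of phonons."""
    return [BUNDLES[ch] for ch in terminal_string]

for bundle in delinearize('pit'):
    print(sorted(bundle))
# a succession of simultaneous bundles, no longer a purely linear string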

3.1. Stepmatrices

The mathematical formalism we need does not exist; it has to be invented. It turns out to be extremely simple, which is probably why no one has bothered with it before.

Let Q be a finite alphabet, called a componential alphabet, of characters {Ki}q, where (1) each Ki is an unordered non-null set of distinct components e, and (2) the set E of all components is finite. From these definitions, it follows that K1 = K2 if and only if they contain exactly the same components. Since Q is finite, the number m of components in any character Ki does not exceed, for any i, some finite integer max(n); that is, for all i, 1 ≤ m ≤ max(n).


For the case in which max(n) = 1, the K's are unit classes of e's, one K for each e; consequently, the free monoid F(Q) is isomorphic to the free monoid F(E), and one has, in effect, not a componential alphabet but an ordinary or simple one. To preclude this trivial case, we specify, for any Q, that (3) max(n) > 1.

We shall allow ourselves to speak of a character as a simultaneous bundle of components although, formally, the system we are developing has nothing to do with time, so that the term 'simultaneous' is out of place. Despite this usage, the mathematics of individual characters is just finite set theory. Thus, we define a subcharacter to be any subset of some character K, including K itself and the null subcharacter ∅.33 It will be convenient to display the components of a subcharacter (or character) in column rather than row form; since the symbols in a single column represent members of a set, and the vertical alignment will not be used for anything else, we can omit the braces that usually enclose the names of members of an unordered set, but I use enclosing vertical lines in a way that will be clear in a moment:

Kj =