Language, mathematics, and linguistics 9783111353500, 9783110998269


English Pages 243 [244] Year 1967
















© Copyright 1962 by Mouton & Co., Publishers, The Hague, The Netherlands. No part of this book may be translated or reproduced in any form, by print, photoprint, microfilm, or any other means, without written permission from the publishers.

Printed in The Netherlands by Mouton & Co., Printers, The Hague.


This essay has two aims. 1 The first, a subsidiary one sought mainly in §1, is to introduce some of my fellow linguists to mathematics. Of course, some linguists know much more mathematics than I do; §1 is not for them. But many know almost none; a few, strangely enough, boast of this ignorance. This is an undesirable state of affairs, for they are thereby seriously hampered in following certain current developments in our field. No linguist will acquire a practical control of mathematics by studying this essay, any more than one can become fluent in a foreign language merely by reading a description of it. But, hopefully, the ice will be broken, and he can proceed to learn more mathematics from the many excellent standard texts. 2

Learning mathematics is like learning any subject, in that one must acquire a new vocabulary. It is like learning a foreign language rather than, say, history, in that one must also acquire alien grammatical habits. And it is like no other subject in that one must also learn how to invent new grammatical devices as they are needed.

1 This is a revision and enlargement of my 'Four Lectures on Theoretical Linguistics', delivered at the Linguistic Institute, Indiana University, Summer 1964. Except for the Preface, which replaces an earlier Introduction, and for minor alterations of format, the present version is identical with that included in Current Trends in Linguistics, edited by Thomas A. Sebeok, vol. III: Theoretical Foundations (The Hague, Mouton & Co., 1966), pp. 155-304.
2 The following bibliography is representative rather than highly selective; it begins with three recent texts intended for accelerated American high school students: Allendoerfer and Oakley 1963; Glicksman and Ruderman 1964; Richardson 1958; Breuer 1958; Lipschutz 1964; Tarski 1950; Courant and Robbins 1941; Wilder 1952; Polya 1954; Kemeny, Snell, and Thompson 1956; Birkhoff and MacLane 1944; Davis 1958; Chevalley 1956. Full information will be found in the References at the end of the essay.



For all this, experience in traditional linguistics should afford about as useful a background as one could hope for. I have tried to help the linguist reader by emphasizing the language-like attributes of mathematics, most of which are not recognized as such by mathematicians because they have no specialist's knowledge of language. This ignorance on the part of the average mathematician has no bearing on the quality of his mathematics; but ultimately the language-like nature of mathematics is of basic importance, since it is the most critical clue to an understanding of the place of mathematics in the physical universe of which mathematics, mathematicians, language, linguists, and all of us are a part. 3

The second aim, pursued in the rest of the essay, now needs more extended comment than it received in the earlier printed version. 4 It was there set forth as follows: '... to explore certain properties of certain grammatical systems. The investigation is conducted at a fairly abstract level, so that the conclusions have nothing in particular to do with one human language rather than another; hence examples, on the rare occasions when they are given, have been drawn as a matter of convenience from languages of which I happen to have some knowledge. However, the argument is not intended to be completely abstract and formal. I assume that even in the most ethereally theoretical linguistics we are still concerned with real languages, 5 so that the tie between theory and empirical data, though it may become exceedingly tenuous, must not be broken. The point of departure for the investigation, developed at the beginning of §2, is similar to the usual point d'appui of the "algebraic grammarians", but the direction the investigation then takes is different and, if I am correct, new.'

The question that must now be raised is whether the 'tie between theory and empirical data' has not, indeed, been severed—not only in the bulk of the present essay 6 but also in all algebraic grammar

3 For this view, see Bloomfield 1935, 1936, 1939; Hockett 1948b; Hockett and Ascher 1964.
4 See footnote 1 above.
5 Compare the unbreakable, though almost indefinitely stretchable, tie between theoretical and experimental physics, described beautifully in Born 1962.
6 But not in §4 and some parts of §3 (and, of course, not in §1).



in most of Chomsky's work, in a good deal of Lamb's, and, indeed, in a sizable proportion of linguistic theory since Bloomfield. Nor do I mean entirely to exonerate Bloomfield: in synchronic matters he followed Saussure quite closely, and thus set a whole generation off on what I now believe was the wrong track. A language, viewed synchronically, was a 'rigid system' 7 ; individual speakers may in practice violate almost any feature of the system, without modifying the system itself unless 'whole groups of speakers ..., for some reason unknown to us, coincide in a deviation'. 8 Bloomfield's own work was too thoroughly infused with historical good sense for this view of synchronic language design to do much damage— though, to be sure, the descriptive chapters of his textbook 9 set us off in the direction of a long-unchallenged 'item and arrangement' model of grammar, 10 ultimately formalized as 'phrase-structure grammar' by the transformationalists. 11 After Bloomfield, even this check on spurious formalism was lost, as a whole generation of American descriptive linguists were trained in virtual ignorance of the findings of historical linguistics. It is greatly to the credit of Harris and of Chomsky that they uncovered inadequacies in 'item and arrangement' grammar and were thus led to develop a more realistic 'item and process' model featuring transformations. The contexts in which they set forth these new (or revived) proposals were unfortunate. Harris's context was that of 'game-playing'. 12 Perhaps partly in reaction to this, Chomsky spoke from the beginning as though phrase structures, transformations, and the like are in the language, rather than merely

7 Bloomfield 1924, 1927.
8 Bloomfield 1927.
9 Bloomfield 1933.
10 Hockett 1954. Peculiarly, this article, which introduced the term 'item and arrangement' and the contrasting 'item and process', has often been interpreted as a defense of the former, whereas in fact it was intended as a challenge to the former and a proposal that we investigate the latter more thoroughly.
11 Postal 1964a is the most explicit discussion of this.
12 This is clear from the tone of Harris's many articles in Language during the 1940's and early 1950's, and of Harris 1951. For criticism at the time, see Hockett 1948c, 1952.



useful descriptive devices. 13 It is easy to miss this point in reading Chomsky's essays, since he is quite insistent that a generative grammar is not supposed to be any sort of a picture of how speakers produce (or receive) utterances; thus, merely because phrase structures, transformations, and the like are in the language does not mean that they are in the speaker. To achieve this peculiar state of affairs, he takes the language itself out of its speakers: that is, a language cannot, in his view, be regarded as a set or system of habits of real people. 14 Yet obviously a language has something to do with people and their behavior, and Chomsky puts the language back into its users in a different way, in the form of the user's competence, or knowledge of his language, which is somehow to be distinguished from his actual performance and cannot even be identified with his regularities (habits) of actual performance. 15

The main trouble, however, is not merely an innovating terminology—doubtless we could say about language what needs to be said in Chomsky's terminology about as well as in Bloomfield's or Paul's. Once the vocabulary has been mastered, Chomsky's system of views has a persuasive coherence and almost has to be accepted or rejected as a whole. It takes quite a bit of exegesis to discern that the whole structure rests, for better or for worse, on one unchallenged key assumption. 16

Bloomfield, following Saussure, called a language a 'rigid' system, but this use of the term 'rigid' was metaphorical. In the post-Bloomfieldian descriptive work of the 1940's, we sought to match the rigidity of languages by the rigor of our methods, without any very clear notion of what rigor was. It was in this atmosphere that Chomsky received his training. He was the first (as far as I know) to refine the notion of rigidity-rigor by bringing to bear on this the

13 Chomsky 1957, and repeatedly since.
14 Chomsky 1964, fn. 3, p. 10.
15 Chomsky 1965, ch. 1.
16 For me, this fact emerged clearly only from a careful reading of Chomsky 1965, ch. 1. This chapter is a reductio ad incredibile of the mistakes we have been making in linguistics for the last thirty or forty years; my study of it, after the present essay was completed, was responsible for the radical change of view reported in this Preface.



relevant segment of modern mathematical logic: to wit, the theory of computability and unsolvability. 17 In this exact logical frame of reference, the obvious translation of the imprecise 'rigid' is the term well-defined. We cannot take the space here to expound this term (the reader can consult the footnote reference), but will merely give a few examples. All or almost all mathematical systems (see §1.8 below) are well-defined. At the opposite extreme, no physical system is well-defined. Of human institutions, there are some of each: thus, chess is well-defined; baseball may be; American football is definitely not. It seems that indisputable instances of well-defined systems are all products of the human intellect.

Chomsky's views are based on the assumption that a language, at any given moment, is a well-defined system. All known versions of algebraic grammar rest on this same axiom (§2.0 below, where—I am happy to say—even a year ago I expressed some doubts). It is surely not Chomsky's fault that he should never have challenged this assumption, in view of its honorable pedigree. But it must be challenged. For, in fact, there seems to be not one shred of empirical evidence that supports the view. In particular, the assumption that a language is well-defined proves to be incompatible with all that we know of linguistic change. It is therefore not surprising that some of Chomsky's followers, whose training in traditional historical linguistics was deficient or wholly missing, are now inventing historical linguistics all over again—and, of course, are repeating all the same old mistakes that were overcome after decades of toil by our predecessors of a century ago. 18

All this requires much more extended exposition than is possible here; put thus briefly, my remarks will probably seem gratuitous. It is in no sense my purpose to be cryptically derogatory towards

17 Davis 1958. A system is well-defined (this particular term does not appear in Davis) if it can be completely characterized by deterministic functions; a deterministic function is either a computable function or one specified with sufficient explicitness that its noncomputability can be proved.
18 Closs 1965 represents this new development; see especially her first paragraph, which refers to other examples. See also the diachronic remarks in Halle 1962; and Postal (forthcoming).



some of my colleagues. But I must be frank about a radical shift in point of view between the time this essay was written (the original manuscript was transmitted in February 1965) and the present. For, if a language is in fact not a well-defined system, then what is the point of this or any other elaboration of algebraic grammar? It is easy to say that, even if the basic assumption of algebraic grammar is false, it may nevertheless afford us a useful approximation. Since this is easy to say, I hereby say it, thus offering some justification for the republication of this essay.

But there is a much more important point. I now believe that any approximation we can achieve on the assumption that a language is well-defined is obtained by leaving out of account just those properties of real languages that are most important. For, at bottom, the productivity and power of language—our casual ability to say new things—would seem to stem exactly from the fact that languages are not well-defined, but merely characterized by certain degrees and kinds of stability. This view allows us to understand how language works, how language changes, and how humans, using language, have created the well-defined systems of mathematics—for well-definition is born of stability through certain tricks of which only a speaking animal seems to be capable. Current algebraic grammar is good fun. But preoccupation with it should not blind us to the possibility of discovering some mathematicization of language design that would exploit, instead of ignoring, this basic empirical fact.

From here on, for the sake of brevity, the bare term 'language' will be used only of real human 'natural' languages, spoken or written, never in transferred or metaphorical senses.

Cornell University
Ithaca, New York, U.S.A.
January 1966






1.0. Some Cautions to Linguists
1.1. Ordered and Unordered Sets
1.2. Elements and Sets
1.3. Abstraction, Notation, Abbreviation
1.4. Variables and Domains
1.5. Relations among Sets
1.6. Associations, Functions, Correspondences
1.7. Cardinality
1.8. Systems
1.9. Properties of Binary Relations and Operations
1.10. Isomorphism
1.11. Recursive and Recursively Enumerable Sets
1.12. Model and Exemplification

2.0. Semigroups, Monoids, Harps, and Grammars
2.1. Linear Generative Grammars
2.2. Kinds of Rules
2.3. Source and Transducer; Effective Rule Chains
2.4. Generation and Duality

3.0. The First Inadequacy of Linear Grammars
3.1. Stepmatrices
3.2. Some Empirical Considerations
3.3. The Three Formats for Problem Two
3.4. The Rewrite Format
3.5. Rewrite Rules for Potawatomi Morphophonemics
3.6. The Rewrite Format: Discussion
3.7. The Realizational Format
3.8. Realizational Rules for Potawatomi Morphophonemics
3.9. Stepmatricial Grammars and the Stepmatricial Format
3.10. Comparison and Summary

4.1. Non-Probabilistic Approximation
4.2. Introducing Probability
4.3. Paralinguistic and Idiosyncratic Effects
4.4. Distinctive Features; Sound Change
4.5. Phonons and Distinctive Features

5.0. The Second Inadequacy of Linear Grammars
5.1. The Ordered Pair and Unordered Pair Procedures
5.2. Trees
5.3. Binary Tree Grammars
5.4. Linearizing Input to a Tree Grammar
5.5. The Time Bomb Method
5.6. Summary

6.0. Nonlinear Inputs
6.1. Finite Networks
6.2. Conversion Grammars
6.3. Generalized Rewrite Rules
6.4. An Application of Generalized Rewrite Rules
6.5. The Stratificational Model
6.6. Architectonic Comparison of the Models
6.7. Semons and Semon Networks
6.8. From Semons to Lexons

7.0. Introductory; The Problems of Scope and Relationship
7.1. Groups
7.2. Categories
7.3. Categories of Conversion Grammars
7.4. Grammars from Silence to Silence
7.5. Other Applications

APPENDIX







1.0. Some Cautions to Linguists. Mathematics is derived from everyday language by the introduction of various special conventions that permit a precision of statement and of inference not otherwise attainable. Although mathematics can become extremely complex and difficult, there is no mystery in it, save for such mystery as it may inherit from everyday language or from life itself. Several very elementary points should be remembered by any linguist who is seriously undertaking to learn about mathematics.

The first is that mathematicians do not care whether the symbols and equations they write can be distinguished in pronunciation. Although they communicate orally, they sometimes get into trouble unless paper and pencil, or blackboard and chalk, are handy; lacking these, they have been known to deface the luncheon tablecloth. Two different symbols that have the same name (or the same pronunciation) in spoken English, say 'h' and 'H', may be quite freely used in totally unrelated senses. I do not know why this feature of mathematical practice should be disturbing to linguists, who in general can make visual distinctions as well as anybody, but in fact it sometimes is. Perhaps it is due to the linguist's traditional preoccupation with spoken language, which may lead him to feel that there is something not quite cricket about purely graphic distinctions. Cricket or not, this is the way mathematicians behave, and we have to accommodate.

A second point is that the denotations of many of the mathematician's symbols change quite kaleidoscopically. When a symbol is no longer needed in one sense, it may be used in another. In elegant mathematical exposition—there is bad writing here as



everywhere—due warning is always given. This variability is easiest to take if we remember the similar behavior of certain everyday words, such as this or it. The mathematician needs many terms of the this and it variety, so he makes them up as he goes along, drawing on various alphabets and type fonts for symbols and modifying them as he finds convenient by diacritics, subscripts, and the like.

The mathematical vocabulary of ordinary words is rather more stable, but here we encounter another point worth mentioning. Some technical terms are Latin- or Greek-based neologisms. Many, however, are picked up off the street, and the everyday senses of the words thus introduced into mathematics are at best only mnemonically helpful, at worst quite misleading. Thus, set has nothing to do with bridge, concrete, or hair, field nothing to do with agriculture, nor group with sociology, ring with matrimony, ideal with ethics, imaginary with imagination, lattice with architecture, tree with forestry. If and only if has a precise meaning, which is the same as that of just if or just in case. One might expect almost to be vague; but almost everywhere is in fact defined with absolute precision.

Three words that are not technical terms turn up constantly in the talk of mathematicians: obvious, elegant, and trivial. Something is obvious if anyone—that is, anyone with proper training!—would agree that it is so. A proof or argument is elegant if what it demonstrates was not obvious before the demonstration but is thereafter. Something is trivial (or in some current usage uninteresting) if it is already obvious to the person who calls it so at the time he pronounces the judgment. Obviously, the triviality, obviousness, or elegance of something in mathematics has little to do with its validity or utility. The amateur or novice might as well face the fact that almost any bit of mathematics with which he concerns himself is going to be trivial to most professional mathematicians.
One learns to shrug off such adverse judgments and go about one's business. In this, one has support from Einstein, who said 'If you are out to describe the truth, leave elegance to the tailor.'



1.1. Ordered and Unordered Sets. A year after their marriage, Mr. and Mrs. Jones had a son Paul; a year later, a son John; a year still later, Luke. If we pay attention to their ages, the Jones boys constitute an ordered set (or sequence) (Paul, John, Luke). Paul suffered a lengthy childhood illness, so that he graduated from high school after his brothers. This defines a different ordered set, denoted by '(John, Luke, Paul)'. The two ordered sets are different even though their elements—namely, the Jones boys—are the same.

The notion of order, just illustrated, is for mathematics an empirical primitive, not to be defined in terms of something simpler (though many have tried to do so), but to be accepted as familiar to all of us, perhaps to all organisms, because of the nature of life, time, and the physical universe. But for many purposes order is irrelevant. The notion of an unordered set, which is usually just called a set, is slightly less obvious. We represent the (unordered) set of Jones boys by '{John, Paul, Luke}', or by '{Paul, John, Luke}', or in any of several other ways that can easily be figured out. In either speaking or writing, the names of the elements have to be presented in one order or another, but, in writing, by enclosing the list of names in braces we indicate that the order is non-distinctive and is to be ignored. Thus, {John, Luke, Paul} = {Paul, John, Luke}, because the order is irrelevant; but (John, Luke, Paul) ≠ (Paul, John, Luke) because, as indicated by the curved parentheses, the order of naming is to be considered distinctive.

1.2. Elements and Sets. The notation 'a ∈ A' means that a is an element (which can be anything whatsoever), that A is an (unordered) set (or class, collection, aggregate, ensemble), and that the element belongs to (or is a member of, or is contained in, or is in) the set. (Of course, we could replace 'a' and 'A' by any pair of distinguishable symbols and still mean the same thing.)
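The two notions, and the membership relation just introduced, can be mirrored in a short Python sketch (the names and sets chosen are of course only illustrative): tuples play the role of ordered sets, Python sets the role of unordered ones, and the operator `in` the role of '∈'.

```python
# Ordered sets (sequences) are modeled by tuples: order is distinctive.
by_age = ("Paul", "John", "Luke")
by_graduation = ("John", "Luke", "Paul")
assert by_age != by_graduation        # same elements, different ordered sets

# Unordered sets are modeled by sets: the order of listing is ignored.
assert {"John", "Luke", "Paul"} == {"Paul", "John", "Luke"}

# Membership: Python's 'in' plays the role of 'a ∈ A'.
A = {"Leonard Bloomfield", "Edward Sapir"}   # an illustrative set
assert "Leonard Bloomfield" in A
assert "Isaac Newton" not in A
```

Note that the braces-versus-parentheses convention of the text survives almost unchanged in Python's literal syntax for sets and tuples.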
If a is Leonard Bloomfield and A is American linguists, then it is true



that a ∈ A. The notation 'a ∉ A' asserts that the element a does not belong to the set A. A set is defined if, when presented with any element x whatsoever, we can say whether x ∈ A or x ∉ A. This turns out to be a surprisingly complicated matter, which we shall discuss in an elementary way here and to which we shall have to return in §1.11.

One way to define a set is to name all its members. The notation for this was shown in §1.1. Thus, A = {Beethoven, Sapir, Einstein} is the set whose members are Sapir, Beethoven, and Einstein. B = {1, 3, 5, 7, 9} is the set of all positive odd integers less than 10. C = {27, Beethoven, the typewriter I am now using} is a peculiar set empirically, but a perfectly good one mathematically.

A second way to define a set is to supply a test that must be passed for membership. Let D consist of all positive even integers less than 11; hence D = {2, 4, 6, 8, 10}. Let E consist of all positive even integers. We cannot list all the members of E, but if we believe that we can invariably identify a positive even integer as such when we see one, we believe that E is defined. This reference to faith may seem peculiar, but it is not out of place. Suppose we define n as the integer such that the nth prime 1 number is the smallest one greater than 10^100. Does n belong to the set E or not? It would take a long time to find out, but it is known that the answer could be computed if we had enough time. Suppose, next, that we define F as the set of all saints honored on All-Saints' Day: that is, all human beings who ever have been or ever will be canonized. To determine the membership in class F of people yet unborn, we can only wait, and presumably this will always be so; furthermore, the hagiological record is uncertain for some individuals now long dead.

Definition by test obviously requires care. What seems at first to be an acceptable test sometimes turns out not to be. The old paradox of the barber illustrates this.
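Before turning to the paradox, the two ways of defining a set can be sketched in Python (an illustrative sketch only): definition by listing is a literal, definition by test is a predicate that decides membership for any candidate presented.

```python
# Definition by listing all members.
B = {1, 3, 5, 7, 9}          # positive odd integers less than 10
D = {2, 4, 6, 8, 10}         # positive even integers less than 11

# Definition by test: E, the positive even integers, cannot be listed,
# but membership can be decided for any element presented.
def in_E(x):
    return isinstance(x, int) and x > 0 and x % 2 == 0

assert in_E(10) and not in_E(7)

# Even an element that is expensive to exhibit, such as the smallest
# prime above some bound, can be tested once it has been computed.
def smallest_prime_above(bound):
    n = bound + 1
    while any(n % d == 0 for d in range(2, int(n ** 0.5) + 1)):
        n += 1
    return n

assert not in_E(smallest_prime_above(10 ** 6))   # every prime > 2 is odd
```

The bound 10 ** 6 stands in for the text's 10^100, which would be computable in principle but not in any reasonable time; that is exactly the point of the passage.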
In a certain village there is a barber who shaves all the men of the village except those who shave

1 A prime number is a positive integer (1, 2, 3, and so on) which is not evenly divisible by any integer except itself and 1. Thus, 7 is divisible by 1 and by 7, but not by 2, 3, 4, 5, or 6, and, of course, not by any larger integer.



themselves. This seems to define a set G, consisting of all villagers shaved by the barber. It turns out, however, that the formulation of the test fails to tell us whether the barber himself belongs to set G. If he does, he doesn't; if he doesn't, he does. Since this is nonsense, the test is unusable, and we have no set. Again, consider the following definition: 'the smallest integer that cannot be named in fewer than twenty-two syllables'. This seems to define a certain integer, and therefore, indirectly, a class whose sole element is that integer. But the definition itself names the integer, and the definition consists of fewer than twenty-two syllables. (Anyway, we can invent a one-syllable name for any integer!)

All logical paradoxes or antinomies are of this sort. They are frauds, worked not (usually) by unscrupulous men but by human language, which allows us to say false things as well as true things and to talk nonsense as well as sense. Mathematics will have none of this: whenever a fraud is discovered, it is cast out. However, there are membership tests so complicated that it may be extremely difficult to tell whether or not they are paradoxical—in fact it has been proved that there are cases in which decision is impossible. 2

1.3. Abstraction, Notation, Abbreviation. Typically, in mathematics, we do not care just what the definition of a particular set is, and so are willing merely to assume that it is defined. Or it might be better to say that we frequently do not care exactly what the members of the set are, as long as the set and its members meet certain explicit formal specifications. Thus, we may use a notation like '{a, b, c}', where 'a', 'b', and 'c' are names with unspecified denotations.

There are two easy misinterpretations of this last statement that must be carefully avoided. First: to say that a symbol is a name with an unspecified denotation is not to say that the symbol has no denotation. We just happen not to care what the denotation is. Second: one must not infer that the members of the set {a, b, c} are the symbols 'a', 'b', and 'c'. The members of the set are the (unknown) denotations of the three symbols. To fall into

2 The proof is that of Kurt Gödel, and dates from 1929. For discussion and extension, see Davis 1958.



the second error is to assume that the denotations of the symbols are known, and that those denotations are exactly the lower-case italic letters 'a', 'b', and 'c': that is, that the symbols denote themselves. This is obviously one of the possibilities, but there is no reason to assume that it is the correct one.

There is still a third misinterpretation, rather more subtle than the two just described. This is to think that the symbols stand for 'imaginary mathematical objects' or something of this sort, and that if we happen to be using the mathematical symbolism to deal with the real physical world this is accomplished by pairing off these 'imaginary' objects with features of the real world. The linguist will recognize this as much like the traditional lay assumption that words stand for mental 'ideas' or 'concepts', which in turn are related to the real (physical) world. Now, of course, human brains may be so structured that there actually are things, events, or states inside our heads that can be the denotations of the words and symbols we manipulate in public view. If so, then they are parts of the real physical world, not of some other and more ethereal realm. This is a problem in psychology and physiology, not in mathematics or linguistics. Nothing is gained for mathematics by assuming that such entities exist; indeed, such an assumption is a flagrant violation of the principle of economy (Occam's razor). If such internal states indeed exist, then they become possible denotations of our mathematical symbols; but, just as before, we do not care.

More general than a notation like '{a, b, c}' is one such as '{x1, x2, ..., xm}'. Here, to start with, m must be interpreted as some nonnegative integer: that is, m may be 0, or 1, or 2, and so on. We use 'm' because we do not care just what nonnegative integer is chosen. If we cared, we could specify a constraint, such as 'for m ≥ 4'. Having settled on a value for m, we next let i be some positive integer not greater than m. If we have chosen m = 0, then, of course, there is no possible value for i; otherwise there is. The notation tells us that, for any choice of a value for i allowed by the restrictions just described, xi is a member of the set. Beyond this specification, we may not have the vaguest idea what 'xi' denotes, and may not care.
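The distinction between symbols and their denotations can be made concrete in a small sketch (the particular denotations chosen here are arbitrary, which is exactly the point): the symbols 'a', 'b', 'c' are names, and the set's members are whatever those names happen to denote.

```python
# The symbols are names; the set's members are their denotations.
# The denotations here are an arbitrary illustrative choice.
denotation = {"a": 27, "b": 3.14, "c": "Beethoven"}

members = set(denotation.values())   # the set {a, b, c} itself
symbols = set(denotation.keys())     # the names, not the members

assert 27 in members
assert "a" not in members            # the second misinterpretation
assert "a" in symbols

# One possibility among many: the symbols denote themselves.
# Then, and only then, do members and symbols coincide.
self_denoting = {s: s for s in symbols}
assert set(self_denoting.values()) == symbols
```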



If we were to choose m = 4, then we could expand the condensed notation given above into '{x1, x2, x3, x4}'. If we let m = 3000, then it would take a lot of time and paper to write out the complete notation for the set of 3000 elements. Nothing would be gained, since a completely unambiguous abbreviation is available: '{x1, x2, ..., x3000}'. The nonterminal three dots in this notation mark the fact that it is an abbreviation. They are an etcetera symbol: they constitute an instruction to the reader that he can go on inventing names by the pattern that has been set, until he reaches the one overtly given after the three dots, and a guarantee to the reader that every name he invents in this way will be a name for an element of the set. With this instruction and guarantee, there is, of course, no reason why the reader should take the time and trouble actually to invent all the names, since the set and its elements can be talked about just as securely without doing so.

We have here an example of the fundamental principle of mathematicizing, which can be expressed as follows: If you know exactly how to, you don't have to. It is this principle that differentiates between mathematics and mere computation. In computing, the goal is the answer. In mathematics, the goal is to demonstrate that the answer can be obtained—or that it cannot.

In the somewhat more abstract notation '{x1, x2, ..., xm}' with which we started, the meaning of the three dots is slightly different, since the reader cannot know when he must stop inventing names for elements until he knows what value is assigned to m. Indeed, if it turns out that m = 2, then no new names may be invented and the symbols 'x2' and 'xm' in the notation refer to the same element; if m = 1, one of the specific names given in the notation is disallowed; and, as already indicated, if m = 0, then no names are acceptable because there are no elements in the set.
We shall use a somewhat more compact notation for listing the elements of a set, though it is not standard. I assert that a notation of the type displayed on the left is to mean the same thing as one of the type displayed on the right:

{xi}m        {x1, x2, ..., xm}

Thus, the members of the set {Zj}4 are Z1, Z2, Z3, and Z4 (whatever they may be). Exactly the same convention will be used for an ordered set (a sequence):

(xi)m        (x1, x2, ..., xm)
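The nonterminal etcetera symbol is an instruction for inventing names by a fixed pattern, and that instruction is mechanical enough to hand to a one-line function (a sketch; the function name is my own):

```python
def expand(letter, m):
    """Expand the condensed notation {letter_1, ..., letter_m}
    into the full list of names, following the pattern the
    etcetera symbol sets."""
    return [f"{letter}{i}" for i in range(1, m + 1)]

assert expand("x", 4) == ["x1", "x2", "x3", "x4"]
assert expand("Z", 4) == ["Z1", "Z2", "Z3", "Z4"]
assert expand("x", 0) == []    # m = 0: there are no elements to name
```

That the expansion need never actually be carried out, for m = 3000 or any other value, is the principle of the text: if you know exactly how to, you don't have to.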

Now, what sort of set would be denoted by '{x1, x2, ...}' (or by its more condensed equivalent)? The etcetera symbol, which is here terminal in the notation, tells us to go on inventing names by the pattern established by those given, but there is nothing to tell us when to stop. That is just the point. We may go on as long as we like, and every name we invent will be legal. Whenever we choose to stop, there will remain elements in the set for which we have not yet provided names. Yet every element of the set does have a name, of the form 'xi' where i is some positive integer. For a set of this sort, the etcetera symbol is not just a convenience, but a necessity. However awkward or time-consuming, it would be theoretically possible to expand the notation '{x1, x2, ..., xm}' into full form for m = 3000 or even for m = 3,000,000,000. But there is in principle no unabbreviated notation equivalent to '{x1, x2, ...}'. We shall see (§1.7) that sets of this sort are called infinite.

Whether infinite sets actually exist in the universe is an unsolved problem of physics. 3 But in mathematics we can speak quite consistently about such sets whether they exist or not, through the careful manipulation of terminal etcetera symbols; indeed, the mathematical discussion of infinity consists exactly of such manipulation. To remember this is to avoid the confusion and mysticism about the 'notion' of infinity into which many mathematical laymen (and some philosophers of mathematics) have fallen. 4

3 In the following way: (1) If the plenum is continuous, then any region, no matter how small, contains a nondenumerable infinity of point-events. (2) If the plenum is quantized, then: (a) if the perfect cosmological hypothesis is correct, then the largest set in the universe is denumerably infinite; (b) if only the cosmological hypothesis is correct, then there may be no infinite sets. For the latter half of this assertion, see Sciama 1961.
4 A readable and fairly good survey of speculations about infinity (and other mathematical notions), notable especially for the author's ability clearly to express older points of view with which he obviously does not agree, is Barker 1964.



1.4. Variables and Domains. Suppose that we neither know nor care anything about a particular element except that it is a member of a particular set. The symbol used for the element in this case is a variable, and the set is its domain. Let x be such that x ∈ {Sapir, Beethoven, Einstein}. Then x was born in Germany, and x was a genius. We say things about every member of a set (whether what we say is true or not) by using a name that refers ambiguously to any member of the set. The symbol 'p' in a phonemic transcription of English is a variable whose domain is a certain set of allophones. When we describe English /p/ as bilabial, voiceless, and a stop, we are ascribing these properties to all allophones of the set. Let E be the class of all even integers, and O the class of all odd integers. Then for any x ∈ E and y ∈ E, xy ∈ E; for any x ∈ E and y ∈ O, xy ∈ E; for any x ∈ O and y ∈ O, xy ∈ O. If we make such assertions in ordinary words instead of special symbols, we are still using variables and referring to domains. If we say 'The product of any two even integers, or of any even integer and any odd integer, is even, while the product of any two odd integers is odd', the phrase 'any even integer' is a variable whose domain is the class of all even integers, and so on. The linguist notes an obvious kinship here: between 'variable' and any; between 'set' and all or every. There are examples of variables and domains in §1.3. In '{x1, x2, ..., xm}', and in the equivalent notation '{xi}m', the subscript 'm' is a variable: we said as much, and defined its domain, when we said that it is to be interpreted as some nonnegative integer. Similarly, 'i' is a variable, but we can know its exact domain only by choosing a value for m first—though we can certainly describe its domain as the set {1, 2, ..., m}. Finally, 'xi' is a variable whose domain can be described as {x1, x2, ..., xm}.5

5 A common source of confusion for many learners of mathematics is a use of the terms 'variable' and 'constant' as though they were 'opposites'. In fact, most of the time a constant is simply a kind of variable: namely, one held fixed (perhaps within specified bounds, though otherwise its exact value may be unknown and unimportant) during a certain span of the mathematical discussion. Thus, y = mx + b is the equation of a line that will have slope m and will intersect the y-axis at b. In this, x and y are variables, while m and b are 'arbitrary constants'—that is, variables held fixed for any one line. But the view can change: if we unlock m and let it vary, we get different lines intersecting the y-axis at b; if, instead, we stop holding b fixed, we get all those lines with slope m; if we let both b and m vary, we can get all the lines in the plane except those perpendicular to the x-axis. True constants (usually numbers, like 2 or 3.8971) enter mathematics mainly when one is concerned with computation, rather than with typical mathematicizing (see the end of the fourth paragraph of §1.2).

1.5. Relations Among Sets. Consider two sets, A and B. Suppose that, for any x whatsoever, if x ∈ A then also x ∈ B. We then say that A is a subset of, or is included in, B; the notation is 'A ⊆ B' (or 'B ⊇ A'). From the definition, for any set A, A ⊆ A. Let A be American linguists and B be all Americans. Then obviously, A ⊆ B. If A ⊆ B and B ⊆ A, then every element of A is also an element of B, and vice versa, so that A = B. 'A' and 'B' in this case are two names for the same set. Such synonymy is not introduced for stylistic variety. The definitions of 'A' and 'B' might be quite diverse, and the proof that A = B might be very difficult. The null set Λ is the set that contains no elements at all. Let A be all twentieth-century linguists whose works were read by Rasmus Rask; then A = Λ. If, in {x1, x2, ..., xm}, we let m = 0, then we have the null set. Note that, from the definitions, for any set A, Λ ⊆ A. (There are no elements in Λ, so that it is true for any x that if x ∈ Λ, then also x ∈ A.) If A is not null, if A ⊆ B, and if it is not the case that B ⊆ A (that is, if there are elements in B that are not elements of A), then A is a proper subset of B; the notation is 'A ⊂ B'.6 A man's sons are a subset of his children, as are his daughters, whether he has any children or not; but those two sets are both proper subsets of his children only if he actually has at least one son and one daughter. If A and B are any sets, then A ∩ B, the intersection or meet of A and B, is the set of all elements x such that both x ∈ A and x ∈ B. A ∪ B, the union or join of A and B, is the set of all elements x such that either x ∈ A or x ∈ B but not necessarily both. Let A be all Americans, and B be all linguists. The intersection is all American linguists; the union includes the intersection and contains also all un-American linguists and all American non-linguists.

6 Note that, contrary to one common usage, we do not regard the null set as a proper subset of any set.
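The relations and operations just defined can be tried out mechanically. The following sketch in Python (a modern notation, with illustrative stand-in element names) uses the built-in set type, whose operators happen to mirror ⊆, ⊂, ∩, and ∪:

```python
# Illustrative sets; the element names are arbitrary stand-ins.
A = {"Sapir", "Bloomfield"}             # some American linguists
B = {"Sapir", "Bloomfield", "Peirce"}   # some Americans
NULL = set()                            # the null set, Λ

# A ⊆ B: every element of A is also an element of B.
assert A <= B
# Λ ⊆ A for any set A.
assert NULL <= A
# A is a proper subset of B: A ⊆ B, but B is not ⊆ A, and A is not null.
assert A < B
# Intersection (A ∩ B) and union (A ∪ B).
assert (A & B) == A
assert (A | B) == B
```

Each assertion passing silently corresponds to one of the claims of §1.5 holding for these particular sets.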



If A ∩ B = Λ, then A and B are disjunct. If {A1, A2, ..., An} is some finite collection of sets, and if Ai ∩ Aj = Λ for every i and j from 1 to n except just when i = j, then the set of sets is pairwise disjunct. Number the American universities that offer graduate work in linguistics, and let Ai be the set of students formally enrolled for such work at university i on a given date. Then the set of sets {Ai}n is (presumably) pairwise disjunct. A − B is the set of all elements x such that x ∈ A but x ∉ B. Let A be all Americans and B all linguists; then A − B is all American non-linguists. If A ⊆ B, then A − B = Λ: for instance, let A be all even integers and B be all integers. If A ∩ B = Λ, then A − B = A: for example, let A be all undergraduate students at Yale, and let B be all women.

1.6. Associations, Functions, Correspondences. Suppose we have two sets, A and B. An association from A to B assigns, to each element of A, at least one element of B. Here are two associations, with the arrows listed:

A = {a1, a2, a3, a4}, B = {b1, b2, b3, b4};
arrows: a1→b1, a1→b2, a2→b2, a3→b2, a3→b3, a4→b3.   D1.

A = {a1, a2, a3, a4}, B = {b1, b2, b3, b4, b5};
arrows: a1→b1, a1→b2, a2→b2, a3→b3, a3→b4, a4→b5.   D2.
In example D1, for instance, imagine that the a's are the young unmarried men of a small village and the b's the marriageable girls; an arrow from an a to a b means that a is courting b. We see that Miss b4 is not very popular. In each example, A is the domain of the association and B is the range. In D1, we see that arrows lead to b1, b2, and b3, but not to b4. The set {b1, b2, b3} is the image of A under the association (also called the image of the association itself). If we denote the association by m, then we express the fact that m is an association from A to B by 'm: A → B'; and we write m(a1) = {b1, b2}, m(a4) = b3, m(A) = {b1, b2, b3}. The set of elements of A associated with a given element b of B is called the counterimage of b, symbolized as m⁻¹(b): thus, for example, m⁻¹(b1) = a1; m⁻¹(b2) = {a1, a2, a3}; m⁻¹(b4) = Λ.

Given an association m: A → B, one has also m⁻¹: B → A, although, as in example D1, this inverse of the association m is not necessarily itself an association. The definition requires that to each element of the domain there be assigned at least one element of the range. In example D1, under the inverse m⁻¹, B is the domain and A the range; but to the element b4 in the domain no element of the range is assigned by m⁻¹. If the inverse m⁻¹ of an association m is also an association, then the association m is said to be surjective (from which, of course, it follows that m⁻¹ is also surjective). Another way to define surjectiveness is to say that an association m: A → B is surjective if its image is identical to its range. Thus (by either of the definitions, which are, of course, equivalent), the association of D1 is not surjective, while that of D2 is. If A and B are both finite sets, then an association from A to B can be represented by a rectangular table with as many rows as there are elements in A and as many columns as there are elements in B. We mark the rows, on the left, by the names of the elements of A, and the columns, at the top, with the names of the elements of B. Then at each intersection of row and column we put some symbol, say a '1', if the association assigns the column element to the row element, and some other symbol, say a '0', if it does not. The association matrix M1 is for example D1, and M2 is for D2:

M1:
      b1  b2  b3  b4
a1     1   1   0   0
a2     0   1   0   0
a3     0   1   1   0
a4     0   0   1   0

M2:
      b1  b2  b3  b4  b5
a1     1   1   0   0   0
a2     0   1   0   0   0
a3     0   0   1   1   0
a4     0   0   0   0   1

The matrix representation makes checking easy: one does not have a surjective association unless there is at least one 1 in each row and at least one 1 in each column. A function is an association f: A → B for which, given any element a ∈ A, its image f(a) is a unique element of B. The associations of D1 and D2 are not functions, since for some elements of the domain the image consists of more than one element of the range. D3 shows a function, with M3 as its matrix: we see that in the diagram one and only one arrow must lead from each element of the domain, and that in the matrix there must be exactly one 1 in each row:

A = {a1, a2, a3, a4}, B = {b1, b2, b3, b4};
arrows: a1→b1, a2→b2, a3→b2, a4→b4.   D3.

M3:
      b1  b2  b3  b4
a1     1   0   0   0
a2     0   1   0   0
a3     0   1   0   0
a4     0   0   0   1

To exemplify D3, think again of the village, but with modified customs which forbid any young man to court more than one young woman at a time (although allowing a young woman to entertain more than one suitor). A function, like any association, may be surjective. The function of D3 and M3 is not; the following one is:

A = {a1, a2, a3, a4, a5}, B = {b1, b2, b3, b4};
arrows: a1→b1, a2→b2, a3→b2, a4→b4, a5→b3.   D4.

M4:
      b1  b2  b3  b4
a1     1   0   0   0
a2     0   1   0   0
a3     0   1   0   0
a4     0   0   0   1
a5     0   0   1   0
Here we see that, in the diagram, at least one arrow must lead to each element of the range and that, in the matrix, there must be at least one 1 in each column (and exactly one 1 in each row). A function f: A → B is injective if f(a) = f(a') implies a = a'; that is, if no two elements of the domain have the same image. Neither D3 nor D4 is injective; the following is:

A = {a1, a2, a3, a4}, B = {b1, b2, b3, b4, b5};
each element of A is assigned a distinct element of B, and one element of B receives no arrow.   D5.

In the matrix M5, each row contains exactly one 1, no column contains more than one 1, and one column contains none.


We may think of a monogamous community in which all the adult males (the a's) are married, but not necessarily all the adult females (the b's). A surjective function (but not just any surjective association) is also called a surjection; and an injective function is also called an injection. A function which is both a surjection and an injection is a bijective function or a bijection. Since D4 is surjective but not injective, it is not a bijection; since D5 is injective but not surjective, it is also not a bijection. Here is a bijection:

A = {a1, a2, a3, a4}, B = {b1, b2, b3, b4};
each element of A is paired with a distinct element of B, and every element of B is so paired.   D6.

In the matrix M6, each row and each column contains exactly one 1.
Think of a monogamous society at a moment when all adults are married. We see that: (1) in the diagram, each element of the range is the target of exactly one arrow and each element of the domain the source of exactly one; (2) in the matrix, each row and each column must contain exactly one 1; and (3) the domain and the range must have exactly the same number of elements. As to the inverses of various sorts of associations, we note the following: (1) The inverse of an association is not necessarily an association; (2) The inverse of a surjective association is a surjective association;



(3) The inverse of a surjection (a surjective function) is a surjective association but not necessarily a function; (4) The inverse of a bijection is a bijection. Since a bijection f: A → B defines, as its inverse, a unique bijection f⁻¹: B → A, it often does not matter whether we think of passing from A to B or from B to A. Whenever it does not matter, the bijection (or its inverse) is called a one-to-one correspondence between the two sets.7

1.7. Cardinality. If a one-to-one correspondence (§1.6) can be established between two sets A and B, then A and B have the same cardinality. For many sets, called finite sets, the cardinality is simply the number of elements. To say that the cardinality of a set A is m, where m is some nonnegative integer, means that a one-to-one correspondence can be established between A and the set {1, 2, ..., m}. This is exactly what we mean when we say that there are m sheep in a flock or that we have m dollars in our bank account. Clearly, the cardinality of the null class is 0. A set whose cardinality is 1 is a unit class. Now suppose that A is a proper subset of B (§1.5), and yet that A and B have the same cardinality. For example, let A be all positive even integers and let B be all positive integers. It is clear that A is a proper subset of B, since A is not empty, B is not empty, and every element of A is also an element of B, while there are elements of B that are not in A—namely, the odd positive integers. The required one-to-one correspondence between A and B associates, with each positive even integer k, the positive integer k/2; or, working from B to A, associates with each positive integer m the positive even integer 2m. We may display the beginning of this correspondence as follows:

7 Except for 'association', the terminology of this section is that which has now become standard: see Chevalley 1956, Hu 1964 ch. 1. Our sole use of 'function' corresponds to the older 'single-valued function'.



A = {2, 4, 6, 8, ...}
     ↕  ↕  ↕  ↕
B = {1, 2, 3, 4, ...}
(We put heads at both ends of the arrows because it does not matter in which direction one goes.) We say, when a pair of sets A and B meet the conditions just described, that both are infinite sets. This is the formal definition of 'infinite'. It obviously agrees, however, with the informal remark in §1.3 about infinity and terminal etcetera symbols: in our display, just above, of the correspondence between A and B we use terminal etcetera symbols for both sets. A set that is not infinite is finite. This is the formal definition of 'finite', but it agrees with the comment on finiteness made two paragraphs above. An infinite set that can be put into one-to-one correspondence with the set of all positive integers is denumerable. Thus, the set of all positive integers is itself denumerably infinite; so, as we have just seen, is the set of all positive even integers. The notation '{x1, x2, ...}' (without any terminal name) or '{xi}' (without any subscript after the closing brace) denotes a denumerably infinite set, since it tells us that if i is any positive integer, 'xi' is a legal name for some element of the set. We can thus establish a one-to-one correspondence between the set and the set of all positive integers by associating xi with i for all i. An infinite set that is not denumerable is nondenumerable.

1.8. Systems. A mathematical (or formal) system involves one or more sets of various kinds of elements (to which special names may be given for mnemonic convenience), together with one or more relations or operations that tie the sets and elements together. Certain assertions will hold for any system of a certain type: some of these assertions are given as postulates, which define the type of system, while others follow from the postulates as theorems; but there is typically some choice as to just which assertions are selected as postulates and which are left to be theorems. Here is a very simple type of mathematical system. A system



S(K, ≤) is defined in terms of one set, K, and one binary relation, ≤. (1) Let x and y each be any element of K. Then it is to be always the case that either x ≤ y or y ≤ x, and both of these are to be the case if and only if x = y (that is, if 'x' and 'y' are names for the same element). (2) Now let x, y, and z be any elements of K. If x ≤ y and y ≤ z, then it must also be the case that x ≤ z. Any system that meets the specifications just given is a simply (or linearly) ordered set. There are an endless number of systems of this type. For example, let K be the set of all positive integers, and let ≤ mean 'is less than or equal to'. The Jones boys of §1.1 form a system of this type if we interpret ≤ as 'is not younger than' (or, indeed, as 'is not older than' or as 'graduated from high school not later than' or in various other ways—but, of course, only one way at a time). In a general way, we can symbolize a binary relation by 'xRy': this means that x and y are elements of some set or sets with which we are concerned, and that the relation R holds between them. Vaguely, 'xRy' is like a (declarative) sentence, in which R is like a finite transitive verb or verb phrase. Thus, 'xRy' asserts something about x and y. If the relation does not hold—if the assertion is false—we can write '~(xRy)'. For ordinary numbers, as we have just seen, ≤ is a binary relation: 2 ≤ 3, 2 ≤ 2, but ~(3 ≤ 2). The symbol '∈' of §1.2 stands for a binary relation; in this particular case we adopted the convention of writing 'x ∉ y' instead of '~(x ∈ y)', though it would not really matter. The symbols '⊆', '⊇', '⊂', and '⊃' of §1.5 represent binary relations: if A and B are sets, then either A ⊆ B or ~(A ⊆ B), and so on. An equally general notation for a binary operation is 'x O y = z': this means that if the ordered pair of elements (x, y) is subjected to the operation O, the result is the element z. If a relation can be vaguely compared to a finite verb, then an operation is rather like a preposition, prepositional phrase, or gerund: 'x O y' is not a sentence, but a subject requiring a verb ('=') and a predicate complement ('z') to complete it. In everyday arithmetic, addition and multiplication are operations (basically binary, though by extension n-ary for n > 2 because of a property we shall point out later): if x and y are



numbers, then x + y and x·y are also numbers. The symbols '∩' and '∪' of §1.5 represent binary operations. That is, if A and B are any two sets, then A ∩ B is also a set (possibly Λ), as is A ∪ B. Neither a relation nor an operation need be binary. A relation can be n-ary for any value of n greater than 1; an operation can be n-ary for any value of n greater than 0. But if fewer or more than two elements are involved, notation of the form 'xRy' and 'x O y = z' is obviously impossible. Instead, we use functional notation, which, it will be noticed, has already been slyly slipped in: the general n-ary relation can be symbolized as

R(x1, x2, ..., xn)   (n ≥ 2)

and the general n-ary operation by

O(x1, x2, ..., xn) = y   (n ≥ 1).
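In a modern programming language, functional notation is the native form for relations and operations of any arity; a short Python sketch (the particular R and O below are chosen only for illustration, not taken from the text):

```python
# A ternary relation in functional notation: R(x, y, z) holds just in
# case y lies between x and z on the number line.
def R(x, y, z):
    return x < y < z or z < y < x

# A binary operation in the same notation: O(x, y) = x + y.
def O(x, y):
    return x + y

assert R(1, 2, 3) and R(3, 2, 1) and not R(2, 1, 3)
assert O(3, 2) == 5
```

Note that the relation returns a truth value (it is like a sentence), while the operation returns an element (it needs '=' and a result to complete it), matching the grammatical analogy drawn above.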

For example, let R be the ternary relation of betweenness: then R(window, chair, door) just in case the chair is between the window and the door. Let O be the singulary operation (on numbers) of multiplying by −1: then, for any number x, O(x) = −x, and x + O(x) = 0. Or let O be the ternary operation on numbers defined as follows: O(x, y, z) = x·yᶻ. Then, for example, O(1, 2, 3) = 8; O(2, 1, 3) = 2; O(4, 3, 2) = 36. Operations, relations, and sets are close kin. To show this, let us first note that an n-ary operation can alternatively be regarded as an (n+1)-ary relation. The general symbolization for a binary operation presented above involves three variable names for elements: 'x', 'y', and 'z'. Now suppose we have a particular binary operation O. We can define a ternary relation RO by saying that this relation holds for a particular ordered triple of elements (x, y, z) just in case x O y = z. Suppose the operation is ordinary arithmetical addition. Then, since 2 + 5 = 7, we assert that R+(2, 5, 7); similarly, R+(5, 2, 7), R+(3, 81, 84); but ~R+(2, 5, 8), since 2 + 5 ≠ 8. In a sense, all we have here is a change of notation; but that is just the point. Whether we speak of an 'operation' or of a 'relation' depends on notation and on attitude, rather than on the abstract mathematical nature of what we are dealing with. In general, given



an n-ary operation O(x1, x2, ..., xn) = y, we can define an equivalent (n+1)-ary relation RO(x1, x2, ..., xn, y). Next, we note that an n-ary relation can be viewed as a set whose members are ordered n-ads of elements rather than single elements. A binary relation, in this view, is a set of ordered pairs of elements. Let R be the relation 'less than or equal to' for numbers. This relation holds for the ordered pairs (1,3), (2,3), (2,2), (1,2), (99,3000), and so on, but not, say, for (2,1). Or, since R is a class, we can say the same thing with class-membership notation: (1,3) ∈ R, (2,3) ∈ R, ..., (2,1) ∉ R. We can say that a particular ordered n-ad belongs to or is a member of a particular relation (or operation) just as we say that an element belongs to a set. A function (or, indeed, any association) can always be reinterpreted as an operation and hence, indirectly, as a set. Suppose we consider the function of D4 (or M4) in §1.6. We have f(a1) = b1, f(a2) = b2, f(a3) = b2, f(a4) = b4, and f(a5) = b3. Not only is f a function; it is also, with no change of notation, a singulary operation. Hence, by the procedure described just above, we can reinterpret it as a set of ordered pairs: (a1, b1) ∈ f, (a2, b2) ∈ f, (a3, b2) ∈ f, (a4, b4) ∈ f, and (a5, b3) ∈ f. By an argument that is approximately the reverse of the first part of this, one can show that an n-ary operation can alternatively be viewed as an association on n variables which may, under certain circumstances, be a function of n variables. Thus, clearly, instead of writing '3 + 2 = 5' and the like, we could use functional notation and write '+(3, 2) = 5'. It would now seem that our definition of 'mathematical system', given at the beginning of §1.8, is more complicated than necessary: instead of referring to one or more classes together with one or more relations or operations, we need only refer to one or more classes. But this is not quite true. There is one relation that resists the reduction. The relation in question is that denoted by '∈': the relation that holds between an element and a class to which the element belongs. Surely, we could rewrite 'a ∈ A' in functional notation as '∈(a, A)'. Either notation means the same thing. But now try to take the next step. Note the parallelism:



x ≤ y   same as   ≤(x, y)   same as   (x, y) ∈ ≤
a ∈ A   same as   ∈(a, A)   same as   (a, A) ∈ ∈

In trying to eliminate the relation ∈ we find that we must use that very relation. Hence the elimination is impossible. The most we could say, then, is that any mathematical system is definable in terms of one or more sets of elements, the relation ∈, and (for reasons spelled out in §1.1) the notion of order. This is logically so; in practice, however, it is much more convenient, and more stimulating to the mathematical imagination, to use relations, operations, functions, and so on, reducing them to appropriate classes of ordered n-ads only under special circumstances. I wish now to add a point on which perhaps very few mathematicians would agree; obviously, therefore, it should not be taken too seriously. To me, a mathematical system in which the primary emphasis is on relations feels like a geometry, whereas when the major emphasis is on operations it is, instead, an algebra. Formally, this difference clearly amounts to no difference at all. But there is such a thing as the 'psychology' of mathematics (though I am not sure exactly what it is), and unless the difference between geometry and algebra resides here it ceases to have any reality at all. And mathematicians persistently continue to use both of these words, in ways that seem to fit my impressionistic definition.8

1.9. Properties of Binary Relations and Operations. Relations and operations can be classed in terms of properties of a quite abstract sort. A binary relation R is reflexive if, for any element x, xRx. The relation is symmetric if, for any x and y such that xRy, then also yRx. It is transitive if, for any x, y, and z, xRy and yRz imply xRz. A relation that has all three of the properties just defined is an equivalence relation. Let A be a set over which an equivalence
Let A be a set over which an equivalence 8 A more elegant approach, which I believe is approximately equivalent, is t o say that geometry (as over against algebra) deals with spaces, and t o define a space as a set furnished (at least) with a topology, see Hu 1964 p. 16. This view is the most recent descendant of the brilliant suggestion of Felix Klein, Erlanger Programm (1872).



relation ≡ is defined. Then A consists of pairwise disjunct (§1.5) subclasses {Bi}, such that x and y belong to the same subclass Bi if and only if x ≡ y. The subclasses {Bi} are called equivalence classes. For example, let A be the set of all ordered pairs (m, n) of positive integers, and let (m1, n1) ≡ (m2, n2) just if m1 + n1 = m2 + n2. Then one of the equivalence classes, say B1, contains the ordered pairs (1, 5), (2, 4), (3, 3), (4, 2), and (5, 1), and no others; nor does any of these belong to any other equivalence class. This is so because 1 + 5 = 2 + 4 = 3 + 3 = 4 + 2 = 5 + 1 = 6, and there are no other ordered pairs of positive integers whose sum is 6. Or let P be the set of all people in a certain village, and let xRy just if x and y are children of exactly the same two parents. Each equivalence class is then a set of full siblings. No equivalence class is empty, but it can be a unit class if a particular individual has no full brothers or sisters. A binary relation R is irreflexive if there is no x such that xRx. The relation 'is brother of' is irreflexive. The relation is nonreflexive if it is neither reflexive nor irreflexive. The relation 'is best friend of' is nonreflexive if we think that some people are their own best friends and some are not. A relation R is unsymmetric if, for any pair of elements x and y for which xRy, it is necessarily the case that ~(yRx). 'Is father of' is unsymmetric. A relation R is antisymmetric if xRy and yRx can both be true just when x = y. The relation 'is less than or equal to', for numbers, is antisymmetric. A relation which is not symmetric, not unsymmetric, and not antisymmetric is nonsymmetric. 'Is best friend of' seems to be nonsymmetric as well as nonreflexive. A relation R is intransitive if, for any three elements x, y, and z such that xRy and yRz, it is necessarily the case that ~(xRz). The relation 'is father of' is intransitive. A relation that is neither transitive nor intransitive is nontransitive.
Once again, 'is best friend of' seems to be an example. An irreflexive, unsymmetric, and transitive relation is a proper inequality relation. Such a relation holds among the members of any simply ordered set (§1.8), although it is not the relation used in the definition of that class of mathematical systems. Let S(K, ≤)

... 2, or anything of this kind. We might say that the grammar characterizes the form of the integers (in one representation) but is silent as to their meaning. The same could be the case of a grammar whose harp matches the sentences of some language; if the match were really close, the grammar would be abstractly interesting, yet perhaps of little value for linguistic purposes. (2) Although the harp H is infinite, the grammar that characterizes it is finite. All grammars are finite, whether the harps they characterize be finite or infinite. This point is rather obvious, if we remember that all manipulation of infinite sets (including even their definition) turns on the use of terminal etcetera symbols (§1.3). More specifically, however, we require that a harp, if it is infinite, be at least recursively enumerable (§1.11). This is really equivalent to saying that we do not recognize a set as a harp unless it can be characterized by a finite grammar. Perhaps there is such a thing as a non-recursively-enumerable subset of a free monoid (this depends partly on the definition of 'set'), and possibly such subsets resemble languages more closely than do the recursively enumerable ones. But the formal manipulation of such 'sets' is forbiddingly difficult. We are now ready for the definition of a class of (formal) grammars whose properties, and applicability to languages, will concern us throughout §2.
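Before turning to grammars, the equivalence-class example of §1.9 can be made concrete; a Python sketch (modern notation), partitioning ordered pairs (m, n) by the sum m + n over a small finite sample:

```python
from collections import defaultdict

# Ordered pairs (m, n) of positive integers with m, n <= 5, partitioned
# under the relation: (m1, n1) == (m2, n2) just if m1 + n1 = m2 + n2.
classes = defaultdict(set)
for m in range(1, 6):
    for n in range(1, 6):
        classes[m + n].add((m, n))

# The class of pairs summing to 6 is exactly the B1 of the text.
print(sorted(classes[6]))  # [(1, 5), (2, 4), (3, 3), (4, 2), (5, 1)]

# The classes are pairwise disjunct, as §1.9 asserts.
sums = list(classes)
assert all(classes[i] & classes[j] == set()
           for i in sums for j in sums if i != j)
```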



2.1. Linear Generative Grammars.4 A linear (generative) grammar is a system G(A, I, T, R) characterized by the following definitions and by Postulates P1-4: A is a finite alphabet of characters; I is a unique character of A; and T is a proper subset of A − {I} called the terminal subalphabet. The symbols of A − T are called nonterminal or auxiliary. R is a non-null finite set of rules {Ri}m. Each rule is a function whose domain is the free monoid F(A) over the alphabet A and whose image (§1.6) is some subset of F(A): if s is any string over A and R is any rule, then R(s) is a uniquely determined string over A. Postulate P1. For every rule R, R(0) = 0. A non-null string over the terminal subalphabet T is a terminal string. Postulate P2. If s is a terminal string and R is any rule, then R(s) = 0. Consider an arbitrary finite sequence of rules S = (Ri)n, n ≥ 1, where each Ri is some rule of R; for a given string s over A let s1 = R1(s), ..., sn = Rn(sn−1). Then S, like any R, is a function whose domain is F(A) and whose image is a subset thereof, and sn = S(s). If for such a sequence there exists a non-null string s over A such that S(s) ≠ 0, the sequence is a rule row. We say that s is (acceptable as) an instring for S and that S(s) is the outstring from S corresponding to s. (We also say that the 'application' of the rule row to the instring 'rewrites' it as the outstring.) From the definitions, every rule is itself a rule row provided there exists some string s which it

4 Here and throughout, the term 'linear' has not its customary mathematical connotation (as in 'linear transformation', 'linear algebra'), but refers merely to the simple one-dimensional ordering of the terms of a string. There are some terminological innovations in this section. 'Instring' and 'outstring' are used because I wish to reserve 'input' and 'output' for a sharply different use. I use 'I' instead of 'S' for the arbitrary point-of-departure auxiliary character for generation because there were too many other mnemonically convenient uses of the letter 'S' in various typefaces. For other axiomatizations of classes of grammars more or less similar to the class axiomatized here, see Chomsky and Miller 1963; Ginsburg and Spanier 1963 and references cited; and Hays (forthcoming).



rewrites other than as 0; we may therefore speak of an instring to and the corresponding outstring from a single rule. Postulate P3. Given any rule row S and any string s over A acceptable as instring for S, then S(s) ≠ s. A rule R such that R(I) ≠ 0 is an initial rule; a rule row whose first rule is an initial rule is an initial rule row. If S is an initial rule row, then S(I) clearly depends only on choice of S. A (rule) chain C is an initial rule row whose outstring C(I) is a terminal string. Postulate P4. Every rule of R appears on at least one chain. P3 prevents infinite 'recycling': G will never rewrite a string as itself. P3 does not, be it noted, prevent a rule from occurring more than once in a rule row. From P4 and the definitions, it follows that G involves at least one chain; that every rule of R is a rule row of length one; and that G involves at least one initial rule. Let C(G) be the set of all rule chains of G. We say that a chain generates a terminal string. A linear grammar may be said to generate, or to cover, or to characterize, or to be a grammar of, the set of all terminal strings generated by its chains. Let us call this set of terminal strings 'H(G)'. Since this is some subset of the free monoid F(T) over the terminal subalphabet T, H(G) is a harp. P4 guarantees against 'useless' rules—rules not involved in the generation of terminal strings. It is clear from the definitions and postulates that a system G cannot generate a null H(G). However, the minimal system that meets the postulates is almost as trivial. In this minimal system, A contains just two characters, I and (say) t; T contains only the one character t; and there is a single rule R such that R(I) = t. Thus, H(G) is a one-stringed harp, whose only element is the only string of length 1 over an alphabet of one character. Among the non-trivial cases, those of greatest interest are doubtless to be found among the systems that characterize an infinite H(G). A sufficient condition for this is as follows. Let G have at least one chain C = SiSSt, where: (1) Si is an initial rule row whose outstring is acceptable as instring for S; (2) S is a rule



row such that for any string s acceptable as instring, S(s) is also acceptable; and (3) St will accept as instring any outstring from any initial rule row that terminates with S, and will generate terminal strings as the corresponding outstrings. The existence of one chain so characterized guarantees the existence of a denumerable infinity of chains in G, and hence of a denumerable infinity of terminal strings in H(G). For, from the conditions, not only is SiSSt a chain, but so also are SiSSSt, SiSSSSt, and so on, and each of these generates a different terminal string. Although sufficient, the condition is a little stronger than necessary. It is enough that the 'general' chain of the form SiSS...SSt(n), where n is the number of times the row S appears in the chain, end with a 'terminal' row St(n) which is a defined function of Si, of S, and of n.5 A system G for which H(G) is infinite is open; otherwise it is closed. An example of an open system, even if trivial, may be helpful. Let A = {I, B, b}, and let T = {b}. Let x be any arbitrary string (possibly the null string) over A. Then let R contain three rules:

R1(s) = Bb      if s = I,
      = 0       otherwise;
R2(s) = Bxbb    if s = Bxb,
      = 0       otherwise;
R3(s) = x       if s = Bx,
      = 0       otherwise.
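These three rules, and the harp they generate, are small enough to simulate directly. The following sketch is in Python (the language and the function-naming are mine, not the text's); each rule is written as a partial function that returns None where the text's rules yield the null string 0.

```python
# A sketch of the three rules of the trivial open system.
# Each rule returns its outstring for an acceptable instring,
# and None (standing in for the null string 0) otherwise.

def R1(s):                      # I -> Bb
    return "Bb" if s == "I" else None

def R2(s):                      # Bxb -> Bxbb: appends one terminal b
    return s + "b" if s.startswith("B") and s.endswith("b") else None

def R3(s):                      # Bx -> x: erases the initial B
    return s[1:] if s.startswith("B") else None

def terminal_strings(max_chain_length):
    """Enumerate the terminal strings of chains up to the given length.
    Every chain of this system has the shape R1 R2...R2 R3, so the
    enumeration simply varies the number of R2 applications."""
    results = []
    for n in range(2, max_chain_length + 1):
        s = R1("I")
        for _ in range(n - 2):
            s = R2(s)
        results.append(R3(s))
    return results

print(terminal_strings(4))   # ['b', 'bb', 'bbb']
```

As the text observes, the chains R1R3, R1R2R3, R1R2R2R3, ... generate b, bb, bbb, ..., so H(G) here is the free monoid over {b} minus the null string.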

As we see, the only string acceptable as instring for R1 is I. All sorts of strings would be acceptable for R2 and R3, but the only ones that can ever turn up within the system consist of a single B followed by some finite number of b's. To such a string, R2 adds a

5 In the mathematical symbolism of this monograph there will be little need for the use of superscript symbols to indicate powers (as in '2² = 4' or 'xⁿ = y'). Hence the superscript position, like the subscript, is free for use by modifiers or indices, and a superscript symbol should be so interpreted unless the context makes it utterly obvious that it indicates a power. In the present passage of text, parentheses around the superscript serve as an added warning that it is not a power; but this practice will not be followed hereafter. Superscript numbers indicating footnotes are always so placed as to avoid ambiguity.



terminal b, while R3 erases the initial B. The rule chains of the system are R1R3, R1R2R3, R1R2R2R3, and so on, with any finite number of occurrences of R2. The terminal strings of H(G) are b, bb, bbb, and so on: a string of any finite number of b's. Obviously, H(G) is infinite. This harp is a subset of the free monoid F(T), containing all the strings of that free monoid except just the null string 0. We can use this same trivial example to show how the notation usually used for 'rewrite rules' relates to the functional notation used above. For each of the three rules, we had to include the specification '= 0 otherwise'. Since we are concerned only with non-null results, we can just agree that this specification is always to hold, and hence not repeat it for each rule. We can then recast the statements into the following form: R1. I → Bb; R2. Bxb → Bxbb; R3. Bx → x. This is more compact than the earlier notation, but is by definition absolutely equivalent. In the sequel, the preceding example will be referred to as Simple Sample One.
2.2. Kinds of Rules. We must now ask whether the class of harps characterizable by linear grammars contains any that are usefully similar to languages. The answer is neither a definite yes nor a definite no, partly because it depends on how close a match is judged useful. Suppose, however, that the answer were an unqualified yes: that we knew that a suitable linear grammar would yield a harp so similar to a language, say to English, that we would be tempted simply to call it a grammar of English. That such a grammar was linear would tell us very little. For, as illustrated at the end of §2.1, linear grammars also characterize harps that are very unlike languages. We should therefore want to know, in exact formal terms, the differences between linear grammars that yield languagelike harps and those that do not.



This is the problem to which generative grammarians have been devoting their attention for a number of years. Roughly, their program has been to seek the strongest constraints on a linear grammar (or on a grammar of some other formally defined class) within which it seems possible to generate a languagelike harp.6 Our argument will take a different turn: we shall be more concerned with the shortcomings of linear grammars for linguistic purposes than with their excessive lack of specificity. First, however, we shall show that most of the kinds of rewrite rules that have been proposed by generative grammarians are acceptable within the framework of a linear grammar. (1) A context-free phrase-structure rule is usually stated in the form a → b, where a and b are non-null strings over A and a ≠ b; it may also be required that b not be shorter than nor a permutation of a. A string s is acceptable as instring for the rule just if it can be deconcatenated into xay, where x and y are any strings over A; the corresponding outstring is then xby. This formulation, as it stands, does not guarantee that the outstring corresponding to any given instring will be unique; hence the 'rule' does not fit our postulates. For suppose that a occurs more than once in a string s. Let us say that it occurs twice. Then s can be deconcatenated into x1ax2ax3, where the strings xi involve no occurrences of a. The 'rule' will yield, as outstring, either x1bx2ax3 or x1ax2bx3; the 'rule' is therefore an association but not, as required, a function (§1.6, §2.1).
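The difficulty can be made concrete with a few lines of Python (a sketch; the helper name is mine). A naive implementation of a → b that rewrites 'some one' occurrence of a is an association, not a function: an instring with two occurrences of a has two distinct outstrings.

```python
def all_rewrites(s, a, b):
    """Return every outstring obtainable by rewriting exactly one
    occurrence of a as b: an association, not a function."""
    outs = []
    i = s.find(a)
    while i != -1:
        outs.append(s[:i] + b + s[i + len(a):])
        i = s.find(a, i + 1)
    return outs

# Two occurrences of 'a' yield two distinct outstrings:
print(all_rewrites("xaya", "a", "b"))   # ['xbya', 'xayb']

# Rewriting *all* occurrences at once restores uniqueness:
print("xaya".replace("a", "b"))         # 'xbyb'
```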

Several adjustments are possible. One is to specify that a single

6 This is made clear in many places: for example, in Bach 1964, especially chs. 1, 2; Postal 1964a. This and similar aims of formal grammatical theory should help explain (to those of us who have been largely occupied with producing 'practical' descriptions intelligible to our colleagues) why so many items in the transformationalist literature seem to devote so much time and machinery to such 'trivial' bits of data. The aim is not just to subsume the data in any old fashion, but to subsume it within specified formal constraints that impose stringent requirements of explicitness and simplicity. The search for simplicity, we should recognize, is enormously difficult.



application of the rule a → b is to rewrite all occurrences of a in the instring. Thus, if s = x1ax2a...axn, where the xi are a-free, then R(s) = x1bx2b...bxn. Such a rule is truly context-free, but it is hardly a 'phrase-structure' rule. If we relax the requirements given first above so that we may set b = 0, we have an erasure rule, for which we will later find good use. Any other adjustment of a context-free rule to make it fit the postulates of a linear grammar turns out to be a conversion from context-free to context-sensitive. Suppose we want the rule to rewrite only the first occurrence of a in the instring. Then a string s is acceptable if it can be deconcatenated into xay, where y can be any string over A but x must be a-free. As soon as we specify required properties for either x or y in the deconcatenation of s into xay, we have what is by definition a context-sensitive rule. The one just described could be expressed in the currently most prevalent format as follows:

a → b in the environment x —, where x is a-free.

Again, suppose we want a rule that will rewrite a as b only if the a is initial in the instring. We introduce a supplementary symbol '©', which is not a character of the alphabet A and also not the name of any string over that alphabet, but which appears in statements of environments with the meaning boundary of string, so that '© —' means 'initial in the string' and '— ©' means 'final in the string'. The desired rule is then of the form:

a → b in the environment © —.

The three rules of Simple Sample One (§2.1, end) can be reformulated:

R1. I → Bb;
R2. b → bb in the environment Bx — ©;
R3. B → 0.


Here the first and third rules are formulated as context-free because we know it is safe to do so: no string will ever involve more than one occurrence of I, so that there can be no ambiguity about R1; no string will ever involve more than one occurrence of B, so that there can be no ambiguity about the erasure rule R3. When, in the environmental description for R2, we use 'x' without any expressed restrictions, it is meant that x may be any string at all.
(2) Many rewrite rules, as ordinarily formulated, are compact statements of whole (finite) sets of so-called 'minimal' rules; only the minimal rules are rules in our sense of the term. For example, a 'selection rule'

a → b1, b2, ..., bn in the environment x —

states n minimal rules, the i-th of which is

a → bi in the environment x —.


Other sorts of composite rules achieve compactness by describing many environments at once, using cover symbols of various kinds. The rules of a linear grammar are not inherently ordered: the three rules of Simple Sample One could be listed in any order whatsoever without in any way modifying the yield. However, economy of statement can often be achieved by ordering some subset of the rules of the system: the specification of environments for each rule can, by convention, ignore what has been specified for the preceding rules of the ordered subset, so that for the last of the set the simple assertion 'otherwise' or 'in all other environments' suffices. The conventions for abbreviating composite rules are not always spelled out as clearly as they should be in the literature, but it seems likely that any composite rule, or ordered set of rules, incapable of being expanded so as to fit our postulates also violates the intentions of the grammarian who has formulated it. The postulates for a linear grammar in no way preclude rule-grouping for compactness. But the fact that rule-grouping of some sort is possible is relevant information about a particular system; indeed, such differences may play a part in distinguishing formally between languagelike and unlanguagelike harps. For a fixed G, and a specific way of grouping minimal rules into composite ones, R will



consist entirely of pairwise disjunct subsets Ri, such that two minimal rules R1 and R2 are subsumed by the same composite rule if and only if they are members of the same subset. Two systems would obviously be strikingly different if, in one, the subsets Ri were unit classes while, in the other, some of them contained many rules.
(3) A single-based transformation requires an operand with constituent structure, rather than merely a string, and yields a transform that also has constituent structure. We are accustomed to thinking of a string with constituent structure as represented by a tree, where the labels on the terminal nodes constitute the string and the rest of the tree marks the structure. Thus, the following tree

         I
        / \
       B   D
       |  / \
       c E   G
         |   |
         f   h

represents the string cfh with a particular phrase structure (reflecting a particular 'generative history'). Although a tree of this sort is the most vivid way of exhibiting constituent structure, it is well known that all the information conveyed by a tree can equally well be given by a bracketed string with labelled brackets. The bracketed string that corresponds exactly to the above tree is I(B(c)D(E(f)G(h))). If a character appears in the tree as the label of a terminal node (one of the nodes at the bottom), it appears here enclosed by an innermost pair of brackets: that is, by a pair of brackets that does not enclose any other pair. If a character appears in the tree as the label of a non-terminal node, it appears here immediately before an opening bracket, and constitutes the labelling of that bracket and



of its paired closing bracket. It is not necessary to label the closing brackets separately since, given any legal sequence of opening and closing brackets, there is only one way for them to pair off.7 Now, if there is some way in which a bracketed string of this sort can be interpreted as a string over the alphabet A of a linear grammar G, then we should be able to accommodate single-based transformations within the grammar. This is indeed possible, if two requirements are met. The first requirement is very simple: the brackets '(' and ')' must be included as characters of the alphabet A, though not of the terminal subalphabet T. The earlier rules of a rule chain will introduce these two characters; hence some late rule or rules of any chain must erase them after they have served their purpose. The second requirement is really just as simple, but necessitates a formulation of 'phrase-structure' rules somewhat different from the most customary one. The customary way would set up such rules as these:

R1. I → BD
R2. B → c
R3. D → EG
R4. E → f
R5. G → h.

(We may suppose, if we wish, that the system includes also rules that rewrite B, E, and G in other ways.) One then applies these rules in any possible order, writing the steps down as follows:

7 A sequence of opening and closing brackets (interrupted or not by other characters) is legal if (1) there are the same number of opening brackets and of closing brackets and (2) the nth closing bracket is preceded by at least n opening brackets. The pairing is then as follows. The first closing bracket must pair with the nearest preceding opening bracket. One can then delete this pair, and again have a legal sequence. This operation is repeated until all brackets have been paired. This is the only procedure that guarantees that no two pairs will define intersecting spans.
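The legality test and pairing procedure of the footnote can be stated as a short program (a sketch in Python; the function name is mine). Condition (2) of the footnote is equivalent to requiring that, scanning left to right, the count of closing brackets never exceed the count of opening brackets; the stack below realizes 'pair each closing bracket with the nearest preceding unpaired opening bracket'.

```python
def pair_brackets(s):
    """Return a dict pairing the index of each opening bracket with
    the index of its closing bracket, or None if the sequence is not
    legal.  Non-bracket characters are ignored, as in the footnote."""
    stack, pairs = [], {}
    for i, ch in enumerate(s):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:           # a ')' with too few preceding '('
                return None
            pairs[stack.pop()] = i  # nearest preceding unpaired '('
        # all other characters are ignored
    return pairs if not stack else None

print(pair_brackets("I(B(c)D(E(f)G(h)))"))
```

Because each closing bracket is matched to the nearest preceding unpaired opening bracket, no two pairs can define intersecting spans, just as the footnote argues.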



        I
R1.   B D
R2.   c D
R3.   c E G
R4.   c f G
R5.   c f h

        I
R1.   B D
R3.   B E G
R4.   B f G
R2.   c f G
R5.   c f h

or in any of several other possible orders. From any of these displays, we can construct a tree: (a) connect each character in each row to the character in the row above it from which it derives; (b) along any resulting line from I to a character in the bottom row, erase all but one occurrence of any character that appears more than once. The result, of course, regardless of which display is used, is just the tree given earlier. It is then to this tree, rather than to any single string, that a transformation is applied. Suppose the transformation is this:

T0:   I(B x1 D(E x2 G x3))   ⇒   I(K(G x3 B x1) L(M(j) E x2))
Here j is a 'transformational constant': a terminal character introduced by the transformation itself. If we apply T0 to the tree already displayed, the terminal string cfh is converted to hcjf, with



the phrase structure specified to the right of the arrow in the statement of the transformation. Instead of the customary procedure just described, suppose we formulate the phrase-structure rules as follows (environmental specifications might be necessary, but are omitted here for simplicity as they were in the customary formulation):

R'1. I → I(BD)
R'2. B → B(c)
R'3. D → D(EG)
R'4. E → E(f)
R'5. G → G(h).

The transformation becomes:

T'0:   I(Bx1D(Ex2Gx3))   ⇒   I(K(Gx3Bx1)L(M(j)Ex2))
A string s is acceptable to T'0 if it can be deconcatenated into I(Bx1D(Ex2Gx3)), where each xi has the form (yi), in which the brackets are paired and yi is any nonnull string over A. T'0(s) is then I(K(Gx3Bx1)L(M(j)Ex2)). Finally, we need an erasure rule RE: a string s is acceptable to this rule just if it contains no nonterminal character, other than a bracket, that is not immediately followed by an opening bracket; the rule erases, in a single application, all occurrences of all nonterminal characters in the string. With the rules so formulated, all we need have, to tell whether a given rule will accept a particular string, is that string itself. We do not need a separate overt record of the 'generative history' of the string, in the form of a tree, since everything relevant of that generative history is included in the string. For example, consider the following initial rule row and its yield:

        I
R'1.  I(BD)
R'2.  I(B(c)D)
R'3.  I(B(c)D(EG))
R'4.  I(B(c)D(E(f)G))
R'5.  I(B(c)D(E(f)G(h)))




To the last string of those shown, we may apply either (a) RE, to obtain the terminal string cfh, or else (b) first T'0, yielding I(K(G(h)B(c))L(M(j)E(f))), and then RE, to obtain the terminal string hcjf. For a reason to be discussed later, a further adjustment is desirable. The rules R'1 through R'5 can be applied in a number of different orders to generate exactly the same string I(B(c)D(E(f)G(h))). Rule R'1 has to come first; but after that there are the following possible orderings, all equivalent: 2 3 4 5; 2 3 5 4; 3 2 4 5; 3 2 5 4; 3 4 2 5; 3 4 5 2; 3 5 2 4; 3 5 4 2. We wish to eliminate this irrelevant freedom of ordering, so that if a particular initial rule row generates a certain string, no permutation of the rules of that rule row will generate the same string. This aim can be achieved by the imposition of appropriate context-sensitivity, probably in a number of different ways. For example, let us define a bracketless label as a nonterminal character (other than a bracket) occurring not immediately followed by an opening bracket; then let us say that a phrase-structure rule (any of the set R'1 through R'5 in our example) will accept a string only if the operand of the rule, in that string, is not preceded by any bracketless labels. Thus the string I(BD) is acceptable to R'2, since B is not preceded by a bracketless label, but not to R'3, since D is preceded by the bracketless label B. With this constraint, the only valid ordering of the five rules is 1 2 3 4 5, as displayed above: the 'expansion', so to speak, of nonterminal characters proceeds from left to right. The system, with seven rules that we know of (it may have others also), generates two terminal strings, with just one rule chain for each terminal string:

rule chain                      terminal string
R'1R'2R'3R'4R'5RE               cfh
R'1R'2R'3R'4R'5T'0RE            hcjf
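The two rule chains of this system are small enough to run mechanically. Below is a sketch in Python (the language and all names are mine; the left-to-right constraint is honored simply by applying the rules in the one valid order, and T0 is written only for the one operand shape that arises here rather than by general deconcatenation).

```python
def replace_first(s, old, new):
    return s.replace(old, new, 1)

def derive():
    """Apply R'1..R'5 in the one valid left-to-right order."""
    s = "I"
    for old, new in [("I", "I(BD)"), ("B", "B(c)"), ("D", "D(EG)"),
                     ("E", "E(f)"), ("G", "G(h)")]:
        s = replace_first(s, old, new)
    return s

def T0(s):
    # The transformation T'0, hard-coded for this single operand;
    # general deconcatenation into I(Bx1D(Ex2Gx3)) is elided.
    assert s == "I(B(c)D(E(f)G(h)))"
    return "I(K(G(h)B(c))L(M(j)E(f)))"

def RE(s):
    # Erasure rule, simplified: keep only the terminal characters.
    return "".join(ch for ch in s if ch in "cfhj")

print(RE(derive()))        # cfh
print(RE(T0(derive())))    # hcjf
```

The two print statements correspond to the two rule chains of the system, one ending in RE alone and one passing through T'0 first.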

Hereafter we shall refer to the above example as Simple Sample Two. (4) We come now to two sorts of rules proposed by generative grammarians that cannot be adapted to fit the postulates of linear grammars.



A context-sensitive rule of the sort proposed by Halle:8

a → b in the environment x — y
                             z

neither accepts nor yields strings, nor is the environment proposed in the rule specifiable as a string or ordered set of strings. A double-based (embedding or conjoining) transformation requires as operand not a string but an ordered pair of strings. Since rules of these two sorts seem to be linguistically useful, the inability of a linear grammar to accommodate them constitutes a defect. The implications of this will concern us in later sections.
2.3. Source and Transducer; Effective Rule Chains. In the face of the doubts just expressed, and more general ones stated earlier, for the balance of §2 we shall make a wildly improbable assumption: not only that there exists a linear grammar G whose harp H(G) is indistinguishable from some language (for convenience, English), but that we have the grammar at hand in full detail. With such a grammar, we could program a computing machine to behave in the following way: for each successive value of k, beginning with k = 1, it is to form all rule chains of length k, and is to print out each rule chain together with the terminal string generated thereby. The rule chain, of course, is in effect what Chomsky calls a structural description of the terminal string.9 For sufficiently large values of k, the computer will run out of capacity, but we can imagine adding more capacity as needed, expense being no consideration in a Gedankenexperiment. Since G is (to all intents and purposes) a grammar for English, every terminal string printed out by the machine will be an English sentence. Furthermore, given any English sentence, no matter how long or complicated, we shall have to wait only a finite length of time until the

8 Halle 1962.
9 Of course, this is not what Chomsky says a structural description is. But we are here engaged in the very process of showing how 'structural description' and 'input to a generative grammar' can be identified.
So far as I know, the proposal is new, or at least independent; but see Postal 1964b fn 5 for possible parallelism of thinking.



computer prints out that very sentence. Programmed in accordance with G, the computer will produce only, but any of, the sentences of English. It is easy to see that the computer operation just described accords with our requirement that a grammar characterize a recursively enumerable harp; indeed, what the computer is doing is exactly to enumerate the harp recursively. If the harp is not also recursive, then we might be surprised by some of the terminal strings that turn up, which we would not have known were English until they were produced by the grammar that we have accepted, by definition, to be correct. At the moment, this possibility need not concern us. Suppose, now, that we do not want merely to wait around until the computer grinds out a particular sentence. Suppose we want the computer to produce a particular sentence now. To make this realistic, pretend that we plan to use our computer as the second of a pair whose joint task will be to translate from Russian to English. A Russian sentence is fed into the first computer. It is there digested and analyzed (I have no idea how), and a series of impulses is transmitted to the second computer. These impulses are input to the second computer: they are supposed to govern its functioning in such a way that it will produce, not just any English sentence, but a sentence that corresponds to the Russian original. It is clear that this requires a very different sort of program from the one described above. That was a program without input: it made the computer function as a source. Now we want a program that will convert input into output in accordance with G; we want to make the computer function as a transducer. What is there about a generative grammar that could be regarded as input for a computer programmed to function as a transducer? Obviously not the vacuous initial character I, with which the generation of every terminal string begins. Some set of nonterminal strings? No; there is only one possible answer.
If we select a particular rule chain, the grammar, and hence the computer, will generate exactly the terminal string determined by that rule chain. We have said that a linear grammar G characterizes, or is a grammar of, its harp H(G). What has not been noticed is that a



linear generative grammar characterizes not just one harp, but two. One is H(G), a subset of the free monoid F(T) over the terminal subalphabet T; this is the one that everyone has always talked about, the one that, in linguistic application, we seek to have match the sentences of a language in a usefully close way. But consider also the set R of rules. There is nothing to keep us from interpreting this as a finite alphabet of characters. The free monoid over this alphabet is F(R). The set C(G) of all rule chains of G is a subset of F(R), so that C(G) is also a harp. Thus, the two harps characterized by G are C(G) and H(G). In addition to characterizing these two harps, G also specifies a surjective function g: C(G) → H(G), since, by definition, any rule chain of G generates exactly one terminal string. In §2.1 we designated the terminal string generated by a rule chain C by C(I). This is perfectly valid, but since it is C rather than I that can vary to yield different terminal strings, it seems a bit peculiar. We remove the peculiarity by defining a function g: for any rule chain C, g(C) = C(I). The function g is not necessarily injective, since there is nothing in the postulates to preclude the case in which g(C1) = g(C2) even if C1 ≠ C2. Thus, G has a potential built-in asymmetry. Empirically, for languages, this asymmetry might allow great inefficiency. Surely we wish so to tailor G as to eliminate superfluous instances of multiple generation of a single terminal string; this is why, towards the end of §2.2 (3), we proposed a certain sort of context-sensitivity for the phrase-structure rules of Simple Sample Two. But some instances are not superfluous. If 'The sons raise meat' and 'The sun's rays meet' are a single terminal string in spoken English because they are phonologically identical, we still want at least two rule chains for the one terminal string. The same is true of 'Flying planes can be dangerous' or of 'How do you find the brakes in this car?'
Let us return to our computing machine. We can imagine, for simplicity's sake, a long row of switches on a control console, each of which except the first has as many settings as there are rules in G, plus one 'out-of-line' setting. (Putting perhaps hundreds of thousands of settings on a single switch is a purely technical difficulty.)



The first switch has, in addition to the out-of-line position, only as many settings as there are initial rules in G. By turning the first k of these switches to the positions that correspond to the rules of a rule chain of length k, and the rest to the out-of-line position, we control the internal circuitry in such a way that the machine will write out the terminal string determined by that rule chain. Everything else about G can be viewed as built into the permanent wiring. The input to our transducer from a Russian analyzer is then simply a series of switch-settings.10 Of course, the machine can have only a finite capacity: the row of switches cannot be infinitely long. But this limitation is not imposed by G: the grammar allows us to make the finite capacity of the machine as great as we wish. Despite the enormous triviality of Simple Sample Two, it may be useful to concretize what has just been said in terms thereof. The first switch would need only one on-line setting, for R'1, the only initial rule in the system. The remaining switches would need seven on-line settings. The only two rule chains we know of for this system are the two listed at the end of §2.2 (3), one of length six, the other of length seven. We can turn over even more of the total task to the internal wiring. We could so arrange matters that any legal combination of settings for the first i switches would activate interlocks, preventing any but a legal setting for the (i+1)st switch. This is possible because, although in selecting input we can choose among rule chains, the choice of individual rules for a chain is not independent. Indeed, G specifies exactly what choices are compatible. The appearance of a particular rule at a particular position may be obligatory, in the sense that if certain other rules and their positions in the chain have been decided, there is no longer any option at the position in question.
Whether a particular rule at a particular position in a particular 10 Although I disavow any concern here with the problem of mechanical (or other) translation, one should note an agreement between the machine example given in the text and Eugene Pendergraft's proposal (made, as far as I know, only orally) that one should attempt to translate not a Russian sentence into an English sentence, but a description of the former into a description of the latter.



chain is obligatory or not does not depend, however, entirely on the structure of the grammar G. If we think of choosing rules for a chain (setting switches on the computer) from left to right, beginning with an initial rule for the first switch, then a particular rule R will be obligatory at position i just if we find that the ith switch cannot be turned to any other setting. But suppose, for a rule chain of length n, that we were to begin at the nth switch and work backwards, again with appropriate interlocks, and intending to set up exactly the same rule chain. We might find that the ith switch (from the left-hand end) could be turned to any of various settings; further, by setting it for rule R we might find that some earlier switch was locked in a particular setting. Or, indeed, we might begin with the ith switch itself, activating interlocks in both directions; in this case, clearly, we should not find rule R at position i obligatory. To get around this indeterminacy, I shall arbitrarily choose to define a rule R as locally obligatory (at a particular position in a particular rule chain C) if it is obligatory under the convention of left-to-right selection. Now consider any rule chain C. Construct a sequence of rules C' by deleting from C all locally obligatory rules. Clearly, the set of all sequences of rules constructed in this way stands in one-to-one correspondence with the set of all rule chains. A sequence of rules C' is not a rule chain, under our definition, except just when it corresponds to a C that contains no locally obligatory rules, so that C' = C. However, we shall call these sequences C' effective rule chains. By properly adjusting the internal circuitry, we can adapt our computer so that it will accept effective rule chains, rather than rule chains, as inputs.
Without the adjustment, if the first locally obligatory rule of a chain is the ith (and the next one is not also locally obligatory), we find when we reach the ith switch that it has already been set for us. With the adjustment, setting the first i-1 switches provides internally for the workings not just of the first i-1 rules but also of the ith; the ith switch is therefore free to be used for the (i+1)st rule of the rule chain, which is only the ith rule of the effective rule chain. The difference, of course, can be
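The passage from a rule chain to its effective rule chain can be sketched in a few lines of Python (the representation is mine). We list, for each position, the set of rules the left-to-right interlocks would permit there; a rule is locally obligatory exactly when it is the only option at its position, and the effective chain keeps just the genuine choices. The alternative rules R'2a, R'4a, R'5a below are hypothetical, standing in for the assumption that B, E, and G each have more than one expansion.

```python
def effective_chain(chain, options):
    """chain: the rules of a rule chain, in order.
    options[i]: the set of rules the left-to-right interlocks would
    allow at position i.  Delete the locally obligatory rules, i.e.
    those that are the sole option at their position."""
    return [r for r, opts in zip(chain, options) if len(opts) > 1]

# A chain of Simple Sample Two, with hypothetical alternatives:
chain = ["R'1", "R'2", "R'3", "R'4", "R'5", "RE"]
options = [{"R'1"},              # the only initial rule: obligatory
           {"R'2", "R'2a"},      # B has alternative expansions
           {"R'3"},              # D has none: obligatory
           {"R'4", "R'4a"},      # E has alternatives
           {"R'5", "R'5a"},      # G has alternatives
           {"RE", "T'0"}]        # either erasure or T'0 may come next
print(effective_chain(chain, options))
```

Under these assumptions the effective chain retains R'2, R'4, R'5, and the final choice, while R'1 and R'3 drop out as locally obligatory.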



described in similar terms if the rule chain involves any row of rules all of which are locally obligatory. Thus, the input harp for our transducer need not be the set C(G) of all rule chains of G. Instead, it can be the set E(G) of all effective rule chains of G. This set, like C(G), is a subset of the free monoid F(R) over the set of rules R; hence we are correct in calling it a harp. In what follows, we shall assume this adjustment. In order to illustrate with Simple Sample Two, it is best to assume that R'2 is only one of two or more rules that expand B, that R'4 is only one of two or more for E, and that R'5 is only one of two or more for G. Then we have:

rule chain                   effective rule chain      terminal string
R'1R'2R'3R'4R'5RE            R'2R'4R'5RE               cfh
R'1R'2R'3R'4R'5T'0RE         R'2R'4R'5T'0              hcjf

2.4. Generation and Duality. Our hypothesis (the improbable one set forth at the beginning of §2.3) guarantees that, for the grammar G currently under consideration, H(G) is indistinguishable from the set of all English sentences. But what is E(G)? What is there about E(G), as over against H(G), that might lead us (for example) to want to try to translate from Russian to English indirectly via E(G) rather than directly? Formally, all we can say about E(G) is that it is the input harp for the grammar G. Empirically, however, we can say what we should like E(G) to be. There are two points. (1) English sentences contain all manner of trivial irregularities that must be provided for in any grammar but that seem very superficial. It certainly seems accidental rather than essential that, although the plural of 'boy' is 'boys', that of 'man' is 'men'. It would be pleasing if our grammar G of English could take care of as many as possible of these trivial irregularities by locally obligatory rules, which are thus not in the effective rule chains that constitute input. (2) It would be pleasing if G were to provide two or more effective rule chains for a single terminal string just in those cases in which the terminal string is ambiguous.



If G can have the properties just described, then E(G) will be largely free of the superficial irregularities and the ambiguities of H(G). By virtue of this, E(G) ought to differ from H(G) in that there are fewer limitations on the co-occurrence of rules in effective rule chains than there are of terminal characters in terminal strings; hence, also, the structure of E(G) should correlate more directly with meaning than does that of H(G). Inspect the following two pairs of strings of characters:

pair one
ABCDZEFGZHIJJ
HIJJZGFEZEKKDZHLZABCD

pair two
ABCDZEFGZHIJJ
ABCDZEFGZMFNL
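The substitution coding behind such displays can be recovered mechanically from the strings of the first pair and their plaintexts (a sketch in Python; the function names are mine). The cipher treats upper and lower case alike and codes the space as Z, as 'by' → HL and the recurrence of Z show.

```python
def learn_cipher(pairs):
    """Build a letter-for-letter substitution table from known
    plaintext/ciphertext pairs (plaintext case-folded)."""
    table = {}
    for plain, coded in pairs:
        for p, c in zip(plain.lower(), coded):
            table[p] = c
    return table

known = [("John saw Bill", "ABCDZEFGZHIJJ"),
         ("Bill was seen by John", "HIJJZGFEZEKKDZHLZABCD")]
cipher = learn_cipher(known)

def encode(plain):
    return "".join(cipher[p] for p in plain.lower())

print(encode("John saw Bill"))   # ABCDZEFGZHIJJ
```

The point of the coding, as the text says, is only to let us inspect the sentences without engaging our practical control of English.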


Everyone would agree, I believe, that the strings of the first pair are more different from each other than are those of the second pair. Now, the strings of this display are equivalent, under a simple substitution code, to the everyday English sentences 'John saw Bill' (the first of each pair), 'Bill was seen by John' (the second of the first pair), and 'John saw Mary'. The coding is merely a device by which we can inspect the sentences without regard to our practical control of English, which is hard to set aside when we see or hear English in uncoded form. In terms of the underlying effective rule chains of a grammar G of English, the situation is different. We should like the first sentence of the first pair to differ from the second of that pair only by the exclusion versus the inclusion of a single rule, and the two sentences of the second pair to differ only by the inclusion of one rule rather than another. This surely agrees both with our feeling for English pattern and our awareness of meanings. To put this in another way, if G is a good grammar, then E(G) highlights and pinpoints the choices open to a speaker of the language. Now the transmission of information in any communicative system, and hence the differentiation of meanings, is totally dependent on choices made by the transmitter and not known in advance to the receiver. A speaker of English has virtually no option about pluralizing 'man': that he pluralizes it as 'men' rather



than in some other way thus conveys virtually no information. But he can choose between 'Bill' and 'Mary'; he can choose between active and passive; he can choose between present and past tense. Any such choice, relative to a system G, is the choice of a rule. This is what is meant by saying that E(G) correlates more directly with meaning than does H(G). This is why we prefer E(G), the set of all effective rule chains, to C(G), the set of all rule chains: E(G) maximizes the independence of choice of rules for an input. Thus, if G has the properties we want it to have, it is the choice of rules that differentiates meanings.

We are thus led to discover, between generative grammar and certain older views, a much closer kinship than has been discerned heretofore. Let us not modify our definition of linear grammars in any way, but, for a moment, let us replace two technical terms. Instead of 'terminal character', let us say 'phoneme', and instead of 'rule' let us say 'morpheme'. We then have, in effect, the two-stratum model for a natural spoken language, incorporating the design feature of duality of patterning, as proposed variously by Saussure, Hjelmslev, and many others, including me.¹¹ H(G) becomes the set of all strings of phonemes; E(G) becomes the set of all strings of morphemes. The function g maps morpheme strings into phoneme strings. We could say, using another traditional term, that a phoneme string realizes a morpheme string—sometimes, indeed, ambiguously.

Our program in subsequent chapters is to investigate the shortcomings of linear grammars for the characterization of languages and to try to define types of grammars that are free of these shortcomings. We shall not again use the terms 'morpheme' and 'phoneme' in the way they were used in the preceding paragraph,

¹¹ Saussure 1916; Hjelmslev 1943; Martinet 1949; Hockett 1961a; also, of course, the work of many other scholars influenced by Saussure. However, this view must be sharply distinguished from the 'one-stratum' view expressed by Bloomfield (e.g., 1933), and adhered to by most so-called 'structural' linguists in this country in the 1930's, 1940's, and 1950's, whereby a morpheme is taken to consist of (an arrangement of) phonemes. The use of the term 'morpheme' in the transformationalist literature is essentially this latter use, and must not be permitted to confuse understanding of what I am asserting in the text.



since both words have by now been assigned such a large variety of conflicting meanings in the literature that they are worn out—much like old slang, and for a similar reason. But the point of the preceding paragraph will remain with us.

Let us put this in another way, since the point is worth emphasizing. We said in §2.0 that the task of algebraic grammar, narrowly defined, is to establish a significant and, if possible, complete formal classification of all harps. If this is the orientation we bring to linguistics, we are led to define a language purely as an infinite collection of sentences. A language and a grammar for it then have only a single interface (§1.12): the grammar is a model of the language, and the language an exemplification of the grammar, just if the terminal strings of the grammar match in a sufficiently close way the sentences of the language. This is obviously not enough. It establishes a necessary, but by no means sufficient, condition for the acceptability of a grammar for any particular language.

Even in their most recent publications, the transformationalists stick to the narrow definition of 'language' given above.¹² They are not trapped by it—they clearly realize that the condition for acceptability implied by it is only minimal. Even so, this use of the term 'language' seems misleading. Since we have the technical term 'harp' for a subset of a free monoid (essentially the transformationalist definition of 'language'), we can afford to use the term 'language' in a more natural and traditional way. Let us view a language, then, as a system manifested in the behavior of human beings, a system that itself has two interfaces with the universe as a whole. The interfaces are those suggested by the traditional pair of terms 'sound' and 'meaning'—or by various fancier equivalents, such as Hjelmslev's 'expression' and 'content'.
At or near the sound interface, we find a language manifested by the potentially infinite collection of sentences that might be our sole concern if we were thinking purely as algebraic

¹² See, for example, Chomsky and Miller 1963, p. 283. It is this terminological identification of 'language' with 'output harp' that necessitates Chomsky's distinction between weak equivalence and strong equivalence of grammars; we here handle the matter in a different way, but will return to 'weak equivalence' in §7.



grammarians. Empirical observations can be made of a language at this interface: we can observe and record what people say. Empirical observations can also be made at the other interface: we can observe the conditions under which people say various things, we can observe their responses to what others say, and—probably most important—we can ask them what things mean. But the two interfaces are the only places where observations are possible. The machinery that links them can only be inferred.

The two interfaces between a language and the rest of the universe are also the only places where a language and a formal grammar can meet. Instead of requiring a match at only one of these interfaces, let us require a match at both. This gives us a necessary and sufficient condition for the empirical acceptability of a grammar. A formal grammar for a language thus becomes a black box hypothesis about the machinery that links the two interfaces within the language. No matter how close a match we obtain, we must never think of the details of our formal system as anything more than the roughest sort of analog for what goes on in the central nervous system of a speaker or hearer—control flow charts are not wiring diagrams. But, since the actual machinery within the language (or within the speaker or hearer) is not directly observable, a black box hypothesis is the best we can hope for.
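The substitution coding used earlier in this section can be made concrete. The sketch below is my own reconstruction of one code consistent with the display: space is written 'Z', and each new letter receives the next unused capital, in order of first appearance across the sentences:

```python
import string

def substitution_encode(sentences):
    """Encode sentences with one shared simple substitution code:
    space becomes 'Z'; each new letter gets the next unused capital,
    in order of first appearance across the whole list."""
    code = {' ': 'Z'}
    fresh = iter(string.ascii_uppercase)   # A, B, C, ... ('Z' is never reached here)
    out = []
    for sentence in sentences:
        for ch in sentence.lower():
            if ch not in code:
                code[ch] = next(fresh)
        out.append(''.join(code[ch] for ch in sentence.lower()))
    return out

coded = substitution_encode(['john saw bill',
                             'bill was seen by john',
                             'john saw mary'])
# coded[0] == 'ABCDZEFGZHIJJ'
```

Comparing the coded strings without knowing English makes the point of the display: the active and passive of the first pair differ drastically as character strings, while the members of the second pair differ only in their final word.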



3.0. The First Inadequacy of Linear Grammars. In §2.2 (4) we gave two reasons why linear grammars are not as useful for linguistic applications as we should like. The first reason, which has to do with spoken languages, will concern us in this chapter.

Classical phonological theory rested on several assumptions, held with varying degrees of steadfastness by different investigators. They may be itemized as follows:

(1) Quantization. Two distinct sentences of a language cannot be indefinitely similar. They must differ at least by the presence of a particular element at a particular position in one versus its absence, or the presence of a different element, at the same position in the other. If they do not differ at least in this minimal way, then they are not two sentences but one.

(2) Audibility. The difference between any two sentences of a language is audible to a native speaker in the absence of channel noise, even if there is no defining context.

(3) Finiteness. Every sentence of a language consists of an arrangement of a finite number of occurrences of elements, and the total stock of elements for all sentences is finite.

(4) Patterning. Some arrangements of elements occur more frequently than others, and some combinatorially possible arrangements do not occur at all.

(5) Constancy. Recurrences of the same form (e.g. of the same 'morpheme') are recurrences of the same (phonological) elements in the same arrangement.

(6) Linearity. The arrangements in which elements occur in sentences are exclusively linear.



Although the empirical evidence for some of these assumptions is vast and varied, when any language is observed in detail it becomes obvious that two of them are incompatible and that one is superfluous. The two mutually incompatible assumptions are those of audibility and of constancy.¹ 'Wives' rhymes with 'hives', and 'wife' with 'fife', but not 'fife' and 'hive' and not 'fifes' and 'hives'. We can insist on audibility, in which case 'wife' and 'fife' are phonologically alike at the end, and similarly 'wives' and 'hives', but this means that the form 'wife' appears in two phonologically distinct shapes, violating constancy. Or we can insist on constancy, in which case 'wife' and 'fife' do not end the same way phonologically even though they are an exact rhyme, thus violating audibility. But we cannot maintain both principles at once. This was realized many decades ago, and much of the history of phonological theory has been that of a search for a motivated compromise. But there is no single non-arbitrary compromise. Instead, there are two. By forgetting about audibility, we arrive at a representation of the sentences of a language in terms of elements for which various terms have been used: I shall use Lamb's convenient term morphon.² By forgetting about constancy and insisting on audibility, we arrive at what has been called a phonemic representation of sentences. It is this that we must discuss further here.

¹ The argument is given in extenso in Hockett 1961a and need not be repeated here. The current transformationalist view (e.g., Chomsky and Miller 1963, esp. §6) virtually discards audibility for any purpose, by ignoring the bulk of the evidence of rhyme, assonance, patterning in slips of the tongue, and the like, all of which surely reflect language habits. Formally, however, the transformationalist view and that developed here are very similar. The disagreement is about the empirical facts.
² The transformationalists use 'phoneme'; the traditional term has been 'morphophoneme', of which 'morphon' can be viewed as a sort of abbreviation. Similarly, the transformationalists call the 'component' G" (see below) of a generative grammar the phonology, while Lamb and I use the traditional term morphophonemics. Of course the choice of terms does not matter, and in linguistics it would be an idle dream to hope for terminological agreement; it is just unfortunate that each of the terms just mentioned is also used in various other ways, by various other 'schools' or 'trends' within our field. For Lamb's terminology, see Lamb 1964a, 1964b.



In this context (that of 'phonemic' representation), the linearity assumption does not stand up under scrutiny. It is suggested by the linearity of most writing systems, and by the obvious possibility of devising an unambiguous linear notation for any phonological system. But these are extrinsic considerations. The principle is quite regularly ignored when dealing with accentual and intonational phenomena. Apart from these, if one insists on the assumption then such a pair as English 'pit' and 'bit' must be taken as evidence for minimal elements /p/ and /b/. Bloomfield, among others, was willing to accept the consequences of this insistence, asserting that the phonetic similarity of English /p/ and /b/ has no linguistic significance.³ The Prague School disagreed, and undertook systematic dissection of such linear 'phonemes'; yet even for the Prague School linear elements stayed in the forefront, their decomposition being somehow secondary.⁴ The linearity assumption, with its orthographic overtones, has created pseudo-problems. For example, the argument as to whether the syllable nuclei of English 'beat', 'bait', 'boot', 'boat', and the like are single phonemes or clusters becomes meaningless if there are no such things as phonemes in one's frame of reference. Obviously, problems of assigning allophones to phonemes vanish in just the same way. I do not mean that all phonological problems are automatically resolved upon the abandonment of linearity. But those that remain can be attacked more directly and realistically. I therefore propose, as have various investigators in recent years, that we abandon the linearity assumption altogether. The pair 'pit' and 'bit' attests to a minimal difference, true enough, but that difference is between voicelessness and voicing, not between /p/ and /b/.
We shall follow Lamb in calling the terms of such minimal differences phonons.⁵ Every sentence of a language, then, consists of an arrangement of (occurrences of) phonons, of which the language has a finite stock; but the arrangements are not exclusively linear. Two phonons may occur in succession, or they may occur simultaneously. For instance, 'pit' begins with a simultaneous bundle of three phonons: voicelessness, labiality, and stopness.

³ Bloomfield 1926.
⁴ See Travaux du Cercle Linguistique de Prague, vols. 1-10 (1929-39), passim; now also Vachek 1964.
⁵ See the references in fn. 2. In §4.5 it will become clear why I do not at this point simply use the term 'distinctive features', even though, ideally, phonons and distinctive features should be identical.

We now see why a linear grammar, as defined by the postulates of §2.1, cannot be a satisfactory grammar of a spoken language. Such a grammar generates terminal strings in which, by definition, simultaneity is impossible. What we must have for a spoken language is a kind of grammar that may be very similar to a linear one, but that yields outputs of some sort in which simultaneity is possible.

Let us suppose that a satisfactory grammar G for some spoken language can take the form of a coupled pair of partial grammars G'G". Our assumption about G' will be that it generates terminal strings; in other respects it may or may not conform to our postulates for a linear grammar, though in this chapter we shall speak as though it did. Our assumption about G" will be that it generates arrays of the sort that are empirically required; more on this in a moment. Our assumption about the coupling of G' and G" is that the terminal strings from G' govern the operation of G". Underlying the whole supposition is the conjecture that, even if sentences are not strings in the simple sense of §1, much of the internal workings of G can be regarded as linear. This means that G" is being assigned a very special responsibility: that of delinearizing material that comes to it, from G', in strictly linear form.

This approach sets before us the following tasks:
(1) The exact mathematical formulation of the possible structures of outputs from G".
(2) The analysis of the possible ways of coupling G' and G", together with possible formats for G".
(3) The determination of the optimum location of the 'break' between G' and G".

3.1. Stepmatrices. The mathematical formalism we need does not exist; it has to be invented.
It turns out to be extremely simple, which is probably why no one has bothered with it before.



Let Q be a finite alphabet, called a componential alphabet, of characters {Ki}, where (1) each Ki is an unordered non-null set of distinct components e; and (2) the set E of all components is finite. From these definitions, it follows that Ki = Kj if and only if they contain exactly the same components. Since Q is finite, the number of components in any character Ki does not exceed, for any i, some finite integer max(n); that is, for all i, 1 ≤ ni ≤ max(n). For the case in which max(n) = 1, the K's are unit classes of e's, one K for each e; consequently, the free monoid F(Q) is isomorphic to the free monoid F(E) and one has, in effect, not a componential alphabet but an ordinary or simple one. To preclude this trivial case, we specify, for any Q, that (3) max(n) > 1.

We shall allow ourselves to speak of a character as a simultaneous bundle of components although, formally, the system we are developing has nothing to do with time, so that the term 'simultaneous' is out of place. Despite this usage, the mathematics of individual characters is just finite set theory. Thus, we define a subcharacter of K to be any subset of some character K, including K itself and the null subcharacter ∅.⁶ It will be convenient to display the components of a subcharacter (or character) in column rather than row form; since the symbols in a single column represent members of a set, and the vertical alignment will not be used for anything else, we can omit the braces that usually enclose the names of members of an unordered set, but I use enclosing vertical lines in a way that will be clear in a moment:

⁶ At this point we have two symbols, the inverted 'V' and the one introduced here, both of which denote simply the null set. Strictly speaking this is redundant; but it is useful because of what might be called connotations: we use the new symbol instead of the old one just when we are dealing with subcharacters of a componential alphabet.
In the ordinary mathematical representation of a term of a matrix, it is customary for the first subscript to denote the row, the second the column. But in a stepmatrix there are no rows; the first subscript therefore represents the column, the second the (arbitrary but fixed) position within the column.




| ej1  |
| ej2  |
| ...  |
| ejnj |

A stepmatrix is a string over a componential alphabet Q. The term hints at two-dimensionality; yet a stepmatrix is no more a matrix than a stepmother is a mother. For example, suppose

A = | a b |   and   B = | a d |
    | c d |             | c b |

If A and B are matrices, then A ≠ B except just in case b = d; but if they are stepmatrices they are not only equal but identical. Also,

| a d f |
| b e   |
| c     |

is no matrix, but it might be a stepmatrix. Every stepmatrix over Q, then, is of the form

s = | e11  e21  ...  em1  |
    | e12  e22  ...  em2  |
    | ...                 |
    | e1n1 e2n2 ... emnm  |

where m ≥ 0 (to allow for the null stepmatrix ∅ if m = 0), nj ≥ 1 for j = 1, 2, ..., m, and the values of nj for different j are independent. In an equivalent linear notation, this stepmatrix is

s = |K1 K2 ... Km|

where each Kj is some character of Q.



a componential alphabet Q. A harp defined in terms of an ordinary (noncomponential) alphabet will henceforth be called a linear harp; the term 'harp', without modifier, will be used ambiguously for either kind.

A stepstring is a string of subcharacters; hence every stepmatrix is a stepstring, but not vice versa. Let t = |Kj|p and t' = |K'j|p be two stepstrings of the same length p such that, for all j, Kj ∩ K'j = ∅. Then the join of t and t' is by definition the stepstring t ∪ t' whose jth term is Kj ∪ K'j. A stepstring t belongs to (or is in) a stepmatricial harp just in case there is some stepstring t' such that t ∪ t' is a stepmatrix of the harp. If t is itself a stepmatrix of the harp, of length p, then the requirement is met by t' = |∅|p (the stepstring of length p all of whose terms are the null subcharacter).

It is evident that a stepmatrix s can in general be decomposed in either of two ways. (1) If s is of length at least 2, then we may deconcatenate it by a vertical cut into shorter non-null stepmatrices s' and s" such that s's" = s. This is exactly the same as the deconcatenation of ordinary strings. (2) If s is non-null, then it can be decomposed by a horizontal slice into stepstrings t' and t", both of the same length as s, such that t' ∪ t" = s. Except for the requirement of identical length, slicing is somewhat more general than cutting. For suppose

s = | a d f |
    | b e   |
    | c     |

There is an obvious similarity between cutting this into

s' = | a d |   and   s" = | f |
     | b e |
     | c   |

and slicing it into

t' = | a d ∅ |   and   t" = | ∅ ∅ f |
     | b e   |
     | c     |

On the other hand, there is no way to match, by cutting, a slice into

t' = | a d ∅ |   and   t" = | b e f |
                            | c     |

If Q is any componential alphabet, we define Q̄ to be the set of all stepstrings that belong to the free monoid F(Q).
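These definitions translate directly into finite set theory. A minimal sketch, in modern terms of my own choosing: a character or subcharacter is a frozenset of components, a stepstring is a tuple of subcharacters, and join, cut, and slice are set operations on such tuples:

```python
# Characters and subcharacters as frozensets of components; a
# stepstring is a tuple of subcharacters. The null subcharacter:
NULL = frozenset()

def join(t, t2):
    """Join of two equally long, termwise disjoint stepstrings."""
    assert len(t) == len(t2)
    assert all(a.isdisjoint(b) for a, b in zip(t, t2))
    return tuple(a | b for a, b in zip(t, t2))

def cut(s, i):
    """Vertical cut: deconcatenate a stepmatrix into two shorter ones."""
    return s[:i], s[i:]

# The text's example: A and B are identical as stepmatrices, because
# each column is an unordered set, though as matrices they differ.
A = (frozenset("ac"), frozenset("bd"))
B = (frozenset("ac"), frozenset("db"))
assert A == B

# s = |{a,b,c} {d,e} {f}|: a stepmatrix that is no matrix.
s = (frozenset("abc"), frozenset("de"), frozenset("f"))

s1, s2 = cut(s, 2)                       # a vertical cut after column 2
t1 = (frozenset("a"), frozenset("d"), NULL)
t2 = (frozenset("bc"), frozenset("e"), frozenset("f"))
assert join(t1, t2) == s                 # a horizontal slice no cut can match
```

The last two lines reproduce the slice of the text that cutting cannot imitate: t1 takes only the top row, and t2 takes the remainder of each column.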



3.2. Some Empirical Considerations. We are now in a position to be more precise about the kind of grammar we need for a spoken language. The set of all sentences of a spoken language can be more closely matched by a stepmatricial harp than by a linear harp. This tells us what we want the outputs of our partial grammar G" to be. We want them to be stepmatrices over a componential alphabet, the components of whose characters can be identified with the phonons of the language.

Of course, none of the above tells us how to discover the phonons of a particular language. This is an empirical problem rather than a theoretical one, but it has certain theoretical aspects that we must discuss. We shall do so in terms of two possible stepmatricial systems for Potawatomi, arbitrarily labelled 'Aleph' and 'Gimel'. Relative to these two systems, the Potawatomi words /ntept:an/ 'I hear it' and /ciman/ 'canoe' are portrayed as shown in Figure 1.

[Figure 1: the Aleph portrayal and the Gimel portrayal of /ntept:an/ and /ciman/, each segment shown as a column of phonon abbreviations; the columns are too garbled in this copy to reproduce.]

Af = affrication; Al = alveolarity; Ap = apicality; B = backness; Cn = consonantality; Do = dorsality; F = frontness; Ft = fortisness; H = highness; L = lowness; Lb = labiality; Lm = laminality; Na = nasality; Ob = obstruence; Sn = sonorance; St = stopness; Sy = syllabicity; UL = involvement of upper lip; Ur = unroundedness; Vd = voicedness; Vl = vocality; Vn = voicelessness.
Figure 1

The theoretical bias in the Aleph system might be called that of 'phonetic realism': it seeks to acknowledge the occurrence of any



definable phonetic feature wherever it is found, undisturbed by the resulting redundancy, if that feature helps to identify sentences. The bias of the Gimel system favors economy: it officially recognizes the smallest possible number of phonons, and of phonon-occurrences, from which everything else is predictable. A highly plausible case can be made out against either system by accepting the bias of the other. By detesting arbitrariness, we can argue against the Gimel system. By abhorring redundancy, we can tear the Aleph system to pieces. Thus two linguists, in complete agreement about the empirical facts of Potawatomi, could get into a long, heated, and inevitably futile argument. But this is just what we want to circumvent. Our task is not to take sides. It is to spell out the basic constraints within which choice, being purely a matter of taste, is by definition unarguable. There are four considerations.

(1) The basic empirical constraint is that a stepmatricial system is unacceptable unless it distinguishes sentences just as speakers of the language do. There may be reasons, as hinted in §2.0, why the most we can hope for is an approximation, but that is here beside the point. If either the Aleph or the Gimel system meets this requirement, then so does the other. For the two systems are mutually convertible, to use Bloch's term.⁷ Either system provides for an infinite number of stepmatrices. A finite set of rules can be assembled that will rewrite any stepmatrix of the Aleph system as the corresponding one of the Gimel system; furthermore, a finite set of rules can be assembled that will rewrite any stepmatrix of the Gimel system as the corresponding one of the Aleph system. This is what is meant by 'mutual convertibility': there exists a one-to-one correspondence between the stepmatrices of the two systems, so that the two systems are isomorphic down to the size-level of the whole stepmatrix.
That a pair of corresponding stepmatrices look quite different is irrelevant, just as it is irrelevant for the isomorphism between Arabic and Roman numerals that '32' and 'XXXII' look different (§1.10). Of course, all this means that the

⁷ Bernard Bloch apparently used this expression only orally: neither he nor I was able to find it in his published works.



Aleph and Gimel systems are merely two selected from an indefinitely large set of stepmatricial systems that are pairwise mutually convertible and hence all equally 'correct' as far as the first criterion is concerned.

(2) The second requirement is that phonons may not be arbitrary counters; each must be describable in articulatory or in articulatory-acoustic terms. The amount of leeway this allows is difficult to define. In the Gimel system, the difference between Potawatomi /e/ and /i/ is taken to be the same as that between /t/ and /c/: the presence of something called 'apicality' in the first of each pair, versus the presence of 'laminality' in the second. Since the blade of the tongue is raised for Potawatomi /i/ and not—or not very much—for /e/, I find this phonetically realistic. Whoever devised the Aleph system apparently did not, since for vowels he posits phonons called 'high' and 'low' that do not appear for consonants. We see that mutual convertibility can hold not merely between different systems but also between differing prejudices.

What I mean to exclude altogether by the second requirement, even if the first might allow it, is a system invented more or less as follows: one lists the 'segmental phonemes' or even the 'allophones' of a language in some arbitrary order, and then encodes each into a simultaneous bundle of symbols drawn from the smallest stock of symbols that will do the job of keeping the bundles apart. For Potawatomi, Figure 2 shows one such unacceptable system. Here phonon e is said to recur in /p o i 6: ? e p:/. One need have no special knowledge to recognize that an articulatory-acoustic description of such a so-called 'phonon' is impossible.

[Figure 2: each Potawatomi segmental phoneme encoded as an arbitrary bundle of the symbols a, b, c, d, e; the columns are too garbled in this copy to reproduce.]
Figure 2
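Mutual convertibility, in the sense used above, can be illustrated with a deliberately tiny invented example (not the actual Aleph and Gimel systems): a 'full' system records a redundant component that an 'economical' system omits, and a finite pair of rules converts either representation into the other without loss:

```python
# Invented illustration: the full system marks syllabic characters both
# for H(igh)/L(ow) and for Ap(ical)/Lm(laminal); the economical system
# omits H and L, which are here stipulated to be predictable:
# Ap goes with L, Lm with H.
def to_economical(stepmatrix):
    """One finite rule: delete the redundant components H and L."""
    return tuple(ch - {"H", "L"} for ch in stepmatrix)

def to_full(stepmatrix):
    """The inverse rule: restore H or L to each syllabic character."""
    out = []
    for ch in stepmatrix:
        if "Sy" in ch:                     # the rule applies to syllabics only
            ch = ch | ({"L"} if "Ap" in ch else {"H"})
        out.append(frozenset(ch))
    return tuple(out)

full = (frozenset({"Sy", "Ap", "L"}), frozenset({"Sy", "Lm", "H"}))
econ = to_economical(full)
assert to_full(econ) == full               # a one-to-one correspondence
```

The round trip succeeding for every stepmatrix of the toy system is exactly what makes the two portrayals 'equally correct' under the first criterion.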

Even if we keep well within the loose bounds established by the



second requirement, we tend to have rather strong feelings about the degrees of appropriateness of different systems that meet the first requirement. I think these feelings are the manifestations of an undesirable philosophical bias that might be called 'elementalism' or 'atomism'. Having found a set of phonons that will do what must be done for a particular language, we tend to reify those particular elements, assigning them an independent reality that they need not actually have. The following analogy may help. There is an age-old sophomoric dispute as to whether a triangle is equal to, or greater than, the sum of its parts. Either view is incorrect, because the word 'sum' is out of place. We are obliged to recognize that a triangle is not a 'sum' of anything (unless that term be drastically redefined), but an arrangement of parts—a matter of geometry, not of arithmetic. Once this is recognized, it is rather easy to see that a triangle can be decomposed in more than one way, with a concomitant difference in the rules of assembly. We can view a triangle as composed of three line segments arranged in a certain way, or, equally well, as composed of three wedges (Figure 3). Either approach should yield a consistent treatment of

Figure 3

the geometry of triangles, and any geometric truth about triangles statable in either treatment should also be statable in the other. Perhaps one treatment would prove easier or simpler than the other. But it would be pointless to contend that line segments are more 'real' or 'natural' than wedges, or vice versa. (3) A third empirical requirement for stepmatricial systems is that the stepmatrices of the system serve as suitable points of departure for the functions that map phonological material into the speech signal. If there is a core of empirical truth in the phonemic theory, it is that a speaker's articulatory motions map a



discrete array of all-or-none elements into a continuous signal, and that, to understand what is said, a hearer must requantize the incoming continuous signal, thus recovering or reconstituting the discrete array (even though doubtless there are many situations in which the reconstitution need not be complete). The continuizing functions that generate the speech signal, then, have stepmatrices of phonons as their arguments. We could think of these functions as 'rewrite rules', if we wanted to, but they differ from any set of rules within what we ordinarily think of as a grammar for a language in two crucial ways: first, in that there are necessarily a nondenumerable infinity of minimal rules, so that they can be formulated only via composite rules; and, second, in that they are stochastic rather than determinate. The general structure of the phonon-to-speech-signal functions is discussed in §4, where the points just made are spelled out in greater detail. Because any two stepmatricial systems that meet the first two requirements are mutually convertible, it is obvious that any system meeting those two requirements will also satisfy the third, though not necessarily with great efficiency. The third requirement is useful in that it does away with one argument that a proponent of a 'phonetically realistic' system of the Aleph type might otherwise bring to bear against a supporter of economy. The Aleph-type phonologist wants to include redundant features because they play a part in identifying sentences, especially in the presence of noise. A phonologist of the Gimel persuasion can argue that it is not necessary to set up phonons for materials that will be added by the phonon-to-speech-signal functions, so that one might as well seek the simplest stepmatricial system that meets the other requirements. After all, he can argue, what reaches the ears of a hearer is the speech signal, not a phonon stepmatrix of either the Gimel or the Aleph type. 
However, the seeker of economy is not completely disenthralled by the point just made. One should like to minimize the complexity of the phonon-to-speech-signal functions, and there is no guarantee that this is accomplished by the stepmatricial system that might be judged simplest from some other point of view. This goal is



discussed in §4, especially §4.5. Furthermore, there is still a fourth requirement—one that, unfortunately, may be in conflict with the aim just mentioned, so that some compromise must be sought:

(4) The stepmatricial system must be a convenient target for the operation of the rules of the partial grammar G", the part of the grammar G that delinearizes. There is here, again, no a-priori reason why the portrayal of the phonological system that best meets this requirement should be the simplest of all those that satisfy the other requirements. The rules of G" may be easier to formulate if the stepmatricial system is a bit fuller than a supporter of the Gimel system would prefer. If so, there seems to be no reason for adding special rules to prune out the redundancy.

3.3. The Three Formats for Problem Two. In order to investigate the second of the three problems set forth at the end of §3.0, we must establish some at least tentative answer to the third. I shall base my tentative answer on the kind of phenomena we used to treat, in pregenerative days, under the rubric 'morphophonemics'. I shall assume that the terminal subalphabet of G' is a finite set of characters called morphophonemes or, using the convenient short term proposed by Lamb, morphons; and I shall assume that these morphons are very much like what we used to call 'morphophonemes'. A terminal string from G' is then a morphon string. I shall further assume that, with one sort of exception to be described in a moment, the effective rule chains of G (§2.3) terminate within G'. That is, whatever format be possible for G", its workings can be provided for—to revert to the computing machine example—by permanent wiring and interlocks, never requiring independent switch-setting as part of input.

There are certain trivial cases (and perhaps some not so trivial, to be considered later) in which the second assumption meets with difficulties. The trivial ones are the cases of what we used to call 'free alternation'.
In Potawatomi, glottal catch is clearly distinctive: /m?we/ is 'wolf'; /mwe/ is not a word at all. But within the phrase, after a word ending in a consonant, it does not seem to matter whether the next word begins with glottal catch followed



by a vowel or just with the vowel. By 'it does not seem to matter', I mean that hearers ignore this difference as they decode what has been said, and that, accordingly, speakers do not use the difference to distinguish between meanings but, as it were, toss a coin each time to decide whether to pronounce or omit the glottal catch. In a pregenerative frame of reference such a state of affairs is embarrassing: one would like to regard the glottal catch in the particular environment as nondistinctive, but the environment in question cannot be described without mentioning things (such as words) that have no status within the phonological system. In a generative model, such a free alternation can be provided for by a nonobligatory rule. Yet we do not wish this particular sort of nonobligatory rule to occur in effective rule chains. Accordingly, we specify that any 'free alternation' rule is to be assigned to G" rather than to G', and that its inclusion in or exclusion from any rule chain is to be controlled by an interlock device that, like a speaker, tosses a coin. G", then, is to be activated by any morphon string received from G', and is to generate a stepmatrix; further, the mapping of morphon strings into stepmatrices is to be a function, save just for instances of free alternation. There are at least three distinct formats for a G" that will meet these specifications. I shall call them respectively the rewrite format, the realizational format, and the stepmatricial format.

3.4. The Rewrite Format. In this, G" involves a finite set of rewrite rules much like those of a linear grammar, but with some crucial differences. The instrings and outstrings of the rules are stepstrings of a componential alphabet Q"(G"). Terminal strings are stepmatrices of a componential alphabet T", where T" ⊂ Q". G" must provide a rule chain for any possible morphon string generated by G', and that rule chain must be unique except for cases of free alternation.
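The coin-tossing interlock for a free-alternation rule can be sketched in Python. This is a toy model, not part of the text: the function, its `keep` override, and the sample strings are illustrative assumptions.

```python
import random

def free_alternation(string, old, new, keep=None):
    """Apply a nonobligatory ('free alternation') rule to a string.

    At each occurrence of `old`, an interlock 'tosses a coin' to decide
    whether to rewrite it as `new` or leave it alone.  The optional
    `keep` argument forces the coin (True = rewrite, False = skip) so
    that the behaviour can be exercised deterministically.
    """
    out = []
    i = 0
    while i < len(string):
        if string[i:i + len(old)] == old:
            toss = (random.random() < 0.5) if keep is None else keep
            out.append(new if toss else old)
            i += len(old)
        else:
            out.append(string[i])
            i += 1
    return ''.join(out)
```

Modelled loosely on the Potawatomi case: after a consonant-final word, a following word-initial glottal catch may be pronounced or omitted, so `free_alternation('k+?we', '?', '')` yields either 'k+we' or 'k+?we', and a hearer treats the two alike.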
An 'initial' rule of G" is defined not in terms of an arbitrary instring I but as one which will accept a morphon string from G' as instring. At this point there may seem to be a difficulty. Relative to G',



a morphon string is not a stepstring but merely a string. Yet the rules of G" are supposed to accept only stepstrings. The solution lies in the fact that in switching from G' to G" we switch alphabets. Relative to the terminal subalphabet T'(G'), a morphon string is a simple (linear) string. But relative to the componential alphabet Q"(G"), each morphon string must be a stepstring. To guarantee this, we merely require that T'(G') ⊆ Q". Relative to Q", then, a morphon string is a stepstring whose constituent subcharacters consist each of a single component. These components can be quite arbitrary—they need not recur in any of the other characters or subcharacters of Q". Since the special task of G" is to delinearize, it is also necessary that the very first rule of any rule chain in G" begin the delinearization. That is, the very first rule must rewrite at least one morphon of the instring as a subcharacter with more than one component. If this were not the case, then the rule would be rewriting one simple string as another simple string—a purely linear manipulation, which by definition is to be taken care of by the partial grammar G', not by G". As soon as delinearization has begun, this format for G" allows the Hallean type of context-sensitive rule, not available in a linear grammar (see §2.2 (4) and Halle 1962). A rule of the form

a → b in env x ___ y, with z written beneath the blank to indicate simultaneous environment,

can be interpreted as follows. Suppose a stepstring t can be cut into x, a', and y, and that a' can be sliced into a and z. Then the stepstring t is acceptable to the rule, and the corresponding yield (the 'outstepstring' or 'stepoutstring') is xb'y, where b' = b ∪ z.

3.5. Rewrite Rules for Potawatomi Morphophonemics. We shall now illustrate the rewrite format for a partial grammar G", using Potawatomi as the language. Potawatomi is useful for this because its morphophonemic behavior is rather complicated. Since I know nothing of Potawatomi intonation, we simply leave it out; it is doubtful that taking it into account would render matters any



simpler or neater. I assume (perhaps incorrectly) that internal open juncture is inaudible and hence not phonologically distinctive. It will be phonetically helpful to associate the letter 'u' in the transcriptions with the vowel of English 'cup' rather than with a high back rounded vowel.8 We need 31 morphons: {p t T č k p: t: T: č: k: s š S s: S: m n N w y ? # U O u o e i a - +}. The number could be reduced to 25 by viewing ':' as a separate morphon, but our rules here will treat p:, t:, and so on as units. In addition, we shall use the symbol '©' to mean 'boundary of string' in the specification of environments. That is, 'after ©' means initially, and 'before ©' means finally. We divide the rules into two sets, C-rules and R-rules. Every rule, when applied to a string (or stepstring), is to rewrite all suitably environed occurrences of its operand (§2.2 (1)). The C-rules apply first, and rewrite all morphons except + and - as simultaneous bundles of components; the R-rules then adjust the components in terms of simultaneous and successive environments. There are sixteen components:

 |St|   |Cn|   |Sp|   |Ob|   |Vl|   |Lb|   |Na|   |Sn|   |Ft|
 |Ap| apicality    |Sm| semivocality    |Gl| glottality
*|Pp| palatalizability    |Pa| palatality    |Do| dorsality    *|W|

Of these, the two marked with an asterisk do not appear in terminal stepmatrices. The other fourteen, all of which do, are phonons. After all relevant rules have been applied to a morphon string, the result is a phonon stepmatrix. The fourteen phonons occur only

8 Hockett 1948a.



in certain simultaneous bundles, and it will be convenient to represent stepmatrices linearly by sequences of symbols that represent the bundles. The symbols to be used in this way (enclosed between slant lines), and the bundles they represent, are as follows:

/p/   St Lb Ob Cn       /s/   Sp Ap Ob Cn       /?/   Gl Cn
/t/   St Ap Ob Cn       /š/   Sp Pa Ob Cn       /u/   Vl
/č/   St Pa Ob Cn       /s:/  Sp Ap Ft Ob Cn    /o/   Vl Lb
/k/   St Do Ob Cn       /š:/  Sp Pa Ft Ob Cn    /e/   Vl Ap
/p:/  St Lb Ft Ob Cn    /m/   Na Lb Sn Cn       /i/   Vl Pa
/t:/  St Ap Ft Ob Cn    /n/   Na Ap Sn Cn       /a/   Vl Do
/č:/  St Pa Ft Ob Cn    /w/   Sm Lb Sn Cn
/k:/  St Do Ft Ob Cn    /y/   Sm Pa Sn Cn

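The 'slant line' convention can be mimicked mechanically. The sketch below is our own illustration, not from the text: only a handful of the bundles are included, and the use of Python frozensets to model simultaneous (unordered) bundles is an assumption of the sketch.

```python
# A few bundle-to-symbol pairings from the table above.  Simultaneous
# bundles are unordered, so frozensets are a natural model for them.
BUNDLE_SYMBOLS = {
    frozenset({'St', 'Lb', 'Ob', 'Cn'}): 'p',
    frozenset({'St', 'Ap', 'Ob', 'Cn'}): 't',
    frozenset({'St', 'Do', 'Ob', 'Cn'}): 'k',
    frozenset({'Sp', 'Ap', 'Ob', 'Cn'}): 's',
    frozenset({'Na', 'Ap', 'Sn', 'Cn'}): 'n',
    frozenset({'Vl'}): 'u',
    frozenset({'Vl', 'Do'}): 'a',
}

def linearize(stepmatrix):
    """Represent a phonon stepmatrix by a sequence of symbols between
    slant lines, one symbol per simultaneous bundle."""
    return '/' + ''.join(BUNDLE_SYMBOLS[frozenset(b)] for b in stepmatrix) + '/'
```

For example, `linearize([{'Na','Ap','Sn','Cn'}, {'Vl'}, {'St','Ap','Ob','Cn'}])` gives '/nut/'.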
The first 26 C-rules can be applied in any order. We list with them, but unnumbered, three bogus rules that merely replace one symbol for a component by another. The change of notation is typographically and mnemonically convenient, but is not a true rewriting, because the substitution is strictly one-to-one.

C1.  p  → |St Lb|        C2.  t  → |St Ap|
C3.  T  → |St Pp|        C4.  č  → |St Pa|
C5.  k  → |St Do|        C6.  p: → |St Lb Ft|
C7.  t: → |St Ap Ft|     C8.  T: → |St Pp Ft|
C9.  č: → |St Pa Ft|     C10. k: → |St Do Ft|
C11. s  → |Sp Ap|        C12. š  → |Sp Pa|
C13. S  → |Sp Pp|        C14. s: → |Sp Ap Ft|
C15. S: → |Sp Pp Ft|     C16. m  → |Na Lb|
C17. n  → |Na Ap|        C18. N  → |Na Pp|
C19. w  → |Sm Lb|        C20. y  → |Sm Pa|
C21. U  → |Vl W|         C22. O  → |Vl Lb W|
C23. o  → |Vl Lb|        C24. e  → |Vl Ap|
C25. i  → |Vl Pa|        C26. a  → |Vl Do|

(bogus) u → |Vl|    (bogus) ? → |Gl|    (bogus) # → | … |

The next two C-rules may be applied in either order, but must follow the first 26:

C27. |0| → |Ob| in env |St| or |Sp|
C28. |0| → |Sn| in env |Na| or |Sm|

The last C-rule must apply after the preceding two. C27-C29 provide economically for the addition of redundant components:

C29. |0| → |Cn| in env |Ob| or |Sn| or |Gl|
To illustrate the C-rules, consider the morphon string ©OkUma©. The rules that apply are C5, C16, C21, C22, C26, C27, C28, and C29; the result is

Vl  Cn  Vl  Cn  Vl
Lb  Ob  W   Sn  Do
W   St      Na
    Do      Lb

Since the C-rules (plus the three bogus rules) completely eliminate all the symbols used for morphons, except - and +, we are now free to reintroduce any or all of those symbols in new values. We do so merely as a matter of convenience in notation, to achieve compactness in the statement of the R-rules: if X is any symbol used earlier for a morphon, we shall now use X to denote exactly the simultaneous bundle of components into which the C-rules map that morphon. For example, before the application of the C-rules the symbol 'k' denotes a morphon. After the application of the C-rules, the same symbol 'k' is defined as linear shorthand for the simultaneous bundle

Cn
Ob
St
Do

Thus, the array displayed above, which results from the application of the C-rules to the morphon string ©OkUma©, can be represented exactly by the notation '©OkUma©'. The conventions by which we allow such notations at this point are not a new set of 'rewrite rules' that undo what has just been done, but merely a matter of convenience. Further conventions for the stating of the R-rules are the following: If X is the symbol for some component, we shall continue to enclose X in vertical lines when we refer to the component; if the vertical lines are omitted, then X represents ambiguously and indifferently any simultaneous bundle that contains the component. Thus: |St| is a component; St is any of the set p t T č k p: t: T: č: k:. 'C' will denote any sequence of one or more Cn. 'V' will denote any Vl that does not contain the component |W|. '+' will denote either itself or ©. The R-rules are strictly ordered. That is, given any output from the C-rules, each R-rule must be considered in turn to see whether or not it applies. Three of the R-rules are optional: that is, if the conditions for their application are met, they can nevertheless be applied or skipped at will. These three are marked by a plus-or-minus sign (±). One rule, marked with an asterisk (*), while not optional, yields optionally either of two results from either of two operands. Parentheses enclose what may be present or absent.

R1.



in env or or


Sn Na

Ob Sp

in env Pa


1 PP 1 -




in env +


O —•


in env -





in env C









RIO. Rll.


1w 1 W




in env VI or O.


in env |0 |


in env +(C)WC or C C+.
in env Ci,
where i is even, in X(C1)WC2W...CnWCn+1Y, in which X ends with +, or with V if C1 is present, and Y begins with V, +, or W+.




in env or

o w.



in env or

i y.


Sm #




in env C ( + ) C or C +.

R16. *R17.



±R18. R19.

in env +


in env St Ft Sp Sp X

Sp Ft

m env

where | X | is | Ap | or | Pa | R20.


I Ft




in env SpOb Ft in env Ob Ob(Ob(Ob)) or

Ob)ObOb Ob

St | ( + ) or










in env C +



We are now ready for examples. Each example is given as a morphon string and as a phonon stepmatrix, the latter in the linearized notation. The R-rules involved in each example are listed. The C-rules are not, since such listing would hardly be helpful.

E1. n-nUkUtUN-we 'I win in a race'. R1,2,6,11,12. /nnuktuswe/.
E2. kwUtUmoT-ke-# 'he is out fishing'. R1,6,7,11,12,15,16. /ktumočke/.
E3. n-pUk:UT-wep-n-a 'I release him'. R1,6,7,11,12,21. /npukčuwepna/.
E4. nUt-pUk:UT-wep-n-a (same). R1,6,7,11,12. /ntupk:učwepna/.
E5. UsU-S:-n 'it lies thus'. R1,6,7,10,11,12. /sus:un/.
E6. ?esU-S:-k 'the way it lies'. R1,6,7,10,12,19. /?es:uk/.
E7. n-nUkUtUN-a 'I beat him in a race'. R3,6,7,11,12. /nnuktuna/.
E8. OkUma 'chief'. R4,11,12. /wkuma/.
E9. nUt-OkUma-Um 'my chief'. R7,8,11,12: /ntokmam/; or R5,7,8,11,12,16: /ntokumam/.
E10. w-wap-Um-a-Un + wUt-Uk:weyo-Um-Un 'he sees his wife'. R6,7,8,10,11,12,13,15,17,23. /(?)wapmantuk:weyomun/.
E11. wUt-Uk:weyo-Um-Un + w-wap-Um-a-Un (same). R6,7,8,10,11,12,13,17,23. /wtuk:weyomun(?)wapman/.
E12. kUt-sya-mUn + ?otan 'we're going to town'. R6,7,10,11,12,22,23: /ktusyamunotan/; or R6,7,10,11,12,23: /ktusyamun?otan/.
E13. pUm-y-ik 'they are around here'. R6,7,11,12,14. /pmu?ik/.
E14. nUt-kUk:-?w-a 'I choose him'. R6,7,11,12. /ntukk:u?wa/.



E15. n-kUk:-?w-a (same). R6,7,11,12. /nkuk?wa/.
E16. n-wap-Um-a + UmUk:O 'I see the beaver'. R6,7,11,12,21,23. /nwapmamuk/.
E17. UmUk:O + n-wap-Um-a (same). R6,7,11,12,23. /muk:nwapma/.
E18. Uk:we + n-wap-Um-a 'I see the woman'. R6,7,12,21,23. /kwenwapma/.
E19. n-wap-Um-a + Uk:we (same). R6,7,12,23. /nwapmak:we/.
E20. n-mUsUnU?UkUn 'my paper'. R6,10,11,12. /nmusnu?kun/.
E21. nUt-Uk:U-im 'my land'. R7,9,11,12. /ntuk:im/.
E22. UnUnU#w-Uk 'men'. R7,10,11,12,16. /nunwuk/.
E23. w-os:-Un 'his father'. R7,10,13. /?os:un/.
E24. ?o + UtU + UnUnU#w 'that man'. R10,11,12,15,16,23. /?otununu/.
E25. UmUk:O + ?o + UtU 'that beaver'. R10,11,12,18,23: /muk:otu/; R10,11,12,21,22,23: /mukotu/; R10,11,12,21,23: /muk?otu/.
E26. ?es:UpUn 'raccoon'. R10,12,20,21. /?esp:un/.

Index:

R1: E1,2,3,4,5,6.
R2: E1.
R3: E7.
R4: E8.
R5: E9.
R6: E1,2,3,4,5,6,7,10,11,12,13,14,15,16,17,18,19,20.
R7: E2,3,4,5,6,7,9,10,11,12,13,14,15,16,17,18,19,21,22,23.
R8: E9,10,11.
R9: E21.
R10: E5,6,10,11,12,20,22,23,24,25,26.
R11: E1,2,3,4,5,7,8,9,10,11,12,13,14,15,16,17,20,21,22,24,25.
R12: E1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,24,25,26.
R13: E10,11,23.
R14: E13.
R15: E2,10,24.
R16: E2,9,22,24.
R17: E10,11.
R18: E25.
R19: E6.
R20: E26.
R21: E3,16,18,25,26.
R22: E12,25.
R23: E10,11,12,16,17,18,19,24,25.
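The rule-to-example index just given is mechanically derivable from the per-example rule listings. The sketch below is our own illustration of that inversion; the function name and the toy data in the test are assumptions, not part of the text.

```python
def build_rule_index(examples):
    """Invert a mapping {example number: [rules applied]} into the
    index format of §3.5, {rule name: examples using it}."""
    index = {}
    for ex in sorted(examples):          # visit examples in order
        for rule in examples[ex]:
            index.setdefault(rule, []).append(ex)
    return index
```

Running it over the full E1–E26 listings would reproduce the index above; any discrepancy between the two tables in a draft would thus be caught automatically.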

3.6. The Rewrite Format: Discussion. The treatment of Potawatomi morphophonemics in the preceding section is admittedly not easy to read; the following discussion should help. (We use 'string' to mean either string or stepstring.) The system involves 52 rules in all. The way to 'turn the crank' and make the system work is as follows. Take a large sheet of paper with 53 horizontal ruled spaces. Insert a morphon string (e.g., one of the 26 given as examples) from left to right in the topmost space. List the 52 rules down a column on the left, one to each space from the second to the last, in just the order in which they are given in §3.5. Now inspect the morphon string and the first rule. If the rule will accept the string, apply the rule and write the outstring to the right of the rule, under the original string. If the rule will not accept the string, check the next rule. Consider each rule in turn, always checking its applicability to the last string already entered on the sheet and ignoring any strings that appear above the last one. Since a rule preceded by the sign '±' is a free-alternation rule, it can be skipped even if a string is acceptable to it—toss a coin to decide. Remember that each rule that is applied is to operate on all suitably environed occurrences of its operand. When all the rules have been checked, the last string that has been entered on the sheet is the desired terminal stepmatrix. The rule chain that has been used can also be read from the sheet: it consists of all those rules next to which a string has been entered. This shows that the rules of the rewrite format are ordered. If



Rule R precedes rule R' in the list, then there may be a rule chain that includes R and R' in that order, but there can be none in which R' precedes R. It is important to note, however, that this ordering is only partly functional. The position of two rules relative to each other is functional just in case one of them leaves untouched, or may generate, environment to which the other may refer. On this basis, the first 26 of the Potawatomi rules must precede all the others, but could be permuted in any way among themselves without modifying in the slightest the mapping of morphon strings into phonon stepmatrices. The next two rules must follow the first 26, and must precede the rest, but could be reversed in order. The remaining 24 rules must have just the order in which they appear.
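The single-sheet crank-turning procedure of §3.6 can be sketched as follows. This is a toy, not the Potawatomi system: rules are modelled as plain substring rewrites, the free-alternation coin-toss is omitted, and the sample rules in the usage note are invented.

```python
def turn_the_crank(string, rules):
    """Apply an ordered list of (label, instring, outstring) rewrite rules.

    Each rule is considered once, in the given order, against the last
    string entered on the 'sheet'; if it accepts (its instring occurs),
    it rewrites all occurrences.  Returns the terminal string together
    with the rule chain actually used (labels of the rules that applied).
    """
    chain = []
    for label, old, new in rules:
        if old in string:
            string = string.replace(old, new)
            chain.append(label)
    return string, chain
```

For instance, with `rules = [('R1', 'A', 'ab'), ('R2', 'b', 'c'), ('R3', 'x', 'y')]`, the input 'AA' yields the terminal string 'acac' with rule chain ['R1', 'R2']: R3 never accepts, so it is absent from the chain, mirroring how the chain is read off the sheet.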

3.7. The Realizational Format. We can best introduce this by discussing the rewrite format for a moment more. Recall that the first 26 of the Potawatomi rewrite format rules have no inherently necessary order. Suppose we think of 'turning the crank' of the system in two stages instead of one. The second stage, using the last 26 rules, will be just like the single-step crank-turning described above, except that it will operate on the output of the first stage instead of on morphon strings. The first stage will involve only the first 26 rules plus, for convenience, the three bogus rules listed with them in §3.5 and two further bogus rules that rewrite - as -, + as +. That gives us one rule for each morphon. We take a sheet of paper ruled into two horizontal strips. We write a morphon string down in the upper strip. We have the rules listed in some convenient place for ready reference, but not on the sheet of paper and not in any particular order. We now select any term of the morphon string—not necessarily the first—find the rule that will operate on that term, and write down, in the lower strip of the paper, directly below the term in question, the subcharacter required by the rule. We then select any other term of the morphon string and do the same thing. We proceed in this way until all the terms of the morphon string have been dealt with. We have then generated, in the lower



strip of the paper, a stepstring, in accordance with the morphon string and the 31 rules. In the first stage, as described, the rules being used are strictly unordered. Also, the order in which we consider different terms of the morphon string is entirely arbitrary. Since this is so, we are certainly free, as a matter of convenience, to move along the morphon string from left to right, operating on each term in succession; but we do not have to do this in order to guarantee that our results will be correct. We also do not have to be sure that we check all the rules to see if they will apply. As the workings of the rewrite format system were described in §3.5, each of the C-rules was to rewrite all occurrences of its operand in the instring. That requirement does not hold in this new way of applying them: instead, a rule is applied to one term at a time of the morphon string. Indeed, in this way of working the system we do not rewrite anything. The morphon string is not erased, even metaphorically, as we apply the rules; it is still there when we are done. We have added something—the stepstring that accords with the given morphon string under the requirements of the rules. These are the crucial clues to the realizational format. In this format for a partial grammar G", all the rules of G" are like the first 26 of §3.5: the order in which they are applied is determined entirely by the order in which one chooses to consider the terms of the morphon string. There is, however, this crucial difference: realizational rules are context-sensitive, so that a given morphon may be realized by different subcharacters in different environments. The environments for a particular realization must be described with reference to the morphon string in which the operand occurs. This is necessary, since the morphon string is the only string involved in the application of the system except for the stepstring that one is in the process of generating. 
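The contrast can be sketched in code. The function below is a toy realization procedure of our own devising: subrule predicates inspect the intact morphon string (which is never erased), and the sample rules in the test are illustrative assumptions, not Hockett's.

```python
def realize(morphons, rules):
    """Realize each morphon as a component bundle, §3.7-style.

    `rules[m]` is a composite rule for morphon m: an ordered list of
    (predicate, bundle) subrules.  Each predicate sees the whole
    morphon string plus a position, so environments are stated on the
    morphon string itself.  The composite rules are unordered among
    themselves, and the terms may be visited in any order; going left
    to right is merely convenient.
    """
    out = []
    for i, m in enumerate(morphons):
        for predicate, bundle in rules[m]:
            if predicate(morphons, i):
                out.append(frozenset(bundle))
                break
    return out
```

Note that nothing is rewritten: the morphon string survives intact, and the output stepstring accumulates alongside it, which is exactly what licenses stating every environment on the original string.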
It is also possible, since the morphon string is not erased or altered in any way by the process, but is there as long as one needs it. Formally, the realizational format involves two alphabets: the alphabet T'(G') and a componential alphabet Q"(G"). There is one composite rule for each morphon of the alphabet T'(G'); it maps that morphon into one or another subcharacter of Q"(G"), depending on the morphon environment in which the given morphon occurs. The composite realizational rules are strictly unordered; however, the constituent minimal rules of a single composite rule may be ordered so as to render as simple as possible the specification of environments.

3.8. Realizational Rules for Potawatomi Morphophonemics. There are a few minor discrepancies between this treatment of Potawatomi morphophonemics and that of §3.5. Morphon strings look the same for both, but here ':' is treated as a separate morphon rather than as part of several others. The phonons are fewer for this treatment: |Ob|, |Cn|, and |Sn|, recognized in §3.5, are here ignored. A few rare situations not encountered in the examples given at the end of §3.5 are not provided for here, though they were in the rewrite format. The 25 morphons for this treatment, then, are: {p t T č k s š S : m n N w y ? # U O u o e i a - +}. We use the symbol © as before. The following cover symbols are used in specifying environments:

St = p t T č k
Sp = s š S
Ob = St Sp
Sm = w y
Ct = Ob Sm m n N ? #
C  = any string of one or more Ct
C' = any C except one beginning with s
V  = u o e i a
W  = - U -U U- U-U O -O O- U-O O-U O-O
W' = any W except -

Parentheses around something indicate that it may be present or absent. The specification of environments also involves two subscripts, 1 and 2. To interpret the specification of environments in applying the rules to a given morphon string, it is first necessary to scan the



string and index certain occurrences of C with these subscripts. The procedure for indexing refers separately to each substring bounded by successive occurrences of + or ©; we use '+' to represent both.

First: (a) If the substring is +(C)WCW+, index the last C with 2, yielding +(C)WC2W+. (b) If the substring ends with ...CW+ but does not as a whole conform to the specification of (a), index the last C with 1, yielding ...C1W+. (c) If the substring ends with ...CWC+, index the last two C's respectively with 2 and 1, yielding ...C2WC1+. (d) If the substring ends with ...CXC+, where X is not W, index as ...CXC1+.

Second: Scan the substring from left to right, looking for sequences of any of the following three shapes in which C's have not yet been indexed:

+W(CW...CW)    +CW(CW...CW)    VCW(CW...CW)

What immediately follows, however, must not be another CW with an unindexed C. In any such sequence, index with 1 and 2 alternately, beginning at the left:

+1W(C2WC1W...)    +C1W(C2WC1W...)    VC1W(C2WC1W...)

Note that in a sequence of the first type the initial + is indexed. However, in the second step: if any W is -O, U-O, or O-O, and the immediately preceding C has been indexed with 2, then the immediately following C may optionally be indexed with either 1 or 2; thereafter the indices alternate as before.

Third: If any C's remain unindexed, index with 2.

In the specifications of environments in the rules, 'C' with no



index refers to any C, but 'C1' and 'C2' refer only to C's that have been indexed in that way by the above procedure.9 In the statement of the realizational rules we use an arrow to mean 'is realized as'. The rules are strictly unordered, but within each rule there are subrules for different environments, and these subrules must be considered in the order in which they are listed. Thus, when consulting rule 13 for the morphon w, one must first—as indicated by the cross-reference—check to see whether the w fits the environment given for the spaced-out combination :...w in rule 9; if it does not, then one proceeds with the subrules of rule 13, in order, until the proper environment is found.

rule






St Lb




St Ap

—(0-C —(:)(-)i

St Pa

all others

St Ap


St Pa


St Do


4. 5.



9 It might seem that this complex procedure of indexing consonantal morphons constitutes a 'rewriting', so that the whole conversion from morphons to phonons is not being accomplished, as claimed, by realizational rules. I think this is not so. The indices introduced by the procedure do not appear in the realizational rules as items to be realized, but only in the descriptions of environments. Thus the indexing procedure is logically a part of the description of morphon environment; it could be replaced by a very much more complicated statement of environments for certain of the individual rules, but the loss in clarity would be great, with no compensating gain.



rule morphons(s)




0 Sp Ap

all others i-s -N(-)i ilV's

s :-C

Sp Pa Sp Ap

all others

(OjPfn-C all others 9.

:...? :...w :...0

Sp Pa

Si—OSmMtfO+W— St







0... | G11 j Ft | 0 (in free 0...0 alternation)

Ob x WOb ObOb1 WOb St t w? ?JVSt St__{Sm\{W)® St {Sm\{W)+Ob

®{W)Stz_ 10.


all others



Na Lb










Na Ap

-i-s N—C -i-N(-)i



Sp Pa

all others

Na Ap

(first see rule 9) w G1

—(-)o C(Sm)1(W)+ Ww C(SmUH0+—(-)o C(Sm\(lV)+W_2(W)0 e c

ww tv 1


i WC

11 G1 | (in free }0 alternation)

0 Sm Lb

all others




—ifVy all others

G1 Sm Pa

(first see rule 9)



CiSmMm+tW)all others

G11 0 G1 I

(in free alternation)

112 rule





























U-U U-U U :...0 O-O O-U U-O O -O O

u o




(see rule 9)

+ _ C

Sm Lb

c2 c c2_+


all others


(Note how this rule is to be applied. When one encounters, in the morphon string, an occurrence of U, O, or —, one is first to see if that occurrence is part of an occurrence of one of the sequences of two or three morphons of those tabulated in the 'morphon'



column above. If it is, the appropriate rule for the whole sequence is to be applied. If not, the appropriate rule for the single occurrence is applied.) rule








2 - C S!WN_2(-)i Ni-s—2 NX-N_2-C Nx—N—2(—)i Sp.JVOb 2

all others

| Ft


(Since a morphon string includes as many 0's as one wishes before or after every morphon, to apply this rule one must obviously look not for nulls but for the environments. The rule could be discarded by recasting rules 6, 8, and 12 for s, š, and N, and rules 1-8 for Ob.)

24.




In testing this set of realizational rules, it should be noted that they will, in the first instance, write the phonon |Ft| as a separate bundle: e.g., the morphon string nUt-pUk:UT-wep-n-a (example E4 of §3.5) will come out as

Na St Vl St St Ft Vl St Sm Vl St Na Vl
Ap Ap    Lb Do       Pa Lb Ap Lb Ap Do

(0's omitted)

for which the linearized notation (by the table given near the beginning of §3.5, but ignoring the three notations 'Ob', 'Cn', and 'Sn') is /ntupk:učwepna/. However, this means that we are adding a trivial readjustment by which the two successive bundles |St Do| and |Ft| are changed to the single bundle |St Do Ft|. We can make the rules themselves yield this result without changing their formulation, merely by adding a convention whereby, whenever a realizational rule yields |Ft|, that is to be interpreted as simultaneous with the immediately preceding bundle rather than as a separate bundle in its own right. It is only the notation for this that presents any difficulty.

3.9. Stepmatricial Grammars and the Stepmatricial Format. The rewrite and realizational formats discussed above differ in the nature of the coupling of G' and G". In both cases, the coupling is via the terminal strings of G'. But in the rewrite format these terminal strings turn out, as one passes to G", to be stepstrings of a componential alphabet Q"(G"), so that they are acceptable as instrings to the rules of G". In the realizational format, there is no sudden conversion of alphabet, but merely the conversion worked by the realizational rules themselves, which map strings in the alphabet T'(G') directly into terminal stepmatrices in the componential alphabet Q"(G"). In either case, of course, the terminal morphon strings from G' function, as required, as control input to G". A third possible format is suggested by a third way of coupling G' and G". Before we can describe this coupling, we must define what will be meant by a stepmatricial grammar.

If Q is a componential alphabet, then Q̄ is the set of all stepstrings that belong to F(Q). A stepmatricial grammar is a system Gs(Q, I, Ts, R), where: Q is a componential alphabet; I is a unique stepstring of Q̄; Ts is a proper subset of Q called its terminal (componential) subalphabet, and is such that I is not one of its stepstrings; and R is a finite set of rules. The postulates for a system Gs are such obvious analogs of those for a linear grammar (§2.1) that we shall not present them in detail but merely specify the crucial difference: namely, that the terminal strings of a system Gs must be not only stepstrings but stepmatrices over the componential subalphabet Ts. Note that if we were to discard requirement (3) in the definition of a componential alphabet (§3.1), then a linear grammar would turn out to be a stepmatricial grammar for which max(w) = 1, so that Q̄ = Q and all stepstrings, including I, would themselves be stepmatrices. A stepmatricial grammar is therefore like a linear grammar in that inputs are rule chains, and in that outputs are terminal strings; it differs in that terminal strings are over a componential alphabet instead of a simple alphabet, and in that context-sensitive rules can refer to simultaneous environment.

Now, since the output harp of a linear grammar is linear, and the input harp to a stepmatricial grammar is also linear, let us propose coupling G' and G" by letting the former be a linear grammar, the latter a stepmatricial grammar, and having the output harp from G' be the input harp for G". To provide for this, it is merely necessary that the set of rules R of the stepmatricial grammar G" have one rule R(μ) for each morphon μ of the terminal subalphabet T'(G'). Obviously, R(μ) must be an initial rule (one that accepts the arbitrary stepstring I as instring) just in case μ occurs initially in some morphon string from G'.

To turn the crank on a G" of this stepmatricial format (as we somewhat arbitrarily call it), take a large sheet of paper, write the arbitrary initial stepstring I at the top, and write the successive terms of a morphon string down the left-hand side in a column. Since each morphon (relative to G') is a rule (relative to G"), one may think of the entries in the column on the left as cross-references to the detailed rules, which are written out in some other convenient place. No testing is needed to see whether the successive rules apply, for by definition they all do.
Therefore, one follows the first rule in the column and rewrites I as it requires; then one applies the second rule of the column to the outstring from the first; and so on, until one reaches the end of the rule chain. The last rewriting is the required terminal stepmatrix.

There is a potential source of waste in this format that may be only apparent. Either a linear or a stepmatricial grammar characterizes exactly both its input harp and its output harp. With the coupling specified for the stepmatricial format, the output harp



from G' is characterized by G', but since it is the same as the input harp to G", it is also characterized by G". Obviously it is pointless to make both partial grammars do this one job. However, there is a simple modification that might render the discovery of an actual grammar G = G'G" (for some given language) simpler. Suppose we let the linkage between the two partial grammars be via the set H(G') n C(G"), where this may be a proper subset of H(G') and also of C(G"). That is, in designing the partial grammar G', we can allow it to generate some terminal strings that are not acceptable as inputs to G", if this freedom makes the design simpler and if it indeed generates all terminal strings that are needed as inputs to G". Similarly, in designing the partial grammar G", we can allow it to accept some input rule chains that would actually never come to it from G', if that makes the design easier, provided it does accept and respond properly to all inputs from G' empirically needed for the characterization of the language. 10 3.10. Comparison and Summary. We must now compare the three proposed formats for a delinearizing partial grammar G" (§3.3), and also consider further the third problem of §3.0, to which, so far, we have given only a tentative answer (§3.3). First I must offer a frank statement of my own experiences in trying to formulate examples of the three formats. I began with a prejudice in favor of Lamb's realizational approach, and after hitting on the cute notion of letting terminal strings from one partial grammar be rule chains for the other, I began to hope that the stepmatricial format would work best. Things did not come out that way. Working with Potawatomi, I found the rewrite format surprisingly easy to set up. The realizational format was rather more difficult. The stepmatricial format was so hard that I gave up. Thinking that all this might stem from the nature of Potawatomi, or from my long-standing habits of handling that 10

10 At the moment, only one version of this proposal strikes me as possibly advantageous: that of letting H(G') ⊂ C(G"). This was actually done in the sample formulations of §§3.5, 8.



language, I turned to the Yawelmani dialect of Yokuts.11 Here, again, the rewrite format was fairly straightforward, but my patience gave out before I could cast the data into either of the other formats. Understandably, then, my current prejudice (for this particular portion of the whole grammar of a language) is for the rewrite format. However, no one should take this too seriously. My report has to do only with relative ease of discovery. Relative simplicity of operation, once the partial grammar G" has actually been formulated in one or another format, is far more important. The issue is not closed.12 Nor is the issue as to the proper answer of our third problem (§3.0) closed by the tentative answer we gave: that is, our proposal to regard the break between G' and G" as falling exactly where delinearization is to begin, to think of the characters of the subalphabet T'(G') as much like old-fashioned morphophonemes, and to allow no optional rules in G" except in the case of free alternation. Any one, or all three, of the parts of that tentative answer may be wrong: some alternative may be formally more elegant, empirically more effective, or both. Three points must be made. (1) It might be argued that there should be no break: that is, that G cannot profitably be decomposed into two coupled partial grammars. For a spoken language, this is tantamount to asserting that G should itself be a stepmatricial grammar (or, allowing for sources of difficulty to which we shall come later, some other kind

11 Newman 1944, 1946.
12 Of course, there are other formats (or further variants of the ones described and illustrated in §3), on which we have not touched.
Among these is a rewrite format that includes a transformational cycle: the input is a bracketed string with labelled brackets, and the transformations of the cycle are applied (in order) first to adjust the materials within the innermost pair of brackets and to erase that pair and its label, then for the next-to-innermost, and so on, until all brackets have disappeared. The examples I have seen of this are unconvincing. For English, they do not yield, as terminal strings, anything I am able to recognize as English; also, unless held in check in some manner not yet described they can yield more distinctions along some scales (such as stress) than can possibly be functional. But here, also, further exploration is assuredly in order. See Chomsky and Miller 1963 and their references.



of grammar that yields stepmatrices as its outputs). To posit the desirability of decomposition is to subscribe to a sort of stratificational view of language design, involving at least three strata: that of (original or absolute) input (arrays of rules, inputs to G'); the morphon stratum (linear: outputs from G', inputs to G"); and the phonon stratum (stepmatricial). The term 'stratum' is Lamb's and my own; but the transformationalists, using a different vocabulary, seem to be in agreement on this particular point (see §6.6). Two types of evidence render the decomposition plausible. The first is that it has been possible to do so much, in a fairly efficient way, within the constraint of assumed linearity of output—the great bulk of the transformational literature dealing with specific languages attests to this. Decomposition makes it possible to keep almost everything that was worked out within that constraint, assigning it to G', as one adds the empirically necessary delinearization in the form of a separate partial grammar G". Actually, our reference should not be just to the transformational literature: most of what has been discovered about languages in the whole history of linguistics has been set forth linearly, and very little requires reworking merely because we recognize that a stepmatricial harp matches the sentences of a spoken language more closely than does a linear harp. The second type of evidence lies in the nature of the linkage between paired spoken and written languages, say English. It is clear that the grammars of spoken and of written English are very largely identical. Yet spoken sentences are more closely matched by stepmatrices, whereas written sentences are ordinary strings. 13 We may perhaps think of a single generative grammar for English which bifurcates somewhere near the output end, one fork generating terminal strings of letters, the other yielding spoken sentences. 
The bifurcation is then perhaps in the general vicinity of our break between G' and G". Let us make this clearer. A common lay view of the relation 13 This does not imply that stepmatrices might not be useful for the generative grammar of written languages—say, for capital versus lower case and the like. However, their role would be very different.



between speech and writing assigns priority of some sort to writing, of which speech is merely a fleeting reflection. To formalize this view, let M be the set of all (legal) morphon strings, S the set of all phonon stepmatrices, and W the set of all written sentences (for a language like English—not, say, for Chinese); then we can draw the following diagram:



M --g--> W --f'--> S,   g' = gf'

Here g is the association from M to W (§1.6), f' that from W to S, and g' = gf' that from M to S.


Figure 18

tree for purposes of cross-reference; the subscripts on the I's also supplement the vertical arrangement on the page to indicate their simple ordering. To underscore that we are accepting the ordered pair procedure, the two arrows converging on each D are also numbered, though this is obviously redundant because it is determined by the ordering of the initial nodes: the superordinate input string to a D is the one generated by the subtree that starts with the I bearing the smaller subscript. In the tree, two or more of the I's might be occurrences of the same rule; or two or more of the S's; or two or more of the D's; but by virtue of the added postulate T5 no I is the same as any S (and, of course, no I or S is the same as any D). Now the tree of Figure 18 can be represented unambiguously by a single row of symbols, given certain conventions to be spelled out in a moment. The row is presented as Figure 19.

I1 S1 S2 I2 S7 D1 S3 I3 S8 S9 I4 S10 S11 D3 D2 S4 S5 S6

Figure 19

The row of



symbols represents the tree unambiguously because the orderings and interconnections shown in the tree, although not overtly represented in the row, can be completely inferred from the distribution in the row of I's, S's, and D's. To convert any rule tree into a simple row of symbols, draw a curve starting under the symbol at the upper left-hand corner of the tree, passing to the right under successive symbols until a D is encountered, then doubling back to pass under the next branch, and so on until the curve has passed under the terminal node. This is shown for our sample tree in Figure 20. Then follow along the curve and copy down each symbol as the curve passes under it for the first (or only) time.

To convert any row of symbols of this sort back into the corresponding rule tree, one need only add arrows according to the following instructions: (1) Draw an arrow from each symbol in the row to the next, except when the next one is an I. In our example (Figure 19), the result is as shown in Figure 21. (2) Draw an arrowhead pointing in from the northwest towards each D, and an arrow nock pointing out towards the northeast from each symbol in the row, except the last, from which no arrow yet leads. There will be as many arrowheads as there are D's. The number of arrow nocks will be one less than the number of I's. This means the same number of heads and nocks, since in a binary converging tree the number of I's is necessarily one more than the number of D's. Furthermore, from the way in which a tree is converted into a row



I1 → S1 → S2   I2 → S7 → D1 → S3   I3 → S8 → S9   I4 → S10 → S11 → D3 → D2 → S4 → S5 → S6

Figure 21

of symbols, the nth arrowhead will be preceded by at least n nocks. In our example, the second step yields the result shown in Figure 22.

I1 → S1 → S2   I2 → S7 → D1 → S3   I3 → S8 → S9   I4 → S10 → S11 → D3 → D2 → S4 → S5 → S6
(with an arrowhead pointing in from the northwest towards each D, and an arrow nock pointing out towards the northeast from every symbol except the last)

Figure 22

(3) Connect the nocks and heads by arcs that stay above the row of symbols and do not intersect, the nock of each resulting arrow being to the left of its head. There will be only one way to do this.6 In our example, the result is as in Figure 23. This completes the

Figure 23

conversion. One can imagine, if one wishes, pulling downwards and to the left on the medial I's, until the figure is deformed to look like the original tree with all the I's in a column. Our procedure for converting a tree into a row of symbols rests on the fact that the tree is displayed on a flat surface with arrows pointing generally from left to right. A tree can be so displayed whether its initial nodes are ordered or not, but if they are not—that is, if we choose the unordered pair procedure—then what is structurally a single tree can typically be displayed in several different ways, giving rise to several different rows of symbols. Display in a plane forces arbitrary choices if initial nodes are not ordered. For a moment, let us consider the tree of Figure 18 to be one with unordered initial nodes. The numbers on the arrows are to be deleted; the subscripting can be retained, but merely as an indication

6 The proof is given in §2 fn. 7: merely replace the terms 'opening bracket' and 'closing bracket' respectively by 'arrow nock' and 'arrow head'.
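The two conversion procedures can be sketched in code. This is a minimal sketch with a nested-pair representation (subtrees, symbols) of my own devising, and my own transcription of the row of Figure 19; neither representation nor transcription is from the text itself:

```python
def tree_to_row(tree):
    """Linearize a binary converging rule tree: for each D, emit the
    superordinate (smaller-subscript) subtree first, then the other
    subtree, then the D and the S's that follow it."""
    subtrees, symbols = tree
    if subtrees is None:                 # an initial chain: an I followed by S's
        return list(symbols)
    t1, t2 = subtrees
    return tree_to_row(t1) + tree_to_row(t2) + list(symbols)

def row_to_tree(row):
    """Invert the linearization: an I opens a new chain, an S continues
    the chain on top of the stack, and a D pops its two input subtrees."""
    stack = []
    for sym in row:
        if sym.startswith("I"):
            stack.append((None, [sym]))
        elif sym.startswith("D"):
            t2 = stack.pop()
            t1 = stack.pop()             # superordinate input was pushed first
            stack.append(((t1, t2), [sym]))
        else:                            # an S rule continues the current chain
            stack[-1][1].append(sym)
    assert len(stack) == 1, "row does not describe a single tree"
    return stack[0]

# The row of Figure 19, as I transcribe it:
row = "I1 S1 S2 I2 S7 D1 S3 I3 S8 S9 I4 S10 S11 D3 D2 S4 S5 S6".split()
assert tree_to_row(row_to_tree(row)) == row   # the conversion is reversible
```

The round trip succeeds precisely because of the two provisos: every chain begins with an I, and I's occur nowhere else, so the scan never needs look-ahead.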



of rule identity, not to show ordering. Now imagine that the nodes are beads and the arrows rigid wires. Suspend the tree from its terminal node, like a mobile, so that everything hanging from a D can rotate freely around the vertical axis through that D. All that is formally relevant in the tree with unordered initial nodes is invariant under any such rotations. For example, we might twist the tree into a single plane in such a way that, when set down on a table, it would look like Figure 24. With unordered initial nodes,

Figure 24

this is exactly the same tree as that of Figure 18. But this one, if we follow our instructions, gives rise to the row of symbols shown in Figure 25.

I4 S10 S11 I3 S8 S9 D3 I1 S1 S2 I2 S7 D1 S3 D2 S4 S5 S6

Figure 25

In fact, our sample input tree, under the unordered

pair procedure, gives rise to eight different rows of rules, all of which correspond to exactly the same input array. This shows why we insist on the first proviso. We want the row representation of an input tree to be unique. The reason for our second proviso rests in the instructions for converting a row of symbols back into a tree. The first instruction requires us to insert an arrow between each pair of symbols in the row, except when the second of the pair is an I. If some initial rules might also occur noninitially in a tree, we should have no way of knowing whether or not an arrow should be inserted just before such a rule. If initial rules are exclusively initial, there is no such uncertainty.
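The count of eight can be verified mechanically. A sketch, using my own nested-pair transcription of the tree of Figure 18 (not part of the text) and generating both branch orders at each D:

```python
def all_rows(tree):
    """All rows of symbols obtainable by displaying the tree in a plane,
    i.e. by freely swapping the two branches at each D."""
    subtrees, symbols = tree
    if subtrees is None:
        return [tuple(symbols)]
    t1, t2 = subtrees
    rows = []
    for a in all_rows(t1):
        for b in all_rows(t2):
            rows.append(a + b + tuple(symbols))   # t1-branch displayed first
            rows.append(b + a + tuple(symbols))   # t2-branch displayed first
    return rows

# The tree of Figure 18: four initial chains feeding D1 and D3, then D2.
i1 = (None, ["I1", "S1", "S2"])
i2 = (None, ["I2", "S7"])
i3 = (None, ["I3", "S8", "S9"])
i4 = (None, ["I4", "S10", "S11"])
d1 = ((i1, i2), ["D1", "S3"])
d3 = ((i3, i4), ["D3"])
d2 = ((d1, d3), ["D2", "S4", "S5", "S6"])

# Three D's, two display choices each: eight distinct rows, one tree.
assert len(set(all_rows(d2))) == 8
```

The row of Figure 19 itself is of course among the eight; the other seven arise from the rotations described above.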



Now, what we have been calling informally a 'row of symbols' is, of course, simply a finite string over an alphabet whose characters are the rules of the grammar. We have thus shown that the set of all inputs to a binary tree grammar can, after all, be viewed as a harp. The more complex input geometry seemingly required to accommodate double-based rewrite rules turns out to be unnecessary: the complexities can be provided for otherwise. For our computing machine, a single long row of switches will do. An input is a linear sequence of switch settings. Interlocks can take care of the interconnections represented by the arrows in the tree and by the ordering of initial nodes. What is more, just as for a linear grammar, we can replace C(G), the set of all 'rule chains' (here, rather, linearized rule trees), by E(G), the set of all 'effective rule chains' (here effective linearized rule trees). And E(G), like C(G), is here, as for a linear grammar, a harp. Let us be clear about this. We have not concluded that tree grammars are, after all, linear grammars. They are not. The class of linear grammars is a proper subset of the class of binary tree grammars, consisting just of those for which R2 is null. We have shown only that—given our two provisos—the inputs to any tree grammar can be viewed as finite strings of rules.

5.5. The Time Bomb Method. Chomsky has recently developed a procedure that provides for embedding and conjoining within the bounds of a linear grammar.7 It also does away with single-based transformations; at least, they appear in a greatly altered guise, not as rules but as a special sort of auxiliary character. We can begin our illustration of the procedure with a reformulation of Simple Sample Two (§2.2 (3)).

7 When this was worked out I had not had direct access to Chomsky's discussion of the technique, and had gathered it from indirect sources, notably some of the papers delivered at the 1964 Summer Meeting of the Linguistic Society of America. The formalism is not superficially the same as his in Chomsky 1965, but I still think he should be credited for the basis of this approach.
Some of the rules of the earlier formulation remain unchanged, but the old T0 is deleted and two new rules, RX and RT, are added in its place:

RX: I → IX in the environment …  (see comment 1)
B → B(c)  (see comments 1, 2)
D → D(EG)  (see comment 1)
E → E(f)  (see comments 1, 2)
G → G(h)  (see comments 1, 2)
RT: I(Bx1D(Ex2Gx3))X → I(K(Gx3Bx1)L(M(j)Ex2))  (see comment 3)
RE  (erases all nonterminal characters; see comment 4)
Comment 1: These rules (that is, those with the cross-reference to this comment) will accept a string s only if the operand of the rule in the string is not preceded in s by any bracketless labels.

Comment 2: We assume that each of these three is one of a set of two or more that operate, respectively, on B, E, and G; we do not care what the other rules are, and make the assumption merely so that these three rules will not inevitably be locally obligatory.

Comment 3: A string s is acceptable to this rule if it can be deconcatenated into I(Bx1D(Ex2Gx3))X, where each xi has the form (yi) in which the brackets are paired and yi is any non-null string over A. RT(s) is then I(K(Gx3Bx1)L(M(j)Ex2)).

Comment 4: A string is acceptable to rule RE only if it contains no bracketless labels.

In this new formulation, as in the earlier version of Simple Sample Two, the rules that are overtly stated allow just two rule chains, generating just two terminal strings:

rule chain, old version    rule chain, new version    terminal string
R1R2R3R4R5RE               R1R2R3R4R5RE               cfh
R1R2R3R4R5T0RE             RXR1R2R3R4R5RTRE           hcjf

We must also compare the effective rule chains of the two versions, remembering comment 2:

effective rule chain, old version    effective rule chain, new version    terminal string
R2R4R5                               R2R4R5                               cfh
R2R4R5T0                             RXR2R4R5                             hcjf
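The two derivations just tabulated can be simulated step by step. This sketch depends on my reading of the garbled rule table above: I take the retained rules to be I → I(BD), B → B(c), D → D(EG), E → E(f), G → G(h), with RX planting the bomb and RT and RE behaving as in comments 3 and 4. It illustrates the mechanism and reproduces the terminal strings cfh and hcjf, but it is not guaranteed to match the original formulation exactly:

```python
import re

def derive(with_bomb):
    s = "I"
    if with_bomb:
        s = "IX"                        # RX: plant the time bomb X
    s = s.replace("I", "I(BD)", 1)      # the (assumed) unchanged rule R1
    s = s.replace("B", "B(c)")          # one choice from the B-rule set (comment 2)
    s = s.replace("D", "D(EG)")         # locally obligatory
    s = s.replace("E", "E(f)")          # one choice from the E-rule set
    s = s.replace("G", "G(h)")          # one choice from the G-rule set
    if with_bomb:
        # RT: I(Bx1D(Ex2Gx3))X -> I(K(Gx3Bx1)L(M(j)Ex2)), per comment 3
        m = re.fullmatch(r"I\(B(\(.\))D\(E(\(.\))G(\(.\))\)\)X", s)
        x1, x2, x3 = m.groups()
        s = "I(K(G%sB%s)L(M(j)E%s))" % (x3, x1, x2)
    # RE: erase all nonterminal characters, leaving the terminal string
    return "".join(ch for ch in s if ch.islower())

assert derive(False) == "cfh"    # no bomb planted
assert derive(True) == "hcjf"    # bomb planted, then triggered by RT
```

Note that RE will indeed never fire while X is present: X is a bracketless label, and the erasure here only ever runs after RT has removed it.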

We see that it is only the terminal string with a 'transformation' that is handled differently. The second rule chain is longer in the new version than in the old, but the greater length is due entirely to an additional locally obligatory rule; the effective rule chains are of the same length in both versions. In the old version, the transformation T0 is an optional rule that can be applied or skipped for any string acceptable as instring. In the new version, the option is made at the outset, where one can either use rule RX or skip it. If used, this rule plants a 'time bomb' X in the string, at a point where it will be out of the way and play no part, for a while, in the successive rewritings of the string. However, once planted, the bomb is bound to explode in due time, since there is no way of reaching a terminal string except via the 'trigger' rule RT. Every rule chain of the system must end with the erasure rule RE; but that rule will not accept a string with a time bomb in it because X always remains—until it is exploded—a bracketless label. When it is triggered, its explosion redistributes the pieces of the preceding string in a specified way (see the formulation of rule RT and comment 3), the bomb itself disappearing completely. We must think of the redistribution of pieces brought about by an explosion not as dependent on the triggering rule but rather as dependent on the time bomb itself. Early rules might allow for the planting of different time bombs, each of them, as it were, a 'shaped charge' that will have an explicit effect on the string when it is triggered. There must be a different (minimal) rule for the planting of each time bomb; but all can then be triggered by the same triggering rule RT. Now let us speak, in very general terms, of a grammar G intended to be a grammar for English. We shall refer only to the partial grammar G', whose output is morphon strings, but shall



indicate those strings via ordinary written English, as though they had been processed by a suitable partial grammar G". We need, first, a number of rules of each of the following types:

Type 1:  I → I(IXi)   (i = 1, 2, ..., r)
Type 2:  I → I(IIYj)  (j = 1, 2, ..., t).

These are the bomb-planting rules; there must be one of type 1 for each bomb Xi and one of type 2 for each bomb Yj. We need, next, phrase structure rules of the 'expansion' type described in §2.2 (3). Since these rules can operate only on a bracketless label, they cannot affect the I outside the initial opening bracket of a string generated by a rule of type 1 or 2. The rules are context-sensitive, and it is important that the context may include one or more time bombs. For example, the first I inside the bracket in I(IX1) or I(IIY1) might be expandable in ways not possible for the first I inside the bracket in I(IX2) or I(IIY2), or for the second I inside the bracket in I(IIY1). A rule chain in G has the following general structure: (1) First comes an initial rule row of zero or more bomb-planting rules, the outstring from which consists of I's, brackets, and bombs; each bomb is immediately followed by a closing bracket. Whatever I in this outstring is not immediately followed by an opening bracket is expandable or developable. (2) Next comes a sequence of rows of phrase structure rules, one row for each developable I in the outstring from the first stage. The outstring from this second stage consists of labelled brackets, bombs, and terminal characters, with no bracketless labels except the bombs. Effective rule chains end at this stage; the remaining stages involve only obligatory rules. (3) Next is the triggering rule RT, which we may think of as exploding all the bombs in the string in a single application, but necessarily in a certain order: the innermost bomb—that is, the one enclosed within the string by the largest number of pairs of brackets—explodes first, affecting only the part of the string enclosed within the same innermost brackets; then the bomb that is now innermost, and so on. (4) Finally comes the erasure rule RE. The outstring is a morphon string.
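The innermost-first order of explosion in stage (3) can be sketched as a simple recursion. The kernel strings and the particular bomb effects below are hypothetical stand-ins of my own; only the control structure, inner bombs exploding before outer ones and each bomb disappearing as it acts, comes from the description above:

```python
# A stage-2 outstring is modelled as either a finished kernel string or a
# pair (constituents, bomb-name); brackets are implicit in the nesting.
BOMBS = {
    # Illustrative 'shaped charges' only; the text defines bombs informally.
    "Yq": lambda a, b: a.replace("so", "that " + b),  # embedding
    "Yp": lambda a, b: a + " and " + b,               # conjoining
}

def trigger(node):
    """The triggering rule RT: explode all bombs, innermost first."""
    if isinstance(node, str):
        return node
    constituents, bomb = node
    parts = [trigger(c) for c in constituents]   # inner bombs go off first
    return BOMBS[bomb](*parts)                   # the bomb itself disappears

s1, s2 = "John said so", "Bill is coming"
assert trigger(([s1, s2], "Yq")) == "John said that Bill is coming"
assert trigger(([s2, s1], "Yp")) == "Bill is coming and John said so"
```

Nesting such pairs inside one another gives the multi-kernel sentences discussed next, with the deepest bomb always acting first.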



To generate 'John sees Bill', we skip the first stage, and develop I purely by phrase-structure rules.8 To generate 'Bill is seen by John' we first use a rule of type 1, rewriting I as I(IX), where X is defined to be the proper bomb. Then we use exactly the same phrase-structure rules as for 'John sees Bill'; but they now operate on the I within the brackets. To generate 'John said so' or 'Bill is coming' we proceed as for 'John sees Bill'. Let us say that the outstring from stage 2 for 'John said so' is I(s1) and that that for 'Bill is coming' is I(s2), where s1 and s2 are the appropriate strings with no bracketless labels. Now suppose we begin with a bomb-planting rule of type 2, say I → I(IIYq), and then develop this in stage 2 until we have the outstring I(I(s1)I(s2)Yq). The terminal string will be 'John said that Bill is coming'. Or, if we choose a different bomb-planting rule of type 2, say Yp, and proceed in the same way, we get 'John said so and Bill is coming'. These examples, of course, are supposed to define (informally) the bombs Yq and Yp. Note that the context-sensitivity of the rules must allow also the stage-2 outstring I(I(s2)I(s1)Yp), since 'Bill is coming and John said so' is a perfectly good English sentence, but that they must preclude *I(I(s2)I(s1)Yq). In a more complicated way, a stage 1 outstring of the form I(II(IIYa)Yb) can be developed to yield a sentence like 'I heard Bill say you were coming', involving three kernel sentences; one of the form I(I(IIYc)Xa) could yield 'John was elected president by the club'; and so on. It is clear that the subset of R consisting just of bomb-planting rules of types 1 and 2 must be open (§2.1), since there is no most complex sentence and hence no longest outstring for this stage.

5.6. Summary.
The point of departure for this chapter was the assumption that a grammar for a language cannot be very satisfactory (to express it as mildly as possible) unless it allows double-based transformations or some other method for handling complex and compound sentences. We have found three ways of doing this:

8 Alternatively, of course, we could insist that at least one bomb be planted—and provide a 'zero bomb' to take care of kernel sentences.



G (or the partial grammar G') can be (1) a binary tree grammar under the unordered pair procedure; (2) a binary tree grammar under the ordered pair procedure and with added Postulate T5; (3) a linear grammar using time bombs. The first requires inputs to be binary converging trees; the third requires them to be strings; the second permits either. A reasonable next step might be to seek empirical reasons for preferring one of the three alternatives, or, at least, for favoring one of the two types of input geometry. That step will not be taken within this monograph, however, because something else has turned up that takes precedence. We have discovered that there need not be, from the formal angle, any unalterable commitment to linear inputs. But if we can accommodate formal grammars to either linear or binary tree inputs, how about even more complex types of input geometry? Why search for empirical evidence favoring one of two alternatives if those two are in fact drawn from a larger range of possibilities? This explains the direction our inquiry will take in the next chapter.


6.0. Nonlinear Inputs. For our purposes in this section (as suggested in §5.6), we must define a new and more inclusive class of grammars; for that, in turn, we need the formalization of possible input geometries presented first below. 6.1. Finite Networks.1 A finite network is a very general sort of array of elements and interconnections. To construct one, proceed as follows: (1) Obtain a finite number of indistinguishable elements to be called nodes. If desired, attach labels so that they become at least partly distinguishable (wholly so only if no two nodes bear the same label). Scatter them on a sheet of paper (Figure 26).

Figure 26

(2) Draw arcs connecting pairs of nodes, not more than one arc per pair; do this in such a way that one can pass from any node to any other by moving along one or more arcs (Figure 27). (It does not matter if arcs intersect—one may imagine them passing around each other above and below the plane of the paper.)

1 'Network' is what we might call a 'semifree' mathematical term: that is, it has been used in a number of related technical senses, but has not been precisely standardized, so that we can here give it a new precise sense without apology to anyone. See Flament 1963, Berge 1959.



Figure 27

(3) Attach an arrowhead to at least one end of every arc (Figure 28). Then, optionally, as a matter of convenience, remove both

Figure 28

arrowheads from any arc that bears one at each end; that is, regard a no-headed arc as equivalent to a double-headed one. For a finite network over an alphabet, one does attach labels to the nodes, each label being some character of the alphabet.2 We shall show that all the types of array mentioned so far in this essay (strings, converging and diverging trees, stepmatrices, unordered sets), as well as many other types, are networks. Some of the types of system to be mentioned below also have infinite varieties: the restriction 'finite' is to be understood throughout. If every arc in a network bears arrowheads at both ends, the network is unoriented (Figure 27). A network that is not unoriented is locally oriented (Figure 28). If every pair of nodes of an unoriented network is connected by an arc, the network is merely an unordered set (Figure 29). If the nodes are labelled, a different label for each node, the labels name

Figure 29

2 An equivalent formal definition: a finite network is a finite set N on which is defined an irreflexive relation R whose transitive closure R* is the universal relation on N. See Clifford and Preston 1961 §1.4.



the members of the set. If they are not labelled, the network corresponds merely to a positive integer (or finite cardinality). If we ignore the proviso after the semicolon in step 2, but put arrowheads at both ends of all arcs drawn, the resulting figure is not a network but a more general structure called a simplicial complex3 (Figure 30, but also 27, 29). Any connected subset of a

Figure 30

simplicial complex (any subset that meets the proviso of step 2) is a component; thus, there are three components in Figure 30. An unoriented network is therefore a simplicial complex of one component. If, instead, we ignore the first part of step 2, allowing as many as s arcs between a pair of nodes, but then heed the proviso of step 2 and put arrowheads at both ends of all arcs, the resulting figure is not a network but an s-graph. Structural diagrams in chemistry are s-graphs, e.g. the diagram of ethylene, in which the double bond appears as a pair of arcs between the two carbon nodes.



Thus an unoriented network is a 1-graph—an s-graph with s = 1. A loop is a subset of a network within which one can travel from a given node via one or more arcs back to the same node, following arcs only in the directions indicated by the arrowheads. In an unoriented network every connected subset of two or more nodes is a loop: two is enough, since one can move from one node to

3 The term is from topology. The meaning given it here was normal in the point-set topology of a quarter of a century ago, when it was explained to me by a graduate student in mathematics at the University of Michigan. Topology has changed so much that current definitions of the term seem (and perhaps are) totally different; e.g., Hu 1964, §§4.1-2.



the other and then back again along the same arc. An unoriented network that contains no loops except those that require the same arcs to be traversed twice is an unoriented or unrooted tree (Figure 31).

Figure 31

An oriented network is a locally oriented network that includes no loops: thus, every arc must bear a single arrowhead. Figure 28 is only locally oriented, for it contains loops. Figure 32 shows an

Figure 32

oriented network. Apart from the trivial case of a network of one node, the nodes of an oriented network are of three kinds. There must be at least one initial node, defined as one to which no arrows lead. There must be at least one terminal node, defined as one from which no arrows lead. There may also be medial nodes, to and from each of which at least one arrow leads. Consider an oriented network for which the following holds: if one can pass from a node x to a node y by some indirect path, involving two or more arcs, then there is no arc leading directly from x to y. Such an oriented network is a partially ordered set. Figure 32 is not a partially ordered set; Figures 33, 34, and 35 are. A partially ordered set can also be defined as a system S(K, ≤), where K is a set of elements and ≤ is an improper inequality relation, and for which the following holds: if x and y are elements of K, then either x ≤ y or y ≤ x, or both, or neither; if both, then (from the definition of an improper inequality relation, §1.9), x = y; if neither, then there exists at least one z such that x ≤ z and y ≤ z, or else such that z ≤ x and z ≤ y. To represent a (finite) partially ordered set by an oriented network, we interpret the elements of K as nodes, and draw an arc with an arrowhead from x to y just in case x ≤ y, x ≠ y, and there is no z, distinct from both, such that x ≤ z ≤ y. An upper bound of a subset S of a partially ordered set K is an element b such that, for any element x ∈ S, x ≤ b. A universal upper bound is an upper bound for the whole set K. Lower bound and universal lower bound are defined in an obvious parallel way. If an oriented network has a unique terminal (initial) node, that is the universal upper (lower) bound of the network viewed as a partially ordered set. Figure 32 has a universal lower bound but no universal upper bound. In an oriented network (or partially ordered set) a least upper bound of two elements x and y is an upper bound b of the set {x, y} such that, if b' is any upper bound of {x, y}, b ≤ b'. A greatest lower bound is defined similarly. An upper semi-lattice is an oriented network in which every pair of elements has a least upper bound. A lower semi-lattice is an oriented network in which every pair of elements has a greatest lower bound. A lattice is an upper semi-lattice which is also a lower semi-lattice. Figure 33 shows an upper semi-lattice; Figure 34 shows a lattice.

Figure 34
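These definitions lend themselves to direct computation. A minimal sketch (the four-node diamond network is my own example, not one of the figures) that recovers ≤ from the arcs of an oriented network and tests the semi-lattice property:

```python
# Arcs of a small oriented network: the diamond a->b, a->c, b->d, c->d.
arcs = {("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")}
nodes = {n for arc in arcs for n in arc}

def leq(x, y):
    """x <= y iff x == y or one can follow arcs from x to y."""
    if x == y:
        return True
    return any(u == x and leq(v, y) for (u, v) in arcs)

def upper_bounds(x, y):
    return {b for b in nodes if leq(x, b) and leq(y, b)}

def least_upper_bound(x, y):
    """The upper bound below every other upper bound, if it is unique."""
    ubs = upper_bounds(x, y)
    lubs = [b for b in ubs if all(leq(b, b2) for b2 in ubs)]
    return lubs[0] if len(lubs) == 1 else None

# Every pair has a least upper bound, so the diamond is an upper
# semi-lattice (indeed, by the dual computation, a lattice).
assert least_upper_bound("b", "c") == "d"
assert least_upper_bound("a", "b") == "b"
```

The node d is the universal upper bound of this example, and a its universal lower bound, matching the definitions in the text.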

If, in any oriented network, x < y and there is no element z such that x < z < y, then y is an immediate successor of x and x is an immediate predecessor of y.

Figure 35

A converging tree (Figure 35) is an upper semi-lattice in which immediate successors, when they exist, are unique (the terminal node, of course, has no successors at all). A diverging tree is a lower semi-lattice in which immediate predecessors, when they exist, are unique. For a diverging tree, look at Figure 35 holding the page upside down and imagining the arrowheads at the opposite ends of the arcs. A simply ordered set (or a string) is a converging tree that is also a diverging tree. There are many interesting special kinds of lattices; for example, any Boolean algebra is a lattice.4 All of these, of course, are finite networks.

4 Birkhoff and MacLane 1944, ch. 11.



Consider, now, a network whose nodes fall into pairwise disjunct subsets Ki, i = 1, 2, ..., m, where: every pair of nodes of a single subset is connected by a double-headed arc; an arrow leads from every node of subset Ki to each node of subset Ki+1, for i = 1, 2, ..., m−1. This could be very messy to draw, so we introduce some conventions: we put all the nodes of a subset in a column, and we let the columns come one after another from left to right; we omit the arcs and arrows, which are predictable from the arrangement. Then we omit the nodes too, just retaining the labels. We then have a stepmatrix. Thus, a stepmatrix is (or can be represented as) a network of a certain kind over an appropriate alphabet.
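The construction can be spelled out mechanically. A sketch (the column contents are invented for illustration) that builds the arcs of the network corresponding to a stepmatrix:

```python
def stepmatrix_network(columns):
    """Build the arcs of the network underlying a stepmatrix.
    columns: one list of node names per subset Ki, in order."""
    double = set()   # double-headed arcs within a column
    single = set()   # single arrows from column i to column i+1
    for col in columns:
        for a in col:
            for b in col:
                if a < b:                 # one arc per unordered pair
                    double.add((a, b))
    for i in range(len(columns) - 1):
        for a in columns[i]:
            for b in columns[i + 1]:
                single.add((a, b))
    return double, single

# A three-column stepmatrix with 2, 2, and 1 nodes per column.
double, single = stepmatrix_network([["p1", "p2"], ["q1", "q2"], ["r1"]])
assert len(double) == 1 + 1 + 0      # one pair inside each two-node column
assert len(single) == 2*2 + 2*1      # arrows between successive columns
```

Dropping the (fully predictable) arcs and then the nodes leaves just the labels arranged in columns, i.e. the stepmatrix itself.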

6.2. Conversion Grammars. A conversion grammar is a system G(C, L, g), where: (1) C is a set, at most denumerable, of inputs Ci. Each Ci is a finite network over a finite alphabet R of characters Ri. (2) L is a set, non-null but at most denumerable, of outputs Li. Each Li is a finite network over a finite alphabet T of characters Ti. (3) g is a surjection with domain C and range L; that is, for any C, g(C) is a unique L. Furthermore, for a fixed L, the set of inputs {Ci} such that g(Ci) = L is finite. In addition, any conversion grammar must be either input-monitored or output-monitored (not both): (3a) G is input-monitored if, when confronted by any finite network C over the alphabet R, one can tell without computing g(C) whether or not C ∈ C. (3b) G is output-monitored if, when confronted by a finite network C, one must compute g(C) = L and inspect L in order to tell whether or not C ∈ C. This means also that, when confronted by a finite network L over the alphabet T, one can tell whether or not L ∈ L without searching for a C such that g(C) = L. From the definition of g, the cardinality of C must be at least as great as that of L; in the cases that interest us, both sets are denumerably infinite. By virtue of the definition of g and the fact that the grammar is either input-monitored or output-monitored,



the grammar specifies exactly what networks over R belong to C and also exactly what networks over T belong to L. The inverse of a conversion grammar G(C, L, g) is a system G⁻¹(L, C, g⁻¹) such that, if g(C) = L, then g⁻¹(L) = C. The inverse of a grammar is not in general a grammar, since g need not be injective; if g is in fact not injective, then g⁻¹ is not a function. We see that the inverse of an input-monitored grammar is output-monitored, and that the inverse of an output-monitored grammar is input-monitored. The class of conversion grammars is clearly not empty. We take the following examples to be input-monitored. A linear generative grammar as defined in §2.1 is a conversion grammar in which all inputs and all outputs are (simple) strings. A linear grammar with stepmatricial output, or a stepmatricial grammar (§3), has strings as inputs, stepmatrices as outputs. A binary tree grammar allows simple strings (under the ordered pair procedure) or binary converging trees (under either procedure) as inputs, and can be adjusted for either simple strings or stepmatrices as outputs. Some valid examples of conversion grammars are not what we would ordinarily call 'grammars'. The procedure set forth in §5.4 for the reversible conversion of a certain class of rule trees into rule strings fits our definition of a conversion grammar; in this case, by exception, the inverse of the grammar is itself a grammar (output-monitored), since the function g is bijective. There is no particular reason to doubt the existence of many other types of conversion grammar, and these should be investigated, not only for the abstract pleasure of the exploration but because we might well find a type that will fit languages more neatly than any so far proposed. For this investigation we have a few guidelines. From §3, we can conclude that outputs should be stepmatrices, except that for at least some written languages simple strings may do as well.
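To make the definition concrete, here is a deliberately trivial input-monitored conversion grammar, sketched in Python (an illustration of ours, not anything proposed in the text): C is the set of nonempty strings over R = {a, b}, L is the set of nonempty strings over T = {x}, and g maps each input to the string of x's of the same length. g is a surjection; each fixed output has a finite preimage (2^n inputs for the output of length n); and membership in C is decidable without computing g, as input monitoring requires:

```python
R = {'a', 'b'}          # input alphabet
T = {'x'}               # terminal (output) alphabet

def in_C(s):
    """Input-monitored: membership in C is decided without computing g."""
    return len(s) > 0 and set(s) <= R

def g(c):
    """The conversion function: a surjection from C onto L."""
    assert in_C(c)
    return 'x' * len(c)

def preimage_size(l):
    """For a fixed output L of length n there are exactly 2**n inputs --
    finite, as the definition of a conversion grammar demands."""
    return 2 ** len(l)
```

The inverse of this toy grammar is, as the text says, not a grammar: g is not injective, so g⁻¹ maps each output to a finite set of inputs rather than to a unique one.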
For applications in linguistics, then, there is no reason to explore more bizarre types of output geometry. Input is another matter. That speech carries meaning—at least sometimes—is not worth arguing. That a language quantizes the world that is being talked about is true almost by definition. But



beyond this, our empirical information does not point clearly to strings, nor to converging trees, nor to any other specific type of network as the appropriate type for input geometry. For that matter, despite the great generality of networks, it is an act of faith to assume that inputs can be viewed as networks of any sort. But we shall assume this, in order to carry through two specific investigations.

6.3. Generalized Rewrite Rules. The first investigation has to do with rewrite rules in the enlarged frame of reference. Rewrite rules can obviously be used in a conversion grammar, but if they are then there are certain very specific constraints on input geometry. We should know what these constraints are, since if it should turn out that empirical data suggest a type of input geometry that does not satisfy the constraints, it would follow that rewrite rules would have to be abandoned. After all, rewrite rules were not invented with a view to their use as input. We have found that they can be so used, but we have no guarantee, only the hope, that they can be made to correlate with meanings in the way we wish. So far, we have defined two kinds of rewrite rules (in terms of their geometric properties in rule arrays: §5.3). An R^1 operates on a single instring and yields a single outstring. An R^2 operates on a pair of instrings but, like an R^1, yields a single outstring. There is a simple generalization. For fixed positive integers m and n, we define a (generalized) rewrite rule R^ij (i = 1, 2, ..., m; j = 1, 2, ..., n) as one whose operand is an ordered i-ad of strings and whose yield is an ordered j-ad. Each of the j outstrings, of course, must be a specified function of the i instrings. A network of such generalized rewrite rules must yield a single terminal string; this means that there must be a single terminal rule in the network, and that it must be of the type R^i1. There might be any finite number of initial rules in the network.
We lose no generality by assuming Postulate T5 (§5.4), so that a rule that occurs initially in a network can occur nowhere else. Further, an initial rule must be of the type R^1j. j arrows lead from each initial rule to some non-initial rule; i arrows lead to the single terminal rule. Otherwise, i arrows



Figure 36. The subscripts on the nodes indicate the rule-type.

lead to and j arrows lead from each participating rule R^ij. Figure 36 shows a rule network that conforms to the specifications just outlined. It will be noted that all arrows are numbered. This is necessary in the generalized rewrite-rule case, since multiple outstrings from a rule are by definition ordered and there must be some device to control traffic from one rule to another. The numbers on the arrows are a kind of apparatus not mentioned in our definition of a network (§6.1). But each arrow bearing a number can be replaced by a pair of arrows and a node, the node bearing the number as label. The nodes thus labelled can refer to 'bogus' rules that do no true rewriting but merely direct traffic properly. Thus, a network with numbered arrows can be replaced by one without any such ancillary apparatus.5 The class of networks admissible if the system uses generalized rewrite rules is not more inclusive than the class of all (finite) partially ordered sets with a universal upper bound.
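The arrow-balance conditions just stated are mechanical enough to check by program. The following sketch (Python; the representation of rules as (i, j) type pairs is our own device, not the text's) verifies that a candidate rule network has exactly one terminal rule, that the terminal rule yields a single string, and that every participating rule of type R^ij receives i arrows and emits j arrows:

```python
def well_formed(rules, arrows):
    """rules: dict name -> (i, j), giving the rule's type R^ij.
    arrows: list of (src, dst) rule-name pairs.
    Checks the arrow balance described in the text."""
    indeg = {r: 0 for r in rules}
    outdeg = {r: 0 for r in rules}
    for s, d in arrows:
        outdeg[s] += 1
        indeg[d] += 1
    # Exactly one terminal rule: it emits no arrows and must yield
    # a single outstring (type R^i1).
    terminals = [r for r in rules if outdeg[r] == 0]
    if len(terminals) != 1 or rules[terminals[0]][1] != 1:
        return False
    # Every non-initial rule of type R^ij receives exactly i arrows;
    # every non-terminal rule of type R^ij emits exactly j arrows.
    for r, (i, j) in rules.items():
        if indeg[r] > 0 and indeg[r] != i:
            return False
        if outdeg[r] > 0 and outdeg[r] != j:
            return False
    return True
```

For instance, an initial rule A of type R^12 feeding a rule B of type R^11 and a terminal rule C of type R^21 passes the check; deleting the arrow from B to C leaves two arrowless rules, and the check fails.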

Now let us say that a set of networks is linearizable if one can map any network of the set into a string whose labels are just those of the nodes of the network, in a way that permits unique recoverability. In §5.4 the set of all binary converging trees with ordered

5 When this was first published I thought that the point made in the text was a simple generalization of the unordered pair procedure of §5.1. It turns out to be somewhat more complicated: see the Appendix.



initial nodes (and with initial nodes bearing labels that cannot label non-initial nodes) was shown to be linearizable. Now, of the whole class of networks allowed by generalized rewrite rules, only a certain subclass is linearizable by the procedure described in §5.4—which seems to be the only obvious procedure.6 That subclass is exactly the set of all m-ary converging trees. Such a network admits only rewrite rules of the types R^i1. Figure 37 shows such an m-ary converging tree with m > 2, together with the 'linearization curve', the string into which it can be mapped, and the steps for recovery. If we try to draw a linearization curve in the network of Figure 36, we find no way to reach some of the nodes because arrow shafts are in the way; such an obstacle is never encountered in a converging tree. It is obvious that every m-ary converging tree is a partially ordered set with a universal upper bound, and equally obvious that the converse of this statement is false. We have, then, these results: (1) If a conversion grammar is to use rewrite rules as input elements, an input must be a partially ordered set with a universal upper bound; no more general type of network will serve. (2) If, in addition, inputs are to be linearizable, they must be m-ary converging trees.

6.4. An Application of Generalized Rewrite Rules. Although the substance of the preceding section was developed with no

6 Any finite network, of course, permits what we may call a linear description: (1) list the node labels as row-headings and as column-headings of a square table, using the same order for both listings and repeating each node label as often as it occurs in the network; (2) if an arrow leads from node x to node y, put a '1' in the intersection of the xth row and the yth column, otherwise a '0'; (3) copy the headings and entries out following successive northeast-to-southwest diagonals. For example:

    N  a  b  c
    a  0  1  1
    b  1  0  0
    c  0  0  0

yields: N a a b 0 b c 1 1 c 1 0 0 0 0 0. Recoverability is obvious. But this procedure (like various more or less transparent alternatives) does not conform to our definition of linearizability, since the string includes labels that do not occur as node labels.
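The 'linear description' of footnote 6 is mechanical enough to program. The sketch below (Python; our illustration only, with the arrow set reconstructed from the footnote's table) builds the square table for a labelled network and copies it out along successive northeast-to-southwest diagonals; applied to the three-node example it reproduces the string given in the footnote:

```python
def linear_description(name, labels, arrows):
    """Serialize a finite network by the diagonal-copying procedure.
    name: a tag for the network; labels: node labels in a fixed order;
    arrows: set of (x, y) pairs, meaning an arrow from node x to node y."""
    n = len(labels)
    # Square table with headings: row 0 and column 0 hold the labels.
    table = [[''] * (n + 1) for _ in range(n + 1)]
    table[0][0] = name
    for k, lab in enumerate(labels, start=1):
        table[0][k] = lab          # column headings
        table[k][0] = lab          # row headings
    for r, x in enumerate(labels, start=1):
        for c, y in enumerate(labels, start=1):
            table[r][c] = '1' if (x, y) in arrows else '0'
    # Copy out along successive northeast-to-southwest diagonals.
    out = []
    for d in range(2 * n + 1):
        for r in range(max(0, d - n), min(d, n) + 1):
            out.append(table[r][d - r])
    return ''.join(out)

# The footnote's example, as reconstructed: arrows a->b, a->c, b->a.
s = linear_description('N', ['a', 'b', 'c'],
                       {('a', 'b'), ('a', 'c'), ('b', 'a')})
# s == 'Naab0bc11c100000', i.e. N a a b 0 b c 1 1 c 1 0 0 0 0 0
```

As the footnote observes, this is not linearizability in the defined sense, since the output string contains entries ('0', '1') that are not node labels.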


STRING: I1 S1 S2 I2 S3 S4 I3 S5 S6 I4 S7 Q1 I5 S8 S9 D1 I6 S10 S11 S12 I7 S13 S14 T1 S15

RECOVERY OF TREE (all arrows point to right; heads and nocks are omitted for clarity):

I1-S1-S2   I2-S3-S4   I3-S5-S6   I4-S7-Q1   I5-S8-S9-D1   I6-S10-S11-S12   I7-S13-S14-T1-S15

[The remaining steps of the stepwise recovery, in which these chains are successively merged into the single converging tree, are given diagrammatically in the figure and cannot be reproduced here.]

Figure 37. 'I' = initial R^11; 'S' = noninitial R^11; 'D' = R^21; 'T' = R^31; 'Q' = R^41.



practical application in view, one turned up almost immediately. As is well known, one of the dilemmas of transformational theory has been how to handle the phenomenon of multiple cross-subclassification of the forms of some major class.7 A simple example is Potawatomi noun stems (N), which are either animate (An) or inanimate (In) in gender (Gn), and also either independent (Ind) or dependent (Dep) in dependency (Dp). If rules can be only of types R^11 and R^21, as in a linear or binary tree grammar, the situation is as follows. One could provide for the whole situation in a single composite rule subsuming four elementary rules:8

PR1. N → NAnInd, NAnDep, NInInd, NInDep.

This is undesirable because it conceals structure: two different elementary rules correlate, as it were, with An, two with In, two with Ind, and two with Dep. An alternative, more often chosen in current practice, is as follows:

PR2. N → NAn, NIn
PR3. NAn → NAnInd, NAnDep
PR4. NIn → NInInd, NInDep.

Here each listed composite rule subsumes two elementary rules, giving a total of six. There is now a single elementary rule to correlate with An, and a single one for In. But there are still two each for Ind and Dep. Furthermore, the ordering is arbitrary. There is no way to choose between PR2-PR4 and PR5-PR7:

PR5. N → NInd, NDep
PR6. NInd → NAnInd, NInInd
PR7. NDep → NAnDep, NInDep.

7 The procedure developed here stems from R. P. Stockwell's (unpublished) notion of a fork rule.
8 I am not sure how the procedure developed here would fit into the rest of the grammar. I am not sure whether my notations such as 'NAnInd' are single characters or strings of characters; perhaps, indeed, one needs to use a componential alphabet so that NAnInd (and the like) can be a single character but with components susceptible to separate manipulation. Compare Chomsky's 'syntactic features', Chomsky 1965.
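For concreteness, here is a small Python sketch (ours, not the text's) of the PR2-PR4 treatment: starting from N, one binary choice of gender and then one of dependency yields the four stem classes, at the cost of six elementary rules. PR5-PR7 would give the same four outputs with the two choices in the opposite order, which is exactly the arbitrariness complained of above:

```python
# PR2-PR4 as composite rules: each maps one instring to a choice
# of outstrings (two elementary rules per composite rule, six in all).
PR2 = {'N': ['NAn', 'NIn']}
PR3 = {'NAn': ['NAnInd', 'NAnDep']}
PR4 = {'NIn': ['NInInd', 'NInDep']}

def derive(start, rule_sets):
    """Apply each composite rule set in order, branching on every choice;
    forms with no applicable rule pass through unchanged."""
    forms = [start]
    for rules in rule_sets:
        forms = [out for f in forms for out in rules.get(f, [f])]
    return forms

stems = derive('N', [PR2, {**PR3, **PR4}])
# stems == ['NAnInd', 'NAnDep', 'NInInd', 'NInDep']
```

Note that the gender choice (PR2) has been given a purely artificial priority over the dependency choice, and that Ind and Dep each still correspond to two distinct elementary rules.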



With generalized rewrite rules, we can handle this situation more neatly. First, we need a single (elementary) rule of type R^12:

PR8. N → (N Gn, N Dp).

Note that the parentheses to the right of the arrow are not italicized. They are not brackets included in the alphabet A as auxiliary characters (§2.2 (3)), but the curved parentheses that signal that what appears between denotes the elements of an ordered set (§1.1). That is, the yield of rule PR8 is an ordered pair of outstrings: in the first, N has been expanded into N Gn, while in the second the same N has been expanded into N Dp. Next, we need four rules of type R^11, which can be formulated as two composite rules:

PR9. Gn → An, In   in the env N—
PR10. Dp → Ind, Dep   in the env N—

Finally, we need a single rule of type R^21; note, again, the non-italicized parentheses, here to the left of the arrow:

PR11. (N X, N Y) → NXY,

where X and Y are variables, X being An or In, Y being Ind or Dep. Instrings acceptable to PR9 and PR10 are generated, we assume, by no rule except PR8. Paired instrings acceptable to PR11 are generated by no rules except PR9 and PR10. PR8 is obligatory—there is (presumably) no way to eliminate the nonterminal character N except via PR8-PR11. PR11 is obligatory in a similar way. But the four elementary rules subsumed by PR9-PR10 are not individually obligatory. Hence we have four non-obligatory elementary rules matching four alternatives: one each for An, In, Ind, and Dep. The selection of gender is not given any artificial priority over that of dependency, nor vice versa. In a more general way, suppose that there is a major class of forms N in some language, subject to r intersecting subclassifications {Gk}, k = 1, 2, ..., r ('G' for 'generic category'), and that there are nk classes ('specific categories') {Sk1, Sk2, ..., Sknk} in generic category Gk. Without generalized rewrite rules, we should need n1 elementary rules for



the introduction of the first generic category, n1n2 for the introduction of the second, and so on, ending with n1n2···nr for the last. The total number is thus

n1 + n1n2 + ... + n1n2···nr = n.

This total depends on the order in which the generic categories are introduced, and can be minimized by ordering them (with appropriate relabeling) so that n1 ≤ n2 ≤ ... ≤ nr. In the Potawatomi sample, where r = 2 and n1 = n2 = 2, either order minimizes, and the total number required is 6. Whichever generic category is put last, in the general case, choice of any specific category from that generic category will correlate with n1n2···nr−1 elementary rules. With generalized rewrite rules, we need, first, a single rule of type R^1r:

GR1. N → (N G1, N G2, ..., N Gr).

Next, there must be r sets of rules, all of type R^11; the kth of these sets can be formulated as a single composite rule that subsumes nk elementary rules:

GR2. Gk → Sk1, Sk2, ..., Sknk   in the env N—.

The total number of elementary rules in these sets is clearly n1 + n2 + ... + nr = n'. Finally, we must have a single rule of type R^r1:

GR3. (N X1, N X2, ..., N Xr) → N X1X2···Xr,

where Xj, j = 1, 2, ..., r, is a variable with domain {Sj1, Sj2, ..., Sjnj}. Thus, the total number of elementary rules required is n' + 2. It is easy to see that for almost all possible values of the nk, n is very much greater than n' + 2. This shows the greater efficiency of the generalized rewrite rule procedure. In addition, the use of generalized rewrite rules has the advantage that there is just one non-obligatory elementary rule for each specific category of each generic category.
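The two totals are easy to tabulate. The following sketch (Python; the function names are ours) computes n = n1 + n1n2 + ... + n1n2···nr for the ordinary rewrite-rule treatment, with the categories introduced in the given order, and n' + 2 for the generalized treatment:

```python
from math import prod

def ordinary_count(ns):
    """n = n1 + n1*n2 + ... + n1*n2*...*nr: elementary rules needed
    without generalized rewrite rules."""
    return sum(prod(ns[:k]) for k in range(1, len(ns) + 1))

def generalized_count(ns):
    """n' + 2 = (n1 + ... + nr) + 2: one R^1r fork rule, n' elementary
    R^11 rules, and one R^r1 recombination rule."""
    return sum(ns) + 2

# Potawatomi sample: r = 2, n1 = n2 = 2 -- both totals happen to be 6.
potawatomi = [2, 2]
# A slightly larger system, say ns = [3, 4, 5], already shows the
# divergence the text speaks of: 75 elementary rules against 14.
```

The Potawatomi case (r = 2, n1 = n2 = 2) is thus a borderline one in which the two procedures cost the same; the advantage of the generalized rules there is structural rather than numerical.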



6.5. The Stratificational Model. Our next investigation (§6.2, end) has to do with Lamb's stratificational model.9 Figure 38

[Figure 38: a semon network for 'The man caught the tiger.'; among the node labels are decl, past, the, adult, human, and male.]

shows one of Lamb's networks of semons. In the stratificational model, such networks are proposed as inputs. Ignoring, for the moment, the labels on the nodes, we note that although the network is a partially ordered set, there is no universal upper bound. This is perhaps clearer in Figure 39, where the same network is

[Figure 39: the same network redrawn; among the node labels are the, adult, human, being, male, decl, agt, past, (catch), do, (tiger), (cat), and (mammal).]

9 Lamb 1961, 1964a, 1964b, 1965?, forthcoming; Gleason 1964; Sgall 1964; White 1964.



deformed so that all the arrows point in the same general direction. If my own explorations of the stratificational model (§§6.7-8) can be relied on, then semon networks are always partially ordered sets but rarely, if ever, have universal upper bounds. In the light of the discussion of §6.3, the inference is clear. A conversion grammar designed to accept semon networks cannot use rewrite rules. Stratificational grammars are not rewrite grammars in disguise, as some have suspected, but an essentially different genus. We can best get at the essential differences if we first consider a little more closely just how the rewrite format manages to meet conversion grammar requirements. A conversion grammar must have inputs: in the rewrite format, inputs are arrays of rewrite rules. A conversion grammar must have outputs: in the rewrite format, outputs are arrays of characters in the terminal subalphabet T. A conversion grammar must have the function g: in the rewrite format, that function is specified by the detailed structure of the individual rewrite rules. A conversion grammar must be monitored: we take a rewrite grammar to be input-monitored, in that the detailed structure of the individual rewrite rules specifies exactly what networks of rules are acceptable as inputs. The detailed structure of the individual rules, be it noted, is the only locus of occurrence of the nonterminal characters of the alphabet A (or Q). What, then, is the role of these nonterminal characters? They do not—or, at least, need not—match anything in the language of which the grammar purports to be a grammar. They are accounting symbols, arbitrary constants, dummy variables, traffic signals, descriptive conveniences, whose sole role is to specify how elements of input can be assembled and how such assemblies lead to outputs. A rewrite rule is two things at once: it is a character of an input alphabet, and it is a specification of a mapping of strings (or stepstrings) into strings (or stepstrings).
As a specification of a mapping, a rewrite rule tells us two things: (1) the possible positions of the particular rule in input arrays, and (2) the effect of the rule, in any position where it can occur, on the terminal string being generated. As an input unit, a rewrite rule should also (3) correlate in some



sensible way with meaning. Here, then, are three considerations. Let us try to associate them in a different way. First, let us stop using the term 'rule' when we mean an element of input, and instead use Lamb's term 'semon'—even with reference to the rewrite rule format. In this format, then, an input is an array of semons, and each semon refers to a specific rule. This change is so purely terminological that, if we were not intending to move outside the rewrite format, it would be laughable. For there are the same number of semons and of rules—we are just using different names for a coin depending on which side is up. But this will cease to be so with our other proposals. Second, let us try to make semons match meanings. Third, let us try to make input geometry a function of the participating semons themselves, not something determined by the rules to which the semons refer. Suppose we have a finite collection of semons, not necessarily all different. The semons of the collection have inherent valences, by virtue of which, if the collection is compatible, they will 'spontaneously' assemble themselves into one or another valid network; for a given compatible collection, the number of distinct valid networks must be finite. If the collection permits no valid network, it is incompatible. The analog is obviously chemistry: given two carbon atoms and six hydrogen atoms, where atoms of the same element are indistinguishable, the only valid network is that for ethane:

[structural diagram of ethane, H3C—CH3: two carbon nodes joined to each other, each also joined to three hydrogen nodes.]



approximately10 as many composite realizational rules as there are semons; and each such composite rule consists of an ordered set of simpler realizational rules to provide for environmentally conditioned differences in the realization. These alterations take us from the rewrite format to the stratificational format. At the risk of becoming tiresome, we must summarize and underscore the differences, which are widely misunderstood. It seems to me that the misunderstanding stems largely from the fact that the same word 'rule' is used in such strikingly different ways. To show this, we shall temporarily eject the term 'semon' from the terminology of the rewrite model, into which we introduced it just a moment ago; the proper parallelism between ingredients in these two kinds of grammars is then as follows:

                                rewrite grammars:           stratificational grammars:

  input elements:               RULES (nonobligatory        SEMONS
                                rewrite rules)

  internal machinery for
  specification of g:           auxiliary characters;       RULES (realizational);
                                obligatory rewrite rules    intervening strata

  outputs:                      (simple strings, or stepmatrices, in both).

Now let us restore the term 'semon' to the terminology for any conversion grammar intended to be a grammar for a language. Recall the three considerations set forth earlier in this section. In either a rewrite grammar or a stratificational grammar we have semons, and the semons refer to rules. In a rewrite grammar, a semon correlates with meaning (consideration 3); the associated rewrite rule specifies input geometry (consideration 1) and the effect of the semon on output (consideration 2). In a stratificational grammar, a semon correlates with meaning, but it also specifies input geometry, so that the associated realizational rule need only provide for the effect of the semon on output.

10 I say 'approximately' because it may be convenient to allow some rules to operate on small sets of semons in fixed arrangement rather than on individual semons. Compare §3.8, where there are 25 morphons but 24 composite rules.

6.6. Architectonic Comparison of the Models. The preceding section covers the fine-detail differences between the stratificational model and rewrite-rule grammars, but there are also some larger-scale comparisons to be made. In Lamb's view (but setting it forth in our terminology), we should describe a stratificational grammar for a language as a conversion grammar consisting of three partial grammars G1 G2 G3 in tandem. Inputs for G1 are networks of semons; outputs are strings of elements called lexons.11 These latter are in turn inputs for G2, whose outputs are strings of morphons; G3 is then like our G'' of §3.0, converting morphon strings into phonon stepmatrices. G1, the coupled pair G1G2, and the coupled system G = G1G2G3, are input-monitored conversion grammars. But G2, G3, and the coupled system G2G3 need not be grammars at all, since they do not have to be either input-monitored or output-monitored—the specification of inputs to G2 is provided for by G1. Let us now consider the most recent proposal by the transformationalists.12 The grammar proper takes the form of two partial grammars G2'G3' (the choice of subscripts is intentional); G2' and the coupled pair G2'G3' are conversion grammars, while G3', taken alone, need not be. G2' is input-monitored: inputs for G2' are networks of rewrite rules; outputs are strings of morphons; and from our extended discussion of §3 we might as well say that G3' = G3. It is not assumed that there can be any very simple or direct correlation between either the input to or the output from G2' and meanings. Therefore, one associates with the grammar proper an interpretive system S which ties together sentences and meanings. It is customary to view S as attached to the grammar proper via outputs from G2'. But this view is a consequence of the

11 Lexon strings may be bracketed; this is not clear.
12 See Chomsky and Miller 1963, Katz and Fodor 1963, and, especially, Katz and Postal 1964.



particular loose-limbed conception of G2', which allows distinct sentences to have the same 'structural description' except for alternative choices from a lexical choice rule. As we have developed rewrite grammars in this essay, no two distinct sentences can have the same 'structural description'—that is, the same input to G2'—though two different inputs may lead to the same output. Therefore the proper place for the attachment of S is at the input end of G2'. Furthermore, a 'meaning' is simply a network of semons. Hence we may say that, if C is any legal input to G2', then S associates C with one or more (but at most a finite number of) networks of semons. Inputs to S are not determined by S itself, but by G2'. However, we may detach from G2' just exactly all those features which define the class of inputs acceptable to it (and hence also to S), and, instead, assign them to both S and G2'. With this modification, G2' remains an input-monitored conversion grammar, while S becomes the inverse of an output-monitored conversion grammar, whose own inverse S⁻¹ = G1' is therefore an output-monitored conversion grammar. We have, then, the following:

stratificational grammar G = →G1 G2 G3,
transformational grammar G' = G1' →G2' G3',

where G1 is an input-monitored conversion grammar, G1' an output-monitored conversion grammar, and the arrow means something like start here. The difference between an input-monitored and an output-monitored conversion grammar is that, although both map inputs into outputs algorithmically, in the former one knows whether a would-be input is legal or not without reference to output, whereas in the latter the acceptability of an input can be determined only by seeing whether the corresponding output is valid or not. This difference in the first partial grammar of the whole system is, then, the crucial architectonic difference between the stratificational model and the transformational. At least one other key difference stems from this first one.
In the stratificational model, the 'interpretive system' corresponding to the grammar G



is not merely the inverse G1⁻¹ of the first partial grammar, but the whole inverse G⁻¹ = G3⁻¹G2⁻¹G1⁻¹; and this whole system G⁻¹ allows only trial-and-error determination of the acceptability of an input. In a moment we shall see the advantage of this. In characterizing a stratificational grammar G of a language, we said that the partial grammars G2 and G3 need not themselves be conversion grammars, since inputs to G2G3 are determined by G1. But we only said 'need not be'; we did not say that they could not be. Suppose we let G3 be a conversion grammar such that the set of outputs from G2 is a proper subset of the set of inputs acceptable to G3. The grammar, as a theory of a language, can then hope to account for utterances that conform to the phonology of the language but are both ungrammatical and meaningless (or, more often, only in part grammatical or meaningful). Such utterances occur: 'pocketa-pocketa-pocketa queep' is nonsense—but English nonsense, not French or Swahili. We say that such utterances are generated within (or by) G3 without input from, or with only partial control from, G2. Similarly, let G2 be a conversion grammar such that the set of outputs from G1 is a proper subset of the set of inputs acceptable to G2. The theory then accounts also for utterances that are grammatical but meaningless: Chomsky's famous example 'Colorless green ideas sleep furiously', or genuine (rather than especially coined) cases like 'I want some hot-water juice and a lemon'. Furthermore, the theory has the requisite machinery with which to account for all manner of mixtures of these two types of nonsense. I think it is clear from our exposition of the very close architectonic similarity of transformational and stratificational grammars that the former could easily make these distinctions too.
We saw in §3.10 (2) that it may not be advisable to keep the partial grammar G3 (we were then calling it G'') completely free of optional rules; instead, we may regard choices made within G3 as characterizing stylistic differences. But now, in the stratificational model, we have not merely the original input to G1 but also three successive conversions—from semons to lexons by G1, from lexons to morphons by G2, and from morphons to phonons by G3.



If each of these three partial grammars is exactly a conversion grammar, then there is no room for slippage: stylistic variations have to be accounted for just by the machinery available for the explication of nonsense, namely by partially independent inputs to G2 or to G3. It is perhaps preferable to think of the three partial grammars not as conversion grammars but as almost conversion grammars. That is, the mapping gi for each partial grammar Gi (i = 1, 2, 3) is almost a surjection, in the sense that it is very nearly many-to-one rather than one-to-many or many-to-many, but occasionally (for some inputs) not quite. This way of putting it is not very elegant, abstractly speaking; but I believe it can be 'fixed up' along lines of formalism touched on in §7. However we may choose to formalize the informal loosening-up just described, it is clear that we have three distinct varieties of stylistic variation built into the theory, instead of just one. These are essentially available to both models, not just to the stratificational. Perhaps the proposed trichotomy of style will prove to be a theoretical embarras de richesse, but I doubt it. There is one further possible distinction between the stratificational and transformational models that we must mention. This has to do with the apportionment of matters between G1' and G2' in the transformational model, between G1 and G2 in the stratificational. Since G2' is, by definition, a rewrite-rule grammar, it presumably includes single-based and double-based transformations among its rules, or else bomb-planting rules as set forth in §5.5. G2, on the other hand, operates with realizational rules which may not serve to distinguish valid from invalid inputs.
An input to G2—a string of lexons—is going to look much like a string over the alphabet A generated by some partial network of rewrite rules of G2', and the realizational rules of G2 are not going to bring about any major reordering of the terms of that string comparable to that achieved by a transformation or by the details of the bomb-triggering rule. Thus, some of what we imagine being done by transformations or bomb-planting rules in the transformational model, within G2', will probably be done by the input geometry for and the realizational rules of G1 in the stratificational model.



6.7. Semons and Semon Networks. In §§6.5-6 I spoke of the stratificational model in somewhat glowing terms, yet tried to make clear the absence, so far, of any formal guarantee that stratificational grammars exist. There are, as a matter of fact, two by no means trivial problems in the stratificational approach. The partial grammar G1 must be a conversion grammar of a certain species. Its input elements must be geometrically self-organizing: inputs, as we have seen (§6.5), may well be networks of some class more general than partially ordered sets with a universal upper bound. And g1 must map such networks uniquely into strings of lexons. The problems, then, are these: (1) Can semons be set up in such a way that, in addition to correlating with meaning, they define the geometry of their own interconnections in acceptable networks? (2) Can realizational rules be devised that will map such general networks into bracketed strings? We shall discuss these problems in this section and the next. There is a heuristic strategy in the search for semons. Consider the following three assertions: (a) X belongs to class Y; (b) X has property Y; (c) X consists of components, one of which is Y. The close kinship of (a) and (b) has long been known, and is reflected by the two possible ways of defining a set (§1.2): one may enumerate the members (if the set is finite), or one may state a membership criterion. The latter procedure defines a class Y in terms of a property Y. The former procedure defines a property in terms of a class, since property Y can be defined as the property of belonging to class Y. Now if we define a component Y, or perhaps merely the fact of containing the component Y, as a property Y, the three-way kinship is immediate.
A working principle in the search for semons and their potential interconnections is to examine sentences with full regard to their meanings and with clues in the form of all the traditional grammatical statements that might be made about them; whenever such an assertion is in form (a) or (b), it is converted to an equivalent



assertion in form (c). For example, where the traditional grammarian would say that 'man' is a noun, which is to use form (a), we say that 'man' consists of components, one of which is noun (or, as Lamb has it in his networks, being). Where tradition says that 'The man caught the tiger' is a declarative sentence, we say that the sentence contains a component declarative. Consider, similarly, the following two assertions: (d) X bears the relation R to Y; (e) X and Y are the first and third terms of an ordered triad (X, R, Y). Where tradition would use form (d), in the search for semons we convert to form (e). Tradition says that, in the sentence 'The man caught the tiger', 'the man' bears to 'caught the tiger' the relation of subject to predicate; we say, instead, that the sentence contains the triad 'the man', agent, and 'caught the tiger'. The components turned up by these working principles are what Lamb calls sememes. A semon is then a sememe that does not consist of smaller components. In Figure 38, the node-labels that are not enclosed in parentheses are supposed to name semons; those enclosed in parentheses name sememes the semonic structure of which is uncertain or, for the moment, unimportant. Whenever analysis dissects a sememe into two smaller sememes, we represent each smaller sememe by a (labelled) node, and connect the nodes by an arc. (We also orient the arc by adding an arrowhead, but the criteria for this cannot be discussed until later in our exposition.) For example, recognizing that 'The man caught the tiger' contains a sememe declarative (= decl) implies this network:

decl •———→• (rest of sentence).

The network for 'Did the man catch the tiger?' would be the same except that the left-hand node would be labelled inter (= interrogative) instead of decl. Again, since 'cat' is a noun we have

(cat) → being

and since 'black cat' is a noun phrase we have

(black) → (cat) → being.



Calling on dictionary information—which is just as valid for our purposes as the sorts already mentioned—we observe that in some contexts the noun 'man' refers only to some adult male human; hence we have

adult → male → human → (being).
The phrase 'adult male human being', on the other hand, is this:

adult → male → human → being

where 'being' denotes an identified semon while '(being)' denotes a sememe that remains unanalyzed. Likewise, when a sememe is recognized as being composed of X, R, and Y, we posit a node for each of the three, with the R in the middle:

(the man) ← agt → (caught the tiger).
Figures 40-61 give further and more elaborate examples; in every case, some sememes are left unresolved into semons, but that will not affect the argument. Note that the difference between declarative and interrogative is taken to be the presence at the same place in the network of two different semons (compare Figures 40 and 42), but that the difference between active and passive is interpreted as a difference of arrangement of exactly the same semons (Figures 40 and 41). Note also the handling of reflexive sentences, in Figures 52-53.

As was suggested in §6.5, semons cannot be self-organizing into networks unless they are of various valence types; this, in part, explains the arrowheads on the arcs in our examples. An arrow leading from the node for a semon constitutes a (positive) valence of that semon. We see that, purely as to the number of positive valences, the semons in our examples fall into three types:

I. Links: positive valence 2: agt, gl.
II. Kernels: positive valence 0: being, do.
III. Modifiers: positive valence 1: the, past, decl, sg, adult.
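The valence bookkeeping just described can be sketched mechanically. In the following sketch the network encoding (labelled nodes plus directed arcs), the valence table, and all names are illustrative assumptions, not part of Lamb's or Hockett's formalism:

```python
# A sketch of valence bookkeeping; the encoding of a semon network as
# labelled nodes plus directed arcs is an illustrative assumption.

POSITIVE_VALENCE = {
    "agt": 2, "gl": 2,                                     # I.  links
    "being": 0, "do": 0,                                   # II. kernels
    "the": 1, "past": 1, "decl": 1, "sg": 1, "adult": 1,   # III. modifiers
}

def saturated(nodes, arcs):
    """True if every node's outgoing arcs use up its positive valences,
    i.e. the network as a whole has valence zero (a 'sentence')."""
    out_degree = {n: 0 for n in nodes}
    for src, dst in arcs:
        out_degree[src] += 1
    return all(out_degree[n] == POSITIVE_VALENCE[n] for n in nodes)

# A toy network in the spirit of the figures: agt points to being and do,
# each modifier points once, and the kernels point nowhere.
nodes = ["agt", "being", "do", "the", "past", "decl"]
arcs = [("agt", "being"), ("agt", "do"),
        ("the", "being"), ("past", "do"), ("decl", "agt")]
print(saturated(nodes, arcs))   # True
```

A network is accepted just when every positive valence is used exactly once; removing any one of the arcs above leaves some valence unsaturated, and the sketch then rejects the network.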

It is also necessary to indicate the specificity of each positive valence: that is, what sort of semon can appear at the arrowhead end of the arc. Thus, one valence from a link necessarily attaches itself to the being semon, and the other to the do semon, except that either of these may be replaced by another link (e.g., in Figure 50 there is an occurrence of gl with one arrow attached to an occurrence of agt instead of to an occurrence of being). Modifiers fall into a number of subclasses in this regard:

IIIA. d-modifiers (or adverbials), which can attach to a do or, perhaps, to another d-modifier, but to nothing else: past.
IIIB. b-modifiers (or adjectivals), which can attach to a being or to another b-modifier: adult.
IIIC. l-modifiers (or concorders), which can attach to a link: decl, sg.
IIID. Universal modifiers, which can attach to a being, to a do, or to a link: inter.

Of course, this is by no means sufficiently refined; but we are concerned only with how the system might be made to work, not with all the details for a specific actual language. A semon, then, has a set of valences. A sememe larger than a semon also has a valence, consisting of the valences of the constituent semons insofar as they are not saturated (or 'satisfied') within the sememe. A sentence is then a sememe with a valence of zero, all valences of participating semons being saturated within the sentence. Any network into which any set of semons is allowed to organize itself by virtue of the inherent valences of the semons themselves must be an acceptable input for the grammar G₁.

Is valence avoidable? By a repeated application of the heuristic principles described at the beginning of this section, one might hope to bypass the need for recognizing sememes or semons of different valence types. It turns out, however, that any such effort leads to an infinite regress. Suppose, thus, that sememe X is of valence type Y. This is an assertion of type (a)—it says that X belongs to class Y. We convert it into a componency statement: sememe X consists of components, one of which is Y. We have thus decomposed X into

X' ---- Y



where I have omitted the arrowhead because I don't know where to put it, and where X' is everything in X except for Y. But now we have a sememe X', which must be of some valence type, say Z, defined by the fact that it can be linked to the semon Y. The threatening infinite regress is obvious. We stop the process, therefore, just when it has given us all the structure we need.


[Realizational rules mapping sememes into the lexons A/man/, A/s/, A/the/, A/this/, and A/that/.]



Now suppose that, in inspecting a semon network, we find the expanded nominal [O₃]:

[network: the modifiers the, adult, male, and human attached to a (being), with decl and pl attached to the link agt]


The realizational rules cited above tell us that the lexon string into which this sememe maps will involve the lexons A/s/, A/the/, and A/man/. Obviously, we must also know the proper order for these three. We can make the specificities of valence yield this information: the modifiers in the expanded nominal all have a positive valence of 1, but the can be of one subtype, and adult, male, and human of another; decl and pl are already of a separate type (concorders), whose influence on the nominal filters through the link agt. Thus the valence-controlled output will be A/the man s/. (This is what we want: the mapping of A/man s/ into /men/ is done by the partial grammar G₂.) In some cases the realizational rules appropriate for a particular expanded nominal or verbal must map it into an ordered pair of lexon strings instead of a single one; it is in this connection that lexon strings turn out to be bracketed (at least in the process of deriving from semon networks). Consider, from Figure 42, the expanded verbal:


The output must be not A/shoot ed/, as for the corresponding verbal of Figure 40, and not A/do ed shoot/, but (A/do ed/, A/shoot/), because these two parts are destined to be separated when the lexon outputs for the participating expanded nominals and verbals are appropriately arranged. As another example, consider the network of Figure 52, where the reduced representation takes this form:

Here [O₃] is

adult → male → human → (being), dominated by both agt and gl,


and the representation must be (A/the man/, A/himself/), in two parts also due to be separated in the completed lexon string. If a nominal is dominated by two relators, as in Figure 52, we may think of a step in which the two successive strings into which the nominal will be mapped are 'detached' and separated. Thus, letting O′₃ be that portion of the expanded nominal which is responsible for A/the man/, and O″₃ be the portion responsible for A/himself/, we can modify the reduced representation accordingly.

Given these six values, we can compute the entire past and future history of the particle. Operationally, however, it turns out that it is impossible to determine the values of the six parameters via an instantaneous observation. One must observe over an interval of time and then interpolate. The presumed instantaneous values are therefore an idealization rather than a directly observable reality. Yet the assumption that instantaneous



values 'exist' at each moment allows an enormously powerful physical theory. Our assumption about scope will in fact be variable, altered to suit each problem with which we deal. But the most extreme form—our basic assumption about scope—will be that an idiolect may require a new grammar after each successive term of a terminal string that is in the process of being produced or received. Whenever we choose to subsume more than this by a single grammar, we shall be settling for some sort of time-average or group-average. Thus, what we ordinarily think of as 'a language' (or even 'a dialect' or 'an idiolect') will be handled in what follows not in terms of a single grammar, but in terms of an ensemble of grammars related to one another in certain ways. The formal manipulation of such ensembles of grammars requires a type of mathematics called category theory and another type called group theory. The latter is, in turn, useful in explaining the former. This accounts for the order of topics in the next two sections.

7.1. Groups.² In §2.0 we defined a type of system called a semigroup, and another type called a monoid. Just from the names, it might be guessed that groups have been known in mathematics longer than semigroups or monoids. This is quite true. The logical relations among these three, however, do not mirror their history. Just as every monoid is a semigroup, but not vice versa, so also every group is a monoid, but not vice versa.

Let G be a monoid with elements {gᵢ} and identity g₀. Suppose we represent the monoid operation by mere juxtaposition. Then, from the definition of a monoid, we know that:

(1) If gᵢ and gⱼ are any two elements, then gᵢgⱼ is a unique element.
(2) If gᵢ, gⱼ, and gₖ are any three elements, then (gᵢgⱼ)gₖ = gᵢ(gⱼgₖ).
(3) If gᵢ is any element, then gᵢg₀ = g₀gᵢ = gᵢ.

Now let us assume, further, that corresponding to any element gᵢ in G there is a unique element gᵢ⁻¹ such that gᵢgᵢ⁻¹ = gᵢ⁻¹gᵢ = g₀. The element gᵢ⁻¹ is called the inverse of gᵢ.
If our monoid G has this

² Birkhoff and MacLane 1944, ch. 6; Chevalley 1956, ch. 2; Hall 1959.



additional property, it is a group. A group is a monoid with inverses. If the number of elements in a group (or monoid or semigroup) is finite, the system is finite; otherwise it is infinite. Here are five finite groups, each presented in terms of the table for its operation (compare the tables in §2.0):

[tables for the five groups G₁–G₅; the smallest, G₁, has the two elements i and a, with identity i and aa = i]
; D⁻¹ = C; and E⁻¹ = E. Therefore we have not merely a monoid but a group. We shall call this particular group Π(S₃), for a reason that will be clear in a moment. Here is its complete multiplication table:

     I  A  B  C  D  E
  I  I  A  B  C  D  E
  A  A  I  C  B  E  D
  B  B  D  I  E  A  C
  C  C  E  A  D  I  B
  D  D  B  E  I  C  A
  E  E  C  D  A  B  I
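The group Π(S₃) can be checked mechanically. In the sketch below it is encoded as the permutations of {1, 2, 3}; the assignment of letters to permutations is an assumed reconstruction, chosen so that (reading products left-to-right) BD = A, DB = E, and C and D are mutually inverse, as the text requires:

```python
from itertools import product

# Pi(S3) encoded as permutations of {1, 2, 3}: position k of each tuple
# gives the image of k. The letter assignments are a reconstruction,
# chosen so that (reading products left-to-right) BD = A and DB = E.
PERMS = {
    "I": (1, 2, 3), "A": (1, 3, 2), "B": (2, 1, 3),
    "C": (2, 3, 1), "D": (3, 1, 2), "E": (3, 2, 1),
}

def mult(x, y):
    """Product xy: apply x first, then y."""
    px, py = PERMS[x], PERMS[y]
    image = tuple(py[px[i] - 1] for i in range(3))
    return next(k for k, v in PERMS.items() if v == image)

# Non-commutativity, as in the text:
print(mult("B", "D"), mult("D", "B"))   # A E
# Every element has an inverse, so this monoid is a group:
for x in PERMS:
    assert any(mult(x, y) == "I" and mult(y, x) == "I" for y in PERMS)
# Associativity (automatic for permutations, but checked anyway):
assert all(mult(mult(x, y), z) == mult(x, mult(y, z))
           for x, y, z in product(PERMS, repeat=3))
```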
The table is not symmetric around the northwest-southeast diagonal: the group is not commutative. For example, BD = A, but DB = E.

In our initial description of a permutation, we mentioned both a set of things and a corresponding set of positions, because that renders the description more graphic. But it is not really necessary (nor is it customary) to refer to the latter set separately, since one can simply assume that there is a set of positions, including exactly enough to go around—that is, one position per thing.³ With this understanding, we may let Sₙ be any set whatsoever of n identifiable things; then the set Π(Sₙ), consisting of all possible distinct permutations on Sₙ, is a group, called the full group or the symmetric group on Sₙ.

There is a simple connection between the cardinality (§1.7) of Sₙ and the cardinality of the symmetric group Π(Sₙ). If n = 1, then Π(Sₙ) consists of just the identity permutation. If n = 2, then Π(S₂) consists of just two permutations. If, as in the example used above, n = 3, then Π(S₃) consists of six permutations. This suggests a general rule, which happens to be correct:

  cardinality of Sₙ (= value of n)    cardinality of Π(Sₙ)
  1                                   1 = 1! = 1
  2                                   1·2 = 2! = 2
  3                                   1·2·3 = 3! = 6
  4                                   1·2·3·4 = 4! = 24
  5                                   1·2·3·4·5 = 5! = 120
  n                                   n!
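The rule is easy to spot-check mechanically, counting permutations directly:

```python
import math
from itertools import permutations

# Spot-check of the rule: the symmetric group on n things has n! elements.
for n in range(1, 6):
    count = len(list(permutations(range(n))))
    print(n, count)          # 1 1, 2 2, 3 6, 4 24, 5 120
    assert count == math.factorial(n)
```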

The notation 'n!' denotes a function of any positive integer n, called the factorial function, whose value is the product of all the integers from 1 up to and including n. The rule is, then, that if n

³ Indeed, if S is any set, we may define a permutation as any bijection of S onto itself. Π(S) is then the group whose elements are all the bijections of S onto itself.



is the cardinality of Sₙ, then n! is the cardinality of Π(Sₙ). This rule still holds, as a matter of fact, if Sₙ is an infinite set, so that n is a so-called 'transfinite cardinal number'. The symmetric group Π(S) on such an infinite set is perfectly well defined, and its cardinality, n!, is a larger 'transfinite cardinal'.⁴ We shall need this fact in the sequel, but can omit the details.

The symmetric group, by definition, contains all the permutations on a set of things. But if we leave some of the permutations out, we may still have a perfectly good group. Suppose we try this with Π(S₃), for which we have the multiplication table. Let us choose A, plus as few others as possible. If the subset we choose is to be a group, then two things are necessary: along with any element, the subset must include the inverse of that element; and along with any two elements, the subset must include their product or products (one for each order). In Π(S₃), as the table shows us, the inverse of A is A itself; so this requirement adds no extra element. But the product AA is I, and I must be included in the subset. This is all we need: the subset {I, A} satisfies the requirements. So does the subset {I, B}; so does {I, E}; so does {I, C, D}; and so, of course, does the subset {I} consisting of the identity alone.

This leads us to the notion of a subgroup. Let G be any group whatsoever (not necessarily a symmetric group, nor even a permutation group), and let H be a subset of the elements of G. Then H is a subgroup of G if it is itself a group under the operation defined for G. Obviously, every group is a subgroup of itself; and obviously every group contains at least the trivial subgroup consisting of the identity alone. Many groups, however, contain no subgroups except these two. For example, G₁, G₂, and G₃ do not.

⁴ In §1.7 we indicated that the cardinality of a finite set is just the number of elements in it.
This renders it natural to think of infinite cardinalities also as 'numbers' of a special sort: of course, this is an extension of meaning of the term 'number'. There are any number of transfinite numbers, which manifest their own peculiar arithmetic; the smallest is the cardinality of a denumerably infinite set; the next smallest can without fear of contradictory consequences be taken to be that of the continuum. If c is a transfinite cardinal and c′ is the next larger transfinite cardinal, then c! = c′. Thus, if a set S is denumerably infinite, then Π(S) has the cardinality of the continuum. See Birkhoff and MacLane 1944, ch. 12; Breuer 1958.
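The inventory of subgroups of Π(S₃) can be confirmed by brute force; as before, the permutation encoding of the letters is an assumed reconstruction:

```python
from itertools import combinations

# Pi(S3) encoded as permutations of {1, 2, 3}; the letter assignments are
# a reconstruction (products read left-to-right), not Hockett's own table.
PERMS = {
    "I": (1, 2, 3), "A": (1, 3, 2), "B": (2, 1, 3),
    "C": (2, 3, 1), "D": (3, 1, 2), "E": (3, 2, 1),
}

def mult(x, y):
    """Product xy: apply x first, then y."""
    px, py = PERMS[x], PERMS[y]
    image = tuple(py[px[i] - 1] for i in range(3))
    return next(k for k, v in PERMS.items() if v == image)

def is_subgroup(subset):
    # In a finite group, closure under products suffices: repeated powers
    # of any element eventually yield its inverse and the identity.
    return all(mult(x, y) in subset for x in subset for y in subset)

subgroups = [set(s)
             for r in range(1, 7)
             for s in combinations(PERMS, r)
             if is_subgroup(set(s))]
for h in sorted(subgroups, key=len):
    print(sorted(h))   # the six subgroups named in the text
```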



The additive group of all integers, positive, negative, and zero, contains many subgroups—in fact, an infinite number. For let n be any positive integer. Holding n fixed, let m range through all the integers, positive, negative, and zero, and consider the set of all products {mn}. This set constitutes an additive group. Thus, along with any element m₁n, the set also contains the element −m₁n, such that m₁n + (−m₁n) = (m₁ + (−m₁))n = 0·n = 0. And along with any two elements m₁n and m₂n the set also includes m₁n + m₂n = (m₁ + m₂)n, since m₁ + m₂ is some integer. If we set n = 1, then we get just the original additive group of all integers. Some subgroups of the form {mn} are subgroups of others. Thus, {m·4} is a subgroup of {m·2}, and {m·8} is a subgroup of {m·4}; in general, if k is an integral multiple of l, then {mk} is a subgroup of {ml}.

We note that in order for a subset G′ of a group G to be a subgroup, G′ must contain the identity of G. If we are dealing merely with monoids, this is not true: that is, a subset M′ of a monoid M can itself be a monoid even if it does not contain the identity of M.⁵ The difference lies in the presence of inverses in a group. Since a subgroup must contain, along with any element x, also its inverse x⁻¹, and since it must contain, along with any two elements x and y, their product xy, therefore it must contain the product xx⁻¹ = i. This argument fails for monoids that are not groups.

The last notion of group theory that we shall need is that of a homomorphism. Let G be any semigroup, with elements {gᵢ} and, if G happens to be a monoid, with identity g₀. Let H be any other

⁵ This fact does not emerge clearly in the discussion of Chevalley 1956, ch. 1, but is quite clear in Clifford and Preston 1961, ch. 1. Here is an example of a monoid of which every non-null subset is also a monoid. The elements are all the nonnegative integers. The law of combination defines xy to be max(x, y). Clearly, any non-null subset of this monoid is a monoid, with the smallest element in the subset serving as the identity. It can easily be shown, however, that if M is any monoid, any submonoid N of M is isomorphic to a submonoid N′ of M whose identity is the identity of M. Let i be the identity of M, e that of N. Then i cannot be in N, since a monoid cannot have two identities. We construct N′ from N merely by deleting e and adjoining i.



semigroup, with {hᵢ} and h₀ defined similarly. A homomorphism from G to H is a function μ with G as domain and H as range, such that μ(g₁)μ(g₂) = μ(g₁g₂). A homomorphism maps all the elements of G into some or all of the elements of H, in such a way that products are preserved. Consider, as an example, a homomorphism from the group Π(S₃) to the group G₁, with μ(I) = μ(C) = μ(D) = i and μ(A) = μ(B) = μ(E) = a. It is easy to confirm that this mapping fulfills the requirements.

If G is a monoid, then the subset of H into which G is mapped by a homomorphism must be a monoid; if G is a group, then the image of the homomorphism must be a group; if G is an Abelian group, then its image must be an Abelian group. These rather obvious requirements impose no particular constraint on the semigroup H as a whole, as long as it contains at least one idempotent element: that is, an element x such that xx = x. For if it does, then the requirements can at least be met by a homomorphism that maps all the elements of G into that idempotent element. (H may have an idempotent element without being a monoid; if H is a monoid—or group or Abelian group—then it necessarily has an idempotent element, namely the identity. If H is a group, then its identity is its only idempotent.)

If G and H are both monoids, then it is necessarily the case that, if h₀ is in the image of a homomorphism μ, then μ(g₀) = h₀. For assume that μ(g₀) = e ∈ H. Since, for any g ∈ G, gg₀ = g₀g = g, therefore, setting μ(g) = h, we have he = eh = h. But then we would seem to have, within the image of μ, two identities e and h₀, and it must be that e = h₀ (§2.0). If G and H are both groups, then the conclusion is stronger: if μ is any homomorphism from G to H, then μ(g₀) = h₀. This is true because the image of μ must be some subgroup of H, and every subgroup of H contains the identity of H.

In §1.10 we defined an isomorphism.
Clearly, any isomorphism from a semigroup G to a semigroup H is a homomorphism, but not vice versa. A homomorphism is an isomorphism only if it is bijective.
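That a parity map from Π(S₃) onto a two-element group {i, a} preserves products can be confirmed mechanically; the permutation encoding of the letters and the table for the two-element group below are assumed reconstructions:

```python
# Checking mu(x)mu(y) = mu(xy) for a map from Pi(S3) onto a two-element
# group {i, a}. The permutation encoding of I, A, B, C, D, E is a
# reconstruction (products read left-to-right), not Hockett's own table.
PERMS = {
    "I": (1, 2, 3), "A": (1, 3, 2), "B": (2, 1, 3),
    "C": (2, 3, 1), "D": (3, 1, 2), "E": (3, 2, 1),
}

def mult(x, y):
    """Product xy: apply x first, then y."""
    px, py = PERMS[x], PERMS[y]
    image = tuple(py[px[i] - 1] for i in range(3))
    return next(k for k, v in PERMS.items() if v == image)

# mu sends even permutations to i and odd ones (the transpositions) to a:
MU = {"I": "i", "C": "i", "D": "i", "A": "a", "B": "a", "E": "a"}
# multiplication table of the two-element group:
G1 = {("i", "i"): "i", ("i", "a"): "a", ("a", "i"): "a", ("a", "a"): "i"}

assert all(G1[MU[x], MU[y]] == MU[mult(x, y)] for x in PERMS for y in PERMS)
print("products are preserved")
```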



7.2. Categories.⁶ A category is a type of mathematical system involving three sets of data and three postulates. The sets of data are:

(1) A family of objects A, B, C, ….

(2) For each ordered pair (A, B) of objects, a set M𝒞(A, B) of morphisms from A to B. (For a given ordered pair of objects, the set of morphisms may be empty.) We shall usually write 'M(A, B)' instead of 'M𝒞(A, B)', retaining the subscript only when dealing with two or more categories in the same context.

(3) For each ordered triple (A, B, C) of objects, a law of composition of morphisms such that, for any μ ∈ M(A, B) and any ν ∈ M(B, C), there exists a unique ξ ∈ M(A, C) such that μν = ξ.⁷ If, for a given ordered triple (A, B, C), either M(A, B) or M(B, C) is empty, then the composition of morphisms is undefined.

The postulates are:

C1. M(A, B) ∩ M(A′, B′) = ∅ unless A = A′ and B = B′.

C2. (Associative law for the composition of morphisms.) If μ ∈ M(A, B), ν ∈ M(B, C), and ξ ∈ M(C, D), then (μν)ξ = μ(νξ).

⁶ Category theory is less than two decades old; it began essentially with Eilenberg and MacLane 1945 (which I have not seen), and has been developed as a frame of reference for bringing together for common discussion and manipulation a wide variety of seemingly very diverse and unrelated situations in pure mathematics. Certainly nothing could have been further from the intentions of the originators than the possibility of a direct 'practical' application of the sort undertaken here. It is only fair to state that my application involves only a thin edge of the most elementary portions of the theory. My own limited knowledge I owe to a series of rich and elegant lectures on the topic given during the Fall Semester of 1964-5 by P. J. Hilton of the Department of Mathematics, Cornell University. I feel impelled to apologize to Hilton for the triviality of the present discussion and application.

⁷ See §3 fn 14 for the use of 'μν' instead of 'νμ'.

If, for a given ordered triple (A, B, C), either M(A, B) or M(B, C) is empty, then the composition of morphisms is undefined. The postulates are: CI. M(/4, B) n M{A', B') = A unless A = A! and B = B'. CI. (Associative law for the composition of morphisms.) If ft e M(A, B), v e M(B, C), and £ e M(C, D), then (/¿v)£ = 'Category theory is less than two decades old; it began essentially with Eilenberg and MacLane 1945 (which I have not seen), and has been developed as a frame of reference for bringing together for common discussion and manipulation a wide variety of seemingly very diverse and unrelated situations in pure mathematics. Certainly nothing could have been further from the intentions of the originators than the possibility of a direct 'practical' application of the sort undertaken here. It is only fair to state that my application involves only a thin edge of the most elementary portions of the theory. My own limited knowledge I owe to a series of rich and elegant lectures on the topic given during the Fall Semester of 1964-5 by P. J. Hilton of the Department of Mathematics, Cornell University. I feel impelled to apologize to Hilton for the triviality of the present discussion and application. 'See §3 fn 14 for the use of 'nv' instead of 'vn'.



C3. For each object A there exists a unique identity morphism 1_A ∈ M(A, A) such that, for any morphism μ ∈ M(B, A) and for any morphism ν ∈ M(A, C), μ1_A = μ and 1_Aν = ν.⁸

As a trivial initial example, consider any partially ordered set (§6.1). Any such system may be viewed as a category. The objects of the category are the elements of the set. The set of morphisms M(x, y) for an ordered pair of elements (objects) x and y is empty unless x ≤ y, and contains the single morphism μ_xy if x ≤ y. If, for a given triple (x, y, z) of objects, there are morphisms μ_xy and μ_yz, then the definition of ordering guarantees that there will also be the morphism μ_xz, which we define as the same as μ_xyμ_yz; this satisfies the requirement for a law of composition of morphisms. The first postulate is met by the rather artificial but perfectly legal device already introduced: we say that a morphism μ_xy is not the same as a morphism μ_zw unless x = z and y = w. The second postulate follows from the transitivity of an improper inequality relation. As for the third: μ_xx = 1_x exists for each x because, in a partially ordered set, x ≤ x for every element x; its uniqueness is guaranteed by the fact that it is, indeed, the only morphism in M(x, x).

Categorical terminology would obviously be very cumbersome for the discussion of something as simple as a partially ordered set. But consider a somewhat less trivial example: the category whose objects are sets K, K′, … and whose morphisms are functions. In general, there will be many morphisms in each set M(K, K′). Thus, at one extreme, any function that sends all elements of K into some single element of K′ is a morphism, and there are as many morphisms of this special type in M(K, K′) as there are elements in K′. At the other extreme (so to speak), if K and K′ have the same cardinality, then one morphism of M(K, K′) may be an isomorphism, in just the sense of §1.10. The identity morphism 1_K for each object K is the unique identity function that sends each element of K into itself.
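The category of sets lends itself to a direct sketch: finite sets as Python objects, morphisms as dictionaries, and composition applying μ first, as in the text. All names here are illustrative assumptions:

```python
# Morphisms in the category of finite sets, encoded as dicts from domain
# elements to codomain elements; an illustrative sketch, not the book's
# notation. The composition mu nu applies mu first, matching the text.
def compose(mu, nu):
    return {x: nu[mu[x]] for x in mu}

A = {1, 2}
B = {"p", "q", "r"}
mu = {1: "p", 2: "r"}                     # a morphism in M(A, B)
nu = {"p": True, "q": True, "r": False}   # a morphism in M(B, C)
xi = {True: 0, False: 1}                  # a morphism in M(C, D)
identity_A = {x: x for x in A}            # the identity morphism 1_A

print(compose(mu, nu))                    # {1: True, 2: False}
# C2, associativity of composition:
assert compose(compose(mu, nu), xi) == compose(mu, compose(nu, xi))
# C3, the identity law:
assert compose(identity_A, mu) == mu
```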
It is not hard to define the required law of composition of morphisms, and then to show that the postulates hold.

As a third example we choose the category ℳ whose objects are monoids (§2.0, §7.1) and whose morphisms are homomorphisms. Let G, H, and K be any three monoids, and let μ ∈ M(G, H), ν ∈ M(H, K). Then, for any two elements g₁, g₂ ∈ G, μ(g₁)μ(g₂) = μ(g₁g₂) and ν(μ(g₁))ν(μ(g₂)) = ν(μ(g₁)μ(g₂)) = ν(μ(g₁g₂)), so that μν ∈ M(G, K), as required: a homomorphism on a homomorphism is a homomorphism. Postulates C1 and C2 follow easily. The identity morphism 1_G for any monoid G is the identity isomorphism that sends each element of G into itself.

As the fourth and final example we choose the category 𝒢 whose objects are groups and whose morphisms, as in the category ℳ, are homomorphisms. The proof that 𝒢 is a category exactly parallels that partly given and partly suggested above for ℳ; since every group is a monoid, the argument for all monoids applies to all groups.

We need one further notion of category theory: that of a functor. A functor moves us, as it were, from one category to another. Let 𝒦 and ℒ be two categories. A functor Φ from 𝒦 to ℒ associates (a) with each object A of 𝒦 an object ΦA of ℒ, and


(b) with each morphism μ ∈ M𝒦(A, B) a morphism Φμ ∈ Mℒ(ΦA, ΦB), in such a way that these two postulates hold:

F1. the composition of morphisms is preserved: