Markov Models and Linguistic Theory 9783110908589, 9789027917072



English Pages 196 Year 1971


Table of contents :
ACKNOWLEDGMENTS
PREFACE
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
I. THE UTILITY AND RELEVANCE OF MARKOV MODELS TO LINGUISTICS
II. THE THEORETICAL POSITION OF MARKOV MODELS IN LINGUISTICS
III. PREVIOUS EXPERIMENTAL RESULTS AND THEIR RELATION TO LINGUISTIC THEORY
IV. THE RATIONALE BEHIND THE DESIGN OF THE NEW EXPERIMENT
V. DESCRIPTION OF THE EXPERIMENTAL CORPUS
VI. MACHINE PROCESSING STEPS
VII. THE PANEL EVALUATION PROCEDURE
VIII. RESULTS OF THE EXPERIMENT
IX. CONCLUSIONS
Appendix I. EXAMPLE PATENT CLAIMS
Appendix II. GENERATED STRINGS, MARKED FOR GRAMMATICALNESS
Appendix III. GENERATED STRINGS, WITH TRANSITION PROBABILITIES
BIBLIOGRAPHY
INDEX

MARKOV MODELS AND LINGUISTIC THEORY

JANUA LINGUARUM
STUDIA MEMORIAE NICOLAI VAN WIJK DEDICATA

edenda curat
C. H. VAN SCHOONEVELD
INDIANA UNIVERSITY

SERIES MINOR 95

1971
MOUTON
THE HAGUE • PARIS

MARKOV MODELS AND LINGUISTIC THEORY
AN EXPERIMENTAL STUDY OF A MODEL FOR ENGLISH

by
FREDERICK J. DAMERAU

1971
MOUTON
THE HAGUE • PARIS

© Copyright 1971 in The Netherlands. Mouton & Co. N.V., Publishers, The Hague. No part of this book may be translated or reproduced in any form, by print, photoprint, microfilm, or any other means, without written permission from the publishers.

LIBRARY OF CONGRESS CATALOG CARD NUMBER: 78-135666

Printed in The Netherlands by Mouton & Co., Printers, The Hague.

ACKNOWLEDGMENTS

This work and the dissertation on which it is based would not have been possible without the encouragement, advice, and help of a great many people. I would like to thank in particular: my thesis adviser, Prof. Rulon Wells, for continuous guidance and many helpful suggestions; Profs. Isidore Dyen and Sydney Lamb, whose suggestions greatly improved the intelligibility of the thesis; Mr. Fred Blair, Mr. Luther Haibt, Dr. David Liberman, and Dr. Philip Smith, for generous contributions of their time; Mrs. Doris Crowell, who typed the manuscript speedily and accurately; the IBM Corporation for the use of its facilities and for financial support; and most of all my wife, Diane, who in addition to tolerating my own long hours without complaint, helped greatly by doing a variety of routine, fatiguing jobs.

PREFACE

In the time which has elapsed between the writing of this work in 1965 and its publication, many of the views of those I have criticized, as well as my own, have changed considerably. In addition, there have been very significant advances in data processing hardware and programming techniques. Nonetheless, I know of no comparable work in large scale computer simulation from the Markovian viewpoint. Therefore I believe that the description of that experiment and its results are still both novel and of value to others.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
PREFACE
LIST OF FIGURES
LIST OF TABLES
I. The Utility and Relevance of Markov Models to Linguistics
II. The Theoretical Position of Markov Models in Linguistics
III. Previous Experimental Results and Their Relation to Linguistic Theory
IV. The Rationale behind the Design of the New Experiment
V. Description of the Experimental Corpus
VI. Machine Processing Steps
VII. The Panel Evaluation Procedure
VIII. Results of the Experiment
IX. Conclusions
Appendix I. Example Patent Claims
Appendix II. Generated Strings, Marked for Grammaticalness
Appendix III. Generated Strings, with Transition Probabilities
BIBLIOGRAPHY
INDEX

LIST OF FIGURES

1. Word List Packing
2. Forming Strings from Text
3. Sorted String List
4. Condensed List of Strings
5. Compacted List of Strings
6. Generation Cycle, k = 3
7. Grammaticalness Ratios, All Strings Included
8. Tree Structure of a Patent Claim
9. Right Branching Structure
10. Grammaticalness Ratios, P1
11. Grammaticalness Ratios, P2
12. Grammaticalness Ratios, P3
13. Grammaticalness Ratios, P4

LIST OF TABLES

Ia. 150 Most Common Words, News Magazine
Ib. 150 Most Common Words, Patents
II. Rank Frequency Relationship for String Lengths 1 to 7
III. Total String Types
IV. Generation Time, Seconds
V. Number of Nodes at Each Level Number
VI. Number of Nodes per Level for Text and Orders 2 through 5, P1
VII. Frequency of Modifier Types in Text and Selected Approximations
VIII. Fraction of Grammatical Strings, P1
IX. Fraction of Grammatical Strings, P2
X. Fraction of Grammatical Strings, P3
XI. Fraction of Grammatical Strings, P4
XII. Fraction of Grammatical Strings for Each Sample Order 0, P1
XIII. Fraction of Grammatical Strings for Each Sample Order 1, P1

I

THE UTILITY AND RELEVANCE OF MARKOV MODELS TO LINGUISTICS

In what follows, we will be mainly concerned with some linguistic experiments and their relations to linguistic theories. Therefore, it appears necessary as a preliminary to delineate clearly what is to be understood by 'theory', by 'experiment', and by the relation between the two (cf. Nagel, 1961, for further discussion). In the first place, it is necessary to distinguish an experimental law from a theoretical deduction. If a supposed experimental law is shown to be violated by an experiment, then either it was not a law in the first place or the conditions under which it holds were improperly stated; in any case an experimental law can only be tested by further experiment. In this sense, an experimental law is independent of the theory in which it is stated. Obviously, not all relations between observables are formulated as experimental laws. In general, an experimental law arises from testing the consequences of some theory, i.e., a theory suggests an experiment, which may or may not confirm the predictions of the theory. If it does not, confidence in the correctness of the theory is reduced. On the other hand, no experimental law or experimental evidence can conclusively establish the correctness of a theory. For any body of experimental data, there is always more than one theory consistent with that data. One theory, or set of theories, however, may be more credible than another, given a set of experimental laws, and this is presumably what is meant by saying that some piece of evidence supports a theory, i.e., the evidence makes that theory more credible than others.

UTILITY AND RELEVANCE OF MARKOV MODELS

All theories are in essence mathematical (Kemeny, 1959: 33), or rather, all theories which are precise are mathematical, in the sense that their form can be expressed abstractly in terms of mathematics. A theory, however, which is expressible only in terms of some very general mathematical constructs may gain very little from being so expressed (Chomsky, 1959b: 203). Thus, although a great many theories can be expressed in terms of mathematical set theory, if the sets are not limited in some way it is unlikely that anything very interesting will result from that expression. (As a slightly different example, it is not very interesting to know that the set of grammatical sentences forms a recursive set [Putnam, 1961: 44].) The mathematics used need not be exact, i.e., it may be statistical. Thus, there is a theory which says what fraction of uranium atoms will decay in any given period, within statable confidence limits, but none which can say that a particular uranium atom will decay in any given time period (Kemeny, 1959: 73). In what follows, I shall use the term 'model (of)' as synonymous with 'mathematical theory (of)'. This usage has become current in linguistics in recent years, as in Chao (1960). In particular, it corresponds to the usage of Chomsky. It should be noted that this is a different usage than that of Hockett (1954) in which 'model' means a framework or schema. It is different also from the common usage in physics, for example, where a model often means a concrete interpretation of a theory, from which it is hoped the theory can be better understood (Braithwaite, 1960). Within the framework just sketched, this study is concerned primarily with a particular experimental law and its relation to a particular mathematical theory. The particular experimental law concerned is the positive correlation of orders of approximation of English by a Markov model with grammaticality. 
Therefore, provided that the experimental procedures are not faulty, some of the results of the study are valid irrespective of one's view of the theories, and its value can be judged by how interesting, to linguists, the results are. Since this study deals with the experimental procedure in
statistical terms, the relevance of this kind of formulation must be shown. Specifically, this study undertakes to demonstrate that it was erroneous to dismiss a finite state model as irrelevant for English or any other natural language for the reasons so far employed. What is known as the finite state model (also called 'Markov model') has been described extensively elsewhere (cf. Hockett, 1953 and 1955; Chomsky and Miller, 1958 and 1963). It can be characterized by a machine with a defined list of internal states, a defined set of state to state transitions, and a defined alphabet, one symbol of which is emitted by the machine at each state transition. Such a device with a probability measure defined for each state transition is a finite state Markov source. A Markov source for which the states are identified with the preceding k emitted symbols is a k-limited source, and, if the symbols are letters or words of a language, L, is said to generate (k+1)-order approximations of L (Miller and Chomsky, 1963: 427). The zero-limited source (generating first order approximations) is a single state source which emits each symbol independently of the preceding ones, but with, perhaps, unequal probabilities for each symbol. The zero-order approximation is an approximation in which each symbol has the same probability. The concept of a finite state model should be clearly distinguished from the general class of probabilistic models. These models, too, may be used in linguistics. Yngve (1962) describes a sentence generation procedure based essentially on immediate constituents, in which the terminal elements, words, are randomly selected from a fixed set. If we admit that phonetics is properly a part of linguistics, then probability considerations apply to the measurement of formant frequencies, and in general to all measurements.
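The definition of a k-limited source given above can be made concrete with a short sketch. The Python below is an illustration only and is not the procedure described in Chapter VI: the function names (train_k_limited, generate, zero_order) and the toy word list are assumptions introduced here. It records, for every run of k words in a text, which words follow and how often, and then emits words in proportion to those counts, so that k = 0 gives a first order approximation and k = 2 a third order approximation. It assumes the text is longer than k words.

```python
import random
from collections import Counter, defaultdict


def train_k_limited(tokens, k):
    """Count, for each state (the preceding k tokens), which token follows."""
    counts = defaultdict(Counter)
    for i in range(len(tokens) - k):
        state = tuple(tokens[i:i + k])      # k = 0 gives the single state ()
        counts[state][tokens[i + k]] += 1
    return counts


def generate(counts, k, length, rng):
    """Emit at most `length` tokens: a (k+1)-order approximation of the text."""
    out = list(rng.choice(sorted(counts)))  # hypothetical choice of start state
    while len(out) < length:
        followers = counts.get(tuple(out[len(out) - k:]))
        if not followers:                   # state never continued in the text
            break
        symbols = sorted(followers)
        weights = [followers[s] for s in symbols]
        out.append(rng.choices(symbols, weights=weights)[0])
    return out


def zero_order(vocabulary, length, rng):
    """Zero-order approximation: every symbol has the same probability."""
    return [rng.choice(vocabulary) for _ in range(length)]


if __name__ == "__main__":
    text = "the cat sat on the mat and the cat ran".split()  # toy corpus
    rng = random.Random(0)
    print(zero_order(sorted(set(text)), 6, rng))
    for k in (0, 1, 2):
        print(k + 1, generate(train_k_limited(text, k), k, 6, rng))
```

Because every emitted word is drawn from the followers actually observed after the current state, each run of k+1 adjacent words in the output also occurs somewhere in the source text. That property is what distinguishes a machine-built approximation from strings elicited by asking speakers to continue a context.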
Even aside from these cases, there appears to be a general place for probabilistic models in linguistics, including those areas for which we have or hope to have exact models. Within a general category of 'all-and-only' models, there can be distinguished those in which the primitive elements of the model are grouped
into classes. These classes are combined in specified ways to form new classes, etc., until some specified maximal class or classes are achieved. Such models, which may also include operations other than grouping, are often called structural models. It is almost certainly true that any structural model for a language which claims completeness will be complex. It may well be that for some operations, it will be more convenient to work with a probabilistic model rather than to trace a long, involved derivation. Similar considerations led to the use of probabilistic models in other sciences. The physicist uses Monte Carlo methods rather than attempt to solve a complex system of equations. A logic designer uses a statistical sampling for the placement of large numbers of interconnected components on the board to which they will be wired, rather than work out a very large combinatorial problem which would give him minimum wire length. To this should be added the fact that we have as yet no complete, exact descriptions of any language. Among the most complete and exact models for English are those described in Kuno (1963) and Robinson (1965), yet neither of these models is either complete or exact. That is, there are English sentences which both models will fail to recognize as such (i.e., they are incomplete), and there are strings of English formatives which are not sentences but which are classed as sentences by the models (i.e., they are not exact). Note that in this sense it is possible for a model to be complete without being exact. It may be that the best way to achieve completeness is by means of a nonexact model, in particular, by a probabilistic model. The finite state model differs from all-and-only models in that, under conditions discussed more thoroughly below, it provides more than a two step ranking of utterances. All-and-only models, in particular those discussed by Chomsky, rank sentences as either deviant or nondeviant.
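The two properties just distinguished, completeness and exactness, amount to two separate checks on the strings a model accepts. The sketch below is only an illustration under stated assumptions: the helper names (is_complete, is_exact) and the tiny sample lists are invented here, and a finite sample can at most show that a model fails a property, never that it has it.

```python
def is_complete(accepts, sentences):
    """Complete (on this sample): every sentence is recognized as a sentence."""
    return all(accepts(s) for s in sentences)


def is_exact(accepts, non_sentences):
    """Exact (on this sample): no non-sentence is classed as a sentence."""
    return not any(accepts(s) for s in non_sentences)


# Hypothetical sample judgments, for illustration only.
sentences = ["the dog barks", "dogs bark"]
non_sentences = ["barks the dog the", "dog the"]


def accept_all(string):
    """A model that classes every string as a sentence."""
    return True


memorized = {"the dog barks"}


def accept_memorized(string):
    """A model that accepts only one memorized sentence."""
    return string in memorized


print(is_complete(accept_all, sentences), is_exact(accept_all, non_sentences))
print(is_complete(accept_memorized, sentences),
      is_exact(accept_memorized, non_sentences))
```

The accept-everything model is trivially complete without being exact, which is the possibility noted in the text; the memorizing model shows the converse case, exact on this sample but incomplete.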
Since the models so far discussed have been concerned mostly with grammar, sentences are called grammatical and non-grammatical. Since there are always potential or actual utterances which will be classed as ungrammatical (the construction of an all-and-only model being pointless otherwise),
there must be a criterion or criteria for distinguishing the grammatical from the ungrammatical. All-and-only models using different criteria must also be different, if perhaps only slightly. It will be necessary, then, to examine below the criteria now employed. The finite state models, on the other hand, rank utterances as more or less likely even though any given utterance is itself unlikely. It is interesting to see how this ranking relates to the dichotomy of grammatical versus ungrammatical. The history of serious theoretical interest in information theory, and statistical theory in general, in linguistics is very brief and the published output relatively small. A brief comment on this history may be helpful. The earliest influential work was apparently Shannon and Weaver (1949), introduced to many linguists by Hockett (1953). The theory was explained once again and some linguistic suggestions made in Osgood and Sebeok (1954), and Cherry (1957) is devoted in large part to information theoretical ideas. Since then, most of the theoretical discussion has been directed toward demonstrations of the inadequacy as a model for language of any finite state model, largely by Chomsky (1957, 1959, 1961, 1963), Chomsky and Miller (1958, 1963) and Miller and Chomsky (1963). Two exceptions to this trend are Hockett (1961a) and Wells (1961). More recent theoretical discussion has been very scanty and in any case appears not to have been fruitful. Before any new theoretical discussion of this topic can expect to receive attention, it is necessary to at least discuss and if possible to counter the objections of Chomsky and others to this kind of a model. These objections are examined in some detail in Chapter II. Similarly, a number of experiments have already been performed to test one or more consequences of the Markov or some other model.
It is important to be sure that the results of none of these vitiate whatever theoretical motivation exists with respect to Markov models, so that a sampling of these experiments is examined in Chapter III. It should be understood, however, that the motivation of these examinations is not to show that the previous studies were wholly fallacious, but rather that they were
inconclusive. That is, whatever one might judge the weight of evidence to be, it should be established that neither the theoretical argument nor the experimental evidence conclusively rules out any possible relevance for the Markov model in linguistics. Willingness to accept these arguments is probably in part due to one's position on the question of absolutism in linguistics. That is, it seems clear to me, as well as to others (Harris, 1965: 365), that languages exhibit a variety of aspects. Some behavior appears to be rule directed, other behavior appears to be probabilistic. There does not appear to be any logical necessity for adducing only one mechanism for language behavior to the human organism. It is entirely conceivable that one sort of mechanism is invoked at one time and another sort at a different time, both perhaps leading to the same end results. (A certain amount of investigation has been carried out on the overlap between structural and probabilistic models [Stolz, 1965].) It is true that one may progress fastest by making a strong assumption and seeing where it leads, but still one should not be blinded by possibly temporary successes to the possibility of alternatives. Although this study is primarily concerned with mechano-linguistics, it also touches on a number of other sub-disciplines. Earlier it was mentioned that theories are more or less credible as they conform better or worse to experimental data. It will be necessary, then, to investigate experiments made as a result of predictions from a finite state model and from certain all-and-only models. Since these experiments in the main turn out to be psychological experiments carried out on speakers' and hearers' responses, the study is necessarily also concerned with psycholinguistics. In particular, it will be necessary to consider the experimental methods as well as the experimental results.
In addition, if an experiment is to be performed to test a result predicted by a finite state model, it is necessary to acquire data produced by such a model. What is asserted to be such data has heretofore been elicited from native speakers (Miller and Selfridge, 1950), but we shall see below that such a procedure is of questionable validity. Therefore, the necessary data for this experiment
was acquired by constructing a specific finite state model, a procedure essentially impossible without using very high speed data processing equipment. The process of construction and the time and difficulty involved in such a process are considered in Chapter VI. This is in brief, in fact, a linguistic study differing from other studies mostly in emphasis, since it is largely concerned with a machine model and its output. The machine model is, however, only technically different from an informant. In both informant work and machine work, one is essentially experimenting. The primary difference appears to be that an investigator working with an informant usually does not separate building a theory of the informant's language from testing the theory he is building. It is only relatively recently that linguists have taken time from the building of theories for hitherto undescribed languages to consider competing theories for particular languages or for languages in general. As this becomes more common, and the theories more precise, it is likely that mechanical experiments will also become more common.
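The remark earlier in this chapter that finite state models rank utterances as more or less likely, even though any given utterance is itself unlikely, can be illustrated with a minimal scoring sketch. Everything specific in it is an assumption made for illustration: the name make_scorer, the toy text, the restriction to a 1-limited (word pair) source, and the add-alpha smoothing, which is not part of the model as described here and serves only to keep unseen transitions from receiving probability zero.

```python
import math
from collections import Counter


def make_scorer(corpus, alpha=1.0):
    """Return a function giving the log-probability of a token sequence
    under a 1-limited source estimated from `corpus`.

    Only transitions are scored; the first token is taken as given.
    Add-alpha smoothing is an assumption of this sketch.
    """
    vocabulary_size = len(set(corpus))
    state_counts = Counter(corpus[:-1])             # times each word has a successor
    pair_counts = Counter(zip(corpus, corpus[1:]))  # observed transitions

    def log_prob(tokens):
        total = 0.0
        for prev, cur in zip(tokens, tokens[1:]):
            numerator = pair_counts[(prev, cur)] + alpha
            denominator = state_counts[prev] + alpha * vocabulary_size
            total += math.log(numerator / denominator)
        return total

    return log_prob


if __name__ == "__main__":
    text = "the cat sat on the mat and the cat ran".split()  # toy corpus
    score = make_scorer(text)
    candidates = ["the cat sat", "sat the cat", "cat the sat"]
    for string in sorted(candidates, key=lambda s: score(s.split()), reverse=True):
        print(round(score(string.split()), 3), string)
```

All three candidate strings receive low probabilities, yet the scores still order them. The result is a graded ranking rather than the two step division into grammatical and ungrammatical that an all-and-only model supplies.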

II

THE THEORETICAL POSITION OF MARKOV MODELS IN LINGUISTICS

At the risk of being repetitious, it appears necessary to emphasize that the intent of the following discussion is merely to examine the strength of the arguments which have been raised against the Markov model. There is no implication that doing so will discredit a phrase structure model, a transformational model, or any other kind of model, nor is there a claim for any kind of absolute superiority of the Markov model over any other. A detailed discussion of the issues appears warranted, however, in view of the very strong claims which have been made regarding the inapplicability of any sort of finite state model in linguistics. In order to understand Chomsky's objection to the finite state model, it is necessary to be clear about two things: (1) what Chomsky considers to be the subject matter of linguistics, and (2) what properties he considers necessary in a model for it to be regarded as an adequate model for language. These two quite different concepts are thoroughly interwoven in Chomsky's writings. They are usually discussed, in connection with linguistic subject matter, as the distinction between competence and performance, and in connection with adequacy, as the levels of observational, descriptive, and explanatory adequacy. These notions are also related very closely to the meanings of the terms 'sentence' and 'grammatical', so that relevant remarks relating to these also will be discussed. Most of these ideas have been discussed also by others in the transformationalist school, but their discussions appear to be largely derivative from Chomsky.

THEORETICAL POSITION OF MARKOV MODELS

I will, therefore, confine myself to his writings and, to reduce misunderstanding, will quote liberally from them. In all sciences, ultimate goals tend to be vague and probably unattainable. It is customary to limit one's attention to what it is hoped is a more reasonable scope. Chomsky says: A reasonable, though still remote, goal for linguistics and psychology would be to construct a device capable of duplicating this (speak and understand) performance, or certain aspects of it. The linguistic abilities of the mature speaker can in part be characterized by what we might call a 'formalized grammar' of his language. I will... consider only these aspects of linguistic competence. (Chomsky, 1960: 530) His notion of 'formalized grammar' is made more specific and related to past work in the following: The traditional aim of a grammar is to specify the class of properly formed sentences and to assign to each what we may call a 'structural description'. If we hope to go beyond traditional grammar in some significant way, it is essential to give a PRECISE FORMULATION [emphasis mine, FJD] of the notion 'structural description of a sentence' and a PRECISE ACCOUNT [emphasis mine, FJD] of the manner in which structural descriptions are assigned to sentences by 'grammatical rules'. (Chomsky, 1961a: 6; cf. also Chomsky, 1958: 152) The grammar relates, of course, to users: The device A is a grammar which generates sentences with structural descriptions; that is to say, A represents the speaker's linguistic intuition, his knowledge of his language, his langue. (Chomsky, 1963: 329; cf. also Chomsky, 1965: 20) The relationship of the grammar to language use is then not a direct one: Clearly the description of intrinsic competence provided by the grammar is not to be confused with an account of actual performance.
(Chomsky, 1962: 915) We thus make a fundamental distinction between COMPETENCE (the speaker-hearer's knowledge of his language) and PERFORMANCE (the actual use of language in concrete situations). (Chomsky, 1965: 4; cf. also Chomsky, 1963: 326)

Since, however, the linguist is directly aware only of his own intuition, he must garner the rest of his knowledge through the performance of others. He must, therefore, know what part of that performance he can safely neglect, i.e., what part is a reflection of underlying competence and what belongs purely to performance. The items that it (formalized grammar) generates will not be the utterances of which actual discourse is composed, but rather they will be what the untutored native speaker knows to be well formed sentences. Actual discourse consists of interrupted fragments, false starts, lapses, slurring, and other phenomena that can only be understood as distortions of an underlying ideal pattern. It would be absurd to try to incorporate these phenomena directly into a formalized grammar. (Chomsky, 1960: 531) In addition to properties of the output which can be neglected (interrupted fragments, etc.), some properties of the real device, i.e. the native speaker, can also be neglected: Linguistic theory is concerned primarily with an ideal speaker-listener, in a completely homogeneous speech-community, who knows his language perfectly and is unaffected by such grammatically irrelevant conditions as MEMORY LIMITATIONS [emphasis mine, FJD], distractions, shifts of attention and interest, and errors (random or characteristic) in applying his knowledge of the language in actual performance. (Chomsky, 1965: 3; cf. also Chomsky, 1963: 326) This, then, is Chomsky's view of the subject matter of linguistic theory. We will return to fuller consideration of the implications of this formulation below. Note, however, that it is not clear what aspects of behavior are to be considered as reflecting competence; instead, we are given examples of some behavior that does not. Part of this difficulty is resolved when we consider what the adequacy conditions for a linguistic theory are.
If competence is distinct from performance and there is a theory of competence, then there can also be a theory of performance: We can construct a model for the listener who understands a presented sentence by specifying the stored grammar G, the organization of memory, and the operations performable by M. (Miller and Chomsky, 1963: 467)

...it is perfectly possible that M will not contain enough computing space to allow it to understand all the sentences in the manner of the device G whose instructions it stores. (Miller and Chomsky, 1963: 467) In general, it seems that the study of performance models incorporating generative grammars may be a fruitful study; furthermore, it is difficult to imagine any other basis on which a theory of performance might develop. (Chomsky, 1965: 15) It is not clear whether these performance models are considered part of linguistics or part of psychology. Whether Chomsky thinks them to be part of linguistics or not, many linguists would agree that indeed they are. The distinctions of observational, descriptive and explanatory adequacy were implicit in the early transformationalist papers. Thus, the phrase structure model was criticized for being unable to provide separate descriptions for ambiguous senses of a sentence (Chomsky, 1956: 123ff), and the condition that all grammars be of the same form was required in Chomsky (1957: 14). These ideas are fully explicit in Chomsky (1962: 923): The lowest level of success is achieved if a grammar presents the observed primary data correctly. A second and higher level of success is achieved when the grammar gives a correct account of the linguistic intuition of the native speaker... A third and still higher level of success is achieved when the associated linguistic theory provides a general basis for selecting a grammar... let us refer to these roughly delimited levels of success as the levels of observational adequacy, descriptive adequacy, and explanatory adequacy, respectively. (Cf.
also Chomsky, 1965: 24) Less precisely stated, these conditions require a match of theory and data, a caution against establishing useless constructs (for example, definitions of 'word' which do not correlate well with what is generally called a 'word'), and a requirement that theory not be ad hoc for each language, which is essentially equivalent to paying attention to language universals and to the nature of language learning in children (Chomsky, 1965: 28). The difficulty lies less in the rough adequacy conditions than in the strict interpretation which is sometimes put on them. It is certainly true, as Chomsky suggests (Chomsky, 1958: 168), that all linguists are really seeking to explain intuitions, but what is clearly NOT true
is that they are all seeking to explain the SAME intuition. Thus, one can accept the general idea that linguists seek to explain intuitions about the notion 'word' but might well reject the following: ...(assigning to) the doctor's arrival, but not the doctor's house, a structural description that indicates the Subject-Verb relation appears in the former but not in the latter phrase. But clearly (this) Jespersen's account is correct on the level of descriptive adequacy and the fact that the data processing operations of modern linguistics fail to provide the correct information indicates only that they are based on an erroneous conception of linguistic structure, or that observational adequacy is being taken as the only relevant concern. (Chomsky, 1962: 925) Similarly, the particular interpretation given below to the notion of observational adequacy might well not be universally accepted: Comprehensiveness of coverage does not seem to me to be a serious or significant goal in the present stage of linguistic science. Gross coverage of data can be achieved in many ways, by grammars of very different form. Consequently, we learn little about the nature of linguistic structure from study of grammars that merely accomplish this. (Chomsky, 1962: 937; cf. also, Chomsky, 1965: 26) If someone says of my description that this doesn't fit, and this, I would say that is not a very interesting comment. If on the other hand he says that the exceptions can fit into a different pattern, that is of the highest importance. (Hill, 1962: 32) One can ask, for example, how much we can hope to learn about linguistic structure from the study of grammars which do NOT comprehensively cover the data. It may be a great deal or it may be very little. Similarly, exceptions to one set of rules might well fit into a pattern for a different rule set, but there is no guarantee that the two rule sets will otherwise overlap. They might in fact be conflicting.
This is by no means unknown in science (cf. Margenau, 1960: 354), and it is then necessary to make decisions on other bases. We will return to this below also. Conditions of adequacy for the performance models are not quite so specific. However: In considering models for the actual performance of human talkers and listeners, an important criterion of adequacy and validity must be the
extent to which the model's limitations correspond to our human limitations. (Miller and Chomsky, 1963: 421) In addition, such models surely must meet the three adequacy conditions discussed above, at least in their general sense. Two terms used freely below need also to be considered, 'sentence' (needed only to better define grammatical), and 'grammatical', by which I mean not the adjective 'having to do with grammar', but the adjective 'having to do with grammaticalness'. There is in addition a third use, occurring in the phrase 'degree of grammaticality'. Chomsky's notion of sentence is quite clear. For him, a sentence is the output of the rules of a generative grammar. I will... (insist) only that the grammar provide a list of fully grammatical sentences, a recursive definition of 'grammatical sentence'. (Chomsky, 1960: 532) The fully grammatical sentences generated by the grammar are the ones represented by trees headed by a single node labelled S, where S is the distinguished initial symbol of the underlying constituent structure grammar. (Chomsky, 1961b: 175) The term 'grammatical' is also defined: The attempts to define 'phoneme', 'morpheme', etc., presuppose a set of clear cases of applicability or non-applicability of these terms, with reference to which the proposed definition can be tested. The same is true of an attempt to define 'well-formed (grammatical) English sentence'. (Chomsky, 1961b: 178) Actually, there is a terminological difficulty here. If a 'sentence' is the final output of a set of rules, and if the final output of a set of rules is well formed, EVERY 'sentence' is well formed, and every sentence is grammatical. Consequently the phrase 'grammatical sentence' is redundant and the class 'unwell-formed sentence' or 'ungrammatical sentence' is empty. Thus, all sentences are grammatical, but there can be strings of formatives which are not derived from the set of rules and are therefore not well formed (grammatical).
This is not the sense of grammatical in 'degree of grammaticality' (nor is it my sense of grammatical).


The relation of the theoretical constructs to the observable data is not quite so clear. At one time, the relation apparently was thought to be a relatively direct one:

A grammar of L seeks to formulate laws (grammatical rules) in terms of theoretical constructs (particular transformations, phonemes, etc.) which govern the construction of utterances, i.e., which correctly predict which physical events are and are not sentences acceptable to the native speaker, whether they have been observed or not. (Chomsky, 1958: 137)

Now, however, in light of an explicit distinction of performance models and competence models, the relation is more complicated:

Let us use the term 'acceptable' to refer to utterances that are natural and immediately comprehensible without paper and pencil analysis, and in no way outlandish... The notion 'acceptable' is not to be confused with 'grammatical'. Acceptability is a concept that belongs to the study of performance whereas grammaticality belongs to the study of competence. (Chomsky 1965: 10ff)

There does not appear to be a distinct equivalent notion in the theory of performance for the notion of 'sentence', since we find:

...the more acceptable sentences... the unacceptable sentences... the unacceptable grammatical sentence... (Chomsky 1965: 11)

The relationship of the theoretical notion of grammaticalness to the observational one of acceptability is not clear, a difficulty which has been noted by others (Marks, 1967: 202):

...it is clear that the intuitive notion of grammatical well-formedness is by no means a simple one and that an adequate explication of it will involve theoretical constructs of a highly abstract nature. (Chomsky 1965: 151)

The notion of 'degree of grammaticalness', which is well defined in theoretical terms, is distinct from the simple notion of grammaticalness, as was pointed out above.
Given a grammatically deviant utterance we attempt to impose an interpretation on it exploiting whatever features of grammatical structure it preserves and whatever analogies we can construct with perfectly wellformed utterances. (Chomsky 1961b: 187)


The degree of grammaticalness is a measure of the remoteness of an utterance from the generated set of perfectly well formed sentences. (Chomsky 1961b: 190)

Details on one proposal for assigning a measure to utterances can be found in Miller and Chomsky (1963). In light of the above, we can now see some of the points at which it makes sense to differ with Chomsky, i.e., points which can be classed as legitimate differences of opinion and not terminological quibbles. In the first place, there are areas of difference concerning the constraints on a linguistic theory. For example, false starts, interruptions and the like are dismissed by Chomsky as not being matters of competence (which presumably equals linguistics), yet these phenomena are not totally random as we shall see below. If the general goal of linguistics is restated as the description and explanation of patterned language behavior, such phenomena are open for consideration (Fromkin 1968: 48). Similarly, the finiteness of human memory is not considered by Chomsky to be a constraint on linguistic theory. Although in many cases it is more convenient to consider phenomena as constituting an infinite collection, there is no general principle that we must do so.

There are areas of possible difference also in regard to adequacy conditions. Chomsky appears to be mainly interested in explanatory adequacy, and dismisses the lowest level of observational adequacy as of considerably less importance (cf. above p. 25). In general one does this at great risk since it is far too easy to neglect, regardless of intent, data that cannot fit a pet theory. We can agree with Chomsky that the theory which covers the most data is not for that reason alone the one to be preferred; many theories might do that and yet not be preferable to a competitive theory. On the other hand there seems to be no clear way of balancing degree of coverage against value measured by explanatory power.
The application of the criterion of descriptive adequacy, particularly, seems likely to give rise to differences of opinion, since


the application of this criterion depends so much on one's personal opinion or intuition of what a good structural description of a particular sentence should be. Thus, I can easily conceive of a theory in which the doctor's house and the doctor's arrival have the same structural description, the difference between them being attributed to semantics and not syntax.

Chomsky uses 'grammatical' as a technical term in a very precise way, so that it is essentially meaningless when not coupled with a particular set of rules. Since in this usage it is essentially equivalent to 'sentence' I shall use the term 'grammatical' differently below, corresponding more closely with Chomsky's notion of 'acceptable'. On the other hand, I will use the term 'sentence' in the same way as Chomsky does in connection with the performance model rather than as he does in relation to a particular rule set. Therefore, neither term is here well defined in a technical sense. More specifically I use 'sentence' to mean 'that utterance which a native speaker, upon reflection is willing to classify as well-formed'. Like Chomsky I use 'grammatical' as equivalent to 'well-formed', as above, but the notion of grammaticalness cannot be said to be well defined, since that property for which utterances are well defined is left unspecified except insofar as it is said to be known to native speakers.

Let us now consider Chomsky's arguments against the applicability of a Markov model in linguistics. These are basically four, of which three have to do with a competence model and the fourth with a performance model. One of the arguments with respect to the competence model relates to descriptive adequacy and the other three to observational adequacy. The reason for the emphasis on observational adequacy seems simple enough; a model which is inadequate to describe performance need not be further considered. However, as I have tried to point out above (p. 25 ff.), we must specify WHAT performance.
Chomsky's arguments are as follows: (1) The Markov model does not separate the clearly grammatical from the clearly ungrammatical.


(2) Successive improvements in the Markov model will not change its status with respect to (1). (3) There are types of sentences in the language which the Markov model cannot generate. (4) It is impossible to collect the data necessary to build a Markov model; so much data is necessary that a child could not possibly collect enough of it to internalize such a model. Each of these arguments will be considered in some detail. The first argument is contained in the following:

(I) A certain number of clear cases [i.e. of sentencehood, FJD] then will provide us with a criterion of adequacy for any particular grammar. (Chomsky 1957: 14)

(IIa) (14) colorless green ideas sleep furiously
(IIb) (15) furiously sleep ideas green colorless

(III) No order-of-approximation model can distinguish (14) from (15) (or an indefinite number of similar pairs). (Chomsky 1956: 116)

What we are concerned with now is the relation between the first statement and the last. It seems likely that nearly everyone would agree with the first statement. Everyone would agree, I think, that

(IV) he would like to have it

should be distinguished from

(V) like have to he it would

It is by no means clear to me, however, that Chomsky's IIa is to be equated to IV, above, although I think I might be willing to equate IIb with V. As an informal experiment, five people who had seen none of the four strings before were handed a card with these strings (II, IV, V) on it in the order IV, IIb, V, IIa, and asked to say which pair was most similar, and to continue doing this as long as they could see a basis for making similarity judgments. Four of the five noticed the vocabulary match and gave IV-V and IIa-IIb first, as might be expected. One subject, however, gave first ranking to IV-IIa before giving the vocabulary ranking. Discarding the vocabulary ranks, one subject ranked IV-IIa,


one IV-IIb, one V-IIa, one IIb-V and the last found nothing to separate IIb, V, and IIa. While this is hardly a significant sample, the fact that there are five different answers does appear significant, since a set of CLEAR cases was required. Chomsky's statement quoted as III above is almost certainly true but perhaps not very interesting, if one takes the view that what should be clearly distinguished is V from IV. Note also that in the kind of models Chomsky is proposing, there is no way to characterize the difference between IV and

(VI) He would like to probate it

and yet this difference is also important for at least some purposes (e.g. language teaching). To rate the distinction between IIa and IIb as more important than that between IV and VI is, of course, acceptable but hardly necessary. To reiterate, this argument asserts that the finite state model is observationally inadequate, since it must either generate neither or both of (II). However, whether or not (II) is a CLEAR test case is a matter of opinion. An informal check reinforces the view that the case is not CLEAR. For transparently clear cases, such as (IV) vs. (V), the argument against the model does not hold.

Chomsky's second argument is related to the first but is somewhat different from it. It is this argument that will be put to empirical test. The relevant passages are:

(VII) ...as n increases an nth order approximation to English will exclude (as more and more improbable) an ever increasing number of grammatical strings while it still contains vast numbers of completely ungrammatical strings (Chomsky 1956: 116; Miller and Chomsky 1963: 429)

(VIII) ...there is no significant correlation between order of approximation and grammaticalness. If we order the strings of a given length in terms of order of approximation to English we shall find both grammatical and ungrammatical strings scattered throughout the list from top to bottom. Hence the notion of statistical approximation appears to be irrelevant to grammar. (Chomsky 1956: 116; Chomsky 1957: 17)

The relationship of these statements to the preceding argument should be clear, at least insofar as they relate to questions of grammaticalness. This argument is, however, even more inclusive than the preceding one, since it claims that the Markov model is not only theoretically insufficient, it is also practically useless and in fact irrelevant. If we consider the orders of approximation question itself, we can see that the statement is at least trivially wrong. That is, if we consider a kth order of approximation, presumably all strings of length k − 1 and less are grammatical, since they are all defined states. If the order of approximation k is less than the length of the maximum dependency m then for orders of approximation a, k ≤ a ≤ m, there doubtless will be strings for all these m − k + 1 orders which are ungrammatical. If there is no finite m, there will always be ungrammatical strings for any k, since there will be dependencies longer than k. This is, however, something quite different from saying that "there is no significant correlation" between the order of approximation and grammaticalness, unless one takes a rather unusual sense of 'significant', by considering only perfection to be significant. It seems 'intuitively' that if the order of approximation k is high enough, then the strings generated by the model should be very much like English. This assertion is at least empirically testable, as will be seen below. To recapitulate, the assertion about 'significant relationships' appears to follow from the preceding arguments only if one means to imply by this a complete correspondence between the model and the intuitively defined categories of grammatical and ungrammatical.

The third argument is concerned both with what sentences occur in English (observational adequacy) and what types of structures occur in English (descriptive adequacy). Chomsky gives the following necessary condition for finite state languages:

(IX) There is an m such that no sentence s of L has a dependency set of more than m terms in L.
(Chomsky 1956: 115) That is, if a language is to be finite-state there must be some number m, such that no grammatical dependency spans more than m elements. He gives the example:

(X) (i) If S1 then S2.
(ii) Either S3 or S4.
(iii) The man who said that S5, is arriving today.
(Chomsky, 1956: 115; Chomsky, 1957: 22)

where S1, S3, or S5 can be i, ii, or iii themselves. Further,

(XI) (the rat (the cat (the dog chased) killed) ate the malt) is surely confusing and improbable but is perfectly grammatical and has a clear and unambiguous meaning. (Chomsky and Miller, 1963: 286)

(XII) ...there are processes of sentence formation that this elementary [i.e. finite state, FJD] model for language is intrinsically incapable of handling. (Chomsky, 1957: 23)

and

(XIII) ...we must conclude that the competence of the native speaker cannot be characterized by a finite automaton. (Chomsky, 1963: 390)
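
The force of condition (IX) against example (X iii) can be illustrated with a small sketch. Everything in it is an assumption made for illustration and is not part of the original study: the function names are hypothetical, the filler clause "it is raining" stands in for S5, and words are split on whitespace. The code embeds (iii) in itself and counts how many words separate the outermost noun man from the verb is that agrees with it.

```python
def embed(depth, filler="it is raining"):
    """Embed sentence (X iii) in itself `depth` times around a filler clause."""
    sentence = filler
    for _ in range(depth):
        sentence = f"the man who said that {sentence} is arriving today"
    return sentence


def dependency_span(depth):
    """Number of words from the outermost 'man' to the 'is' agreeing with it."""
    if depth < 1:
        raise ValueError("at least one embedding is required")
    tokens = embed(depth).split()
    noun_index = 1                 # 'the man ...': 'man' is the second word
    verb_index = len(tokens) - 3   # '... is arriving today': third word from the end
    return verb_index - noun_index


def depth_exceeding(m):
    """Smallest number of embeddings whose noun-verb span is greater than m."""
    depth = 1
    while dependency_span(depth) <= m:
        depth += 1
    return depth
```

Under these assumptions each further embedding lengthens the span by a fixed amount, so for any proposed bound m some finite depth exceeds it. Whether speakers accept the resulting strings as grammatical is, of course, exactly the point in dispute.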

The observational question concerns the occurrence in English of sentences such as (IX). The descriptive question concerns the occurrence of such processes of formation. Again, with reference to specific cases, it appears that there is room for argument. Specifically, Chomsky is relying on the native speaker of English to accept an essentially indefinite series of self-embeddings as in (X) and (XI) above as always leading to grammatical sentences. I have no doubt that if a subject is confronted with the outer layer of (XI), the rat ate the malt, and is then presented one at a time with further embeddings, he can be brought to say that the result in each case is grammatical and he will probably never be willing to say that the result is ungrammatical. In another informal check, (XI) was presented to five relatively naive informants who were asked if this was a grammatical sentence, disregarding whether or not it meant anything or was confusing. Of the five, three rejected it out of hand and the two who accepted it did so for the wrong reason, by supplying 'understood' conjunctions for a compound subject and a compound predicate. Again, no claim is made for accuracy or statistical significance. I only want to


show that the grammaticality of strings like (X) and (XI) is at least not obvious. If we do not accept these strings as grammatical, then it is possible to accept Chomsky's statement in (XII), which is mathematically true,¹ and still reject his conclusion in (XIII), which is not. Part of the difficulty in these examples results from failing to make clear the distinction between what is INFINITE and what is INDEFINITE. Wells (1954) discusses this aspect of natural languages with respect to the vagueness of meaning of adjectives like numerous. Something very similar might well apply here. For example we can say there is some number x1 of self-embeddings, greater than zero, which is 'always' judged to be grammatical, and some other number x2, less than, say, ten thousand, which is 'always' judged to be ungrammatical, such that x1 is almost always less than x2, and for numbers between x1 and x2 the judgments of grammaticalness will vary.²

The claim of Chomsky's quoted in (XII) is interesting per se, in that the recursive property of self-embedding is attributed to language itself. This claim is used as part of the argument for the grammaticality of (XI), when Chomsky points out that whether such sentences can be understood or not without aids such as pencil and paper is irrelevant, since multiplication, for example, can be done for arbitrary numbers by anyone who knows the rules, but long numbers probably cannot be multiplied in the head (Chomsky, 1963: 327). From one point of view the relevance of this with respect to the argument at hand is hard to see, even though I have not the slightest doubt that the statement about ability to understand is true. The ability to use paper and pencil to parse a sentence is surely as much of a learned skill as the ability to multiply and neither need have anything fundamental to do with language. The fact that a human has the ability to learn and use recursive processes is not a reason to attribute these to language also. From another point of view, appeal to pencil and paper is indeed relevant to the argument but should be taken as evidence for the view that these structures are not part of English. If we consider spoken language, the fact that long self-embeddings cannot be understood is evidence for their ungrammaticalness. The assumption that there is no limit to the maximum sentence length, or to the number of self-embeddings, etc., is largely made, as Chomsky knows (Chomsky, 1956: 115), in order to simplify the description of the process being studied. By and large, the infinity assumption causes no trouble and is therefore legitimate as long as it is realized that it is a simplifying assumption. The only danger is that it becomes fairly easy to ignore the problems inherent in discreteness and finiteness.

¹ Consider statement (IX) and the sentences of (X), in particular iii. No matter what m we select, if iii is embedded in itself in place of S5 sufficiently often, the distance between the singular noun man and the singular verb is will exceed m.

² A similar comment might well apply to the comments in Chomsky (1965: 198) on a statement by Dixon: "he states that 'we are clearly unable to say that there is any definite number N, such that no sentence can contain more than N clauses' (that is, he states that the language is infinite). Either this is a blatant self-contradiction, or else he has some new sense of the word infinite in mind." It could well be that Dixon means by indefinite the same thing that I mean, in which case there is no contradiction at all.

The fourth argument, which argues against the Markov model as a performance model, is based not on linguistic considerations, but on some assumptions about learning and the structure of the brain, and the constraints implied by these assumptions.

(XIV) We know that the sequences produced by k-limited Markov sources cannot converge on the set of grammatical utterances as k increases because there are many grammatical sentences that are never uttered and so could not be represented in any estimation of transition probabilities. (Miller and Chomsky, 1963: 429)

(XV) A staggering amount of text would have to be scanned and tabulated to make reliable estimates.

(XVI) We cannot seriously propose that a child learns the value of 10⁹ parameters in a childhood lasting only 10⁸ seconds.
(Miller and Chomsky, 1963: 430) It would serve no purpose here to consider extensively the proposed models for memory and learning. It is the case, however,


that only very little is known about both problems. The question that must be addressed is how an unlikely string can be produced by a Markov model and indirectly perhaps, how a learning mechanism would then operate. One possible approach to this problem from the point of view of a competence model, is to posit an ideal observer, who constructs the state diagram for all possible states of all possible orders, assigning probabilities. This is essentially the viewpoint that Hockett (1961b: 234) proposes and to which we will return briefly below. For the moment, note merely that this approach ignores the comment in XVI and makes inapplicable the objections in XIV and XV, since no text is being processed. Chomsky is certainly right in saying that the amount of text which would need to be processed for empirically determining the transition frequencies would be staggering, once k becomes at all large, but this is, from the point of view of competence models, irrelevant. We can either picture the omniscient observer mentioned above or a machine which instantaneously processes all speech and writing and continuously recomputes the transition probabilities. The objection that certain sequences will never be produced is easily overcome by picturing a device for which the transition probability from one state to any other is never zero; it is only very small, i.e. most of the probability is accounted for by relatively few transitions, but the sum of these is less than 1.0, and the difference from 1.0 is distributed among all other states. In fact, we could posit such a device, with a large but finite memory, as a performance model (neglecting all other evidence for or against it). In addition to what has been described, we need some sort of universal state, perhaps, with also a small transition probability to and from any other state, to account for the ability to recognize a new word. Let us consider this model somewhat more fully. 
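
The device just described, in which no transition probability is ever zero, can be sketched in a few lines. This is only an illustration under stated assumptions and not the procedure used in this study: the model is first order (one word of context), the names `transition_probs` and `string_probability` are hypothetical, and the reserved share of 0.01 is an arbitrary stand-in for "the difference from 1.0". Observed transitions share most of the probability and the remainder is spread evenly over all other states.

```python
from collections import Counter, defaultdict


def transition_probs(tokens, vocabulary, reserve=0.01):
    """First-order transition table in which no entry is zero.

    Observed successors of a state share 1 - reserve of the probability;
    the reserved remainder is split evenly over the unobserved successors.
    `reserve` is an arbitrary illustrative value, not an empirical one.
    """
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1

    vocab = sorted(set(vocabulary) | set(tokens))
    table = {}
    for state in vocab:
        seen = counts[state]
        total = sum(seen.values())
        unseen = [w for w in vocab if w not in seen]
        row = {}
        if total == 0:
            # A state never observed with a successor: treat all as equally likely.
            for w in vocab:
                row[w] = 1.0 / len(vocab)
        else:
            observed_mass = 1.0 - reserve if unseen else 1.0
            for w, c in seen.items():
                row[w] = observed_mass * c / total
            for w in unseen:
                row[w] = reserve / len(unseen)
        table[state] = row
    return table


def string_probability(table, tokens):
    """Product of the transition probabilities along a string."""
    p = 1.0
    for prev, nxt in zip(tokens, tokens[1:]):
        p *= table[prev][nxt]
    return p
```

Trained on the single string he would like to have it, such a table assigns the scrambled string like have to he it would a probability that is very small but not zero, which is the property the argument requires.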
Hockett (1961b) proposes a double ordering of the states, first by length and second by probability. The difficulty with this model is that it contains zero entries for transitions and therefore would not allow the production of nonsense. I have proposed a minor


modification, so that the states applicable to colorless green ideas sleep furiously and the like, while they have extremely small probabilities, have larger probabilities than furiously sleep ideas green colorless. This is done not by some empirical derivation of transition probabilities but by the aforementioned ideal observer who produces the matrix of transitions. (It is possible to conceive of an infinite processor deriving the transition matrix from an infinite text, but the difficulty is that we still need an infinite observer to insure that the text is ideal as well as infinite.) If we add to the infinite corpus the other necessity of Chomsky, i.e. a non-finite dependency m, it is easy to see what happens to a k-limited automaton and equivalently to a Markov-source performance model. For strings of length greater than k, there will be dependencies also greater than k of which the model is unaware. Suppose there is a dependency of length k + 1. This is exhibited in the original infinite matrix by local maxima, relative to the other states, for those states to which transition is normally made. The k-limited automaton does not contain the rows and columns which exhibit the local maxima of transitions for this dependency. Therefore, in the output of this automaton, it would be purely fortuitous if the strings of length m happened to exhibit the same dependency. If we had allowed for zero transitions in the model, a matrix built from the output of the k-limited automaton would show non-zero entries for the length m transitions. The reason for not having zero transitions is to deal with examples of the I saw a fragile whale / I saw a fragile of type (Chomsky, 1957: 16), where, as Hill (1961: 7) points out, the second sentence is interpreted as containing an unfamiliar noun.
In this way, we can say that the difficulty with a k-limited Markov model is simply that it generates some strings less often than it should, and spreads this difference arbitrarily over all strings. In addition, the explanation meets a requirement of 'explanatory adequacy' in that certain language phenomena can be attributed to general properties of the model, (i.e., the model 'explains' the phenomena). These include certain classes of hesitation and perceptual phenomena. However, the model is clearly not


descriptively adequate in Chomsky's sense, since the structural descriptions it provides are very different from those he considers correct. I have noted, however, that the notion of descriptive adequacy appears to me to be the weakest of the three, since it is the least subject to direct checking. Therefore, I do not think that the model can be dismissed on this basis, and because of its possible high degree of coverage, deserves further consideration. Note also, that if the model is successful for low orders, this fact in itself requires explanation by a competing model, particularly in view of Chomsky's assertion that there is no significant correlation. In order to account for language learning, one might consider the learning process as one of continuously modifying a growing state transition matrix, with no state having a completely zero transition to any other state. It will be argued that this model does not account for the speaker's apparent ability to recognize categories like 'noun phrase', but one should be reminded again that no claim is made for exclusivity of this model. That is, learning of the transition matrix might well go on in parallel with some type of structural learning. Before leaving this discussion of theory, one other point should be made, that these Markov models have nothing whatever to do with the semantic models discussed by Wells (1961). As many others have pointed out, it is unfortunate that Americans, unlike the British, did not happen to adopt the term 'communication theory'. These models do not really have anything to do with communication theory either, except insofar as they have been stimulated by the notions of communication theory. That is, if we consider communication theory as a branch of abstract mathematics then we are trying to see if language can be interpreted in terms of the axioms and theorems of communication theory. 
In the next chapter, we will see to what extent communication theory has done the latter and with what result.

III

PREVIOUS EXPERIMENTAL RESULTS AND THEIR RELATION TO LINGUISTIC THEORY

Although I have tried to show in the preceding chapter that the theoretical arguments raised against the Markov model are, at a minimum, not so strong as to be overwhelming, since they depend in part on how the field of linguistics is delimited, it might still be the case that experimental evidence so strongly discredits one theory or supports another as to make it unreasonable to be concerned with one of them. In this chapter, then, some of the relevant experiments will be examined. Very few of these have been done by linguists or have appeared in linguistic journals, the whole field of linguistic experiment appearing to have passed by default to psychologists and engineers. Since the literature is quite extensive, we will be concerned primarily with these three areas: hesitation studies designed to test the predictions made on the basis of information theory, experiments utilizing probabilistic models or data, including those using statistical approximation data, and perceptual experiments, mostly designed to show effects predicted by transformational theory. It should be stressed again that the point at issue is not a just weighing of the evidence, but rather a demonstration that finite state or Markov models are not discredited by the result of some experiment or set of experiments. It is largely for this reason that the final set of perceptual experiments has been included for comment.

PREVIOUS EXPERIMENTAL RESULTS


HESITATION STUDIES

The hesitation papers discussed consist of a set of suggestions to be tested, followed by a set of experiments which refer to each other, in such a way as to provide almost a classic example of the interaction of theory and experiment. That is, the theory leads to a prediction and an experiment designed to test that prediction leads to a result in accord with it. The suggestions referred to above are contained in Lounsbury (1954), although the first set of experiments described does not refer to them, so they might well have been initiated elsewhere and independently. The two relevant hypotheses proposed for test are:

I (1) Hesitation pauses correspond to the points of highest statistical uncertainty in the sequence of units of any given order. (Lounsbury, 1954: 99)

II (3) Hesitation pauses and points of high statistical uncertainty frequently do not fall at the points where immediate constituent analysis would establish boundaries between higher-order linguistic units or where syntactic junctures or 'facultative pauses' occur. (Lounsbury, 1954: 100)
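
One way to make 'points of highest statistical uncertainty' concrete is sketched below. It rests on assumptions that are not Lounsbury's: uncertainty is taken to be the Shannon entropy, in bits, of the distribution of words observed to follow a given word, that distribution is estimated from simple successor counts, and the function names are hypothetical. Hypothesis I would then predict that pauses cluster after the words with the highest values.

```python
import math
from collections import Counter, defaultdict


def successor_counts(tokens):
    """Count, for each word, the words observed immediately after it."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def uncertainty_bits(counter):
    """Shannon entropy in bits of the successor distribution in `counter`."""
    total = sum(counter.values())
    return sum((c / total) * math.log2(total / c) for c in counter.values())


def uncertainty_profile(tokens, counts):
    """Pair each word with the uncertainty about its successor.

    Words with no recorded successor get None rather than a number.
    """
    return [
        (word, uncertainty_bits(counts[word]) if counts.get(word) else None)
        for word in tokens
    ]
```

With a toy sample such as the dog saw the cat and the man, the word the is followed by three different words and so carries the highest uncertainty, while dog is followed only by saw and carries none.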

Of the references, only Maclay and Osgood (1959) has any evidence with respect to the second hypothesis (Lounsbury's third; the other three, not given here, seem to be untestable at least at present). The three investigators, although all driving toward the same end, approached the problem somewhat differently. Briefly, the steps in a study to check the first hypothesis are to select or create a set of data, to mark and classify the pausal types to be studied, to arrive at information measurements for the items after pauses and for other items, where the other items might also be more finely distinguished, and to compare these information measures. All of the studies used speech material acquired for purposes other than this study of hesitations. The primary difference in the studies in this respect is in size, Goldman-Eisler (1958a) and Tannenbaum, Williams, and Hillier (1965) being roughly comparable and based on a few hundred words, whereas Maclay


and Osgood (1959) was based on over 50,000 words. Tannenbaum, Williams, and Hillier borrowed the method of counting and classifying pauses from Maclay and Osgood. In essence they both used a subjective judgment with cross-checking of judges, so that the location of a pause and its classification was attested by at least two judges. This procedure tends to compensate for subjective variability but, of course, leaves open the possibility of missing occurrences of real pauses. Goldman-Eisler (1958a), used a mechanical method of counting by insisting on a silence period of .25 seconds and treating any occurrence of such a silence as a pause. This mechanical method can, of course, also err in both ways. The human judgment method of counting also allows the consideration of false starts and of filled pauses, which do not appear as silence, and concurrently the classification of pause types. Maclay and Osgood (1959) and Tannenbaum, Williams and Hillier (1965) distinguish among types of pauses, and are able to say some interesting things relative to specific types. The methods for arriving at information measures were different for all three studies. Goldman-Eisler (1958a) used the technique first suggested by Shannon (1949), of asking subjects to guess the next word of a sentence, given all of the preceding words, for all of the words in her 212 word text. She then repeated the procedure except that subjects guessed from the other end of the sentence, toward the beginning. The transition probabilities were taken to be the ratio of correct guesses to total guesses for each word guessed. At one point in the analysis, the forward and reverse transitions are averaged to arrive at a final transition ratio. 
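
The guessing estimate described above, the ratio of correct guesses to total guesses for each word with the forward and reverse figures averaged, can be written out as a short sketch. The function names and the data layout (one list of guesses per word, in text order) are assumptions made for illustration, not details reported by Goldman-Eisler.

```python
def guess_ratio(guesses, target):
    """Proportion of guesses that match the word actually used."""
    if not guesses:
        raise ValueError("at least one guess is required")
    return sum(g == target for g in guesses) / len(guesses)


def averaged_transitions(words, forward_guesses, backward_guesses):
    """Mean of the forward and backward guess ratios for each word.

    forward_guesses[i] and backward_guesses[i] hold the guesses made for
    words[i] when subjects worked from the start or the end of the sentence.
    """
    return [
        (guess_ratio(fwd, word) + guess_ratio(bwd, word)) / 2
        for word, fwd, bwd in zip(words, forward_guesses, backward_guesses)
    ]
```

For an invented example, a word guessed correctly by three of four subjects reading forward and by both of two subjects reading backward would receive (0.75 + 1.0) / 2 = 0.875.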
The most recent study, Tannenbaum, et al., uses the Cloze procedure of Taylor (1953) to derive estimates of the transition probability, using in one case the total text except for those words whose transition probabilities were to be estimated as the deletion pattern, and in a second case every fifth word deletion. (It is not clear that Cloze procedure and backward transitions are a theoretically satisfactory way of proceeding; see below p. 48.) Maclay and Osgood (1959) do not directly estimate transition probabilities at all, but instead map the words of the sample into the word

classes of Fries (1952). In a sense, since the three studies use different approaches toward the estimate of word transitions, agreement among the studies is stronger evidence for the hypothesis than if all three studies had used the same way of estimating these probabilities. All three studies compare the estimates of transition probabilities after hesitation with the transition probabilities elsewhere and conclude that the word following a hesitation is more difficult to predict than other words, on the average. The argument for this conclusion in Maclay and Osgood (1959), which did not estimate probabilities, is that the lexical classes (nouns, verbs, adjectives, and adverbs), members of which occur more often after pause than the function classes, are open classes with a greater membership than function classes, and therefore the average transition probability to a member of one of the lexical classes must be smaller than that of the function words. Where they are comparable, then, all studies show the same result. In detail, however, they differ slightly. The data of Tannenbaum, Williams and Hillier (1965) indicate that if all types of pauses are lumped together, the word preceding hesitation is as hard to predict as the word after, and in both cases harder than other words. The study indicates, however, that the grouping of hesitation types may be wrong, in that it confounds different phenomena. Filled pauses (i.e. uh and the like) show the word following a pause as hard to predict, and repeats show the word previous as hard to predict. Goldman-Eisler (1958b) shows that the number of pauses tends to decline with repetition of the same verbal task (in this case, discourse concerning a particular cartoon), which tends also to support the view that pauses are coincident with uncertainty, since repetition of the same material should reduce uncertainty. Lounsbury's third hypothesis was tested only by Maclay and Osgood (1959).
Their data shows that approximately half of all pauses, filled and unfilled, occur within rather than at phrasal boundaries (Maclay and Osgood, 1959: 33). This inference is perhaps not really justified, since their data indicates that retraced

false starts typically seem to begin at function words (Maclay and Osgood, 1959: 30), which means that at least this class may indeed correlate with syntactic boundaries. With the possible exception noted in the previous paragraph then, these studies, at least some of which were generated by predictions made from theoretical conditions in a Markov model, do support the model in that the predictions of the model appear to be confirmed. One should ask if perhaps the same empirical result could have been predicted from an alternative theory. The question appears not to have been asked by the transformational grammarians, apparently because these phenomena have been attributed to performance rather than to competence. There is a question, of course, about how regular a phenomenon must be before one wishes to attribute it to competence. In any event, the hesitation studies are positive empirical evidence in favor of a Markov model. In fact, unless the data are otherwise explained they are rather impressive evidence, since phenomena of this type have not normally been considered by linguists, most of whom have apparently agreed with Chomsky that hesitations are part of parole rather than langue. Although the suggestion has apparently not been followed up, Hockett (1961a) has pointed out that the neglect by linguists of similar matters was really arbitrary. (He referred specifically to blends, e.g. shell for shout/yell, but the cases are very alike.)

FREQUENCY STUDIES

The papers discussed in this section are not nearly so homogeneous a group as those of the preceding section. They are, however, all concerned with some aspect of information, redundancy, or frequency measurement. Many use order-of-approximation data. What will be emphasized from this collection is their relevance to the Markov model of language, either with respect to their methods or to their findings. The first of this group of references, Miller and Selfridge, (1950), describes one of the early experiments based on the Markov

model. Its finding was that ease of recall was a monotonically increasing function of order of approximation to English. The importance of this for psychologists is put thus: The experiment shows, therefore, that the problem of meaning vs. nonsense in verbal learning need not be approached in terms of a qualitative dichotomy but can be studied as a functional relation between quantitative variables... By shifting the problem from 'meaning' to 'degree of contextual constraint' the whole area is reopened to experimental investigation. (Miller and Selfridge, 1950: 183)

One result of this study is quoted again and again in other papers because it is so surprising: ...note that a fourth or fifth order passage is remembered as accurately as a meaningful textual passage. (Miller, 1950: 183)

This result will be returned to below. The work relating to degree of contextual constraint is directly relevant to our present problem, since, in a sense, it is equivalent to the m-dependency discussed above (chapter II, p. 38). While no experiment on finite corpora could establish the reality of a finite m, a finding that constraint rapidly approaches an asymptote would imply that a Markov model with finite m could APPROXIMATE English. Unfortunately, these studies suffer, for this purpose, from a variety of failings. Consider first the statement of Miller, quoted above, which is surprising indeed. Experiments were, of course, made to verify the result, and to better characterize the material. Salzinger, Portnoy, and Feldman (1962) rated each passage by the Cloze procedure (Taylor, 1953), and found that text was rated approximately equal to order five, and lower than orders six and seven. One additional fact is that text gives rise to fewer correct words than order 7. This indicates that it was misplaced on the order continuum. (Salzinger, Portnoy, and Feldman, 1962: 54)

This study was based on exactly the same material as Miller-Selfridge (1950). A similar study appeared in 1965, using materials different from the preceding but constructed in the same way (Treisman, 1965: 126). In this experiment, however, the prose passages were not equivalent to such low orders of approximation.

A passage from Conrad fell between the 8th and 16th order approximations and a passage of easy prose was estimated to be the equivalent of a 45th order approximation (Treisman, 1965: 127). While not as strange as the Miller-Selfridge findings, the position of the Conrad passage is still anomalous. Treisman suggests: ...the method of constructing the passages, taken directly from Miller and Selfridge (1950), distorts the hypothesis of increasing approximation to normal English by omitting punctuation marks. This means that in the higher order passages, the grammatical structure becomes unnaturally complex because no sentence can ever be complete. (Treisman, 1965: 126)

Although the paper appears not to be well known, part of the reason for the Miller-Selfridge results was already given in Coleman (1963). He asked a group of linguists to rank samples of orders of approximation, which had appeared in the literature, with respect to grammaticalness. When he plotted the mean ranks against order of approximation, he found that this curve, too, leveled off at approximately order five. Coleman pointed out that the constructor of a sequence: ...when he was faced with a long sequence... sometimes... had to resort to a complex and farfetched sentence to incorporate it. The effect was multiplied because he was not allowed to use punctuation. (Coleman, 1963: 241)

Coleman attempted to normalize the passages by accounting for their complexity in terms of uncommon words, mean number of morphemes, and syllables. When this was done, the text samples lay beyond the order of approximations and the recall continued to increase rather than flatten or decline. Coleman's finding that the generated samples were confounding two effects, order of approximation and complexity, makes conclusions drawn from studies ignoring this effect somewhat suspect and perhaps wrong. Text is indeed easier to learn than order seven approximations and longer constraints apparently are operative in learning. An oddity like the quote above from Salzinger, et al., that text is misplaced on the order continuum,

should perhaps have been a sufficient indication that the materials were inappropriate. Although Coleman attempted to compensate for the failings in the standard methods of constructing orders of approximation, his methods are somewhat arbitrary, and introduce additional variables into studies already plagued by far more variables than can be dealt with. It is because of Coleman's result that it was suggested earlier that one use for the order-of-approximation data, gathered in the way proposed below, was in psychological studies. Order-of-approximation data gathered from text, providing the results were reliable, would not be subject to the failings noted by Coleman.¹ Most of the studies discussed so far have been concerned with writing or with learning, yet the essence of language is speech. There have been a number of studies concerning the relation between frequency and intelligibility. Stowe and Harris (1963) demonstrate that context does affect intelligibility. The study operated with the last word in a collection of 11-word sentences. This length allowed up to 10 words of preceding context. The context was presented visually for clarity and the test word presented orally with varying signal-to-noise ratios. The number of correct words increased with increasing context at a fixed S/N ratio (Stowe and Harris, 1963: 641-642). Savin (1963) is concerned with the relation between absolute frequency and intelligibility for words spoken in isolation. Several studies had shown that frequent words had lower intelligibility thresholds than infrequent ones (Savin, 1963: 200). This study showed that the relationship is not a simple one. If the stimulus is an uncommon word, and there are common English words that are phonetically quite similar to it, then these common words

¹ One other point about Coleman's paper should be noted. After he had graphed the linguists' ratings against order of approximation, he noted, with reference to the Chomsky passage about the relation between order of approximation and grammaticalness: "This curve contradicts a statement by Chomsky, and may be of some incidental interest to linguists." (Coleman, 1963: 241). If I have been mistaken in my interpretation of Chomsky, the mistake is apparently not unique.

will usually be given as (incorrect) responses except at quite high speech-to-noise ratios. ...when a word is not confused with another in this way, its threshold is relatively low regardless of its frequency of occurrence in English usage. (Savin, 1963: 200)

These results make uncritical acceptance of the notion that frequency is simply and directly related to intelligibility decidedly risky. However, the idea that there is a frequency effect is not really weakened by this study, since where there was possibility of confusion the word given was higher in frequency than the masked stimulus. Pollack (1964), using a controlled vocabulary, was able to show that words of higher probability are more intelligible, even independent of a possible response bias toward highly frequent words. As was mentioned above, the important thing about these studies is that frequency effects and context effects can be experimentally demonstrated with spoken as well as written materials. Although the results appear to belabor the obvious, it is still well to have this experimental verification. Lieberman (1963) and Fillenbaum, Jones and Rappoport (1963), are included with this group because Lieberman (1963) is an attempt to show that a Markov model must be wrong since better identification of missing words is accomplished when two-way rather than only left context is available, and Fillenbaum, Jones, and Rappoport (1963) use the Cloze procedure to say something about predictability of words and form classes, Cloze also being a procedure that involves guessing of missing material given a bilateral context. Lieberman's results indicate that indeed it is possible to guess missing words better given the total context rather than only the left context of a Markov model. However, this seems again to be a case of confusion between competence and performance, to use Chomsky's terms, and between facts of language and facts belonging to psychology. What Lieberman has shown is that speakers are able to use bilateral context, just as they are able to multiply. He has not shown that in the course of normal speech production and reception that they DO use a bilateral context. He shows intelligibility is roughly related to redundancy, where

redundancy is measured by percent of correct guesses, but the relationship could hold equally well for left context only as far as can be determined by his data. Because there is no obvious way to decide what is being used as a normal habit and what specially in response to a particular task, data such as that contained in Brenner and Feldstein (1965) must be used cautiously. In this study, it is shown that rate of deletion is correlated with percent of completion, that form class is easier to predict than lexical item, that lexical items from some form classes are easier to predict than others, and similar data. However, in view of the use of bilateral context, the results are not very illuminating with respect to Markov models. This is unfortunately true also of a number of other psychological studies. The impression left by this group of papers is that in general the results contained in them are suspect because of faulty assumptions concerning materials or are accurate but almost obvious, or are unusable because of failure to separate as clearly as possible effects which must be attributed to language from effects which can be attributed elsewhere. In any case, there does not seem to be, in this body of evidence, any overwhelming reason for rejecting a Markov model on empirical grounds.

EXPERIMENTS WITH THE TRANSFORMATIONAL MODEL

The third class of experimental results differs somewhat in kind from the previous two. In principle, they were not designed with the Markov model in mind, but were supposed to provide evidence relevant to a transformational model of language, and in particular of English. They require some discussion here, since if they provided convincing evidence favoring a transformational model, this would decrease the possible relevance of a Markov model. The experiments reported in Epstein (1961) and Epstein (1962) are among the earliest designed with a transformational or at least non-Markov model in mind. The studies show that a string of nonsense syllables, augmented by the and a, and some having

past tense and plural markers added, is easier to learn if the string is in sentence order than if it is not, and that the same is true for a string of English words, even if the sentence-like list is meaningless. However, the differences between comparable categories is quite small (the ranges of mean, plus or minus one standard deviation, overlap) so that an alternative explanation in terms of a Markov model need only account for small effects (Epstein, 1961: 82). The strings of English words are not so implausible that the possibility of short range transitions giving rise to the effect can be discounted. The strings of nonsense syllables are truly nonsense, but the pseudosentences formed with them begin with strings very commonly used to begin sentences, viz., BLANK The and BLANK A, and end with a period. Since the differences in learning are small, the possibility that these short range dependencies are responsible for the effect cannot be discounted. It is unfortunate that the structured strings of nonsense syllables were not repeated with only the omission of capitalization and a final period, so that one could determine if these graphic devices have any effect. (Even if there were a difference, it would not affect the point at issue; it would only eliminate another variable.) As a matter of fact, the results of these studies might almost be taken as favorable to a Markov hypothesis, since one might have expected much larger effects than those observed, on the basis of a transformational model. Miller (1962a) and Miller (1962b) both discuss some work showing that a sentence in forward order is more intelligible than one in reverse order, even when the material making up both strings is overlearned. However, as Miller points out, The experiment I have just described argues for the existence of perceptual units larger than a single word. It does not, however, argue in favor of any particular type of structure underlying those larger units. 
That is, it does not show that some form of grammatical structure must be preferred to, say, a Markovian structure that communication theorists talk about. (Miller, 1962b: 754; Miller and Isard, 1963: 224) Miller would like to find evidence for the transformational structure elsewhere, partly in studies of self-embedding, and

partly in finding empirical evidence for transformational operations, in terms of processing times involved or errors made in learning. Miller's study on self-embedding (Miller and Isard, 1964), begins with comments which establish his theoretical position. The fact that an indefinite number of self-embeddings is grammatically acceptable, yet at the same time psychologically unacceptable, would seem to imply that a clear distinction is necessary between our theory of language and our theory of the language user... Let us say simply that the distinction between knowing a rule and obeying it seems to us both valuable and necessary. (Miller and Isard, 1964: 295) The result of his experiment is that sentences with no or one self-embedding are repeated almost equally well on early trials, that two or three embeddings are almost equivalent on the first two trials, but then two embeddings are more easily recalled on later trials, and that while sentences with three embeddings are recalled a little better than those with four on early trials, on later trials they are almost equivalent. Miller suggests that this effect may be due to our capacity to reenter particular processing routines without destroying the partial results of previous entries, by analogy with the idea of re-entrant subroutines in computer programs. Again, I do not find it impossible to explain these results by appeal to the notion of m-dependency; the non-embedded strings satisfy long-range dependencies in English, and the highly embedded strings only short range dependencies. The fact that there are short range dependencies makes these strings easier to recall than random arrangements of words. What I find somewhat puzzling is why the sentence with four self-embeddings is still considered to be perfectly grammatical.
It would be extremely interesting to know what Miller (or Chomsky) would take to be empirical evidence AGAINST a theory including infinite embedding, as well as what evidence they are willing to consider to be in favor of it. Miller (1962b) reports an experiment whose intent was to measure differential reaction time in going from one type of sentence, e.g. active and passive, to another. There are a number of alternative explanations that could be offered for the experimental results but since Miller is not willing to draw firm conclusions from them, it is not worth going into detail. Possible correlations with the times observed could also be established with length and probably with the frequency of the sentence type. It is of some interest to note, however, that Miller finds the active to passive transformation taking longer than the affirmative to negative transformation, whereas Gough (1965: 109) found just the opposite in an experiment calling for verification rather than matching. It is true that these are different operations, but the results cannot be said to reinforce each other. Mehler (1963) reports an experiment on recall, partially reported by Miller (1962b), which shows that subjects recall sentence content while confusing sentence form. That is, a stimulus sentence given in the passive might be recalled as the equivalent active sentence. Miller and Mehler take this as evidence that a sentence is stored as a kernel plus a transformation tag, and that the tag may be lost independently of the kernel. Again, several alternative explanations can be offered. One possibility is that if only a few words or the meaning of the structure are remembered, the subject might simply produce what he considers the most likely sentence type. In the absence of other evidence, this might be the most common type, the affirmative (equal to the Miller-Mehler kernel). A study supporting this explanation is reported in Mandler and Mandler (1964), which showed that the words recalled on the first trial at repeating a sentence tended to be content words. (This same study is also favorable to the Markov hypothesis in a different way; words near the end of a sentence tend to be recalled better on the first trial than those near the beginning.) Lane and Schneider (1963) also tend to support the frequency view.
They varied the proportion of each sentence type in a particular corpus in a systematic way, and had each corpus read by a different speaker. Subjects were asked, after listening to tapes of the speakers, to assign sentences to speakers on a likelihood-of-production basis. When a particular syntactic structure predominated in a speaker's corpus, the listeners tended to assign all the sentences of the dominant

form exclusively to that speaker. When the relative frequencies of the syntactic structures were more equal, the listeners tended to assign structures at random. Most of the confusions... were associated with the declarative and passive sentence structures. (Lane and Schneider, 1963: 461) The declarative and passive structures are apparently also the most common sentence types. Additional evidence supporting a frequency explanation might be found in the Mehler study. He reports that both active and passive questions are more likely than non-questions to be recalled as negative (Mehler, 1963: 350). Mehler suggests that this may be due to semantic equivalence between negative and positive questions. It is true that both Has the boy hit the ball? and Hasn't the boy hit the ball? can receive the same answers, Yes, he has. or No, he hasn't, but I am dubious about their semantic equivalence. It may be that the negative question, at least, expects a negative answer. It would be most interesting to have frequency data on these sentence types. The study by Gough, referred to earlier, points out, with respect to the Mehler study (it applies to the others also), that transformation, length, and frequency are all confounded in the experimental design, and that the results of the experiments can be correlated with any of these variables. It is perhaps worth pointing out that the transformational theory used by all of the investigations reported above is somewhat at variance with current theory. Thus, Katz and Postal (1964: 72ff) consider the negative and the passive as being specified in the underlying P-Marker (although these later undergo obligatory transformations). If this theory is correct, observed differences in reaction times, etc., need not stem from simple transformational differences, but could stem from differences in length, frequency, etc. Chomsky (1965) presents an even more divergent conception of the nature of transformational grammar.
No matter what the status of the current theory, it should be clear that the results of the set of experiments reported above are not clear-cut evidence for the essential rightness of transformational theory. In some cases, in fact, the results are better

explained by theories which account for frequencies. It does not appear to me to be the case that the evidence from these studies is accumulative, so that taken as a whole, they can be considered solid evidence for the kind of transformational theory assumed.

OTHER STUDIES

The last two papers to be discussed, Somers (1961) and Jaffe, Cassotta, and Feldstein (1964), are included here because they both illustrate that not all use of or experiment with information theory is relevant to the point being considered. Somers (1961) includes information-amount calculations of Bible texts, in order to use the numerical results given by the formula to characterize the texts. That is, the calculations produce numbers which allow texts belonging to two different authors to be separated, while texts of the same author are not distinguished. For this purpose, it does not matter what labels are attached to the numbers, and ENTROPY and REDUNDANCY are as good or as bad as any others. The only interest for present purposes might be that the measured constraint of word x on word x + n seems to be constant for n > 7 (Somers, 1961: 154). Actually, Somers has measured part of speech rather than word constraint, so that it is difficult to relate this figure to any other measures of constraint except that it does not contradict any of them. These figures are at least not open to question on procedural grounds, since they are tabulated from text. The main reason for caution is that the sample size is small, being only 1000 words. The second paper illustrates a different kind of irrelevancy. This study treated the speech signal as a string of on-off conditions each 30 msec long. Under these conditions, the sequence of on-off periods can be treated as a first order Markov process, that is, an event i depends only on event i - 1. The transition matrix derived from this assumption was extrapolated to the sixth order and compared with empirically derived sixth order data. The comparison showed a good fit between the prediction and the data (Jaffe, Cassotta, and Feldstein, 1964: 885).
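The kind of comparison attributed to Jaffe, Cassotta, and Feldstein can be sketched as follows. This is an illustration under stated assumptions and not a reconstruction of their procedure: it takes "extrapolated to the sixth order" to mean predicting the probability of a run of six on-off events from first-order transitions alone, and then comparing that prediction with the relative frequency actually observed. The function names and the toy sequence are hypothetical.

```python
from collections import Counter


def first_order_matrix(seq):
    """Estimate P(next | current) for a 0/1 sequence by counting adjacent pairs."""
    pair_counts = Counter(zip(seq, seq[1:]))
    matrix = {}
    for cur in (0, 1):
        total = pair_counts[(cur, 0)] + pair_counts[(cur, 1)]
        matrix[cur] = {
            nxt: (pair_counts[(cur, nxt)] / total if total else 0.0)
            for nxt in (0, 1)
        }
    return matrix


def state_frequencies(seq):
    """Relative frequency of each state, used as the starting probability."""
    counts = Counter(seq)
    return {state: counts[state] / len(seq) for state in (0, 1)}


def predicted_probability(pattern, matrix, start_prob):
    """Probability of a pattern if each event depends only on the one before it."""
    p = start_prob[pattern[0]]
    for cur, nxt in zip(pattern, pattern[1:]):
        p *= matrix[cur][nxt]
    return p


def observed_probability(pattern, seq):
    """Relative frequency of the pattern among all windows of the same length."""
    n = len(pattern)
    windows = [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]
    if not windows:
        return 0.0
    return windows.count(tuple(pattern)) / len(windows)


# Hypothetical on-off record: 1 = sound present, 0 = silence, one value per interval.
toy = [0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0] * 4
matrix = first_order_matrix(toy)
start = state_frequencies(toy)
pattern = (0, 0, 1, 1, 1, 0)
print(predicted_probability(pattern, matrix, start))
print(observed_probability(pattern, toy))
```

A close match between the two printed values for many patterns would be the sort of "good fit" the study reports; the sketch says nothing about whether 30 msec intervals are a linguistically meaningful unit.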

Although the evidence from this study might seem to be exactly the kind desired, there is a difficulty which can be illustrated by an analogy. If someone studies the properties of salt, he might look at smaller and smaller quantities, from a large crystal to a fine powder. If, however, he subjects a melted sample to electrolysis, he will very soon have chlorine gas and liquid sodium. In other words, the division into smaller units has a lower limit. One suspects that that limit was exceeded with respect to speech in this study. Additionally, it is known that the speech signal includes a frequency component, which is neglected here. Therefore, it seems unlikely that any conclusions relevant to the present problem could be deduced from this kind of study.

CONCLUSIONS

I have tried to show above that there is no compelling reason, on the basis of empirical evidence, to reject the Markov hypothesis or to accept the transformational hypothesis. Nearly all of the evidence is open to varying interpretations, and part of it is of little value because of methodological problems. Thus, evidence about constraint which might provide evidence about transition probabilities is not acceptable because of the use of measures of bilateral constraint. The effect of posited transformations is very difficult to filter from the effects of length and frequency, as well as being based on a model which may not be acceptable generally. In addition, it is very difficult to separate linguistic facts from extraneous information about learning ability, intelligence, or information processing capability, even if it were possible to agree on what is to be called a linguistic fact. On the other hand, the work described here is interesting, because it is experimental. In general, linguists have not been given to controlled experiments, except as a check on the acceptability of particular new formations to informants. The test of utility of any theory, however, ultimately rests on the data, so that it would seem necessary to be concerned also with data organization

and presentation. The applicability of theoretical constructs to practical problems is not always easy to see. The experiments described above contain some ideas which seem worth pursuing, and indicate some pitfalls it would be well to avoid.

IV

THE RATIONALE BEHIND THE DESIGN OF THE NEW EXPERIMENT

I have tried to show in the two preceding chapters what the theoretical developments and experimental results concerning Markov models are. Chomsky has shown by theoretical argument that if an unlimited series of self-embeddings is permissible in a system of rules, then the language defined by those rules cannot be perfectly generated by a Markov model. He has asserted that sentences exhibiting unlimited self-embedding are 'perfectly grammatical' sentences in English. I have contended that this assertion cannot be effectively tested, since sophisticated subjects presumably can be primed to respond either way to such sentences, and naive subjects, if any can be found, probably cannot perform the task at all. One can conceive of training a naive subject with adequate mental ability to analyze his own speech. Once such a subject is trained, one can speculate as to his probable performance. Unfortunately, we cannot answer this question completely. There are at least two possibilities. If the subject trains himself to categorize utterances, then presumably the sentence with many self-embeddings will not belong to a category, and the assumption of perfect grammaticality is not valid. If, however, he trains himself to formulate rules, he may very well be faced with the same decision mentioned earlier, that of establishing a fixed upper limit on the number of self-embeddings. If this is what happens, he too could be primed to respond either way. Therefore, it seems to be impossible to formulate a critical experiment which would experimentally demonstrate truth or

falsity for a Markov hypothesis. Above, I have tried to show that other experiments appear to be inconclusive, since they are open to a variety of possible explanations. It seems, then, that we have no conclusive basis for rejecting a Markov model for language. Of course, even if Chomsky's argument is rejected as a proof for the reasons I have outlined, the argument itself still remains. That is, even if we may not all have the same intuition to explain, there is at least some overlap or we could not agree on a parse of any sentence. Similarly, if relations between sentences, such as the active and passive, are felt to be grammatical relations, the Markov model cannot describe this relationship. In this sense, the Markov model cannot be considered a serious rival to any of the broader structural models. However, as I have mentioned above in chapter I and also in chapter II, there is at present no structural model for any language which can claim completeness, even on its own terms. There is also none which provides an explanation for the systematic behavior in hesitation and pause filling, though this is not to say that none of the other models CANNOT provide an explanation. At least temporarily, these phenomena seem to me to be best explained in terms of the Markov model. Besides the theoretical possibilities, there is utilitarian justification for interest in a Markov model. In fact, it is largely utility that provides the motivation for pursuing a Markov model, providing that the model can meet certain conditions. The first of these conditions is that successively higher orders of approximation begin to approach a limit as measured by a grammaticalness ratio, and the second, that this limit be approached rapidly enough so that low orders of approximation provide reasonably grammatical output. 
For example, the output would be usable directly in the kind of psychological experiment described in the previous chapter, being superior for this purpose to the simulations now being used. Application to problems of machine translation is an obvious use, either in the translation itself or applied to the output as an editing tool. Purely linguistic uses suggest themselves also. For example, one expects idiom structures of the type 'immediate constituent' to be characterized, in linguistic texts, by very high forward


transitions from IMMEDIATE to CONSTITUENT, and high backward transitions from CONSTITUENT to IMMEDIATE. In lexical analysis of linguistic texts, this word pair should be treated as a unit and the transition probabilities may help us to isolate such units.

These motivations toward the study of approximations suggest that one might begin with Chomsky's assertion referred to above, that there is no relationship between order of approximation and grammaticalness. One can conceive of testing this as Coleman did, by rating in some way the humanly constructed approximations. These approximations are, however, unsatisfactory in a number of ways as was shown above. It appears that real persons do not well simulate the ideal observer posited in chapter II. This observation suggests an approach to the ideal observer of the kind suggested by Chomsky, viz. the estimation of transition probabilities from a large sample of text. To proceed in this way allows us not only to test Chomsky's assertion, but also to determine whether or not the Markov model can be developed by an effective procedure. Chomsky and Miller suggested that the number of parameters to be estimated was on the order of 10^45, but they did not speculate as to the result if, following Hockett's suggestion, most of these are assumed to be zero, and the non-zero ones only are estimated. The question then arises as to how big a corpus must be tabulated in order to estimate the transition probabilities satisfactorily. If the problem is approached from this direction, there is no way to arrive at a solution. If, however, the problem is approached from the other end by making a decision rule which will indicate whether or not a sample is big enough, the problem is feasible again. The danger in estimating transition probabilities from too small a sample is that for high orders of approximation, the generator will simply recreate the original corpus, since at every transition there will be a unique next state.
Suppose, however, we make a rule that generation of sentences will be done only for orders of approximation in which every state has more than one successor. Now there is essentially no danger of recreating the original text, but on the other hand it is almost certain that a corpus cannot


be found which meets the restriction for interestingly high orders, for this decision rule is surely too harsh. That is, if k is at all large, there are states for which the transitions to all but one state are vanishingly small. Therefore, if the restrictions are relaxed so that it is only required that most transitions not be unique, it is possible to decide whether a sample is sufficiently large for generation of a kth-order approximation without the danger of triviality.

The relationship between material generated mechanically and that generated by human beings is by no means clear. In the first place, a human guesser probably does not limit himself to a k-word left environment, but rather either imagines a sentence in which these k words can be embedded and then responds with the following word of that sentence (structural explanation), or imagines an n + k word state, and the transition to the next state (Markov explanation). The hope is that by using enough different speakers, the effect of various possibilities of embedding will even out. If it does, the speakers should provide a better estimate of the language-wide transitions, since their experience of language is so much broader than that of any presently conceivable machine. However, at least on the evidence of available humanly produced approximations, the production of punctuation in written material is a problem for the human estimators. It is not for the machine, since punctuation is being treated like any other symbol. How one would ask human subjects to produce spoken material, requiring the placement of alternative intonation contours, is even more unclear. Again, one can conceive of a properly designed machine doing this correctly.

The tactic employed in this study for data gathering also deserves comment, since it affects the strings which can be produced and the potential order limitations.
Each sentence (i.e., claim) boundary was treated as a stop signal, so that no strings contain embedded sentence periods. If this tactic were followed generally, it would not be possible to generate orders much higher than twenty words, about the length of an average sentence. The corpus of electronics patents used in this study, as is discussed below, has very long


strings for each sentence, and the danger of a short sentence is very small.

The remainder of this study, then, is the description of a two-stage experiment. In the first (mechano-linguistic) stage, a set of kth-order transition matrices is generated from a corpus of patent claims, and the matrices are evaluated for triviality. In the second (psycho-linguistic) stage, the generated approximations are evaluated for grammaticality. The experiment gives information about the rapidity of convergence of such an empirical model to an infinite, ideal observer model.
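The decision rule discussed in this chapter (generate only at orders for which most transitions are not unique) can be illustrated with a short sketch. This is not the program used in the study; it is a minimal illustration in modern Python. The function names are hypothetical, and the threshold of 0.5 standing in for "most" is an assumption. A kth-order approximation is taken here to use k-word strings, so that a state is a (k-1)-word context.

```python
from collections import defaultdict


def successor_table(tokens, k):
    """Map each (k-1)-word state to the set of words observed after it."""
    table = defaultdict(set)
    for i in range(len(tokens) - k + 1):
        state = tuple(tokens[i:i + k - 1])
        table[state].add(tokens[i + k - 1])
    return table


def unique_transition_share(tokens, k):
    """Fraction of states that have exactly one observed successor."""
    table = successor_table(tokens, k)
    if not table:
        return 1.0  # no evidence at all: treat the sample as trivial
    unique = sum(1 for followers in table.values() if len(followers) == 1)
    return unique / len(table)


def sample_is_adequate(tokens, k, max_unique_share=0.5):
    """Relaxed rule: accept order k only if most states are not unique.

    The 0.5 cut-off is an assumed reading of "most", not a value from the text.
    """
    return unique_transition_share(tokens, k) < max_unique_share
```

With a toy sequence such as `"a b a c a b".split()`, two of the three one-word states have a single successor at k = 2, and every two-word state is unique at k = 3, so a generator run at k = 3 on that sample could only reproduce the sample itself.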

V

DESCRIPTION OF THE EXPERIMENTAL CORPUS

The corpus for the experiment consisted of approximately five and one half million words of the claims section of patents in the electronics field. Generally, a U.S. Patent 1 consists of a GENERAL part describing the prior art, the object of the invention, and its characteristics; a SPECIFIC part giving a detailed description of the invention, examples, and references; and the CLAIMS section in which that which is to be protected is stated in detail. Infringement of a patent essentially means infringement of one or more of the claims. The claims section begins with the words "what we claim is" followed by a numbered list of claims. The first claim is normally the broadest in scope of all the claims. Successive claims limit the device until finally the preferred embodiment is described. Normally, each new claim expands on only one part of an earlier, broader claim, so that there is considerable repetition in many of the patents, successive claims differing only slightly. The number of claims in an electronics patent may range from one claim to over fifty, with most patents nearer the lower end of that range. Appendix I is an example of the claims section of a patent.

LINGUISTIC PROPERTIES

Since each claim is really a predicate complement, and moreover only a noun complement, the linguistic structure on the highest level of this material is not particularly interesting. It could be argued, therefore, that for this reason, the material is a poor choice for the kind of experiment envisaged. If, however, one moves deeper into the structure of these claims, the complexity and consequently the interest continue to grow, as is shown below, and so the material is suitable for the experiment. One possible way to show this is to write a grammar based on the corpus, or to attempt to do so. A modern approach would be to write a transformational grammar. This I shall not do, largely because I cannot use it to illustrate my point. As will be seen, the complexity of the claims arises from the variety of modifier structures, including clauses, which are possible in any claim. In transformational grammars with which I am familiar, clauses are introduced as transformations of independent sentences. I see no particular point, in this study, in producing a set of rules for kernel sentences and appropriate transformations for these claims, and none either in adopting an existing set of rules, which would not explicate the distinguishing features of the claims as opposed to other text. What is wanted is descriptive clarity. For this reason, the rules are presented as essentially constituent structure rules, in such a way that each lower level of nodes is an expansion of the level above. The rules are not written quite as Chomsky-defined phrase structure rules, because I, with Chomsky, find the notion of only binary branching an unsatisfactory way of representing conjunction. In lieu of introducing the conjunctions transformationally, I shall assume an exponent operator T^n, such that n, which can take on integer values, indicates the number of conjoined structures, T. 2 I am aware that there are methodological difficulties in introducing such a device into basically constituent structure grammars, but again, the description below is aimed at clarity.

Presented below, then, is a set of expansion rules with examples, to illustrate the constructions and complexity of the patent text. The rules are not carried out to the level of morphemes or words, because this would essentially involve writing a grammar of English.

1 Cf. Wade (1957) for a relatively simple account of patent operations.
2 This device has been borrowed from tagmemics. See, for example, Longacre (1964: 25).


A complement consists of one or more conjoined noun phrases:

    C -> NP

where NP is of type T, i.e., such that

    T^n -> { T | T_1 + , + T_2 + , ... , + T_(n-1) + (,) + and + T_n }

A noun phrase consists of an optional or mandatory determiner, depending on the noun, a premodifier, a noun, and an optional postmodifier:

    NP -> { (Det) | Det } + Premod + { Nm | Nc } + Postmodifier

where Nm and Nc are of type T.

The premodifier consists of noun structures, verb structures and adjective structures, in any order; or of zero.

    Premod -> { 0 | { NS | VS | AS } + Premod }

where NS, VS, and AS are of type T. 3

The adjective structure consists of an optional adverb structure and an adjective.

    AS -> (AdS) + Aj

where AdS and Aj are of type T.

The adverbial structure consists of an optional intensifier plus an adverb.

    AdS -> (Intens) + Ad

where Ad is of type T. 4

The verb structure consists of optional nominal or adverbial structures plus a past or present participle.

    VS -> ({ NS | AdS }) + Part

The noun structure consists of a premodifier plus a noun.

    NS -> Premod + { Nm | Nc }

3 The rules are already incomplete even at this stage. This rule should be modified to account for punctuation between types of premodifiers. Again, this kind of complexity would only be confusing.
4 There are some selection restrictions here, again unspecified.
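The exponent operator for conjunction can be made concrete with a small sketch. It is an illustration only, written in modern Python; the function name `conjoin` and the `serial_comma` flag (standing for the optional comma written "(,)" in the rule) are assumptions introduced for the example.

```python
def conjoin(items, serial_comma=False):
    """Expand n conjoined structures: T_1, T_2, ..., T_(n-1) (,) and T_n."""
    items = list(items)
    if not items:
        raise ValueError("at least one structure is required")
    if len(items) == 1:
        return items[0]  # n = 1: the structure stands alone
    head = ", ".join(items[:-1])
    joiner = ", and " if serial_comma else " and "
    return head + joiner + items[-1]


# Two conjoined nouns, as in the phrase "said emitter and base electrodes".
print("said " + conjoin(["emitter", "base"]) + " electrodes")
```

For three structures the same function yields either "first, second and third" or, with the optional comma, "first, second, and third".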

Since these rules are recursive, the premodifier could, in this formulation, become indefinitely long. In practice even moderately long examples are easy to construct but difficult to find. Preceding the noun, as shown above, is a sometimes optional determiner. Among these determiners is included the word said. This word is homophonous with a past participle meaning 'mentioned' or 'stated' or the like, but is distinct from it, as the following examples show: with said third resistor, of said resistive network, contacting said body. In these positions, a determiner is appropriate, but not a past participle like mentioned. Examples of the homophonous past participle are somewhat rare: the said third electrode.

The postmodifier structure is more complex and exhibition of expansion rules is more difficult. At the first level, there are postmodifiers consisting of adjectives, prepositional noun phrases, prepositional verb phrases, and clauses.

[Expansion rules for PM, PrepM, and PM1: the layout of these rules is not recoverable from this copy. The alternatives appearing in them are (AS1), PrepM, 0, PM1, Prep + NP, Prep + VS1, and clause.]

    VS1  -> (AdS) + Part + (VMod)
    AS1  -> (AdS) + Aj + (Prep) + NP
    VMod -> { to + Infinitive | PrepM | AdS | NP | clause }

Already in the rule set above, there are some obvious modifications which should be made to account for the difference between


transitive and intransitive verbs, restrictions on clauses, and similar phenomena. The reason for not continuing this is that to go into the internal structure of clauses would require an essentially complete treatment of English sentence structure, which is not intended here. It is now possible to see why the patent claims, in spite of the simple grammatical structure at the highest level, are still a fit vehicle for testing the Markov generation process, viz., they do comprehend nearly all of English syntax. Missing are question structures and fragments, for example, but the part included is sufficiently complex that no published formal grammar with which I am familiar describes it completely and correctly. Following are examples of the structures outlined above:

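[Tree diagrams. The phrases diagrammed include: such magnitude and sign (an NP with the adjective such as premodifier and the conjoined nouns magnitude and sign); said emitter and base electrodes (an NP with the determiner said, the conjoined nouns emitter and base, and the noun electrodes); and a noun phrase headed by means, with postmodifier material including independent of said output circuit, for controlling the transit time of electrical charges through said body, comprising a biasing source, having its poles, and of said first transistor. The remaining node labels and branching are not recoverable from this copy.]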

Tabulations of the relative frequencies of various types of structure will be discussed in chapter VIII in the comparison of the generated material with the text.

STYLE

As well as some grammatical peculiarities which were discussed above, the claims include some peculiarities of style. The word means is sometimes used as an ordinary noun in the sense of 'method' or 'device', i.e., it has a premodifier. Thus: second means for deriving..., said first means comprises, circuit means including..., said biasing means. Wade, in his glossary, defines means as "any gadget that does the trick" (Wade, 1957: 4). Two more quotes from Wade are "comprising... everything, including the kitchen sink" and "consisting of... nothing except the kitchen sink" (Wade, 1957: 4). These two participles are both very frequent postmodifiers in the claims. Comprising is often followed by a plurality of, as in a plurality of trigger circuits where the intent is to avoid being too specific in the claim and, thereby


broaden its coverage. The same intent of indefiniteness, also often following comprising, accounts for use of the indefinite article with the ordinal adjectives first, second, etc. and the participle predetermined. We find comprising a first and a second trigger circuit and a predetermined one of said two distinct voltage levels. In addition to the style peculiarities, there is a special technical vocabulary. Some examples are: flip flop which is synonymous to trigger circuit and bistable multivibrator, all of which denote a circuit configuration common in electronic computers having two stable states and a means of switching from one state to the other; P-N, N-P and P-N-P which refer to junctions in a semiconductor between regions with different electrical properties; push-pull, which refers to a method of connecting a pair of amplifying devices into one circuit; and gate, which as a noun denotes another common computer circuit, and as a verb means to pass a pulse through such a circuit. A complete glossary of electronic terms can be found in standard handbooks (e.g., International Dictionary of Physics and Electronics, 1961) and need not be repeated here. Some of these stylistic and lexical differences can be seen in Table 1, which gives frequency ordered lists of the 150 most common words in a news magazine text and in the patents.

PHYSICAL FORM

In order for a computer to operate on a text it must be transformed from printing on a page to some sort of electrical or magnetic representation (usually the latter). This is usually accomplished through another, different medium, mostly punched cards, as was done with this text. The most common IBM card punches are limited to 48 characters, precluding separate symbols for capitalized letters. In this text, no consistent distinction was made between capitalized and non-capitalized letters by any other means, such as explicit shift codes. The card punch also does not have either a semicolon or colon. These both were sometimes represented


by one period, the colon sometimes by two periods, and the semicolon sometimes by two commas. A single period is therefore ambiguous. There was also an inconsistency in the use of a hyphen in such strings as semiconductor, which was sometimes punched with a hyphen and sometimes without it. At an early stage of the processing, all hyphens were replaced by space to standardize the representation. 5 Because of the properties mentioned above, the text is not ideal for present purposes. It had been prepared for a mechanized retrieval system in which attention to detail on punctuation was unimportant. Since a number of operators punched the cards, they established a number of private conventions for representing punctuation. The cost of manually correcting this much material prior to processing would have been prohibitive. Machine editing could have normalized the text, but, because of information loss, could not have restored it. It does not seem that normalization alone would have really improved anything except the appearance of the output. Therefore, the text was used as it stood.

5

One other peculiarity is the appearance of one to five zeroes at the beginning and end of claims. Since their distribution is essentially random, they do not affect the results in any way. They are caused by a difference in the structure of the IBM 1401, with which the punched cards were loaded onto magnetic tape, and the IBM 7044, which processed the tape. The 1401 is able to write tape records of any number of characters up to some limit determined by memory size. The 7044, however, reads all tape records as though they were multiples of six characters; the equipment supplies zero characters to fill out the record. Because substantial processing had already been done before it was noted that the record structures were not multiples of six characters, it was decided to continue without redoing the earlier processing, since the distribution of zeroes is essentially random. They have been deleted from the output reproduced in Appendixes I and II.

VI

MACHINE PROCESSING STEPS

The experiment itself was a four-step process: data preparation, string generation, string evaluation and output evaluation.

DATA PREPARATION

This stage included all that processing necessary to provide the input for the string generation program, i.e., the sets of unique strings of each length up to length seven, which was the maximum dependency planned for, and their frequencies from the corpus of patent claims. The scope of this processing can be seen somewhat better by considering the constraints imposed by the available equipment and the requirements of the experiment. Since it was necessary to have frequency statistics on strings of length five which at least began to approach limiting values, it was necessary to process a very large text. The decision was made to process five million words, at the time of the experiment essentially all the available patent text. Since there was no way of predicting what order of approximation could usefully be run with this much text, the optimistic decision was made to try length seven. Since every text word was the beginning of a string of length seven (only the last six words of every claim would not have such a beginning in any case, and these were appropriately padded to simplify later processing), the five million word text expands to a 35 million word text. For technical material, an average word length of

TABLE IA. 150 Most Common Words, News Magazine, Sample Size about 4,000,000 Words

Rank  Word         Frequency
150   great        2277
149   never        2298
148   york         2300
147   house        2348
146   white        2391
145   work         2414
144   war          2430
143   our          2484
142   men          2486
141   while        2493
140   another      2502
139   sir          2502
138   good         2504
137   too          2525
136   under        2542
135   did          2547
134   because      2574
133   my           2578
132   note         2584
131   four         2590
130   little       2625
129   make         2672
128   m            2685
127   get          2715
126   against      2722
125   life         2788
124   American     2800
123   day          2831
122   million      2853
121   well         2854
120   way          2858
119   back         2870
118   P            2870
117   +            2879
116   government   2932
115   state        2942
114   much         3065
113   where        3146
112   since        3161
111   down         3174
110   also         3195
109   long         3195
108   made         3197
107   own          3219
106   before       3229
105   people       3245
104   still        3261
103   3            3285
102   any          3361
101   may          3385
100   just         3400
99    do           3422
98    through      3539
97    old          3571
96    off          3625
95    could        3648
94    says         3699
93    then         3720
92    many         3756
91    such         3824
90    three        4318
89    over         4472
88    even         4530
87    you          4553
86    president    4648
85    man          4666
84    other        4730
83    world        4743
82    12           4791
81    we           4817
80    she          4862
79    like         4930
78    now          4944
77    what         4963
76    can          4984
75    said         5052
74    after        5126
73    them         5199
72    if           5248
71    years        5451
70    first        5550
69    year         5588
68    time         5681
67    week         5681
66    some         5715
65    most         5739
64    her          5766
63    him          5795
62    so           6203
61    there        6224
60    about        6294
59    would        6554
58    two          6766
57    been         6846
56    only         6918
55    which        6980
54    into         7001
53    story        7051
52    no           7200
51    than         7441
50    set          7642
49    will         8089
48    were         8154
47    or           8199
46    when         8212
45    out          8302
44    2            8364
43    1            8378
42    up           8748
41    its          9020
40    more         9021
39    last         9067
38    this         9587
37    end          9609
36    new          9874
35    their        10197
34    I            10483
33    had          10632
32    U            10798
31    all          10965
30    they         11081
29    S            11683
28    have         12486
27    who          12545
26    not          12547
25    one          13137
24    be           14281
23    an           14551
22    has          14676
21    are          14769
20    from         16369
19    but          18415
18    at           18756
17    by           21065
16    it           23423
15    on           23604
14    was          24345
13    with         24717
12    as           25101
11    his          26648
10    he           29727
9     for          32825
8     is           36299
7     that         38560
6     in           77280
5     to           87348
4     and          88437
3     a            100188
2     of           117575
1     the          221492

6 characters is certainly not too high, so the file of strings would need to be at least 210 million characters long. Since the sentence generation process is a random process, these strings could not feasibly be stored on a device which could only be searched by starting at the beginning each time, such as a magnetic tape. The available computer included a large random access store,

TABLE IB. 150 Most Common Words, Patents, Sample Size about 7,000,000 Words

Rank  Word           Frequency
150   so             6711
149   conductivity   6860
148   secondary      6926
147   2              6955
146   primary        7021
145   thereof        7061
144   providing      7089
143   parallel       7093
142   portion        7105
141   semi           7128
140   than           7211
139   end            7229
138   alternating    7251
137   low            7251
136   respectively   7310
135   elements       7387
134   contact        7412
133   applied        7476
132   high           7557
131   windings       7657
130   fourth         7693
129   gate           7739
128   rectifier      7747
127   are            7842
126   negative       7883
125   feedback       7954
124   network        7990
123   transformer    8152
122   least          8216
121   oscillator     8480
120   stage          8499
119   time           8515
118   value          8557
117   response       8573
116   semiconductor  8648
115   circuits       8685
114   opposite       8706
113   flow           8719
112   biasing        8728
111   direction      8776
110   connection     8835
109   system         9037
108   devices        9109
107   core           9136
106   be             9156
105   polarity       9233
104   conducting     9310
103   magnetic       9311
102   condition      9370
101   supply         9411
100   line           9483
99    reference      9541
98    switch         9553
97    conductor      9559
96    electrical     9791
95    power          9831
94    wherein        9958
93    type           10110
92    substantially  10219
91    whereby        10296
90    common         10639
89    1              10681
88    path           10751
87    body           11086
86    bias           11219
85    coupling       11529
84    point          11960
83    plurality      12370
82    state          12417
81    responsive     12441
80    conductive     12451
79    that           12621
78    junction       12756
77    direct         12969
76    combination    12988
75    predetermined  13247
74    applying       13273
73    its            13635
72    when           13697
71    switching      13851
70    across         14252
69    claim          14798
68    pulses         15118
67    on             15142
66    frequency      15395
65    through        15778
64    resistance     15807
63    two            15843
62    signals        16092
61    amplifier      16319
60    element        16562
59    capacitor      16734
58    as             17313
57    resistor       17519
56    diode          17762
55    pair           18227
54    load           18518
53    coupled        18869
52    terminals      19535
51    pulse          20363
50    series         20437
49    third          20604
48    by             21545
47    transistors    21776
46    which          22268
45    impedance      22566
44    other          23107
43    connecting     23503
42    winding        25054
41    terminal       27248
40    from           27587
39    at             28428
38    potential      28514
37    electrodes     29326
36    device         29377
35    including      31045
34    control        32014
33    is             33878
32    comprising     37998
31    between        40786
30    each           42804
29    being          42833
28    collector      46172
27    source         46572
26    one            47258
25    emitter        47806
24    signal         48086
23    with           49358
22    base           50276
21    input          52328
20    voltage        53491
19    electrode      54139
18    current        58083
17    having         58990
16    output         60817
15    an             71060
14    transistor     80278
13    connected      82136
12    circuit        92277
11    for            98526
10    second         99645
9     first          105440
8     in             131810
7     means          158944
6     and            243821
5     to             249193
4     the            320165
3     a              351172
2     of             357675
1     said           495520

an IBM 1301 disc file, which can be thought of as a juke-box-like storage, with access to each record and to segments of a record. This device has a capacity of 56 million characters, any one of which is available to the processor in about one fourth of a second.
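The storage estimate can be checked with a few lines of arithmetic. The figures are those given in the text; the variable names are simply labels chosen for this illustration.

```python
WORDS_IN_TEXT = 5_000_000           # words of patent text processed
STRING_LENGTH = 7                   # every word begins one string of length seven
AVG_CHARS_PER_WORD = 6              # assumed average word length
DISC_CAPACITY_CHARS = 56_000_000    # stated capacity of the disc file

raw_words = WORDS_IN_TEXT * STRING_LENGTH
raw_chars = raw_words * AVG_CHARS_PER_WORD
ratio = raw_chars / DISC_CAPACITY_CHARS

print(raw_words)   # 35000000
print(raw_chars)   # 210000000
print(ratio)       # 3.75
```

On these figures the raw file of strings is nearly four times the capacity of the disc.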


This unit, however, is still too small to store the raw file. Since if each string is unique the file is useless anyway, an obvious tactic is to sort the file of strings and keep only the unique strings along with their number of occurrences. In order to use existing sort programs conveniently, however, all strings must be normalized to the same length. The simplest way to do this is to normalize the length of each word. In order to accommodate almost (but not quite), all words, the normalization should be around 15 or 16 characters. Such a normalization would almost treble the length of the file already of such magnitude that under most circumstances it would be considered as being impractically long. Rather than retain each word in its normal spelling, which is a very redundant representation, it was decided to assign a code number to each word. Since the basic data structure of the 7044 computer is a six character unit (also called a word) containing 36 binary places, it is usually simpler for programming purposes to use units of this length or multiples thereof. There was no need, therefore, to use minimum redundancy codes. Instead, a word list was created, using an existing program, of each word (defined as a string of characters between two members of the set of delimiter characters) and its frequency in the text of the claims. The word list was broken into two parts, one consisting of the fifty most frequent words (cf. Table I), and the other of the remainder. A block of numbers was assigned to the punctuation marks (each punctuation mark also being treated as a word), a block to the common words, and the largest block to the remainder. The code numbers consisted merely of the serial position of the words in the list. Since the number of different words was under 10,000, the numbers fit easily into a 15 bit field (another convenient segment, since it is a hardware-defined portion of the 36 bit word and can be operated on easily). 
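The coding step can be sketched as follows. This is a modern Python illustration, not the original 7044 program. The names `build_codes`, `pack`, and `unpack` are hypothetical, numbering from 1 and alphabetical order within the block of less common words are assumptions, and the count is assumed to occupy the 21 bits of the 36-bit machine word that the 15-bit code leaves free.

```python
from collections import Counter

CODE_BITS = 15                      # field holding the word code
WORD_BITS = 36                      # one machine word
CODE_MASK = (1 << CODE_BITS) - 1
COUNT_LIMIT = 1 << (WORD_BITS - CODE_BITS)


def build_codes(tokens, punctuation, n_common=50):
    """Assign serial code numbers: punctuation, then common words, then the rest."""
    freq = Counter(tokens)
    marks = [p for p in punctuation if p in freq]
    words = [w for w, _ in freq.most_common() if w not in punctuation]
    ordered = marks + words[:n_common] + sorted(words[n_common:])
    return {w: i + 1 for i, w in enumerate(ordered)}


def pack(count, code):
    """Store a count and a word code together in one 36-bit cell."""
    if not 0 <= code <= CODE_MASK:
        raise ValueError("code does not fit in 15 bits")
    if not 0 <= count < COUNT_LIMIT:
        raise ValueError("count does not fit in the remaining 21 bits")
    return (count << CODE_BITS) | code


def unpack(cell):
    """Recover (count, code) from a packed cell."""
    return cell >> CODE_BITS, cell & CODE_MASK
```

Twelve octal digits display one such cell, for example `format(pack(0o10, 0o201), "012o")`, which gives `000001000201`. Fewer than 10,000 distinct words fit comfortably, since a 15-bit field can hold 32,767 codes.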
This left the remainder of each machine word available to retain summary count information. In order to do the code-for-word replacement, it was necessary to write a program which segmented the text into words and looked up each word in the list of all words to find its serial position and therefore its code number. To make this process efficient, since

    original list     coded list
    answer            0/6answer
    answering         0/3ing
    answers           3/1s
    antennae          5/6tennae
    antenna           1/0
    antennas          0/1s

The number before the slash tells how many characters to remove from the current string, and the number following tells how many characters, which follow immediately, should then be added to derive a new string.

Figure 1. Word List Packing
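The packing scheme of Figure 1 can be reproduced in a few lines. The sketch below is a modern Python illustration of the same idea, not the original program; the names `pack_words`, `unpack_words`, and `render` are hypothetical.

```python
def pack_words(words):
    """Code each word as (characters to remove from the previous word, suffix to add)."""
    coded, prev = [], ""
    for word in words:
        common = 0
        limit = min(len(prev), len(word))
        while common < limit and prev[common] == word[common]:
            common += 1
        coded.append((len(prev) - common, word[common:]))
        prev = word
    return coded


def unpack_words(coded):
    """Rebuild the word list by applying each remove/add step in turn."""
    words, current = [], ""
    for remove, suffix in coded:
        current = current[:len(current) - remove] + suffix
        words.append(current)
    return words


def render(coded):
    """Write the coded list in the remove/length-suffix notation of Figure 1."""
    return "".join(f"{remove}/{len(suffix)}{suffix}" for remove, suffix in coded)


words = ["answer", "answering", "answers", "antennae", "antenna", "antennas"]
print(render(pack_words(words)))  # 0/6answer0/3ing3/1s5/6tennae1/00/1s
```

Because each entry depends on the one before it, a lookup has to expand a run of entries from some known starting point, which is consistent with the need to expand a portion of the list on each search for a non-common word.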

five million words had to be looked up, it was desirable to keep the word lists in the main storage device of the computer, which allows access to any known 6 character machine word in two microseconds. Therefore, another program was written to pack the words list into the available storage (Figure 1). The lookup process then took an average of about 5 milliseconds per word, since it was necessary to expand a portion of the list each time a search was made for a non-common word. (The common words were not packed.) The output of this program was a list of the strings of length seven (and consequently all shorter strings) in the order in which they occurred and with the words represented by code numbers (Figure 2). This processing for the whole file took about ten hours of computer time, not counting the rerun time caused by a variety of machine and program malfunctions. These strings were sorted by code number, so that all strings with the same words were brought together. By sorting from left to right, identical strings of all lengths were collected in the same area (Figure 3). The sort process, again exclusive of rerun time, took about 25 hours. The sorted strings were combined by a separate program so that there was only one copy of each unique string of

length seven and a record of the number of times it occurred. Similar count information was placed in the last entry for each shorter string (Figure 4). The same program produced the summary statistics given in Table II. The totals information from each of these tables is summarized in Table III. 1

    Input string   #    a    circuit  element  comprising  a    body  of   semi  conductors ...
    Codes          0    201  262      320      274         201  223   567  875   284

    0    201  262  320  274  201  223
    201  262  320  274  201  223  567
    262  320  274  201  223  567  875
    320  274  201  223  567  875  284

The numbers shown are examples and not the code numbers actually assigned (which could be found if necessary but are now known only implicitly through the program). Leading zeroes are not shown. 201, for example, is really 000 000 000 201 in the computer.

Figure 2. Forming Strings from Text

    Sorted string list
    201  262  320  274  201  223  436
    201  262  320  274  201  223  567
    201  262  320  274  201  223  567
    201  262  320  274  205  207  210
    201  262  320  274  205  260  781
    201  850  761  425  204  223  320
    202  210  204  420  326  780  201

Figure 3.

1 Since the material was generated in pieces, the final summarization of Table III represents a merge operation of pieces. The figures given here are those for pieces used in the merge.

Since the summarization still left about two million, three hundred thousand unique strings of length seven, the file could still

[Tables II and III (summary statistics) are not legible in this copy.]

    Condensed list of strings
    201      262     320     274     201     223     1/436
    201      262     320     274     10/201  4/223   2/567
    201      262     320     274     205     1/207   1/210
    201      34/262  26/320  16/274  2/205   1/260   1/771
    105/201  1/650   1/761   1/425   1/204   18/223  18/320

The symbol "/" divides the count from the word number, leading zeroes again being omitted. Thus, 10/201 is really 000001000201 in the machine, 201 being the word number and 10 the count.

Figure 4
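The three steps illustrated in Figures 2 to 4 (forming strings, sorting them, and condensing them with counts) can be sketched together. This is a modern Python illustration, not the original programs. The names `windows` and `condense` are hypothetical, and the use of code 0 as padding at the end of a claim is an assumption.

```python
from collections import Counter


def windows(codes, n, pad=0):
    """One string of length n starting at every word, padded at the end of the claim."""
    padded = list(codes) + [pad] * (n - 1)
    return [tuple(padded[i:i + n]) for i in range(len(codes))]


def condense(strings):
    """Sort, keep unique strings, and attach prefix counts as in Figure 4.

    Each cell is (count, code). The count of a prefix is stored in the last
    row that carries that prefix; elsewhere the count is 0.
    """
    prefix_counts = Counter(s[:j] for s in strings for j in range(1, len(s) + 1))
    unique = sorted(set(strings))
    rows = []
    for i, s in enumerate(unique):
        following = unique[i + 1] if i + 1 < len(unique) else None
        row = []
        for j in range(1, len(s) + 1):
            is_last = following is None or following[:j] != s[:j]
            row.append((prefix_counts[s[:j]] if is_last else 0, s[j - 1]))
        rows.append(row)
    return rows


rows = condense(windows([201, 262, 201, 262, 201], n=2))
for row in rows:
    print(row)
```

Because every word of the text begins exactly one string, the count attached to a one-word prefix is simply the frequency of that word, and the counts for longer prefixes give the frequencies needed for each lower order of approximation.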

not be fitted onto the disc storage in this form. Therefore, it, too, was packed, as shown in Figure 5, by eliminating leading zeroes from the counts and word numbers. This coding reduced the size of the file by almost half so that it now fit easily into the storage unit. (Other packing schemes are, of course, possible.) In order to speed the search process, two additional files were created, one giving the disc location where strings beginning with each different word number were located, and the other giving the number of occurrences of each word. The second of these was used for generating the order zero and order one approximations without reference to the disc storage. These steps completed the necessary preparatory processing. To recapitulate, when the string generation program is to be run, the machine contains a compressed and condensed list of strings on its disc unit, and two files in its high speed memory, one

    Compacted list of strings
    line 1 of Figure 4    722017226272320722747220172223612436
    line 2 of Figure 4    722017226272320722745102201642223622567

The first number is the number of leading octal zeroes in the count (7 if the count field is all zero); the next number, if the preceding number was not 7, is the count for the unique substring; the next, if the preceding number is a 7 or a count, is the number of leading zeroes in the word code, which has a maximum length of five; and the last number is the word code. In the printed figure, the control numbers are italicized.

Figure 5

86

MACHINE PROCESSING STEPS

giving the frequency of each word, and another giving the starting location on the disc for strings beginning with every word. STRING GENERATION
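Since the generation programs read the compacted file, it may help to see the coding of Figure 5 written out. The following is only a minimal sketch: the function names, the representation of an entry as a (count, word) pair of digit strings, and the assumption that the count field is seven octal digits wide (inferred from the caption's "7 if the count field is all zero") are illustrative and are not taken from the original programs.

```python
COUNT_WIDTH = 7  # assumed width of the count field, in octal digits
WORD_WIDTH = 5   # stated maximum length of a word code


def pack(entries):
    """Pack (count, word) digit strings; count "" stands for an all-zero count field."""
    out = []
    for count, word in entries:
        if count:
            # control digit = number of leading zeroes, followed by the count itself
            out.append(str(COUNT_WIDTH - len(count)) + count)
        else:
            out.append(str(COUNT_WIDTH))  # all-zero count: control digit only
        out.append(str(WORD_WIDTH - len(word)) + word)
    return "".join(out)


def unpack(packed):
    """Invert pack(): recover the list of (count, word) digit strings."""
    entries = []
    i = 0
    while i < len(packed):
        zeros = int(packed[i])
        i += 1
        if zeros == COUNT_WIDTH:
            count = ""
        else:
            width = COUNT_WIDTH - zeros
            count = packed[i:i + width]
            i += width
        zeros = int(packed[i])
        i += 1
        width = WORD_WIDTH - zeros
        word = packed[i:i + width]
        i += width
        entries.append((count, word))
    return entries


# Line 1 of Figure 4: six words with no count, then 1/436.
line1 = [("", "201"), ("", "262"), ("", "320"), ("", "274"),
         ("", "201"), ("", "223"), ("1", "436")]
print(pack(line1))
```

Under these assumptions, packing line 1 of Figure 4 reproduces the first compacted string of Figure 5, and unpacking that string returns the original entries.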

The string generation process consists essentially of two machine programs. One of these uses the file data to create strings of a given length and order of approximation, and the other translates the strings of code numbers from the first program back into conventional spelling and prepares a magnetic tape which can be printed.

The generation process for a run for strings of order k begins by generating a random number² between 1 and the total number of beginning strings (which is the number of claims). The file is then scanned from top to bottom, accumulating the frequencies of each beginning string of length k, i.e., each string which can begin a claim, until the total equals or exceeds the random number. This procedure weights each unique string by its frequency of occurrence, which is exactly what is wanted. When a starting string has been selected, the final entry containing the last (k-1) words of that string, which contains the frequency f of that (k-1)-word string, is located in the file. Another random number between 1 and f is generated, and the section of the file beginning with these (k-1) words is searched in exactly the same way as the set of initial strings was, summing frequencies for the k-word strings until the accumulated total equals or exceeds the random number. The last, i.e., kth, word of this string is added to the list of generated words, and the last (k-1) words of this new, longer string become the beginning string for repeating the process, which continues until either an end of claim symbol is generated or the maximum string length, set at 99 words, is reached. This process is shown in Figure 6.

² Cf. Green, Smith, and Klein (1959). This generator was chosen primarily because of its high operating speed.

Generation cycle, k=3

1. current string                       204 207 201 262
2. k-1 word argument for searching      201 262
3. k-1 word string found                201 34/262
4. random number generated              21
5. section of file                      cumulative total
      201 262 2/245                      2
      201 262 4/271                      6
      201 262 2/311                      8
      201 262 26/320                    34
6. selected string                      201 262 320
7. new current string                   204 207 201 262 320
8. new k-1 word argument                262 320

Figure 6

The program also recorded, as each new word was added to the generated string, either the chance of selecting that particular next word, or, if there was only one possible next word, the frequency with which that word followed the preceding (k-1) words. The selection chance is simply the frequency of the k-word string divided by the frequency of the (k-1)-word string. In the cycle of Figure 6, for example, the chance recorded for 320 is 26/34.

The most time-consuming part of the process described above is the location of the last (k-1)-word string, which is needed in order to find the total number of such strings, that number being the desired range for the random number which will select the next word. As the order of approximation decreases from seven to two (remember that orders zero and one are handled separately), the total number of possible successors for a (k-1)-word string increases, so the search time must also increase. This was partly adjusted for by not storing the full seven word strings when generating approximations of order two and three, but instead storing only the strings of the required lengths. Although this greatly increased the speed of generation, the process was still quite time consuming. Table IV has average generation times for several orders of approximation, including order seven, even though this will not be discussed further. As can be seen, these times for the lower orders are sufficiently long that a limit to the string length as short as 99 words was almost mandatory.

TABLE IV. Generation Time, Seconds (not available for order 6)

                     Order
Set #     (illegible)     4       5       7
  1           185         43     117     103
  2           309        121      41     125
  3           448        108      89     109
  4           218        134      88      93
  5           423         60     104      96
  6           349         87      29     104
  7           439        123     107      85
  8           413         24     109      98
  9           343        108      88      45
 10           434        138      87      91
 11           317        118     114      90
 12           321        126     103      98
 13           197         31      87     107
 14           132        108      87     101
 15           109        110      62      62
 16           173        121      87     102
 17           305        128      98      89
 18           211        122      98      94
 19           348         82     107      93
 20           180        109     104      95
 21           257        146      98      68
 22           283        154     102      94
 23           323         97     121      86
 24           331         68      16      96
 25           465         50      91      53

Total        7513       2516    2234    2277
Average       301        101      89      91

The second program in this step is simply a translator from code number back to spelled word. It produced, on command, two different formats for the strings, one containing the selection statistics supplied by the generation program and one without these statistics. Examples of both of these formats are given in Appendices II and III.

The first format was used for determining, in an admittedly pragmatic way, what the maximum usable order of approximation was that could be derived from the data. That is, for order five, generated strings in which more than half the words were one of several possibilities were in the majority, and for higher orders, they were not. This requirement of having nonuniqueness in successors is really quite stringent, in that for high orders of approximation, it is clear that the number of possibilities must decline and for at least some strings must be only one. It is conservative, therefore, to assume that this is not the case for order six and that the lack of more possibilities is a result of insufficient sample size.

The second style of printing, without the transitional frequencies, was used for presentation to the evaluation panel. A volume was prepared for each member of the panel containing ten strings of each of the six orders, zero to five, as well as examples of real claims and an explanation of some of the style characteristics of the patent material.

The preparation of the printed strings concluded the major computer processing phase of this project. All told, the processing described so far required approximately six man-months of programming time and well over one hundred hours of computer time. Most of the computer time was spent in long sorts, to order the strings and bring together like strings.
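The generation cycle described in this chapter can be sketched in a modern language. What follows is an illustration under stated assumptions, not a reconstruction of the original machine programs: the names `build_tables`, `pick`, and `generate`, the `END` marker, the in-memory tables (the original worked from a sorted disc file), and the use of Python's standard random generator (not the Green, Smith, and Klein generator) are all hypothetical. The sketch covers orders of two and above only, since orders zero and one were handled separately.

```python
import random
from collections import Counter

END = "<end>"  # hypothetical end-of-claim symbol


def build_tables(claims, k):
    """Count the k-word strings that begin a claim and all k-word strings."""
    starts, kgrams = Counter(), Counter()
    for claim in claims:
        words = list(claim) + [END]
        if len(words) >= k:
            starts[tuple(words[:k])] += 1
        for i in range(len(words) - k + 1):
            kgrams[tuple(words[i:i + k])] += 1
    return starts, kgrams


def pick(weighted, rng):
    """Draw r in 1..total, then scan in sorted order until the running total >= r."""
    items = sorted(weighted.items())
    total = sum(count for _, count in items)
    r = rng.randint(1, total)
    running = 0
    for item, count in items:
        running += count
        if running >= r:
            return item, count, total


def generate(claims, k, max_len=99, seed=0):
    """Generate one string of order k (k >= 2) with its selection statistics."""
    rng = random.Random(seed)
    starts, kgrams = build_tables(claims, k)
    start, _, _ = pick(starts, rng)
    out, stats = list(start), []
    while out[-1] != END and len(out) < max_len:
        context = tuple(out[-(k - 1):])
        successors = {g: c for g, c in kgrams.items() if g[:-1] == context}
        gram, count, total = pick(successors, rng)
        out.append(gram[-1])
        # Record the chance if there was a choice, otherwise the raw frequency.
        stats.append(count / total if len(successors) > 1 else count)
    return out, stats
```

With the section of file shown in Figure 6 (counts 2, 4, 2, and 26 for successors 245, 271, 311, and 320) and a random number of 21, `pick` scans cumulative totals 2, 6, 8, 34 and selects 320, as in the figure.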

VII

THE PANEL EVALUATION PROCEDURE

The packets of generated strings, each containing examples of two claims, were given to each of the four members of the panel, who were to evaluate the grammaticality of the strings. A rather lengthy discussion was held with each panel member describing the style features of the patents and the peculiarities of the generated claims, particularly the incidence of said in the claims and the inconsistent use of periods in the text material (see above, chapter VI).

The background of the four panel members appeared to be appropriate for what they were asked to do. P1 has a Ph.D. in physics, but has been working on the linguistic analysis of English and Russian. P2 has a Ph.D. in linguistics and a B.A. in chemistry, and has also been working in machine translation, primarily on Russian syntax and Russian to English translation rules. P3 has a B.A. in mathematics and an M.A. in linguistics. P4 has a B.E. in engineering physics and an M.A. in mathematics, and has been working for approximately two years on the computer processing of the patent material.

The first three panel members were asked to evaluate the strings only for grammaticality, i.e., for correct syntax, and the fourth to evaluate both for grammaticality and for meaningfulness. After the initial discussion with each panelist, the packets were left to be analyzed with instructions to evaluate the first few strings on each page of the packet. When this was completed, a second round of discussion was held and misunderstandings corrected. At this point most of the troubles seemed to be with overlapping strings, i.e., what was meant by overlap, and with the interpretation of punctuation. The packets were then left again with instructions to ask about any problem that might arise. The panelists were also requested, once they had finished the set, to go back over the earlier material to see what they had done. When this was completed, each packet was inspected and at least a twenty-five word substring of every string was parsed, and every string was counted for length. Any questions which arose during this process as to grammatical judgments which seemed inconsistent with either the instructions or the norm of material from that panelist were referred back to him for review. Each string was, therefore, reviewed at least twice by a panel member and once by the experimenter before the panelists were considered to be through.

A number of objections to a panel evaluation procedure can easily be raised. Most of them can be disposed of by appealing to the overall goal of the evaluation, i.e., to determine whether or not there was a clear trend in the measured property, grammaticalness. For this purpose, it is not necessary that the judgments be consistent between subjects or even for a single subject, since no absolute measures are needed. The review procedure was intended to eliminate systematic bias for sections of a packet, not to increase accuracy. The most likely sources of bias would be the learning effect, boredom, and tiredness, with the first probably the most serious. It is probably the case that not all the learning effects were eliminated by review, but the alternative would have been to have a set of training materials for each panelist before giving him the study materials. Since, however, all the panel members were otherwise occupied and could only devote spare time to this process, so that the processing delay varied from four to six months, it was felt that a training period would have added too much additional delay.

Boredom and tiredness both would result in shorter strings than otherwise, through failure to see possible interpretations. However, since no time pressure was applied, the usual reaction to either was to drop the activity for a time and return to it later. Another factor affecting the string length is the amount of time spent looking for possible interpretations. The panelists were asked to be aware of this and to try to devote roughly the same amount of time to all the strings. This introduces a systematic bias toward strings shorter than the maximum possible length, but is one of those objections answered by pointing out that since the point of interest is a trend, systematic bias over the whole curve is irrelevant. A very similar factor tending to shorten the strings is the degree to which attention is paid to semantics, since the more this factor is ignored, the more likely it is that an interpretation can be placed on some string. It is in general very difficult to say what is semantic and what is grammatical with regard to specific cases, particularly in the low order material, where words are construed as nouns or adjectives even though they may not ordinarily be used as such. It is of some interest to note that all panelists considered number agreement to be a grammatical fact, so that a string such as a devices is always rejected. The fourth panelist was asked to look specifically for semantic as well as grammatical sense, to determine whether or not this factor had an effect which must be considered, with results discussed below in chapter VIII.

Serious questions can, however, be directed toward two points: the general acceptability of native speaker judgments of grammaticality in the absence of meaningfulness, and the extension of judgments of grammaticality from judgments about whole sentences to judgments about strings shorter than a sentence. With respect to the first point, there is very little to be said. The only alternative I can envision would be to compare the strings to some grammar and determine if they are well formed with respect to that grammar. Aside from the practical difficulty that no grammar complete enough for this purpose has been made public, there is still the open question of how the postulated grammar was established as being, in fact, a grammar of English. Presumably it was established as such because the grammar either generated or recognized all and only the sentences of English, but this judgment is exactly the native speaker reaction that was looked for in the first place (cf. chapter II).

Maclay and Sleator (1960) indicate that subjects can be asked to respond, with some degree of reliability, to the adjectives meaningful, ordinary, and grammatical. The Maclay and Sleator study also shows, however, that the responses to such a question can only be statistically evaluated, since there is no absolute uniformity in judgment. Some of the reasons for this disagreement are given by Hill (1961), who recalls Sapir's observation that in the absence of reason to the contrary, a hearer will attempt "to wring some kind of pattern, and some kind of meaning, out of the most unlikely material" (Hill, 1961: 168). Hill's suggestion for testing the grammaticality of a sentence as part of a larger context, in which the operating rules are clear, is not apropos here. This appears to mean that, while no great amount of confidence can be placed on a single judgment by a single speaker, the average of such judgments conforms with linguists' expectation of what such judgments should be (presumably these being based on some notion of grammaticality which depends on conformity with either a recognition or production grammar).

The extension from grammaticality of sentences to grammaticality of strings was made in a straightforward way. The panelists were asked to mark the longest string which could be imbedded in a grammatical sentence and used rather than mentioned in that sentence.¹ In this way, the grammaticality of a string is clearly derivative from the grammaticality of sentences, and the panelists, therefore, were not being asked to make a judgment of a different kind than that so far investigated. It is clear, however, that the results from panelist to panelist are even less comparable than otherwise, since in addition to the idiolectal differences between panelists which would cause judgments to vary anyway, the additional factor of imagination has been added. Presumably, except for factors like tiredness, etc., mentioned above, this is a constant for any panelist, but is surely not from panelist to panelist. To reiterate, however, since the absolute judgments are not important for purposes of this study, and since the intelligence factor should change only the height and not the shape of the curve, this effect can safely be ignored.

¹ An amusing example of this kind of danger was given me by one of the panel members. The following string can be punctuated so as to be easily parsed: Jim where John had had had had had had had had had had pleased the teacher. The punctuation consists of a comma after Jim, quotation marks around the third had, followed by a comma, quotation marks around the sixth and seventh hads, followed by a semicolon, and quotation marks around the following two hads.

VIII

RESULTS OF THE EXPERIMENT

It will be recalled that the sentence generation phase had two aims, creation of a set of approximations which were judgment independent, and testing of the Chomsky hypothesis relative to the relation between order of approximation and grammaticalness. The first of these aims, which is of secondary importance here, is easily satisfied by the data in Appendices II and III. Appendix II consists of one string of each order from each data package, making four strings of each order of approximation from zero to five, with the substrings marked as grammatical by each subject indicated. Appendix III is a list of the same strings with the point-to-point transitions noted. A total of forty strings of each order is available immediately if needed, and, of course, an indefinite number more might be generated. The versions marked by the panel provide grammatical substrings of generated material, and the strings marked by P4 provide semantically correct as well as grammatically correct strings. In spite of the highly technical nature of the material, these strings, having well-known properties in contrast to those generated by the Shannon method, should be more usable for at least some purposes than more familiar material.

THE ORDER HYPOTHESIS

Some of the questions relative to evaluation of the output of the generator were touched on above. Part of the design was, in fact, dictated by considerations of evaluation. The point is, again, to show that the output of a Markov source becomes more grammatical as the order of approximation increases. The way chosen to approach the notion 'more grammatical' was to have the generated strings marked by a panel in such a way as to isolate the grammatical substrings, and to determine if the number of these, for each length, was monotonically increasing with increasing order. In addition, the strings were to be checked for grammatical type, to insure that the model was not simply generating longer strings of the same type, but that the grammatical diversity of the strings also approached the grammatical diversity of the patent claims. This method was chosen in preference to that of Coleman (1963), who asked linguists to rate humanly generated strings on a scale of grammaticalness which ranged from a low of 1 to a high of 6. The ordinal ranking process in effect combines a number of judgments which the method chosen here separates.

As the booklets were returned, the length of each marked substring was determined and the number of grammatical substrings of length six to fifty tallied for each generated string. This tally was somewhat complicated by the fact that grammatical strings overlapped. In general, given a string of length m, the number of substrings of length n is equal to m - n + 1. Thus, in the string for P4, order 5, in Appendix II, the first substring, of length 51, contains 46 substrings of length six and 41 of length eleven. The second substring, of length eighteen, contains 13 substrings of length six and 8 of length eleven. However, only seven of these strings for both lengths should be added to the total tally for the complete strings, since the remainder were already counted as part of the strings of length 51.

By counting in this way, the maximum number of strings of each length for a generated string can easily be calculated from the formula given above, and the number of these that were judged grammatical can be determined, to give a proportion of grammatical strings of each length for each generated string. The tallies for each length and each string of each order can be summed, and these summed over all lengths, to give a measure value for each order of approximation. This is, in fact, the grossest measure that should be applied to this data, since the samples from each panelist should be kept separate. Figure 7 is a graph of this measure for all subjects.

[Figure 7. Grammaticalness Ratios, All Strings Included. Vertical axis: Fraction of Strings Judged Grammatical. Horizontal axis: Order of Approximation. One curve each for P1, P2, P3, and P4.]

If the hypothesis that order of approximation is correlated to grammaticality holds, each curve on this graph should be monotonically increasing. We see that this is true except for one point of the curve for P1, at order 1. There are more grammatical strings of order zero than there are for order one. This point will be returned to later; for the moment, let me say that the reason is a higher tolerance of P1 to long noun structures consisting of modified nouns. On the graph the curve for P2 appears to be essentially flat from order 3 to order 5. This is also unique, and will be discussed again in detail, but does not contradict the hypothesis. The curve for P4, similarly, is constant from order 0 to 1 at zero, also still satisfying the hypothesis. The explanation for this is very simple; P4 evaluated the strings for semantic as well as grammatical regularity, and found that no strings as long as six words satisfied both conditions for order zero and only a few for order one.

The curves also share some properties. All of them have a sharp upward change in slope from order 1 to order 2, and P1 and P3 show a sharp downward change from order 4 to order 5. This sharp downward trend must take place for P1, since a constant slope would have projected to 115% at order 5, an obvious absurdity. Since P1, P2, and P3 all judge well over half the strings grammatical at order 4, an increase of five more orders CANNOT be an extrapolated linear function of the previous four. This change, therefore, has no deep theoretical significance. Even the fact that such a large percentage of the strings were already judged grammatical at order 4 must be interpreted with some caution until the data on grammatical types is considered, and the possibility of repetition eliminated.
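The tallying rule for overlapping grammatical substrings can be made concrete with a short sketch. The function names and the representation of a marked substring as a half-open range of word positions are assumptions made for illustration; the particular positions used below are likewise hypothetical, chosen only so that a 51-word and an 18-word substring overlap in the way the text describes.

```python
def window_count(length, n):
    """Number of n-word substrings in a string of the given length: m - n + 1."""
    return max(0, length - n + 1)


def grammatical_windows(spans, n):
    """Count distinct n-word windows lying wholly inside some marked span.

    Each span is a half-open (begin, end) range of word positions. Collecting
    start positions in a set keeps overlapping spans from being counted twice.
    """
    starts = set()
    for begin, end in spans:
        starts.update(range(begin, end - n + 1))
    return len(starts)


def proportion(spans, n, m):
    """Fraction of the n-word windows of an m-word string judged grammatical."""
    return grammatical_windows(spans, n) / window_count(m, n)


# Hypothetical positions: a 51-word span and an 18-word span that extends
# seven words beyond the end of the first.
spans = [(0, 51), (40, 58)]
print(grammatical_windows(spans, 6), grammatical_windows(spans, 11))
```

With these positions the second span contributes seven new windows at both lengths, so the tallies are 46 + 7 for length six and 41 + 7 for length eleven.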

COMPLEXITY

The question of grammatical complexity, which has already arisen a number of times, must be discussed. There is an immediate difficulty in determining what is meant by 'grammatical complexity'. The problem is somewhat ameliorated for present purposes by the fact that it is not necessary to provide a rating scale which is generally valid. The question that must be answered is, "Do the generated strings match the real patents in complexity, and if not, are the higher order strings more similar to the patents than the lower order ones?" This still leaves open the question of what characteristics will be considered for similarity. Since at least a twenty-five word string from every generated string was parsed, some comparisons were fairly easy. There are some obvious possibilities, for example, in considering the trees resulting from a parse as abstract structures and determining if there is any correlation between structural properties and order of approximation.

As a basis for comparison, ten claims were randomly selected and the first one hundred words of each of these claims were parsed and a tree structure for the parsing was drawn. Figure 8 is an example of one of these claims and part of its associated tree. No great amount of rigor is claimed for the parsing, but since all of it was done by the experimenter, it should be relatively consistent. In general, binary rules were used except for conjunctions like and, comma, semicolon, and the like, which were treated as coordinate with the constituents they connected.

Given a tree structure of the form shown, drawn on graph paper, it is quite easy to determine a number of characteristics of each tree. Consider the relation between the number of nodes on each level and some possible structures. In particular, it might be suspected that a Markov source would only produce right branching structures, as for example, that of Figure 9. In this structure, if the word is counted as level one, and each word is at that level, there is one node at levels two through twelve.
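The node count for a right branching structure can be checked with a small sketch. It assumes, as an interpretation of the text rather than a statement of the author's procedure, that every word is at level one and that a node sits one level above its highest constituent; the tuple representation of trees and the function names are hypothetical.

```python
from collections import Counter


def level(tree):
    """Words (plain strings) are level 1; a node is one above its highest child."""
    if isinstance(tree, str):
        return 1
    return 1 + max(level(child) for child in tree)


def nodes_per_level(tree, counts=None):
    """Tally the non-terminal nodes of a tree by level."""
    counts = Counter() if counts is None else counts
    if not isinstance(tree, str):
        counts[level(tree)] += 1
        for child in tree:
            nodes_per_level(child, counts)
    return counts


def right_branching(words):
    """Build (w1, (w2, (w3, ...))) from a list of words."""
    tree = words[-1]
    for word in reversed(words[:-1]):
        tree = (word, tree)
    return tree


twelve = [f"w{i}" for i in range(1, 13)]
print(sorted(nodes_per_level(right_branching(twelve)).items()))
```

For twelve words the right branching tree has exactly one node at each of levels two through twelve, whereas a more balanced tree such as (("a", "b"), ("c", "d")) has two nodes at level two and one at level three.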
One might, therefore, count the number of nodes which are present at each level in strings of fixed length in the selected claims, and compare these numbers to similar ones from the generated strings. The strings at orders zero and one are rather short for this purpose, so this analysis is restricted to strings of orders two through five. Table V gives the mean and range of the number of nodes at


TIME IS SATURATED DURATION SAID OF THE FIRST NEGATIVE ALTERNATOR) | AT CONSTANT VALUE RESISTOR FOR TO 'SAID FIRST DIFFERENTIAL CAPACITOR A IN LOW A INPUT) A SIGNAL IMPEDANCE MEANS A PATH SAID INPUT A INDUCTOR FOR BETWEEN ELECTRODE*

P4 - order 5

A TRIGGER TO OF SUBSTANTIALLY INCLUDING CUTOFF TERMINAL AMPLIFIER VOLTAGE PREDETERMINED AND APPLYING OUTPUT 'THE SIGNAL SECOND FIRST SAID PROVIDING INPUT

Appendix III

GENERATED STRINGS, WITH TRANSITION PROBABILITIES

Following are four sets of four strings each, consisting of one string for each panelist (the same as those in Appendix II) for the orders two through five. Following each word is a number. If this number is a decimal, it represents the transitional probability from the preceding string to the next word. If the number is an integer, the transition probability (rather its estimate) is one, and the integer represents the number of times that word followed.
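The annotation convention just described can be read mechanically. The sketch below is an illustration only: the function name `parse_annotated`, the regular expression, and the choice to return (word, probability, count) triples are assumptions, and the sample text is a short excerpt of the P4 string with the letter spacing of the printout removed.

```python
import re

# A word followed by a parenthesized number: ".ddd" is a probability,
# a bare integer is a count for a transition whose estimated probability is one.
PAIR = re.compile(r"(\S+)\s*\(\s*(\.\d+|\d+)\s*\)")


def parse_annotated(text):
    """Return (word, probability, count) for each word followed by a number.

    count is None for decimal annotations. Words with no following number,
    such as the opening word of a string, are skipped.
    """
    steps = []
    for word, number in PAIR.findall(text):
        if number.startswith("."):
            steps.append((word, float("0" + number), None))
        else:
            steps.append((word, 1.0, int(number)))
    return steps


sample = "A DIODE (.000624) POTENTIAL (.107663) TO (.000010) PERTURB (3) THE (.000418)"
for step in parse_annotated(sample):
    print(step)
```

In the sample, PERTURB carries the integer 3, so its transition is read as having an estimated probability of one, based on three occurrences.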

A FIRST (.111846) AND (.054935) SAID (.006207) LOAD (.063246) *, (.123745) A (.000532) FLIP (.453493) FLOP (.018485) ELEMENT (.041062) OF (.212397) SAID (.003963) SIGNAL (.044182) TO (.000685) INPUT (.011696) VOLTAGE (.000159) ELECTRIC (.009129) WAVE (.014020) SIGNALS (.003100) WHEN (.148277) THE (.003166) MAGNITUDE (.041958) AND (.082621) A (.009313) SIGNAL (.009272) *. (.027503) IN (.059623) SAID (.037127) SECOND (.002000) RESISTORS (.010119) RESPECTIVELY (.170926)
(.065899) SAID (.011053) SOURCE (.001226) TERMINALS (.000198) SUPPLIED (.002757) DURING (.051614) A (.013829) PLURALITY (.483965) OF (.000125) FERROELECTRIC (.048443) CAPACITORS (.046287) CONNECTED (.024120) ACROSS (.022622) A (.000118) TAPPING (.055130) POINTS (.024121) AND (.040755) SECOND (.005469) OF (.000462) INFORMATION (.001922) TRANSFERRED (.036396) BETWEEN (.277954) SAID (.002821) ELECTRODES (.047352) AND (.054935) SAID (.003386) DIODE (.061583) *, (.026638) THE (.002364) REVERSE (.013584) RUNNING (.017287) WINDING (.015398) IN (.057155) A (.007735) VOLTAGE (.005456) MEANS (.025742) INCLUDING (.265258) A (.000500) CARRIER (.053047) SIGNAL (.003503) COMPRISING (.013829) MEANS (.125582) FOR (.015315) PRODUCING (.007905) AT (.079087) SAID (.052559) FIRST (.009266) TERMINAL (.171511) OF (.010769) EACH (.191282) OF (.212397) SAID (.000707) ELECTRICAL (.005739) APPARATUS (.036652) OF (.212397) SAID (.001288) BRIDGE (.016448) HAVING (.015109) FIRST (.004023) SEMI (.080894) CONDUCTIVE (.006293) FOR (.007523) GENERATING (.158263) MEANS (.002187) INTERCONNECTING (.292563) SAID (.037127) SECOND

P1 - order 2

A BODY (.006582) BEING (.004995) DISPOSED (.058098) ON (.202252) SAID (.002645) PRIMARY (.003168) THRESHOLD (.019388) RESPONSIVE (.374916) TO (.106432) THE (.018128) EMITTER (.115122) ELECTRODE (.135586)
(.023698) AN (.080562) OUTPUT (.035025) SIGNAL (.000769) CONTROL (.013690) OF (.077858) THE (.001388) STATE (.000721) SAID (.052559) FIRST (.003831) STAGE (.018320) COMPRISING (.030549) AN (.000628) AXIS (.002279) HAVING (.002334) PRIMARY (.310062) WINDING (.000358) COOPERATING (.124567) TO (.014662) BE (.021352) CONTROLLED (.000674) LOAD (.008154) *. (.000853) AT (.089685) A (.000532) FLIP (.453493) FLOP (.132552) CIRCUIT (.026717) OF (.001572) OPERATION (.263783) OF (.077859) THE (.000827) CONDITION (.086071) OF (.000105) CONTROLLED (.147749) BY (.070165) A (.006547) THIRD (.004213) DEVICE (.035123) TO (.034664) A (.000029) QUOTIENT (.125054) TO (.002504) CONDUCT (.096525) CURRENT (.023900) SOURCE (.032045) TO (.126197) SAID (.012365) OUTPUT (.007009) CURRENT (.013533) OF (.000107) SOURCES (.037104) AND (.005115) INCLUDING (.265258) A (.034784) FIRST (.062123) TRANSISTOR (.000252) ALSO (.001973) CONTROLLED (.147749) BY (.070164) A (.013064) TRANSISTOR (.014657) *. (.016204) THE (.018128) EMITTER (.115122) ELECTRODE (.011124) *. (.001550) WHEREBY (.154645) THE (.025295) OTHER (.010140) AND (.082621) A (.000376) CERTAIN (.127966) OF (.212397) SAID (.002423) RESISTOR (.097198) CONNECTED (.211969) TO (.106432) THE (.013591) SECOND (.006242) IMPEDANCE (.007262) HAVING (.076047) AN (.063353) INPUT

P2 - order 2

THE CODED (.037543) FORM (.203867) A (.006372) LOAD (.008267) RESISTANCE (.000205) LYING (.143258) IN (.000346) SIGNAL (.003503) COMPRISING (.271208) A (.001410) TELEPHONE (.175344) SYSTEM (.094358) FOR (.056069) APPLYING (.176061) A (.009611) PREDETERMINED (.006590) ONE (.014224) CONDUCTIVITY (.295927) TYPE (.002300) ELEMENTS (.002135) ARRANGED (.118644) IN (.045234) SERIES (.144805) WITH (.053628) A (.013829) PLURALITY (.483965) OF (.022429) A (.007547) CONTROL (.007632) GRID (.089368) AND (.032176) MEANS (.035074) CONNECTED (.211969) TO (.000223) RESTORE (.220619) THE (.006512) TRANSISTOR (.057784) HAVING (.219300) A (.013064) TRANSISTOR (.013689) OF (.212397) SAID (.000666) INDUCTANCE (.015433) ELEMENT (.012448) BEING (.000258) DIRECT (.392896) CURRENT (.023348) *, (.071190) AND (.032176) MEANS (.125582) FOR (.006465) RECEIVING (.001279) PORTION (.299633) OF (.022429) A (.000543) SHUNT (.009939) DIODE (.027897) HAVING (.009829) THE (.021790) BASE (.075737)
(.054563) MEANS (.025742) INCLUDING (.006658) TWO (.002373) ADJACENT (.002058) DEVICES (.104882) *, (.123745) A (.013830) PLURALITY (.483965) OF (.212397) SAID (.002727) OSCILLATOR (.000223) OSCILLATES (.030529)
(.141607) A (.000016) UNIPOLARITY (.219083) INPUT (.006764) IMPEDANCE (.007174) BETWEEN (.139259) THE (.001538) REMAINING (.009740) PORTION (.299633) OF (.077859) THE (.007776) SAID (.052559) FIRST (.000811) TIMING (.095264) CIRCUIT (.055082) MEANS (.010455) HAVING (.219300) A (.003632) SERIES (.048973) CIRCUIT (.000201) SUBSTANTIALLY (.005167) OHMIC (.052889) CONTACTS (.017981) IN

P3 - order 2


A DIODE (.000624) POTENTIAL (.107663) *, (.003222) COLLECTOR (.137484) ELECTRODE (.135586) (.026638) THE (.000546) RELATIVE (.358235) TO (.000027) DAMAGE (.071830) TO (.000010) PERTURB (3) THE (.000418) MAXIMUM (.003759) BASE (.075736) (.006454) IN (.018358) ACCORDANCE (.498476) WITH (.000170) WHETHER (.165327) THE (.019059) COLLECTOR (.000854) THEREOF (.000301) TWO (.000208) BRUSHES (.104548) RESPECTIVELY (.013118) CONNECTING (.192340) THE (.002364) REVERSE (.165482) DIRECTION (.090435) (.023697) AN (.058535) EMITTER (.018156) COLLECTOR (.137484) ELECTRODE (.135586) *, (.071189) AND (.054935) SAID (.037127) SECOND (.009626) SOURCE (.209375) OF (.000074) SIGNALING (.108448) CURRENT (.010167) SUPPLY (.054117) MEANS (.031335) TO (.014662) BE (.000229) COOLED (2) TO (.126196) SAID (.000368) CAPACITORS (.108968) (.026638) THE (.000470) PAIR (.454198) OF (.002016) PULSES (.077543) TO (.007103) PRODUCE (.221410) A (.009611) PREDETERMINED (.027653) TIME (.000866) MULTIPLEXED (.411681) SIGNALS (.062809) TO (.000064) ATTRACT (.154041) SAID (.000195) REGISTER (.008672) THE (.000359) CLASS (.055696) A (.014663) PAIR (.454198) OF (.001451) SUBSTANTIALLY (.013119) NON (.023986) CONDUCTION (.103246) OF (.004010) CURRENT (.019147) THROUGH (.251967) SAID (.052559) FIRST (.062123) TRANSISTOR (.000767) SIGNAL (.004315) PULSES (.064074) OF (.003655) OPPOSITE (.119296) CONDUCTIVITY (.295927) TYPE (.031086) HAVING (.017259) TWO (.002373) ADJACENT (.062013) THE (.000021) OUTSIDE (.205567) OF (.002181) TERMINALS (.083311) OF (.022429) A (.000056) LENS

P 4 - order 2


THE COMBINATION COMPRISING (.014613) *. (.016136) THREE (.207449) SPACED (.446929) LAYERS (8) OF (.015540) ALTERNATING (.330508) CURRENT (.003522) FOR (.030914) ENERGIZING (.364697) SAID (.007331) ELECTRODES (.144987) *, (.137385) A (.000184) PICK (24) UP (.041450) DEVICE (.068087) COMPRISES (.394467) A (.011522) TRANSFORMER (.327165) HAVING (.040510) PRIMARY (.417066) AND (.283940) SECONDARY (.105943) WINDING (.135178) *, (.107294) SAID (.000236) CABLE (.045636) BEING (4) LOW (.365417) COMPARED (.308441) WITH (.357148) THE (.000656) PULSE (.029585) WIDTH (.052673) MODULATED (.166702) IMPULSE (.268705) AND (.158340) AN (.174707) OUTPUT (.004120) CONNECTED (.400540) TO (.036422) A (.026695) PREDETERMINED (.016328) VOLTAGE (.121988) LEVEL (.046471) AT (.128125) SAID (.046162) OUTPUT (.006361) PULSE (.002200) CIRCUIT (.083139) FOR (.065272) SAID (.005460) THIRD (.071830) AND (.025740) A (.003284) TRANSISTOR (.000283) MOUNTED (3) ON (.266784) SAID (.004081) STORAGE (.010582) DEVICES (.040569) EACH (.352374) HAVING (.188195) A (.055920) BASE (.001463) WHICH (.013937) CAUSES (.060131) SUBSTANTIAL (2) AND (.176170) INSUBSTANTIAL (1) CURRENTS (1) *, (.032754) RESPECTIVELY (.000741) COUPLED (.445171) TO (.265832) SAID (.001683) NEGATIVE (.174337) RESISTANCE (.010362) (.078056) MEANS (.031880) TO (.002291) DEVELOP (.231345) A (.052040) RESULTANT (.036290) OUTPUT (.276519) PULSE (.004401) OVER (.206014) THE (.019938) LATTER (.012536) SAID (.028071) COLLECTOR (.117766) AND (.133861) EMITTER (.310113) ELECTRODES (.139297) OF

P 1 - order 3


A TRANSISTOR HAVING (.064820) BASE (.013416) AND (.130803) COLLECTOR (.332009) ELECTRODES (.092270) OF (.335359) SAID (.000142) TEST (.072034) POINTS (.113067) WHICH (12) ARE (.040803) OF (.048393) ONE (.055340) POLARITY (.075369) AND (.006730) POTENTIAL (.035125) SOURCE (.003489) IS (.003723) BROKEN (.370737) WHEREBY (2) SAID (.003193) APPLIED (.072995) INPUT (.056302) VOLTAGE (.001128) LEVEL (.075731) *, (.087140) A (.000781) REGENERATIVE (.042548) PULSE (.357223) AMPLIFIER (.058440) *, (.100270) SAID (.000379) GENERATING (.124190) STAGES (.017576) AND (.096059) A (.046425) COLLECTOR (.013489) *. (.065917) AN (.001490) ELECTRICALLY (.091472) CONDUCTIVE (.044644) DEVICE (.012467) INCLUDING (.028322) AT (.498913) LEAST (.220155) ONE (.172702) OF (.382355) SAID (.000138) ADJACENT (.024274) SURFACES (.064611) HAVING (.046095) RESPECTIVE (.048943) CONTROL (.134715) TERMINALS (.193569) *, (.001536) RESPECTIVE (.040263) MEANS (.117265) FOR (.053064) CONNECTING (.186855) THE (.004026) RESPECTIVE (.001114) BODY (.333695) AND (.015208) DIFFERING (.482197) IN (.416697) CONDUCTIVITY (.082230) THEREFROM (12) (.143333) A (.055531) SECOND (.005301) COLLECTOR (.025641) BEING (.309523) CONNECTED (.263976) TO (.150723) THE (.000721) LEVEL (.443901) OF (.009615) AN (.025559) ALTERNATING (.056388) VOLTAGE (.011273) FOR (.085643) SAID (.062143) TRANSISTOR (.005178) DEVICES (.035321) BEING (.107839) CONNECTED (.000547) THERETO (.281252) *, (.008481) WHEREBY (.015654) TO (.020278) CONNECT (.000729) SUCH (1) OTHER (.154599) SIDE (.453102) OF

P 2 - order 3


AN ARRANGEMENT FOR (.007725) ENERGIZING (.364698) SAID (.003665) BRAKING (.166798) WINDING (.431048) MEANS (.002787) INCLUDING (.259642) A (.001162) TWO (.048265) PHASE (.035401) SHIFTING (.034686) CIRCUITS (.144161) PASS (2) THROUGH (.216888) THE (.001268) INDICATORS (5) TO (5) SAID (.000110) BLANKING (.195701) PULSES (.032677) APPLIED (.215003) TO (.266679) SAID (.048035) FIRST (.000140) COMPARISON (.223163) PULSE (.281237) AND (.036999) HAVING (.070125) AN (.141127) INPUT (.022946) PULSE (.000989) RETURNING (2) SAID (.072947) TIMING (.026509) RELAY (.205158) *, (.093682) AND (.134081) MEANS (.235225) FOR (.005920) ESTABLISHING (.020133) IN (.157187) THE (.025488) ABSENCE (.488545) OF (.011513) INPUT (.029550) SIGNAL (.018758) IS (.124053) APPLIED (.307539) TO (.266679) SAID (.028411) BASE (.050510) ELECTRODES (.199688) *, (.027738) AN (.090390) OUTPUT (.030908) ELECTRODE (.089050) AND (.134563) SAID (.000199) CHARGING (.014706) CAPACITOR (.049117) STORAGE (.486954) MEANS (.104847) (.000889) DEPENDENT (.470897) ON (.454809) THE (.003901) RELATIVE (.034006) VALUES (.028468) AND (.069720) POLED (.095000) FOR (.175129) FORWARD (.191190) BIASING (.064297) OF (.160844) THE (.001072) POLARITY (.403967) OF (.243814) SAID (.003263) PLURALITY (.458711) OF (.001917) SPACED (.011551) END (.063731) FACES (.230020) OF (.274519) SAID (.044637) SECOND (.001077) CONDENSER (.011379) BEING (.338753) CONNECTED (.263976) TO (.207296) SAID (.048035) FIRST (.006814) DIODE (.009136) COUPLED (.202250) BETWEEN (.156233) THE

P 3 - order 3


A SUM OUTPUT (.157438) WINDINGS (.064049) (.007780) CIRCUIT (.477755) MEANS (.024176) TO (.016111) SUPPLY (.125266) A (.022361) BIAS (.017515) RESISTOR (.093307) *, (.150278) A (.011318) THIRD (.002823) COLLECTOR (.107305) BEING (.309523) CONNECTED (.088684) IN (.005037) TANDEM (.158823) *, (.235883) EACH (.264992) OF (.434832) SAID (.044637) SECOND (.117425) TRANSISTOR (.012078) MEANS (.080423) *, (.082284) CONNECTING (.004511) AN (.054922) MEANS (.005359) ENERGIZING (.045851) CURRENT (.045011) THROUGH (.297586) SAID (.006480) RECTIFIER (.041142) AND (.101279) A (.002372) CONNECTION (.088175) TO (.189256) SAID (.003954) LINE (.031565) CIRCUIT (.035722) ASSOCIATED (.457214) WITH (.194647) SAID (.001128) CONDENSER (.108129) (.080048) AND (.020345) SAID (.002171) AMPLIFIER (.066665) *, (.007452) COMPRISING (.243347) A (.040696) PAIR (.495586) OF (.060905) TRANSISTORS (.098748) EACH (.428458) HAVING (.188195) A (.000395) CONDUCTION (.115642) PATH (.003703) DURING (.172733) THE (.003961) CLEAR (.445790) AND (.206621) SET (.088412) INPUTS (3) AND (.144838) AN (.001671) INTERNAL (.351289) COMBUSTION (.358805) ENGINE (.158070) HAVING (.215766) A (.021538) PLURALITY (.498796) OF (.000639) SATURATING (.089277) MEANS (.089905) FOR (.002590) DETECTING (.016720) SMALL (.091062) VARIATIONS (8) IN (.001685) MACHINING (5) GAP (.267117) *, (.009638) SO (.452300) THAT (.157793) THE (.002736) CHARGE (.096536) CARRIERS (.002799) UNCONTROLLED (3) BY (3) SAID (.001717) PRIMARY (.352626) WINDING (.118959) AND

P 4 - order 3


THE TRANSISTOR AMPLIFYING CIRCUIT (.289744) INCLUDING (.027889) FIRST (.461988) AND (.492209) SECOND (.002582) LOAD (.083377) IMPEDANCE (.065502) ELEMENTS (.080061) CONNECTED (.310858) IN (.340917) SERIES (.000524) BY (.200078) A (.426984) COMMON (.042488) DIRECT (.425716) CURRENT (.151406) CIRCUIT (.019244) INCLUDING (.084611) SAID (.003890) DIRECT (.459744) CURRENT (.034300) POTENTIAL (.017214) FOR (.125035) APPLYING (.125043) ENERGIZING (.343380) POTENTIALS (.456621) TO (.322239) SAID (.005435) OSCILLOSCOPE (.338572) WHEN (.259121) A (3) SIGNAL (.014032) REPRESENTING (.024568) A (.250072) SET (8) POINT (8) WHICH (8) IS (.118504) CONNECTED (.072649) IN (.171586) SERIES (.235486) WITH (.268948) SAID (.015916) BASE (.313539) ELECTRODE (.142713) *, (.002748) SECOND (.218760) MEANS (.169850) FOR (.013228) DETECTING (.055559) AN (.142928) OUT (8) OF (.235351) FRAME (16) CONDITION (.251087) BETWEEN (8) SAID (.054015) REMOTE (.192394) COUNTING (12) MEANS (.140019) *, (.296767) MEANS (.002162) ELECTRICALLY (.431091) CONNECTING (.190042) SAID (.079096) FIRST (.011563) BASE (.226727) ELECTRODE (.028124) OF (.019263) ONE (.404255) OF (.412150) SAID (.005502) LOAD (.014249) RESISTANCE (.025876) AND (.048421) THAT (.309017) OF (.438581) THE (.001512) EMITTER (.056917) ELECTRODE (.391226) OF (.358780) SAID (.002866) NPN (.448387) TRANSISTOR (.017154) IS (.066199) SUPPLIED (.194080) AN (1) OPERATING (1) VOLTAGE (.070349) MAY (9) BE (.375140) APPLIED (.207839) TO (.158291) THE

P 1 - order 4


A SHIFT REGISTER AS (.260908) IN (29) CLAIM (.132427) 1 (.187817) WHEREIN (.106654) THE (.006024) RATIO (.433379) OF (.008306) LESS (4) THAN (.106093) ABOUT (.012924) 1 (.263361) *. (.409170) 4 (.447143) AND (8) 3 (8) *. (.138294) HAVE (3) A (3) COMMON (.383481) ZONE (25) OF (.069507) SAID (.041192) ONE (.116129) TRANSISTOR (.095666) *, (.098156) A (.102398) SECOND (.002119) UNIDIRECTIONAL (.068204) DEVICE (.109432) CONNECTED (.118043) BETWEEN (.348786) SAID (.017238) SOURCE (.153570) OF (.013972) CURRENT (.071729) HAVING (.323645) A (.011374) VALUE (.011952) IN (.418668) EXCESS (7) OF (.132309) A (.058162) CRITICAL (.453320) VALUE (.052633) COMPRISING (4) A (.026336) POINT (.311010) CONTACT (.264997) TRANSISTOR (.030511) OSCILLATOR (7) INCLUDING (.446699) A (.082834) FIRST (.048899) AND (.351058) A (.496057) SECOND (.007365) STABLE (.351285) STATE (.019420) WHEREIN (.115554) A (.251850) VOLTAGE (.250182) APPEARS (.066752) 18 (2) DEGREES (2) OUT (8) OF (107) PHASE (.010751) RELATION (.193607) *, (.353169) AND (.267422) MEANS (.231808) FOR (.000633) SUFFICIENTLY (.252047) INCREASING (4) SAID (4) THIRD (4) POTENTIAL (.032423) DIFFERENCE (.186530) ACROSS (.413266) SAID (.066176) FIRST (.048265) RESISTOR (.128857) *, (.074583) THE (.004201) VALUE (.475024) OF (.004019) AN (.318157) INDEPENDENT (.257379) VARIABLE (.375717) INPUT (16) VOLTAGE (.312682) *. (.016331) AND

P 2 - order 4


A MONITORING DEVICE AS (12) DEFINED (99) IN (.498755) CLAIM (.007447) 15 (.020560) WHICH (.253759) INCLUDES (2) MEANS (.342310) FOR (.072949) APPLYING (.074315) SAID (.007720) SIGNALS (.333712) TO (.128962) SAID (.028846) BASE (.034906) *, (.125000) A (.166977) COLLECTOR (.001435) POTENTIAL (.033793) OF (.315343) SAID (.003624) DIODES (.034243) FOR (.037957) CAUSING (.116765) SUFFICIENT (1) ADDITIONAL (1) CURRENT (1) TO (.255092) FLOW (.049279) FROM (.375099) SAID (.030239) EMITTER (.228033) ELECTRODE (.123910) *, (.250970) A (.001732) BREAKDOWN (.286270) DEVICE (.431097) HAVING (14) TWO (.300122) TERMINALS (.362001) (.006102) FIRST (.073945) MEANS (.008405) ADAPTED (.229745) TO (.050134) CLAMP (.254527) SAID (.020527) SECOND (2) BASE (.266478) ELECTRODE (.206054) *, (.047381) MEANS (.035893) INCLUDING (.087286) SAID (.002890) METER (3) AND (.176880) THE (.106401) COMMON (.075022) ELECTRODE (.356677) OF (.120968) EACH (.016038) STAGE (.050011) HAVING (.320557) A (.087110) FIRST (.021049) INPUT (.111294) CIRCUIT (.050258) CONNECTED (.192999) BETWEEN (.305076) SAID (.003112) CAPACITOR (.461166) AND (.194789) SAID (.002583) OUTPUT (.004621) DIODE (.127307) *, (.338108) SAID (.010362) FEEDBACK (.170694) MEANS (.047612) COMPRISES (.424587) A (.032185) TRANSISTOR (.049701) COMPRISING (.399472) A (.099688) BODY (.018957) HAVING (.397377) TWO (.015658) INCLINED (3) SURFACES (3) (.254385) A (.087409) FIRST (.006687) LOAD (.077875) SWITCHING (.474706) MEANS

P 3 - order 4


AN ELECTRICAL SYSTEM *, (.331786) A (.022513) FIRST (.000826) METALLIC (10) LAYER (.401230) FORMING (12) A (.314615) PLURALITY (55) OF (.009315) CONTROL (.051028) CIRCUITS (.034352) INCLUDING (.111126) A (.003668) TUNNEL (.376080) DIODE (.050876) DEVICE (.097396) IN (.130542) SERIES (.352961) WITH (.025837) EACH (.296880) OTHER (.019390) BETWEEN (.455102) SAID (.050436) PAIR (.489311) OF (.014698) DEVICES (.006045) CORRESPONDING (1) TO (.376378) THE (.014633) SIGNAL (.079402) INPUT (.108706) CIRCUIT (.033993) *, (.124166) A (.045283) FIRST (.005862) SEMI (.370276) CONDUCTOR (.246875) DEVICE (.025881) AND (.289628) THE (.006410) RELEASE (4) OF (.105751) A (4) CHARGE (.016054) EQUIVALENT (1) TO (1) THE (.194032) VECTOR (.475712) SUM (25) OF (.040003) VOLTAGES (2) FROM (.349724) SAID (.113172) SOURCES (.132125) TO (.170556) SAID (.024381) OUTER (.313598) ZONES (.239782) *, (.014714) EITHER (1) IN (1) THE (1) REVERSE (.481355) DIRECTION (.049941) WITH (.454972) RESPECT (.494488) TO (.144358) THE (.007110) CONTROL (.011177) WINDINGS (.134268) IN (.040844) EACH (.170969) MAGNETIC (4) AMPLIFIER (.094163) TO (.071943) RESET (4) THE (.045906) OTHER (3) CORE (.028962) WHICH (2) ARE (2) OF (.015192) GREATER (.204439) DURATION (1) THAN (1) THE (.116645) THERMAL (3) TIME (3) CONSTANT (.316402) OF (.317581) SAID (.096960) RESISTOR (.023002) CONNECTED (.044163) ACROSS (.005382) AT

P 4 - order 4


IN AN APPARATUS OF THE (.315212) CHARACTER (.482924) DESCRIBED (.280583) *, (.256435) A (.137546) PAIR (11) OF (.006244) OPPOSITELY (.378185) POLED (.045035) FRONT (9) TO (9) FRONT (9) DIODES (9) CONNECTED (9) IN (9) SERIES (.046748) ACROSS (.076602) THE (.023264) AMPLIFIER (2) OUTPUT CIRCUIT (.127706) TO (.313876) SAID (.229739) INPUT (.448346) (.060630) CIRCUITS (.372095) *, (.066265) A (.190620) SOURCE (18) OF (.021000) UNIDIRECTIONAL (.151714) POTENTIAL (.030554) AND (.063271) THE (1) CATHODE (.366177) OF (.315822) SAID (.001974) VALVE (4) AND (.385274) A (.111487) GERMANIUM (1) JUNCTION (1) TYPE (1) RECTIFIER (1) CONNECTED (1) DIRECTLY (1) BETWEEN (.428810) SAID (.077777) COLLECTOR (.358840) AND (.264753) BASE (.000575) CIRCUITS (.269236) OF (2) DIAMETRICALLY (?) OPPOSITE (.301614) ODD (1) STAGES (8) AND (8) SIMILAR (8) MEANS (8) INTERCONNECTING (.440998) DIAMETRICALLY (7) OPPOSITE (.251016) EVEN (7) STAGES (.317428) AS (5) RESPECTIVE (5) COMPLEMENTARY (5) SETS (.212550) ARRANGED (2) FOR (2) SWITCHING (2) ONE (3) STAGE (3) OF (3) A (.117653) SET (4) TO (4) ONE (4) STABLE (.135626) CONDUCTION (1) STATE (.251662) TO (.213431) ANOTHER (.404195) UPON (.447324) APPLICATION (8) OF (8) AN (.375202) INPUT (.025092) IMPULSE (2) OF (2) PREDETERMINED (2) POLARITY (2) THERETO (.263601) TO

P 1 - order 5


A SUBSCRIBER SET IN ACCORDANCE (14) WITH (.095417) A (.004574) SENSED (2) PHENOMENA (.277143) AND (1) CONNECTED (1) IN (1) SERIES (.138202) BETWEEN (.305947) SAID (.002472) UNILATERAL (4) CONDUCTING (.251038) ELEMENT (.045841) CONNECTED (.454628) WITH (.203090) THE (.287562) COLLECTOR (.229199) ELECTRODE (.425826) OF (.043224) EACH (.233511) TRANSISTOR (.013451) IS (.034681) MATCHED (1) TO (4) THE (.253351) TURN (3) ON (3) CHARACTERISTIC (3) OF (3) THE (3) OTHER (.136338) OF (.470120) SAID (.023938) OUTPUT (.003472) LEAD (.1) (.152344) THE (.295042) BASE (4) ELEMENT (.234831) OF (.054603) ONE (6) OF (.420696) SAID (.009030) DIODES (.039112) FOR (.062550) BIASING (.262584) THE (3) FIRST (.048540) SEMICONDUCTOR (2) TO (.068222) RECEIVE (1) A (1) CHARGE (.135251) UPON (1) A (1) FLOW (1) OF (.441078) CURRENT (.031262) BETWEEN (.097261) THE (.030886) FIRST (1) TWO (1) ELECTRODES (.130901) AND (.250057) SAID (.048410) THIRD (.255979) ELECTRODE (.188075) *, (.094622) A (.033558) CHARGE (2) STORAGE (.393688) CIRCUIT (7) COMPRISING (.454982) A (.017499) PLURALITY (272) OF (.012309) CASCADED (.300143) STAGES (.400352) (.466007) EACH (.088592) COMPRISING (.456712) A (.032130) SEMICONDUCTOR (.386444) *, (.339071) A (.206362) FIRST (.464511) OF (.486498) SAID (.003809) TRIGGER (.378075) ELEMENTS (.113086) WHEREBY (2) STRAIGHT

P 2 - order 5


APPARATUS AS DEFINED IN CLAIM (.015967) 10 (.212508) WHEREIN (.322117) SAID (.014997) ONE (2) ELECTRODE (.179030) IS (.282329) THE (5) EMITTER (.363360) AND (5) SAID (.429130) OTHER (.048237) LINE (.286789) CONDUCTOR (.185613) AND (.106455) BETWEEN (1) SAID (1) PHASE (1) INVERTING (2) CIRCUIT (2) AND (2) SAID (.316226) COMBINING (1) CIRCUIT (.309545) FOR (.338144) REVERSING (2) THE (.256602) CURRENT (4) FLOW (4) OVER (4) SAID (.252383) LINE (.403627) CONDUCTORS (5) (.102683) INCLUDING (1) MEANS (1) FOR (.107829) APPLYING (.259469) A (.052380) SIGNAL (.253549) TO (.146186) THE (.142857) BASE (.254906) OF (.309590) SAID (.141491) SECOND (.434783) TRANSISTOR (.078677) AND (.128865) SAID (.018185) ONE (.084854) OUTPUT (3) TERMINAL (.163240) AND (.285832) THE (.112203) OTHER (.043579) SIDE (.037827) CONNECTED (5) TO (.193128) A (.083831) POINT (.305263) OF (.220589) REFERENCE (.499203) POTENTIAL (.056882) AND (.009811) AN (.251344) ANODE (3) CONNECTED (9) TO (.317128) SAID (.021874) FIXED (.250620) DIRECT (13) CURRENT (13) POTENTIAL (.463279) *, (.010004) EACH (3) OF (16) SAID (.022289) STAGES (.026490) TO (.011636) PROVIDE (2) A (.154135) COPHASAL (1) VOLTAGE (1) *, (1) SECOND (1) IMPEDANCE (4) MEANS (.074073) DIRECTLY (3) CONNECTING (.430664) SAID (.030019) ANODE (.343254) AND (7) SAID

P 3 - order 5


A SEMICONDUCTIVE DIODE *, A (1) SOURCE (55) OF (.004533) SUBSTANTIALLY (.096217) RECTANGULAR PULSES (.086484) HAVING (2) AN (2) AMPLITUDE (.401944) (.252346) SUFFICIENT (10) TO (.290822) DRIVE (.274733) SAID (.126669) FIRST (.079581) CORE (.436110) TOWARD (.222445) POSITIVE (3) SATURATION (.259831) AND (3) SAID (4) SECOND (.392189) CORE (.215413) TOWARD (.252189) POSITIVE (3) SATURATION (.253716) AND (3) SAID (4) SECOND (.390335) CORE (.107371) IS (.250798) SATURATED (6) IN (.334518) SAID (.342890) FIRST (17) DIRECTION (.093242) (.042372) THE (2) TIME (2) DURATION (2) OF (.184293) THE (.119962) NEGATIVE (.341420) OUTPUT (.4066?2) VOLTAGE (.267670) OF (2) SAID (.009451) ALTERNATOR (3) AT (3) A (3) PREDETERMINED (.389480) SUBSTANTIALLY (3) CONSTANT (.308620) VALUE (3) (.191952) AND (3) MEANS (.043111) INCLUDING (.241212) A (.027959) RESISTOR (.056619) FOR (.200429) APPLYING (.450612) A (.086403) CUTOFF (.207662) TRIGGER (1) SIGNAL (1) TO (1) SAID (.053149) FIRST (.113447) INPUT (.204834) TERMINAL (.149999) TO (.232896) SAID (.156626) SECOND (.021515) DIFFERENTIAL (8) AMPLIFIER (.281401) INPUT (.280382) (.280377) A (.087573) FIRST (.071477) CAPACITOR (.073248) *, (.291873) A (.011831) SIGNAL (.091597) INDUCTOR (2) FOR (2) PROVIDING (2) A (.216701) LOW (.448506) IMPEDANCE (.226644) PATH (.120725) BETWEEN (.339317) SAID (.043924) INPUT (.056168) ELECTRODE (53)

P 4 - order 5

BIBLIOGRAPHY

Aborn, M., J. Rubenstein, and T.D. Sterling,
1959 "Sources of contextual constraint upon words in sentences", Journal of Experimental Psychology, 57, 171-180.

Braithwaite, R.B.,
1960 "Models in the empirical sciences", in Nagel, Suppes, and Tarski (1962), 224-231.

Brenner, M.S., S. Feldstein, and J. Jaffe,
1965 "The contribution of statistical uncertainty and test anxiety to speech disruption", Journal of Verbal Learning and Verbal Behavior, 4, 300-305.

Chao, Y.R.,
1960 "Models in linguistics and models in general", in Nagel, Suppes, and Tarski (1962), 558-566.

Cherry, C.,
1957 On Human Communication (New York: John Wiley & Sons, Inc.).

Cherry, C. (ed.),
1961 Information Theory (Washington: Butterworths).

Chomsky, N.,
1956 "Three models for the description of language", I.R.E. Transactions on Information Theory, IT-2, 113-124.
1957 Syntactic Structures (The Hague: Mouton & Co.).
1958 "A transformational approach to syntax", in Hill (1962), 124-169.
1959a "On certain formal properties of grammars", Information and Control, 2, 137-167.
1959b Review of Greenberg (1957), Word, 15, 202-218.
1960 "Explanatory models in linguistics", in Nagel, Suppes, and Tarski (1962), 528-550.
1961a "On the notion 'Rule of Grammar'", in Jakobson (1961), 6-24.
1961b "Some methodological remarks on generative grammar", Word, 17, 219-239. Reprinted in H.B. Allen, Readings in Applied English Linguistics, 173-192 (New York: Appleton-Century-Crofts).
1962 "The logical basis of linguistic theory", in Lunt (ed.), Proceedings of the Ninth International Congress of Linguists (The Hague: Mouton & Co., 1964), 914-918.
1963 "Formal properties of grammars", in R.D. Luce, R. Bush, and E. Galanter (eds.), Handbook of Mathematical Psychology, II, 323-418 (New York: John Wiley & Sons, Inc.).
1965 Aspects of the Theory of Syntax (Cambridge: M.I.T. Press).

Chomsky, N., and G.A. Miller,
1958 "Finite state languages", Information and Control, 1, 91-112.
1965 "Introduction to the formal analysis of natural languages", in R.D. Luce, R. Bush, and E. Galanter (eds.), Handbook of Mathematical Psychology, II, 269-322 (New York: John Wiley & Sons, Inc.).

Coleman, E.B.,
1963 "Approximations to English: some comments on the method", American Journal of Psychology, 76, 239-247.

Epstein, W.,
1961 "The influence of syntactical structure on learning", American Journal of Psychology, 74, 80-85.
1962 "A further study of the influence of syntactical structure on learning", American Journal of Psychology, 75, 121-126.

Fillenbaum, S., L.V. Jones, and A. Rapoport,
1963 "The predictability of words and their grammatical classes as a function of rate of deletion from a speech transcript", Journal of Verbal Learning and Verbal Behavior, 2, 186-194.

Fries, C.C.,
1952 The Structure of English (New York: Harcourt, Brace and Company).

Fromkin, V.,
1968 "Speculations on performance models", Journal of Linguistics, 4, 47-68.

Goldman-Eisler, F.,
1958a "Speech production and the predictability of words in context", Quarterly Journal of Experimental Psychology, 10, 96-109.
1958b "The predictability of words in context and the length of pauses in speech", Language and Speech, 1, 226-231.

Good, I.J.,
1953 "The population frequencies of species and the estimation of population parameters", Biometrika, 40, 237-264.

Good, I.J., and C.H. Toulmin,
1956 "The number of new species and the increase in population coverage when a sample is increased", Biometrika, 43, 45-63.

Gough, P.B.,
1965 "Grammatical transformations and speed of understanding", Journal of Verbal Learning and Verbal Behavior, 4, 107-111.

Green, B.F., F.E.K. Smith, and L. Klem,
1959 "An empirical test of an additive random number generator", Journal of the Association for Computing Machinery, 6, 527-537.

Greenberg, J.H.,
1957 Essays in Linguistics (Chicago: University of Chicago Press).

Harris, Z.S.,
1965 "Transformational theory", Language, 41, 363-401.

Hill, A.A., 1961 "Grammaticality", Word, 17, 1-10. Reprinted in H.B. Allen, Readings in Applied English Linguistics, 163-172 (New York: Appleton-Century-Crofts).
Hill, A.A. (ed.), 1962 Third Texas Conference on Problems of Linguistic Analysis in English (Austin, Texas: University of Texas).
Hockett, C.F., 1953 Review of Shannon and Weaver (1949), Language, 29, 69-93.
  1954 "Two models of grammatical description", Word, 10, 210-234.
  1955 A Manual of Phonology, Memoir 11 of the International Journal of American Linguistics (Baltimore, Md.: Indiana University Publications in Anthropology and Linguistics).
  1958 "Idiom formation", in M. Halle, H. Lunt, and H. MacLean (eds.), For Roman Jakobson, 222-229 (The Hague: Mouton & Co.).
  1961a "Linguistic elements and their relations", Language, 37, 29-53.
  1961b "Grammar for the hearer", in Jakobson (1961), 220-236.
Jaffe, J., L. Cassotta, and S. Feldstein, 1964 "Markovian model of time patterns of speech", Science, 144, 884-886.
Jakobson, R. (ed.), 1961 Structure of Language and its Mathematical Aspects, Proceedings of the Twelfth Symposium in Applied Mathematics (Providence, R.I.: American Mathematical Society).
Katz, J.J., and P. Postal, 1964 An Integrated Theory of Linguistic Descriptions (Cambridge, Mass.: M.I.T. Press).
Kemeny, J.G., 1959 A Philosopher Looks at Science (Princeton, N.J.: Van Nostrand).
Kuno, S., 1965 The Multiple Path Syntactic Analyzer for English (= Report No. NSF-9, 1 and 2) (Cambridge: Harvard University).
Lane, H., and B. Schneider, 1963 "Some discriminative properties of syntactic structures", Journal of Verbal Learning and Verbal Behavior, 2, 457-461.
Liberman, P., 1963 "Some effects of semantic and grammatical context on the production and perception of speech", Language and Speech, 6, 172-187.
Longacre, R., 1964 Grammar Discovery Procedures (The Hague: Mouton & Co.).
Lounsbury, F.G., 1954 "Transitional probability, linguistic structure, and systems of habit-family hierarchies", in C.E. Osgood and T.A. Sebeok (eds.), Psycholinguistics (1965) (Bloomington, Ind.: Indiana University Press), 93-101.
Maclay, H., and C.E. Osgood, 1959 "Hesitation phenomena in spontaneous English speech", Word, 15, 19-44.

Maclay, H., and M.D. Sleator, 1960 "Responses to language: Judgements of grammaticalness", International Journal of American Linguistics, 26, 275-282.
Madhu, S., and D.W. Lytle, 1965 "A Markov process to the resolution of non-grammatical ambiguity in mechanical translation", IFIP Congress 1965 Proceedings, 2 (in press).
Mandler, G., and J.M. Mandler, 1964 "Serial position effects in sentences", Journal of Verbal Learning and Verbal Behavior, 3, 195-202.
Margenau, H., 1960 "Is the mathematical explanation of physical data unique?", in Nagel, Suppes, and Tarski (1962), 348-355.
Marks, L.E., 1967 "Judgements of grammaticalness of some English sentences and semi-sentences", American Journal of Psychology, 80, 196-204.
Mehler, J., 1963 "Some effects of grammatical transformations on the recall of English sentences", Journal of Verbal Learning and Verbal Behavior, 2, 346-351.
Michels, W.C., 1961 The International Dictionary of Physics and Electronics, 2nd ed. (Princeton, N.J.: Van Nostrand).
Miller, G.A., 1950 "Language engineering", Journal of the Acoustical Society of America, 22, 720-725.
  1962a "Decision units in the perception of speech", I.R.E. Transactions on Information Theory, 8, 81-83.
  1962b "Some psychological studies of grammar", American Psychologist, 17, 748-762.
Miller, G.A., and N. Chomsky, 1963 "Finitary models of language users", in R.D. Luce, R. Bush, and E. Galanter (eds.), Handbook of Mathematical Psychology, II, 419-492.
Miller, G.A., and S. Isard, 1963 "Some perceptual consequences of linguistic rules", Journal of Verbal Learning and Verbal Behavior, 2, 217-228.
  1964 "Free recall of self-embedded English sentences", Information and Control, 7, 292-303.
Miller, G.A., and J.A. Selfridge, 1950 "Verbal context and the recall of meaningful material", American Journal of Psychology, 63, 178-185.
Morton, J., 1964 "A model for continuous language behavior", Language and Speech, 7, 40-70.
Nagel, E., 1961 The Structure of Science (New York: Harcourt, Brace & Co.).

Nagel, E., P. Suppes, and A. Tarski, 1962 Logic, Methodology and the Philosophy of Science (Stanford, Calif.: Stanford University Press).
Osgood, C.E., and T.A. Sebeok, 1954 Psycholinguistics (Reprinted, 1965, Bloomington, Ind.: Indiana University Press).
Pollack, I., 1964 "Message probability and message reception", Journal of the Acoustical Society of America, 36, 937-945.
Putnam, H., 1961 "Some issues in the theory of grammar", in Jakobson (1961), 25-42.
Robinson, J., and S. Marks, 1965 Parse: A System for Automatic Syntactic Analysis of English Text (= RAND Memo RM-4654-PR).
Salzinger, K., S. Portnoy, and R.S. Feldman, 1962 "The effect of order of approximation to the statistical structure of English on the emission of verbal responses", Journal of Experimental Psychology, 64, 52-57.
Savin, H.B., 1963 "Word frequency effect and errors in the perception of speech", Journal of the Acoustical Society of America, 35, 200-206.
Shannon, C.E., and W. Weaver, 1949 The Mathematical Theory of Communication (Urbana, Ill.: University of Illinois Press).
Siegel, S., 1956 Nonparametric Statistics for the Behavioral Sciences (New York: McGraw Hill).
Somers, H.H., 1961 "The measurement of grammatical constraints", Language and Speech, 4, 150-156.
Stolz, W., 1965 "A probabilistic procedure for grouping words into phrases", Language and Speech, 8, 219-235.
Stowe, A.N., and W.P. Harris, 1963 "Signal and context components of word recognition behavior", Journal of the Acoustical Society of America, 35, 639-644.
Tannenbaum, P.H., F. Williams, and C.S. Hillier, 1965 "Word predictability in the environment of hesitations", Journal of Verbal Learning and Verbal Behavior, 4, 134-140.
Taylor, W.L., 1953 "'Cloze procedure': a new tool for measuring readability", Journalism Quarterly, 30, 415-433.
Treisman, A.M., 1965 "Verbal responses and contextual constraints in language", Journal of Verbal Learning and Verbal Behavior, 4, 118-128.
Wade, W., 1957 Patents for Technical Personnel, 2nd ed. (Devon: Advance House).

Wells, R., 1954 "Meaning and use", Word, 10, 235-250.
  1961 "A measure of subjective information", in Jakobson (1961), 237-244.
Yngve, V., 1962 "Computer programs for translations", Scientific American, 206, 68-76.

INDEX

acceptability 28, 30
adjective structure 64
adverb structure 64
boredom factor 91
Braithwaite, R. B. 16
Brenner, M. S. 49
capitalization 69
card punching 69
Cassotta, L. 54
Chao, Y. R. 16
Cherry, C. 19
Chomsky, N. 17-19, 22-39, 44, 47-48, 51, 53, 57, 96, 108, 142
claims 62 ff., 103
Cloze procedure 42, 45, 48
code number 76, 78, 85-87
Coleman, E. B. 46, 47, 97
communication theory 39
competence 22-24, 28-30, 44, 48
complement 62, 64
complexity 46, 63, 98 ff., 141
"comprising" 68
computer time 87, 89
"consisting of" 68
constituent structure 63 ff.
corpus 62 ff.
data preparation 70-86
dependency length 103, 141
descriptive adequacy 22, 25-26, 29-30, 33-34, 39
determiner 65
disc file 75-76, 85

Epstein, W. 49
evaluation 90-94, 139
experiment 15, 19, 21, 40 ff., 57, 61 ff.
experimental law 15-16, 140, 142
explanatory adequacy 22, 25, 29, 38
Feldman, R. S. 45
Feldstein, S. 49, 54
Fillenbaum, S. 48
finite state model (see also Markov Model) 17-22, 32, 40, 138-140
Fries, C. C. 43
Fromkin, V. 29
generated strings 97-111
generation models 139
Goldman-Eisler, F. 41-43
Good, I. J. 111
Gough, P. B. 52-53
grammaticalness 18, 22, 27-30, 34-36, 46, 57, 90-93, 96-98, 110-111, 138, 141
grammaticalness ratio 58, 96, 109, 140
Green, B. F. 86
Harris, W. P. 47
Harris, Z. S. 20
hesitation 38, 40-43, 58
Hill, A. A. 38, 93, 140
Hillier, C. S. 41-43
Hockett, C. F. 16-17, 19, 37, 59, 108
hyphen 70

IBM 1401 70
IBM 7044 70, 76
informant 21
information theory 19, 40, 54
intelligibility 47, 48
intuition 26, 30
Isard, S. 50-51
Jaffe, J. 54
Jones, L. V. 48
Katz, J. J. 53
Kemeny, J. G. 16
Klem, L. 86
Kolmogorov-Smirnov test 108
Kuno, S. 18
Lane, H. 52
learning effect 91
lexical analysis 59
Liberman, P. 48
lookup time 77
Lounsbury, F. G. 41, 43
Lytle, D. W. 139
machine translation 58, 139
Maclay, H. 41-44, 92-93, 140
Madhu, S. 139
magnetic representation 69
main storage 77
Mandler, G. 52
Margenau, H. 26
Markov Model 17, 19, 30-31, 33, 36-40, 44-45, 48-50, 57-59, 107-108
Markov source 97, 99
Marks, L. E. 28
"means" 68
mechano-linguistics 20
Mehler, J. 52-53
memory 29, 36
memory limitations 24
Miller, G. A. 17, 19-20, 27, 29, 44-46, 50-52, 59, 96
multivariate normal distribution 110
Nagel, E. 15
nodes, number of 99, 103, 141
normalization 76

noun modifiers 103
noun phrase 64-65
noun premodifiers 107
observational adequacy 22, 25, 29-30, 33
order of approximation 31-33, 44-47, 58, 71, 86-89, 96-98, 103, 109
Osgood, C. E. 19, 41-44
overlap 91
packing 85
panel 90-93, 140
parameters 59
participle 65
patents 62-63, 66, 99, 139
path length 103
perceptual experiments 40
performance models 22, 24-26, 28, 30, 36-38, 44, 48, 57, 142
phrase structure model 22, 25, 63
Pollack, I. 48
Portnoy, S. 45
post modifier 65, 105
Postal, P. 53
premodifiers 64, 68, 105
printing 89
probabilistic models 18, 20, 40
probability 17
programming time 89
psycholinguistics 20
punched cards 69, 70
punctuation 76, 107, 146
Putnam, H. 16
random numbers 86
Rapoport, A. 48
redundancy 76
relative frequency 68
retrieval system 70
review procedure 91
right branching 99
Robinson, J. 18
"said" 65, 90
Salzinger, K. 45-46
Sapir, E. 93
Savin, H. B. 47-48

Schneider, B. 52
search 85, 87
Sebeok, T. A. 19
self-embedding 34-36, 50-51, 57
Selfridge, J. A. 20, 44, 45, 46
semantics 92, 98, 146
sentence 17, 22, 27-28, 30
set theory 16
Shannon, C. E. 19, 42, 96
Siegel, S. 108
Sleator, M. D. 92-93, 140
Smith, F. E. K. 86
Somers, H. H. 54
sort 76, 77, 89
statistical validity 110
Stolz, W. 20
storage 85
Stowe, A. N. 47
string generation 71, 85-89
string length 91
structural description 30, 39
structural models 18, 20, 58
systematic bias 91-92

Tannenbaum, P. H. 41-43
Taylor, W. L. 42, 45
theory 15
topology 141
Toulmin, C. H. 111
transformation 52-53
transformational model 22
transformational theory 40, 41, 55, 138
transformational grammars 63
transition probability 42-43, 50, 54, 59-61, 139
tree structure 99, 141
Treisman, A. M. 45-46
verb structure 64
Wade, W. 62, 68
Weaver, W. 19
Wells, R. 19, 35, 39
Williams, F. 41-43
word list 76
Yngve, V. 17