English Pages 293 [296] Year 1982
SLIPS OF THE TONGUE AND LANGUAGE PRODUCTION
EDITED BY
ANNE CUTLER
MOUTON PUBLISHERS
BERLIN · NEW YORK · AMSTERDAM
ISSN 0024-3949 (Journal edition) ISBN 90 279 3120 8 (Paperback edition) Both editions © 1982, Walter de Gruyter/Mouton Publishers, Amsterdam. Printed in The Netherlands
Contents
PREFACE
ANNE CUTLER Guest editorial: The reliability of speech error data
MANFRED BIERWISCH Linguistics and language error
BRIAN BUTTERWORTH Speech errors: old data in search of new theories
ANDREW CROMPTON Syllables and segments in speech production
DAVID FAY Substitutions and splices: a study of sentence blends
JEAN AITCHISON and MIRON STRAF Lexical storage and retrieval: a developing skill?
PAUL MEARA and ANDREW W. ELLIS The psychological reality of deep and surface phonological representations: evidence from speech errors in Welsh
ALAN GARNHAM, RICHARD C. SHILLCOCK, GORDON D. A. BROWN, ANDREW I. D. MILL and ANNE CUTLER Slips of the tongue in the London-Lund corpus of spontaneous conversation
Review article CAROL A. FOWLER Errors in linguistic performance: slips of the tongue, ear, pen and hand, edited by V. Fromkin
NAME INDEX
SUBJECT INDEX
Preface
This collection is the first in an occasional series of Special Issues of Linguistics. Each Special Issue will be devoted to a single theme, and edited by a Guest Editor who will be able to invite contributions, and also to select from papers submitted in response to an announcement in the journal. Special Issues will be published concurrently as an issue in the regular series (in this case 7/8 of Volume 19), and also as a book available to non-subscribers through bookshops in the usual way.
Speech errors seemed to us a particularly appropriate theme in that it reflects the interdisciplinary character of the journal. Notable contributions to the study of speech errors have been made by philologists, phoneticians and phonologists as well as by psychologists of various tendencies. So we are particularly fortunate in having as our first Guest Editor Dr. Anne Cutler, who is herself a scholar of German, a linguist and a psychologist, and who has recently been described as 'the pure example of a psycholinguist'. As many readers will know, Dr. Cutler has made a distinguished contribution to the study of speech errors, and with Prof. David Fay introduced a new edition of Meringer and Mayer's Versprechen und Verlesen.
This collection contains, besides important new papers, a seminal article by Prof. Manfred Bierwisch, first published as 'Fehler-Linguistik' in Linguistic Inquiry 1, 1970, 397-414, here translated into English for the first time, with a postscript by Prof. Bierwisch. It also contains the first corpus of errors based entirely on tape-recorded material, collected and transcribed under the direction of Prof. Jan Svartvik, and analysed by Dr. Cutler and her colleagues at the University of Sussex.
BOARD OF EDITORS
Guest editorial The reliability of speech error data*
ANNE CUTLER
1. Introduction
Collecting speech errors is enjoyable. For instance, it can give the collector the feeling of doing some useful work while on holiday, at a dinner party, or watching a television interview. And speech error collections are valuable: in the last decade research based on slips of the tongue has provided one of the major components of a long-overdue upsurge in interest in speech production processes, which have otherwise been accorded much less research attention than the processes of comprehension.
The problems associated with speech error research are well known to all in the field. Listening for errors tends to distract the listener's attention from the content of speech, analogously to the way that monitoring for sentence-internal targets reduces the amount of content understood (Johnson, 1980), and to the common experience that we take in little of a text's content when we are proof-reading it. Conversely, of course, attention to content reduces both proof-reading efficiency (Smith and Groat, 1979) and the speed with which sentence-internal targets are detected (Green, 1977), and without doubt it also reduces the percentage of slips of the tongue detected and recorded by the error collector. Thus no collector claims to have recorded all slips occurring in a given period of time or a given number of utterances.
If selective attention were the only problem, it could eventually be overcome by a combination of high-fidelity recording and painstaking, multiply-checked transcription. But there is the further problem that some kinds of errors are simply harder to hear than others. Every existing collection of speech errors confounds occurrence of particular types of error with detectability. In fact, it is possible that the detectability problem is so serious that even the most careful transcription of speech will be likely to miss some slips.
Another source of bias in speech error collections is the distributional
characteristics of language. Thus it is not of theoretical interest that errors are reported more often in words of one word class than in words of another if the first class also occurs more often than the second in normal speech. Error collectors are now taking linguistic distributional patterns into account (e.g. Dell and Reich, 1981). In this paper, however, I shall concentrate on the detectability problem. In the following sections I outline the various types of argument which can be made on the basis of speech error evidence, and then summarise evidence which pertains to the question of relative detectability of errors. It will be seen that the detectability problem is by no means an insuperable bar to speech error research; it is possible to identify the confounding factors and control for them when error data are interpreted. Moreover, certain types of speech error argument are completely safe to make, as they are not subject to detectability confounding at all.
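The point about distributional bias can be made concrete with a small calculation. The following Python sketch divides each word class's share of the reported errors by its share of normal speech; a ratio near 1 would mean that the class turns up in the error collection about as often as its frequency alone predicts. The function name `relative_error_rate` and all of the counts are hypothetical, invented purely for illustration, and are not taken from any error collection.

```python
def relative_error_rate(error_counts, baseline_counts):
    """Ratio of each class's share of errors to its share of normal speech.

    Both arguments map a class label to a raw count. A ratio of 1.0 means
    the class appears in errors exactly as often as its base rate predicts.
    """
    total_errors = sum(error_counts.values())
    total_baseline = sum(baseline_counts.values())
    ratios = {}
    for label, n_errors in error_counts.items():
        error_share = n_errors / total_errors
        baseline_share = baseline_counts[label] / total_baseline
        ratios[label] = error_share / baseline_share
    return ratios


# Hypothetical counts, invented purely for illustration.
errors = {"noun": 60, "verb": 40}
speech = {"noun": 600, "verb": 400}
ratios = relative_error_rate(errors, speech)
```

In this invented case nouns account for more errors than verbs in absolute terms, yet both ratios are 1, so the difference in raw counts would carry no theoretical interest.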
2. Types of speech error argument
Speech error data have been used in support of linguistic and psycholinguistic arguments in three basic ways. The first distinct type of argument simply interprets the characteristics of errors which have been reported: 'some errors are like this, therefore...'. The second type is concerned with the relative frequency of occurrence of particular types of error: 'more errors are like this than like that, therefore...'. The third type is based on kinds of error which don't occur: 'no errors are like this, therefore...'. This three-way classification does not actually reflect a difference in kind between the three categories. For instance, a claim that particular errors don't occur amounts to a claim that such errors have a zero frequency of occurrence; that is to say, the third type of argument is a special case of the second type. Furthermore, all interpretations of speech error data in terms of rule-governed processes constitute an argument against the ultimate null hypothesis that speech errors occur randomly; 'Some Errors' arguments are thus also a kind of 'More Errors' argument inasmuch as they claim that random rubbish occurs with insignificant frequency. Thus the categories are by no means logically exclusive. But the three-way division does correspond to a difference in emphasis, as will be seen from the examples of each type of argument given below.
2.1 'Some Errors' arguments
A typical 'Some Errors' argument, i.e. one concerned with the characterisation of occurring errors, is the argument that movement errors exhibit morphological accommodation to their environment (e.g. Garrett, 1976). In (1), for example, two words have exchanged places so that each bears the inflection originally intended for the other, yet the particular form each inflection takes is appropriate for the erroneously inflected word rather than for the originally intended word.
(1) We had to use to wear hats. Target: 'We used to have to ...'.
Such errors are used to justify suggestions about relative order of processes in sentence production. Other 'Some Errors' arguments include the description of blends: in (2), the speaker reported confusing ton with load to produce the erroneous toad:
(2) We all jumped on him like a toad of bricks.
From this one can argue that at certain points in the production of an utterance more than one plan can be simultaneously entertained, e.g. in selection of words (Garrett, 1980; Butterworth, this volume), or of syntactic structure (Fay, this volume). Similarly, the characteristics of word substitution errors can be invoked to support hypotheses about the word selection process; Fay and Cutler (1977) collected all the word substitution errors they could find, eliminated those that could be explained as resulting from error processes already described in the literature (e.g. semantic errors such as substitution of opposites or members of the same word field; single phoneme movement or substitution errors; blends) and found that the corpus remaining seemed to form a homogeneous class showing considerable similarity of form between error and target. The characteristics of this class of error allowed conclusions to be drawn about the organisation of the mental lexicon used in speech production. 'Environmental contamination' occurs when an unintended word finds its way into the utterance because someone else has just spoken it, or because the speaker happens to be looking at its referent, as in (3):
(3) Where would we be without your ribbons? Target: '... without your rulers'. Speaker was looking into a drawer containing typewriter ribbons.
Such errors can be cited as evidence that the processing systems of speech
production are not entirely independent of other processing systems having demands on our attention (e.g. Garrett, 1980), and this again constitutes a 'Some Errors' argument. Finally, all arguments about 'psychological reality' are of the 'Some Errors' type. Thus the fact that single phonemes participate in errors as separate units has been seen as evidence that utterances are at some level of the production process represented as strings of phonemes; similar arguments have been made about sound features, morphemes, words and syntactic constituents (e.g. Fromkin, 1971).
2.2 'More Errors' arguments
Arguments based on the relative frequency of errors with particular characteristics are quite common in the error literature. For instance, it is a truism that error collections contain more anticipations (particularly of single sounds) than perseverations and transpositions. At least one error researcher has suggested (Nooteboom, 1969) that this indicates that 'the speaker's attention is normally directed towards the future', although the same writer also points out (Nooteboom, 1980; see also Meringer and Mayer, 1895 and many other authors for the same observation) that if a transposition is detected and corrected by the speaker when only the first erroneous segment has been uttered, it will be indistinguishable from an anticipation. This phenomenon may well have artificially inflated the frequency of anticipations in speech error collections. Similarly, it has often been observed that stressed words are disproportionately represented in speech errors (Boomer and Laver, 1968; MacKay, 1969; Nooteboom, 1969). On the basis of this it has been claimed (Boomer and Laver, 1968; MacKay, 1969) that stressed words are not only more prominent in the acoustic form of the utterance, but also in its pre-output mental representation. Other arguments involving prosody, both errors of prosody and the prosody of errors, have also been of the 'More Errors' type. For example, errors of word stress significantly more often result in stress falling on a syllable which bears stress in a morphological relative of the intended word, as in (4):
(4) For linguists, for linguists to judge...
in which the interference is presumably from linguistic; this has been used as the basis for an argument that words derived from a single base are not stored entirely independently in the mental lexicon (Cutler, 1980a). The prosodic characteristics of syllable omission errors – namely that they result significantly more often than not in errors which are more rhythmic
than the target would have been – have also been cited as evidence of the importance of rhythmicity in the generation of utterances in English (Cutler, 1980b). There have also been a number of 'More Errors' arguments interpreting single phoneme error data (e.g. Shattuck-Hufnagel and Klatt, 1979, 1980; van den Broecke and Goldstein, 1980); from the frequency with which particular phonemes substitute for one another, arguments can be constructed about the representation of utterances in pre-output store, or about the psychological justification for particular phonological descriptions.
2.3 'No Errors' arguments
The most well-known 'No Errors' argument in the speech error literature is embodied in Rulon Wells' First Law of speech errors: 'A slip of the tongue is practically always a phonetically possible noise' (Wells, 1951). Boomer and Laver (1968) and Fromkin (1971), among others, have also noted that errors which produce sequences disallowed by the phonological rules of the language in question are almost completely absent from their collections. (Not ENTIRELY absent: even Wells noted that a few exceptions had been recorded. An example noted by the present writer is recorded by Crompton (this volume).) Typically this has been interpreted as evidence that speech production is internally monitored and 'impossible' output filtered out before it is ever actually produced, although Hockett (1967) does suggest that hearers may treat phonologically deviant utterances as non-deviant (i.e. refuse to believe the unprecedented evidence of their ears). 'No Errors' arguments can sometimes be born of 'More Errors' arguments. Thus the observation that lexical stress errors arise from confusion between morphological relatives with different stressed syllables has led to the prediction that particular errors will not occur: that administrátive is possible, and administration, but not administrative, for example; or that a stress error on window, which has no morphological relatives, will not occur (Cutler and Isard, 1980). Similarly, Garrett's observations that open class (lexical) words frequently exchange places in the utterance, often 'stranding' their inflections behind them, whereas in his data the inflections themselves do not exchange places leaving the lexical items in the intended position, has led him to postulate a model of speech production in which lexical items and bound morphemes have fundamentally different status and in which only lexical items CAN swap places; by implication, this amounts to a prediction (Garrett, 1980) that
exchanges between inflections never WILL occur. That is, Garrett's model accounts for errors like (5), but predicts that (6) is impossible:
(5) Take the freezes out of the steaker. Target: 'Take the steaks out of the freezer'. From Fromkin (1973).
(6) Take the steaker out of the freezes.
3. Potential confounding factors
This section will review factors which could influence the detectability of particular kinds of speech errors and hence the content of error collections. The four headings under which the evidence is grouped represent relatively separate lines of research rather than truly independent sources of confounding factors.
3.1 Slips of the ear
Hearing errors are attested much less often than speech errors because it is necessary for the hearer to admit to having made the slip; but they are nevertheless not uncommon in everyday life. Examples (7)-(11) are typical:
(7) On the eve of the motor show she'll officially open tomorrow.... Perceived: 'On the eve of the motor show Sheila Fishley open...'.
(8) Because they can answer inferential questions... Perceived: 'Because they can answer in French...'.
(9) Do you know about reflexes? Perceived: 'Do you know about Reith lectures?'
(10) It's about time Robert May was here. Perceived: 'It's about time to drop my brassiere'.
(11) If you think you have any clips of the type shown... Perceived: 'If you think you have an eclipse...'.
Garnes and Bond (1975; 1980), who analysed hundreds of hearing errors collected in the course of ordinary conversations, reported that more often than not (a) the stress pattern of the utterance is correctly perceived; (b) the vowel in a stressed syllable is correctly perceived; and (c) the error does not cross a phrase boundary. The misperceived segments are more often consonants than vowels, and are usually unstressed syllables, particularly in the middle of a word (Browman, 1980); changes in the rhythmic pattern usually involve only the mislocation of a single
unstressed syllable. Thus in (10) above the primary stress in the utterance fell on the final word; about and Robert were also stressed; time and May were unreduced; it's and was were reduced. In the misperceived sequence the stressed vowels in Robert and here have been preserved (US pronunciation of brassiere applies [brəzi:r]), and the sequence of stressed and unreduced syllables has likewise been preserved. Consonants, however, have been misconstrued, omitted or added in the erroneous reconstruction of the utterance, and although the number of syllables has been preserved, one reduced syllable has migrated. Not all hearing errors are as complicated as (10); (9), for instance, consists only in the misperception of two fricatives ([f] and [s], heard as [θ] and [č] respectively), in (8) a reduced vowel has been overlooked, and in (11) the unstressed but unreduced vowel [ε] has been misperceived as reduced, precipitating a misplacement of the word boundary. Sometimes the error consists entirely in misplacement of the word boundary, as in (7). Purely syntactic misperceptions, in which the error consists solely in assigning the wrong syntactic structure to the utterance (e.g. (12)-(14) below) also occur, although these are rarely reported; errors in which misparsings are precipitated when a word is mistaken for its homonym, as in (15) and (16), are reported somewhat more often.
(12) You never actually see a forklift truck, let alone person. Perceiver attempted to access a compound noun 'forklift person', as if a second occurrence of 'forklift' has been deleted.
(13) This result was recently replicated by someone at the University of Minnesota in children. Perceiver assigned NP status to 'the University of Minnesota in children' (cf. 'the University of California in Berkeley').
(14) Mr Milne came to Rothsay to impress upon this pretty leftwing gathering... Perceiver understood 'pretty' as adjective rather than adverb.
(15) Stretching would initiate a change. Perceiver understood 'stretching' as verb rather than noun, and 'would' (which was contrastively stressed) as 'wood'.
(16) One thing that Mark's formulation did... Perceiver parsed 'One thing that marks [Formulation Did] NP'.
It is repeatedly stressed by those who describe hearing errors that this evidence shows speech perception to be an active interpretative process, rather than simply passive reception of the incoming signal. Listeners strive to make the best sense they can of the speaker's message, reconstructing sounds, words and syntactic structure from the incomplete information they have received. Such reconstruction can also be demonstrated experimentally. Warren (1970), for example, replaced single sounds in an utterance by a brief burst of white noise, and found that his listeners reported hearing a cough-like sound occurring SIMULTANEOUSLY WITH the speech, rather than INSTEAD OF a portion of it. Listeners are obviously very efficient at constructing a meaningful message from heard speech, even in defiance of the acoustic information.
What is the relevance of this evidence for speech error collectors? Firstly, it must be noted that there is clear evidence that speech errors can precipitate hearing errors. Some of the misparsing examples above, for instance, seem to have been prompted by slightly unusual or ambiguous prosody, or marginally deviant syntax, chosen by the speaker: contrastive stress on the auxiliary in (15), deaccenting of forklift in (12) together with omission of the indefinite article before person, failure to provide intonational marking of the phrase boundary on Minnesota in (13), equal stress on pretty and leftwing in (14). In the following examples, however, an actual mispronunciation has caused an unintended word to be perceived:
(17) ... and for nurses' memory for a feel – film! The error consisted in uttering the long vowel [i] instead of its short counterpart; the speaker stopped and corrected before the end of the word. Of the dozen or so people who heard the error, one heard the speaker to say 'field'.
(18) It's the same right-wrap apperation... Target was 'operation', error a sound perseveration from 'wrap', and speaker stopped and corrected just before the end of the word, but not before one of his dozen or so hearers had retrieved 'apparatus' and another had retrieved 'apparition'.
(19) Such a representation is entirely [otijos]. The speaker produced a deviant, but probably intended, pronunciation of the word 'otiose'. Of a hundred or so listeners, one reported hearing 'odious'.
In each of these cases, the perceived word was inappropriate to the context, and the hearing error was therefore noticed and reported. Moreover, two of these three errors involved vowels, and all of them concerned either the first or second phoneme of a word; as we saw above, vowels, and the beginnings of words, are likely to be correctly perceived. Suppose, however, that the hearer had reconstructed a word appropriate to the context, possibly even the word that the speaker had originally intended. Unless the speaker spontaneously corrected his error, there would be no way for the hearer to know an error had been committed. It is impossible for error collectors to know how often this happens, but it is
not inconceivable that it happens with sufficient frequency to bias the content of speech error — and hearing error — collections. 'More Errors' arguments, where relative frequency of error types is of crucial importance, should therefore be constructed with extreme care. In particular, they should be avoided if there is specific reason to believe that the characteristics of hearing errors could have confounded the data. The claim, mentioned above, that speech errors more often involve stressed than unstressed syllables is one case in point. This claim looks particularly impressive when it is compared with the relative frequency of occurrence of stressed and unstressed syllables in speech — there are many more unstressed syllables than stressed. Yet hearing errors, as we have seen, are very much more likely to occur on unstressed syllables. There is good reason to believe that even with relative frequency taken into account speech errors on unstressed syllables may be under-represented in the available data. It would therefore seem highly unwise for any error collector to attach a great deal of theoretical significance to the relative frequency with which stressed and unstressed syllables participate in slips of the tongue, without taking statistical account of the differences in relative detectability of sounds in stressed and unstressed syllables as revealed in hearing error studies.
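The kind of statistical account called for here can be sketched in a few lines of Python. The sketch assumes, as a simplification not made in the text, that every error in a category is noticed independently with a fixed probability, so that on average the observed count equals the underlying count multiplied by that probability; dividing by the probability then gives a rough estimate of the underlying count. The function name `correct_for_detectability`, the counts and the detection probabilities are all hypothetical.

```python
def correct_for_detectability(observed, detection_prob):
    """Estimate underlying error counts from observed counts.

    Assumes each error of a given category is noticed independently with
    the probability given for that category, so that on average
    observed = underlying * detection probability.
    """
    corrected = {}
    for label, n_observed in observed.items():
        p = detection_prob[label]
        if not 0 < p <= 1:
            raise ValueError(f"detection probability for {label!r} must be in (0, 1]")
        corrected[label] = n_observed / p
    return corrected


# Hypothetical values: stressed-syllable errors dominate the collection,
# but unstressed-syllable errors are assumed to be noticed half as often.
observed = {"stressed": 80, "unstressed": 40}
detection = {"stressed": 0.8, "unstressed": 0.4}
estimate = correct_for_detectability(observed, detection)
```

With these invented figures the two corrected estimates come out equal, although the collection itself contains twice as many errors on stressed syllables. In practice the detection probabilities would have to be estimated from hearing error or mispronunciation detection studies.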
3.2 Shadowing and mispronunciation detection
Several recent experimental studies have presented listeners with speech already containing errors, and have asked them to repeat this speech back as fast as possible (shadow the text), or else to make a response as soon as they hear an error. Typically, the researchers have found that mispronunciations of single sounds are very often missed; this is particularly true if the mispronunciation differs from the intended sound on only a single feature (e.g. /k/ for /t/), and if the mispronunciation is near the end rather than the beginning of the word (Cole, 1973; Marslen-Wilson and Welsh, 1978). Moreover, the more contextually predictable the distorted word, the more quickly are distortions detected (Cole and Jakimik, 1978). (Note that hearing errors are reported to be common on proper nouns (Celce-Murcia, 1980); these are often unfamiliar or unpredictable in context.) Whereas in the mispronunciation detection tasks, the effect of a distortion being overlooked is that it fails to elicit a response from the subject, in the shadowing experiments undetected distortions are restored in the shadowers' output to the form appropriate to the context (Marslen-Wilson, 1975; Marslen-Wilson and Welsh, 1978). Lackner (1980) has looked at higher-level distortions – syntactic errors such as wrong tense, number, or
word-class markings, and semantic errors produced by substitutions of entire words – and has found that at rapid presentation rates these errors, like the phonemic errors, are very often overlooked and corrected. The relevance of these data to the study of speech errors was brought out most clearly in a study by Cohen (1980) in which the errors inserted in the text presented to listeners for shadowing were constructed in such a way that they mimicked actually reported speech errors. Cohen compared the restoration rates for anticipations versus perseverations, for instance, and found that perseverations were significantly more often overlooked and restored than were anticipations. Similarly, consonant errors were restored more often than vowel errors, and errors in unstressed syllables were restored more often than errors in stressed syllables. These latter two results are, of course, directly in line with the hearing error evidence cited in section 3.1.
Thus these experimental findings, like the hearing error data, suggest factors which might affect the relative detectability of particular types of speech error. They indicate, for instance, that single-phoneme errors are more likely to be overlooked if the error segment differs from the intended segment by only one feature. Lackner's work suggests that at very fast speech rates even gross syntactic and semantic errors might be overlooked. (This suggests that speech rate might have more of an effect on error DETECTION than it has on error COMMISSION; attempts to demonstrate that error rates rise with rate of speech, going back to Meringer (1908: 122), have all, to my knowledge, met with failure.) Finally, Cohen's finding that anticipations are detected more often than perseverations indicates that the higher frequency of anticipations in error collections, besides being confounded with the possibility of incomplete exchanges, could also be an artefact of the relatively greater detectability of anticipations. Again, it would seem imprudent to rest any major theoretical claims on this pattern of occurrence.
3.3 Perceptual confusions
There is a considerable literature on perceptual confusions: the likelihood with which sounds are confused with one another. Typically, such studies involve the presentation of isolated syllables, with or without noise masking (Peterson and Barney, 1952; Miller and Nicely, 1955; Wang and Bilger, 1973); the listeners simply report what they have heard. For vowels, one finding relevant to error research is that identification is affected by dialect (Peterson and Barney, 1952); analogously, misunderstandings involving vowels are more common when speaker and
listener use different dialects (Celce-Murcia, 1980). In general, Peterson and Barney found high front vowels to be most accurately identified, low back vowels least accurately, with confusions tending to be between adjacent positions in the vowel space.
For consonants, a consistent finding is that some features of sounds are less likely to be mistaken than others. Whether or not a consonant is nasal is highly likely to be perceived correctly, and similarly, whether or not it is voiced. The place of articulation is more likely to be mistaken, as is whether or not the consonant is a fricative. Thus /b/ is more likely to be perceived as /d/, /g/ or /v/ (which differ from it on place of articulation or frication only) than as /m/ or /p/ (which involve a change of nasality and voice respectively).
These findings are relevant to any attempt to interpret the relative frequency of confusions between sounds in speech errors (e.g. Shattuck-Hufnagel and Klatt, 1979, 1980; van den Broecke and Goldstein, 1980). The mispronunciation detection evidence indicates that sound errors involving a change in only one feature are more likely to be overlooked; the perceptual confusion evidence adds to this the suggestion that such changes are particularly likely to be overlooked if the altered feature is one that is relatively easily confused, e.g. place of articulation. Moreover, there is another dimension of this problem which might affect the detectability of certain sound errors, and that is the size of the response set. For instance, given that nasality value is highly likely to be perceived correctly, there is a greater likelihood of the hearer mistakenly hitting on the intended target for a mispronounced nasal consonant than for a mispronounced non-nasal consonant simply because there are many more non-nasal than nasal consonants (in English and in other languages). That is to say, there seems to be a greater likelihood for an /m/ mispronounced as /n/ to be misperceived as /m/ than for a /b/ mispronounced as /d/ to be perceived as /b/. This would show up in speech error collections simply as a greater likelihood for errors to occur in non-nasal consonants (once frequency of occurrence had been controlled for).
Perceptual confusion data also show evidence of response bias, in that some sounds are more likely to be reported than others. Goldstein (1980) has demonstrated that response bias for consonants correlates with lexical frequency (i.e. the number of words containing the sound in question, as opposed to absolute frequency of occurrence) and with phonological naturalness (as measured by the probability with which a particular sound occurs across languages). Goldstein points out that the same response bias does not appear to be at work in speech errors; although the perceptual confusion experiments show an asymmetry of report as a result of bias (e.g. in Wang and Bilger's data /b/ is more likely to be reported as /p/ than
vice versa), no such asymmetries show up in the speech error data: for any pair of sounds, one is as likely to be substituted for the other as vice versa (Shattuck-Hufnagel and Klatt, 1980). The reality may of course be horrendously complex, with the reported symmetry being in fact a function of a bias-determined asymmetry IN THE OPPOSITE DIRECTION FROM THE ASYMMETRY DEMONSTRATED IN PERCEPTION (e.g. in fact /b/ occurs as an error for /p/ more often than vice versa, but because there is a response bias towards reporting /p/, some substitutions of /b/ for /p/ are misperceived as /p/, bringing the total of /b/ for /p/ substitutions down to the /p/ for /b/ substitution level); however there are at present no independent reasons for believing this to be so. Indeed there are even some indications that the phoneme error data may be relatively uncontaminated by hearing error confounding, since the most frequently reported sound substitutions are those between sounds which are most like one another, exactly the substitutions which the evidence above suggests are most difficult to detect. Moreover, 'manner-of-articulation and voicing features are significantly more likely to be preserved than a place-of-articulation feature, and the greatest number of single-feature errors involve a change in place of articulation' (Shattuck-Hufnagel and Klatt, 1979). Place-of-articulation errors are of course the ones which the perceptual confusion findings (as well as the hearing error and mispronunciation detection findings) suggest should be most often overlooked. Nevertheless, studies of phoneme errors should always take account of the possible effect of perceptual confusions.
Finally, it should be noted that the effects of language and dialect-based expectations and of phonological naturalness on perceptual confusions must cast some doubt on the generality of the ('No Errors') argument that speech errors virtually never violate the phonotactic constraints of the language in which they are perpetrated. Phonologically deviant errors would on the present evidence be likely to be perceived as segments more probable in the language, so that it is quite possible that such errors occur at least a little more often than they have been reported; certainly it seems that the phonologically sophisticated (e.g. Ladefoged, reported in Fromkin, 1973: 25) are somewhat more likely to detect phonological violations. Butterworth and Whittaker (1980) report that impermissible consonant clusters can quite easily be elicited in a tongue twister task. The strict adherence to phonological constraints would appear therefore to be another argument on which no grandiose theoretical edifice should be erected, unless it is firmly underpinned by comparisons with measures of the relative detectability of phonologically permissible and impermissible slips.
3.4 Relative salience of beginnings and ends of words
There is abundant evidence that the initial portions of words are of crucial importance to word identification. For instance, when subjects are presented with letter strings which can be made into existing words merely by exchanging two adjacent letters, and are required to identify the distorted words, identification is hardest if the letters which have been exchanged are the first two (and easiest if the letters occur in the middle of the word; final letters are of intermediate difficulty) (Bruner and O'Dowd, 1958). Similarly, words can be guessed most speedily and most reliably from their initial fragments (Broerse and Zwaan, 1966; Nooteboom, 1981). If word fragments are presented as cues in a recall task, the initial letters comprise the most effective fragment, the final letters the second most effective, and medial letters the least effective (Horowitz et al., 1968); Horowitz et al. (1969) demonstrated that this result is independent of word frequency. The initial positions of a compound word are more important to its recognition than the other portions (Taft and Forster, 1976). People who have a word on the 'tip of their tongue' very often have intuitions about the word's beginning or end, and these intuitions are right more often than guesses about the middle of the word (Brown and McNeill, 1966; Browman, 1978). Brown and McNeill hypothesise that memory storage of words assigns greater weight to the two ends of the words than to the middle, and probably particular weight to the initial portions. Marslen-Wilson (1978; Marslen-Wilson and Welsh, 1978) has constructed a model of word recognition in which the initial portions of words can be the ONLY necessary cues for recognition.
Marslen-Wilson calls the point at which a word, scanned from left to right, diverges from all other words of the language its RECOGNITION POINT, and suggests that in auditory speech perception a word can be recognised at the latest at its recognition point. This model fails to account for why word-final segments should be more salient than word-medial segments, so that it probably over-simplifies the actual word recognition process; however, it can be shown that lexical decision latencies are affected by how far into the word the recognition point occurs (Marslen-Wilson, 1978): compare dwindle, with recognition point on the third segment, with intestate, which only parts company with intestine on the final segment. Interestingly, the recognition point can be shown to be of relevance in the construction of neologisms (which of course usually consist in the addition of novel endings to stems); although speakers generally prefer to preserve the base word intact in a neologism (Cutler, 1980c), an exception to this rule can be made as long as the segmental
values and relative syllable salience of the base word are maintained up to the base word's recognition point (Cutler, 1981). In speech errors, there is further evidence that the beginnings of words are particularly important. Form-related word substitution errors resemble their intended targets very strongly in the initial segments (Fay and Cutler, 1977), at least in adult errors (Aitchison and Straf, this volume); although there are similarities at other points of the word (Hurford, 1981), these similarities are significantly weaker than those in the initial portions (Cutler and Fay, 1982). These findings have also been interpreted as evidence that words are stored in the mental lexicon in left-to-right order, and by implication that word recognition proceeds left to right. The clear implication for speech error collectors is that even a small distortion of initial segments is quite likely to be noticed and reported, whereas changes in later parts of the word — especially in the middle — are much more likely to go undetected. (Recall that hearing errors are also more common in the middles of words.) In fact it has often been noted that error collections contain many more examples of sound errors in initial position than in final position (Cohen, 1966; Garrett, 1980; Goldstein, 1980; van den Broecke and Goldstein, 1980). Once again, this finding may be an artefact of a hearing error pattern determined by differing psychological salience of parts of words, and once again, therefore, it must be accounted a finding on which it would be hazardous to base important theoretical claims without taking the potentially confounding factor into statistical account.
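The notion of a recognition point introduced earlier in this section lends itself to a small computational illustration. The Python sketch below is not Marslen-Wilson's model; it is a minimal reading of the definition, under two loudly stated assumptions: letters stand in for phonological segments, and a seven-word toy list (`TOY_LEXICON`, invented here) stands in for the lexicon of the language. The function name `recognition_point` is likewise hypothetical.

```python
def recognition_point(word, lexicon):
    """Return the 1-based position at which `word` diverges from every
    other entry in `lexicon`, scanning left to right.

    Returns None if the word never becomes unique, i.e. if it is the
    beginning of some longer entry.
    """
    others = [entry for entry in lexicon if entry != word]
    for position in range(1, len(word) + 1):
        prefix = word[:position]
        # The word is unique once no competitor shares this prefix.
        if not any(entry.startswith(prefix) for entry in others):
            return position
    return None


# Toy lexicon (an assumption): spelling is used in place of segments.
TOY_LEXICON = ["dwindle", "dwell", "dwarf",
               "intestate", "intestine", "intend", "in"]

print(recognition_point("dwindle", TOY_LEXICON))    # 3
print(recognition_point("intestate", TOY_LEXICON))  # 7
print(recognition_point("in", TOY_LEXICON))         # None
```

Because spelling is only a stand-in, the position found for intestate (the seventh letter) does not correspond to a count of spoken segments; the sketch is meant to show the left-to-right logic of the definition, not to reproduce the figures given in the text.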
4. Levels of explanation in speech error analysis
A common confusion in the speech error literature arises from a failure to distinguish between the CAUSE of an error's occurrence and the MECHANISM by which it occurs. The two are logically distinct. For example, although errors of lexical stress arise through confusion between morphologically related words (the mechanism), a lexical stress error may be more likely to occur if its occurrence will make the utterance easier to produce than it would otherwise have been, e.g. more rhythmical (Cutler, 1980b). An early example of confusion between cause and mechanism is provided by the dispute in the early years of this century between Sigmund Freud and Rudolf Meringer. From Meringer's collection of speech errors, Versprechen und Verlesen (1895), Freud had borrowed a number of examples which he used as illustrations of his arguments in The Psychopathology of Everyday Life, first published in 1901. Moreover, he
suggested some tentative explanations which were characteristically 'Freudian'; for instance (20), which Meringer categorised as a syllable perseveration, was explained by Freud in terms of underlying ill-feeling of the speaker towards his boss:
(20) Ich fordere Sie auf, auf das Wohl unseres Chefs aufzustossen. (Target: anzustossen. 'I call on you to belch to the health of our chief' instead of '... to drink a toast to the health of our chief'.)
Meringer objected vigorously to this use of his examples, and criticised Freud's explanations on the grounds that the handful of errors which Freud had borrowed from Versprechen und Verlesen, and indeed all the errors described in Psychopathology, obeyed the same rules as the thousands of other errors in Meringer's collection and could therefore be most parsimoniously described in terms of the categorisation (anticipation, perseveration, substitution, exchange, blend etc.) which Meringer had set up (see Meringer, 1923 for these arguments). Freud, on the other hand, felt that Meringer's explanations were simply vastly less interesting than his own. Both of them made the mistake of assuming that their respective explanations were on the same level. In fact, there is no logical reason why the occurrence of an error via one or another mechanism (anticipation, perseveration, etc.), or alternatively the failure of an error to be detected and corrected by internal monitoring systems prior to output, should not be rendered more likely by the fact that the error form is associated with secret desires or thoughts. This is by no means to say that Freud's explanations of speech errors are necessarily correct. Meringer (1923) also made a number of cogent commonsense criticisms of Freud's theories, and he was neither the first nor the last to do so. The point to be made here is simply that Freud's examples cannot be dismissed by pointing out that they conform to otherwise postulated speech error mechanisms, because Freud's explanations are at a different level from those involved in the postulation of mechanism. Others have attempted to explain away Freud's examples by classifying them in recognised error categories (Ellis, 1980) or as other known linguistic phenomena (Timpanaro, 1976; see the discussion of this work by Butterworth, this volume).
There is, of course, no pressing need for error researchers to explain away Freud's data, since the logical independence of cause and mechanism explanations also means that Freud's speculations about cause have no relevance whatsoever to hypotheses about mechanism; they can simply be ignored, as indeed they largely have been. The independence of cause and mechanism is tacitly accepted whenever an error researcher points out that error rates increase when speakers are
tired or intoxicated; it is never suggested that fatigue and drunkenness are alternative mechanisms by which errors can occur, merely that such states can precipitate, or cause, the more frequent operation of the existing mechanisms. It is often noticed by error collectors that once one error has been made, other errors seem to follow; while this effect may be solely due to heightened sensitivity in the hearers, a causal explanation could also be constructed by postulating a relaxation of the pre-output monitoring devices, thus letting more errors through. Similarly, whereas form-related word substitution errors are hypothesised to arise by a totally different mechanism from semantically related word substitution errors (Fay and Cutler, 1977), there are a large number of cases in which form-related substitutions seem to resemble their intended targets in meaning as well (see Aitchison and Straf, this volume). A causal explanation for this phenomenon might be that routine semantic activation of words associated with the intended target occasionally results in a phonologically near neighbour of the target being activated, and that if a neighbour is activated, it is more likely than unactivated neighbours to be chosen by mistake; or alternatively, that such errors might arise independently as semantic and form-related substitutions, and that any error which has two sources is more likely to get through pre-output filters than an error with only one source. This kind of causal explanation (that an error is more likely if it has more than one source) is in many ways not comparable to Freud's claim that an error is probable if it expresses unconscious mental states (for instance it is testable, which Freud's claims are not). But it is like Freud's suggestions in that it is logically independent of the hypothesised mechanisms involved in error phenomena. Causal explanations can, moreover, differ across languages.
For instance, Cutler (1980b) suggested a causal explanation for syllable omission errors which invoked a tendency towards underlying rhythmicity in English utterances. Such an explanation would obviously not hold for languages, such as French, in which there is no tendency towards stress-timed rhythm; nonetheless, the mechanism by which syllable omission errors arise would seem to be the same in French as in English, since errors in both languages conform strictly to constraints of syllable structure (see Crompton, this volume, for a discussion of these constraints). This has led to the suggestion (Cutler, 1980d) that whereas CAUSES of errors might differ across languages, across individuals, and across occasions, error MECHANISMS ought to be both speaker- and language-universal.
5. Conclusion
Not all speech errors are equally detectable; therefore all collections of speech errors assembled from everyday language behaviour are liable to be confounded by the problem of detectability. No collection as yet exists which reliably records every error in a large body of speech. The compilation by Garnham et al. (this volume) is a small step in this direction. The tape-recorded corpus of Boomer and Laver (1968), often cited as an example of a complete corpus, was in fact small ('more than a hundred' slips) and compiled 'over a period of several years'; like a similar (already larger) collection of recorded excerpts being amassed by the present author, it was presumably as subject to the detectability problem as any written accumulation. Evidence from hearing error research, and from experimentation on perceptual confusions, detection of mispronunciations, and relative salience of parts of words, indicates that there are a number of factors which will act to make some slips more detectable than others. There are a few studies which have addressed this question directly. Tent and Clark (1980) studied the detection of phonetic-level and higher-level errors in isolated sentences presented under mild noise masking, and found that phonetic errors were overlooked far more often than were errors involving larger units (syllables, words); among the higher-level errors, anticipations and transpositions were nearly always detected (98% and 97% respectively), while perseverations and blends were detected less often but still in more than 75% of cases. In the phonetic errors, anticipations were detected most often (28%), transpositions nearly as often (26%), and perseverations least often (13%). The greater detection rate for higher-level errors parallels Cohen's (1966) observation that errors which resulted in an obviously deviant meaning in the particular context were most likely to be detected.
Greater detectability by the hearer, however, does not imply also greater detectability for the speaker who perpetrated the slip; Nooteboom (1980) looked at self-corrections by speakers and found that phonetic errors were actually corrected slightly more often than word-level errors. Within error types, however, the relative detectability is the same for both speakers and listeners: anticipations (at both sound and word level) were corrected far more often than other errors. Whether or not the speaker spontaneously corrects an error is presumably also a factor influencing whether or not listeners perceive that an error has been committed. The three possible types of speech error argument differ in the degree to which they are susceptible to the detectability problem. 'Some Errors' arguments, in their strongest form, require only one instance of a
particular error type to make their case. In the present volume, the paper by Meara and Ellis offers a good example of such an argument. The papers by Bierwisch and by Fay are also based on 'Some Errors' arguments. This kind of argument is without question the theoretically safest kind to make on the basis of error data collected from everyday speech. 'More Errors' arguments (of which this volume offers a typical instance in the paper by Aitchison and Straf) do not have to be avoided; but those who construct them must remember that detectability confounds are likely, and must be wary of assigning too great a theoretical importance to any pattern of distribution which independent evidence suggests may be influenced by differential detectability. As the evidence cited in section 3 showed, however, a number of 'More Errors' arguments in the speech error literature are based on error distributions which are quite unlike the distributions which would be predicted simply on the basis of the differential detectability of those errors. The proponents of these arguments therefore have cause for considerable confidence in their findings. Where differential detectability would predict the same result as the distribution of reported errors shows, though, interpretations of error patterns are less immune to criticism; but all is still not lost. What is required, as section 3 suggested, is a numerical estimate of the difference in detectability based on the available evidence, and a comparison of this difference with the difference observed in the error distribution. For instance, Tent and Clark's study found that 28% of phonetic anticipations were detected, but only 13% of phonetic perseverations. Thus for every 28 anticipations that were reported, 72 were missed, while for every 13 perseverations reported, 87 were missed.
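The arithmetic behind such a correction is simple enough to sketch. The Python fragment below is one possible reading of it, not a procedure taken from Tent and Clark: it divides a reported count by the detection rate to estimate how many errors were actually produced. The detection percentages (28 and 13) are those quoted above; the reported counts of 56 and 26 are invented for illustration, and the name `estimated_true_count` is hypothetical.

```python
def estimated_true_count(reported, detection_percent):
    """Estimate how many errors occurred, given how many were reported
    and the percentage of such errors that listeners detect."""
    if not 0 < detection_percent <= 100:
        raise ValueError("detection_percent must be in (0, 100]")
    return reported * 100 / detection_percent


# Detection rates for phonetic errors as quoted from Tent and Clark (1980).
ANTICIPATION_DETECTED = 28   # per cent
PERSEVERATION_DETECTED = 13  # per cent

# Invented reported counts from an imaginary collection (an assumption).
reported_anticipations = 56
reported_perseverations = 26

true_anticipations = estimated_true_count(
    reported_anticipations, ANTICIPATION_DETECTED)
true_perseverations = estimated_true_count(
    reported_perseverations, PERSEVERATION_DETECTED)

print(true_anticipations, true_perseverations)  # 200.0 200.0
```

In this made-up case a collection reporting more than twice as many anticipations as perseverations would be compatible with equal underlying frequencies. Whether detection rates measured on isolated sentences under noise masking carry over to everyday listening is a separate question that the sketch does not address.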
A comparative study of the relative frequency of anticipations and perseverations could adjust the reported frequencies accordingly, and would thereby control for the differential degree of detectability as operative in Tent and Clark's study. 'No Errors' arguments (with which in the present volume only Crompton briefly flirts) are obviously the least safe, since only one decisive counter-example is required to destroy them; but those who make 'No Errors' arguments have presumably always known this. Finally, it should be pointed out that speech error research is not limited exclusively to data collected from everyday speech. As Cutler and Fay (1978) and Fowler (this volume) have pointed out, some of the problems inherent in naturalistic collection methods can be overcome by combining this methodology with recently developed laboratory techniques for the elicitation of errors. The reliability problems which confront speech error collectors are by no means insurmountable; speech error research, as the present volume attests, is in fine health and making a valuable contribution to linguistic and psycholinguistic knowledge.
Centre for Research on Perception and Cognition
University of Sussex
Falmer
Brighton BN1 9QG
England
Note *
Brian Butterworth, Steve Isard, John Laver, Dennis Norris and Richard Shillcock read and commented on an earlier version of this paper. They are responsible for a vast number of improvements, but not for the many deficiencies which remain. Financial support was provided by the Science Research Council.
References
Aitchison, J. and Straf, M. (1981). Lexical storage and retrieval: a developing skill? Linguistics 19-7/8, 751-795.
Bierwisch, M. (1981). Linguistics and language error. (Translated and enlarged version of Bierwisch, 1970). Linguistics 19-7/8, 583-626.
Boomer, D. S. and Laver, J. D. M. (1968). Slips of the tongue. British Journal of Disorders of Communication 3, 2-12.
van den Broecke, M. P. R. and Goldstein, L. (1980). Consonant features in speech errors. In V. A. Fromkin (ed.), Errors in Linguistic Performance: Slips of the Tongue, Ear, Pen, and Hand. N. Y., London: Academic Press.
Broerse, A. C. and Zwaan, E. J. (1966). The information value of initial letters in the identification of words. Journal of Verbal Learning and Verbal Behavior 5, 441-446.
Browman, C. P. (1978). Tip of the tongue and slip of the ear: Implications for language processing. UCLA Working Papers in Phonetics 42.
Browman, C. P. (1980). Perceptual processing: Evidence from slips of the ear. In V. A. Fromkin (ed.), Errors in Linguistic Performance: Slips of the Tongue, Ear, Pen, and Hand. N. Y., London: Academic Press.
Brown, R. and McNeill, D. (1966). The 'tip-of-the-tongue' phenomenon. Journal of Verbal Learning and Verbal Behavior 5, 325-337.
Bruner, J. S. and O'Dowd, D. (1958). A note on the informativeness of words. Language and Speech 1, 98-101.
Butterworth, B. (1981). Speech errors: old data in search of new theories. Linguistics 19-7/8, 627-662.
Butterworth, B. and Whittaker, S. (1980). Peggy Babcock's relatives. In G. E. Stelmach and J. Requin (eds), Tutorials in Motor Behaviour. Amsterdam: North-Holland.
Celce-Murcia, M. (1980). On Meringer's corpus of 'slips of the ear'. In V. A. Fromkin (ed.), Errors in Linguistic Performance: Slips of the Tongue, Ear, Pen, and Hand. N. Y., London: Academic Press.
Cohen, A. (1966). Errors of speech and their implication for understanding the strategy of language users. Zeitschrift für Phonetik 21, 177-181.
Cohen, A. (1980). Correcting of speech errors in a shadowing task. In V. A. Fromkin (ed.), Errors in Linguistic Performance: Slips of the Tongue, Ear, Pen, and Hand. N. Y., London: Academic Press.
Cole, R. A. (1973). Listening for mispronunciations: a measure of what we hear during speech. Perception and Psychophysics 11, 153-156.
Cole, R. A. and Jakimik, J. (1978). Understanding speech: how words are heard. In G. Underwood (ed.), Strategies of Information Processing. London: Academic Press.
Crompton, A. (1981). Syllables and segments in speech production. Linguistics 19-7/8, 663-716.
Cutler, A. (1980a). Errors of stress and intonation. In V. A. Fromkin (ed.), Errors in Linguistic Performance: Slips of the Tongue, Ear, Pen, and Hand. N. Y., London: Academic Press.
Cutler, A. (1980b). Syllable omission errors and isochrony. In H. W. Dechert and M. Raupach (eds), Temporal Variables in Speech. The Hague: Mouton.
Cutler, A. (1980c). Productivity in word formation. Papers from the Sixteenth Regional Meeting, Chicago Linguistic Society, 45-51.
Cutler, A. (1980d). La leçon des lapsus. La Recherche 11, 686-692.
Cutler, A. (1981). Degrees of transparency in word formation. Canadian Journal of Linguistics 26, 73-77.
Cutler, A. and Fay, D. A. (1978). Introduction. In re-issue of R. Meringer and K. Mayer, Versprechen und Verlesen (1895), ix-xi. Amsterdam: John Benjamins.
Cutler, A. and Fay, D. A. (1982). One mental lexicon, phonologically arranged: comments on Hurford's comments. Linguistic Inquiry 13, 107-113.
Cutler, A. and Isard, S. D. (1980). The production of prosody. In B. Butterworth (ed.), Language Production. London: Academic Press.
Dell, G. S. and Reich, P. A. (1981). Stages in speech production: an analysis of speech error data. Journal of Verbal Learning and Verbal Behavior 20, 611-629.
Ellis, A. W. (1980). On the Freudian theory of speech errors. In V. A. Fromkin (ed.), Errors in Linguistic Performance: Slips of the Tongue, Ear, Pen, and Hand. N. Y., London: Academic Press.
Fay, D. A. (1981). Substitutions and splices: a study of sentence blends. Linguistics 19-7/8, 717-749.
Fay, D. A. and Cutler, A. (1977). Malapropisms and the structure of the mental lexicon. Linguistic Inquiry 8, 505-520.
Fowler, C. (1981). Review of V. A. Fromkin (ed.), Errors in Linguistic Performance: Slips of the Tongue, Ear, Pen, and Hand. Linguistics 19-7/8, 00-000.
Freud, S. (1901). Zur Psychopathologie des Alltagslebens (Vergessen, Versprechen, Vergreifen), nebst Bemerkungen über eine Wurzel des Aberglaubens. Monatsschrift für Psychiatrie und Neurologie 10, 1-13.
Fromkin, V. A. (1971). The non-anomalous nature of anomalous utterances. Language 47, 27-52.
Fromkin, V. A. (ed.) (1973). Speech Errors as Linguistic Evidence. The Hague: Mouton.
Garnes, S. and Bond, Z. S. (1975). Slips of the ear: Errors in perception of casual speech. Proceedings of the Eleventh Regional Meeting, Chicago Linguistic Society, 214-225.
Garnes, S. and Bond, Z. S. (1980). A slip of the ear: A snip of the ear? A slip of the year? In V. A. Fromkin (ed.), Errors in Linguistic Performance: Slips of the Tongue, Ear, Pen, and Hand. N. Y., London: Academic Press.
Garnham, A., Shillcock, R. C., Brown, G. D. A., Mill, A. I. D. and Cutler, A. (1981). Slips of the tongue in the London-Lund corpus of spontaneous conversation. Linguistics 19-7/8, 805-817.
Garrett, M. F. (1976). Syntactic processes in sentence production. In R. J. Wales and E. C. T. Walker (eds), New Approaches to Language Mechanisms. Amsterdam: North-Holland.
Garrett, M. F. (1980). Levels of processing in sentence production. In B. Butterworth (ed.), Language Production. London: Academic Press.
Goldstein, L. (1980). Bias and asymmetry in speech perception. In V. A. Fromkin (ed.), Errors in Linguistic Performance: Slips of the Tongue, Ear, Pen, and Hand. N. Y., London: Academic Press.
Green, D. W. (1977). The intermediate processing of sentences. Quarterly Journal of Experimental Psychology 29, 135-146.
Hockett, C. F. (1967). Where the tongue slips, there slip I. To Honor Roman Jakobson. The Hague: Mouton.
Horowitz, L. M., Chilian, P. C. and Dunnigan, K. P. (1969). Word fragments and their redintegrative powers. Journal of Experimental Psychology 80, 392-394.
Horowitz, L. M., White, M. A. and Atwood, D. W. (1968). Word fragments as aids to recall: the organisation of a word. Journal of Experimental Psychology 76, 219-226.
Hurford, J. (1981). Malapropisms, left-to-right listing and lexicalism. Linguistic Inquiry 12, 419-423.
Johnson, N. F. (1980). Part-whole relationships in word processing: psycholinguistics in the eyeball. Paper presented to the Midwestern Psychological Association, St. Louis.
Lackner, J. R. (1980). Speech production: Correction of semantic and grammatical errors during speech shadowing. In V. A. Fromkin (ed.), Errors in Linguistic Performance: Slips of the Tongue, Ear, Pen, and Hand. N. Y., London: Academic Press.
MacKay, D. G. (1969). Forward and backward masking in motor systems. Kybernetik 6, 57-64.
Marslen-Wilson, W. D. (1975). Sentence perception as an interactive parallel process. Science 189, 226-228.
Marslen-Wilson, W. D. (1978). Sequential decision processes during spoken word recognition. Paper presented to the Psychonomic Society, San Antonio.
Marslen-Wilson, W. D. and Welsh, A. (1978). Processing interactions and lexical access during word recognition in continuous speech. Cognitive Psychology 10, 29-63.
Meara, P. and Ellis, A. W. (1981). The psychological reality of deep and surface phonological representations: evidence from speech errors in Welsh. Linguistics 19-7/8, 797-804.
Meringer, R. (1908). Aus dem Leben der Sprache: Versprechen, Kindersprache, Nachahmungstrieb. Berlin: Behr's Verlag.
Meringer, R. (1923). Die täglichen Fehler im Sprechen, Lesen und Handeln. Wörter und Sachen 8, 122-140.
Meringer, R. and Mayer, K. (1895). Versprechen und Verlesen: Eine Psychologisch-Linguistische Studie. Stuttgart: Göschen.
Miller, G. A. and Nicely, P. E. (1955). An analysis of perceptual confusions among some English consonants. Journal of the Acoustical Society of America 27, 338-352.
Nooteboom, S. G. (1969). The tongue slips into patterns. Leyden Studies in Linguistics and Phonetics. The Hague: Mouton.
Nooteboom, S. G. (1980). Speaking and unspeaking: Detection and correction of phonological and lexical errors in spontaneous speech. In V. A. Fromkin (ed.), Errors in Linguistic Performance: Slips of the Tongue, Ear, Pen, and Hand. N. Y., London: Academic Press.
Nooteboom, S. G. (1981). Lexical retrieval from fragments of spoken words: beginnings versus endings. Journal of Phonetics 9, 407-424.
Peterson, G. E. and Barney, H. L. (1952). Control methods used in a study of the vowels. Journal of the Acoustical Society of America 24, 175-184.
Shattuck-Hufnagel, S. and Klatt, D. (1979). The limited use of distinctive features and markedness in speech production: Evidence from speech error data. Journal of Verbal Learning and Verbal Behavior 18, 41-55.
Shattuck-Hufnagel, S. and Klatt, D. (1980). How single phoneme error data rule out two models of error generation. In V. A. Fromkin (ed.), Errors in Linguistic Performance: Slips of the Tongue, Ear, Pen, and Hand. N. Y., London: Academic Press.
Smith, P. T. and Groat, A. (1979). Spelling patterns, letter cancellation and the processing of text. In P. A. Kolers, M. Wrolstad and H. Bouma (eds), Processing of Visible Language. New York: Plenum.
Taft, M. and Forster, K. I. (1976). Lexical storage and retrieval of polymorphic and polysyllabic words. Journal of Verbal Learning and Verbal Behavior 15, 607-620.
Tent, J. and Clark, J. E. (1980). An experimental investigation into the perception of slips of the tongue. Journal of Phonetics 8, 317-325.
Timpanaro, S. (1976). The Freudian Slip. London: New Left Press.
Wang, M. D. and Bilger, R. C. (1973). Consonant confusions in noise: a study of perceptual features. Journal of the Acoustical Society of America 54, 1248-1266.
Warren, R. M. (1970). Perceptual restoration of missing speech sounds. Science 167, 392-393.
Wells, R. (1951). Predicting slips of the tongue. Yale Scientific Magazine 26, 9-30.
Linguistics and language error
MANFRED BIERWISCH
Werdens nächste ich schon Schreiber gescheiten.
Wolfgang Amadeus Mozart

Abstract

The first part of the paper explores three types of spontaneous speech errors: sequential errors, selectional errors and blends on the level of lexical and syntactic organisation. The errors are explained in terms of two types of underlying mechanisms: sequential errors result from disturbances of timing processes, selectional errors and blends result from disturbances in activation and selection. The second part generalises the effects involved in speech errors to other types of automatized behaviour such as typing and practical action. It pursues the interaction of different levels of structure in determining the result of occasional disturbances in the postulated mechanisms. Finally three special problems are analysed: the role of lexically opaque semantic constituents in verbs like auf-hören (to end); the status of morphological errors; and the specification of the domain constraining the occurrence of sequential errors. With respect to the latter, two types of misordering are distinguished (exchange, and anticipation or delay), and the domain of the former is characterised in terms of rhythmical conditions inherent in the accent pattern. It is argued that in general selection errors are constrained by structural similarity of competing elements, and sequential errors by rhythmical conditions on timing.
1. The problem
Grammatically incorrect sentences have for a long time been taken into consideration and used to justify rules set up for grammars, most consciously within the framework of generative grammatical theory. Let
us consider a few examples. The assumption that imperative sentences contain a latent you is based (partly) on the correctness of the sentences in (1) and the incorrectness of those in (2), in which there are violations of the reflexive transformation:
(1) a. Enjoy yourself.
    b. Behave yourself.
(2) a. *Enjoy themselves.
    b. *Enjoy ourselves.
    c. *Behave himself.
    d. *Behave myself.
Or take the German modal verbs wollen (want) and dürfen (may). Their analysis is based not only on the different relations between ich (I), ihn (him) and bleiben (stay) in (3a) and (3b) but also on the correctness or incorrectness of the sentences in (4):
(3) a. Ich bat ihn, bleiben zu wollen. 'I asked him if he would stay'.
    b. Ich bat ihn, bleiben zu dürfen. 'I asked him if I might stay'.
(4) a. Ich bat ihn, sich setzen zu wollen. 'I asked him if he would sit down'.
    b. Ich bat ihn, mich setzen zu dürfen. 'I asked him if I might sit down'.
    c. *Ich bat ihn, mich setzen zu wollen. 'I asked him if I would sit down'.
    d. *Ich bat ihn, sich setzen zu dürfen. 'I asked him if he might sit down'.
What all these and countless other examples have in common is that it is totally irrelevant to their heuristic role whether or not they have ever actually been used. In fact most ungrammatical sentences of this kind only occur when linguists produce them for purposes of analysis. What is then important is not whether the grammatical errors they contain have actually occurred spontaneously but whether the sentences are classified as well formed or not well formed by someone who knows the language. In the present paper I shall discuss some aspects of grammatically incorrect sentences uttered spontaneously in everyday language use. Obviously the problems that emerge will be of a different kind from those involved in the comparison of the sentences in (1)-(4) above: spontaneous grammatical deviation arises from particular factors of language production, which are by definition excluded from any role in a linguistic analysis based on facts such as those illustrated in (1)-(4). The analysis of 'spontaneously
incorrect' sentences belongs within the realm of psycholinguistics inasmuch as the errors they contain can give some clues to the particular mechanisms of language production, in which the abnormal case – in accordance with a general methodological principle – can lead to conclusions about the factors involved in normal functioning. Hypotheses thus set up must then of course be tested in an appropriate way. Beyond that I shall try to show that the phenomena involved in spontaneously produced incorrect sentences can be of interest in sorting out questions of the linguistic system proper. This fact is not surprising, since the essential factor in linguistic behaviour is linguistic competence, so that all phenomena of language production, even pathological phenomena, can be related to competence. It goes without saying that linguistic and psycholinguistic analyses of spontaneous error, if they are to be meaningful, can only be made against the background of significant hypotheses concerning the structure of the language in question.
The fact that numerous grammatically incorrect sentences are produced in spontaneous language activity is nothing new. False starts, anacolutha and rule violations are the rather rough and ready categories given in the linguistic literature. However, the spectrum of possible errors must be more than simply a chaos describable merely as a set of grammatical defects. I am convinced that little research would be required in order to establish that the defects themselves are subject to discernible principles.¹ Some already well-known steps have been made in this direction: the part played by syntactic complexity, in particular the depth of nesting, self-embedding, left and right branching have been examined as general factors, which, along with the limited capacity of short-term memory, exercise their own constraints on language activity (see for example Miller and Isard, 1964). They can be regarded as conditions for a whole class of spontaneous grammatical defects. The following examples are of a different kind: their defects cannot be accounted for by syntactic complexity.
2. Ordering errors
In each of the examples (5)-(9), sentence (a) was actually uttered in normal conversation, while (b) was the correct sentence intended. In most cases the speaker was unaware that he had made an error, but recognised it once it was pointed out.
(5)  a. Erstens dauert nicht jede Stunde vier Proben.
        'Firstly not every hour lasts four rehearsals'.
     b. Erstens dauert nicht jede Probe vier Stunden.
        'Firstly not every rehearsal lasts four hours'.
(6)  a. Er hat in Berlin drei Wochen im Tag gearbeitet.
        'He worked three weeks a day in Berlin'.
     b. Er hat in Berlin drei Tage in der Woche gearbeitet.
        'He worked three days a week in Berlin'.
(7)  a. Ich kann nur über die Teile kennen, die ich spreche.
        'I can only know about the parts I talk'.
     b. Ich kann nur über die Teile sprechen, die ich kenne.
        'I can only talk about the parts I know'.
(8)  a. Da müssen noch Einkünfte ausgeholt werden.
        ('Einkünfte' = income; 'ausgeholt' = questioned)
     b. Da müssen noch Auskünfte eingeholt werden.
        'Some more information has to be got'.
(9)  a. Dem muß man ja seine eigene Posterklärung ordnen.
        ('-erklärung' = explanation; 'ordnen' = order, arrange)
     b. Dem muß man ja seine eigene Postordnung erklären.
        'He even has to have his own mail regulations explained to him'.
Cases of this kind illustrate a phenomenon we can easily put our finger on: word exchange. The following observations may be made:
a. Word exchange is not arbitrary, but affects words of the same category: Stunde – Probe, Tag – Woche, sprechen – kennen, ein – aus, erklären – ordnen.
b. Word exchange has nothing to do with any syntactic permutation rule, and thus cannot be accounted for as a rule violation.
c. The morphological rules (and of course the phonological rules too) are applied after word exchange, so that there is no violation of congruence. Jede Probe vier Stunden is replaced by Jede Stunde vier Proben, not by jede Stunden vier Probe. The same applies to Tag – Woche and sprechen – kennen.
d. The meaning of the sentences, their semantic structure, is not involved in exchange. Not only were the speakers not trying to express the meanings of the (a) sentences ((6a) and (7a) in particular have no meaningful interpretation at all), but they often did not even notice that they had failed in their attempt to express the meaning of the (b) sentences.²
Word exchange is something which affects the sequential organisation of the sentences and takes place prior to the specification of the morphological surface features. Otherwise we would have got jede Stunden vier Probe. 'Prior to' cannot only have a logical meaning here:
word exchange must actually precede the morphological process in time. Since, on the other hand, word exchange does not affect the semantic structure, it can only be regarded as a disturbance in the process of relating semantic structure to syntactic surface structure as conditioned by transformation rules. The theoretical consequence of this is that ordering morphological and, particularly, inflexional processes after syntactic transformations is not only well motivated linguistically, but expresses a psychological reality. This is not particularly surprising. What appears to me to be important is the implication of word exchange for the relation between semantic representation and syntactic surface structure. It is clear at any rate – and this is equally trivial – that semantic interpretation cannot be determined directly by surface structure: (5a)-(9a) do not have surface structures compatible with the meaning actually realised by the speaker (and hearer). What is more interesting is a question that can be formulated in connection with cases like (5) and (6). Sentence (5a) contains three 'logical' elements: the negation nicht and the quantifiers jede and vier. Logical elements determine a semantically relevant scope of operation directly related to the surface structure. To explain this phenomenon Lakoff (1969) postulated 'global constraints', which restrict the possibilities of permutation of logical elements, while Jackendoff (1969) postulated rules for the semantic interpretation of surface structure. In (5a) word exchange has led to the confusion of elements from the different scopes of nicht jede and vier. This fact is irrelevant both to Lakoff's and Jackendoff's theories because it has nothing to do with delimiting scope. What is decisive, however, is the question whether exchange can take place unnoticed between quantifiers. For example, could (10) occur instead of (5b), or (11a) instead of an intended (11b)?
(10) Erstens dauern nicht vier Proben jede Stunde.
     'Firstly four rehearsals don't last every hour'.
(11) a. Erstens dauern nicht vier Proben viele Stunden.
        'Firstly four rehearsals don't last many hours'.
     b. Erstens dauern nicht viele Proben vier Stunden.
        'Firstly not many rehearsals last four hours'.
If word exchanges of the type (10) or (11a) do occur without their discrepancy with the intended semantics being noticed, then the assumption that semantics is determined by surface interpretation rules is quite implausible, since the necessary surface structure has not been formed. For Lakoff's theory on the other hand, the occurrence of such word exchanges would only mean that they violate the global constraints. If cases like (11a) do not occur, this would not confirm Jackendoff's
hypothesis, because their non-occurrence could be due not to the existence of surface interpretation rules but to the special role of global constraints, which would thus turn out to be resistant to word exchange. All I can do here is formulate the question, since I have not observed sentences with word exchange of the type (10) or (11a). But although I regard (10) as rather improbable, I would not find (11a) surprising as a defective realisation of (11b). What these considerations show is that systematic observation and analysis of spontaneously incorrect sentences can be drawn into service in the construction of linguistic theories.³
Also of interest to linguists is a certain problem arising from errors like Posterklärung ordnen in (9a). I have assumed that word exchange is at least restricted by the condition that the elements involved must belong to the same category. If this assumption is correct, then the intended noun Postordnung must, in the syntactic derivation process, have a representation in which the stem ordn is categorised as a verb, and the exchange with the verb erklär must involve this representation. Regarding our example at least, this assumption is very plausible. The grammatical treatment of nouns like Ordnung involves the controversy over the lexicalist versus the transformationalist hypothesis. The former, proposed by Chomsky (1968), assumes that nominalisations like Ordnung are contained in the lexicon as such, while the latter assumes that they are derived by transformations from an underlying verbal structure, in our example from X ordnet die Post. Defects such as in (9a) do not provide any direct evidence for or against the lexicalist hypothesis, but they do make it likely that sentence (9) must contain a representation with the categorisation [ordn]_V in its derivation. Any version of the lexicalist hypothesis which does not make at least this assumption is rendered highly implausible by (9).
The hypothesis outlined by Chomsky (1968) is at least compatible with such a categorisation. The cautionary remarks in Note 3 apply in this case also.
Word exchange allows some tentative conclusions to be drawn regarding the theory of language production. Word exchange is a precisely specified disturbance of the linear organisation of the syntactic elements of a sentence, and although it does not represent a violation of particular syntactic rules, it is determined by syntactic factors. Not just any words are exchanged, but only words of the same syntactic category.⁴ It has an analogy in the kind of phoneme exchange leading to sentences like Ich kann doch keine Plötenflatte spielen instead of Ich kann doch keine Flötenplatte spielen ('I can't play a flute record'). Here too we have a non-arbitrary disturbance in linear organisation. According to my observations such exchanges affect mainly word-initial consonant clusters.⁵ Like word exchange, this defect does not consist in the violation of specific
grammatical rules. Nor can it be regarded as a disturbance of motor articulation. Rather, like word exchange, it affects a more abstract level in the organisation of the utterance and is determined by phonemic and syntactic factors. These considerations strongly suggest that the mechanisms of speech production involve a process of linearization which presupposes the hierarchical and linear structure determined by the grammar, but is not identical with this structure. Misorderings which can be described in terms of linguistic structure, but which do not arise from incorrect application of grammatical rules, cannot otherwise be explained.⁶ The sentences considered here do not provide a basis for statements on the nature of the linearization process or its disturbances. I shall return to this problem below.
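The ordering claim in observation (c) of this section, that stems are exchanged before inflection is spelled out, can be made concrete with a small sketch. It is only an illustration under stated assumptions: the two-noun lexicon, the representation of a sentence as a list of slots, and the function names are all hypothetical and are not part of the analysis itself.

```python
# Toy model of observation (c): stems are exchanged first, inflection is
# spelled out afterwards. The mini-lexicon, the slot features and all
# function names are assumptions made for this illustration only.

NOUN_FORMS = {
    "Probe": {"sg": "Probe", "pl": "Proben"},
    "Stunde": {"sg": "Stunde", "pl": "Stunden"},
}

# A sentence frame: plain strings are fixed words, tuples are
# (noun stem, number feature required by the slot). This encodes (5b).
INTENDED = ["Erstens", "dauert", "nicht", "jede", ("Probe", "sg"),
            "vier", ("Stunde", "pl")]


def spell_out(frame):
    """Apply the (toy) morphology: inflect each stem for its slot's number."""
    words = []
    for item in frame:
        if isinstance(item, tuple):
            stem, number = item
            words.append(NOUN_FORMS[stem][number])
        else:
            words.append(item)
    return " ".join(words)


def exchange_before_morphology(frame, i, j):
    """Swap the stems in slots i and j, keep the slot features, then inflect."""
    swapped = list(frame)
    (stem_i, number_i), (stem_j, number_j) = frame[i], frame[j]
    swapped[i] = (stem_j, number_i)
    swapped[j] = (stem_i, number_j)
    return spell_out(swapped)


def exchange_after_morphology(frame, i, j):
    """Inflect first, then swap the finished word forms."""
    words = spell_out(frame).split()
    words[i], words[j] = words[j], words[i]
    return " ".join(words)


print(exchange_before_morphology(INTENDED, 4, 6))  # the attested slip (5a)
print(exchange_after_morphology(INTENDED, 4, 6))   # the unattested variant
```

Only the first ordering yields the attested jede Stunde vier Proben; exchanging finished word forms yields jede Stunden vier Probe, which is exactly the form the text reports as not occurring.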
3. Selection errors
The following examples illustrate quite a different kind of defect:
(12) a. Ist das das verpachtete Rad?
        'Is that the hired bike?'
     b. Ist das das verpfändete Rad?
        'Is that the pawned bike?'
(13) a. Er untersucht, ob die zwei nun derselbe sind oder zwei andere.
        'He's examining whether the two are the same or two others'.
     b. Er untersucht, ob die zwei nun derselbe sind oder zwei verschiedene.
        'He's examining whether the two are the same or different ones'.
(14) a. Sie wollen das tabu verschieben.
        'They want to change it taboo'.
     b. Sie wollen das partout verschieben.
        'They insist on postponing it'.
What we are dealing with here can be termed selection error. If the error is noticed, then we get sentences like this:
(15) Der hat so'n Ding geheiratet – ich meine geerbt.
     'He married – I mean inherited – this thing'.
The phenomenon itself at first seems quite trivial, but one or two things of interest can be said.
a. Selection errors arise because the phonemic representation of a wrong lexical item is selected. The item actually intended is not, however, replaced by some arbitrary one, but one which is phonemically or semantically related to it in some as yet unexplained way: verpfänden – verpachten; verschiedene – andere; heiraten – erben; tabu – partout.
b. A selection error is not a defect in the semantic structure of the sentence: as in the case of word exchange, the speaker (like the hearer) does not usually become aware that the intended meaning has not been expressed, and if he notices the error afterwards he can immediately provide the phonemic structure corresponding to the semantic structure.
c. The error affects a stage in the derivation that is early in the grammatical sense, namely the insertion of lexical items. It does not affect later rules.
d. A selection error is not a violation of a particular grammatical rule. One lexical item is displaced by another, but the two items are not related by any particular rule. At any rate, there is no violation of rules in the same sense as rules are violated in (2) or (4c) and (4d).
We may draw the following conclusions. Selection errors are disruptions in the mapping of semantic structure onto surface structure. They occur through a deviation in the assignment of the phonemic representation to a lexicalized complex of semantic elements. This deviation does not arise from a failure to apply a particular rule correctly or to apply it at all, as would be the case in e.g. *we have beed there instead of we have been there. Nor does it affect the syntactic organisation of the sentence. While word exchanges, for obvious reasons, can lead to incorrect syntactic surface structures – e.g. in (7a) – this is not generally the case with selection errors. Usually their deficiency can be recognized only by comparison with the correct version of the intended sentence. Selection errors are determined by semantic or phonemic relations within the whole system of lexical items, identity of syntactic category again being, it seems, a constant condition.
(In cases like (14), this assumption means that tabu is classified as an adverb, which is grammatically incorrect. The fact that both tabu and partout are loan words with morphologically irregular features may play a certain part here.) Whereas word exchange is a defect relative to the syntactic organisation of the sentence, selection error must be regarded as a defect relative to the lexical organisation of the language.⁷
The preliminary conclusion that can be drawn for linguistic theory is by no means new: lexical items are not an amorphous list of arbitrarily arranged or simply isolated entries, but are interrelated in a way which needs to be explained. In fact the Saussurian idea that langue is a system of mutually determining units concerns precisely this aspect of linguistic structure. This of course only states the problem. Its explication awaits the development of the necessary theoretical means.⁸ The following set of examples throws light on one aspect of inner-lexical
relations. (17a) follows (16a) as a correction but is itself defective.
(16) a. Nun mußt du noch folgendes vergessen.
        'Now you must forget the following'.
     b. Nun mußt du noch an folgendes denken.
        'Now you must remember the following'.
(17) a. Ich meine, du mußt folgendes nicht vergessen.
        'I mean you needn't forget the following'.
     b. Ich meine, du darfst folgendes nicht vergessen.
        'I mean you shouldn't forget the following'.
In (16a) we have a selection error of the kind discussed above: denken an is displaced by its antonym, vergessen. The fact that antonyms are semantically closely related and are consequently susceptible to selection error is well known.⁹ What is of interest is that the selection error here is not corrected by the choice of the right lexical item but by inserting a negation. However, the syntactic combination negation-plus-verb is treated like a unitary lexical item. Sentence (17a) shows this clearly; since the negation is not incorporated lexically within the verb, it should, according to the rules of German, relate to the superordinate modal verb müssen and not its subordinate infinitive vergessen. Thus the negation-plus-verb reads mußt-nicht: vergessen ('you needn't forget'), not mußt: nicht-vergessen ('you mustn't forget').¹⁰ In correcting the selection error the meaning of the missed, but appropriate, denken an is accordingly dissolved syntactically, but without any conclusions being drawn from this dissolution. Such conclusions would have led to a further correction, namely replacement of müssen ('must') by dürfen ('may').
This double status of (17a) has a perfectly natural explanation, if one assumes that the internal composition of the semantic structure of lexical items is in principle the same as that of synonymous syntactic constructions, but that certain regularities – for example the shift of the scope of negation with modal verbs – are suspended if the components of the semantic representation are combined to form a single lexical item. The relation between (16a) and (17a) thus renders quite plausible the assumption, made for independent reasons, that the meanings of individual words are structured on the same principles as those of syntactically more complex expressions. These considerations go beyond the phenomenon of selection error. They are based on a secondary phenomenon, namely spontaneous correction.
I should like to outline briefly a few more provisional ideas on selection error from the point of view of language use. The insertion of lexical items in order to verbalize an intended and thus cognitively represented semantic structure involves a selection process, which on the basis of the complex system of lexical rules — i.e. the mapping of semantic partial
structures onto phonemic ones – specifies phonemic representations. This process of selection, which is largely but not wholly determined by the given semantic representation, is in the case of selection error disrupted by intervening factors. The deviation thus caused does not lead to arbitrary defects but to malfunctions controlled by the semantic or phonemic organisation of the entire lexical system.¹¹ Thus the disruption again affects a general mechanism of language production, namely the selection or activation of specific partial structures stored in long-term memory. Again, the nature of this mechanism and of its disruption must be left open. The interaction of this mechanism with the linearization process mentioned above must take place within the limits of those linguistic structures which the grammatical rules can generate. Disturbances of the selection mechanism lead to the activation of the wrong units, and disturbances of the linearization mechanism lead to the wrong sequential arrangement of units already activated.
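The idea that a selection error picks out a neighbour of the intended item, related semantically or phonemically and belonging to the same syntactic category, can be sketched as a nearest-neighbour lookup in a toy lexicon. Everything specific here is an assumption made for illustration: the semantic feature sets are invented, spelling stands in for phonemic form, and adding the two similarity scores with equal weight is an arbitrary modelling choice rather than a claim about the actual mechanism.

```python
from difflib import SequenceMatcher

# Hypothetical mini-lexicon. The category labels follow the text; the
# semantic features are invented purely for this illustration.
LEXICON = {
    "verpfänden": {"cat": "V", "sem": {"transfer", "temporary", "security"}},
    "verpachten": {"cat": "V", "sem": {"transfer", "temporary", "rent"}},
    "heiraten": {"cat": "V", "sem": {"acquire", "kinship", "spouse"}},
    "erben": {"cat": "V", "sem": {"acquire", "kinship", "death"}},
    "Rad": {"cat": "N", "sem": {"vehicle", "wheel"}},
}


def semantic_overlap(a, b):
    """Jaccard overlap of the two (assumed) semantic feature sets."""
    sem_a, sem_b = LEXICON[a]["sem"], LEXICON[b]["sem"]
    return len(sem_a & sem_b) / len(sem_a | sem_b)


def form_similarity(a, b):
    """Crude stand-in for phonemic relatedness: similarity of spellings."""
    return SequenceMatcher(None, a, b).ratio()


def nearest_competitor(target):
    """Return the same-category item most closely related to the target,
    or None if the lexicon holds no other item of that category."""
    category = LEXICON[target]["cat"]
    candidates = [word for word, entry in LEXICON.items()
                  if word != target and entry["cat"] == category]
    if not candidates:
        return None
    return max(candidates,
               key=lambda word: semantic_overlap(target, word)
               + form_similarity(target, word))


print(nearest_competitor("verpfänden"))
print(nearest_competitor("heiraten"))
```

Restricting the candidates to the target's category before scoring mirrors the observation that identity of syntactic category appears to be a constant condition on selection errors; the scoring itself is only one of many conceivable ways to express "related in some as yet unexplained way".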
4. Blends
The following examples seem at first glance to belong to the category of selection error:
(18) a. Wir können auch eine Wette abgehen.
     b. Wir können auch eine Wette abschließen.
     c. Wir können auch eine Wette eingehen.
     b, c = 'Or we could make a bet'.
(19) a. Ich muß meine Liste noch verweitern.
     b. Ich muß meine Liste noch vergrößern.
     c. Ich muß meine Liste noch erweitern.
     b, c = 'I'll have to extend my list'.
The difference between these examples and those in the preceding sections is that here there are two possible corrections, (b) or (c). The (a) sentences do not contain a wrong word selection – the word verweitern does not even exist – but, rather, the contamination, or blending, of two word structures. This is clearer from the following sentences, in which the blend does not occur within the confines of a single surface word:
(20) a. Der Bereichsleiter hat die Anweisung. ergangen, daß ...
     b. Der Bereichsleiter hat die Anweisung erlassen, daß ...
     c. Vom Bereichsleiter ist die Anweisung ergangen, daß ...
     b, c = 'The head of department issued the instruction that ...'.
(21) a. ... und es stellt sich. fest: die merkt das gar nicht.
     b. ... und es stellt sich heraus: die merkt das gar nicht.
        '... and then it emerges that she doesn't notice it'.
     c. ... und ich stelle fest: die merkt das gar nicht.
        '... and I see that she doesn't notice it'.
(22) a. Damit ist es uns. in der Lage, die Unterschiede zu erklären.
     b. Damit ist es uns möglich, die Unterschiede zu erklären.
        'Thus it's possible for us to explain the differences'.
     c. Damit sind wir in der Lage, die Unterschiede zu erklären.
        'Thus we're in a position to explain the differences'.
(23) a. Für mich. kommt es ähnlich vor.
     b. Für mich ist das ähnlich.
     c. Mir kommt das ähnlich vor.
     b, c = 'To me it seems similar'.
In each of these examples two constructions are wrongly combined. The following points can be made:
a. The blended constructions – X erläßt Y and Y ergeht von X; X stellt fest daß S and S stellt sich heraus; es ist X möglich and X ist in der Lage; X ist für Y A and X kommt Y A vor – are in some as yet unexplained sense equivalent: they are expressions of the same intended message. This distinguishes them from the two components of a selection error, which are not equivalent in this sense.
b. The collision of the competing constructions occurs at an earlier stage of the generation of linguistic structure, namely when the semantic structure to be lexicalized is being constructed.
c. The disruption affects among other things the mechanism which selects lexical items. However, the right lexical item is not displaced by a wrong one, as in the case of selection error, but rather, two equally 'correct' but incompatible selections are made.
d. The compromise between the clashing constructions is manifested in the syntactic surface structure. The subsequent morphological rules operate correctly, as far as the defective surface structure allows. Thus es stellt sich fest is generated, and not *es stelle fest, etc.
e. Blends can only affect syntactically complex units. If both candidates were syntactically unstructured, then by definition a blend could not occur. Accordingly the blended words in (18) and (19) are necessarily syntactically complex.
The consequences of these considerations are not unimportant. Firstly, it is clear that the intended message, or more generally the cognitive structure forming the starting point for generating a linguistic structure, is not as a rule identical with a semantic structure. The competing sentences in (19b) and (19c), or (21b) and (21c), for example, are not semantically
equivalent: their semantic representations must be different. However, the intended message is obviously the same. That does not mean that the semantic structure and the assumed 'intended message' must be entities of qualitatively different kinds. Rather, one must assume that the intended message can only be understood with reference to the semantic structure, namely as a partially determined semantic representation, and therefore that it is not a cognitive entity independent of semantic structure. Nevertheless, blends compel us to assume that the two must be distinguished. On this assumption a blend must be understood as a disruption in the conversion of the intended message into a fully specified semantic structure.
Secondly, the compromise between the two potential constructions is arrived at only at a relatively late stage of the derivation. In other words, if the derivation of a sentence is given by the sequence of representations $D = (S_1, \ldots, S_n)$, where $S_1$ is semantic structure and $S_n$ syntactic surface structure, and where every $S_i$ is converted into $S_{i+1}$ either by a syntactic transformation or a lexical insertion operation, then a blend having the property of a merging of two derivations occurs only from a shallow structure $S_j$ onwards. The complete derivation of a blend must therefore consist of two anterior portions $A' = (S_{1'}, \ldots, S_{j'})$ and $A'' = (S_{1''}, \ldots, S_{j''})$, and a common end portion $E = (S_j, \ldots, S_n)$. The compromise between the blended sentences consists in the mapping of $S_{j'}$ and $S_{j''}$ onto the unitary but defective structure $S_j$. The derivation of a syntactic blend is therefore given by $D = (A', A'')\,E$, where $A'$ and $A''$ are parallel partial derivations. This conclusion is supported by the observation that the superficial similarity of, for example, (20b) and (20c) hides very different underlying structures, which only approach one another via a series of intermediate stages.
Thirdly, it follows that for the two blended constructions it must be possible to derive shallow structures which are sufficiently similar for a conflation of the two to be possible.¹² Similarity implies here that the structures $S_{j'}$ and $S_{j''}$ can be analysed into constituents in such a way that an identical constituent sequence $x_1, \ldots x_2, \ldots x_n$ occurs in both, interrupted by the elements which differ across the two structures ($\ldots a'_1, \ldots a'_2, \ldots$; $\ldots a''_1, \ldots a''_2, \ldots$). Example (24) illustrates this analysis with some simplification:
(24) a. Die betrifft. sich auf alle Parameter.
     b. Die betrifft alle Parameter.
        $x_1$ $a'_1$ $x_2$
     c. Die bezieht sich auf alle Parameter.
        $x_1$ $a''_1$ $a''_2$ $a''_3$ $x_2$
     b, c = 'It applies to all parameters'.
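The analysis of (24b) and (24c) into a shared constituent sequence and differing elements can be approximated mechanically by aligning the two word strings. The sketch below is a simplification under stated assumptions: words stand in for constituents, the alignment comes from Python's standard difflib module rather than from any grammatical analysis, and the function name is hypothetical.

```python
from difflib import SequenceMatcher


def analyse(first, second):
    """Split two competing constructions into the stretches they share
    (the x elements) and the words found in only one of them
    (the a' elements of `first`, the a'' elements of `second`)."""
    shared, only_first, only_second = [], [], []
    matcher = SequenceMatcher(None, first, second, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            shared.append(first[i1:i2])
        else:
            only_first.extend(first[i1:i2])
            only_second.extend(second[j1:j2])
    return shared, only_first, only_second


b = "Die betrifft alle Parameter".split()          # (24b)
c = "Die bezieht sich auf alle Parameter".split()  # (24c)

shared, a_prime, a_double_prime = analyse(b, c)
print(shared)          # the shared stretches: Die / alle Parameter
print(a_prime)         # betrifft
print(a_double_prime)  # bezieht sich auf
```

For this pair the alignment reproduces the labelling given in (24): Die and alle Parameter are shared, betrifft is the single differing element of (24b), and bezieht sich auf are the three differing elements of (24c). A string alignment of this kind will not in general coincide with a constituent analysis.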
The mapping of $S_{j'}$ and $S_{j''}$ onto $S_j$ leaves the $x$s unchanged, but enters at least one $a'$ and one $a''$ into $S_j$ (in the case of (24), $a'_1$ and $a''_2$ $a''_3$). With regard to the question whether the blend of $a'$ and $a''$ is arbitrary, the following fact is of interest: all the blends I have observed show a linear distribution of the reflexes of the two constructions in the surface structure. In examples (20a)-(24a) a full stop marks the point dividing the relics of the (b) sentences from those of the (c) sentences. Left of this point there occur only $a'$ elements and right of it only $a''$ elements. The balancing out of the two blended sentences is thus clearly organised from left to right: first $a'$ elements and then only $a''$ elements appear in the compromise structure. This principle of linearity in blends must be treated with caution, however. It emerges as a trivial consequence in all those cases in which $S_{j'}$ and $S_{j''}$ contain at most two $a'$ or $a''$ elements. Then $a'_1$ must precede $a''_2$ by definition, otherwise there would be no blend. Embedding of the kind $a'_1$, $a''_2$, $a'_3$ is only possible if the competing constructions contain at least three syntactic formatives. For example (20) this would produce the following variants:
(20) d. Der Bereichsleiter ist die Anweisung erlassen, ...
     e. Vom Bereichsleiter hat die Anweisung ergangen, ...
However, these two sentences could then no longer be analysed as blends, but would have to be regarded as selection errors (hat – ist) or simply as violations of the morphological rule for the selection of the perfect tense auxiliary. An examination of other blends produces similar results: as a rule the character of a blend disappears when $a'$ and $a''$ are nested. A possible counterexample would be:
     Mir ist das ähnlich vor.
with the structure $a''_1$ $a'_2$ $x_1$ $a''_3$ instead of (23a), which actually did occur, with the structure $a'_1$ $a''_2$ $x_1$ $a''_3$. However, even if the principle of linear concatenation is valid and the nesting of $a'$ and $a''$ is excluded, a blend cannot be regarded as a linear switch from one construction to another, competing construction. Linear switching only affects the formation of a compromise structure at the stage $S_j$ of the derivation, not the whole derivation and certainly not the source of the blend. Until a compromise is formed a blend must be based on the simultaneous derivation of two structures. This is shown particularly clearly by the following example:
(25) a. Es hat den Eindruck, nein es macht den Charakter einer Bibliothek.
     b. Es hat den Charakter einer Bibliothek.
        'It has the character of a library'.
     c. Es macht den Eindruck einer Bibliothek.
        'It gives the impression of a library'.
Here two competing constructions give rise to the compromise structures $a'_1$ $a''_2$ and $a''_1$ $a'_2$, one after the other. The necessity of simultaneous derivations $A'$ and $A''$ arises from the difference between the deeper structures in cases like (20): X erläßt Y and Y ergeht von X, which would rule out the transition from $a'$ to $a''$ elements manifested in the surface structure, even in the reverse order.
It is clear that blends can only be analysed in the framework of a linguistic theory which provides for the transformational derivation of the structure of spoken utterances from abstract underlying representations. The formal characterisation of a blend as a defective derivation $D = (A', A'')\,E$, with two competing initial sequences $A'$ and $A''$, is only possible within such a theory. Without these – or equivalent – means, blends not only cannot be explained, they cannot even be described.
Blending itself does not seem to have any special implications that would lead to particular assumptions within linguistic theory. However, let us consider one or two questions of language production. The actual defect manifested in a blend must arise in the setting up of a semantic representation suited to the 'intended' message. The assumption that this process can be disrupted implies the assumption that it exists.¹³ Its nature, and that of its disruption, can be isolated a little more precisely. The setting up of a semantic structure can as a first approximation be described as an appropriate selection from a subset of all possible semantic representations. This set is determined firstly by the intended message itself (somebody who wants to say something about the relation between smoking and cancer does not first have to take account of the meaning of the sentence Chagall's paintings are not incomprehensible), and secondly by the lexically preformed partial structures, i.e. by the possibility of mapping the semantic representation constructed onto phonemically encoded and syntactically connected items by means of the lexical rules of the given language. In short, the setting up of a semantic structure is an unconscious selection from a set of possible structures restricted in the above sense. What is necessary is the (at least partial) generation of a set of 'intentionally equivalent' structures and the selection – or differentiation in the sense mentioned in Note 9 – of one structure from this set. Given these conditions, a blend can be regarded as a fault in the selection mechanism, except that in this case it is not one wrong selection that is made, but two right ones. In other words the selection is not wrong but insufficiently precise. Furthermore, it seems important that blends and selection errors affect different stages or aspects of the selection process: I
have attempted to show that selection errors, as distinct from blends, do not affect the setting up of semantic structure.¹⁴
Blends differ from selection errors in two further important respects. Firstly, the influence of a blend does not end with the selection defect. Whereas a structure containing a selection error behaves as though it were correct with respect to the application of subsequent grammatical rules, a blend leads to the overlapping of two simultaneous derivational structures, namely $A'$ and $A''$. The fact that such a partially duplicated derivation can be produced and cognitively processed by the speaker – and incidentally by the hearer, who corrects it automatically – is not surprising in view of the general complexity of language strategies. The evidence for this provided by blends is, however, not without interest. Secondly, the balancing out of a blend in the transfer to the compromise structure $S_j$ involves a further process which has no analogy in the production of a selection error. Depending on the syntactic conditions, but not on the basis of any grammatical rule, the structures $S_{j'}$ and $S_{j''}$ are mapped onto $S_j$. If the above statements of the linear character of this compromise structure are accurate, it seems likely that this process can be regarded as an effect of the linearization mechanism which I have suggested underlies word exchange. If all these speculations are acceptable at least in principle, then blends can be described within the framework of an appropriate speaker strategy as a disturbance of the selection mechanism balanced out subsequently by an irregular linearization process.
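The linearity observation made earlier in this section, that a blend shows material of one construction to the left of a dividing point and material of the other to its right, can be checked on word strings with a very simple test. This is a rough sketch under stated assumptions: punctuation is dropped, words stand in for syntactic formatives, shared elements are allowed on either side of the point, and the function name is hypothetical.

```python
def switch_points(blend, first, second):
    """Return every position k such that the first k words of the blend
    are the first k words of `first`, and the remaining words of the
    blend are the final words of `second`."""
    points = []
    for k in range(1, len(blend)):
        tail = len(blend) - k
        if k > len(first) or tail > len(second):
            continue
        if (blend[:k] == first[:k]
                and blend[k:] == second[len(second) - tail:]):
            points.append(k)
    return points


# (24): the division falls after "betrifft".
print(switch_points(
    "Die betrifft sich auf alle Parameter".split(),
    "Die betrifft alle Parameter".split(),
    "Die bezieht sich auf alle Parameter".split()))

# (22): the division falls after "uns".
print(switch_points(
    "Damit ist es uns in der Lage".split(),
    "Damit ist es uns möglich".split(),
    "Damit sind wir in der Lage".split()))

# (20d), the nested variant: no dividing point exists.
print(switch_points(
    "Der Bereichsleiter ist die Anweisung erlassen".split(),
    "Der Bereichsleiter hat die Anweisung erlassen".split(),
    "Vom Bereichsleiter ist die Anweisung ergangen".split()))
```

Where the two constructions share words around the division, as die Anweisung in (20a), several adjacent positions satisfy the test, so the dividing point is determined only up to the shared material. The test also requires exact word identity and therefore fails on (23a), where es appears in place of das.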
5. Some general remarks
A comparison of the three types of spontaneous error I have discussed gives grounds for the following statements:
a. Word exchange shows an incorrect ordering of the elements of a given sentence, and the defect only takes place within the structure of a single sentence.
b. The morphological and phonological processes are not affected by any of the three types of error (word exchanges, and possibly selection errors and blends also, do, however, have phonemic analogies).
c. None of the three types of defect is due to the violation of any particular grammatical rule. However, word exchanges sometimes, and blends necessarily, lead to grammatically incorrect sentences, whereas this is not the case with selection errors. Rather, the latter, as a rule, are only errors with regard to the intended semantic structure.
I have attempted to explain these three types of error as defects in the working of two general mechanisms involved in the production of
language: word exchanges arise from a disturbance in the mechanism of linear organisation, selection errors and blends from various disturbances in the selection mechanism. Blends have consequences in the subsequent derivation and lead to a secondary disturbance in the linearization mechanism. The two basic mechanisms I have invoked, selection and linear organisation, are not grammatical regularities, but rather processes operating on linguistic structures. They make it possible to produce and process structures which can be generated on the basis of internalised grammatical rules. For a theory of speech production we may, with caution, draw the following conclusions. The production of an utterance begins with the selection or singling-out of one semantic representation from a system of semantic structures which are appropriate to the speaker's intention and can all be lexicalised. For this representation a derivation is generated which, by syntactic transformations and phonemic structure selection, determines a syntactic surface structure. The syntactic formatives which enter into this derivation undergo linear organisation during the derivation according to their syntactic relations. The application of morphological and phonological rules gives the surface structure thus generated a phonetic representation, which then controls the innervation of the speech organs. A further remark concerns the typology of language error. Three further types of error are distinguished from that class of errors arising from limitations of working memory, i.e. in particular anacolutha, sentences which on account of too great a complexity of the syntactic plan¹⁵ are not ended or are ended wrongly. I have attempted to show that these types of error can be regarded as disturbances in the working of two general mechanisms. This says nothing about the source of these disturbances. What in this context seems to be merely fortuitous may in fact be explicable at some other level.
Selection errors, for example, may involve causes of the type described by Freud, although this aspect should not be overestimated; most of the examples discussed here hardly contain such factors. The types of error are by no means exhausted. None of the categories set up would, for example, include I've er I've er I've er been trying... (even interruptions and false starts of this sort seem to be tied closely to the linguistic structure). Furthermore, we should not overlook the possibility of various combinations of the types of error discussed. Nevertheless it so far emerges quite clearly that not only deliberate deviations, such as jocular or poetic ones, but also quite unintentional deviations from well-formedness are not an amorphous mass of phenomena but allow and require systematic analysis.
6. Analogies from aphasia
Some new light is thrown on the solutions I have proposed when the phenomena discussed are considered in connection with certain phenomena of aphasia and alexia. In Weigl and Bierwisch (1970) arguments are advanced for regarding aphasia in general as a disturbance of the central mechanisms of speech production, not as a loss of linguistic competence. If selection and linear organisation actually play the part I have been assuming in language production, then their disruption must occur in various forms of aphasia. One does not necessarily have to agree with Jakobson's (1960) proposal for classifying all types of aphasia from the point of view of disturbances of similarity and contiguity in order to recognise the important role played by the disturbance of these mechanisms. Selection disturbances are widespread within the field of aphasic and alexic disorders. The selection errors described above have particularly close similarities with the following types of case: instead of Wir haben gestern das Museum besucht ('We went to the museum yesterday') the mildly alexic patient M reads Wir haben gestern den Zoo besucht ('We went to the zoo yesterday'). Instead of Ich erlaube es ('I allow it') she reads Ich verbiete es ('I forbid it'). Another patient, presented with the word Tüte ('paper bag') first reads Eimer ('bucket') and then Tür ('door'). In the latter case the error affects first the semantic structure, then the graphemic/phonemic structure. For many further instances of semantically conditioned selection disturbances cf. Weigl (1970). Particularly interesting cases of linearization disturbance are described by Luria and Tsvetkova (1968) in connection with 'dynamic aphasia'. These patients can name objects, produce single words spontaneously and repeat whole sentences, but they cannot construct sentences spontaneously.
Yet if they are shown blank cards representing the individual words of a sentence, then their linear arrangement enables them to form correctly the sentence they otherwise cannot construct. Other types of aphasia do not show disturbance of linear organisation with the same clarity and in such relative isolation. But it is more than a bold speculation to assume that it plays a major part in a number of aphasic symptoms. What these fragmentary remarks and examples are meant to show is that the types of linguistic malfunction discussed above are due to occasional disturbances of the same mechanisms which in various forms of aphasia are disturbed on a long-term basis due to cerebral disorders.
7. Concluding remarks
The analysis and explanations of spontaneously incorrect sentences I have put forward are very tentative, and are only a first step into relatively unknown and complex territory. Their purpose is merely to show that spontaneous errors deserve both linguistic and psycholinguistic interest. That is why I have preferred to draw conclusions which may be too bold rather than restrict myself to recording observations. It may be argued that I have gone too far in the conclusions I have based on individual random observations. However, despite the provisional nature of my assumptions it can also be argued firstly that the kind of phenomena considered here are far more widespread than it would seem from the examples given, and secondly that the fact that they occur at all is more important than their experimental reproduction or their quantitative evaluation. Finally, a word of caution. I see no possibility at the moment of producing spontaneous errors in a way that would allow experimental control. Authentic data which allow legitimate conclusions have to be collected where they happen to be observed. This field of study exercises its own tyranny: whoever is haunted by an interest in error is unable to take part in any conversation or listen to any lecture without welcoming every mistake with acute interest and pleasure. Without this special, separate attention, the great majority of instances will escape him via his own correction mechanism.
Postscript
The foregoing is the English version¹⁶ of a paper written more than ten years ago and published in the early period of the rapidly growing interest in the analysis of speech errors. As some of the observations the paper contains might still be worthy of notice, it seems justified to make it available in English. It is obvious, on the other hand, that during the past ten years considerable changes have taken place, both with respect to the linguistic problems and assumptions and with respect to the concern with speech errors. Thus the issue of global constraints, for instance, contemplated in section 2 above, is no longer a real problem in linguistic theory. Appropriate qualifications or modifications apply to some of the other grammatical topics. Similarly, new questions and proposals have been put forward with respect to the theory of speech production in connection with evidence from speech errors. Instead of adapting the paper to the present state as I see it, which would amount in fact to
writing a new paper, it seems to me more reasonable to leave it as it stands and instead to add a few remarks and further observations pursuing some of the basic ideas, which I still take to be essentially correct, regardless of developments since then. Besides the fact that these remarks implicitly indicate where the contents of the above paper are in need of revision or clarification, they provide additional motivation for publishing it in an English version: the following remarks make sense only against the background of considerations underlying the above paper. Before going into details, I should therefore pin down the points that seem to me relevant in the above considerations.
(i) From a descriptive point of view, speech errors fall into a small number of types, among which sequential, selectional, and contamination errors can be sorted out in a natural way.
(ii) Speech errors are grammatically principled, i.e. what is wrong with an erroneous utterance can be identified in terms of units or configurations of grammatical representations.
(iii) Speech errors result from accidental interferences in underlying processing mechanisms, such as selective differentiation and sequential ordering.
I have argued above (and I will continue to argue) that (ii) and (iii) are substantially independent facts; this is an empirically warranted claim, not a logical necessity. Point (i), on the other hand, should follow from (ii) and (iii), if these are worked out in sufficient detail and clarity. I don't have such an account to offer, although I will consider a few steps in that direction.
1. Non-specific aspects of speech errors
There is increasing evidence coming from various fields of investigation that linguistic structures, just like other domains of cognitive capacities, are in crucial respects determined by fairly specific underlying principles, so that there is a modular character not only of the overall system of mental structures, but also of the linguistic system itself. (See Chomsky (1980) for general discussion.) Seen in this perspective, the speech errors discussed above are crucially related to structures determined by the syntactic and lexical system of language. That is what makes them linguistically principled in the sense of (ii). As I tried to make clear, however, they cannot be explained by reference to specific syntactic or lexical rules. The same is true for speech errors on the phonological level, to which I have referred occasionally and which in the meanwhile have been subject to extensive study. Although syntactic and sound structures
are organised according to partially independent, autonomous principles, they are still aspects of language, and the fact that investigators have been largely concerned with errors in speech might lead to the impression that we are dealing with effects that belong strictly within the domain of language structure. My first point is therefore to generalise certain characteristics of speech errors to similar phenomena in other domains of behaviour. For the sake of illustration, consider first the well-known phenomena of typing errors. Although intimately related to linguistic structures and processes, typing is still not a fully integrated part of linguistic capacity, for obvious reasons: just like handwriting, it is not a result of the natural, spontaneous processes of language acquisition, although it is, of course, within the range of facultative abilities emerging on appropriate exposure. It is, moreover, based on a knowledge structure organised in part according to rules and principles different from those of language. In a rather oversimplified way, its structural aspect can be characterised by a mapping from graphemic units into motoric patterns determined by the makeup of the keyboard. This mapping presupposes, incidentally, the rules and representations of graphemic structure, which are not part of the primary linguistic knowledge either. (See Bierwisch (1972) for a preliminary discussion of this aspect.) The crucial point in the present context is the fact that in spite of the different character of the structures involved, 'slips of the finger' exhibit properties similar to slips of the tongue. Very much like speech errors, typing errors can be traced to disturbances in sequential order and selective differentiation. Accordingly, the majority of typing errors fall into the two types of sequential and selective errors. (As with phonological errors, there are difficulties with the diagnosis of contamination.) 
It is interesting in this respect to look a bit closer at the similarities and differences between speech and typing errors. Consider the following mistypings:
(1) Olkativ (instead of Lokativ)
(2) betrychtet (instead of betrachtet)
(3) Enerkie (instead of Energie)
Clearly, (1) is a case of disturbed sequence, while (2) and (3) are selectional errors. More specifically, the following observations are to be added. a. While speech errors are governed by principles of sound structure, so that e.g. syllables or word-initial clusters are subject to permutation, or vowels are replaced by vowels, leading in general to false, but pronounceable, sequences, that is not generally the case for typing errors. Their outcome is instead determined by conditions on 'typability' and the mapping function underlying the grapheme-keyboard correspondence.
Thus (1), although a typical instance of disturbed typing order, would scarcely have a chance of occurring as a speech error. b. Just like speech errors, disturbed typing can be related to components of segments rather than whole segments. See e.g. Fromkin (1970) for a discussion of the role of phonetic features in this respect. The point to be noted is that phonetic components are completely different from 'typing components': in (1) e.g. the feature 'switch to capital' is preserved in its linear position, while the first and second graphemes are inverted (something that by definition has no parallel in speech). Similarly in (2), the feature 'leftmost column of the keyboard' is preserved, while the feature 'second row' went astray. (On the German keyboard, a and y are neighbours in the same column.) c. Although related to properties of typing structure (whatever that would eventually turn out to be), typing errors are by no means isolated from other aspects of structural organisation, as a comparison of (2) and (3) clearly shows: while (2) is a selection error with respect to 'typing features', the g/k error in (3) is clearly dependent on phonetic features. (g and k are not neighbours on the keyboard at all.) I will expand on this problem (interference from different layers of structure) in the next section. Let me conclude these remarks on typing with the observation that not only the frequency, but also the qualitative character, of typing errors is clearly dependent on individual skills and manner of typing and therefore subject to considerable interindividual variation. (I will return to this point below.) What is important in the present context, however, is the fact that typing errors can plausibly be accounted for by the interaction of characteristic structural properties with accidental disturbance with respect to general mechanisms of selection and sequential organisation.
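The feature analysis of (1) to (3) can be made concrete with a small sketch. The following Python fragment is an illustration only and not part of the original analysis: it assumes an idealised German QWERTZ layout reduced to three letter rows, treats row and column indices as the 'typing features', and ignores the physical stagger of real keyboards. All names (ROWS, key_features, compare_keys, case_pattern, is_adjacent_swap) are hypothetical.

```python
# Assumption: simplified German QWERTZ layout, letter rows only, no stagger.
ROWS = ["qwertzuiopü", "asdfghjklöä", "yxcvbnm"]


def key_features(ch):
    """Return the (row, column) 'typing features' of a letter."""
    ch = ch.lower()
    for row, keys in enumerate(ROWS):
        col = keys.find(ch)
        if col != -1:
            return row, col
    raise ValueError(f"{ch!r} is not on the simplified layout")


def compare_keys(intended, typed):
    """Report which typing features the intended and the typed key share."""
    (r1, c1), (r2, c2) = key_features(intended), key_features(typed)
    return {
        "same_row": r1 == r2,
        "same_column": c1 == c2,
        # Neighbours: at most one step apart in both row and column.
        "adjacent": abs(r1 - r2) <= 1 and abs(c1 - c2) <= 1,
    }


def case_pattern(word):
    """The 'switch to capital' feature, position by position."""
    return [ch.isupper() for ch in word]


def is_adjacent_swap(intended, typed):
    """True if typed is intended with two neighbouring letters exchanged
    (letter identity only; capitalisation is compared separately)."""
    a, b = intended.lower(), typed.lower()
    if len(a) != len(b):
        return False
    diffs = [i for i in range(len(a)) if a[i] != b[i]]
    return (
        len(diffs) == 2
        and diffs[1] == diffs[0] + 1
        and a[diffs[0]] == b[diffs[1]]
        and a[diffs[1]] == b[diffs[0]]
    )


if __name__ == "__main__":
    print(is_adjacent_swap("Lokativ", "Olkativ"))   # example (1)
    print(compare_keys("a", "y"))                   # example (2)
    print(compare_keys("g", "k"))                   # example (3)
```

Under these assumptions, a and y share a column but not a row, whereas g and k share a row but lie three keys apart, so only (2) can be attributed to a neighbouring key. Olkativ differs from Lokativ by an exchange of the first two letters, while the capitalisation pattern stays in place.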
To bring out the generality of this point as far as possible, let me briefly turn to a rather different domain, viz. practical behaviour. Suppose that the many varieties of practical action can be characterised, as a first approximation, by underlying plans in the sense of Miller, Galanter and Pribram (1960), i.e. by ordered hierarchies built up in terms of so-called TOTE-units. It goes without saying that these plans are subject to fairly different principles, i.e. they are based on different categories and specific kinds of relations, but nevertheless they share certain properties with syntactic phrase markers, and their execution again seems to be mediated by mechanisms of selective differentiation and sequential organisation. Accordingly, practical errors fall into the same types as speech errors. Here are some examples:
(4) A, standing near the door of B's office, ends a conversation with B, in order to go to C across the corridor. Leaving B's office, A knocks on the inside of B's door.
Here, a proper part of the action constituent 'go to see C' in A's plan has been realised at the wrong place, just like a word in a sequentially disturbed syntactic context.
(5) In order to get rid of the ashes of his cigarette, A reaches for the sugar bowl instead of the ashtray beside it.
(6) Instead of looking for the key in order to open the door, A finds himself opening his coin purse and sorting coins.
These are ordinary everyday slips of action which are easily classified as disturbed selection, with 'using the key' and 'using the coin purse' as concurring alternative action constituents, and ashtray and sugar bowl as intended objects belonging to competing subplans.
(7) A and B are engaged in a conversation in A's office, when C enters, whom A knows, but B does not. A gets up and introduces himself to C.
This comes very close to the contamination type of speech errors. What A intended was to welcome C and to introduce her to B, two appropriate (sub)plans, here merged into one inappropriate action. These are scattered observations, but ones which could easily be multiplied. In spite of their anecdotal character, the following conjectures about such slips of action seem to be justified:
a. Although we have only vague ideas about the formal nature of plans for practical (or social) action, it still seems to be warranted that errors, just like slips of the tongue, are structurally principled in terms of constituents, units, and even feature-like entities: e.g. sugar bowl and ashtray are containers of similar size, differing in purpose; knocking at the door and opening it are contiguous constituents, although in the wrong order in (4).
b. Again, the structural determinants in the range of which errors are possible can belong to different levels of structure: perceptual, motoric schemes and patterns of aims and goals are jointly or alternatively relevant for errors like (3) to (7).
It would be interesting to look at other domains of behaviour, notably those with highly automatised skills comparable to linguistic behaviour. I strongly suspect that slips in, e.g., playing musical instruments, car driving (attention, here errors can be fatal!), or various kinds of craftsmanship exhibit similar properties. In any case, the domains I have briefly illustrated support the following conclusions:
The typology of speech errors supposed in (i) extends to a larger variety of domains of behaviour, the common traits being due to fairly general processes of sequential organisation and selective differentiation, or rather to accidental disturbances of these processes. The particular properties that differentiate speech errors from those of other domains, and also phonological from syntactic or lexical errors, result from the structural specificity of the various behavioural systems. Let me conclude this section with two further remarks. First, the degree of automatisation in different domains of behaviour may vary considerably, leading to qualitative and quantitative differences with respect to error phenomena. In a similar vein, some people seem to be characteristically susceptible to certain types of errors, even across various domains. Thus, the somewhat exotic error in (7) was produced by someone who also comes up with fairly bizarre speech errors. Secondly, pursuing the remarks on speech errors and aphasia in the above paper in the light of the present considerations, these might be extended to other domains: just as speech errors are related to certain aphasic phenomena, errors in practical action might be related to certain phenomena of apraxia due to particular brain lesions. These considerations require careful study before they can be taken seriously. The crucial point, however, is the perspective they suggest: speech errors as well as aphasic symptoms are neither independent of nor directly and exclusively related to linguistic structures. They are rather determined by domain-specific structural characteristics refracting or channelling the effects of nonspecific mechanisms of activation and organisation.
2. Interaction of linguistic and non-linguistic structures
The next point to be made pursues the interaction of linguistic and other factors in a different direction. Whereas the former section concerned the distinction between mental structures — especially linguistic representations — and processing mechanisms, i.e. the relation between the assumptions (i), (ii) and (iii), I am now interested in the interaction of structural aspects belonging to different domains. To begin with, ordinary linguistic utterances, though crucially determined by the speaker's tacit grammatical knowledge, participate at the same time in quite a number of additional systems of mental representations. First of all, the actual meaning of an utterance is determined by its grammatical structure together with factual or non-linguistic conceptual knowledge in terms of which mental models of the situation to which an utterance refers might be represented. Second, an utterance may
serve one or another communicative or illocutionary intent, thus participating in the system of representations in terms of which social interaction is organised. It may exhibit, moreover, the speaker's emotional state in a more or less systematic manner, e.g. by vocal register, speed of articulation, etc. In short, a natural linguistic utterance is the outcome of the interaction of various systems of mental structure, both linguistic and non-linguistic, and it is quite reasonable to suspect that speech errors accordingly affect various types of structural representation. Besides the proper linguistic aspects of mental representations, I am particularly interested in the conceptual aspect of speech errors, which usually relates to conditions of the contextual or situational setting of the utterance in question. To illustrate the point, let us consider the following utterance, which is part of a description of a route that involves the crossing of a river:
(8) Da kommt man ohne Brücke nicht rüber – eh – es fährt also eine Fähre rüber. 'There you cannot cross without a bridge – ah – there is a ferryboat running'.
The first half of this utterance is a perfectly grammatical, meaningful, and even sensible sentence, but it does not express the intended idea, viz. that a ferry, not a bridge, is used to cross the river. Applying the notions developed so far, (8) should be classified as the result of a spontaneous selection error, this time on the level of conceptual representation, picking out a wrong, although related, thought. Though such relatively pure 'slips of thought' might not be very frequent, they are a real possibility, as (8) shows. A much wider variety of errors depends on both conceptual and linguistic, especially lexical, structure. That brings me to the point I am after in this section: given the fact that interacting systems of representation jointly determine the structure of normal utterances, it is no longer plausible to assume that errors of the type under consideration are generally determined by just one level or aspect of structural representation. Rather, they may be dependent on two (or even more) sets of structural conditions. Consider from this point of view the following examples:
(9) Ich hab dazu kürzlich mal 'n Vortrag gehalten – eh gehört. 'About that I recently gave – ah heard – a lecture'.
(10) Das sind Fragen, die nicht ganz {offen | geklärt} sind. 'These are questions that are not completely {open | clear}'.
These are typical instances of lexical selection error. According to overwhelming evidence, such errors (if not phonetically related, like say treffen instead of sprechen etc.) are usually related in terms of semantic components, like langsam instead of schnell ('slow' versus 'fast', i.e. antonymy), weniger instead of ärmer ('less' versus 'poorer', heteronymy), bringen instead of kommen ('bring' versus 'come', causativity) etc. Now halten and hören ('keep' and 'hear', or rather 'give' and 'hear' (a paper)) are not related in that way, unless the contextual interpretation is taken into account; but the verbs are in fact antonymous with respect to this particular setting. Similar remarks apply to (10): offen and ungeklärt ('open' and 'unclear') are not synonymous, except in the particular conceptual field comprising questions, problems, issues, etc. The same point can be made with regard to contaminations like the following:
(11) 'Lexikon' ist ein Terminus, der sich eingebildet hat. '"Lexicon" is a term that has become familiar/emerged'. (Here 'eingebildet' is a blend of 'eingebürgert' and 'herausgebildet'.)
(12) Die Musik ist uns mit der Muttermilch eingesogen worden. 'Music has been instilled in us from birth'. (Here 'ist uns ... eingesogen' is a blend of 'haben wir ... eingesogen' and 'ist uns ... eingeflößt worden'.)
As observed above, blends usually depend on some kind of equivalence between the blended units or constructions. In most cases, this equivalence does not turn on true semantic synonymy, but is rather established with respect to the conceptual structure in terms of which the utterance relates to the situation talked about. Thus, sich einbürgern and sich herausbilden in (11) are not semantically equivalent in themselves (in fact, their interpretation would not be related at all in many contexts); they are genuine rivals only with respect to the particular setting associated with (11). In a similar vein, given almost any subject other than the one in (12), the two constructions blended there would not be equivalent. From these observations, the following conclusions are to be drawn: a. A distinction has to be made between the grammatically determined semantic structure, including lexical and structural aspects of utterance meaning, and the conceptually determined interpretation, in terms of which an utterance relates to the subject matter talked about. What is usually understood by the 'meaning' of an utterance is the joint product of its semantic and conceptual representation. b. Speech errors related to the meaning of an utterance may be determined by semantic or by conceptual constraints or by both. More specifically, disturbed selectional differentiation, cause of either selection
errors or blends, can be an aberration within the structures of lexical organisation as well as within conceptual networks (whatever these may turn out to be). c. Because semantic and conceptual representations are closely interrelated in the structure of concrete utterances, speech errors can be determined simultaneously by both of them. The last point immediately generalises to other levels of structure: speech errors might be simultaneously constrained not only by linguistic and non-linguistic aspects of meaning, but also by semantic and nonsemantic levels of linguistic structure. I have just discussed the particularly intricate phenomena involved in blends from the point of view of types of meaning. But semantic and phonological, or syntactic and phonological, conditions might also closely interact in producing or constraining speech errors. It is, for example, by no means obvious to what extent the phonological similarity between gehalten and gehört (at least gehV is shared phonological structure) is involved in the speech error in (9). In a somewhat different way, sequential errors on the syntactic level may simultaneously be constrained by phonetic conditions. Thus it has been pointed out to me by John Pheby that an English utterance like (13) is less likely to occur than its (actually observed) German counterpart (14):
(13) Firstly not every hour lasts four rehearsals.
(14) Erstens dauert nicht jede Stunde vier Proben.
Besides the syntactic conditions discussed above, (14) is subject to certain rhythmical constraints that do not hold for its English counterpart: while the exchanged stems Stunde and Probe in German have the same metrical structure, their English counterparts hour and rehearsal are rather different in this respect. Whatever the cogency of this example might be, syntactic and phonetic interdependencies of this kind are certainly not peripheral. They are, in fact, intimately tied to the very nature of the process of temporal sequentialisation. I will briefly return to that problem in 3.3. To summarise, speech errors are not only causally dependent on underlying mechanisms that do not appear to be domain-specific, but they may also be related to various levels of structure — including nonlinguistic ones — because of the partial interdependence of the mental systems involved. Due to the interaction of the general processing mechanisms with specific structural representations, speech errors, nevertheless, exhibit domain-specific properties.
3. Special problems
3.1 The foregoing remarks should be taken as a note of caution against drawing conclusions about grammatical structure too directly from speech error data. Nevertheless, speech errors might still provide important evidence with respect to purely linguistic issues. Thus the fact that phonetic, morphological, syntactic, semantic, and even conceptual features or components all function in the same way with respect to selectional differentiation, or its eventual disturbance, very much supports the assumption that they have a similar status in some sense. In other words, if phonetic segments are assumed to be structured in terms of phonological features, then by the same token lexical units should be assumed to be built up from semantic components. Evidence of this kind is fairly indirect and must hence be carefully evaluated against other data. But it must not be discarded, if it can help to decide otherwise unsettled issues. It should be noted, on the other hand, that evidence of the type provided by speech errors (or other phenomena of language use) is not a necessary condition for ascribing psychological reality to properties of linguistic representations motivated by independent linguistic evidence such as intuitions about grammaticality, ambiguity, etc. In fact, linguistic hypotheses, insofar as they are empirical assumptions, are hypotheses about a particular aspect of mental structures. It is for this very reason that the analysis of speech errors bears on assumptions about grammatical or lexical structure. With these considerations in mind, I will turn first to the role of semantic components.
More specifically, I will discuss a particularly intriguing phenomenon that shows up in selection errors involving prefixes in German. From a grammatical point of view, German prefixes come in two categories, call them A and B. Category A comprises six elements (be-, ent-, er-, ge-, ver-, and zer-); category B is a somewhat larger group, including ab, an, auf, bei, ein and a number of others. Three properties distinguish the A and B elements:
(i) A-elements are unstressed, B-elements stressed. Thus we have Bestánd, Verstánd, entstéhen, gestéhen, but Ánstand, Ábstand, Úmstand, béistehen, éinstehen, zústehen, etc.
(ii) If attached to verbs, A-prefixes are inseparable, while B-prefixes get separated from the stem under certain well-defined syntactic conditions. Thus we have er bestand es, but er stand ihm bei for bestehen and beistehen, respectively.
(iii) While A-prefixes have practically no autonomous, invariant semantic content, B-prefixes have a relatively well-defined semantic characterisation, in part because they are closely related to homophonous prepositions or prepositional adverbs.
Properties (i) and (ii) are fairly clearcut and stable,¹⁷ whereas (iii) does not mark a sharp boundary, but rather a more or less definite tendency. In any event, while elements like auf and ab, ein and aus can be characterised by semantic components and appropriate relations like antonymy, heteronymy etc., defined in terms of them, no such characterisation can sensibly be given for be, er, ver etc. To complete the picture, it must be added that four or five B-prefixes appear also in category A, i.e. as unstressed, inseparable prefixes. Thus we have both umstéllen ('surround') and úmstellen ('transpose'), übersétzen ('translate') and ǘbersetzen ('ferry across'). Here the properties (i) and (ii) are linked as before (probably on the basis of boundary differences, as recorded in Note 17), and property (iii) agrees with their classification in category B; stressed über and unter, um, and durch can be characterised by specific semantic components, except under conditions to which I turn immediately.
Prefixing is, of course, a typical process of word formation. As is well known, the semantic result of word formation may, but need not, be compositionally determined by the meaning of the constituent parts involved. Thus, while e.g. mealtime has a predictable meaning (is semantically transparent), ragtime has an unpredictable meaning (it is semantically opaque). Again, the distinction is not clearcut, given the wide variety of intermediate cases like runway, typewriter, etc. Now, this whole range of possibilities is covered by German prefix combinations: Ausgang and Eingang ('exit' and 'entrance') are fairly transparent, Anstand ('decency'), Aufstand ('uprising'), and Abstand ('distance') are rather opaque.
This variation runs across A- and B-prefixes, insofar as the semantic contribution of A-prefixes can be sorted out at all. A fairly transparent case with A-prefixes would be beladen ('load') and entladen ('unload'), while befehlen ('order') versus empfehlen ('recommend'), where emp- is a phonological variant of ent-, is opaque. Ignoring a wide variety of intricate factors that might be relevant here, we still can recognise a fairly strong tendency towards semantic opacity for both types of prefixes. Thus Anfang ('beginning'), although a clear case of a prefix compound, must semantically be characterised as the antonym of Ende, with no trace left of the semantics of an- ('at') and fang ('catch').
In addition to these grammatical properties, two independently justified prerequisites from the field of speech errors are needed in order to establish the point I am trying to make. First, it is well known that prefixes function as genuine structural units with respect to the processes involved
in speech errors on the syntactic level, as shown by the following examples:
(15) Wenn begründeter Bedacht versteht ... (... Verdacht besteht ...) 'If there is justified suspicion ...'.
(16) Das is 'ne Bindewebsgeschwäche. (for 'Bindegewebsschwäche') 'That is a weakness of connective tissue'.
(17) Ich kann den Besuch befehlen – eh empfehlen. 'I may order – ah recommend to visit it'.
(18) Das dürfte fast alles zustimmen. (from 'zutreffen' and 'stimmen') 'That should all be correct'.
(15) and (16) are different cases of disturbed sequence, with two prefixes substituted for one another in (15) and with a prefix and a stem interchanged in (16); (17) is a selection error, with be- showing up instead of em-;¹⁸ and (18) is a classical instance of a blend turning on two verbs, one with and one without a prefix. Second, selection errors, as already pointed out, exhibit a strong tendency to be determined by similarity relations specifiable in terms of either phonetic or semantic components. The most plausible account of this fact leads to the assumption that lexical knowledge is organised and assessed according to the components in question. (See Fay and Cutler (1977) for a systematic study in this direction.) Additional support for this assumption comes from word association (Clark, 1970) and neuropsychological experiments like those of Luria and Vinogradova (1959). Of primary interest in the present context is the role of systematic semantic relations. With these considerations in mind, compare the following errors:
(19) Es fängt immer mit einer Verletzung der Innenhaut ab. (for 'an') 'It always starts with an injury of the inner tissue'.
(20) Der sollte bald abhören. (for 'auf') 'He should soon come to an end'.
(21) Als Patentrezept bietet sich ab – an, ... 'As a patent solution, it offers itself ...'.
(22) Ich will lieber zum nächsten Abschnitt untergehen. (for 'über') 'I prefer to turn to the next section'.
These are what might be called in a sense 'regular' selection errors: ab instead of an, ab instead of auf, unter instead of über; they are governed by the usual syntactic and semantic conditions: same syntactic category,
obvious semantic relatedness. The intricate point, however, is this: the particles are part of fixed lexical units which are semantically opaque, absorbing the autonomous meaning of the particle to which the erroneous substitute is semantically related. As already mentioned, anfangen for example, which is turned into abfangen in (19), has nothing to do with either an (at) or fangen (catch) semantically. In spite of this semantic integration, which does not even reflect the morphological structure of the relevant lexical items, the selection errors in (19)-(22) are determined, according to the above assumption, by the ordinary semantic relations characterising the involved prefixes separately. In other words, the error seems to depend on semantic components that are not part of the actually intended lexical units, but only of their semantically suspended constituent parts. In cases like these, then, the meaning of one of its parts overrides that of the whole lexical unit. This in turn implies that the autonomous meaning of the prefix, although not actually relevant, is still somehow associated with it. This conclusion is corroborated by cases like (23), showing the same effect in the opposite direction, so to speak:
(23) Da fing's dann auf. (for 'hörte es') 'Then it ended'.
Here the prefix auf of the intended lexical item aufhören is preserved, while the stem is replaced by that of its antonym anfangen, although the two stems fangen (catch) and hören (hear) per se are not semantically related at all. In other words, the selection error in this case involves two items that are semantically related only because they are constituent parts of semantically opaque, though clearly related, lexical items. Hence this time the meaning of the whole lexical item overrides that of its constituent parts. To put it in other terms, (19)-(22) as well as (23) are selection errors that turn on the morphological structure of opaque lexical items, exhibiting, however, a crucial difference: in the latter case it is the actual, integrated meaning that determines the deviation, while in the former, it is the veiled, autonomous meaning. As for the prefixes, they are accessed on the integrated level of meaning in (23), and on the veiled level of meaning in (19)-(22). Three further remarks are to be made.
a. The autonomous role of the prefixes is independent of the question whether they are contiguous to or separated from the pertinent stem: (20) and (22) exhibit the first, (19) and (21) the second condition.¹⁹
b. Whether or not the wrong prefix results in an actual word of the language seems to be purely accidental, just as it is accidental whether the selection of a wrong word results in a meaningful (though inappropriate)
sentence or not. Thus (20) would make sense in a different context, while (19) could not.
c. Like other selection errors, those involving prefixes can be phonetically conditioned. (24) is a case in point:
(24) Bei ihm taucht ein entgegengesetztes Prinzip aus. (for 'auf') 'With him, an opposite principle shows up'.
Here the simplest explanation would assume that a phonetic feature rather than a prefix is replaced. Although the an/ab cases are also open to an account in terms of phonetic segments, the close semantic relationship of these particles clearly points in the direction contemplated above.
The main conclusion to be drawn from the facts just discussed is this: although the relevant meaning of a morphologically complex word might be more or less opaque with respect to the meaning of its parts, the independent semantic properties of these parts need not be discarded altogether. Rather they might play a latent, though still effective, role in the structural organisation and the actual use of lexical knowledge. In other words, the depth of various layers within the mental lexicon might go much further than usually assumed. New, opaque meanings do not, in general, extinguish semantic relations of integrated parts. The considerations concerning the parts of semantically opaque words carry over to the semantic aspect of idiom chunks. Hence phenomena similar to those related to prefix verbs should be expected in speech errors involving idioms. As selection errors with respect to idioms frequently result in contaminations, additional aspects have to be taken into account, and I will not go into these matters here.
Two supplementary speculations, however, suggest themselves. First, the role of hidden aspects in opaque meanings is in a way reminiscent of the role that suppressed readings play with respect to ambiguous lexical items. In spite of controversial evidence (see Levelt (1978) for discussion), it still seems warranted to assume that besides the appropriate reading of an ambiguous item its other readings are initially activated in language perception.
If we think of a prefix in a semantically opaque combination as an ambiguous unit whose primary reading is contextually inappropriate, then this assumption is independently supported by speech error data like those in (19)-(22): what I have called selectional differentiation is a process that involves related items that are barred not only for contextual, but also for strictly lexical, reasons. The second speculation concerns conclusions with respect to figurative meaning. Semantically opaque complex words have something in common with idioms and figurative meaning: the relevant semantic interpretation in all these cases is not identical with the primary semantic specification, at least of certain parts of the expressions in question. The dissociation of these two layers of meaning is, of course, the strictest in opaque words. If, however, the suppressed semantic properties are somehow involved even in opaque words, as argued above, then they should also be involved in cases of figurative meaning, where the connection is much more obvious. The interesting point, then, is this: the 'underlying' semantic representation might be involved without any actual derivation connecting it to the relevant non-literal interpretation, just as there is no actual derivation connecting auf and hören to the meaning of aufhören. In producing (or understanding) a metaphor, you are not necessarily running through the literal interpretation, and yet it may be present as the tacit background.
3.2 A number of interesting problems arise with respect to speech errors related to inflectional morphology. I can touch only on two or three of them. Grammatically, inflectional morphology mediates between syntactic and phonological representations; it spells out lexical and syntactic features by means of segmental phonology. Despite this mediating character, it constitutes a separate aspect of structural representations that might be affected by speech errors in the same way as phonological or syntactic structures. Thus you find cases of disturbed sequence like (25) or (26), and selection errors like (27)-(29):
(25) Leg ihm meine Glückwünschen zu Füße. (for 'Glückwünsche ... Füßen') 'Put my congratulations before his feet'.
(26) Das geht wie die Messer ins Butter. (for 'das ... in die') 'That goes like the knife in the butter'.
(27) Der ist so alt wie uns. (for 'wir') 'He is as old as we'.
(28) ..., der wiederum eine Schülerin von Renate ist. (for 'die') '... who is in turn a student of Renate'.
(29) Da hab ich was getrinkt. (for 'getrunken') 'Then I drank something'.
The wrong selection concerns case in (27), gender in (28), and strong versus weak inflection in (29). Other morphological categories can be affected in the same way.²⁰ In short, inflectional errors exhibit patterns of
the same kind as phonological and syntactic ones and thus indicate the relative independence of morphological structure. That does not mean, of course, that errors showing up in disturbed inflection must be independent of other structural conditions. In fact, the observations discussed in section 2 fully apply to inflectional morphology. Consider for example the following cases:
(30) Ob der den Zeug fertig hat? (for 'das') 'Whether he has finished the stuff?'
(31) Das ist eines der prägendsten Eindrücke. (for 'einer') 'That is one of the most influential impressions'.
A plausible conjecture with respect to the gender mistakes in these cases would be that alternative lexical items are involved in determining the inappropriate gender, probably den Kram (instead of das Zeug) in (30), and eines der Erlebnisse (instead of einer der Eindrücke) in (31). This conjecture loses much of its purely speculative character if placed within the context of related phenomena, such as the lexical-conceptual interaction discussed above. Thus, the same point, viz. the interaction of relatively autonomous (here, inflectional) processes with other structural conditions, can be made in a rather different way. I have in mind automatic morphological corrections patching up errors on the syntactic level. A fairly frequent type of disturbed sequence involves clause-final sequences of verb forms, which is characteristic for German clause structure.
(32) ... der sich allerdings mit der Hälfte bescheiden werden muß (for 'müssen wird') '... who will be forced, though, to be satisfied with half'
(33) Wenn man die Säule hier rüber gedacht stellt (for 'gestellt denkt') 'If one imagines the pillar to be placed over there'
(34) eine Frage, die man schon jetzt stellen zu glauben mußte (for 'müssen glaubte') 'a question that was believed already now to be necessarily raised'
(35) ... wo du dann zu versuchen zeigst (for 'zeigen versuchst') '... where you then try to show'
In all these examples, two adjacent verb forms are erroneously permuted, or rather the stems of two verb forms are substituted for one another. The inflectional categories, however, appear to maintain their correct places. The crucial point is that these inflectional categories may have
different consequences if applied to different stems. And as a rule, it is the 'locally correct' form that shows up. Thus we have gedacht stellt in (33), not gedenkt stellt, which is what would result if only the stems were interchanged. In other words, inflectional rules in these cases operate as an automatic adjustment, very much like phonological assimilation rules in cases like untersiechte (with [ç]) instead of untersuchte (with [x]): a wrong vowel specification with subsequent fronting of the velar spirant.²¹ To summarise, inflectional morphology, due to its relative autonomy, can not only be subject to selectional or sequential errors of the usual type, it can also automatically be adapted to errors originating on a different level of structure.
3.3 The last problem to be considered concerns structural constraints involved in cases of disturbed sequence. As mentioned above, disturbed order on the syntactic level, with which I will be concerned here, may possibly be constrained by phonological factors, particularly those that are relevant for rhythmic aspects of speech production.²² I now want to pursue that issue somewhat further. As a first approximation, consider the following principle:
(P) An utterance u with disturbed sequential order preserves the accent pattern of the intended (correct) utterance u'.
It can easily be seen that (P) covers a wide range of pertinent facts. Practically all examples of disturbed sequence discussed so far are subject to (P). It also holds, incidentally, for most of the phonological cases of disturbed sequence, like Wederendung instead of Redewendung ('figure of speech').²³ On closer inspection, however, it turns out to be necessary to distinguish at least two types of sequential errors. In an abstract way, they can be characterised as follows:
Type I: u = x a y b z and u' = x b y a z
Type II: u = x y b z and u' = x b y z
Again, u and u' represent the actual and the intended utterance, respectively; x, y, and z represent invariant parts of u and u', while a and b are displaced units; x or z may be empty. In other words, type I errors consist in an exchange of two units, while type II errors consist in the displacement of one item. There are a few problems with this classification, to which I will turn immediately. Let me first illustrate with the following examples:
(36) ein Hotel, das zum Schiff umgebaut ist (for 'Schiff ... Hotel') 'a boat that has been reconstructed as a hotel'
(37) die entstrebte Erspannung (for 'er- ... Ent-') 'the detente strived for'
(38) Ich versuche es sicherhin auch weiter (for 'sicher auch weiterhin') 'I'll certainly try it further on'
(39) Weil sie sich zu übernommen haben scheinen. (for 'übernommen zu') 'Because they seem to have overreached themselves'.
(36) and (37) are type I instances with exchanged nouns and prefixes, respectively. (38) and (39) belong to type II, misplacing -hin and zu, respectively. The problems with the classification are as follows. Suppose, first, that we assume that y can be empty, just like x and z. This would not have any consequences for the type II cases, for their character as errors always implies a non-empty part around which a unit is shifted. For type I cases, however, a difficulty arises: type I cases with empty y like (40) become in some respect indistinguishable from type II cases, if one analyses the a-element of type I as the non-empty y of type II:
(40) Ansonsten macht die Spaß Fahrt. (for 'Fahrt Spaß') 'Besides that, the trip is fun'.
In other words, type II cases would become a special class of type I, namely, those with empty y. We will see immediately why this does not seem to be the case, i.e. why there is a difference between type I cases with empty y, like (40), and type II cases like (38) or (39). The second problem concerns an assumption embodied in the above formulation of type II, viz. that those cases are basically anticipations of the element b rather than delays of the part to be classified as y. In terms of the above examples: that in (39) zu is anticipated rather than übernommen postponed. Although there seem to be both structural and processual considerations that argue in favour of this assumption, it is by no means obviously determined by the data. If it turns out to be too strong, two subtypes of II have to be distinguished. Nothing in particular depends on this point, though, with respect to the following argument. It has repeatedly been noticed (see, e.g. section 2 of the previous paper), that for cases of disturbed sequence, the inverted items tend to preserve membership of syntactic category. Let us call this principle (P'):
(P') In an utterance u with disturbed sequential order the interchanged parts belong to the same syntactic category.
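The type I / type II schemata given above can be made concrete in a small sketch. It is only an illustration under stated assumptions: the function name `classify_sequence_error` is hypothetical, the segmentation of each utterance into units is supplied by hand, and the test looks at string shape alone (an exchange of two units versus the displacement of one unit across an invariant stretch), ignoring accent pattern and syntactic category.

```python
from typing import List, Set


def classify_sequence_error(actual: List[str], intended: List[str]) -> Set[str]:
    """Return the labels ("I", "II") whose string schema fits the error.

    Hypothetical helper: the caller decides how utterances are cut into units.
    """
    # Both utterances must contain exactly the same units, merely reordered.
    if sorted(actual) != sorted(intended):
        return set()
    # Positions at which the actual utterance departs from the intended one.
    diff = [k for k, (a, b) in enumerate(zip(actual, intended)) if a != b]
    if not diff:
        return set()
    labels: Set[str] = set()
    # Exchange: exactly two positions differ, so their contents are swapped.
    if len(diff) == 2:
        labels.add("I")
    # Displacement: the differing stretch equals the intended stretch with one
    # unit moved from one edge to the other (a rotation by one position).
    lo, hi = diff[0], diff[-1]
    a_span, i_span = actual[lo:hi + 1], intended[lo:hi + 1]
    if a_span[1:] + a_span[:1] == i_span or a_span[-1:] + a_span[:-1] == i_span:
        labels.add("II")
    return labels


# Example (36): two nouns exchanged across "das zum".
ex36 = classify_sequence_error(
    "ein Hotel das zum Schiff umgebaut ist".split(),
    "ein Schiff das zum Hotel umgebaut ist".split(),
)
# Example (38), with -hin treated as a unit of its own (an assumption).
ex38 = classify_sequence_error(
    ["Ich", "versuche", "es", "sicher", "hin", "auch", "weiter"],
    ["Ich", "versuche", "es", "sicher", "auch", "weiter", "hin"],
)
# Example (39): zu and übernommen are adjacent.
ex39 = classify_sequence_error(
    "sich zu übernommen haben scheinen".split(),
    "sich übernommen zu haben scheinen".split(),
)
print(sorted(ex36))  # ['I']
print(sorted(ex38))  # ['II']
print(sorted(ex39))  # ['I', 'II']
```

On string shape alone, an adjacent transposition such as (39) or (40) fits both schemata, which is the difficulty with empty y noted above; separating the two types therefore needs information beyond linear order.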
Consider now how (P) and (P') are interrelated. The interesting point is that practically all errors satisfying category invariance, (P'), are also subject to accent invariance, (P). On the other hand, cases violating (P) are also in conflict with (P'). In addition to (29), the following cases are crucial examples:
(41) An neue Signale will so niemand recht glauben. (for 'niemand so') 'Nobody really believes in new signals'.
(42) Ein Sport, der kaum wie ein anderer alle diese Merkmale hat. (for 'wie kaum') 'A type of sport that has all these features like scarcely any other'.
In other words, (P) and (P') are intimately related: violations of the one usually are violations of the other. This fact can be explained, rather than merely stated, if we assume that there are type I and type II errors, and that it is precisely the former that is subject to (P) and (P'). Notice that on this account, the coincidence of (P) and (P') ceases to be a mere accident. It rather follows from a deeper distinction separating type I and type II cases, namely an exchange of two units within a fixed structural framework, preserving both the syntactic form and the accent pattern assigned to it, as opposed to the anticipation of a later unit which eventually violates the originally intended structural organisation. In order to see how the clustering of (P) and (P') follows from this distinction, one has to take into account the fact that the strings in terms of which type I and type II errors have been characterised above are actually linear stretches on which syntactic and suprasegmental phonological hierarchies are imposed. Given this structural setting, it follows that (P) and (P') can be met in general if and only if two elements of the appropriate type exchange their places, such that one fits into the slot of the other, but not if one unit is anticipated, thereby occupying a structurally void place, such that (P') cannot, and (P) need not, be met. This then motivates the different classification of (39) and (40), in spite of their superficial similarity.
There is one further point to be noted. Cutler (1980) discusses cases of disturbed order of lexical elements accompanied by a change in stress pattern and she observes that only displaced closed class elements (i.e. pronouns, prepositions, auxiliaries, particles) change the intended stress pattern.
As in fact all these cases are type II errors, a further generalisation might be indicated: type II errors can affect only closed class elements, while type I errors can affect pairs of items of arbitrary but equal class membership. If this generalisation is borne out by further data, the distinction between type I and type II errors could be turned into an even more principled phenomenon, which might eventually bear also on Garrett's (1980) distinction of functional and positional processes in the
following way: functional errors resulting in disturbed order are always of type II. Whatever phenomena might cluster around the type I/type II distinction, the distinction raises, of course, a new question, viz. how the difference in question can be reduced to different kinds of interference in the process of sequentialisation. This seems to me a productive, albeit at present unanswerable, question. Any attempt to answer it will shed some light on the nature of the process of serial organisation.
The relationship between syntactic and accent patterns in cases of disturbed sequence leads me to a final remark. One of the main problems with cases of disturbed sequence is the delimitation of their domain. As can easily be seen, they cannot naturally be accounted for in terms of syntactic dependencies. Both type I and type II errors may appear, on the one hand, within one word, as shown by (43) and (44), and they may, on the other hand, go across clause boundaries, as shown by (45) and (46).
(43) Bewandtenversuche (instead of 'Verwandtenbesuche') 'visits of relatives'
(44) Bindewebsgeschwäche (instead of 'Bindegewebsschwäche') 'weakness of connective tissue'
(45) Wenn man wissen ist, was will (for 'will ... ist') 'If one wants to know what is the matter'.
(46) Da hat's so doll an zu regnen gefangen. (for 'zu regnen angefangen') 'Then it started to rain heavily'.
I do not think, in fact, that the domain in question can be determined in any direct way in terms of configurational syntactic properties. I rather take it to be a matter of rhythmical properties and the accent pattern underlying them, and hence related to syntactic conditions only insofar as accent patterns are determined by constituent structure. As a first attempt to specify this conjecture more precisely, consider the following principle:
(P") In an utterance u with disturbed sequential order, the part y, around which the elements of the intended utterance u' are moved, may not contain more than one relative accent peak.
The notion of a relative accent peak can be made precise in various ways which I will not pursue here, taking it to be sufficiently clear on intuitive grounds. Although the validity of (P") must be further explored both conceptually and empirically, it has three advantages making it worthwhile as a first approximation. First, it covers a wide range of pertinent data. In fact, I have not yet seen any serious counterexample. Second, it expresses a fairly strong limitation. And third, it seems to point
in the right direction with respect to the way along which the rhythmical character of sequential timing might interact with structural conditions. In order to make this fairly vague claim somewhat more perspicuous, consider the following speculation. Suppose that the timing of speech production processes is determined, however remotely, by general biorhythmical patterns. The co-ordination of speech and accompanying gestures is but one piece of evidence pointing in that direction. The natural place to look for the interaction of those patterns with linguistic structures is, of course, the metrical, suprasegmental organisation of linguistic utterances, that is, the phonological phrases and the accent patterns attached to them. Seen from this angle, accent peaks determine plausible domains for relatively self-contained chunks of sequential processes delimiting also the range of spontaneous disturbances. The way in which these conditions are tied up to syntactic relations and categories is then determined by the rules assigning the phonetic interpretation to syntactic structures. To the extent to which these speculations go in the right direction, at least in principle, sequential speech errors bear on one of the most essential properties of natural language, viz. the basis on which thought gets a phonetically articulate sequential shape.
To sum up the discussion of this section: just as selection errors (and blends) are constrained by phonetic, semantic, or conceptual relatedness of the items involved, sequential errors are constrained by the rhythmical organisation of the utterance and the accent pattern underlying it.
4. Concluding remarks
Let me end this postscript with three remarks. First, it goes without saying that there is a vast number of phenomena and problems lying beyond the range of my discussion here. They require, on the one hand, further elaboration and clarification of the concepts and principles discussed here, and, on the other hand, recourse to completely different underlying mechanisms, in the sense of point (iii) above. Thus a large class of language errors depends on the nature and limitations of short-term memory. Anacolutha and mid-sentence shifts of planning are consequences of disturbances in this general factor. They too, of course, are grammatically principled. There is, furthermore, a smaller, but interesting class of errors that originate neither from disturbances of selection or sequential organisation, nor from limitations of short-term memory, but are rather due to what might best be described as misapplication of particular grammatical rules. Typical cases are the following, which turn on incorrect application of NP-preposing:
(47) Wo die klassische Theorie versucht wurde, zu zementieren. (for 'Wo versucht wurde, die klassische Theorie zu zementieren.') 'Where one tried to cement the classical theory'
(48) Das wird wiederum zu überwinden versucht dadurch, daß ... (for 'Es wird wiederum versucht, das zu überwinden dadurch, daß ...') 'It is tried, in turn, to overcome that by ...'
(49) Ein trauriges Erbe, das nun versucht wird zu beseitigen (for 'das zu beseitigen nun versucht wird') 'A sad heritage, which to overcome is tried by now'
These are in no way sequential errors of the type discussed before. Without going into any of the interesting details (see Bierwisch (1975) for some discussion), I will merely point out that in cases like (47)-(49) the actual utterance does not seem to be just a processing defect realising an otherwise correct intended utterance. Instead, the error occurs in constructing the intended utterance itself. If this conjecture can be supported by further evidence, it will shed some light on the problem of how grammatical knowledge comes to bear on actual language use.
Second, the errors I've been concerned with are based on mechanisms and their disturbances, which, I have argued, extend well beyond linguistic behaviour. General as they may be, the phenomena discussed here are still restricted in a fundamental way: they are all related to active or expressive rather than interpretive or perceptive behaviour. More specifically, in spite of important generalisations that go across various modes of language use (as for example the use of grammatical knowledge and the role of short-term memory), speech errors of the type discussed here are sharply restricted by processes underlying language production. In other words, 'slips of the tongue' are different in crucial respects from what might be called 'slips of the ear'. See Bierwisch (forthcoming) for a brief discussion of some of the differences.
Finally, as in the earlier paper, reprinted above, my remarks are fairly speculative and based on accidental observation. In order to come to more reliable conclusions, elaboration with respect to the empirical as well as the theoretical and conceptual side is required. With respect to the former, considerable progress has been made in the past decade, showing that in some respects even statistical significance can be achieved, a fact that I take to be desirable, though not indispensable in each case.
The fascinating observations concerning German prefixes, for example, will hardly ever be amenable to serious statistical testing. Still, they may be revealing in important respects. As for the theoretical issues, various models of speech production mechanisms are under vivid discussion. I have refrained from going into any of these proposals, in part because serious discussion would by far
exceed the limits of my remarks, in part because I am more interested, for the time being, in exploring the character and the limits of potential principles that more detailed formal models would have to incorporate. In that sense, it seems to me a promising fact that a fairly wide variety of speech error phenomena can be reduced to the interaction of a few independent principles, the more concrete specification of which in terms of process models may yet be forthcoming, so showing how language production gains its specific traits within the general conditions underlying human behaviour.
Zentralinstitut für Sprachwissenschaft
Akademie der Wissenschaften der DDR
Berlin G.D.R.
Notes 1.
The examination of defective sentences must of course be based on a rough classification of general conditions. The influence of alcohol, extreme tiredness, stressful situations etc. create changes in the general circumstances of behaviour, including the use of language, and such conditions must be excluded from consideration. The ideas discussed in this paper, and all examples, refer to language production in normal circumstances in the sense that none of the above conditions played any part.
2.
In the case of (5a), incidentally, there was one hearer who failed to notice the defect. In fact most word exchanges will go unnoticed by the hearer as well as the speaker. This means that word exchange — like other disturbances — is by no means confined to speaker strategy but has an automatically functioning corrective principle in hearer strategy.
3.
Of course choice between various hypotheses such as global constraints and surface interpretation rules must be made primarily on the basis of independent linguistic evidence. The question raised here simply shows that such decisions can have a directly psycholinguistic aspect, and could be bolstered by appropriate research. I do not know whether exchange involves more complex constituents too. I have not observed any such cases. The following example cannot fully be regarded as a case of word exchange in the sense discussed here: it contains not only a simple disturbance of the linear arrangement but also, as a consequence of this, a syntactic restructuring:
(i) Man muß nur in die Küche reinkommen, in fünf Minuten das Gas anmachen, und schon ist es warm. 'You just have to go into the kitchen, put the gas on, in five minutes, and it's warm'.
(ii) Man muß nur in die Küche reinkommen, das Gas anmachen, und schon in fünf Minuten ist es warm. 'You just have to go into the kitchen, put the gas on, and in five minutes it's warm'.
It seems reasonable, however, to regard phenomena of this kind as symptoms of the same disturbance that causes word exchange.
4.
5.
6.
7.
8.
9.
10.
11.
This statement clearly tends to oversimplify the issue, as the following example shows. Instead of saying Er hat ein Paket Krawatten verloren ('He lost a packet of ties') one speaker produced the sentence Er hat ein Krawet Pawatten verloren. Here a simple exchange of initial clusters is overlaid by the anticipation of a whole syllable. The problem of linear organisation in the sense being considered here is discussed using independent neuropsychological arguments, for example by Lenneberg (1967: 218-223). In the terminology of traditional structuralism word exchange would be described as a disturbance of syntagmatic relations, and selectional error as a disturbance of paradigmatic relations. My discussion does not employ these concepts since the explanation of the structure of natural languages in terms of these two relations has for many reasons proved to be inadequate. Individual attempts to deal with this are to be found, for example, in Katz (1972) and Bierwisch (1965). This is an area to which also belong observations collected under the title 'word field' or 'semantic field'. It is also unclear so far whether and to what extent the theoretical elaboration of the relations between lexical entries must take account of the speaker/hearer's extralinguistic general knowledge. Strictly speaking the pair denken an ('think of, remember') and vergessen ('forget') are not antonymous. Antonyms like big — small etc. are contrary opposites, whereas vergessen forms a contradictory opposite to (one meaning of) denken an. The important thing here is that the meaning of denken an is given by the negation of the meaning of vergessen. There is a correspondence in German between X muß and es ist notwendig daß X ('It is necessary that X') and between X muß nicht and es ist nicht notwendig daß X ('It is not necessary that X').
This relation of the negation to the modal verb is obligatory for kann ('can'), darf ('may') and muß ('must') and optional for soll ('shall'), will ('want to') and möchte ('would like to'). It seems reasonable to find an analogy to the disturbance of lexical item selection in phonemic specification defects like Drei Wochen waren verstritten instead of Drei Wochen waren verstrichen ('Three weeks had elapsed'). Whether errors in word and phoneme selection can be regarded as disturbances of a unified mechanism, as seems plausible in the case of word and phoneme exchange, I cannot judge. What can be said against this assumption is that lexical items are selected on the basis of semantic determinants, whereas phonemes are not selected on the basis of semantic and/or syntactic factors. This objection would not apply if word and phoneme determination could be interpreted not as a positive selection but as a selective suppression of alternatives in a 'preactivated' subsystem, in other words if selection in general were to be regarded less as the selection of what is right than as the elimination of what is wrong in a repertory of activated possibilities.
12.
This seemingly teleological formulation does not imply that the speaker plans the blend as a result of the similarity of the surface structures involved. There is no reason at all to assume that the surface structure plays any part in causing a blend. The surface structure is simply the stop in the derivation where a compromise between the clashing structures is made. If this compromise does not come about, then the sentence cannot be finished and the blend is not completed.
13.
This assumption is undoubtedly supported by other phenomena. The easily verifiable asymmetry between active and passive command of one's native language could result to a considerable degree from the fact that speaker strategies involve the process under discussion here while hearer strategies do not. Both the speaker and the hearer must generate the whole derivation of a sentence uttered and understood, but only the
speaker needs to find out which derivation produces a sentence which corresponds to his intended message. A psychologist once described to me the difficulties in this process which he observed in himself as 'the necessity of being one's own interpreter all the time'.
14.
Because of the complexity of the phenomenon it is very difficult to give an analogy on the phonemic level in the case of blends. One could think of involuntary word blends; observed instances like Auf Wiederschehn from Auf Wiedersehn and Auf Wiederschaun, blur from bloß ('only') and nur ('only') or:
a. Da habm sie dich aber angeschmissen.
b. Da habm sie dich aber angeschmiert.
c. Da habm sie dich aber angeschissen.
b, c = 'You've really been had there'.
would then be relevant. The clash here is not between two semantic structures and their realisation by corresponding constructions but between two competing phonemic structures assigned to one semantic structure. The compromise between the two is forced in the phonemic representation.
15.
A systematic analysis of spontaneous errors of this kind will doubtless show that there are regularities at work within them too. The breaking off of a planned structure surely happens at places and due to factors which can be described in terms of linguistic structure and of the general mechanisms of language production.
16.
I am grateful to John Pheby for providing the translation of the original German version, and to Anne Cutler and Arnold Zwicky for help and suggestions both to the form and the content of this postscript.
17.
They may in fact be explained by one and the same underlying condition, viz. a difference in the type of boundary that separates them from their pertinent stem morpheme: /(-prefixes are separated by a morpheme boundary, S-prefixes by a word boundary, and both the pertinent syntactic and accent rules are sensitive to that difference. I cannot go into details of such an analysis, however.
18.
It might well be that in (15) as well as (17) the wrong occurrence of be- is a kind of perseveration, induced by the identical prefix of the immediately preceding word. Interesting as this possibility might be in itself — shedding additional light on the nature of the processing mechanisms involved — it is of no crucial importance in the present context, except that it reinforces the role of prefixes as independent structural units with respect to speech error phenomena.
19.
Even type /(-prefixes might be subject to the phenomenon in question, insofar as they can be semantically identified at all. The substitution of be- for ent- in (17) might be a case in point, as these elements constitute an antonymy relation in pairs like bewässern — entwässern, beladen — entladen, etc. The relevance of this relation with respect to (17) is dubious, though, because of the perseveration effect, mentioned in Note 18.
20.
One might even find instances of inflectional blend:
(i) Da hat er uns schnell das 'Du' angebotet — eh angeboten. 'Then he quickly offered us the "Du"'.
Here gebotet conflates the (correct) strong inflection geboten and the erroneous weak inflection, which would be gebietet. Cases like (i) appear to be fairly peculiar, and the status of (i) as a blend rather than a selection error, is to be left open.
21.
Garrett (1980) argues that morphological accommodation accompanying sequential errors as in (32)—(35) is the consequence of an error occurring at a different level of processing than sequential errors of the type illustrated by (25) and (26), where morphological categories are attached at inappropriate places. He calls the level of processing involved in cases like (32)—(35) functional, the level involved in (25) and (26) positional, and he argues on the basis of a somewhat wider range of phenomena that only functional, but not positional errors allow for morphological accommodation, due to the fact that only functional processing has access to lexical information. If this is correct, then cases like (26) apparently constitute unexplained counterexamples, as the contraction of in das to ins must probably be considered as a morphological accommodation. There is, however, a difference between (26) and the cases of positional processing errors that Garrett considers. (26) is a genuine exchange of the type I to be considered below, while Garrett's examples are all of type II, i.e. one inflectional morpheme is displaced. It remains to be seen whether the properties of type II errors on the morphological level account for the limits of accommodation that Garrett observes.
22.
I found no indisputable evidence that semantic factors might be intervening in this respect as well. Conceivably, intriguing examples like the following constitute a case in point:
(i) Ich weiß noch nicht, ob das hier bekannt ist. 'I don't know yet whether this is known here'.
What the speaker of this utterance clearly intended was (ii):
(ii) Ich weiß nicht, ob das hier schon bekannt ist. 'I don't know whether this is already known here'.
The error in (i) can be understood as a case of disturbed sequence with the adverbial schon showing up in the wrong clause, where it has to be changed into its counterpart noch, because it now appears within the scope of the negation. That this is a kind of automatic correction follows from the fact that nicht schon under most conditions is excluded, with noch nicht taking over its interpretation. (Similarly nicht mehr shows up instead of nicht noch, the negation of noch.) Thus, the following sentences are nearly synonymous, both translating as 'I don't expect him to come already'.
(iii) Ich erwarte noch nicht, daß er kommt.
(iv) Ich erwarte nicht, daß er schon kommt.
Although (i) is fairly clear as a datum, its interpretation as a case of disturbed sequence in the usual sense is far less obvious. Hence I will leave the question open whether semantic factors might constrain sequential disturbances.
23.
It must be noted that I am concerned here only with cases of interaction between sequential errors and stress pattern. In other words, (P) does not hold, by definition, for errors that consist just in the displacement of accent peaks. As Cutler (1980) shows, there are at least two types of errors that result in nothing but dislocated accents. The first type concerns word stress as in syntáx instead of syntax, and Cutler has argued convincingly that these errors are best explained as lexically induced blends of two derivationally determined accent patterns pertaining to syntax and syntactic, respectively. The second type concerns phrase- and sentence-stress, which must be explained in a rather different way. In what follows, I will deal only with stress phenomena that are interrelated with other sequential errors, ignoring the otherwise rather interesting phenomenon of pure phrase accent displacement.
References
Bierwisch, M. (1965). Eine Hierarchie syntaktisch-semantischer Merkmale. Studia Grammatica V, Berlin: Akademie-Verlag.
—(1972). Schriftstruktur und Phonologie. Probleme und Ergebnisse der Psychologie 43, 21-44.
—(1975). Psycholinguistik: Interdependenz kognitiver Prozesse und linguistischer Strukturen. Zeitschrift für Psychologie 183, 1-52.
—(forthcoming). How on-line is language processing? In G. B. Flores d'Arcais and R. Jarvella (eds), The Process of Understanding Language. New York, London: Wiley.
Chomsky, N. (1970). Remarks on nominalization. In R. Jacobs and P. Rosenbaum (eds), Readings in English Transformational Grammar, 184-221. Waltham: Ginn.
—(1980). Rules and Representations. Oxford: Basil Blackwell.
Clark, H. (1970). Word association and linguistic theory. In J. Lyons (ed.), New Horizons in Linguistics, 271-286. Harmondsworth: Penguin.
Cutler, A. (1980). Errors of stress and intonation. In V. Fromkin (ed.), Errors in Linguistic Performance, 67-80. London, New York: Academic Press.
Fay, D. and Cutler, A. (1977). Malapropisms and the structure of the mental lexicon. Linguistic Inquiry 8, 505-520.
Fromkin, V. (1970). The concept of naturalness in a universal phonetic theory. Glossa 4, 29-45.
—(1971). The non-anomalous nature of anomalous utterances. Language 47, 27-52.
Garrett, M. (1980). The limits of accommodation: arguments for independent processing levels in sentence production. In V. Fromkin (ed.), Errors in Linguistic Performance. London, New York: Academic Press.
Jackendoff, R. S. (1968). Quantifiers in English. Foundations of Language 4, 422-442.
Jakobson, R. (1960). Zwei Seiten der Sprache und zwei Typen aphatischer Störungen. Grundlagen der Sprache. Berlin: Akademie-Verlag.
Katz, J. J. (1972). Semantic Theory. New York: Harper and Row.
Lakoff, G. (1969). On derivational constraints. Papers from the Fifth Regional Meeting, 117-139. Chicago: Chicago Linguistic Society.
—(1970). On generative semantics. In D. Steinberg and L. Jakobovits (eds), Semantics: An Interdisciplinary Reader in Philosophy, Linguistics and Psychology. Cambridge, New York: Cambridge University Press.
Lenneberg, E. (1967). Biological Foundations of Language. New York: Wiley.
Levelt, W. J. M. (1978). A survey of studies in sentence perception: 1970-1976. In W. J. M. Levelt and G. B. Flores d'Arcais (eds), Studies in the Perception of Language, 1-74. New York: Wiley.
Luria, A. R. and Tsvetkova, L. S. (1968). The mechanisms of 'dynamic aphasia'. Foundations of Language 4, 296-307.
Luria, A. R. and Vinogradova, O. S. (1959). An objective investigation of the dynamics of semantic systems. British Journal of Psychology 50, 89-105.
Miller, G. A., Galanter, E. and Pribram, K. H. (1960). Plans and the Structure of Behavior. New York: Henry Holt.
Miller, G. A. and Isard, S. (1964). Free recall of self-embedded English sentences. Information and Control 7, 292-303.
Weigl, E. (1970). Neuropsychologische Beiträge zum Problem der Semantik. In M. Bierwisch and K. E. Heidolph (eds), Progress in Linguistics. The Hague: Mouton.
Weigl, E. and Bierwisch, M. (1970). Neuropsychology and linguistics: topics of common research. Foundations of Language 6, 1-18.
Speech errors: old data in search of new theories*
BRIAN BUTTERWORTH
Abstract Recent theories of speech production have sought to explain speech errors in terms of the permutation or decay of intended elements. More venerable accounts — Freud, Meringer and Mayer — on the other hand, acknowledged the influence of unintended elements on the occurrence and nature of errors, and offered data whose most plausible explanation seemed to be in terms of the effects of unintended material. In this paper, I re-examine the claims made by these authors, along with modern attempts to explain away their problematic data. Recent theories are also committed to a strict sequence of processing stages, but a closer examination of both modern and older corpora reveals an improbable proportion of errors caused, apparently, by the malfunction of two or more theoretically independent stages. There seems to be no way of naturally extending strictly sequential models to accommodate these data, and the sketch of an alternative is proposed in which strict sequence is replaced by parallel processes with checking.
1. The sources of error
In the seminal work on speech errors, Versprechen und Verlesen, Rudolf Meringer and Karl Mayer proposed three distinct sources of error:
(i) interference from intended elements of the utterance (what I shall call PLAN INTERNAL ERRORS);
(ii) interference from an alternative formulation of the intended thought (ALTERNATIVE PLAN ERRORS);
(iii) interference from an unintended thought (COMPETING PLAN ERRORS).
Of course, both alternative and competing plan errors can be thought of as involving competition, but at different levels or stages of production. Informally, alternative plan errors involve competition between ways of
expressing or formulating an intended message; whereas, competing plan errors involve competition between separate messages, intended or unintended. Baars (1980) has recently proposed 'competing plans' as the 'trigger' for erroneous output, but for him both type (ii) and type (iii) errors fall under this rubric. Most other modern authors assign nearly all errors to category (i); Freud, on the other hand, wanted to assign all errors to category (iii), in fact, to a special subcategory of (iii) which will be discussed below. I shall argue that modern theories are based almost entirely on errors assigned to the plan internal category, are designed therefore just to account for this type, and cannot be extended in a natural and consistent way to deal with alternative plan errors (except for two subcategories of these) and thus have to ignore these venerable but inconvenient data.
Two authors have proposed fairly detailed models that attempt to trace the entire route from thought to articulate speech using speech error data, Fromkin (1971; 1973) and Garrett (1975; 1976; 1980a; 1980b). Both models are well-known and widely cited, and both provide adequate treatments for large and varied corpora of error data. There have been admirable attempts to treat particular aspects of errors; for example, Shattuck-Hufnagel (1980; Shattuck-Hufnagel and Klatt, 1979) has provided a very detailed treatment of segmental movement errors; Fay and Cutler (1977) offer an interesting account of a certain kind of word substitution error. However, Fromkin and Garrett have tried to provide a comprehensive framework for treating classes of errors from lexical, intonational and syntactic errors to phonetic feature movements and articulatory errors. Since, as will be seen, error categories (i)-(iii) cut across linguistic levels, my essay will concentrate on these two models.
Plan internal errors. It is assumed by all writers that the generation of an utterance involves the translation or transduction of an intended thought into articulate speech via a hierarchy of levels of linguistic description — roughly, syntactic structures, intonational patterns, words (or morphemes), sequences of items representing sounds, sequences of motor commands, etc. Authors disagree about the number of levels, the precise nature of descriptions at each level and the ordering of levels. Generally, it is held that at a given linguistic level there will be a (not necessarily complete) representation of the intended elements. So at a level where words (or morphemes) are represented, errors can lead to the anticipation, perseveration or transposition of these elements. (1)
a. Die Milo von Venus Target: 'Die Venus von Milo' (Meringer and Mayer, 1895).
b. There you go again powdering mich (me) with deiner (your) puff. Target: 'There you go again powdering dich (yourself) with meiner (my) puff' (Freud, 1924).
c. in the phonology of theory Target: 'in the theory of phonology' (Fromkin, 1971).
d. although murder is a form of suicide Target: 'although suicide is a form of murder' (Garrett, 1975).
At a presumably later level, where the sounds of words are represented, interfering elements need not be whole words but individual sound segments: (2)
a. Eine Sorte von Tacher Target: 'Eine Torte von Sacher' (Meringer and Mayer, 1895).
b. ...durch die Ase natmen Target: 'durch die Nase atmen' (Freud, 1924).
c. the nipper is zarrow Target: 'the zipper is narrow' (Fromkin, 1971).
d. the little burst of beaden Target: 'the little beast of burden' (Garrett, 1975).
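The two kinds of transposition illustrated in (1) and (2) can be sketched in a few lines of Python. This is only an illustration, not part of any of the cited models: the function names are hypothetical, and ordinary spelling is assumed as a crude stand-in for a representation of sound segments, with the "onset" taken to be the letters before the first vowel letter.

```python
VOWELS = set("aeiou")


def exchange_words(words, i, j):
    """Word-level exchange: transpose the elements at positions i and j."""
    out = list(words)
    out[i], out[j] = out[j], out[i]
    return out


def split_onset(word):
    """Split a spelled word into (letters before the first vowel, remainder)."""
    for k, ch in enumerate(word):
        if ch.lower() in VOWELS:
            return word[:k], word[k:]
    return word, ""


def exchange_onsets(words, i, j):
    """Segment-level exchange: swap only the initial consonants of two words."""
    out = list(words)
    onset_i, rest_i = split_onset(out[i])
    onset_j, rest_j = split_onset(out[j])
    out[i] = onset_j + rest_i
    out[j] = onset_i + rest_j
    return out


# Example (1c): whole words change places.
print(" ".join(exchange_words("in the theory of phonology".split(), 2, 4)))
# Example (2c): only the initial segments change places.
print(" ".join(exchange_onsets("the zipper is narrow".split(), 1, 3)))
```

The same intended string yields different errors depending on which level of representation the exchange is assumed to operate on, which is the point of separating (1) from (2).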
Of course, not all errors yield the complete transposition of elements: there are, probably, at least as many errors of anticipation and perseveration of elements. We also find examples of the substitution of an unintended element for an intended element. Thus at the word level, whole word substitutions are widely observed. (3)
a. Ich gebe die Preparate in den Briefkasten (letter box). Target: '...in den Brütkasten' (incubator) (Meringer and Mayer, 1895: 74).
b. ...they are certainly unusual people, they all possess Geiz¹ (greed) — I meant to say Geist (cleverness) (Freud, 1924).
c. I really like to — hate to get up in the morning (Fromkin, 1971).²
d. At low speeds it's too light. Target: '...heavy' (Garrett, 1975).
Notice that in (3a) and (3b) the substituted word is similar in sound but different in meaning from the intended word; whereas in (3c) and (3d) the substituted word is similar in meaning but different in sound. All corpora report both kinds. Either kind of word substitution of course constitutes a prima facie problem for a plan internal explanation. A variety of solutions are possible and have been proposed. All require the postulation of
abstract elements which do not and cannot by their nature show up directly in the final utterance. Fromkin (1971) thus postulates abstract semantic features to explain (3c):
The error cited in (3c) might then occur in the following way: the speaker wishes to say (at least on a conscious level — we leave the unconscious motivations to be explained by others) I really hate to get up in the morning. At the point in the generation of the utterance prior to the selection of the words, in the 'slot' representing hate, the features [+verb, -desire ...] occur and an address for a word is sought from the semantic class which includes [±desire]. But either because of unconscious wishes or due to a random error, the address for a verb with the feature [+desire] rather than one specified as [-desire] is selected, and the item at that address called forth with its accompanying phonological features turns out as [lajk] rather than [hejt].
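The account just quoted can be made concrete with a small sketch. It is an illustration under stated assumptions, not Fromkin's own formalism: the two-entry lexicon, the representation of an address as a set of signed features, and the function names are all hypothetical.

```python
# Hypothetical toy lexicon: each word is stored at an "address" made up of
# signed semantic features, together with its phonological form.
LEXICON = {
    frozenset({"+verb", "-desire"}): ("hate", "[hejt]"),
    frozenset({"+verb", "+desire"}): ("like", "[lajk]"),
}


def flip(feature):
    """Reverse the sign of a feature, e.g. '-desire' becomes '+desire'."""
    sign, name = feature[0], feature[1:]
    return ("-" if sign == "+" else "+") + name


def retrieve(features, slipped=None):
    """Fetch the item at the address given by `features`.

    If `slipped` names one of the features, its value is reversed before
    lookup, which models the selection of a neighbouring address.
    """
    address = set(features)
    if slipped is not None:
        address.remove(slipped)
        address.add(flip(slipped))
    return LEXICON[frozenset(address)]


intended = {"+verb", "-desire"}
print(retrieve(intended))                     # the intended item
print(retrieve(intended, slipped="-desire"))  # the item produced in (3c)
```

On this sketch the substituted word differs from the target in a single feature value, which is why the two are close in meaning while their phonological forms are unrelated.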
A possible scenario for sound related substitutions could be something like this: the abstract element, the ADDRESS of the phonological item, undergoes some random mutation such that an item at a similar address is selected. This will result in a similar sounding word provided that such items are organised (addressed) on a phonological basis, e.g. all one-syllabled words beginning with /g/ are grouped together (have similar addresses), all three-syllabled words beginning with /b/ are grouped together, and so on. (Fay and Cutler, 1977, have proposed just such an account.)
The claim that errors (1), (2), and (3) are all plan internal rests on the assumption that one need look no further than the complex of intended elements to explain the errors. Why the error should take place at all, why elements should interfere with each other, is unspecified. Fromkin and Garrett seem to put it down to some kind of random, temporary, 'mechanical' fault. Of course, the types of errors will not be random, they will show regularities determined by the kinds of representation and the kinds of processes hypothesised in the thought to speech translation. Although the types won't be random, their occurrence presumably will be.
Alternative plan errors. An intended thought might not have a unique linguistic expression, and thus the translation may lead to two, or more, alternative and equally appropriate plans for linguistic expression. This shows up in the blending of the alternatives. Meringer and Mayer have some examples of this, and a neat way of diagramming the interference of the unspoken on the spoken alternative: (4)
a. Die Studenten haben demonstrart. Target: Die Studenten haben demonstriert or Die Studenten haben Demonstrationen gemacht
b. Ich habe eine Empfohlung an Sie. Target: Ich habe eine Empfehlung an Sie or Sie sind mir empfohlen
Fromkin and Garrett offer examples of word-blends, where either word would seem to be an equally appropriate expression of the intended thought. (4)
c. My data consists [m3onlij]-[mejstlij] (mainly/mostly)
d. She's a real [swip] chick (swinging/hip. Fromkin, 1971).
e. At the end of today's [leksan] (lecture/lesson. Garrett, 1975).
Many other modern authors, e.g. Hockett (1967) and Laver (1969), agree that a number of suitable words may be activated or partly activated by higher levels of planning. But, as Meringer and Mayer have suggested, if alternative words may be activated, then why not alternative syntax, or indeed, whole alternative clauses?
Competing plan errors. And if alternatives representing the same thought, why not alternatives representing quite different thoughts? (5)
a. Ru. was speaking of occurrences which, within himself, he pronounced to be Schweinereien ('disgusting', literally, 'piggish'). He tried, however, to express himself mildly, and began: "But then facts came to Vorschwein..." Mayer and I were present and Ru. confirmed his having thought Schweinereien. The fact of this word which he thought being betrayed in 'Vorschwein' and suddenly becoming operative is sufficiently explained by the similarity of the words. (Meringer and Mayer, 1895: 62).
b. Here is another case. I asked R. von Schid. how his sick horse was getting on. He replied: "Ja, das draut .... dauert vielleicht noch einen Monat". I could not understand the draut, with an r, for the r in dauert could not possibly have had this result. So I drew his attention to it, whereupon he explained that his thought had been: "das ist eine traurige Geschichte ('it's a sad story')". Thus the speaker had two answers in his mind and they had been inter-mixed. (Meringer and Mayer, 1895: 97).
In these examples, nothing apparently mysterious is occurring, since the speaker is well aware of the competing thought which is the source of the error. Freud distinguishes this class from errors in which the speaker is unaware of the competing thought, and claims that these show the 'effect of words outside the intended sentence whose excitation would not otherwise have been revealed' (1924: 101-102). Thus (3b) is, for him, not a substitution due to a random mechanical fault. Freud suspected the speaker of having been ashamed of her family and having reproached her father with something not yet uncovered. She claimed to remember no such reproach, but it turned out, apparently, that it was indeed her father's greed which she was ashamed of and with which she reproached him. Here are further examples: the first is a substitution, the second two result in blends: (6)
a. In the case of the female genitals, in spite of many Versuchungen (temptations) — I beg your pardon, Versuche (experiments).
b. A young man said to his sister: "I've completely fallen out with the D.'s now. We're not on speaking terms any longer." "Yes indeed!" she answered, "they're a fine Lippschaft". She meant to say Sippschaft ('lot, crew'), but in the slip she compressed the two ideas: viz. that her brother had himself once begun a flirtation with the daughter of this family, and that this daughter was said to have recently become involved in a serious and irregular Liebschaft ('love-affair').
c. A young man addressed a lady in the street in the following words: "If you will permit me, madam, I should like to begleitdigen you". It was obvious what his thoughts were: he would like to begleiten ('accompany') her, but was afraid his offer would beleidigen ('insult') her. That these two conflicting emotional impulses found expression in one word — in the slip of the tongue, in fact — indicates that the young man's real intentions were at any rate not of the purest, and were bound to seem, even to himself, insulting to the lady. But while he attempted to conceal this from her, his unconscious played a trick on him by betraying his real intentions. But on the other hand he in this way, as it were, anticipated the lady's conventional retort: "Really! What do you take me for? How dare you insult me" (reported by O. Rank). (Freud, 1924)
Freud's distinct theoretical contribution is to emphasise that the competing plans may be unconscious, indeed, his proposal may be construed as claiming that unconscious plans are precisely the kind that are likely to interfere, perhaps because so much psychic energy is engaged in their activation and repression.³
If we take examples (5) and (6) at their face value, and assume they really are caused by competing plans, will they create serious difficulties for modern theories? At least one author (Ellis, 1980) has tried to explain away Freud's corpus by reinterpreting the errors as plan internal, or wordblends of the most commonly reported types. Nevertheless, Ellis concludes, rather curiously, that although Freud's data are generally amenable to modern explanations, and his theory is untestable, it 'can be translated into modern speech production models without excessive difficulty ... the cognitive system ... should be capable of processing two rival messages simultaneously'. However, in her Introduction to the standard collection of readings on errors, Fromkin (1973) does not discuss Freud's corpus once, even though his is the first and longest paper in it. And we find no attempt in Garrett's papers to take the apparently simple step Ellis recommends in order to account for Freud's materials.
We now turn to a more detailed consideration of Fromkin's and Garrett's models, and ask whether they can indeed be straightforwardly extended to deal with alternative and competing plans.

2. Linguistics meets errors: Fromkin
Fromkin was not the first to see errors as providing evidence for testing linguistic theories. Meringer, himself a philologist, had already used them in this way; he demonstrated, for example, the reality of phonetic segments, phonetic features and the syllabic unit, and showed that clusters were sequences of segments, not single segments (see Cutler and Fay, 1978). His successors have used error data in a piecemeal manner to evaluate aspects of
linguistic, especially phonological and phonetic, theory (Hockett, 1967; Fry, 1969; etc.). But Fromkin (1971) was the first to take the much bolder step of trying to relate errors in a systematic way to an integrated linguistic theory (generative grammar, with emendations) ranging from syntax and lexical selection to phonetic features, and to sketch a performance model, the 'utterance generator', to collate the linguistic levels into a single, psychologically plausible system. Essentially, she sets out to demonstrate that the UNITS, and, to a lesser extent, the PROCESSES proposed by theory, are psychologically real.

Theory claims that speech continua realise a string of discrete segments; thus one should be able to observe errors in which segments shift location in the string, and Fromkin, like others, documents a very large number of such errors (see (2a-d) above). Theory further claims that segments are complexes of features; by parity of argument, errors of feature movement should also be observed. Here a feature movement will yield a segment not in the intended elements:

(7) Cedars of Lemadon [lemadan]
    Target: 'Cedars of Lebanon'.

(7) can be interpreted as the transposition of the STOP and NASAL features on the intended segments /b/ and /n/, giving the unintended bilabial nasal segment /m/ and the alveolar stop segment /d/.

Although the classical generative position (Chomsky and Halle, 1968) does not use the syllabic unit, Fromkin, like many other linguists, does. To show the reality of this unit she employs the same kind of argument: syllables move as whole units:

(8) Morton and Broadpoint...
    Target: 'Morton and Broadbent point...'.

She also uses another kind of argument. She points out that in segment movements and feature movements syllable structure CONDITIONS the loci of the movements: segments and features transpose only with their counterparts in homologous syllable positions. Thus in (2c) segments transpose from syllable-initial positions:

(2) c. the nipper is zarrow
       Target: 'the zipper is narrow'.
The logic of this interpretive principle has been made explicit by Garrett (see below, 3), but in Fromkin it is appealed to only implicitly. In her interpretation of word exchange errors (1a-d), she notes that they typically involve words of the same syntactic class. Thus, she claims, syntactic categorisation of lexical items must be represented in the system.
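The feature-exchange reading of (7) can be made concrete with a small Python sketch. It treats each consonant as a bundle of two features and swaps the manner feature between /b/ and /n/. The feature table, the function names and the flat segment-list spelling of 'Lebanon' are all assumptions made for the illustration; none of them is part of Fromkin's model.

```python
# Toy feature table (an assumption for illustration): just enough
# features to distinguish the four consonants involved in example (7).
FEATURES = {
    "b": {"place": "bilabial", "manner": "stop"},
    "m": {"place": "bilabial", "manner": "nasal"},
    "d": {"place": "alveolar", "manner": "stop"},
    "n": {"place": "alveolar", "manner": "nasal"},
}


def segment_for(bundle):
    """Return the segment whose feature bundle equals `bundle`."""
    for segment, features in FEATURES.items():
        if features == bundle:
            return segment
    raise ValueError(f"no segment with features {bundle}")


def exchange_feature(seg_a, seg_b, feature):
    """Swap one feature value between two segments; return the new pair."""
    bundle_a = dict(FEATURES[seg_a])
    bundle_b = dict(FEATURES[seg_b])
    bundle_a[feature], bundle_b[feature] = bundle_b[feature], bundle_a[feature]
    return segment_for(bundle_a), segment_for(bundle_b)


def exchange_in_word(segments, i, j, feature):
    """Apply the feature exchange to positions i and j of a segment list."""
    result = list(segments)
    result[i], result[j] = exchange_feature(segments[i], segments[j], feature)
    return result


# 'Lebanon' as a hypothetical flat segment list (orthographic vowels).
lebanon = ["l", "e", "b", "a", "n", "o", "n"]
error = exchange_in_word(lebanon, 2, 4, "manner")
print("".join(error))
```

Because only the manner value moves, place of articulation stays with its original position, which is why the outcome contains two segments that were not in the target at all.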
Establishing the reality of linguistic PROCESSES, as contrasted with units, depends on a third interpretative principle. Theory claims that, for a given language, not all possible sequences of segments are allowable. In English, for example, words can't start /#tl-/. Theory further claims that some segmental elements will 'accommodate' to their segmental environments. Thus the affixal /s/ becomes [s], [z] or [əz] according to the kind of segment it follows. If such an accommodation process is involved in utterance generation, then the misplacement of /s/, or of its environment, will in suitable cases result in different phonetic realisations of it. In (9) /b/ and /p/ transpose, and theory postulates [z] after [b] but [s] after [p]. The intended utterance contained [s], but the error showed the appropriate accommodation changing [s] to [z] in the presence of the transposed [b]:

(9) tap stobs [tæp stobz]
    Target: 'tab stops' [tæb stops]
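The ordering argument behind (9) can be sketched in a few lines of Python: if the plural affix is realised only after the segment exchange has happened, it accommodates to the transposed [b]. The segment classes below are deliberately incomplete, and the function names and segment lists are hypothetical; this is an illustration of the logic, not a model of English phonology.

```python
# Hypothetical, deliberately incomplete segment classes.
SIBILANTS = {"s", "z", "ʃ", "ʒ"}
VOICELESS = {"p", "t", "k", "f", "θ"}


def plural_allomorph(preceding):
    """Choose the realisation of affixal /s/ from the segment it follows."""
    if preceding in SIBILANTS:
        return "əz"
    if preceding in VOICELESS:
        return "s"
    return "z"


def add_plural(stem):
    """Attach the plural affix, accommodated to the stem-final segment."""
    return stem + [plural_allomorph(stem[-1])]


def exchange_final_segments(word_a, word_b):
    """Transpose the final segments of two words (the slip in example (9))."""
    return word_a[:-1] + [word_b[-1]], word_b[:-1] + [word_a[-1]]


tab = ["t", "æ", "b"]
stop = ["s", "t", "o", "p"]

# Intended utterance: the affix follows [p], so it surfaces as [s].
intended = "".join(add_plural(stop))

# Slip first, accommodation afterwards: the affix now follows [b].
slipped_a, slipped_b = exchange_final_segments(tab, stop)
error = "".join(slipped_a) + " " + "".join(add_plural(slipped_b))
print(intended, "/", error)
```

Had accommodation applied before the exchange, the sketch would instead yield an unaccommodated [s] after [b], which is not what the error shows.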
So Fromkin can argue that the processes of morphophonemic alternation which determine this accommodation operate in the system and, by invoking a further implicit interpretative principle, operate on segment strings representing selected lexical items. This is a strong argument for a hierarchy of levels: morphophonemic processes can only apply AFTER certain lexical and syntactic decisions have been made. She summarises her conclusions in the model of utterance generation given in Figure 1. Rectangular boxes stand for representations at the various linguistic levels, diamonds for PROCESSES translating one level of representation into another, and the big rectangular box, 'Lexicon', stands for a complex process of lexical selection.

Figure 1. Fromkin's 'Utterance generator'. From Fromkin (1971).

Let us consider certain quite general features of the model, and see whether it can be naturally extended to treat alternative plan errors and competing plan errors, along the lines suggested by Ellis (1980) or in some other way. Fromkin commits herself to three properties of the model. First, 'levels' are 'stages' in the generation of the utterance, so that boxes and diamonds in the diagram operate in a strict top-down sequence; a typical consequence is that lexical items can be selected only after the syntactic (-semantic) structure has been determined. Second, only one clause is processed at a time; and presumably this entails that, third, only one 'meaning' can enter the system at a time. This is a 'top-down' (or 'straight-through') system where the input to a process (a diamond) is no more and no less than the information in the representation (box) dominating it (connected to it by an input arrow). Thus, the action of, say, the 'intonation-contour generator' is conditioned solely by the 'syntactic-semantic structure' it takes as input. It has no access to higher levels (the 'message') directly, and no access to lower levels (the phonological form of the lexical items, for instance). This contrasts with heterarchical models, where lower-level information may influence higher-level decisions (see Turvey, Shaw and Mace, 1978, for a discussion of model types, and Arbib and Caplan, 1980, for application of heterarchical models to language processing).

STAGE 2. The 'idea' or 'meaning' is structured syntactically, with semantic features associated with parts of the syntactic structure. For example, if a speaker wishes to convey the fact that 'a ball' rather than 'a bat' was thrown by a boy, the utterance A ball was thrown or alternately He threw a ball is structured at this stage. If he uses the second structure, part of the features specified for the final nouns must include [+emphasis] together with the features selected for 'ball', i.e. [-animate, -human, +count, +round, +used in games etc.]. This suggests that the STRUCTURE itself is put into buffer storage prior to actual articulation of the utterance; this would account for the switching of noun for noun, verb for verb etc., when such transpositions occur. (1971: 49)
The 'intonation-contour generator' takes this representation and decides the kind and location of at least the main sentence accent: this augmented representation then determines the lexical items required, using the semantic features [-animate, etc.] to locate an entry in the 'lexicon', and incorporates them into the syntactic structure. And so on down to the motor commands to the muscles. On the face of it, this model cannot deal with alternative or competing plan errors at all. However several crucial properties are left un- or underspecified and judicious choice of appropriate specifications may turn out to provide the required flexibility. The most important gaps concern the 'dominance parameters' — the nature of the determination of a given level over its immediate successor, and the 'real-time parameters' — the nature of the real-time relations between one stage and the next. That is, will a given representation — box — completely determine the operation of the next process — diamond — or can the diamond generate two or more alternative translations of that representation? And, can a process begin operation before the prior process has completed generation of its representation (as has been suggested by Fry, 1969)? Fromkin maintains that at the level of lexical selection, dominance is not complete and alternative lexical items may be selected. This results in word-blends — e.g. (4c-e). If alternatives can be generated at the lexical level, why not at other levels? In which case, it would seem a straightforward matter to account for the blending of alternative clause structure plans. However, it is not straightforward. To begin with, the temporal
relationship between alternative plans must be specified. One way is to allow alternative clause structures, like alternative specifications of lexical items, to be generated simultaneously and stored together in a buffer. If this is the case, consequences develop which seem inconsistent with the model and inconsistent with a more detailed analysis of the data.

First, if alternative clause structures can be generated and stored, the generation of complete alternatives, ultimately with full phonetic specification, will proliferate down through the system. Two alternative structures may give rise to two alternative lexical items each, and so on. Since blending alternatives can happen at each stage, a host of errors unpredicted by Fromkin would result. Consider, for example, the consequences of the simultaneous complete representations of the two sentences possibly underlying example (3c) (see Note 1).

(3c)'  I really hate to get up in the morning.
(3c)'' I really like to stay in bed in the morning.

Suppose that (3c)'' is inhibited and reveals itself just through the kinds of mechanisms that Fromkin allows, e.g. word movements and segment movements, and under just those structural constraints required for plan-internal errors. The following errors could then arise:

Word anticipation:
    ... really stay to get up ...
    ... have to stay (up) ...
    ... get up in the bed ...
    (N.B. all honour grammatical category constraints.)
Segment movement:
    ... state to get up ...
    ... hate to bet up ...
    ... to get bup ...
    (N.B. all honour syllable position constraints.)

In each case, the error source is the unspoken clause. If such errors do occur, then the model will have to be radically modified and the loci of alternative elements specified. For example, it appears to be the case that segment movements rarely cross clause boundaries and span very few elements (Garrett, 1980a). To preserve this constraint, it must be assumed that the representation of (3c)' and (3c)'' is quite different from the arrangement of two sentences intended to be spoken successively.

Second, if errors are caused by representational similarity at a given level, then blends at the clause structure level should occur between similar structural elements in the two alternative clause structures. Garrett has observed that when elements exchange between ADJACENT clauses, they serve very similar grammatical roles ((12) below), and these data may be
invoked to support the principle for the alternative clause cases. However, closer examination of alternative plan errors (4a, b, c) and (e) reveals that BLENDING errors are conditioned by phonetic similarity,⁴ and in (4a, b) a structural similarity constraint is violated and nouns blend with verbs: demonstriert (Vb) blends with Demonstrationen (N) to give demonstrart; Empfehlung (N) blends with empfohlen (Vb) to give Emphfohlung. Notice that these blends follow the usual principles for word blends by honouring syllable structure: homologous part exchange, stressed syllable element exchanging with stressed syllable element, etc. So either the blend occurs very late in the system, implying the full phonetic specification of both alternatives and hence the difficulties mentioned above, or higher-level blending is sensitive to low-level descriptions of phonetic form, and this implies that the strictly top-down character of the model has to be abandoned.

On the other hand, if alternative clauses are represented successively, then the second would have to catch up with the first in a race down the system in order to interfere with it. To allow this would also require importing a new set of principles to preserve the regularity of observed errors. In particular, it would show why construction of Sie sind mir empfohlen catches Ich habe eine Empfehlung an Sie just at the point where both clauses have the common phonetic form [empf-].

Competing plan errors require, of course, that the two 'meanings' are processed by the system together. This will result normally in an even greater proliferation of representations down the system, since two plans are entered even earlier in the hierarchy of stages. And scrutiny of these errors reveals just the same kind of phonetic constraint on blending as alternative plan errors.
Thus in (5a) the word Vorschein ('came to light') blends with Schweinereien ('piggish') to yield Vorschwein, a nonword; and in (5b) dauert ('last', Vb.) blends with traurige ('sad', Adj.) to yield draut, a nonword. Similarly, in (6b) Sippschaft ('crew') blends with Liebschaft ('love-affair') to give Lippschaft, also a nonword. There is a clear phonetic similarity between the blended words, and the precise form of the blend follows the regularities seen in the plan internal blends and segment movements. So we see word-initial obstruent /l/ replacing word-initial /s/ in very similar syllabic contexts. Of course, we don't know whether it's an anticipation, a perseveration or a spoonerism! Vorschwein is a fairly common cluster addition error; compare the following examples from Fromkin (1973: 245, 255); the first two are blends, the third an anticipation:

(10) a. book return shlute
        Target: 'slot' or 'chute'.
     b. shlug of whiskey
        Target: 'slug' or 'shot'.
     c. shmut his mouth
        Target: 'shut his mouth'.
And in general, competing and alternative plan blends seem to obey the same rules as plan-internal segmental errors. This would be expected if two plans achieve full phonetic status, since the phonetic system(s) operate in ignorance of higher level constraints, witness the appearance of nonwords in plan internal errors indicating that the phonetic system does not check its output for lexical status. In any event, Fromkin's model cannot be readily adapted to handle alternative plan and competing plan errors: there will be an enormous proliferation of representations at lower levels which requires the postulation of new mechanisms to sort them out in an appropriate way, or else strict top-down processing will have to be abandoned, and it's not clear what that would mean for the model.
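The proliferation worry can be given a rough size with a short Python sketch. It assumes, purely for illustration, that every level retains all of its alternatives and that each combination is carried down as a separate full specification; the level labels are hypothetical placeholders, and treating the alternatives at each level as independent of the choice above is a simplification.

```python
from itertools import product


def full_specifications(alternatives_by_level):
    """Enumerate every top-to-bottom path when no level discards alternatives."""
    return list(product(*alternatives_by_level))


# Hypothetical labels: two clause structures, then two alternatives
# at each of two lower levels.
levels = [
    ["(3c)'", "(3c)''"],
    ["lexical-1", "lexical-2"],
    ["phonetic-1", "phonetic-2"],
]

paths = full_specifications(levels)
print(len(paths))
```

Under these assumptions the number of complete specifications is the product of the alternatives at each level, so it doubles with every level that keeps two options open. This is the growth that any editing or sorting mechanism added to the model would have to contain.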
3. Psychology meets errors: Garrett
The only other model of comparable scope was proposed by Garrett in 1975 and elaborated in a number of subsequent papers (1976, 1980a, 1980b). Garrett's model is similar in many ways to Fromkin's, but the interpretative principles used to construct it are made explicit and this turns out to force certain differences. Garrett's principle (A) is one that Fromkin uses implicitly all the time, and she uses (B) on occasion, as for instance with errors conditioned by syllable structure.

(A) When elements of a sentence interact in an error (e.g. exchange position), they must be elements of the same processing type.
(B) The structural constraints for a given error type must be of a single processing type (that is, operate at a single level in Figure 2).

The conjunction of principles (A) and (B) permits the differentiation of levels. Consider the following examples:

(11) a. I went to get a cash checked.
        Target: '... check cashed'.
     b. Even the best team losts [tim lasts]
        Target: '... best teams lost [timz last]'.
The exchange of cash and check entails, by (A), that free morphemes are elements of the same processing type. Since grammatical category constraints are not honoured – (11a): noun and verb exchange; (11b): affix
moves from a verb to a noun – we can infer that grammatical class is not information available to the processes shifting the elements about.⁵

Consider the interpretation of morphophonemic alternation. Suppose I wanted to eat my beans first was intended, but underwent a morpheme exchange error involving want and eat. If the result is ... eated to want ... the grammatical morphemes will be added AFTER lexical selection and in ignorance of the lexical status of eated. However, if the result is the irregular, lexically-conditioned ... ate to want, the process of adding grammatical affixes has access to lexical information and the place of the process in the hierarchy becomes problematical. Garrett (1980b) claims that errors of the eated type occur, if rarely; but Fromkin (1971) argues that morphophonemic processes are entirely post-lexical in spite of reporting perhaps the most celebrated example of the irregular form in a morpheme movement: Rosa always date shranks (target: 'Rosa always dated shrinks'). Here, she maintains, the past-tense morpheme shifted from date to shrink.

Using (B), Garrett can take advantage of observed regularities in error distribution. Since whole word exchanges are predominantly between items of the same grammatical class, and dramatically so in cross-clause exchanges, he can postulate a level of organisation which handles both 'grammatical relations' and lexical selection, his 'functional level'; and he can differentiate this level from the level where 'morpheme stranding errors' like (11a, b) occur and which do not typically involve elements of the same grammatical category. He calls this the 'positional level'.

Notice that (11b) can be interpreted not as a segment that moves (or exchanges with a null element), but as something more abstract, like a plural morpheme. Generally, in segment movements, the segment doesn't alter according to its new environment (though arguments to that effect have been advanced, e.g. Hill, 1972; Hockett, 1967). Segment interactions are assigned to the 'positional level' where phonemic information is represented, including abstract phonemes like plural /s/. Accommodations and certain other sound errors are assigned to the later 'sound level representation'.

His model is summarised in Figure 2. This is broadly comparable to Fromkin's model. 'Message Level' is similar to Fromkin's 'meanings'. 'Functional Level Representations' combine the outputs of her 'syntactic structure' and 'semantic feature' generators. 'Positional Level Representations' combine a 'syntactic structure' with a phonemic realisation of lexical selections. 'Sound Level Representation' is equivalent to her 'fully specified phonetic segments in syllables', but she separates morphophonemic rules from phonological rules. Thus from the Positional Level she has two transformations to Garrett's one.
Figure 2. Garrett's model of sentence production. From Garrett (1975).

MESSAGE SOURCE
    M1, M2, M3 ... Mn
    'Semantic' factors pick lexical formatives and grammatical relations.
    (Word substitutions and fusions occur here; independent word exchanges and phrase exchanges also occur here.)
Functional level of representation
    Syntactic factors pick positional frames with their attendant grammatical formatives; phonemically specified lexical formatives are inserted in frames.
    (Combined form exchanges and sound exchanges, word and morpheme shifts occur here.)
Positional level of representation
    Phonetic detail of both lexical and grammatical formatives specified.
    (Accommodations and simple and complex sound deletions occur here.)
Sound level of representation
    Instructions to articulators.
    ('Tongue twisters')
ARTICULATORY SYSTEMS
    Utterance of a sentence
But these are minor differences. Essentially, both models are strictly top-down, with each level dominating the next one down. Garrett does not, however, commit himself to one clause, or indeed to one 'message' at a time, but it is not made clear how elements in different clauses interact and some get eliminated. He does note that word-exchange errors across clauses (12a, b) show 'a striking structural parallelism' between exchanged elements.

(12) a. ... read the newspapers, watch the radio, and listen to TV.
        Target: '... listen to the radio, and watch TV'.
     b. Every time I put one of these buttons off, another come on.
        Target: '... buttons on, another comes off'.
Not only do the exchanged words belong to the same grammatical category, they also serve the same grammatical role, e.g. direct object (NP dominated by VP). But some sound exchange errors between clauses show no such parallelism:

(13) Helf, helf, the wolp is after me.
     Target: 'Help, help, the wolf is after me'.
One interesting feature of the examples Garrett (1980a) cites is that there is arguably both structural and sound parallelism:

(14) a. I bess I getter go.
        Target: 'I guess I better go'.
     b. I never know you nuticed [nutist].
        Target: 'I never knew you noticed'.
Phonologically, the two initial consonant-vowel portions of guess and better are parallel constructions of voiced stops followed by /e/, both syllables being stressed. In (14b) the exchange is between verbs, both with initial stressed syllable beginning /n/ and a back vowel, and thus comparable to within-clause sound exchanges. As with Fromkin's model, there is no way in which an error can be conditioned both syntactically and phonetically, since syntactic processes occur at the level at which phonetic information is not represented, the 'Functional Level'; conversely, as we have seen, phonetic errors are conditioned by factors like the phonetic similarity of interacting segments and their syllable position, but not by syntactic factors. Garrett (1980a) reports that only 39% of sound exchanges involve words of the same grammatical category, as compared with 85% of word exchanges. What is a little strange is that syntactic information is represented at the level where sound exchanges are held to take place, the 'Positional Level', even though this information does not constrain the processes responsible for the errors at this level.

Garrett, unlike most modern authors, is aware that some error types pose problems for a simple 'top-down' model.

[Word] blends are something of a puzzle. They do not fit straightforwardly into the outline we have been constructing, for their antecedents are 'early' and their apparent error locus late ... one might ... argue for a routine parallelism in sentence construction (1980a: 211).
Garrett suggests that two alternative 'planning frames' — into which morphemes are slotted — could be formulated, at the Functional Level,
which are then 'carried down through the processing to the (by hypothesis) late stage of editorial selection in which competing formulations are weeded out' (1980a: 211). However, this is not discussed in any more detail, and it is unclear how the model is to be amended to accomplish both parallel planning and late editing. Let us consider whether 'routine parallelism' and 'late editing' can be accomplished in an extension of Garrett's model. Alternative 'Planning Frames' will be constructed at the Functional Level, using the same mechanism and stored in the same buffer — if separate buffers, there is no reason to expect any interaction at all. Similarly, alternative Planning Frames will occasionally be constructed for each of the two competing messages and stored in the same buffer. This, of course, will lead to proliferation as in Fromkin's model, but let us suppose for the moment that editing stops this getting out of hand. Interaction between frames may occur at each level: so blending between syntactic and semantic elements under comparable descriptions will occur at the Functional level, and blending between morphologically or phonologically similar items will occur at the Positional Level. What would Functional Level blends and substitutions look like? One might expect descriptions of grammatical roles in the two clauses to interact. If only grammatical frames are produced at this level, then syntactically correct but inappropriate structures will result. However, syntactic errors relevant to this haven't been systematically studied so far. In the case of words, semantic specification of lexical items — perhaps in a featural format, as suggested by Fromkin — would interact such that the new combination of features specifies an item inappropriate for either clause. Notice that there is no requirement that the erroneous items sound like intended items. 
The only classes of substitutions regularly reported show either semantic or phonological similarity to the intended item. One class of interactions at this level extensively studied by Garrett concerns word EXCHANGES: within- and between-clause exchanges, consequent upon the exchange of word descriptions, yield grammatical sentences, as in (1a-d), since a condition on exchange is that items are of the same grammatical category. Interactions at this level occur between highly abstract elements, and hence the operation of lower level systems will typically ensure the lexical status of elements. An erroneous word description will still pick out a whole word or word-stem: the lexicon doesn't consist of stray bits of words. Thus interaction of Functional Level word-descriptions cannot be sufficient for word-blends.

Blending at the Positional Level alone will not ensure the semantic and syntactic constraints observed by Fromkin and Garrett (4c-d). However, competing plan blends are not, of course, conditioned by semantic
similarity, but often honour grammatical category constraints: in (5a), two nouns, Vorschein and Schweinereien, blend (in (5b), dauert (Vb.) and traurige (Adj.) constitute an exception), in (6b) two nouns, Sippschaft and Liebschaft, and in (6c) two verbs, begleiten and beleidigen. Some alternative plan errors seem to depend on semantic but not syntactic equivalence: thus in (4a) demonstriert ('demonstrate'), verb, and Demonstrationen ('demonstration'), noun, interact, and in (4b) Empfehlung ('recommendation'), noun, and empfohlen ('recommended'), verb, interact.

Interestingly, in Fromkin's collection of 'normal' blends (1973: 260, 261), where entries are (definitionally?) similar semantically and equivalent syntactically, a surprising number seem to be phonetically similar as well. It's not clear what the best measure of similarity is, and I offer several. All point to the same conclusion.

(15) a. Of the two presumed words involved in the blends, half or more of the segments found in one are also found in the other, e.g.
            trying/striving → strying: /t,r,ai,iŋ/
            tummy/stomach → stummy: /t,ʌ,m/
            blisters/splinters → splisters: /l,i,t,z/
        29 out of 65 errors.
     b. Same syllable pattern, e.g.
            draft/breeze → dreeze
            velars/dentals → dentars
            terrible/horrible → herrible
        40 out of 65 errors (including those also satisfying criterion a).
     c. Same initial phoneme, e.g.
            what/which → watch
            grizzly/ghastly → grastly
        21 out of 65 errors (including those also satisfying criteria a and b).
     d. Same initial phoneme PLUS 50% of segments, e.g.
            pollution/population → populution: /p,a,l,u,ʃ,ə,n/
            slick/slippery → slickery: /s,l,i/
        19 out of 65.
     e. Same stress pattern, same initial phoneme, 50% of segments, e.g.
            transcribed/transposed → transpired
            omnipotent/omniscient → omnipicent
            mainly/mostly → /meistli/
        8 out of 65.
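Criteria (15a) and (15c) are simple enough to state as a computation. The Python sketch below is one possible reading, not the procedure actually used to obtain the counts above: it takes 'half or more of the segments found in one are also found in the other' to mean that at least one of the two words shares 50% or more of its segments with the other, counting repeated segments separately. The segment lists are rough transcriptions assumed for the illustration.

```python
from collections import Counter


def shared_fraction(word, other):
    """Fraction of the segments of `word` that also occur in `other`."""
    shared = Counter(word) & Counter(other)
    return sum(shared.values()) / len(word)


def criterion_a(word_1, word_2):
    """(15a), on one reading: either word shares >= 50% of its segments."""
    return max(shared_fraction(word_1, word_2),
               shared_fraction(word_2, word_1)) >= 0.5


def criterion_c(word_1, word_2):
    """(15c): the two words begin with the same phoneme."""
    return word_1[0] == word_2[0]


# Assumed rough transcriptions, one list element per segment.
trying = ["t", "r", "aɪ", "ɪ", "ŋ"]
striving = ["s", "t", "r", "aɪ", "v", "ɪ", "ŋ"]
what = ["w", "ɒ", "t"]
which = ["w", "ɪ", "tʃ"]

print(criterion_a(trying, striving), criterion_c(trying, striving))
print(criterion_a(what, which), criterion_c(what, which))
```

On these transcriptions trying/striving satisfies (a) but not (c), while what/which satisfies (c) but not (a), which shows that the criteria pick out partly different pairs. A different segmentation (for example, treating diphthongs as two segments) could change individual classifications.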
On the other hand, only 14 out of 65 errors involve a pair of presumed words which differ on all of the three criteria (15a), (b) and (c).

(16) minor/trivial → minal (/mainal/)
     edited/annotated → editated
     instantaneous/momentary → momentaneous
     corollary/parallel → corallel

And one may feel that even in (16) some sound similarity may be found in some of the examples. No reliable figures exist, to my knowledge, describing the distribution by type or token of stress patterns, initial phonemes or segmental similarity between arbitrary words, but I think it would be hard to maintain the null hypothesis that the data for (15a) could have arisen by chance. If not by chance, then it's hard to see how a top-down model could account for it.

4. Is there really anything to explain?
One way out of this difficulty is to discount or discard the problematic data. Ellis (1980), for example, goes through the whole of Freud's corpus in the 'Slips of the Tongue' chapter of the Psychopathology of Everyday Life, and tries to show that these errors can be reclassified into theoretically less problematic categories. He notes that 51 out of 85 errors drawn from spontaneous speech are cases of lexical substitution. Almost all of the substituted words are related to the intended item semantically, phonologically or both. 'Thus, the lexical substitution errors which Freud adduces in support of his theory of conflicting intentions do not differ on formal or structural grounds from the errors analyzed by psycholinguists'. Unfortunately, Ellis failed to see the problem created by substitution where both semantic and phonological relatedness is involved. His treatment of word blends is (even) more sketchy. He offers an alternative plan internal explanation for one claimed blend, and permits us to infer that such explanation would be available for other examples. And finally, he allows that a 'disturbing word had been "spoken" subvocally, so that the intended word could have blended with a lingering phonemic trace of the disturbing word'. But the status of such a trace in a speech production model and how it can interact with other plans is exactly what is problematical.

The most thoroughgoing attempt to reclassify the Freudian corpus was undertaken by the Italian textual critic Sebastiano Timpanaro (1976).

Psychoanalysts and textual critics have to a large extent studied the same phenomena, though their methods and purposes in doing so have been very different.
The task of the textual critic is to inquire into the origin of alterations undergone by a text in the course of its successive transcriptions, so as to be able to correct those errors persuasively or to establish which of two or more variants deriving from different sources is the original, or approximates most closely to it.
Among the various types of errors of transcription, there are at least two which have nothing to do with a 'slip of the pen'. On the one hand, there are those mistakes which are inaccurately termed 'palaeographic'; these consist of misunderstandings of signs in the written text which the copyist had before him — for every kind of writing, ancient or modern, contains signs that resemble each other and are therefore liable to confusion. On the other hand, there are those alterations which have been consciously made in the transmitted text... But it has long been realized that the majority of mistakes in transcription and quotation do not belong to either of the two categories just mentioned. They are, on the contrary, 'errors due to distraction' (let us adopt, for the moment, this extremely imprecise formula), to which anyone transcribing or citing a text may be subject — whether scholar or lay man, mediaeval monk or modern typist or student... It has long been established that a copyist, whether ancient or modern, does not as a rule transcribe a text word for word, still less letter for letter (at least not unless he is transcribing a text written in a language or a script of which he is wholly ignorant), but reads a more or less lengthy section of it and then, without looking back at the original at each point, writes it down 'from memory'. He is therefore liable, if only in the brief interval between the reading, or, as the case may be, the dictation, and the actual transcription of the passage, to commit errors which are not substantially different from those examined by Freud and (though with other methods) by psychologists who were his predecessors and contemporaries... Furthermore, a textual critic often has to deal with what is called an indirect tradition — that is, with quotations, often from memory, of complete texts by other authors. 
Quintilian frequently commits such errors in quoting Virgil; Francesco De Sanctis in citing Dante or Petrarch, Leopardi or Berchet. Finally, he must consider oversights which are much more likely to be those of the author himself than of his copyists. Thus Cicero in a moment of distraction once wrote, instead of the name of Aristophanes, that of Eupolis — another great Athenian writer of comedies; on another occasion he confused the name of Ulysses' nurse, Euriclea, with that of his mother, Anticlea. Here we are manifestly concerned with 'slips of the pen' analogous to those studied by Freud. (1976: 19-23). Timpanaro analyses numerous parapraxes from the Psychopathology of Everyday Life — and not just 'Slips of the Tongue' and 'Slips of the Pen' — and tries to show that these errors can be accounted for by principles familiar to textual critics who have no access to, and evidently no use for, psychoanalytic information about the author, copyist or typesetter. The most powerful of such principles is BANALIZATION, which is best explained with reference to an example of Freud's that Timpanaro discusses in great detail. A young Austrian Jew, with whom Freud strikes up a conversation while travelling, bemoans the position of inferiority in which Jews are held in Austria-
Hungary. His generation, he says, is 'destined to grow crippled, not being able to develop its talents nor gratify its desires'. He becomes heated in discussing this problem, and tries to conclude his 'passionately felt speech' (as Freud, with a pinch of good-natured irony, calls it) with the line that Virgil puts in the mouth of Dido abandoned by Aeneas and on the point of suicide:
[17] Exoriare aliquis nostris ex ossibus ultor (Aeneid, IV 625). ('Let someone arise from my bones as an Avenger' or 'Arise from my bones, O Avenger, whoever you may be'.)
But his memory is imperfect, and all he succeeds in saying is
[18] Exoriare ex nostris ossibus ultor.
i.e. he omits aliquis and inverts the words nostris ex.
What is the explanation for this double error? The most mediocre of philologists would have no difficulty in giving one. As we have already mentioned, anyone who has anything to do with the written or oral transmission of texts (including quotations learnt by heart) knows that they are exposed to the constant danger of banalization. Forms which have a more archaic, more high-flown, more unusual stylistic expression, and which are therefore more removed from the cultural-linguistic heritage of the person who is transcribing or reciting, tend to be replaced by forms in more common use. This process of banalization can affect many aspects of a word. For instance, it can affect its spelling: forms like studj, havere easily turn into studi and avere in texts transcribed today or even so in quotations written down from memory. It can affect its phonetic character: one so often reads or hears someone recite the famous line from Ariosto: 'O gran bontà de' cavalieri antiqui!' with the antiqui replaced by antichi, even though the rhyme between the third and fifth lines of that octet favours the more archaic form. It can affect its morphology: 'enno dannati i peccatori carnali', wrote Dante, Inferno, V 38; but in various manuscripts of the Commedia one finds sono or eran, or some similar banalization (see Petrocchi's critical edition). It can affect its lexical character: again in Dante the archaic form aguglia was nearly always replaced by the more usual aquila in certain manuscripts, and still is today in quotations loosely made by modern authors. Finally, it can affect its syntactic or stylistic-syntactic character: in the sub-title to Ruggiero Bonghi's Lettere critiche, Perché la letteratura italiana non sia popolare in Italia ('Why Italian literature is not popular in Italy'), the subjunctive sia is itself not popular enough in Italy, so that when the sub-title is quoted from memory one frequently finds it replaced by the indicative mood è.
(1976: 29-30)
Now, (17) cannot be translated directly because aliquis, the indefinite pronoun, is difficult to render into German with the second person singular verb exoriare.
(17) Exoriare aliquis nostris ex ossibus ultor.
(The error form: (18) Exoriare ex nostris ossibus ultor.)
Something has to be sacrificed: either one wishes to bring out the character of a mysteriously indeterminate augury, which means rendering exoriare by the third person singular rather than the second person ('... let some Avenger arise'); or one prefers to conserve the immediacy and directly evocative power of the second person singular, which means modifying somewhat, if not suppressing outright, the aliquis ('Arise, O Avenger, whoever you may be...'; 'Arise, unknown Avenger...'). (1976: 33-34)
Distinguished German translators have in fact opted for one of these simplifications, and Schiller loses both the invocation of the Avenger and the character of augury: 'Ein Rächer wird aus meinem Staub erstehn'. Thus some reasonable approximation to the meaning can be achieved by the deliberate suppression of aliquis, but other words cannot be suppressed without making a nonsense of the whole. So aliquis is the word most prone to loss. The principle of banalization can now operate on the residue to regularise highly irregular syntax:
The young Austrian, as we saw, also made another mistake: he quoted ex nostris ossibus instead of nostris ex ossibus. This too is a banalization. It is a banalization in terms of Latin usage, since the word-order adjective-preposition-noun, although occurring frequently in Latin, was nevertheless not so common as the order preposition-adjective-noun (or preposition-noun-adjective), and was particularly rare in prose. It is also a banalization with respect to the German word-order, in which, in a phrase corresponding to nostris ex ossibus, the attachment of the preposition in front of the whole complement it governs is precisely the rule. However, as Freud himself remarks ('he attempted to conceal the open gap in his memory by transposing the words'), this second error could have been a consequence of the first, viz. the forgetting of aliquis.
Since this case concerns a young man who had been to school in Austria, it seems likely that he would have had a good recollection of elementary Latin prosody and metre, and would have kept up the habit of reading and reciting Latin hexameters according to the so-called ictus (rhythmic stresses) rather than the grammatical accents on individual words (had he gone to school in Italy, this would have been less probable). He would therefore have noticed, in a more conscious fashion, that the string of words exoriare nostris ex ossibus ultor could never be found in a hexameter, while this could well be the case for exoriare ex nostris ossibus ultor. (1976: 39-40)
Banalization is, of course, well-known to psychologists in another garb: Bartlett's 'conventionalization' (1932: 268ff): 'When cultural material is introduced into a group from the outside it suffers change... (a) by
assimilation to existing cultural forms within the receptive group; (b) by simplification, or the dropping out of elements peculiar to the group from which the culture comes'. Freud has a quite different explanation of the transformation of (17) into (18) involving the deeply repressed competing thought of the unwanted pregnancy of the speaker's girl friend in Naples, and revealed through successive associations from the omitted word aliquis. This explanation has no account for the word inversion in (18). Now omissions, though common, are not discussed in much detail by modern students of errors; however, word-substitutions are, and we turn now to Timpanaro's treatment of these.
Textual criticism teaches us that one of the most frequent category of errors is a confusion between words of an equal number of syllables which are also connected by a marked phonic similarity, or even better, by assonance or rhyme. The great majority of errors are not derived from misunderstandings of the signs used in the text to be copied: many of the letters that compose the respective words have a different form, and cannot be confused in any type of script. Rather, they are cases of faulty memory, and usually not so much visual in nature as auditory. (1976: 64)
Cicero called Ulysses's nurse Anticlea instead of Euriclea: 'here is the equal number of syllables, the rhyme... the affinity of role between the two characters (the one the mother, the other the nurse of the same Homeric hero) are more than sufficient to account for the "slip"' (1976: 65). Heine cites Kätchen instead of Gretchen as the heroine of Faust: 'they are two of the most prevalent feminine diminutives, ... and they are both names found in Goethe, and even belong to persons in his life'. Neither of these is a banalization; indeed they may involve deciding upon a lectio difficilior. Such 'disimprovements' happen when, due to an inability to localise the fault, the correction goes astray. Thus in successive codices, Cicero's citation of the name of a locality in Cisalpine Gaul, Litana, becomes banalized to the clearly inappropriate Latina. A later emendator 'realized that Latina was inadmissible, but did not succeed in restoring ... the difficult Litana; and since he saw that the Lucani are named a little further on, it occurred to him to introduce the name of another Latin people, the Hirpini'. Timpanaro used these methods to elucidate Freud's celebrated Boltraffio error. Freud was trying to find the name of the Italian painter Signorelli, but the only name that came to mind was Botticelli, a banalization that he realised was incorrect. Bo- may then have been disimproved to give Boltraffio, a little known painter of Leonardo's school.
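The two conditions Timpanaro names for substitution errors, an equal number of syllables and a marked phonic similarity, can be made concrete with a rough sketch. The Python below is only an illustration under stated assumptions: the function names are hypothetical, syllables are approximated by counting runs of vowel letters in the spelling, and phonic similarity is approximated by the length of the shared written ending. Neither measure is Timpanaro's own.

```python
# Assumption: vowel letters stand in for syllable nuclei (spelling-based, crude).
VOWELS = set("aeiouyäöü")


def syllable_estimate(word):
    """Approximate the syllable count as the number of vowel-letter runs."""
    count = 0
    in_vowel_run = False
    for ch in word.lower():
        if ch in VOWELS:
            if not in_vowel_run:
                count += 1
            in_vowel_run = True
        else:
            in_vowel_run = False
    return count


def shared_ending(a, b):
    """Longest common suffix of two spellings (a crude stand-in for rhyme)."""
    a, b = a.lower(), b.lower()
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return a[len(a) - n:]


def confusable(a, b, min_ending=3):
    """Hypothetical test: same syllable estimate and a shared ending."""
    same_length = syllable_estimate(a) == syllable_estimate(b)
    return same_length and len(shared_ending(a, b)) >= min_ending


# Pairs taken from the examples discussed in the text.
for error, target in [("Anticlea", "Euriclea"),
                      ("Kätchen", "Gretchen"),
                      ("Botticelli", "Signorelli"),
                      ("Boltraffio", "Signorelli")]:
    print(error, target, shared_ending(error, target), confusable(error, target))
```

On this crude measure the first three pairs count as confusable and Boltraffio/Signorelli does not, which fits the treatment of Boltraffio as a later 'disimprovement' rather than a direct substitution.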
Unfortunately, Timpanaro does not deal with the interesting word blend cases which we've discussed above (5, 6); but in the Boltraffio substitution example we notice again how it seems that for an error to occur, both semantic and phonological relatedness are involved. Moreover, Timpanaro acknowledges that some errors may be genuinely 'Freudian', and proposes two criteria by which an error should be admitted.
(I) Psychological processes of a relatively 'superficial' character, which regularly give rise to 'slips' [i.e. like those discussed by Fromkin and Garrett, and Banalizations] and instances of forgetting, are not sufficient to explain it.
(II) The 'Freudian' explanation does not rely on associations or symbolic connexions that are so forced as to make it wholly arbitrary and unverifiable. (1976: 125)
Freud reports a delegate in the Reichstag, Lattmann, appealing for support for the Emperor in the following words:
It is our belief that the united thoughts and wishes of the German people are bent on achieving a united demonstration in this matter as well, and if we can do so in a form that takes the Emperor's feelings fully into account, then we should do so spinelessly [rückgratlos] as well... (laughter)... Gentlemen, I should have said not rückgratlos but rückhaltlos [unreservedly].
He glosses this slip with a quotation from a Social-Democratic paper, which points out that the anti-Semitic Lattmann involuntarily accused himself and the parliamentary majority by slipping 'into an admission that he and his friends wished to express their opinion to the Emperor spinelessly'. Timpanaro accepts this case as 'genuinely Freudian'. The erroneous substitution was not a banalization, since rückgratlos is a much less frequent word in the language, and there is no reason to suppose it's more frequent in Lattmann's idiolect. And there is nothing in the context conducive to a lectio difficilior. So Criterion (I) is satisfied. So too is the second: the 'troubled conscience' which induced the hypocritical politician to give voice to the unfortunate adjective is all too obvious. We need have recourse neither to the existence of improved connexions ... nor to symbolisms that adapt to all eventualities in order to expose it. (1976: 125-126)
Timpanaro goes on to cite other examples satisfying his two criteria. Thus, even a rigorous critic of Freud acknowledges competing plan errors and provides useful criteria for distinguishing them from Freudian
overinterpretations. In addition, his own textual examples reinforce the argument against top-down models by showing how errors are conditioned by sound and meaning. Finally, both Hill (1972) and Garrett (1980a) produce examples of competing plan errors, in which the competing thought can easily be traced to its source, and which Garrett calls 'environmental contaminant'.
(19) a. Target: 'Are you trying to send me a message, Dog?'
Situation: Speaker is addressing Dog; Dog is standing by front door looking woebegone. Immediately beside speaker at eye level on a shelf, is a novel with the cover blurb: 'A novel of intrigue and menace'. Speaker has idly read this while approaching the dog and preparing to speak.
Output: Are you trying to send me a menace, Dog?
b. Target: 'People should take off their old bumper stickers'.
Situation: Speaker is looking at a car bumper with two-year-old sticker reading, 'Dukakis should be governor'.
Output: People should take off their old governor stickers.
(Garrett, 1980a)
Of course, Garrett's examples do not support Freud's principal contention that competing plans are often, even typically, repressed into the unconscious; and Timpanaro's Criterion (II) will probably exclude, in practice, those slips Freud found particularly revealing. Nevertheless, the basic psycholinguistic datum of competing plans, from whatever source, seems well established.
Attempts at a (re)solution
I think it should now be fairly clear that alternative and competing plan errors exist and pose problems for recent information-processing models of speech production based on error data. The crucial point that emerges is that in a large class of competing and alternative plan word-substitution and blend errors, at least, two levels of representation seem to be simultaneously implicated. These data can be summarised in Table 1. At the error locus, both sound representation and at least one higher level representation are involved. Strict top-down models cannot allow this, and would have to resort to coincidence to explain the similarity in sound between interacting items. Freud's account of competing plan errors is a modification of Wundt's (1900) proposal, and he cites the following passage from Wundt with approval:
[Table 1]
[pä] and [bä] → [bä] transformations to the mechanics of the vocal tract. But if this is correct, then the programmer does not compute articulatory programs by taking phonetic representation (or rather its analogue in production terms) as input. We therefore reject variant A of Hypothesis 2.
Variant B: articulatory programs are part computed and part drawn from a library. According to this version of the hypothesis, the programmer
uses the text, which is analogous to phonetic representation, in order to perform a look-up of routines in the library. It is not at all easy to imagine how the programmer might extract the relevant information from the text in order to do this. As we have seen, phonetic representations, and hence the text, are not segmented. On the other hand, the most natural way of viewing articulatory routines is to see them as program segments which are strung together in order to form a complete articulatory program. 7 Consequently, the programmer must find a way of isolating those stretches of the continuous text that correspond to individual routines. Although, as I say, it is not easy to see how this could be done, I shall for the sake of argument assume that it can be done. The main difficulty with this hypothesis is that the programmer must be made to operate in an extremely uneconomical manner. Phonetic representations contain many redundancies. This is partly why phoneticians have devised so-called 'broad' or phonemic transcriptions, in which most of the redundancies have been removed. The relation between a phonemic representation and a phonetic representation (or at least those aspects of phonetic representation which correspond to the segmentals) is such that the form of the phonetic representation is entirely predictable (up to free variation) from the information contained in phonemic representation. It follows that any look-up procedure that can be performed on the basis of phonetic representation can also be performed on the basis of phonemic representation. Not only that, it could also be performed much more efficiently on phonemic representation, since this contains all the necessary information and only the necessary information. I conclude that variant B of Hypothesis 2 is not the optimal solution. It is inferior on grounds of efficiency to a theory in which the look-up of articulatory routines is based on (an analogue of) phonemic representation.
(This alternative is discussed in 2.5). Parenthetically, it is interesting to ask what the function of phonetic representation might be if it is not the one just described. It seems to me to be as follows. The phonetic representation of an utterance IS its pronunciation in terms of linguistic competence. It is the linguistic image of the pronunciation the talker is aiming at. The task of the programmer is to generate a set of instructions which will cause the articulators to behave in such a way as to produce an utterance corresponding to the phonetic representation in question. We have just seen that the programmer performs this task without using phonetic representation as part of the generation process. A number of facts fall into place quite neatly given this view of the relation between phonetic representation and the generation of articulatory programs. First, if the programmer did take phonetic representation (or rather an analogue of it) as input, and generate an articulatory program to match it, it would not be easy to explain why talkers are unable to reproduce sounds in a foreign language or even a different accent of their own language which they have no difficulty in distinguishing perceptually and which they must therefore be able to represent in phonetic representation. This is particularly true when the speaker's native language contains sounds identical or nearly identical to the foreign sounds he is unable to pronounce (for instance, the voiceless palatal fricative [ç], which causes difficulties for English speakers learning German, but which occurs, in a slightly more lenis form, in English words containing /hj/, like huge = /hju:dʒ/ = [çju:dʒ] ~ [çu:dʒ]). Another interesting phenomenon which seems explicable in these terms has to do with the differences between children's speech and adults' speech, and the way the one develops into the other. It has been argued by Menn (1978) that the speech of young children is subject to severe output constraints which reflect the limitations in the child's ability to plan complex articulatory gestures and sequences of gestures. Within the framework proposed here, these limitations are limitations in the capabilities of the articulatory programmer, and the gradual increase in the capabilities of the programmer is an important part of the early stages of linguistic development. I would suggest that during the first few years of language acquisition, the power of the child's articulatory programmer gradually increases to a point where it is capable of computing an articulatory program to match almost any phonetic representation. While it is at this stage, the child's ability to master new sounds is at its maximum: young children are often extraordinarily good mimics.
However, the ability to mimic is not greatly in demand in everyday use of language by adults. The child needs to be able to reproduce not any and every sound that occurs in human language, but only that relatively restricted range of sounds that occur in the language(s) it is learning. So instead of each articulatory program being wholly computed ad hoc for each utterance, a library of articulatory routines is gradually built up, and used to reduce the computational burden on the programmer during speech production. 8 After a period of time, the ability of the programmer to compute articulatory programs to match given phonetic representations begins to decline. There is, after all, no need for this capability once language acquisition is complete, since by that time, the stock of routines in the library will be sufficient to handle all the sounds of the languages the child has learnt. One consequence of this decline in the power of the programmer is that the speaker is no longer so able to attain mastery of new sounds such as may be required in some foreign language he is trying to
learn. Experience of second language acquisition suggests that this stage is reached around the age of ten. On the other hand, the decline does not appear to be inevitable or irreversible. Phoneticians and (other) professional mimics generally have much greater than normal ability to reproduce new and 'exotic' sounds that they hear, and this may be attributed in part to a revitalisation of the programmer as a result of their training and intensive practice.
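The efficiency argument made against variant B in this section can be illustrated with a toy sketch. Everything in it is an assumption made for the purpose of illustration: the single allophonic rule (a voiceless plosive is aspirated when word-initial), the four-word lexicon and the names `to_phonetic`, `WORDS`, `phonemic_keys` and `phonetic_keys` are hypothetical, and far simpler than English phonology. The sketch shows only that, when the phonetic form is predictable from the phonemic one, a library of routines indexed by phonemic symbols needs fewer distinct keys than one indexed by phonetic symbols.

```python
VOICELESS_PLOSIVES = {"p", "t", "k"}


def to_phonetic(phonemes):
    """Derive a toy narrow transcription from a list of phoneme symbols.

    Assumed rule (illustrative only): a voiceless plosive is aspirated
    when it is the first segment of the word, and plain elsewhere.
    """
    phones = []
    for position, phoneme in enumerate(phonemes):
        if position == 0 and phoneme in VOICELESS_PLOSIVES:
            phones.append(phoneme + "ʰ")
        else:
            phones.append(phoneme)
    return phones


# Hypothetical mini-lexicon, given in phonemic form.
WORDS = {
    "pin": ["p", "i", "n"],
    "spin": ["s", "p", "i", "n"],
    "tin": ["t", "i", "n"],
    "nip": ["n", "i", "p"],
}

# Distinct symbols a routine library would have to be indexed by
# under each of the two schemes.
phonemic_keys = {symbol for word in WORDS.values() for symbol in word}
phonetic_keys = {symbol for word in WORDS.values() for symbol in to_phonetic(word)}

print(sorted(phonemic_keys))
print(sorted(phonetic_keys))
```

Because aspirated and plain [p] both derive from /p/ by rule, the phonetic index carries an extra key that adds no information the phonemic index lacks.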
2.3 Hypothesis 3: the text is analogous to phonological lexical representation
In a competence model, each lexical entry is a pairing of two types of information. One type of information is morphological, syntactic, semantic and pragmatic, and defines the grammatical properties of the item, its meaning, and the conditions under which it may be used appropriately. The other type of information is phonological, and defines the pronunciation of the item. It is generally assumed that the lexicon is accessed at two points in the derivation of an utterance. The first is at the level of deep structure, when the selection of lexical items is made. This involves inserting the morphological, syntactic, etc. information contained in a lexical entry into an appropriate position in a phrase-marker (tree diagram), e.g. under one of the symbols N, V, Adj, and so on. This process is called lexical insertion. The second point at which the lexicon is accessed is at the level of surface structure, where the phonological information contained in a lexical entry is added to the morphological etc. information previously transferred during lexical insertion. This process does not have a widely used name, but is sometimes called spell-out. Lexical insertion and spell-out are by definition part of the theory of competence. In speech production, it is safe to assume that there exists some counterpart to these processes, i.e. some procedure whereby selection of lexical items is carried out. But whether this part of speech production, which I shall refer to as lexical selection, is a two-stage process in the way its counterpart in competence is, remains to be determined. For the moment, we are interested in the question of whether the information that is input to the articulatory programmer is analogous to that which is contained in the phonological part of lexical entries.
I shall refer to the phonological part of a lexical entry as a lexical representation. By extension (and not strictly speaking altogether accurately) I shall use the term 'lexical representation' to refer to the sort of representation of an entire utterance that would be obtained by stringing together the lexical representations of its component words.
A word of warning might be in order here. In the following discussion I shall be using several expressions that involve the term 'lexical', and as some of these refer to aspects of linguistic competence whereas others refer to aspects of performance, there is the possibility that some confusion may arise. In an effort to avoid this, I would point out that the term 'lexical representation' is used to refer to a hypothetical part of linguistic competence, and of competence only. I am not even discussing the possibility that this might also be a part of linguistic performance, although I AM discussing the possibility that it might be ANALOGOUS to something in performance. The term 'lexical selection' is used to refer to the process by which lexical items (i.e. roughly, words) are chosen during the production of speech. The term 'lexicon' is used to refer to the dictionary-like component in a competence model but also to its psychological analog in a performance model. The context should make clear which sense is intended on any given occasion. In case confusion could arise, I use the term 'mental lexicon' to refer to the performance unit. We need to begin by making clear what kind of phonological information is contained in lexical entries. This is a question on which phonologists are far from being in unanimous agreement. (Cf. Stanley, 1967 for discussion.) However, there is a lot to be said in favour of the view that lexical representations contain the absolute minimum of information that is necessary in order to distinguish the lexical item in question from every other lexical item and to unambiguously specify its pronunciation. It is important to bear in mind that lexical representations specify pronunciation only indirectly: the way an item is pronounced depends not only on its lexical representation but also on the way this is processed by the phonological rules of the language.
It has become clear in recent years that the processing carried out by the phonological rules can be extremely complex, so that the relation between the lexical representation of an item and the way it is pronounced is often remote. A number of important differences between lexical representations and the actual pronunciation of items arise out of the fact that whatever is predictable in the pronunciation is in general not marked in lexical representation. Consider, for example, the word splint. Given that this begins with a cluster of three consonants, it is predictable that the first is /s/, that the second is a voiceless plosive, and that the third is /l/ or /r/. Since the third is /l/, the second may not be /t/. Given that the word ends in /t/, it is predictable that the preceding consonant is not a plosive; and since it is a nasal, it must be /n/. The length of the vowel is also predictable given its height: /nt/ may be preceded by a long vowel only if it is low (as in slant, taunt). All these predictable aspects of the pronunciation of splint may be omitted from its lexical representation.
Similar remarks hold with regard to stress. Ever since the publication of The Sound Pattern of English, it has been known that a large part of stress-assignment in English is predictable from the morphological structure of words (even if the best way of making the predictions has remained a matter for debate). Syllabification is another aspect of pronunciation that is predictable. 9 As has been pointed out by a number of writers (e.g. Bell and Hooper, 1978), there is no known case of a language which employs syllabification distinctively. In other words, no language makes a distinction between words along the lines of the distinction between /.a.ba./ and /.ab.a./ where '.' marks a syllable boundary. 10 In all known cases, the location of syllable boundaries in a word is predictable from the nature of the phones that make up the word. We can use these facts in order to formulate predictions about the kinds of speech errors that can occur according to Hypothesis 3, and thus to assess its validity. Consider first of all errors that involve movement or deletion of consonants. The following examples are taken from Fromkin, 1973: Appendix.
movement: I'll spaint in the tudio (for 'I'll paint in the studio')
deletion: peach error (for 'speech error')
tendahl (for 'Stendahl')
retch your legs (for 'stretch your legs')
Under Hypothesis 3, these errors must involve movements or deletion of phones (i.e. segments approximately the size of phonemes). 11 This means that I'll spaint in the tudio results from a misplacing of the /s/ before paint rather than before tudio. The crucial feature of this particular error for our purposes is the nature of the /t/ in studio and tudio. Since there is no opposition between /t/ and /d/ after /s/ in English, the lexical representation of studio specifies that the second phone is an alveolar plosive, but not that it is voiceless, since the voicing is predictable. Lexically, therefore, studio is represented as something like /sTu:dio:/, where 'T' is rather like an archiphoneme. 12 But now see what happens when the initial /s/ is moved. We are left with /Tu:dio:/. The actual pronunciation however was presumably something like [tʰu:dio:]. So how do we get [tʰ] out of /T/? And why not [d]? Our hypothesis has no explanation for this. Similar problems arise with the deletion examples: peach [pʰi:tʃ] for speech and tendahl [tʰa:ndal] for Stendahl are exactly analogous to tudio for studio. In retch for stretch, we again find that a number of feature specifications have apparently been conjured up out of nowhere. Lexically, the /r/ in stretch consists only of the feature specification
[+consonantal] since /r/ is the only consonant that can appear after /st/. 13 Where all the rest of the feature specifications of [J 1 ] come from remains a mystery. The fact that stress and syllabification are predictable and therefore not marked in lexical representations 14 raises further problems for our hypothesis. It is well known that many aspects of speech errors are influenced by stress and syllabification, and these too remain unexplained. These arguments demonstrate that Hypothesis 3 is incorrect. They apply with equal force whichever position is adopted on issue (b). If the programmer draws on a library of routines, what makes it retrieve a routine that gives rise to an initial [tʰ] in tudio when presented with an input analogous to /Tu:dio:/? If it does not draw on a library of routines, on what basis does it compute the relevant voice-onset-time specification? There are no answers to these questions.
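The difficulty raised for Hypothesis 3 can be restated as a small sketch. It assumes, purely for illustration, that segments are dictionaries of feature values, that an unspecified feature is simply absent, and that a plosive cannot be pronounced without a value for voicing. The names `LEXICAL_STUDIO`, `remove_initial_s` and `realise_plosive` are hypothetical, and the feature set is far smaller than any real one.

```python
# Toy lexical representation of 'studio' under Hypothesis 3. The plosive
# after /s/ (written 'T' in the text) carries no value for 'voice',
# because voicing is predictable in that position.
LEXICAL_STUDIO = [
    {"symbol": "s"},
    {"symbol": "T", "place": "alveolar", "manner": "plosive"},  # no 'voice'
    {"symbol": "u:"},
    {"symbol": "d", "place": "alveolar", "manner": "plosive", "voice": True},
    {"symbol": "i"},
    {"symbol": "o:"},
]


def remove_initial_s(segments):
    """Model the error: the initial /s/ is moved away, the rest is untouched."""
    if segments and segments[0]["symbol"] == "s":
        return segments[1:]
    return segments


def realise_plosive(segment):
    """Return a phonetic symbol for an alveolar plosive.

    Returns None when voicing is unspecified: on these assumptions the
    programmer has nothing to choose between [tʰ] and [d].
    """
    voice = segment.get("voice")
    if voice is None:
        return None
    return "d" if voice else "tʰ"


residue = remove_initial_s(LEXICAL_STUDIO)
print(residue[0]["symbol"], realise_plosive(residue[0]))
```

On these assumptions the residue begins with a segment that has no voicing value, so nothing in the representation itself selects [tʰ] over [d].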
2.4 Hypothesis 4: the text is analogous to systematic phonemic representation
By 'systematic phonemic representation' (SPR) is meant the input to the (segmental) phonological rules. (Cf. Chomsky (1964), where the term originates.) Systematic phonemic representations are generally held to be a good deal more abstract than normal phonemic representations, which I refer to as 'classical phonemic' representations. On the other hand, they are a good deal less abstract than lexical representations. A major difference between lexical and systematic phonemic representations lies in the fact that the predictable feature-specifications left blank in lexical representations are now filled in. For example, the /s/ of splint is now no longer specified just as [+consonantal], but as [+consonantal, +obstruent, +continuant, ...]. This process is accomplished by a set of redundancy rules. A second difference is that systematic phonemic representations are syllabified. 15 Again this is taken care of by redundancy rules. The question of whether stress is specified in systematic phonemic representations is a rather difficult one. Chomsky and Halle assume that it is not, and that the rules assigning stress are part of the phonological component. However, it is quite possible to regard these rules (at least those that deal with word-stress) as another kind of redundancy rule, so that although lexical representations are, in the main, not marked for stress, systematic phonemic representations are. To make life easier for Hypothesis 4 (it needs all the help it can get), I shall adopt the second of these positions. 16
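How redundancy rules might fill in blank specifications can be sketched as follows. The two rules are illustrative assumptions based on the regularities mentioned earlier for splint (in a word-initial cluster of three consonants the first is /s/ and the second is a voiceless plosive). The feature names, the lexical entry `SPLINT_LEXICAL` and the function `apply_redundancy_rules` are hypothetical, and a real set of redundancy rules would be far larger.

```python
def apply_redundancy_rules(segments):
    """Return a copy of the segments with predictable features filled in."""
    filled = [dict(segment) for segment in segments]
    starts_with_three_consonants = (
        len(filled) >= 3
        and all(segment.get("consonantal") for segment in filled[:3])
    )
    if starts_with_three_consonants:
        # Assumed rule 1: the first consonant of the cluster is /s/.
        filled[0].setdefault("obstruent", True)
        filled[0].setdefault("continuant", True)
        filled[0].setdefault("voice", False)
        # Assumed rule 2: the second consonant is a voiceless plosive.
        filled[1].setdefault("obstruent", True)
        filled[1].setdefault("continuant", False)
        filled[1].setdefault("voice", False)
    return filled


# Hypothetical underspecified entry for 'splint'.
SPLINT_LEXICAL = [
    {"consonantal": True},                     # /s/: the rest is predictable
    {"consonantal": True, "place": "labial"},  # /p/
    {"consonantal": True, "lateral": True},    # /l/
    {"consonantal": False, "high": True},      # vowel
    {"consonantal": True, "nasal": True},      # /n/
    {"consonantal": True, "place": "alveolar",
     "continuant": False, "voice": False},     # /t/
]

systematic = apply_redundancy_rules(SPLINT_LEXICAL)
print(systematic[0])
print(systematic[1])
```

Once the rules have applied, each segment carries its own feature values and no longer depends on its neighbours to be interpreted; the lexical entry itself is left unchanged.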
The hypothesis we are now considering is that the programmer uses representations analogous to systematic phonemic representations either to compute an articulatory programme in its entirety (variant A) or both for computation and as the basis for a look-up of articulatory routines stored in a library (variant B). On the face of it, this model looks a lot more promising than the last one we considered. The examples that spelled death for Hypothesis 3 seem to offer no difficulties for Hypothesis 4. The error that resulted in [tʰu:dio:] for studio, for instance, can be simply accounted for: the removal of the initial /s/ leaves us with /tu:dio:/ which can be pronounced no other way than [tʰu:dio:], given that the symbol /t/ here stands for a fully specified distinctive feature matrix, including the specification [-voice]. Similarly, the third consonant of stretch is now fully specified as /r/, and so must emerge as [ɹ] once the initial /st/ has been removed. (It emerges as a voiceless fricative [ɹ̥] when preceded by /st/.) This applies irrespective of what stand we take on issue (b). There are, moreover, data that appear to constitute direct evidence in favour of Hypothesis 4. Fromkin (1973: 21-22 and Appendix: I) discusses the implications of errors like /swin an sweig/ for swing and sway and sprig time for hintler for springtime for Hitler. She notes that such errors seem to involve movements and modifications of segments that do not occur on the surface. So /swin an sweig/ results from the shift of /g/ from swing to sway despite the fact that the most usual pronunciation of swing, /swiŋ/, does not involve a /g/ at all. In addition, once the /g/ has been removed from swing we are left not with the velar nasal that actually occurs in /swiŋ/ but with an alveolar nasal, /swin/. It seems impossible to account for such errors as involving transformations of surface representations like /swiŋ/ and /spriŋ/.
But it is possible to account for them as involving transformations of more abstract representations like /swing/ and /spring/. This argument seems to me to be quite correct. But its interpretation in terms of the hypotheses we are considering is not so straightforward. The data just discussed can be taken as evidence specifically in favour of Hypothesis 4 only to the extent that they are incompatible with other hypotheses under consideration. Among these, as we shall see in section 2.5, is the hypothesis that the text is analogous to classical phonemic representation. Are the swing-type data compatible with this alternative hypothesis? The answer depends, of course, on what one takes classical phonemic representations to be like. For many 'classical' phonemicists swing must be phonemicised as /swiŋ/, whereas for others, including Sapir, whom Fromkin cites, a phonemicisation /swing/ is permissible. It seems to me that the important point here is not whether swing is
phonemically /swiŋ/ or /swing/ but rather the fact that the speech production mechanism needs, as Fromkin's arguments show, to make reference to representations analogous to /swing/. This is compatible with Hypothesis 4, but it is also compatible with Hypothesis 5 (section 2.5), under one particular, and quite legitimate, interpretation of 'classical phonemics'. Further evidence in favour of more abstract phonological representations playing a role in speech production is claimed to derive from word-substitution errors such as review for revise and movie for music, as discussed by Fay and Cutler (1977). According to these authors' analysis of malapropisms, an important factor influencing the appearance of a wrong word is its phonological similarity to the intended word. According to their surface-phonological representations there is some phonological similarity between review /ri'vju:/ and revise /ri'vaiz/ and between movie /'mu:vi:/ and music /'mju:zik/. But this similarity is not as great as that which exists between the more abstract representations of these words proposed (in a somewhat disputed piece of analysis) by Chomsky and Halle (1968): /rivü/ ~ /rivīz/ and /mövi/ ~ /müzik/. In particular, rather than having a correspondence between diphthongs (/ju:/) and vowels or dissimilar diphthongs (/u:/, /ai/) as is found in the surface representations, we now have a correspondence between vowels (/ü/) and vowels (/ö/, /ī/). Thus an abstract analysis makes somewhat more sense of the data than a surface analysis. Even if we allow abstract representations like /müzik/ for music (as by no means all phonologists would), such data as those Fay and Cutler discuss are evidence in favour of Hypothesis 4 only to the extent that they are incompatible with other hypotheses.
In fact, as we shall see later, errors like these can be accommodated quite easily in a hypothesis based on surface-phonemic representations granted the assumption that the /ju:/s in question are (complex) vowel nuclei rather than sequences of consonant plus vowel (i.e. new is /n + ju:/, not /nj + u:/). This assumption can in fact be justified by an analysis of phonotactic constraints in English (cf. Note 31). There is, moreover, evidence that 'falling' diphthongs such as /ai/ and /ao/ act as units with respect to certain aspects of speech production (cf. Fromkin, 1973: 222-223), so it would not be surprising for 'rising' diphthongs like /ju:/ to do the same. Thus Fay and Cutler's data may not be taken as unequivocal evidence in favour of Hypothesis 4. Let us test Hypothesis 4 further by considering the phenomenon of blends. Examples of this type of error given in Fromkin, 1973 (Appendix) are momentaneous (blend of instantaneous and momentary, U1), insufferior (blend of insufficient and inferior, U15), Romsky (blend of Ross and Chomsky, U52). It seems likely that these errors occur during the process
of lexical selection, as suggested, for example, by Fromkin (1971: §7). One interesting feature of blends is the importance of phonological factors in the way they are formed. Wells' (1951) 'second law' of speech errors states that 'if the two original words are rhythmically similar, a blend of them will, with high probability, rhythmically resemble both of them'. By 'rhythmically similar' Wells means that the two words in question have the same number of syllables and are accented on corresponding syllables. He gives the blend behortment from behaviour and deportment as an example. 17 Wells' 'third law' states that 'if the two original words contain the same sound in the same position, a blend of them will contain that sound in that position'. Examples given by Wells are shaddy from shabby and shoddy, and frowl from frown and scowl. Details aside, it is reasonable to assume that the occurrence of blends involves competition between two lexical items for insertion into a single slot in the utterance. A blend occurs when during the transfer of phonological information from the lexicon the source of the transfer changes from one of the candidate lexical items to the other. 18 Within the framework of Hypothesis 4, it is most natural to assume that the phonological information retrieved from the lexicon is analogous to that contained in systematic phonemic representations. (Those aspects of systematic phonemic representations that are predictable and therefore not contained in lexical representations are filled in by redundancy rules in the way described earlier.) We can ask whether this type of representation provides the basis for an explanation of the kinds of blend that are known to occur. Consider Wells' second law concerning rhythmical similarity. At what level is rhythmical similarity defined? Is it at the level of systematic phonemics or classical phonemics or phonetics?
The rhythmic pattern of a given utterance can differ quite radically from one of these levels to another, particularly between systematic phonemic representation and classical phonemic representation. Many languages have phonological rules deleting or inserting vowels, for instance, and these inevitably result in changes in the number of syllables. Unfortunately, English is not a language that shows extreme differences in, say, number of syllables between systematic phonemic and classical phonemic representation, so it is not easy to test the validity of Hypothesis 4 with reference to English data. Data on other languages is not easily available. But intuitively, it seems highly unlikely that systematic phonemic representation is the correct level for defining rhythmical similarity — classical phonemic representation is a more obvious choice — and this can be taken as evidence against Hypothesis 4. Similar arguments can be constructed on the basis of Wells' third law,
concerning individual phones or sequences of phones. If the presence of identical phones or sequences of phones at a given position in the two source words necessarily results in the same pattern in the blend, we can ask at what level this identity is defined. If it is at the level of systematic phonemic representation, then we might expect to find errors such as I asserved my position with asserve arising as a blend of reserve and assure. This would come about in the following way. The word reserve is underlyingly /reserv/ according to Chomsky and Halle (1968), and assure can be taken to be /ad-sjür/. We therefore have the same sound, /s/, at the beginning of the stressed syllable in both cases. Taking the first syllable of /ad-sjür/ and the second syllable of /re-serv/, we obtain /ad-serv/, which is realised, via the usual phonological rules, as asserve /ə'sɜ:v/. To the extent that errors such as this do not occur, we have more evidence against Hypothesis 4. 19 Rather better tests of Hypothesis 4 can be based on the predictions it generates concerning the syntagmatic errors, anticipations, perseverations and reversals. We should, according to the hypothesis, expect these to affect units in SPR. So if we can find, say, fonal phonology produced in error for tonal phonology (Fromkin, 1973: Appendix: A5) we should also find, say, hokel reception in error for hotel reception, since the c of reception corresponds to a /k/ in SPR. 20 Similarly, if we find perseverations like a phonological fool for a phonological rule, we should also find errors like deserve setter for deserve better, since deserve has an underlying /s/. Such errors are not attested. This argument applies with equal force whichever stand one takes on issue (b), i.e. irrespective of whether the programmer draws on a library of articulatory routines or not. I conclude that Hypothesis 4 is incorrect.
2.5 Hypothesis 5: the text is analogous to classical phonemic representation
By 'classical phonemic representation' (CPR), I refer to the familiar phonemic representations developed on both sides of the Atlantic during the 1940s and 1950s. The 'broad transcriptions' of Jones and the IPA are also phonemic in the intended sense. There has always been much controversy over points of detail in phonemic theory, but the areas of controversy are small in comparison with the areas of agreement. More recently, there have been attempts to discredit the whole idea of the phoneme as a necessary or even desirable part of the phonological description of a language. Cf., for example, Chomsky (1964). But as these arguments concern theories of linguistic competence, they have no necessary bearing on theories of performance.
I shall assume that CPRs are syllabified, and shall use the symbol '.' to represent syllable boundaries. There is evidence that syllables have constituent structure comprising an onset (initial consonant cluster) and rhyme (everything else) and that rhymes in turn consist of a nucleus (including the syllabic, which is usually a vowel, and also glides) and a coda (final consonant cluster). I shall use the symbol '+' to separate onset from rhyme, and the symbol '-' to separate nucleus from coda. For example, sprite will be transcribed /.spr+ai-t./. Evidence for this syllable-structure is presented in Bell and Hooper, 1978. Individual phones are taken to consist of bundles of feature specifications.

Variant A. On this variant of the hypothesis, we assume that the programmer computes articulatory programs in their entirety, without making use of a library of routines. The input to the programmer is analogous to CPR as just described. It is not possible to assess the adequacy of this model by means of an analysis of various kinds of speech errors. The reason is that all the errors of the sort we have been considering so far must arise before the process of articulatory planning is initiated. 21 To see why, it is sufficient to note that the phonologically-based errors never result in violations of allophonic constraints. In the error tendahl is for Stendahl is (Fromkin, 1973: Appendix G), for instance, the t of tendahl is pronounced as aspirated [tʰ], the allophone appropriate to its new environment, and not as unaspirated [t], the allophone appropriate to its old environment. This shows that all allophonic variation is handled by processes that take place downstream of the point at which the errors arise. Since the task (or one of the tasks) of the articulatory programmer under the hypothesis being considered here is to deal with allophonic variation, it follows that the programmer is unlikely to have anything to do with the errors in question.
Let us hypothesise that there is a device that lies immediately upstream of the articulatory programmer and the text and that the occurrence of the syntagmatic errors (sound substitutions, exchanges, etc.) is due to malfunctioning in this device. The output of the device must, according to Hypothesis 5, be analogous to CPR. The input must also be analogous to CPR since analysis of the errors in question shows that they involve the substitution of one PHONEME for another, the exchange of one PHONEME with another, and so on. 22 Where in the production process do we need a device that has phonemic representations both as input and as output? One purpose that might be served by such a device is the transfer of information from the lexicon to the text (assuming that the phonological part of lexical entries is analogous to CPR, as seems reasonable enough). But it is clear that the device we are discussing cannot have this function:
the syntagmatic sound-based errors more often than not involve more than one lexical item, which means that the device responsible for them must have access to a representation that includes not just one, but several lexical items. Having discounted this possibility, it is extremely difficult to imagine any other plausible role for the device in question. It just does not seem necessary to have a device that translates one phonemic representation of an utterance into another. While this is, perhaps, not conclusive evidence against variant A of Hypothesis 5 (it is always possible that such a device might eventually receive independent motivation), a great deal of doubt has been cast upon it.

Variant B. We now hypothesise that the programmer generates programs partly by means of ad hoc computation and partly by using ready-made routines stored in a library. The role of CPR is to provide the basis for the look-up of routines in the library. This is tantamount to saying that CPRs define the addresses of routines in the library. At first sight, this model seems highly unlikely to be correct. Suppose the syntagmatic sound-based errors arise as a result of malfunctioning on the part of the articulatory programmer. We might then expect that a spoonerism like guinea hig pair for guinea pig hair (Fromkin, 1973: Appendix C) would involve the retrieval of all the correct routines, but a mix-up in the location of the routines for /h/ and /p/. But this cannot be right: the routine for /h/ would on this view define a pronunciation [§], since this is the allophone of /h/ found in hair, where the /h/ originates. What we get in hig, of course, is not [§ig] but [ug]. Thus there seems no way to account for the fact that speech errors do not violate allophonic constraints. Having failed with the assumption that the errors are due to the articulatory programmer, we might now try to suppose that they arise earlier in the production process. But this is the idea we considered when discussing variant A. It too fails. So where do we go from here? An assumption that was implicit in our discussion of the guinea hig pair example earlier on was that the articulatory routines correspond to individual phonemes. There is no reason why this must be the case; indeed it even seems rather implausible on phonetic grounds, given how far-reaching the effects of allophonic variation may be and how little there may be in common between different allophonic variants of the same phoneme. So let's try a different tack, and assume that articulatory routines correspond to syllables. This largely gets round the problem concerning allophones since a high proportion of allophonic variation operates within syllables. The relatively small
amount that operates between syllables could reasonably be computed ad hoc by the device that incorporates the routines into the articulatory programs. One obvious question that arises if we adopt this approach is this: how do we explain exchanges of phonemes if articulatory routines correspond to syllables? There is only one possible answer to this within the framework we are now considering. Although errors such as guinea hig pair appear to involve a reversal of /h/ and /p/, they must in fact involve the substitution of the syllable /.hig./ for /.pig./ and the simultaneous substitution of /.pɛəɹ./ for /.hɛəɹ./. The same applies to anticipations and perseverations. The error it's a meal mystery for it's a real mystery, for example, (Fromkin, 1973: Appendix: A7) must be analysed as the substitution of /.mi:l./ for /.ri:l./. A perseveration such as gave the goy for gave the boy (Fromkin, 1973: Appendix: B19) must be analysed as the substitution of /.goi./ for /.boi./. The problem facing us therefore becomes that of developing some plausible account of the generation of syllable-substitutions. To see how this might be done, we need to look more closely at the way syllable-sized routines are introduced into articulatory programs. This takes place in three stages, which I shall call 'addressing', 'activation', and 'incorporation'.

Addressing. As was mentioned earlier, the role of CPR is to define the addresses of articulatory routines in the library. We may interpret this as meaning that the programmer uses CPRs in order to draw up a set of instructions which it uses in order to locate routines in the library. The instructions include sets of conditions based on the CPRs of the syllables concerned. Take, for example, the syllable /.spæŋk./.
The search instructions will specify a location in the library that meets the following conditions: 23

    non-reduced syllable
    onset = sp
    nucleus = æ
    coda = ŋk

(Actually, each of the last three conditions corresponds to a whole set of more detailed conditions, since the symbols /s/, /p/, etc. stand for bundles of feature specifications, each of which may be taken to define one condition.) We assume that the space in which the routines are stored is multidimensional, so that each condition, i.e. each part of an address, identifies a particular sub-area of the library. 24
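The addressing step just described can be illustrated with a minimal sketch. It assumes a plain-text rendering of the syllable notation used in this section ('.' for syllable boundaries, '+' between onset and rhyme, '-' between nucleus and coda). The names Syllable, parse_cpr and search_conditions are hypothetical and form no part of the model as stated. Each phoneme symbol is treated here as an unanalysed string, whereas in the model it stands for a bundle of feature specifications.

```python
from typing import NamedTuple


class Syllable(NamedTuple):
    """A syllable in CPR, split into its constituents (illustrative only)."""
    onset: str
    nucleus: str
    coda: str
    reduced: bool = False


def parse_cpr(text: str) -> Syllable:
    """Parse a string such as '.sp+æ-ŋk.' into onset, nucleus and coda.

    Assumes exactly one '+' (onset/rhyme) and at most one '-' (nucleus/coda).
    """
    body = text.strip(".")                   # drop the syllable boundaries
    onset, rhyme = body.split("+", 1)        # onset vs. rhyme
    nucleus, _, coda = rhyme.partition("-")  # nucleus vs. coda
    return Syllable(onset, nucleus, coda)


def search_conditions(syllable: Syllable) -> dict:
    """Draw up the set of conditions that jointly define a library address."""
    return {
        "reduced": syllable.reduced,
        "onset": syllable.onset,
        "nucleus": syllable.nucleus,
        "coda": syllable.coda,
    }


spank = parse_cpr(".sp+æ-ŋk.")
print(search_conditions(spank))
```

Each key of the resulting dictionary corresponds to one of the conditions listed above; a fuller version would expand each symbol into its feature specifications, giving one condition per feature.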
Activation. It is assumed that the process of retrieval is not the simple one of reading the contents of the location specified at the addressing stage. Rather, it involves the activation of a location by a priming process that increases its level of excitation until a threshold level is reached, at which point the routine in question becomes available. More particularly, each of the conditions defined at the addressing stage is used to prime a particular area of storage. So in the case of spank, the sub-area containing the non-reduced routines will receive a degree of priming, as will the one containing syllables with /sp/ onsets, the one for syllables with /æ/ vowels, and so on. The location whose address is /.spæŋk./ lies at the intersection of all these sub-areas, and so should receive the largest increment in excitation intensity. It should thus attain the threshold level before any other location, thereby making available the information contained in it.

Incorporation. Typically, the programming of an utterance will involve the activation of many routines. As there is no guarantee that the order in which the routines become available will correspond to their correct ordering in the articulatory program, it is necessary to assume that the programmer includes a device that sees to the ordering of routines. I call this the incorporator. Not only must the incorporator put the routines into the correct order, it must also carry out modifications on them, in order to guarantee smooth transitions from one routine to the next.

We are now in a position to see how the speech errors mentioned earlier might arise. Recall that what we have to account for is the substitution of one syllable-routine for another under the influence of neighbouring syllables: the substitution of /.mi:l./ for /.ri:l./ under the influence of the /m/ in it's a meal mystery, the substitution of /.goi./ for /.boi./ under the influence of the /g/ in gave the goy, and also the simultaneous substitutions of /.hig./ for /.pig./ and /.pɛəɹ./ for /.hɛəɹ./ in guinea hig pair. It is plausible to ascribe these errors to malfunctioning at the addressing stage of retrieval. We said earlier that this involves the setting up of search-instructions based on a set of conditions defining the addresses of storage locations. We might hypothesise that confusion may arise during the setting up of the search-instructions, so that conditions from one instruction contaminate those of another instruction. For example, in it's a real mystery, we have conditions for /.ri:l./ that specify:

    non-reduced syllable
    onset = r
    nucleus = i:
    coda = l

(these are much oversimplified, of course), and for the first syllable of mystery we have:
    non-reduced syllable
    onset = m
    nucleus = i

and so on. If the onset condition for /.mi./ contaminates that for /.ri:l./, we get precisely the conditions for the syllable /.mi:l./. Similar explanations can be given for perseverations like gave the goy and reversals like guinea hig pair. With perseverations, it is again a case of one condition contaminating or over-writing another. In the case of reversals, two conditions exchange places. The model just described therefore seems capable of accounting for a variety of speech errors which proved problematic for other approaches we have considered. In terms of the issues stated at the beginning of this article, our model has the following characteristics:

    Issue a.: the input to the articulatory programmer is analogous to classical phonemic representation
    Issue b.: the programmer makes use of a stored set of articulatory routines
    Issue c.: the routines correspond to syllables
I have so far argued systematically for the solutions to issues (a) and (b). The solution to issue (c), on the other hand, has simply been assumed. We have merely shown that IF routines are syllable-sized, then our model can account for the data. Our next task is to show that this assumption can be justified independently.
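The account of activation and of contamination between search-instructions given in section 2.5 can be made concrete with a toy sketch. Everything in it is an assumption made for illustration: the five-entry LIBRARY, the function names activate, contaminate and exchange, and the simplification that each satisfied condition adds one unit of excitation, with the most excited location taken to be the one that reaches threshold first. Phoneme symbols are again unanalysed strings rather than feature bundles.

```python
from collections import Counter

SLOTS = ("onset", "nucleus", "coda")

# Hypothetical toy library: each address is an (onset, nucleus, coda) triple.
LIBRARY = [
    ("r", "i:", "l"),  # real
    ("m", "i:", "l"),  # meal
    ("m", "i", ""),    # first syllable of mystery
    ("b", "oi", ""),   # boy
    ("g", "oi", ""),   # goy
]


def activate(conditions, library=LIBRARY):
    """Prime every location once per condition it meets; return the most excited one."""
    excitation = Counter()
    for address in library:
        for slot, value in zip(SLOTS, address):
            if conditions[slot] == value:
                excitation[address] += 1
    # Simplification: the location with the highest excitation is assumed
    # to be the first to reach threshold.
    return max(library, key=lambda address: excitation[address])


def contaminate(target, source, slot):
    """Over-write one condition of `target` with the matching condition of `source`."""
    corrupted = dict(target)
    corrupted[slot] = source[slot]
    return corrupted


def exchange(first, second, slot):
    """Swap one condition between two search-instructions (the reversal case)."""
    return contaminate(first, second, slot), contaminate(second, first, slot)


real = {"onset": "r", "nucleus": "i:", "coda": "l"}
mys = {"onset": "m", "nucleus": "i", "coda": ""}

print(activate(real))                             # intended retrieval
print(activate(contaminate(real, mys, "onset")))  # anticipation error
```

With the uncorrupted conditions the routine for real is retrieved; once the onset condition of the first syllable of mystery over-writes that of real, the same retrieval process returns the routine for meal, without any malfunction in the library itself.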
3. Articulatory programming units
By 'articulatory programming units' I refer to the chunks of code that the programmer manipulates during the generation of articulatory programs. The object of this section is to determine the nature of these units. It is possible that a number of hierarchically related categories of unit exist, but I shall be concentrating on only one of these, the one that corresponds to the library-routines discussed in the previous section. 25 In order to place the discussion in this section in its proper perspective, I should point out that I am assuming a distinction between articulatory programming units and other types of programming units that are employed in the production of speech. It is obvious that the process of speech production involves a wide variety of units, some semantic, some syntactic, some phonetic/phonological. As regards the last of these, it seems to me that there is ample evidence (reviewed in Fromkin, 1973: Introduction) that many of the categories familiar from studies of
linguistic competence (features, phones, syllables, etc.) have their analogues among the units of linguistic performance. What I am trying to do here is refine the analysis by finding out which of these units have analogues among the units of articulatory programming. The remainder, I shall conclude, have analogues only at earlier stages of the production process. Most previous investigators have not been concerned to distinguish between articulatory programming units and programming units more generally, and have been interested in constructing arguments that demonstrate the broad 'psychological reality' of features, phones, etc. 26 Since I am here attempting a more refined analysis, it would be in order to begin by making clear the kinds of argument that can be employed for this purpose. As we saw in the last section, the construction of an articulatory program involves the retrieval of routines from a library. The things I have called articulatory programming units become available as a result of the activation process described earlier. We saw that a reasonable account of certain kinds of speech error can be provided on the assumption that the locus of the errors is the addressing stage of the retrieval process. 27 Since this comes before activation it follows that analysis of the errors in question will not shed any light on the question of what the articulatory programming units are like. If the analysis of observed types of speech error is not going to help us it is necessary to find some other way of attacking the problem. The approach I would suggest is as follows. An adequate theory of speech errors must, obviously, account for the errors that are found to occur. But it must also account for the fact that certain logically possible types of error do not occur. 28 To say that certain types of error do not occur is to say that there are certain constraints that cannot be broken. Such constraints govern the form of particular units.
How can we explain the fact that the constraints that apply to certain units are always respected? Within the sort of framework presented in the last section, where articulatory programs are built up out of elements drawn from a fixed store, a plausible explanation is that the units to which the inviolable constraints apply are the units that correspond to the items in the store. It is likely that we shall find several units of the sort just mentioned, standing in a hierarchical relation to one another. What we are interested in is the largest of these units. For the largest of the units that is subject to inviolable constraints corresponds to the smallest unit that is available for independent manipulation. 29, 30 There is, of course, danger in adopting an approach based on the non-occurrence of particular types of error, as Cutler warns in her introduction
to this volume. The absence of a particular type of error in the corpuses that have been collected may be due not to its non-occurrence but to the difficulty in noticing it when it does occur. The ability of the speech perception mechanism to ignore even quite gross abnormalities is, of course, well attested. I am hopeful, however, that the data I discuss here are not distorted by this artefact. Most of the examples would, if they occurred, represent sufficiently gross departures from normal speech for it to be rather unlikely that phoneticians and psychologists on the look-out for such things could simply fail to detect them. Moreover, at least one published study of speech errors, Boomer and Laver (1968), is based on tape-recorded material. The likelihood of important types of error slipping by undetected must be even lower in this case. The fact that Boomer and Laver uncover no radically new types of error beyond those noted elsewhere can be taken as further evidence of the general reliability of reporting in this area. Having sorted out what kinds of argument to use, we can now attack issue (c): what form do the articulatory routines in the library take? I shall consider various hypotheses, based on different units in the phonological hierarchy.
3.1 Features
Could there be an articulatory routine corresponding to each phonetic feature specification? We test this out by seeing whether there are slips of the tongue that violate the constraints on combinations of feature specifications. The existence of such errors would testify to the independence of feature specifications as units of articulatory programming. No errors of this kind are reported in the literature. To the best of my knowledge they do not occur, and cannot occur. For example, in English we find stops and fricatives, alveolars and velars. But while both alveolar stops and alveolar fricatives are possible, the only velars that occur are velar stops. Velar fricatives are ruled out by a language-specific constraint. Is this constraint broken in speech errors? Do we find English speakers producing [xit] in error for kiss, or [ɣaoz] in error for [gaoz]? Surely not. In order to account for the non-occurrence of errors of this kind, our theory must be constructed in such a way that the device that manipulates articulatory routines does not have access to individual feature specifications. In other words feature specifications are not units of articulatory programming.
3.2 Phones
There are in principle a number of ways articulatory routines could correspond to phones, depending on what level of representation we take as the input to the articulatory programmer. We could have routines corresponding to phonemes, to allophones (i.e. all allophones whether extrinsic or intrinsic) or to extrinsic allophones. To test the viability of a phone-based approach, we ask whether speech errors can violate the constraints governing combinations of phones. Regarding allophones, we have already noted that slips of the tongue do not violate allophonic constraints. Thus, guinea hig pair for guinea pig hair is pronounced not as [gini: §ig ρ'εοα], which violates allophonic constraints, but as [gini: μ ς pe89j], which obeys them. Nor is the situation saved by working with extrinsic allophones, as proposed, for instance, in Fromkin, 1968 (cf. Ladefoged, 1966 for the distinction between extrinsic and intrinsic allophones): there are no known cases of errors like [ɫʌl] for [lʌɫ], involving transposition of two extrinsic allophones. As for phonemes, it is well-known that slips of the tongue do not violate phonotactic constraints (which are generally held to apply to sequences of phonemes). These constraints govern the way phonemes can be combined to form the onsets, nuclei and codas of syllables (cf. Brown, 1970; Fudge, 1969; Hooper, 1975 for arguments that phonotactic constraints should be defined on syllables). For example, in English, onsets may contain stop + approximant sequences (/pr/, /gl/, /tw/, etc.), but only if the stop is non-nasal. 31 Stop + stop sequences are ruled out, whether the stops are oral or nasal. Do slips of the tongue ever violate this constraint, and result in initial /nr/ or /ml/ or /dn/ or /pt/? Some errors like this have been reported (Shattuck-Hufnagel 1979: 299; Anne Cutler, personal communication), but they are exceedingly rare. The same goes for other constraints on syllable onsets, and for constraints on nuclei and codas.
Other, more general constraints also appear to be inviolable, such as the one that blocks obstruent clusters with mixed voicing specifications (/sb/, /ztʃ/, etc.) within both onsets and codas.32, 33 The conclusion is that phonemes are not independent units of articulatory programming.
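The onset constraints discussed in this section can be stated as a small checker. The following is a minimal sketch only: the phoneme classes are a deliberately partial inventory, and the name `onset_violations` is an assumption of this illustration, not something taken from the literature cited.

```python
# Partial, illustrative phoneme classes (not a full inventory of English).
ORAL_STOPS = {"p", "b", "t", "d", "k", "g"}
NASAL_STOPS = {"m", "n"}
APPROXIMANTS = {"r", "l", "w", "j"}
VOICED_OBSTRUENTS = {"b", "d", "g", "v", "z"}
VOICELESS_OBSTRUENTS = {"p", "t", "k", "f", "s"}


def onset_violations(onset):
    """Return the constraints (as stated in the text) broken by a two-phoneme onset."""
    first, second = onset
    stops = ORAL_STOPS | NASAL_STOPS
    broken = []
    # Stop + approximant is allowed only if the stop is non-nasal.
    if first in NASAL_STOPS and second in APPROXIMANTS:
        broken.append("nasal stop + approximant")
    # Stop + stop is ruled out, whether the stops are oral or nasal.
    if first in stops and second in stops:
        broken.append("stop + stop")
    # Obstruent clusters may not mix voiced and voiceless members.
    voiced = {first, second} & VOICED_OBSTRUENTS
    voiceless = {first, second} & VOICELESS_OBSTRUENTS
    if voiced and voiceless:
        broken.append("mixed voicing in obstruent cluster")
    return broken
```

On this sketch, /pr/, /gl/ and /tw/ come out clean, while /nr/, /ml/, /dn/, /pt/ and /sb/ are each flagged. A phone-based theory of articulatory routines would predict that slips produce the flagged onsets freely, which is what the error data fail to show.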
3.3
Onsets, nuclei and codas
There is considerable evidence that syllables are not simple concatenations of phones, but possess more complicated constituent structure. In particular, it is necessary to distinguish the onset (initial consonant cluster), the nucleus (vowel, together with any glides), and the coda (final consonant
cluster). It also seems that there is a closer bond between nucleus and coda than between onset and nucleus. The nucleus + coda constituent is called the rhyme. Thus the structure of syllables is:

Syllable
    Onset
    Rhyme
        Nucleus
        Coda
(Cf. Fudge, 1969). Constraints on the way nuclei and codas may be combined to form rhymes mostly have to do with the length of the nucleus in English. Long vowels and diphthongs are ruled out before many final consonant clusters, including /ln/, /lm/, /kt/, /pt/, /nk/, /ng/, and so on.34 Before /g/, the only diphthongs that occur are /eɪ/ (vague) and /əʊ/ (rogue). May these constraints be violated by slips of the tongue? The examples given in the appendix to Fromkin, 1973 include none that is a clear violation.35 Whether slips such as /ræt baɪg/ for right bag, or /θɪd hɑ:ŋk/ for think hard, or /fɪn faɪlm/ for fine film are possible is hard to determine by intuition. Slips which would involve transposition of nucleus and coda, such as /fldi:/ for field, are unthinkable. There is consequently no reason to believe that nucleus and coda are independent units of articulatory programming. Constraints between onset and rhyme are almost non-existent in English. The only one I can think of is the one that blocks /j + aɪ/ combinations (although even that is contradicted by exclamations like yipes!).36 But since transposition of onset and rhyme in speech errors seems a total impossibility (whoever heard a slip like /ʌŋkdr/ for drunk?) we can conclude that these units too are not the ones we seek.
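The constituent structure and the rhyme constraints described above can be sketched as a small data structure. This is an illustration under stated assumptions: the vowel and cluster sets below are partial lists drawn from the examples in the text, and the names `Syllable` and `rhyme_is_legal` are hypothetical.

```python
from dataclasses import dataclass

# Assumed, partial lists based on the examples discussed in the text.
LONG_NUCLEI = {"i:", "ɑ:", "ɔ:", "u:", "ɜ:", "eɪ", "aɪ", "ɔɪ", "əʊ", "aʊ"}
DIPHTHONGS = {"eɪ", "aɪ", "ɔɪ", "əʊ", "aʊ"}
SHORT_ONLY_CODAS = {"ln", "lm", "kt", "pt", "nk", "ng"}
DIPHTHONGS_ALLOWED_BEFORE_G = {"eɪ", "əʊ"}


@dataclass(frozen=True)
class Syllable:
    onset: str
    nucleus: str
    coda: str

    @property
    def rhyme(self):
        # The rhyme groups nucleus and coda, reflecting their closer bond.
        return (self.nucleus, self.coda)


def rhyme_is_legal(syllable):
    """Check the two nucleus-coda constraints mentioned in the text."""
    nucleus, coda = syllable.rhyme
    # Long vowels and diphthongs are excluded before certain final clusters.
    if coda in SHORT_ONLY_CODAS and nucleus in LONG_NUCLEI:
        return False
    # Before /g/, only the diphthongs of 'vague' and 'rogue' occur.
    if coda == "g" and nucleus in DIPHTHONGS and nucleus not in DIPHTHONGS_ALLOWED_BEFORE_G:
        return False
    return True
```

With these assumptions, the second syllables of the hypothetical slips /ræt baɪg/ and /fɪn faɪlm/ are rejected, while the corresponding syllables of right bag and fine film are accepted.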
3.4
Syllables
In order to find out whether articulatory routines might correspond to syllables, we must see whether the constraints that govern the combination of syllables into larger units can be broken by slips of the tongue. The larger units in question are words. In English, there are at least two constraints of the sort we are interested in. First, there is one that excludes from word-final position syllables that end in short vowels other than the reduced vowels /ə/ and /ɪ/. This rules out words like */betæ/, */mædʊ/, */ʃɪlʌ/, */hi:nʊ/, etc. For many speakers, namely those who have long /i:/ rather than short /ɪ/ at the end of words like very, the constraint is even stronger: /ə/ is the only possible
short vowel in word-final position. Second, there is a constraint excluding two successive reduced syllables word-initially: compare composition (/kɒmpə.../ not */kəmpə.../) with compose (/kəmˈpəʊz/ not */kɒmˈpəʊz/). Can these constraints be violated by slips of the tongue?
Fromkin's (1973) list contains none that do, and errors such as /ˈsmɪle/ for smelly or /kɒmˈpeə ðeə kəmpəˈzɪʃnz/ for compare their compositions do not seem plausible. Further constraints are found in other languages. A very common one is the blocking of voiced obstruents word-finally. It would be interesting to know whether such constraints still hold under error conditions. My feeling is very much that they would. This, together with the facts of English just mentioned, suggests that syllables are not units of articulatory programming.
3.5
Words
Once we get to the level of words, we have really come to the end of the line. Intuitively, it is quite implausible to assume that there might be stored routines for units larger than words, simply because of the amount of storage this would require. (The number of phrases in a language is virtually limitless.) Moreover, there seems to be no way of testing this hypothesis within the framework of the present discussion since there seem to be no phonological constraints on the way words are combined to form phrases.37 Consequently, it is among the units at the word level and below that we should look for (our analogue of) the units of articulatory planning. According to the way our argument was originally set up, we should conclude that articulatory routines correspond to the highest level linguistic unit that defines the domain of constraints that may not be violated by slips of the tongue. We have seen that neither phones, nor syllabic onsets, nuclei and codas, nor syllables satisfy this requirement. So, if only by process of elimination, we should conclude that the word is the unit we are looking for. There is, however, a strong counterargument to this conclusion. We are assuming that syntagmatic speech errors, involving phonological anticipations, perseverations and transpositions, arise during the retrieval of articulatory routines, and, in particular, that they involve the retrieval of a wrong routine.38 If routines correspond to words, then the result of these errors must be pronunciation of a wrong word. But even if the word that is pronounced is the wrong word, it should still be a word in the language. Thus speech errors of this sort should never result in the production of
non-existent words. Is this prediction fulfilled? Certainly not. The production of non-existent words as a result of slips of the tongue is commonplace. We could, if we wished, try to get over this problem by assuming that the library contains routines for all the possible words of the language, so that slips of the tongue might result in non-existent but nevertheless possible words. But this is an impossible assumption, since the number of possible words in a language is limitless. We must therefore conclude that words are after all not the linguistic unit that corresponds to articulatory routines.
3.6
Syllables revisited
Having ruled out words as the basic units of articulatory programming, we are forced to look again at our next-best choice, syllables. We initially ruled out syllables on the grounds that there are some constraints that are not violated by slips of the tongue that are defined on units larger than syllables, viz. words. Is there any way we can account for the non-violation of these constraints whilst maintaining that articulatory routines are syllable-sized? In English both of the constraints in question have to do with the distinction between reduced and non-reduced syllables. One is the exclusion of light, non-reduced syllables from word-final position. The other is the exclusion of two successive reduced syllables word-initially. One plausible way of accounting for the fact that slips of the tongue do not violate these constraints is to assume that syllable-sized routines are stored in the library in non-reduced form. They are therefore retrieved in this form and incorporated into the articulatory program in this form. Only after incorporation are they changed by the programmer into their reduced form. Thus reduction is achieved by ad hoc computation by the programmer.39 This would work quite well for the constraint blocking a succession of two reduced syllables word-initially: the reduction process performed by the programmer would simply not operate in this environment. The other constraint is not so well handled, though. Consider a word like residential. The normal form of this word, including reduction, is /.ˈre.zɪ.ˈden.ʃl/. Without reduction, we have something like /.ˈre.zaɪ.ˈden.ʃæl./,40 and it is this that provides the basis for the retrieval of articulatory routines. Now we have already seen that malfunctions can arise during retrieval, and, in particular, that these can result in the apparent transposition of phones. Given an input like /.ˈre.zaɪ.ˈden.ʃæl./
there would seem to be no reason why the nuclei of the second and third syllables should not (apparently) be transposed, giving /.ˈre.ze.ˈdaɪn.ʃæl./. After vowel reduction, this becomes /.ˈre.zɪ.ˈdaɪn.ʃl/. But this is surely not a plausible slip of the tongue for residential (except if the /ˈdaɪn/ had some totally different source). So it would seem that a computational approach to vowel reduction is not very successful. An alternative is to assume that syllables are retrieved from the library already in reduced or non-reduced form. To account for the absence of errors such as /ˈsmɪle/ for smelly (with mutual interference between the routines for /sme/ and /lɪ/), or /kɒmˈpeə ðeə kəmpəˈzɪʃnz/ for compare their compositions (with transposition of the routines corresponding to /kɒm/ and /kəm/), we assume that there is considerable distance in storage between reduced and non-reduced routines, maybe that they are stored in separate libraries, and that the feature [± reduced] is one of the first and most important that the programmer makes use of when retrieving routines from the store. If this hypothesis is correct, then we should expect speech errors to show different patterns of interference between syllables depending on whether they involve two reduced or two non-reduced syllables or whether they involve one reduced and one non-reduced syllable. There is some evidence that this is the case. The list of speech errors given in the appendix to Fromkin, 1973 contains virtually no examples of interference between reduced and non-reduced vowels (§J). Consonant-interferences seem generally to involve two syllables of the same type (both reduced, both non-reduced) or else to proceed from a non-reduced to a reduced syllable.
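The separate-libraries hypothesis can be illustrated with a toy store in which the feature [± reduced] is consulted before anything else. This is a sketch only: the entries, the dictionary layout and the function names are assumptions of the illustration, and it models whole-routine exchanges rather than the consonant interferences mentioned above.

```python
# Hypothetical two-part store keyed first by the feature [± reduced].
LIBRARY = {
    True: {"kəm", "pə", "lɪ"},     # reduced routines (illustrative entries)
    False: {"kɒm", "sme", "peə"},  # non-reduced routines (illustrative entries)
}


def is_reduced(routine):
    """Look up which sub-library a routine is stored in."""
    for reduced, store in LIBRARY.items():
        if routine in store:
            return reduced
    raise KeyError(routine)


def can_exchange(routine_a, routine_b):
    """Under the separate-libraries assumption, only routines held in the
    same sub-library are close enough in storage to be exchanged."""
    return is_reduced(routine_a) == is_reduced(routine_b)
```

On this toy store, exchanging /kɒm/ and /kəm/ (as in the implausible slip for compare their compositions) is blocked, as is interference between /sme/ and /lɪ/, whereas two non-reduced routines such as /kɒm/ and /peə/ remain exchangeable.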
If these apparent differences are real (we need more data in order to be able to decide conclusively), they would confirm the hypothesis that the distinction between reduced and non-reduced syllables is a very basic and important one in speech production. We see, then, that we can account for the non-violation of word-level constraints in English on the assumption that the speech production mechanism makes a fundamental distinction between reduced and non-reduced syllables. Whether similar explanations are possible with other word-level constraints in other languages is something that we shall only be able to decide when more data on other languages are available.41 This in turn enables us to reinstate syllables as the basic units of articulatory programming.
4.
Other phonological programming units
The arguments just presented establish the syllable as the unit of articulatory programming (but cf. Note 25). Of course, speech production
involves other kinds of programming units apart from articulatory ones, and in this section I shall try to determine what others are made use of in the programming of pronunciation. (As before, I shall have nothing to say about prosodic phenomena.) There seems to me to be ample evidence in favour of features and phone-sized segments as programming units: cf. Fromkin, 1971. Although for the reasons given earlier I cannot accept the conclusion that the units Fromkin discusses are units of articulatory programming, it is surely beyond dispute that they are programming units of some sort. Accordingly, I shall take it for granted that there are programming units analogous to features and phones, and concentrate on other possible candidates. Notice that syllables are seen as playing a double role: they are programming units in the sense that features and phones are programming units, but they are units of articulatory programming as well. Syllables thus form a bridge between two parts of the system. As was mentioned earlier there is ample evidence in phonology that syllables are not simple concatenations of phones but possess a structure that comprises an initial division into onset (initial consonant cluster) and rhyme, with a subsequent division of the rhyme into nucleus (the vowel, together with any glides) and coda (final consonant cluster). In what follows, I shall consider the evidence in favour of these various kinds of syllable-constituent being programming units in speech production as well as units in linguistic competence. Prima facie, the case in favour seems quite strong. The initial consonant cluster behaves as a unit in errors like sloat thritter for throat slitter (Fromkin, 1973: Appendix: F) and the rhyme behaves as a unit in spack rices for spice racks (produced by myself). Errors where the final consonant cluster behaves as a unit seem rather rare, however. 
Additional evidence is found in Blumstein's (1978) study of speech errors by aphasics: error rates in consonant clusters were found to be markedly lower than those for single consonants, a fact which testifies to the internal cohesion of onsets and codas. The following two subsections show how the case for syllable-constituents is strengthened by an analysis of haplologies and blends. In both cases, it is assumed that the error is due to malfunctioning during the retrieval of phonemic representations from the lexicon. In particular, information is transferred chunk by chunk, and an analysis of the errors that occur during the transfer gives insight into the size of the chunks.
4.1
Haplologies
Haplologies are a type of error that results in stretches of the intended utterance not being pronounced. Examples from Fromkin (1973:
Appendix: A) are shrig souffle for shrimp and egg souffle, Halpert said for Herb Alpert said and Morton and Broad point out for Morton and Broadbent point out. It is easily seen that the form of haplologies is strongly constrained by syllable-structure. The jumps that result in a stretch of utterance being skipped over nearly always go from a given position in one syllable to the following position in a later syllable. Position in the syllable is defined in terms of the syllable-constituents. In the examples that follow, I use '.' to mark syllable-boundaries, '+' to separate onsets from rhymes, and '-' to separate nuclei from codas. The omitted portions are enclosed between square brackets. In Halpert said, the jump is from onset to nucleus: .h + [ɜ:-b.0] + æ-l.p + ə-t. In shrig souffle, it is from nucleus to coda: .ʃr + ɪ-[mp.0 + ə-n.0 + e]-g. In Morton and Broad point out, it is from coda to onset: .br + ɔ:-d.[b + e-nt].p + ɔɪ-nt. Significantly, the jumps go from a constituent-boundary to an identical constituent-boundary. For jumps to take off or land within an onset, nucleus or coda is quite unusual. Within the framework being developed here, we may take this as evidence that transfer of information takes place syllable-constituent by syllable-constituent. While we are on the subject of haplologies, it is interesting to note that they provide further evidence for another suggestion made earlier, namely that the process of speech production assigns a crucial role to the distinction between reduced and non-reduced syllables. For it turns out that haplological jumps from and to positions inside syllables generally go from a reduced syllable to a reduced syllable or from a non-reduced to a non-reduced syllable.
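The regularity just described, that a haplological jump goes from one position in a syllable to the following position in a later syllable, can be checked mechanically. The sketch below is illustrative: syllables are given as (onset, nucleus, coda) triples with rough stand-in segments, the skipped stretch is given as a slice over the flattened constituents, and the function names are hypothetical. Jumps that take off or land inside a constituent cannot be represented at this granularity.

```python
POSITIONS = ("onset", "nucleus", "coda")


def flatten(syllables):
    """Turn [(onset, nucleus, coda), ...] into (syllable index, position, phones) slots."""
    return [(i, pos, phones)
            for i, syl in enumerate(syllables)
            for pos, phones in zip(POSITIONS, syl)]


def is_regular_jump(syllables, skipped):
    """skipped is a (start, stop) slice over the flattened slots that were not pronounced."""
    slots = flatten(syllables)
    start, stop = skipped
    last_spoken = slots[start - 1]
    first_resumed = slots[stop]
    # The position that should follow the last pronounced constituent.
    expected = POSITIONS[(POSITIONS.index(last_spoken[1]) + 1) % 3]
    return first_resumed[1] == expected and first_resumed[0] > last_spoken[0]


# Rough stand-in segmentations of the three examples (empty string = empty onset).
herb_alpert = [("h", "ɜ:", "b"), ("", "æ", "l"), ("p", "ə", "t")]
shrimp_and_egg = [("ʃr", "ɪ", "mp"), ("", "ə", "n"), ("", "e", "g")]
broadbent_point = [("br", "ɔ:", "d"), ("b", "e", "nt"), ("p", "ɔɪ", "nt")]
```

Here `is_regular_jump(herb_alpert, (1, 4))` corresponds to Halpert (onset to nucleus), `is_regular_jump(shrimp_and_egg, (2, 8))` to shrig (nucleus to coda), and `is_regular_jump(broadbent_point, (3, 6))` to Broad point (coda to onset); all three satisfy the check, whereas a jump such as `(2, 7)` in the second example, landing on a nucleus after a nucleus, does not.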
4.2
Blends
Blends are a rather more complex form of error in that they involve the fusion of two lexical items that are, so to speak, in competition for the same slot in the utterance. I shall return later to the more general aspects of how blends arise within the model. For the moment, it is sufficient to note that they clearly involve the transfer of phonological information and that the source of the transfer changes from one of the candidate lexical items to the other. What we are interested in here is the constraints on the
ways in which the source of the transfer may change. As before, I shall assume that the phonological information contained in a lexical entry is transferred chunk by chunk. We may further suppose that in order to keep track of the progress of the transfer, the system sets up a pointer that marks the end-point of one piece of information and the start-point of the next. When the source of the transfer changes from one lexical entry to another, the pointer ensures that transfer is resumed from a position in the new source that matches the position at which the last transfer terminated in the old source. In principle, the pointer could make reference to either grammatical information or to phonological information, and the chunks of information transferred could thus be grammatically delimited (e.g. by morpheme boundaries) or phonologically delimited (e.g. by syllable boundaries). The evidence is strongly against the idea that grammatical information is used. Many of the examples in Fromkin (1973: Appendix: U) demonstrate this: biled for boiled/wild (the grammatical division in boiled is between /bɔɪl/ and /d/, whereas the cross-over point falls between /b/ and /ɔɪld/), everybun for everybody/everyone, relected for related/directed, etc.
Given that the chunks of information that are transferred are defined phonologically, we can seek to establish the nature of these chunks by examining the points at which the source of transfer crosses over from one lexical item to the other. An analysis of Fromkin's data shows that the most common location for the cross-over point is the boundary between syllable-onset and syllable-nucleus:

states / says: st + eɪts, s + ez → st + ez
grizzly / ghastly: gr + ɪzlɪ, g + æstlɪ → gr + æstlɪ
clarinet / viola: klærɪn + et, vaɪ + əʊlə → klærɪn + əʊlə
Another fairly common cross-over point is the boundary between nucleus and coda:

what / which: wɒ-t, wɪ-tʃ → wɒ-tʃ
switched / changed: swɪ-tʃt, tʃeɪ-ndʒd → swɪ-ndʒd
recognise / reflect: rekə-gnaɪz, rɪ-flekt → rekə-flekt
Perhaps surprisingly, the cross-over point only rarely coincides with a syllable boundary:

adjoining / adjacent: ədʒɔɪ.nɪŋ, ədʒeɪ.snt
dentals / velars: dent.lz, vi:l.rz → dent.rz
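The pointer mechanism assumed for blends can be sketched as follows. This is a minimal illustration, not a claim about the actual mechanism: words are lists of (onset, nucleus, coda) triples in approximate transcription, the function names are hypothetical, and 'matching position' is read here simply as 'the same kind of syllable-constituent'.

```python
POSITIONS = ("onset", "nucleus", "coda")


def chunks(syllables):
    """Flatten [(onset, nucleus, coda), ...] into (position, phones) chunks."""
    return [(pos, phones) for syl in syllables for pos, phones in zip(POSITIONS, syl)]


def blend(word_a, word_b, cut_a, cut_b):
    """Transfer chunks from word_a up to cut_a, then from word_b starting at cut_b.

    The switch is allowed only when both cuts fall before the same kind of
    constituent, so the cross-over can never land inside a constituent.
    """
    a, b = chunks(word_a), chunks(word_b)
    if a[cut_a][0] != b[cut_b][0]:
        raise ValueError("cross-over points are not at matching positions")
    return "".join(phones for _, phones in a[:cut_a] + b[cut_b:])


states = [("st", "eɪ", "ts")]
says = [("s", "e", "z")]
switched = [("sw", "ɪ", "tʃt")]
changed = [("tʃ", "eɪ", "ndʒd")]
```

With these inputs, `blend(states, says, 1, 1)` crosses over at the onset-nucleus boundary and yields the attested `stez`, and `blend(switched, changed, 2, 2)` crosses over at the nucleus-coda boundary and yields `swɪndʒd`. A call with mismatched cuts, such as `blend(states, says, 1, 2)`, is rejected.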
One thing we do not find is the cross-over point falling within an onset, nucleus or coda. This confirms what we saw in the analysis of haplologies: information is transferred syllable-constituent by syllable-constituent. Analysis of blends yields a number of other interesting points. In the examples we have looked at so far, it has been possible to determine precisely the location of the cross-over point. With many blends, perhaps the majority, this is not possible, since the blended items have part of their phonemic representations in common, and this acts as a sort of bridge between them. (No doubt it also facilitates the occurrence of the blend.) For example, Ross and Chomsky, which share o, are blended as Romsky; specific and precisely, which share /s/ (orthographic c in both cases), are blended as /spəsaɪslɪ/. The shared material may be of various kinds: individual vowels or consonants, as in the examples just given, or longer sequences like vowel + consonant (killers/pills → kills), consonant cluster + vowel (striving/trying → strying), even consonant + vowel + consonant (stomach/tummy → stummy). However, not all types of material have the potential for forming a bridge between blended items, but only those that occur in the same position (as stipulated by Wells (1951) in his 'third law'). What counts as the same position is determined in part by syllable-structure: the shared items must both be in the same syllable-constituent. Stress also plays a role: the shared items must be in the same kind of syllable (stressed or unstressed, or maybe reduced or non-reduced). Position in the word, on the other hand, seems less relevant (before/first → befirst). The conclusion of this section is that the syllable-constituents onset, rhyme, nucleus and coda are to be added to the list of programming units in speech production.
5.
An integrated model
In this section, I shall show how the model of speech production proposed earlier accounts for various kinds of speech error. Of course, there already
exist a number of other, more or less successful accounts of speech errors, and the validity of the model proposed here can only properly be assessed by comparison with these. To begin with, however, I shall summarise the main features of the model. Speech production involves the generation and running of articulatory programs. I have called the device that generates the programs the programmer, and the input to the programmer has been referred to as the text. The text is analogous to a (classical) phonemic representation, and is hierarchically structured in terms of tone-groups (phonological phrases),42 syllables, syllable-constituents (onset, nucleus, coda), phones, and feature specifications (this formulation will be modified slightly later on). The translation from text to articulatory program involves both ad hoc computation and the utilisation of ready-made routines stored in a library. One role of the text is to define the addresses of the articulatory routines to be retrieved from the library. The routines themselves are syllable-sized, and their addresses are expressed in the form of phonemic representations. The retrieval of routines from the library is a three-stage process. The first stage, addressing, involves the construction of a set of search instructions to be used in accessing the library. The second stage, activation, involves the priming of a particular location in the library, in accordance with the search instructions, so that the routine stored there becomes available. At the third stage, routines that have been activated are incorporated into the articulatory program. Incorporation involves a number of processes, including the serial ordering of routines, contextual modification of routines, prosodic patterning, and maybe other things as well. Finally, the instructions contained in the program are transmitted to the articulators.
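The three-stage retrieval process summarised above can be laid out as a toy pipeline. The sketch is a deliberate simplification under stated assumptions: orthographic chunks stand in for phonemic addresses, a 'routine' is represented only by a label built from its address, incorporation is reduced to serial ordering, and all function names are hypothetical. It also shows one way an addressing-stage malfunction could be pictured, namely two sets of search instructions swapping a field.

```python
def address(text):
    """Stage 1 (addressing): build one set of search instructions per text syllable."""
    return [{"onset": onset, "rhyme": rhyme} for onset, rhyme in text]


def cross_contaminate(instructions, i, j, field="onset"):
    """Addressing-stage malfunction: two sets of instructions swap one field."""
    out = [dict(instr) for instr in instructions]
    out[i][field], out[j][field] = out[j][field], out[i][field]
    return out


def activate(instructions):
    """Stage 2 (activation): stand-in for library lookup; each routine is
    represented simply by the address it was retrieved under."""
    return [instr["onset"] + instr["rhyme"] for instr in instructions]


def incorporate(routines):
    """Stage 3 (incorporation): serially order the activated routines."""
    return tuple(routines)


def generate_program(text):
    return incorporate(activate(address(text)))


# Orthographic stand-ins for the syllables of 'boggy marsh'.
text = [("b", "o"), ("gg", "y"), ("m", "arsh")]
slip = incorporate(activate(cross_contaminate(address(text), 0, 2)))
```

Run normally, `generate_program(text)` gives `('bo', 'ggy', 'marsh')`; with the onsets of the first and third sets of search instructions exchanged, `slip` is `('mo', 'ggy', 'barsh')`, i.e. moggy barsh. Because the error is introduced before activation, every routine retrieved is still a well-formed syllable routine.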
Speech errors of varying kinds might be expected to occur as a result of malfunction at the three stages of the retrieval process. Errors which involve interference between neighbouring elements of an utterance are most naturally seen as the result of malfunction at the addressing stage. Under this heading fall exchanges (moggy barsh for boggy marsh), anticipatory and perseveratory substitutions (a leading list for a reading list, Michael Malliday for Michael Halliday), certain kinds of additions, omissions and shifts (thrink through for think through, poed nude for posed nude, pitch hint for pinch hit, the sweeter hitch for the heater switch, from prounds to fanks for from pounds to franks) (all these examples from Fromkin, 1973: Appendix). I shall discuss in more detail later how these might arise. The kinds of errors that might come about as a result of malfunction at the activation stage obviously depend on the precise mode of operation of this part of the process, and that is not at all clear to me at present. As I
tentatively suggested earlier, one may envisage an operation whereby various sub-areas of the store are primed in accordance with the search instructions, in such a way that the sought-after routine lies at the intersection of the sub-areas; it should thus be primed to a higher degree than all other sub-areas and therefore become available first. In this case, one can imagine that the wrong routine might receive sufficient amount of priming to become activated either as a result of mere noise in the system or because of interference from the priming associated with some other set of search instructions that is being executed at around the same time. In either case, the result will be the utterance of a wrong syllable that is phonologically very similar to the intended one (because the routine corresponding to it is stored in close proximity to the intended one). Whether there is any way of systematically telling the difference between errors of this sort and errors that arise from malfunction at the addressing stage is not clear to me at present. Finally, there is the possibility of errors occurring owing to malfunction at the incorporation stage. By this time, of course, the programmer is dealing with chunks of code that correspond to entire syllables, so the types of error one can envisage involve the transposition, omission, addition or replication of entire syllables. As it happens, errors involving entire syllables as such (one has to distinguish these from cases where a syllable happens to be identical with a morpheme or a word) seem rather rare, judging by their virtual non-appearance in the examples presented in Fromkin (1973: Appendix). If that is true — further, statistically based analysis is needed to establish whether it is — this situation is compatible with the special role assigned to syllables in the model I have proposed, as long as the process of incorporation is taken to be almost immune to error. 
On the other hand, it can hardly be seen as positive evidence in favour of the model, as the state of affairs in question is also compatible with the hypothesis that syllables play a very limited role in speech production (a view taken, for instance, by Shattuck-Hufnagel (1979: 330)).
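The tentative picture of activation given above, in which the sought-after routine lies at the intersection of several primed sub-areas, can be made concrete in a small sketch. Everything here is an assumption made for illustration: the store contents, the representation of sub-areas as onset, nucleus and coda values, the numerical weights and the function names.

```python
# Hypothetical store: each routine is filed under the sub-areas it belongs to.
STORE = {
    "ri:l": {"onset": "r", "nucleus": "i:", "coda": "l"},
    "mi:l": {"onset": "m", "nucleus": "i:", "coda": "l"},
    "mis": {"onset": "m", "nucleus": "i", "coda": "s"},
}


def activation_levels(store, instructions, interference=None, weight=0.5):
    """Prime every sub-area named in the search instructions; a routine's level
    is the sum of the priming of the sub-areas it lies in."""
    levels = {}
    for name, areas in store.items():
        level = sum(1.0 for key, value in instructions.items() if areas.get(key) == value)
        if interference:
            # Priming leaking in from another set of search instructions.
            level += sum(weight for key, value in interference.items() if areas.get(key) == value)
        levels[name] = level
    return levels


def retrieve(store, instructions, interference=None, weight=0.5):
    """The most highly primed routine becomes available first."""
    levels = activation_levels(store, instructions, interference, weight)
    return max(levels, key=levels.get)


target = {"onset": "r", "nucleus": "i:", "coda": "l"}
```

Without interference, `retrieve(STORE, target)` returns the intended `ri:l`. If priming for an onset /m/ leaks in strongly enough, as in `retrieve(STORE, target, interference={"onset": "m"}, weight=1.5)`, the neighbouring routine `mi:l` wins instead, which is the kind of phonologically similar wrong syllable the text describes. The sketch does not, of course, settle how such errors could be told apart from addressing-stage errors.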
5.1
Garrett's model of sentence production
I shall now move on to consider to what extent the model proposed here is compatible with the findings of some other investigators who have used speech error data as evidence for models of speech production and, in cases of disagreement, try to determine the relative merits and demerits of the competing approaches. In this connection, it is worthwhile recalling the distinction made at the beginning of section 3 between units of
articulatory programming (in the narrow sense of chunks of code of which articulatory programs are constructed) and the various other objects that are manipulated during the production of speech. Other investigators have in general not been concerned to make this distinction, and have used terms such as 'element', 'segment' and 'unit' to refer to a wide range of items that includes both articulatory programming units and other things as well. One thing I shall try to do, therefore, is see whether the distinction I have been employing in the discussion so far can be accommodated within these other frameworks. One such framework that is worthy of attention is the model of sentence production developed by Garrett in a number of publications, of which I shall concentrate on the most recent (1980). Central to Garrett's model is a distinction between two processing levels, one of which, the 'functional' level, defines the grammatical functions of words and the phrase relationships holding among them, while the other, the 'positional' level, specifies the serial order of words and aspects of their pronunciation. The generation of representations at the functional level depends on semantic and pragmatic factors, and is of little relevance for the present discussion. The generation of representations at the positional level is effected with reference to functional-level representations and also to what Garrett calls planning frames. These are taken to provide details of the syntactic structure of utterances beyond the crude groupings of lexical items indicated at the functional level. Syntactic markers such as inflexional endings (e.g. for tense and number) and also non-lexical grammatical entities such as determiners, prepositions (also, apparently, some adverbs) are assumed to be features of the planning frame. In addition, the planning frame includes a specification of (sentence) stress patterns and maybe other prosodic information (cf. Garrett, 1980: 198).
The sequential structures defined by the planning frame are filled out with particular lexical material according to the information represented at the functional level. It is assumed that part of this last process is the retrieval of phonological information from the lexicon. Although Garrett is not entirely explicit about the nature of the phonological information transferred from the lexicon and incorporated into the planning frame in order to produce a positional-level representation (cf. p. 186), it would appear that this is roughly analogous to a classical phonemic representation. If we try to accommodate the model proposed earlier with Garrett's, the most obvious approach is to equate Garrett's positional-level representation with the text, i.e. the input to the programmer, since both are close to classical phonemic representations. Unfortunately, this approach does not succeed. In Garrett's model, errors such as sound exchanges, anticipations and perseverations are assumed to arise prior to the positional-level (cf. p. 212), so that their consequences are already contained in representations at that level. In our model, on the other hand, such errors are taken to arise during operations to which the text is input. In order to bring into alignment the locus of segment errors in the two models, we need to adopt the alternative approach of equating Garrett's positional-level representations with articulatory programs. At first sight, this looks quite hopeless, if only because it seems to do such violence to Garrett's own characterisation of the positional-level as being of a relatively high degree of abstraction in phonological terms. However, let us look more closely at the reasoning behind Garrett's formulation. He argues that different kinds of error arise at different levels of processing, and that if a particular kind of error involves interaction between elements of a given type, then the level at which that kind of error arises must be one at which processing is carried out in terms of elements of that same type (cf. Garrett, 1980: 183, which refers to Garrett, 1976: 237). This sounds rather complicated but is in fact quite straightforward: it is just a way of saying, for example, that errors involving phonological segments cannot occur during processing at the grammatical level (or vice versa). I see no reason to doubt the correctness of the principle Garrett enunciates here. What I think can be questioned, however, is the way Garrett applies the principle in his characterisation of the positional-level. His reasoning is that since (1) the phone-based errors arise at the positional-level and since (2) these errors involve segments of a fairly abstract nature (i.e. segments roughly like phonemes),43 it follows that representations at the positional-level must be of a similarly abstract nature, i.e. also like phonemic representations (cf. Garrett, 1980: 186).
But now consider the process of error generation suggested for phone-based errors at the end of section 2.5 and summarized at the beginning of this section. It was proposed that anticipations, exchanges, etc. could arise as a result of malfunctioning at the addressing stage during the retrieval of articulatory routines. In particular, confusion might arise during the setting-up of the search instructions in such a way that two sets of instructions could cross-contaminate each other (recall the discussion of it's a meal mystery for it's a real mystery at the end of section 2.5). But now recall that both the input to this process (the text) and the output from it (the search instructions) are expressed in what Garrett would call the same 'processing vocabulary' (viz. that which is used to represent phonological structures in terms of syllables, phones, features, etc.). Consequently, the type of processing involved here conforms to Garrett's principle described above, so there is no reason, within the framework of constraints Garrett is operating with, why one should not equate Garrett's positional-level representations with articulatory programs.
Before we can conclusively equate the articulatory programs of our model with the positional-level representations of Garrett's, there is one further obstacle to be removed. As Garrett's model is formulated (Garrett, 1980: 212), all kinds of speech errors arise prior to the end of positional-level processing, so that the only phonetic/phonological modifications that remain to be carried out are the 'automatic processes for filling in phonetic detail', i.e. roughly the specification of allophonic variants. Among these automatic processes, however, Garrett includes the 'accommodations' to their new environments of inflexional affixes (such as plural -s, past-tense -ed, etc.) that have been involved in errors like Ralph and my's for Ralph's and my or I roasted a cook for I cooked a roast (Fromkin, 1973: Appendix: S). It should be clear that there is no way, in our model, for these accommodations to be carried out after the generation of articulatory programs has been completed. In order to remove this difficulty, we begin by observing that Garrett is wrong to include these accommodations among the 'automatic processes for filling in phonetic detail'. There is no automatic process of this sort that can be invoked in order to account for the pronunciation of the s in Ralph and my's as [z] rather than [s], since [mais] is perfectly acceptable. Therefore it is necessary to devise an alternative account of accommodations for reasons that are quite independent of the questions we are concerned with here. What such an alternative account looks like within the framework I have proposed is something I shall return to later. I have now reached the position of having accommodated the model proposed earlier to that proposed by Garrett (or rather of having somewhat modified Garrett's model to make it fit in with my own). A consequence of this is that whatever success can be claimed for Garrett's model in accounting for speech errors can now also be claimed for mine.
One large and important class of errors for which I do not find Garrett's account very satisfactory is that which includes exchanges, anticipations, perseverations, etc. of phones. Garrett hypothesises that these arise in the following way. As we have seen, representations at the positional-level are constructed using information both from planning frames and from functional-level representations. However, since neither of these contains details of the phonological forms of lexical items, it is necessary to obtain these details from the lexicon. Thus the lexicon is accessed on the basis of the functional-level information, and the phonological material retrieved in this way is located in a particular phrasal position, as defined by the planning frame. It is at this last stage, the assignment of phonological material to particular phrasal positions, that the errors in question occur. One point worth noting here is the fact that Garrett assumes that the errors arise at what is, in terms of our model, the incorporation stage. I, on
the other hand, have been taking them to occur at the addressing stage. Since Garrett is rather inexplicit on the details of the mechanism whose malfunction gives rise to the errors, it is not possible to say whether this discrepancy is symptomatic of more far-reaching divergences between the two approaches. As far as I can tell, this is not the case, and there would be no adverse consequences, within the framework Garrett presents, if we were to move the locus of these errors to the point at which the lexicon (in Garrett's terms; the library of stored routines, in my terms) is accessed. There are, moreover, positive reasons why such a move should be undertaken. These stem from the fact that there is, as far as I can tell, no way of accounting for certain important properties of exchanges, perseverations, etc. within the framework Garrett proposes. Consider the fact that these errors, in the overwhelming majority of cases, result in forms that conform to the phonotactic constraints of the language. Thus an anticipatory addition like nlon-linguistic for non-linguistic, although attested (Shattuck-Hufnagel, 1979: 299), is a very rare occurrence. One way of accounting for this phenomenon (not the way I suggested earlier, but a way which seems appropriate to the kind of framework Garrett is working with) is to assume that phonological material is retrieved segment by segment from the lexicon and inserted into slots defined in terms of syllable-structure. We may make the further assumption that the syllable-structure referred to here in effect defines the class of possible syllables for the language, and that an attempt to insert a segment into an inappropriate slot (inappropriate in the sense that the insertion would violate the constraints on possible syllables) would be blocked or filtered out. (Cf. Shattuck-Hufnagel, 1979: 306, 321 for this proposal.)
While this accounts successfully for relevant aspects of the errors in question, it is not at all easy to see how it could be accommodated within Garrett's model. The positions into which the retrieved phonological information is placed are defined by the planning frame in Garrett's model and this is said to contain no (segmental) phonological information, hence no specification of the form of possible syllables. In consequence, the discrepancy between Garrett's model and mine with regard to the way sound-exchanges etc. arise is no real hindrance, since Garrett's model requires modification at this point anyway.
5.2 Shattuck-Hufnagel's scan-copier model
As has just been mentioned, a suitable mechanism for the generation of a correctly sequenced representation of the pronunciation of an utterance is one in which phonological material is retrieved segment by segment from
a store and inserted into slots defined in terms of syllable-structure (also, no doubt, in terms of stress). A model of this sort has been presented in some detail by Shattuck-Hufnagel (1979), and has been shown to be extremely powerful in accounting for speech errors (as well as for error-free speech). It is, therefore, incumbent upon me to consider the relation between this model and my own. A summary of the main features of Shattuck-Hufnagel's model is given in Shattuck-Hufnagel and Klatt (1979: 50).

Briefly, the planning process involves: (1) a set of planning segments or phonemes, from candidate lexical items selected from the lexicon for the utterance, (2) a sequence of structurally defined "slots" for the utterance, which are computed separately from the segments, and which dictate the eventual temporal order, and (3) a mechanism for integrating the two parts of the representation: this mechanism includes a scan-copier to insert segments into the slots, a bookkeeper to check off or delete segments, and an output error monitor. Malfunctions in one or more of these components can explain how each of the observed types of sound segment errors in the MIT corpus might come about. For example, if the scan-copier selects the wrong segment by accessing a segment in the comparable position in a different word (one that is to be spoken later in the utterance), an anticipatory error or an exchange will result ...
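The three components named in the quoted summary can be made concrete with a toy sketch. The Python below is only an illustration under simplifying assumptions, not Shattuck-Hufnagel's own formulation: words are lists of rough phoneme-like strings, a slot is a (word index, position) pair, and the names `scan_copy`, `misselect` and `bookkeeper_ok` are hypothetical. A single misselection is injected by hand; the fallback rule used when the intended segment has already been checked off is likewise an assumption of the sketch.

```python
from typing import Dict, List, Optional, Set, Tuple

Slot = Tuple[int, int]  # (word index, position within the word)


def scan_copy(
    words: List[List[str]],
    misselect: Optional[Dict[Slot, Slot]] = None,
    bookkeeper_ok: bool = True,
) -> List[List[str]]:
    """Copy planned segments one by one into ordered slots."""
    misselect = misselect or {}
    # (1) Planned segments, keyed by the slot each is intended for.
    planned: Dict[Slot, str] = {
        (w, p): seg for w, word in enumerate(words) for p, seg in enumerate(word)
    }
    checked_off: Set[Slot] = set()  # the bookkeeper's record
    output: List[List[str]] = [[] for _ in words]

    # (2) Slots are visited in their eventual temporal order.
    for slot in sorted(planned):
        source = misselect.get(slot, slot)  # (3) scan-copier picks a segment
        if source in checked_off:
            # Intended segment already used: fall back on the earliest
            # unused segment in the same within-word position.
            spare = [s for s in sorted(planned)
                     if s not in checked_off and s[1] == slot[1]]
            if spare:
                source = spare[0]
        output[slot[0]].append(planned[source])
        # A faulty bookkeeper fails to check off a misselected segment.
        if bookkeeper_ok or source == slot:
            checked_off.add(source)
    return output


words = [["r", "i", "l"], ["m", "i", "s", "t"]]  # rough stand-ins for "real myst-"
print(scan_copy(words))                                          # error-free copy
print(scan_copy(words, {(0, 0): (1, 0)}))                        # exchange
print(scan_copy(words, {(0, 0): (1, 0)}, bookkeeper_ok=False))   # anticipation
```

In this sketch the same misselection yields an exchange when the bookkeeper works (the displaced segment is still available when its competitor's slot comes up) and an anticipation when the bookkeeper fails to check off the misselected segment.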
I shall not bother to review in detail the way in which Shattuck-Hufnagel's model accounts for various kinds of segment-based errors. Suffice to say that the level of success it achieves is such as to lend Shattuck-Hufnagel's proposals a very high degree of plausibility. Before considering ways in which the proposals made earlier in this article might be fitted in with those of Shattuck-Hufnagel, I shall point out a number of shortcomings in the latter, some of which are recognised by their author herself. First it has already been noted that segment-based errors display a very strong tendency to respect phonotactic constraints. Shattuck-Hufnagel's way of accounting for this involves the assumption that the slots into which segments are moved 44 are 'marked in some way that shows the segments that could legally appear there in an English word' (1979: 306). In other words, the slots somehow include a specification of a large number of phonotactic constraints. Shattuck-Hufnagel herself notes that this is a 'representationally expensive' way of doing things and considers it to be 'intuitively unsatisfying' (1979: 321). One might add the further criticism based on the fact that it is in general rather undesirable to permit language-specific structures to exist at this relatively low-level part of the
production mechanism, since this gives rise to the question of whether languages could differ from one another in this respect. Is it, for example, a peculiarity of English that constraints embodied in the way the slots are represented are the same as the constraints that govern the form of items in the lexicon? Could a language have different sets of constraints at these two levels? Could a language have slot-representations that failed to embody any constraints at all, so that slips of the tongue would be vastly less restricted than they are in English? Intuitively, one would wish to answer 'no' to all these questions, but one would prefer not to have to ask them in the first place. A second problem concerns the way in which the right number and right kind of slots are generated for each word. Shattuck-Hufnagel states that 'segments must be copied one-by-one into waiting ordered slots that have been computed independently' (1979: 314), which means that the production mechanism must somehow know how many syllables to make available for each word, and how these syllables are stressed. There is no indication of how this might be accomplished. Third (and this is really rather odd), although the scan-copier and the process by which it inserts segments into slots are extremely well-motivated in terms of the explanations they permit of the way speech errors occur, the function of this mechanism and its operation within the production of speech in general remain quite mysterious. Shattuck-Hufnagel herself remarks, with reference to her model, that 'perhaps its most puzzling aspect is the question of why a mechanism is proposed for the one-at-a-time serial ordering of phonemes when their order is already specified in the lexicon' (1979: 338).
Indeed, it is hard to see, within the framework Shattuck-Hufnagel adopts, why the transfer of phonological representations from the lexicon should not take place all at one go, especially since this would obviously remove the possibility of a large class of (actually occurring) phone-based errors. I would now like to suggest that despite the disparities that seem to exist between them, my model and Shattuck-Hufnagel's model are in fact quite compatible. Moreover, once Shattuck-Hufnagel's scan-copier mechanism is allocated its proper place within the framework I have proposed earlier, the difficulties just mentioned are easily circumvented. In principle, there are two possible locations for the scan-copier and the operations it performs, within the model proposed here. The first possibility is to place them within the process by which texts are generated, since this would seem to require just the sort of transfer of information that the scan-copier is designed to perform. But a moment's thought shows that this cannot be right. To begin with, this solution is still open to the first and second criticisms levelled at Shattuck-Hufnagel's original
formulation. In addition, I have so far been assuming that segment-based speech errors arise as a result of malfunctioning on the part of the articulatory programmer, which means that they occur AFTER the generation of the text. This is obviously not consistent with locating the operation of the scan-copier WITHIN the generation of the text. The second possibility for locating the scan-copier, and the one which I think is correct, is to place it within the addressing stage of the retrieval of articulatory routines. Indeed, one may even say that the scan-copier (together with its associated devices the bookkeeper and the error monitor) IS the mechanism by which the addresses are generated. (Recall that the addresses take the form of phonemic representations.) The advantages of this formulation are clear. First, the model proposed earlier has greater credibility as an account of speech errors since it now incorporates the powerful mechanisms proposed by Shattuck-Hufnagel. Second, the last of the criticisms levelled at Shattuck-Hufnagel's original model falls away immediately. Third, the first of these criticisms can be obviated, since there is no longer any need to assume that the slots into which segments are inserted somehow have built into them a statement of the phonotactic constraints of the language. If the addressing device malfunctions, and generates a representation that violates the phonotactic constraints, the search-procedure will end up looking for a non-existent routine in the library (the library contains routines corresponding to all the possible syllables and only the possible syllables, remember). An error condition of this sort is much less likely to remain undetected and hence uncorrected. 45 Finally, there is a straightforward way of overcoming the second criticism made earlier.
As regards the number of slots that need to be employed, in particular the number of syllables, 46 the production mechanism must generate one set of instructions for each syllable, since it is syllable-sized routines that are stored in the library. At the other end of the retrieval process, the incorporator must construct an ordered sequence of syllable-sized slots in the articulatory program, into which the retrieved routines are inserted. The information the incorporator needs in order to do this can be obtained from the text. Similar remarks hold with respect to the kinds of slots that must be made available, in particular as regards the stressed or unstressed nature of the syllables in the text, the syllable-sized routines retrieved from the library, and the syllable-sized slots awaiting them in the articulatory program. 47 Notice in particular, that there is no need here to assume any independent computation of slots: all the necessary information is available in the text.
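The two points just made, that a phonotactically illegal address fails at the library and that the slot layout can be read off the text, can be illustrated with a small sketch. Everything named here is an assumption introduced for illustration: the `LIBRARY` dictionary stands in for the store of syllable-sized routines, the text is simplified to a list of (syllable address, stressed) pairs, the routine strings are placeholders rather than real articulatory specifications, and the stress values are merely illustrative.

```python
from typing import Dict, List, Tuple

# Hypothetical library: one stored routine per possible syllable, and no others.
LIBRARY: Dict[str, str] = {
    "non": "<routine non>",
    "lin": "<routine lin>",
    "gwis": "<routine gwis>",
    "tik": "<routine tik>",
}

Text = List[Tuple[str, bool]]  # (syllable address, stressed?)


class NoSuchRoutine(LookupError):
    """Raised when an address matches no stored routine."""


def retrieve(address: str) -> str:
    """Search the library; an impossible syllable is an error condition."""
    try:
        return LIBRARY[address]
    except KeyError:
        raise NoSuchRoutine(address) from None


def build_program(text: Text) -> List[Tuple[str, bool]]:
    """Incorporator: one slot per syllable of the text, filled in order."""
    slots = [stressed for _, stressed in text]       # slot layout read off the text
    routines = [retrieve(addr) for addr, _ in text]  # one search per syllable
    return list(zip(routines, slots))


text = [("non", False), ("lin", False), ("gwis", True), ("tik", False)]
print(build_program(text))

try:
    build_program([("nlon", False)] + text[1:])  # illegal onset cluster
except NoSuchRoutine as err:
    print("no routine stored for", err)
```

No phonotactic statement is attached to the slots in this sketch; the illegal form is caught only because the search finds nothing to retrieve.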
5.3 Accounting for speech errors
Let me summarise the conclusions of this discussion by outlining the model as it now stands. We took over from Garrett the idea that sentence production involves the generation of (1) a representation which specifies the grammatical relations between lexical items and also, in crude terms, the groupings of such items into phrases (Garrett's functional-level representation), and (2) a structure which defines the (surface) syntactic structure of the utterance, including the serial order of elements, and which also contains grammatical markers corresponding to inflectional endings and non-lexical items such as determiners, auxiliaries, etc. (Garrett's planning frames). The information contained in these two structures is combined, with the further inclusion of information about pronunciation, to produce the final, complete, temporally ordered and phonetically fully specified representation of the utterance which I have called the articulatory program and equated with (a modified form of) Garrett's positional-level representation. 48 The 'information about pronunciation' takes the form of syllable-sized articulatory routines retrieved from a library and incorporated into the serially/temporally ordered structure defined by the planning frame. 49 The retrieval of routines is performed on the basis of a phonemic representation (the text), which defines the addresses of routines in the library. The search instructions which provide the means for accessing the library are generated by a scan-copying mechanism (with associated devices) as proposed by Shattuck-Hufnagel. The incidence of a wide variety of speech errors is explicable in terms of malfunctions in various parts of this process. Errors that (appear to) involve the exchange, substitution, addition or deletion of phone-sized segments can plausibly be accounted for on the basis of malfunction of the scan-copier or its associated devices, as Shattuck-Hufnagel has demonstrated.
Stranding exchanges (like I thought the park was trucked for I thought the truck was parked) are taken to occur at the point where lexical items are assigned positions in the serially ordered structure defined by the planning frame. Shifts of affixes and non-lexical material (even the best team losts for even the best teams lost) arise when features of the planning frame are allocated positions in the ordered sequence, as Garrett has shown. A further class of errors, including blends and malapropisms, involves the process of lexical selection. Garrett has proposed that lexical selection takes place in two stages. The first involves the retrieval of semantic and grammatical information (but not phonological information) and takes place at the functional level (or even earlier). It is at this point that
meaning-based word substitutions and also word exchanges occur. The second stage involves the retrieval of phonetic/phonological information and takes place at the positional level (i.e. during the generation of articulatory programs, in our terms). It is at this point that malapropisms occur. (Cf. Garrett, 1980: 200 and 212, also Fay and Cutler, 1977.) Blends seem to involve both of these stages, in that the blended items almost invariably display both semantic and phonetic similarities. One way of accounting for this is to assume that the first, semantic/grammatical stage of lexical selection does not result in the selection of a single lexical item, but rather specifies a class of candidate items. Final choice among these candidates then takes place at the second stage, at the point at which phonetic information is retrieved. (Cf. Garrett, 1980: 211). Blends arise when the source of the phonetic information shifts from one candidate item to another. A familiar problem, mentioned earlier, is the way affixes such as plural or present-tense -s, or past-tense -ed accommodate themselves phonetically to their new stems in errors like even the best team losts (Garrett, 1980: 187) or the coach likes to have his rest teamed for ... likes to have his team rested (Fromkin, 1973: Appendix: S). As noted earlier, this accommodation cannot be attributed to processes that govern allophonic variation. Nor does it have anything to do with phonotactic constraints. In Ralph and my's (for Ralph's and my), my's is pronounced /maiz/ despite the fact that /mais/ is phonotactically permissible. Thus we need to invoke a mechanism different from either of these in order to account for the phenomenon of accommodation. The solution I propose is as follows. We are assuming, following Garrett, that affixes such as -s and -ed are represented in the planning frame that defines details of the utterance's syntactic structure.
It makes sense to suppose that this representation takes the form of syntactic features of some sort, and that it does not include phonetic information. This means that information about the pronunciation of -s, -ed, etc. must be provided during the generation of the positional-level representation (i.e. the articulatory program). Since this information is not predictable on the basis of syntactic representations (any more than it is for lexical items like cat and window), it must be retrieved from the lexicon. That the production mechanism should have recourse to the lexicon at this stage is something we have already found to be necessary for other reasons. In other words, we treat -s, -ed, etc. at this stage as independent lexical items. Like certain other lexical items, their pronunciation depends on the elements with which they are in construction: just as be is /bi:/, /æm/, /iz/, /a:/ etc. depending on what it is in construction with, so plural -s is /s/, /z/, /iz/, /ən/, etc. depending on whether it follows rat, dog, house, ox, etc. Thus
we can employ the same mechanisms to account for the accommodation of -s, -ed and so on as we use for any other lexical item. Once the shifting and stranding errors have taken place (e.g. once day and week have exchanged in the error weeks in the day for days in the week (Fromkin, 1973: Appendix: S)), the pronunciation of the affixes is handled in just the same way as the pronunciation of any other items. 50
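The idea that an affix is spelled out like any other lexical item, with a form that depends on what it is in construction with, can be sketched as a lookup performed only after the exchange has taken place. The Python below is illustrative rather than a claim about the actual mechanism: the rough ASCII transcriptions, the `IRREGULAR` table and the two segment classes are assumptions of the sketch, and coding the regular pattern as a fallback rule is a convenience, not part of the proposal in the text.

```python
from typing import Dict, List

SIBILANTS = {"s", "z", "sh", "zh", "ch", "j"}
VOICELESS = {"p", "t", "k", "f", "th"}

# Stems whose plural form is listed individually (hypothetical entry for ox).
IRREGULAR: Dict[str, str] = {"oks": "en"}


def plural_form(stem: List[str]) -> str:
    """Pick the form of plural -s from the stem it is in construction with."""
    key = "".join(stem)
    if key in IRREGULAR:
        return IRREGULAR[key]
    final = stem[-1]
    if final in SIBILANTS:
        return "iz"
    if final in VOICELESS:
        return "s"
    return "z"


def spell_out(stems: List[List[str]], plural_slot: int) -> List[str]:
    """Attach the plural to whichever stem now occupies the marked slot."""
    return [
        "".join(stem) + (plural_form(stem) if i == plural_slot else "")
        for i, stem in enumerate(stems)
    ]


day, week = ["d", "ei"], ["w", "i", "k"]
print(spell_out([day, week], plural_slot=0))  # intended order: days ... week
print(spell_out([week, day], plural_slot=0))  # after the exchange: weeks ... day
```

Because the plural marker stays in its slot while the stems exchange, the form chosen follows the stem that actually ends up there, which is the accommodation pattern described above.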
5.4 A revised view of the text
In conclusion, I should like to describe a little more explicitly than I have done so far the status and role of the text in speech production. The arguments of section 2 show that the text is analogous to a classical phonemic representation and that it is structured in terms of features, phones, syllable-constituents and syllables. As we have seen, part of the process by which a text is generated is the retrieval of phonemic representations from the lexicon. I shall refer to this retrieval of phonemic representations as 'spell-out'. Implicit throughout the discussion so far has been the idea that a text is generated for an entire chunk of utterance (Boomer and Laver (1968[1973]) suggest that such chunks correspond to tone groups) and that subsequently this text is employed by the articulatory programmer to generate search-instructions for accessing the library, retrieve articulatory routines and incorporate them into articulatory programs. A feature of this formulation is that for a given chunk, the whole of the spell-out process is completed before the programmer comes into operation. An alternative formulation is possible. Instead of assuming that spell-out is over and done with before the addressing stage of the retrieval of routines is initiated, we might view spell-out as being part of the addressing stage. As we have seen, the function of the addressing stage is to draw up sets of search-instructions for accessing the library. Since there is a set of search-instructions for each syllable it is not necessary for the addressing mechanism to have access to the phonemic representation (address) of more than one word at a time. So there is in principle no need for the various items of phonological information that result from spell-out all to be gathered together to constitute the text of a complete tone-group. Instead of doing this, the addressing mechanism might use the lexicon rather like an address book.
Using the grammatical information contained in the functional-level representation, the production mechanism singles out a particular lexical item. It then goes to the lexical entry for that item, where it finds the address of an articulatory routine (or, in the case of polysyllables, the addresses of several routines). This is
then used to set up search instructions, and the routine is retrieved and incorporated in the usual way. 51 In order to account for the segment-based errors that involve interference between lexical items, we must assume that spell-out/addressing takes place in parallel for several items. There seems to be no reason why this should not happen. How can we decide between these alternatives? According to the first hypothesis, spell-out is completed and the text is generated before addressing takes place. Since the segment-based errors like spoonerisms are taken to arise at the addressing stage, and since the input to this stage is the text, in which elements are serially ordered, this first hypothesis would be consistent with these errors showing significant effects due to serial ordering. In particular, one would not be surprised to find significant differences between anticipatory errors and perseveratory errors. Such a finding would not be consistent with the second hypothesis, since no serial ordering has yet been imposed on (word-sized) elements at the time addressing takes place (the task of getting the serial ordering right falls to the incorporator on this hypothesis: cf. Note 51). I am not aware of any data that bear on this issue. The question deserves further investigation. A second way of deciding between the two hypotheses can be derived from the predictions they make about the possibilities of interaction between different types of error. Under the first hypothesis, spell-out precedes addressing, and the two processes are so to speak separated by the text. One would therefore expect no interaction between the sorts of errors that arise during the generation of the text and those that occur afterwards. Under the second hypothesis, spell-out and addressing are part of the same process, so errors that arise at these two stages should be free to interact.
To be more precise, consider the possibility of interaction between blends and segment-based errors such as anticipatory substitutions. Under the first hypothesis, the only relation that is possible between these types of error is for the elements that are involved in the substitution (whether as source or target) to be part of a blend. That is, they must be included in the RESULT of a blend but they may not form part of the items that were the SOURCE of the blend (unless, of course, they form part of the blend as well). Under the second hypothesis, the relations that may exist are much less constrained. In particular, an element that is involved in the substitution may be part of one of the sources of the blend, even though it is not part of the result of the blend. Which hypothesis makes the right predictions in this case? Apparently, the second one does. Boomer and Laver (1968[1973: 129]) report the error separite – separating out of the nucleuses. They remark that 'according to the report of the speaker (one of the authors) he had fleetingly considered, and rejected, the
possibility of using the word nuclei, /njukliai/, instead of nucleuses. Here, interestingly, the rejected word distorts not its successful competitor, but another word in the tone-group'. The existence of such errors suggests that spell-out and addressing are effectively the same process. Our conception of the text and its role in speech production should be revised correspondingly. Instead of being analogous to a phonemic representation for an entire tone-group, it now corresponds to the phonemic representation for individual lexical items. 52

Department of Linguistics
University of Nottingham
Nottingham, NG7 2RD
England
Notes

* I should like to thank Anne Cutler for her helpful editorial suggestions.

1. This is of course only part of the problem, since the pronunciation of words must be accommodated to the context in which they appear. I shall not be concerned with this further aspect of the problem in this article.

2. The term 'analogous to' is important here. Levels of representation such as syntactic surface structure and phonetic representations form part of a theory of linguistic competence. Speech production is on the other hand part of linguistic performance. Although common sense tells us there must be some relation between competence and performance, the nature of this relation is still largely unknown. It seems to me that we cannot simply assume that levels of representation in competence must have their counterparts in performance, or vice versa. Nor can we assume any close relation between the rules employed in grammars and the rules employed in theories of speech production (or perception). We should instead construct theories of competence as theories of competence and theories of performance as theories of performance. Once each has been worked out in its own terms, then we will be in a position to look for correspondences between them.

3. This formulation assumes that individual lexical items are selected before their associated articulatory routines are retrieved. A variant of the model under consideration would assume that instead of sub-strings in the text corresponding to individual items, they would correspond to small classes of syntactically and semantically related items. Final selection of one item from among these would take place at the time of retrieval of articulatory routines from the library. I shall return to this issue in the context of another model later on (5.4). For the moment, it does not matter which of the two variants we adopt, as the arguments that follow apply equally well to either of them.

4. It is possible that not all aspects of the prosodic patterning of utterances are computed ad hoc. There may be articulatory routines for particular pitch patterns, loudness patterns, etc. I shall not go into this question here.

5. In order to prevent any misunderstanding here, it is worth pointing out that the crucial feature of this model is the hypothesis that the addresses of the articulatory routines in the library are defined by sets of syntactic/semantic feature-specifications. This is equivalent to saying that the library is structured on a syntactic/semantic basis. The fact that, say, freeze can be produced in error for phrase calls for a library that is structured on a phonological basis. My argument here is therefore analogous to that of Fay and Cutler, 1977 regarding the structure of the mental lexicon.

6. This may seem a strange thing to say in view of the vast amount of work that has been carried out in descriptive phonetics over the last sixty years or so. But the fact is that almost all of this work has been either impressionistic (hence non-quantitative) or purely instrumental (hence often untheoretical). Fundamental questions have hardly been posed, much less answered: how many dimensions does the phonetic quality space consist of? how is distance defined in this space? how is pitch represented? how is time represented? and so on. See Crompton, 1981 for a fuller discussion of these questions and some possible answers.

7. I am not suggesting that the construction of an articulatory program involves nothing more than a stringing together of articulatory routines. The programmer must doubtless also make many ad hoc computations, including modifications to the routines designed to accommodate them to neighbouring routines and to the prosodic pattern of the utterance. Ordering of routines in the program may also not be strictly sequential if, for instance, aspects of prosodic patterning are handled by means of routines.
8.
This statement takes for granted the correctness of one solution to issue (b). The justification for this will be given later. In view of our earlier claim that phonetic representations are not syllabified, we should say that syllabification is an aspect of pronunciation only in a rather abstract sense. There are, however, languages where the syllabicness of segments is distinctive, for example French, which opposes 'oui' /wi/ = /üi/ to 'houille' /uj/ = /ui/. They cannot be analysed as syllable-based errors because lexical representations are not syllabified. They cannot be analysed as word-based errors because the resultant forms are often not words (e.g. spa in I and tudio). Actually, we should also have an archiphoneme /S/ in place of /s/, but that is unimportant here. It may be necessary to specify /r/ as being [-lateral] as well. This is only partly true in the case of stress, since some aspects of stress placement are not predictable: compare Berlin /bɜ:'lin/ with Merlin /'mɜ:lin/, for instance. This is not part of Chomsky and Halle's approach in SPE. Since 1968 a considerable amount of work has demonstrated the superiority of syllable-based phonology. Cf. Bell and Hooper, 1978. There is some evidence that the assumption that systematic phonemic representations (SPR) are marked for stress is a necessary one, rather than just a matter of argumentational convenience. This derives from Cutler and Isard's (1980) claim that stress must be marked in the mental lexicon. Clearly, if stress is marked there, then it must also be marked at all levels of representation that lie downstream of it in speech production, and that includes, according to Hypothesis 4, SPR. Cutler and Isard's claim is based on an analysis of lexical stress errors such as présenting for presénting, economists for economists, etc.
They note that a general characteristic of this class of errors is that 'the erroneously produced stress pattern is always that of another word (thus no such errors as *administrative are observed); and this word is always morphologically related to the intended word' (p. 248). They argue that these errors can only be accounted for on the basis of confusions that arise between two differently stressed forms in the lexicon (such as présent, presént), which obviously presupposes that (mental) lexical representations are marked for stress. In particular, they claim that
9. 10. 11.
12. 13. 14. 15.
16.
the errors cannot be accounted for on the basis of misapplications of the stress rules, since this approach would fail to explain why lexical stress errors only occur in morphologically complex or derived words (the stress rules themselves apply to derived and non-derived words alike) and also why the errors always result in a stress pattern found in some other related word. However, this last argument seems to me to be incorrect. The fact that stress errors occur only in complex or derived words can be explained on the basis of misapplications of stress rules when one recalls the effects of suffixes on stress patterns in English. Some suffixes have no effect on stress placement (e.g. -ing, -ize; Chomsky and Halle (1968) separate these from their stems by a # boundary), whereas others do affect it (e.g. -ity, -ial; Chomsky and Halle have these preceded by a + boundary). One might therefore envisage erroneous stress placements occurring as a result of a failure on the part of the production mechanism to correctly identify a suffix as being stress-neutral or stress-affecting. Thus a failure to disregard -ists for purposes of stress assignment could result in the error economists. Other rules of stress assignment are sensitive to grammatical category: cf. subject (noun) versus subject (verb). A malfunction in this area could result in the error (any) conflicts (verb-stress instead of noun-stress). In présenting, the -ing has been (correctly) ignored, but the stem present has been stressed by the noun-rule, and not the verb-rule. While I have no desire to argue that this MUST be the mechanism by which stress errors arise, it is, contrary to what Cutler and Isard claim, a possible mechanism. Thus stress need not be marked in the mental lexicon after all.
Other apparent exceptions, such as the alternations exemplified by (the State of) Tennessee versus Tennessee (Williams), do not, in my view, involve word stress at all, but rather sentence stress. Thus Tennessee, for example, has two word stresses (Tennessee), and these mark the syllables in question as having the potential for receiving accent (or sentence stress, not necessarily primary sentence stress). (Cf. Weinreich, 1954 and Huss, 1978 for arguments in favour of this view of word stress.) Whether this potential is realised or not depends on a number of factors, including rhythmical ones: the well-known tendency for English utterances to display alternating accent patterns often leads to the first of two adjacent word-stressed syllables not being accented in environments where one would otherwise expect it to be. Thus the word stress on the last syllable of Tennessee does not realise its potential to receive an accent in cases where it is followed closely by another accent. The de-accentuation of dog in big black dog occurs for exactly the same reason.
17.
Actually, Wells' characterisation of rhythmic similarity is probably a bit too restrictive. According to his definition neither insufficient and inferior nor instantaneous and momentary are rhythmically similar, yet the blends momentaneous and insufferior are clearly influenced by the rhythmical patterns of their origins.
18.
I discuss the question of how blends arise in 5.3. See also 4.2 for an analysis of some of the properties of this type of error. It must be admitted that the arguments just presented are rather intuitive, and thus not wholly convincing. One might also object to them on the grounds that they do not bear exactly on the question at issue. Hypothesis 4 makes claims about the relation between SPR and the input to the programmer, whereas the above arguments concern the relation between SPR and the information retrieved from the lexicon. The information retrieved from the lexicon is not necessarily the same as that which serves as input to the programmer. I shall refer to the retrieval of phonological information from the lexicon as 'spell-out'.
19.
Although this point is correct, it does not entirely invalidate our arguments. To see this, consider the relations between the levels and processes in question as represented below.
(i)  output of spell-out ~ SPR; further processing: null; text = SPR
(ii) output of spell-out ~ lexical representation; further processing ~ redundancy rules; text = SPR
There are two views of the relation between the output of spell-out and the input to the programmer that are consistent with Hypothesis 4. The first, indicated as (i) above, is the one we have been implicitly assuming so far, namely that the two are the same and are analogous to SPR. On this view, the redundancy rules apply as part of the spell-out process. The alternative view, shown as (ii) above, is that the result of the spell-out process is analogous to lexical representation and that this undergoes processing that transforms it into something like SPR. Given that lexical representations are even more abstract than SPR, it is, I think, all but self-evident that the arguments presented above within the framework of assumption (i) apply with equal and indeed greater force within the framework of assumption (ii). If blends cannot be explained on the basis of SPR, they certainly cannot be explained on the basis of lexical representation. So Hypothesis 4 fails whichever alternative is chosen.
20.
The evidence for underlying /k/ in cepty is found in words like accept, which is underlyingly /ædkɛpt/. The /d/ assimilates to the /k/, giving /ækkɛpt/, and the /k/ then becomes /s/.
21.
There are, of course, other kinds of errors that one might assume occur later in the production process, namely those that result from 'getting one's tongue in a twist' (cf. Butterworth and Whittaker, 1980). We shall not be considering this type of error, however.
22.
Other possibilities exist within the framework of Hypothesis 4, namely that the substitutions, exchanges, etc. operate on clusters of phonemes or on syllables consisting of phonemes. For example, face spood for space food (Fromkin, 1973: Appendix: F) might be seen as an exchange of /sp/ and /f/ or even as a substitution of /feis/ for /speis/ and /spu:d/ for /fu:d/. The same arguments apply to these alternatives as to the variant discussed in the text.
23.
These conditions may not be correct in detail. Instead of the onset being specified as /sp/, it may be sufficient to specify only /sP/, where /P/ stands for a bilabial plosive unspecified for voicing (/sb/ does not occur). (The existence of errors like peach seduction for speech production is evidence against this, however.) Similarly, instead of /ŋk/ we should perhaps have /Nk/, where /N/ symbolises a nasal consonant unspecified for place of articulation.
24.
In addition, it seems likely that onsets, nuclei and codas should be described according to a 'slot-filler' scheme that indicates what material occupies each possible structural position. For example, onsets may contain three slots or structural positions: all of them are filled in sprang, while in spank the third slot is vacant, and in prank the first slot is vacant. Evidence in favour of this approach is provided by Shattuck-Hufnagel (1979) in her discussion of the role of 'null-segments' in speech errors.
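The slot-filler idea for onsets can be illustrated with a minimal Python sketch. It rests on assumptions that go beyond the note: the three slots are taken to hold /s/, a plosive and an approximant respectively, ordinary letters stand in for phonemes, and the function name `onset_slots` and the two inventories are hypothetical. Onsets outside this toy scheme (e.g. /fl/) are simply rejected.

```python
# Toy inventories; letters stand in for phonemes (an assumption of this sketch).
PLOSIVES = set("ptkbdg")
APPROXIMANTS = set("lrwj")


def onset_slots(onset):
    """Assign an onset to three slots: /s/, plosive, approximant.

    A vacant slot is represented by None.
    """
    slots = [None, None, None]
    rest = onset
    if rest.startswith("s"):
        slots[0], rest = "s", rest[1:]
    if rest and rest[0] in PLOSIVES:
        slots[1], rest = rest[0], rest[1:]
    if rest and rest[0] in APPROXIMANTS:
        slots[2], rest = rest[0], rest[1:]
    if rest:
        # Material is left over: the onset does not fit the toy scheme.
        raise ValueError(f"onset {onset!r} does not fit this three-slot scheme")
    return tuple(slots)


# The three examples from the note: sprang, spank, prank.
for word, onset in [("sprang", "spr"), ("spank", "sp"), ("prank", "pr")]:
    print(word, onset_slots(onset))
```

Representing a vacant slot explicitly, rather than just listing the segments present, is what lets sp- and pr- be compared position by position with spr-.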
25.
Among the other types of programming unit we might hypothesise, only one seems to me to have received any motivation, namely that which corresponds to the tone group or phonological phrase. Cf. Boomer and Laver, 1968 and references cited there.
26.
For Fromkin, apparently, the 'psychological reality' of these units extends to the domain of linguistic competence as well as performance. She considers that behavioural data such as those provided by speech errors are sufficient to verify hypotheses about linguistic competence (1971 [1973: 218]). I cannot see the logic of this position.
27.
Other reasonable accounts of the origin of these errors could probably be devised. It is conceivable that they arise at the activation stage of retrieval, for instance. However, it seems to me that all such accounts must share the feature that the locus of the errors precedes the point at which the routines become available, and that is all that is necessary for the point I am making here. The idea that the errors arise at the addressing stage is taken up and developed further in section 5.
28.
In this respect a theory of speech errors is just like a theory of grammar, which must be capable of generating all and only the acceptable sentences of the language. Simply generating the acceptable sentences is a trivial task (a device that generates strings of arbitrary length consisting of arbitrarily chosen words accomplishes this). But simultaneously ruling out the unacceptable sentences is far from trivial. The same goes for speech errors.
29.
Recall that there may well be several hierarchically related categories of articulatory programming unit. To say that a unit is the smallest that is available for independent manipulation is, therefore, not to say that it is the only such unit. Cf. Note 25.
30.
Note that the form of a unit A that consists of sub-units B, C, etc. is governed both by the constraints imposed at the level of A itself and by the constraints imposed at the level of B, C, etc. Thus to say that the constraints that govern the form of A may not be violated entails that the constraints on B, C, etc. also may not be violated. For example, the form of syllables is governed by constraints imposed at the level of syllables (e.g. a syllable consists of an onset followed by a rhyme) and by the constraints imposed at, say, the level of phones (e.g. a phone (in English) may not be both velar and fricative). To generate acceptable syllables (e.g. in a model of competence), we must observe both the constraint on onsets and rhymes and the constraint ruling out velar fricatives (among others).
31.
Exceptions to this arise when the approximant is /j/ as in new and mule. Since the vowel in these cases is invariably /u:/, it is probably better to assign the /j/ to the syllabic nucleus rather than the onset. That is, new is /n + ju:/ rather than /nj + u:/.
32.
Notice that this is indeed a constraint on syllable onsets and codas and not on words, in English. This is shown by the existence of numerous words with hetero-syllabic obstruent clusters with mixed voicing. Many personal or place-names ending in -by (e.g. Whitby) or -ford (e.g. Redford) provide examples.
33.
Cf. MacKay, 1978 for some interesting experimental data relating to this point.
34.
The situation is somewhat complicated by cases where final /t, s, d, z/ are past tense, plural or possessive markers, etc. These do not seem to take part in the usual syllable-structure constraints. There is also some evidence that they behave differently from other consonants with respect to speech errors. I discuss this briefly in section 5.
35.
We do find the error /psrnt/ for point (K9), but this may be a substitution of the nucleus jij for the nucleus /ɔɪ/. (Notice that /parnt/ is pronounced [pant], i.e. [po*nt] in most American accents, and not [parnt] (with a rolled [r]) or [parnt] (with a flapped [ f]).) There would not seem to be any general constraint ruling out / j / = /a1/ before /nt/, since burnt exists.
36.
Many languages, and also accents of English other than RP, do have much stronger
phonotactic constraints between onset and nucleus. There is frequently a block on initial /wu/ or /ji/ sequences (although these may involve complex nuclei rather than onset + nucleus), and languages with palatal and velar stops sometimes permit only palatals before /i/. It would be interesting to have slip-of-the-tongue data from languages like this to see whether the constraints hold.
37.
This is not the same as saying that there are no phonological constraints at all holding above the level of the word. That would clearly be untrue. Consider, for example, the constraint in English (RP) that rules out the sequences /əV/, /ɑ:V/ and, for some speakers, /ɔ:V/. This operates within words (cf. bar /'bɑ:/ but barring /'bɑ:riŋ/ not */'bɑ:iŋ/, also baa-ing (of sheep) which is again /'bɑ:riŋ/ not */'bɑ:iŋ/), and it also operates between words: 'Kafka is' is /'kæfkəriz/ not */'kæfkə iz/, 'the Shah is' is /ðə'ʃɑ:riz/ not */ðə'ʃɑ:iz/. However, this constraint does not govern what words go with what words, but what phonological forms of a word occur in particular phonological contexts.
38.
Cf. section 5 for further arguments in favour of this view.
39.
Adoption of this hypothesis would entail that the input to the articulatory programmer could no longer correspond exactly to classical phonemic representation, as argued earlier. Instead, it would correspond to CPR without vowel reduction. Offhand, I can't see any reason why this shouldn't work.
40.
There are a number of problems in deciding what the non-reduced form is. In the second syllable, the surface vowel /i/ alternates with /ai/, as in reside, although [ai] is not the obvious candidate for the non-reduced form of [i] from a purely phonetic point of view. In the last syllable /ʃl/ alternates with /ʃjæl/ (cf. pairs like partial, partiality) but also with /ʃi.æl/ or /si.æl/ in very formal speech (cf. partiality as /.pɑ:.ʃi.'æl.i.ti./). This is particularly problematic, since different numbers of syllables are involved. None of this reflects well on the hypothesis we are considering.
41.
One obvious place to look would be Meringer's collection of speech errors in German (Meringer, 1908; Meringer and Mayer, 1895).
42.
In passing, it is worth noting that the relevant phrasal units are phonological units, not syntactic ones. That is, we are talking of tone-groups (phonological phrases) and not of noun-phrases, verb-phrases, clauses, etc. All too often, it is assumed that there is a simple correlation between the phonological and the syntactic units, so that prosodic phenomena such as pause and 'final lengthening' must relate directly to syntactic units. In fact, analysis of any corpus of naturally occurring speech (not even necessarily informal speech) will demonstrate the existence of far-reaching discrepancies between the two types of unit. (Cf. Crompton, 1978.) Note that Boomer and Laver (1968 [1973]), in their discussion of the domains within which interference occurs in speech errors, consistently refer to tone-groups, and not to any syntactic unit.
43.
The reason why these errors must involve segments roughly akin to phonemes is obvious enough. If they involved less abstract segments, viz. positionally specified allophones, it would be impossible to explain why the errors do not involve violation of allophonic constraints. Thus in lumber sparty for lumber party (Fromkin, 1973: Appendix: G) we find not the aspirated [pʰ] of party but the unaspirated [p] appropriate to the environment of a preceding [s].
44.
The term 'move' here is not to be taken in a literal sense, either within the context of Shattuck-Hufnagel's model or within the context of mine. In both cases it is a copying of information that takes place.
45.
Although such errors are much less likely to get through, some of them do, and result in the very rare slips of the tongue that violate phonotactic constraints. (One such error, reported to me by Anne Cutler, is [qna:do sr.nta] for Gardner Centre.) How can such sequences possibly be uttered if they do not correspond to routines in the library? Presumably the articulatory programs underlying them are computed in their entirety by the programmer. Recall our earlier discussion in section 2.2, where it was suggested that such a capability might be attributed to the programmer even though it is hardly ever utilised in normal speech and is in any case somewhat impaired in adult speakers.
46.
The nature of the slots remains to be determined. Shattuck-Hufnagel seems to vacillate between the view that there is a slot for each phoneme and the view that there is a slot for each syllable-constituent. A third possibility is that there is a slot for each syllable. I return to this question later.
47.
I shall suggest later that the stressed-unstressed distinction (or, to be more precise, the reduced-non-reduced distinction) between syllables plays an important role in the operation of the system.
48.
One large gap in this account (among several, no doubt) is the absence of any specification of prosodic patterns. Butterworth (1980b) argues the need for a separate sub-system to handle this. Cf. Cutler and Isard, 1980 for some discussion.
49.
Of the information necessary for the correct serial ordering of the routines retrieved from the library, part (that which concerns the ordering of syllables within words) is obtained from the text, and part (that which concerns the ordering of words within phrases) is obtained from the planning frame.
50.
A small point, which does not substantially affect the argument, concerns the distinction between the phonologically conditioned variation in the pronunciation of -s after rat, dog, house and the morphologically or grammatically conditioned variation in the pronunciation of be as /bi:/, /æm/, etc. Slightly different submechanisms will be needed to account for these two types. Notice, however, that the distinction between phonological and grammatical conditioning does NOT dovetail with the distinction between inflectional material and full lexical material. The pronunciation of the plural marker is grammatically conditioned as well as phonologically conditioned (oxen, not oxes) and phonological as well as grammatical conditioning is found with full lexical material (e.g. alternations like Rad /ra:t/ ~ Rade /ra:də/ in German).
51.
Note that some of the phonological information contained in the lexical entry would have to be made known to the incorporator: recall our earlier discussion, where it was hypothesised that the incorporator makes available the right number and right kind (stressed or unstressed) of syllable-sized slots for the routines to be inserted into. Moreover, since there is no longer a text for a complete tone-group, the serial ordering of words would also have to be dealt with by the incorporator. One effect of this revision is to suggest an alternative account of the accommodation phenomenon discussed earlier. If spell-out/addressing is carried out for full lexical items on the basis of functional-level representations, then, ipso facto, no account is taken of features of the planning frame. Consequently, inflectional affixes, etc., which are features of the planning frame, must be handled separately. Since, in the majority of cases, they do not correspond to syllables on their own, they cannot have their own routines in the library. Therefore they must be handled computationally.
52.
Interestingly, this view of the production of inflectional affixes (which may be valid for some derivational affixes as well, for all I know) correlates very well with their linguistic behaviour, in that it makes good sense NOT to make -s, -ed and -th subject to the phonotactic constraints of English. That is to say, in handling words like sixths, the final consonant cluster /ksθs/ is subject to the phonotactic constraints only as far as the first /s/.
References
Bell, A. and Hooper, J. B. (eds) (1978). Syllables and Segments. Amsterdam, New York, Oxford: North-Holland.
Blumstein, S. E. (1978). Segment structure and the syllable in aphasia. In A. Bell and J. B. Hooper (eds), Syllables and Segments, 189-200. Amsterdam, New York, Oxford: North-Holland.
Boomer, D. S. and Laver, J. D. M. (1968). Slips of the tongue. British Journal of Disorders of Communication 3, 1-12. (Reprinted in Fromkin, 1973.)
Brown, G. (1970). Syllables and redundancy rules in generative phonology. Journal of Linguistics 6, 1-17.
Butterworth, B. (ed.) (1980a). Language Production: Volume 1: Speech and Talk. London: Academic Press.
Butterworth, B. (1980b). Some constraints on models of language production. In B. Butterworth (ed.), Language Production: Volume 1: Speech and Talk, 423-459. London: Academic Press.
Butterworth, B. and Whittaker, S. (1980). Peggy Babcock's relatives. In G. E. Stelmach and J. Requin (eds), Tutorials in Motor Behaviour, 647-656. Amsterdam, New York, Oxford: North-Holland.
Chomsky, N. (1964). Current Issues in Linguistic Theory. The Hague: Mouton.
Chomsky, N. and Halle, M. (1968). The Sound Pattern of English. New York: Harper and Row.
Crompton, A. S. (1981). Phonetic representation. Paper given at the spring meeting of the LAGB.
Cutler, A. and Isard, S. D. (1980). The production of prosody. In B. Butterworth (ed.), Language Production: Volume 1: Speech and Talk, 245-269. London: Academic Press.
Fay, D. and Cutler, A. (1977). Malapropisms and the structure of the mental lexicon. Linguistic Inquiry 8, 505-520.
Fromkin, V. A. (1968). Speculations on performance models. Journal of Linguistics 4, 47-68.
Fromkin, V. A. (1971). The non-anomalous nature of anomalous utterances. Language 47, 27-52. (Reprinted in Fromkin, 1973.)
Fromkin, V. A. (ed.) (1973). Speech Errors as Linguistic Evidence. The Hague: Mouton.
Fry, D. B. (1969). The linguistic evidence of speech errors. Brno Studies in English 8. (Reprinted in Fromkin, 1973.)
Fudge, E. C. (1969). Syllables. Journal of Linguistics 5, 253-286.
Garrett, M. F. (1976). Syntactic processes in sentence production. In E. Walker and R. Wales (eds), New Approaches to Language Mechanisms, 231-255. Amsterdam: North-Holland.
Garrett, M. F. (1980). Levels of processing in sentence production. In B. Butterworth (ed.), Language Production: Volume 1: Speech and Talk, 177-220. London: Academic Press.
Hooper, J. B. (1975). The archisegment in natural generative phonology. Language 51, 536-560.
Huss, V. (1978). English word stress in the post-nuclear position. Phonetica 35, 86-105.
Ladefoged, P. (1966). The nature of general phonetic theories. In R. J. O'Brien (ed.), Selected Papers on Linguistics 1961-1965. Georgetown University Round Table, Monograph No. 18.
MacKay, D. G. (1969). Forward and backward masking in motor systems. Kybernetik 6, 57-64.
MacKay, D. G. (1970). Spoonerisms. Neuropsychologia 8, 323-350. (Reprinted in Fromkin, 1973.)
MacKay, D. G. (1978). Speech errors inside the syllable. In A. Bell and J. B. Hooper (eds), Syllables and Segments, 201-212. Amsterdam, New York, Oxford: North-Holland.
Menn, L. (1978). Phonological units in beginning speech. In A. Bell and J. B. Hooper (eds), Syllables and Segments, 157-171. Amsterdam, New York, Oxford: North-Holland.
Meringer, R. (1908). Aus dem Leben der Sprache. Berlin: B. Behr.
Meringer, R. and Mayer, K. (1895). Versprechen und Verlesen. Stuttgart: G. J. Goschen.
Nooteboom, S. (1969). The tongue slips into patterns. Nomen: Leyden Studies in Linguistics and Phonetics. The Hague: Mouton. (Reprinted in Fromkin, 1973.)
Ohala, J. (1978). The production of tone. Report of the Phonology Laboratory (Berkeley) 2, 63-117. (Also in V. Fromkin (ed.) (1978), Tone: a Linguistic Survey. London, New York: Academic Press.)
Shattuck-Hufnagel, S. (1979). Speech errors as evidence for a serial-ordering mechanism in sentence production. In W. E. Cooper and E. C. T. Walker (eds), Sentence Processing: Psycholinguistic Studies Presented to Merrill Garrett, 295-342. Hillsdale, N.J.: Lawrence Erlbaum.
Shattuck-Hufnagel, S. and Klatt, D. (1979). The limited use of distinctive features and markedness in speech production: evidence from speech error data. Journal of Verbal Learning and Verbal Behavior 18, 41-55.
Stanley, R. (1967). Redundancy rules in phonology. Language 47, 429-447.
Weinreich, U. (1954). Stress and word structure in Yiddish. In U. Weinreich (ed.), The Field of Yiddish, 1-27. New York: Linguistic Circle of New York.
Wells, R. (1951). Predicting slips of the tongue. Yale Scientific Magazine 26, 9-30. (Reprinted in Fromkin, 1973.)
Substitutions and splices: a study of sentence blends* DAVID FAY
Abstract In the first section, sentence blends found in a large corpus of errors in spontaneous speech are described and categorized. The major classes of blends are substitutions, splices, and complex blends. Substitution blends seem mainly sensitive to the grammatical structure of the blending sentences, while splice blends are conditioned by sentence rhythm. There are two subtypes of splices in which the splice occurs before or after a stress site. These subtypes differ in the material from the two target sentences that is included in the blend. Complex blends consist of combinations of simple substitutions and splices. Substitutions occur at an earlier stage in speech production than splices. A second section applies the taxonomy of sentence blends to purported transformational errors. While not all such errors can be described as blends, certain classes can. Errors with duplicated particles, in particular, are better accounted for as sentence blends than as transformational errors. This is shown by comparing the distribution of noun and pronoun objects in sentences with particle errors with that in error-free spontaneous speech. The difference in distributions is predicted by a splice theory of duplicated particles but not by a transformational theory.
1. Introduction
Sentence blends have long been recognized as a distinct phenomenon in speech production. From Meringer and Mayer's (1895) early description of speech errors to Fromkin's (1971) more recent survey, blends have played a prominent role in discussions of linguistic lapses. Remarkably little is known about them, however, aside from the obvious fact that two sentences that blend together into one erroneous utterance usually have the same meaning.
This is a serious gap in the study of speech production, as sentence blends raise basic questions about the processing mechanisms and capacity of the speech system. Do sentence blends show that two or more distinct plans for speech have been created at the same time? And if so, what kind of memory can simultaneously contain two plans so that they may combine? The existence of blends raises methodological issues as well. Until we understand sentence blends better, we run the risk of contaminating our samples of other types of errors with blends. It would be hard, for example, to distinguish blends from other kinds of errors if blends are fusions of any number of sentences in any possible combination. Of course, this is unlikely. But what the actual constraints on blend formation are no one knows. Until we do know, any set of errors whatsoever could be interpreted as blends. One concrete example of this difficulty is found in syntactic duplication errors (Fay, 1980a). This is a class of errors in which a word or a syntactic marker appears twice in an utterance when it should appear only once. Several examples are given in (1)-(3).
(1) A boy who I know a boy has hair down to here.
(2) You always get the pimply-faced boys that haven't quite yet matured yet.
(3) Would you turn on the light on?
How does the duplication come about in these errors? One account assumes that movement transformations apply as mental operations during the construction of a sentence (Fay, 1980b). Since movement transformations consist of two suboperations, copying and deletion, it is possible for copying to take place without deletion. When an element that was to be moved is copied into a new location, but remains, as well, in its original spot, a word in the sentence is duplicated. For example, in (3) the particle on is duplicated as Particle Movement tries to move it from a position after the verb to a new position after the object noun phrase. An alternative account of duplication errors assimilates them into the class of blends, some examples of which are given in (4)-(6). Each of these blends can be readily explained as a combination of the two roughly synonymous targets displayed below the error.
(4) It's the old acorn and the egg.
    T1: It's the old acorn and the oak tree.
    T2: It's the old chicken and the egg.
(5) It's spent me a year ...
    T1: It's taken me a year ...
    T2: I spent a year ...
(6) ... if I don't get the ball on the show.
    T1: ... if I don't get the ball rolling.
    T2: ... if I don't get the show on the road.
A similar account can be given for the duplication errors given in (1)-(3). For example, (1) could be explained as a blend of the two targets given in (7):
(7) T1: A boy who I know has hair down to here.
    T2: I know a boy who has hair down to here.
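One simple way to make 'a combination of the two targets' concrete is to ask whether the error can be written as an initial stretch of one target followed by a final stretch of the other. The Python sketch below does this for example (4). It is an illustration under stated assumptions, not the classification procedure used in this paper: the function name `find_join_points` is hypothetical, words are split on whitespace, and only the T1-then-T2 order is tried.

```python
def find_join_points(target1, target2, error):
    """Return all (i, j) such that the first i words of target1 followed by
    target2 from word j onward reproduce the error exactly.

    Hypothetical helper for illustration only; words are whitespace-separated.
    """
    t1, t2, err = target1.split(), target2.split(), error.split()
    points = []
    for i in range(1, len(t1) + 1):      # target1 contributes at least one word
        for j in range(len(t2)):         # target2 contributes at least one word
            if t1[:i] + t2[j:] == err:
                points.append((i, j))
    return points


# Example (4), with final punctuation omitted.
t1 = "It's the old acorn and the oak tree"
t2 = "It's the old chicken and the egg"
error = "It's the old acorn and the egg"
print(find_join_points(t1, t2, error))
```

For (4) more than one join point is found, because the targets share the words 'and the' around the join. The surface string alone therefore does not fix where one plan gave way to the other, which is one reason a more detailed description of blends is needed.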
How might we decide between these two theories of the origin of duplicated words? One way is to analyse duplication errors in greater detail to find out whether their distributional characteristics are just those predicted by the transformational theory. For example, we would like to know whether duplications are created for all and only the movement transformations. Although this approach has been explored (Fay, 1980a), it is difficult to pursue very far because of the relative scarcity of duplication errors. An alternative approach is to examine blend errors in detail. Once a thorough description of the errors is available, it will be possible to tighten up the theory that accounts for them. We can then reconsider duplication errors to see whether the theory of blends still provides a plausible explanation for them. If it doesn't, then the transformational theory emerges the better account. If it does, we will have to seek additional evidence to distinguish the two theories. This is the approach taken in this paper. The first section provides a description and categorization of blends. The second section takes up the question of transformational accounts of errors and considers one type in detail. We show that errors with duplicated particles have all the characteristics of blends. Furthermore, blend theory explains the distribution of the duplications better than the transformational theory.
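The transformational account of (3) treats movement as copying followed by deletion, so that a duplication results when the deletion step is skipped. A minimal Python sketch of that idea follows. The function name `particle_movement`, its parameters and the word-index representation of the sentence are assumptions made for this illustration and are not part of the theory as stated in Fay (1980b).

```python
def particle_movement(words, particle_index, object_end, delete_original=True):
    """Move a particle to the position just after the object noun phrase.

    words           -- list of words, e.g. ["turn", "on", "the", "light"]
    particle_index  -- index of the particle (must precede the object NP)
    object_end      -- index of the last word of the object NP
    delete_original -- if False, simulate copying without deletion
    """
    if not 0 <= particle_index <= object_end < len(words):
        raise ValueError("particle must precede the end of the object NP")
    out = list(words)
    out.insert(object_end + 1, out[particle_index])  # copying step
    if delete_original:
        del out[particle_index]                      # deletion step
    return out


base = ["turn", "on", "the", "light"]
print(" ".join(particle_movement(base, 1, 3)))                         # both steps
print(" ".join(particle_movement(base, 1, 3, delete_original=False)))  # copy only
```

With both steps the sketch yields the well-formed 'turn the light on'. With the deletion step skipped it yields 'turn on the light on', the duplicated-particle string of example (3).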
2. An analysis of blends
A blend occurs when a speaker has in mind simultaneously two ways of expressing the same message. Instead of one or the other expression being used, they are combined in some way to give a new, synthesized utterance that does not match exactly either of the intended expressions. In this section we analyse these errors to see what distributional constraints we can discover. 1
2.1 Methodology
The data base for this study is a corpus of over 5,000 errors in spontaneous speech collected by the author and friends since 1972. Only the data collected up until the summer of 1978, when the first version of this paper was written, are analysed here. From this collection were selected all those errors noted as being possible sentence blends at the time of their occurrence. To these were added a few more that seemed to have two obvious sources but which were not recognized as being blends when they were recorded. There were 98 errors altogether. From this set, all errors were excluded that were possibly syntactic errors (Fay, 1980a), because we were interested in trying to distinguish these two classes of errors distributionally. Operationally, this amounted to removing blends in which the error departed from being well formed only in having a repeated grammatical element or word. In (8), we have given some examples of the errors removed for this reason, along with their possible sources as blends.
(8) a. Well, then, get up here and eat it then.
       T1: Well, then, get up here and eat it.
       T2: Well, get up here and eat it then.
    b. That's what my mother did that.
       T1: That's what my mother did.
       T2: My mother did that.
    c. If Justie should wakes up --.
       T1: If Justie should wake up --.
       T2: If Justie wakes up --.
    d. I didn't said nothing.
       T1: I didn't say anything.
       T2: I said nothing.
It was hoped that by eliminating putative syntactic errors from the data a set of clear cases of blends could be established. Once the properties of these clear cases are determined, we can re-examine the excluded errors to determine whether they could also be blends. Two blends were also eliminated which could be explained by other known error mechanisms. These are given in (9) and (10):

(9)  The number of noun compounds greatly exceeds the number of verbs or adjectives — so there are far many here.
     T1: ... there are far more here.
     T2: ... there are many more here.
(10) If the husband beats the wife, she's at a terribly disadvantage.
     T1: she's at a terrible disadvantage.
     T2: she's terribly disadvantaged.

The error in (9) could be explained as a blend in which many in T2 substitutes for more in T1. But it could also be explained as a simple omission of more in the target given in (11).

(11) There are far many more here.
Since word omissions are known to occur anyway, this example is susceptible to another explanation and, for that reason, was excluded. The error in (10) could also be a blend. In this case terribly from T2 substitutes for terrible in T1. But terribly might also appear in the error because a mistake was made while searching for terrible in the mental lexicon. Substitution errors involving two morphologically related words are known to occur and this may be an example. By excluding (9) and (10) as blends we hoped to purge the data of noise so that whatever generalizations can be made about blends are as clear as possible. As should be apparent from the discussion of (9), there is some flexibility in assigning target utterances to a blend. In many cases, the target utterances for the blends were supplied by the speaker, who had a clear intuition of having two distinct utterances planned out. In the remaining cases, the speaker was not sure of the source of the error, but we could reconstruct two plausible sources from the form of the error and the linguistic and situational context in which it occurred. It would have been interesting to check whether there were differences between the cases of speaker-supplied targets and reconstructed targets. This was not possible, however, since in most cases, it was not noted when the error was recorded whether the target utterances had been supplied by the speaker or inferred by the recorder. As a consequence, the data analysed in this paper do not fit precisely a behaviourally-defined, pretheoretical definition of blends, to wit, errors in which the speaker actually reports two target utterances. Nevertheless, we believe that future studies will show that our collection of blends and targets is representative of this type of error.
2.2 Data analysis
The data analysis proceeds in several steps. First, we describe four categories into which the data naturally seem to fall. Next we examine these categories in detail to determine their characteristics. Finally, we try to order the error types with respect to each other and to other events in speech production. 2
2.2.1 Category types.
The four categories of blend are as follows:
2.2.1.1 Substitution blends. These blends are defined by the intrusion of a single word, or occasionally a phrase from one target into the other. Several examples are given in (12):

(12) a. If I ever get my hold on them — get my hands on them.
        T1: ... get my [hands] on them.
        T2: ... get [hold] of them.
     b. It's spent me a year ...
        T1: It's [taken] me a year.
        T2: I [spent] a year.
     c. It so is.
        T1: It [sure] is.
        T2: [So] it is.
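The definition of a substitution blend can be illustrated with a minimal sketch. The function name `substitution_blend` and the word-index representation are assumptions made here for illustration; they are not part of the original analysis, which works from bracketed targets rather than indices.

```python
def substitution_blend(frame, slot, intruder):
    """Return the frame target with the word at index `slot` replaced by `intruder`.

    `frame` is the target that supplies the sentence frame, `slot` is the
    zero-based position of the bracketed word in that target, and `intruder`
    is the bracketed word taken from the other target.
    """
    words = frame.split()
    words[slot] = intruder
    return " ".join(words)


# (12b): T1 "It's [taken] me a year", T2 "I [spent] a year"
print(substitution_blend("It's taken me a year", 1, "spent"))

# (12c): T1 "It [sure] is", T2 "[So] it is"
print(substitution_blend("It sure is", 1, "so"))
```

The sketch only reproduces the surface form of the error; it says nothing about which word pairs are eligible to substitute, which is the question taken up in the analysis of grammatical category.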
2.2.1.2 Splice blends. This type of blend is defined by the concatenation of either part or the whole of one target with part of the other. Although some of these errors could also be described as substitution errors of a special type, reasons will be given below for including them here. Several examples of splices are given in (13).

(13) a. Who is it that?
        T1: Who is it?
        T2: Who is that?
     b. I've got one of those splicers and I want to show it to you how it works.
        T1: show it to you.
        T2: show you how it works.
     c. You're going to be another Joe Oppenheimer when you get up.
        T1: when you get older.
        T2: when you grow up.
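A splice, as defined above, can be sketched as the concatenation of an initial stretch of one target with a final stretch of the other. The function `splice_blend` and its cut-point arguments are hypothetical names introduced for this illustration; the sketch ignores sentence stress, which the paper later uses to constrain where splices can occur.

```python
def splice_blend(first, cut_first, second, cut_second):
    """Join the first `cut_first` words of `first` to the words of `second`
    from index `cut_second` onward (both targets are whitespace-tokenized)."""
    head = first.split()[:cut_first]
    tail = second.split()[cut_second:]
    return " ".join(head + tail)


# (13a): whole of T1 "Who is it" plus the last word of T2 "Who is that"
print(splice_blend("Who is it", 3, "Who is that", 2))

# (13b): whole of T1 "show it to you" plus "how it works" from T2
print(splice_blend("show it to you", 4, "show you how it works", 2))

# (13c): "when you get" from T1 plus "up" from T2
print(splice_blend("when you get older", 3, "when you grow up", 3))
```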
2.2.1.3 Indeterminate blends. This category includes all those blends that meet the definitions of BOTH of the preceding categories. This subset of errors will be excluded from further analysis as they provide no information of theoretical interest.

2.2.1.4 Complex blends. These errors seem to be cases of neither simple substitutions nor simple splices. We will define them by default in this way, until we take them up again in detail below. Examples are shown in (14):

(14) a. I think it's terrible they only let you allow to take one course.
        T1: they only let you take one course.
        T2: they only allow you to take one course.
     b. ... if I don't get the ball on the show.
        T1: the ball rolling.
        T2: the show on the road.
     c. One thing that fascinated by me ...
        T1: One thing that fascinated me ...
        T2: One thing I was fascinated by ...
The distribution of the errors into the four categories is shown in Table 1.

Table 1. Distribution of error types

Category        N
Substitutions   14
Splices         19
Indeterminate   54
Complex          5
Total           92
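The four-way categorization can be made concrete with a brute-force sketch. It assumes a purely word-level reading of the definitions (one word replaced by one word; a prefix of one target joined to a proper suffix of the other) and ignores the grammatical and stress criteria the paper goes on to develop. The function names are hypothetical, and the last example is invented for illustration rather than taken from the corpus.

```python
def words(s):
    return s.lower().split()


def substitutions(frame, donor):
    """All word sequences made by replacing one word of `frame` with one word of `donor`."""
    out = set()
    for i in range(len(frame)):
        for w in donor:
            out.add(tuple(frame[:i] + [w] + frame[i + 1:]))
    return out


def splices(first, second):
    """All word sequences made of a prefix (part or whole) of `first`
    followed by a proper, non-empty suffix of `second`."""
    out = set()
    for i in range(1, len(first) + 1):
        for j in range(1, len(second)):
            out.add(tuple(first[:i] + second[j:]))
    return out


def classify(error, t1, t2):
    e, a, b = tuple(words(error)), words(t1), words(t2)
    is_sub = e in substitutions(a, b) or e in substitutions(b, a)
    is_splice = e in splices(a, b) or e in splices(b, a)
    if is_sub and is_splice:
        return "indeterminate"
    if is_sub:
        return "substitution"
    if is_splice:
        return "splice"
    return "complex"


print(classify("It's spent me a year", "It's taken me a year", "I spent a year"))
print(classify("Who is it that", "Who is it", "Who is that"))
print(classify("the ball on the show", "the ball rolling", "the show on the road"))
# Invented example, not from the corpus:
print(classify("turn the lamp off", "turn the light off", "switch the lamp off"))
```

Under this simplified check, (13c) comes out as indeterminate, whereas the text lists it among the splices. The sketch therefore shows only the logic of the typology, not the author's actual assignment of errors to categories.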
We turn now to a detailed treatment of each of the error types, with an eye toward discovering the identifying properties of each.
2.3 Substitution blends
In analysing the 14 substitution blends listed in the Appendix, we would like to determine as narrowly as possible the kinds of substitutions that can occur. We start by labelling the two words involved in the substitution by the grammatical class in which they fall. Since most words fall in more than one category, we used the category that was (uniquely) compatible with the syntax of the target it originated in. The results of this analysis are shown in Table 2.

Table 2. Grammatical category in substitution blends
(Rows: Target 1; columns: Target 2; categories N, V, ADJ, ADV, PRT, P. The cell layout did not survive extraction; the surviving counts are seven cells of 1 and one cell of 7, 14 in all.)
As Table 2 shows, 11 out of the 14 substitutions are between words of the same grammatical category. There is then at least a strong tendency toward a grammatical category constraint on these errors. We can support an even stronger statement of this generalization by examining the three errors that violate it. These are listed in (15):

(15) a. When you're working with severely profound children ...
        T1: with severely [A retarded] children ...
        T2: with [ADV [A profound] ly] retarded children ...
     b. Hope I can pull it through.
        T1: ... pull it [PRT [P off]].
        T2: ... go [P through] with it.
     c. He sent a flyer out around it.
        T1: He sent a flyer out [P about] it.
        T2: He sent a flyer [PRT [P around]] about it.

In (15a) we see that (part of) an adverb has substituted for an adjective. Note, however, that the adverb profoundly is composed morphologically of the adjective, profound, and the adverb-forming suffix -ly. Derivational morphology of this sort is created by word formation rules, which take a word from a lexical entry, add an affix, and then deposit the result back in the lexicon (Aronoff, 1976). As in this case, the word formation rules often change the grammatical category of the word they apply to. It is possible that complex words of this sort are marked in the lexicon with bracketing to show their derivational history. If so, this bracketing would be carried along with the word when it is inserted into a syntactic structure, as the notation in (15a) indicates. If this suggestion about word formation and lexical insertion is correct, we can immediately see that (15a) is not a violation of the grammatical category constraint at all. Rather, it is simply a case of one adjective, which happens to be embedded in an adverb, substituting for another adjective. The same account holds for the errors in (15b,c). In these cases, we find a substitution involving a particle and a preposition, words ostensibly from different classes. Yet it is plausible to assume that particles are derived from prepositions, since there are no particles which are not also prepositions. Of course, no morphological material is attached to a
preposition when it becomes a particle. However, 'zero derivations' of this kind are accepted as a regular part of derivational morphology (Aronoff, 1976). We see then that the apparent exceptions to the constraint on grammatical category actually conform to it once the double bracketing of words is taken into account. This is a strong hypothesis, but one that is easily falsified. If it is incorrect, it should not be difficult to find counterexamples in which a blend could only be a substitution error, yet the words involved in the substitution clearly and unambiguously fall into different categories. In the absence of any counterexamples, we will assume here that the grammatical category constraint on substitutions is correct. As an aside, it is interesting to consider the implication of this analysis for other aspects of speech production. That the double bracketing hypothesis allows us to establish a categorical generalization for blends argues strongly for its correctness. We would expect that adoption of the hypothesis will illuminate other areas of study as well. This expectation is borne out by the clarifying role double bracketing plays in the analysis of exchange errors (Fay, 1975). The grammatical category constraint seems to be the major factor in word substitution blends. One other factor, position in sentence, plays a role, albeit a mysterious one. To examine sentence position effects, each word in the two targets was labelled with its ordinal position. The first step in the analysis was to find the correlation between the positions of the two words involved in the substitution. They are highly similar (r = 0.895, p < 0.001). This result is not unexpected since the two target utterances are similar in meaning and structure. With the added constraint that the words involved in the substitution must have the same grammatical category, we should certainly expect them to occur in roughly similar positions in their respective targets.
That is, the semantic and grammatical properties of the target sentences should jointly constrain substitutions to occur between similar ordinal positions. In contrast to this similarity, there is an odd but systematic difference between the positions involved in the blend. In seven of the 14 blends, the source of the substitution occupies an earlier ordinal position in its target utterance than the position of the slot into which it moves in the other target. In contrast, in only one case was the source in a later position than the slot. In the remaining six cases there was no difference. The mean source position was 4.5, while the mean slot position was 5.0, a difference that is significant (t = 1.84, df = 13, p [...]

[...] allow you to take one course.
Moreover, the splices don't follow the standard stress constraint discussed above. However, by changing the assumed targets, we can account for the error as a simple splice, as (28) shows.

(28) I think it's terrible they only let you allow to take one course.
     T1: terrible they only let you take one course.
     T2: terrible you're only allowed to take one course.
There are two problems with this account. First, we are forced to assume that the error was misrecorded. What appears in the error is allow to take, while our account claims the error is allowed to take. This doesn't seem a serious problem, since the two phrases are homophonous in casual speech. There is no way the recorder could have known what was actually uttered. A more serious problem concerns the stress pattern in the assumed targets. We have assumed in this case that a splice can occur at a position
where one target is stressed but the other isn't. This forces us to say that Post-Stress splices can go from one target to a stressed word in the other target, but not necessarily from a position just before a stressed word. This doesn't seem like much of an extension of the typology of splices, but it lacks independent support. Another error that can be dealt with in terms of the simple typology already established is given in (29).

(29) He is going to keep me in contact.
     T1: He is going to keep me informed.
     T2: He is going to keep in contact [with me].
A Pre-Stress Splice which follows the stress constraint gives the error, except for the omission of the phrase with me. To account for this omission we need only assume the speaker either cut himself off before completing the error or never intended to say the phrase at all. In the latter case we are still left with the plausible target: He is going to keep in contact. These ad hoc accounts should not be taken too seriously without supporting evidence. But if (28) and (29) represent a class of systematic deviations from the patterns we have established for blends, we should find similar examples occurring more and more as error collections grow. Until this happens, it seems justified to attribute these deviant examples to other factors, as we have done. The remaining complex errors, listed in (30)-(32), seem to be true examples of multiple blends.

(30) ... if I don't get the ball on the show.
     T1: if I don't get the [N show] on the [N road].
     T2: if I don't get the [N ball] rolling.
(31) It has a collar on because it owns by someone — it's belonged by someone.
     T1: it [belong]s [to] someone.
     T2: it is [own]ed [by] someone.
(32) One thing that fascinated by me ...
     T1: One thing that fascinated me ...
     T2: One thing I was fascinated by ...

In (30) and (31) two substitutions of the standard sort give the error. In
(30), one substitution seems to have dislodged the word it replaces, which in turn knocks out another word. But in each case the grammatical category constraint is honoured. A similar account holds in (31), except that, in this case, two substituting words come from one of the targets. In (32), Pre-Stress and Post-Stress Splices have combined to produce the error. Each is of the canonical form with respect to stress, so we needn't introduce any new categories of splices. What we have tried to show in this section is that complex errors are of two sorts. Either they are combinations of already established types of blends or they are susceptible to no generalizations at all. We feel justified therefore in maintaining that no error types need be admitted beyond those already introduced. The one change forced on us by a consideration of complex blends is that simple blends may on occasion combine in a single utterance. In the next section, we consider how blends might fit into the time course of speech production.

2.6 Temporal ordering
We have offered no mechanism for the creation of splice blends. However, if the description of the errors given earlier is correct, we can order the errors with respect to each other and to other events in speech production. Since sentence stress is needed to describe the varieties of splice blend that can occur, the determination of stress in an utterance must precede the occurrence of the error. A splice blend, then, is probably a late event in the speech chain, on the assumption that stress is calculated on the surface form of a sentence. We must assume that both target utterances are constructed at least to this point before they come together as a blend. It is conceivable that the two targets are constructed sequentially and only exist simultaneously in a buffer storage area just before being output. However, it is also possible that they are constructed in parallel. In either case, though, we can see that there is a large capacity processor/storage system underlying speech production. It may be significant though that no errors seem to require more than two targets to exist at the same time. Perhaps the speech processor has the capacity for two, but not three, simultaneous utterances. It is instructive to try to order the two types of blends that have been identified with respect to each other. One approach is to ask whether substitution blends are like splices in being sensitive to rhythm. To answer this question, we assigned primary and secondary sentence stress values to the target utterances for substitutions as we did for the splices. Table 3 shows the relation between stress values on the two words involved in the substitution.
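Table 3. Stress level in substitution blends
(Rows: Target 1; columns: Target 2; stress levels 1, 2, None. The cell layout did not survive extraction; the surviving counts are 4, 2, 1, 3, 1 and 3, 14 in all.)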
Table 3 reveals that ten of the 14 substitutions had the same stress. While this suggests a stress constraint on substitutions, it is also accounted for by the grammatical category constraint. Since the words involved in the substitution are of the same category and occupy roughly the same syntactic position they will, of course, generally have the same stress. This confound between stress and grammatical category is easily broken. We do find some cases compatible with the grammatical category constraint that are incompatible with a stress constraint (the four cases lying off the diagonal in Table 3). However, we do not find the inverse: there are no cases which violate the grammatical category constraint, yet have the same stress values. As far as the present evidence goes, then, there is no indication that substitutions are stress-specific. While we cannot conclude from this that stress is not represented when substitution errors occur, there is no evidence that it is. It is possible then that substitution errors occur at an earlier stage in production than do splices, a stage before sentential stress is specified. Further evidence for this conclusion is found in an examination of morphology in blends. There are several cases of substitutions in which the substituting word does not preserve its morphological form in the error. These cases are given in (33)-(36).

(33) with severely profound children
     T1: with severely [retarded] children
     T2: with [profound]ly retarded children
(34) I've been trying to wonder out
     T1: I've been trying to [figure] out
     T2: I've been [wonder]ing
(35) When you boil down to it
     T1: when you [come] down to it
     T2: what it [boil]s down to
(36) If she doesn't bother a little
     T1: if she doesn't [mind] a little talk
     T2: if she doesn't mind being [bother]ed

In each case the substituting word takes on the morphological form suitable to the syntactic frame in which it appears. When the morphological form of a word is compatible with the syntactic frame into which it substitutes its form is preserved, as (37)-(40) show.

(37) It's spent me a year
     T1: It's [taken] me a year
     T2: I [spent] a year
(38) Since I ate him
     T1: Since I [fed] him
     T2: Since he [ate]
(39) It may care to you that I don't think that way.
     T1: It may [matter] to you
     T2: You may [care] that
(40) Did he put it up or choose it out?
     T1: Did he put it up or [pick] it out?
     T2: Did he put it up or [choose] it?

There are two ways to account for this generalization. The Ordering Hypothesis supposes that substitutions occur before syntactic inflections are ever specified. So, for example, in (35) we would assume that the verb boil substitutes for the verb come before boil is marked for the third person present inflection. The Structure Hypothesis, on the other hand, supposes that boil is already marked for tense but that its internal bracketing must be compatible with the bracketing of come in order for the substitution to take place. That is, only that part of '[V [V boil] [AFF s]]' that is compatible with '[V come]' would be able to substitute. There are several problems with the Structure Hypothesis. First, it rests crucially on the assumption that when come is marked for second person present tense, its structure is not '[V [V come] [AFF ]]' but rather '[V come]'. In other words, it denies that there are zero derivations in syntax. For if there
were, '[V [V boil] [AFF s]]' could substitute for '[V [V come] [AFF ]]' since their structure would be the same. A second problem arises even if zero derivations are not assumed. What is to stop the outer verb bracketing of '[V [V boil] s]' from being identified with the verb bracketing of '[V come]'? Unless this is prevented, boils will again be allowed to replace come, contrary to fact. In the Ordering Hypothesis, neither of these problems arises. However, there is a different problem associated with (33). Here we are dealing with a derivational inflection, rather than a syntactic one. How are we to account for the stranding of the adverbial affix ly, when the adjective profound substitutes for the adjective retarded? Unless we suppose that substitution can take place before the ly is attached to profound, a highly dubious hypothesis under the theory of word formation assumed here, we must suppose that it is the morphological bracketing of a word that determines what it can substitute for. That is, we must assume the Structure Hypothesis for (33). If profoundly has the structure '[ADV [ADJ profound] ly]' (see Aronoff, 1976), then the substitution will be a simple instance of the operation of the grammatical category constraint. It appears then that the Ordering Hypothesis accounts best for syntactic inflections and the Structure Hypothesis for derivational inflections. No matter which of these explanations proves to be correct, it is apparent that substitution blends take place at a stage when syntactic structure is specified in a highly abstract form. This supports the view argued for above that substitutions occur at an earlier stage than splices. This claim receives further indirect support from the fact that splices never strand an affix. For example one never finds a splice like that in (41).

(41) *what she's get in for
     T1: what she's [get]ting into
     T2: what she's letting herself in for
Splices act as if words were unanalysed wholes, indicating once again that they take place at a later stage than substitutions. What has been argued in this section is that substitutions and splices have different characteristics. The former are sensitive to syntactic and (possibly) morphological structure, while the latter are sensitive to sentence rhythm. Because of this difference, substitutions are believed to occur prior to splices during the construction of an utterance.
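The affix-stranding pattern in (33)-(36) can be sketched by treating each word as a stem plus an affix and letting only the stem move. The pair representation and the function `substitute_stem` are assumptions made for illustration; the sketch is neutral between the Ordering and Structure Hypotheses and simply reproduces the observed surface forms.

```python
def substitute_stem(slot_word, source_word):
    """Each word is a (stem, affix) pair. The source stem moves into the slot;
    the source's own affix is stranded and the slot's affix is kept."""
    _, slot_affix = slot_word
    source_stem, _ = source_word
    return source_stem + slot_affix


# (33): [retarded] is replaced by [profound]ly; -ly is stranded
print(substitute_stem(("retarded", ""), ("profound", "ly")))

# (34): [figure] is replaced by [wonder]ing; -ing is stranded
print(substitute_stem(("figure", ""), ("wonder", "ing")))

# (35): [come] is replaced by [boil]s; -s is stranded
print(substitute_stem(("come", ""), ("boil", "s")))
```

A splice, by contrast, copies whole words from each target, which is why a form like the starred example in (41) is not expected under the account given in the text.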
3. Duplication errors as blends
We return now to the question of whether putative transformational errors, especially duplication errors, have the characteristics of blends. An examination of errors previously described as transformational reveals that some, but not all, could equally well be blends. Examples are given in (42). With each example is given its categorization as a transformational error (a fuller description and derivation of these errors can be found in Fay, 1980a; 1980b). Below the error are two targets that could blend together to produce the error.

(42) a. Be-Shift
        Why do you be an oaf sometimes?
        T1: Why do you [act like] an oaf sometimes?
        T2: You can [be] such an oaf sometimes.
        Why did this be done?
        T1: Why did someone do this?
        T2: How could this be done?
     b. Passive
        It was all told me about.
        T1: It was all told to me.
        T2: It was something someone told me about.
     c. Subject-auxiliary inversion
        Are those are for the taking?
        T1: Are those for the taking?
        T2: Those are for the taking?
In (42a), the first example can be produced from the two targets by a substitution blend, if we assume that the verb in the first target is a complex verb. In the second example, the two targets combine by a Pre-Stress splice blend. In (42b) and (42c) the two targets also combine by a Pre-Stress blend. As these examples show there are transformational errors that can be interpreted as blends. But there are also transformational errors that cannot be blends, either because they have no plausible source in two synonymous targets or because two target sources could only blend to produce the error by violating some of the constraints on blends discussed
above. For example, it is hard to imagine what the source could be for errors like those in (42), in which a WH-word remains in its deep structure position (Fay, 1979).

(42) Do you talk on the telephone with which ear?
     T: Which ear do you talk on the telephone with?
Likewise, errors like (43), in which the Pied Piping constraint is violated, do not readily suggest two sources for a blend.

(43) Go ahead and do what you're going to do else and I'll be there in a minute.
     T: Go ahead and do what else you're going to do ...
Now it's always possible to force an error out of two targets as in (44), but only by violating the constraints on blends discussed above.

(44) If I was done that to
     ... that done to me
In this example, two plausible targets are combined by a series of splices. But the splices themselves are implausible. Not only are there four of them, including one which skips around within a single target, and one which ends the utterance before the end of the target, but they violate the rhythmic constraints described earlier. The evidence discussed thus far does not present a compelling case against a theory of transformational errors. Although some errors might be described as blends, others cannot be. Furthermore, for the disputed cases there seems to be no evidence that would favour one account over the other. However, there is one subtype of duplication error, involving Particle Movement, for which additional evidence can be found. All known cases of duplication in Particle Movement are given in (46).

(46) a. and they put in a lot of other special flavours in with it.
     b. because I have a filter that throws out everything around 1000 Hz. out.
     c. Do I have to put on my seat belt on?
     d. Would you turn on the light on?
These cases are discussed in detail in Fay (1980a). In that paper it is argued that the errors derive from a failure to delete the particle from its position next to the verb when the Particle Movement transformation copies the particle into its new position to the right of a noun phrase. A theory of
blends, on the other hand, claims that they arise in the blending together of two synonymous sentences, in which the particle is next to the verb in one case and after the noun phrase in the other, as in (47).

(47) E:  Do I have to put on my seat belt on?
     T1: Do I have to put on my seat belt?
     T2: Do I have to put my seat belt on?
Not only does the blend theory provide a plausible source for the particle errors, but also the blends have all the characteristics of a Post-Stress splice. That is, the splice takes place just after a stressed element in the source and includes unstressed material after both positions if there is any. For example, in (47) the splice takes place just after the stressed lexical item seat belt. Note that we are assuming once again that stress rules can take several words to be a single lexical item for the purposes of applying the rules. To this point, the transformational theory and blend theory account equally well for the errors in (46). But when we consider sentences with pronoun objects instead of noun objects, an interesting difference emerges. With a pronoun object, the transformational theory must predict duplication errors just as with full noun objects. The only difference in the two cases is that with pronoun objects the application of Particle Movement is obligatory. The blend theory, on the other hand, predicts the absence of duplicated particles when the object is a pronoun for the simple reason that one source would have to be ungrammatical. That is, the pronoun case parallel to T1 in (47), in which the particle appears next to the verb, is ruled out because Particle Movement is obligatory with pronoun objects. It cannot serve as a source for a blend with a pronoun object. An examination of the particle errors in (46) shows that they all have full noun phrases as predicted by blend theory and contrary to transformational theory. This evidence ought not to be accepted at face value however, since it might be that the absence of particle duplications with pronouns is simply due to an infrequent use of pronouns in particle constructions. To check this possibility, the record of conversational speech in Carterette and Jones (1974) was examined. This is a literal transcription of over 15,000 words of speech among 24 individuals in groups of three.
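The prediction that blend theory makes for pronoun objects can be sketched as follows. The pronoun list, the function names, and the reduction of the splice to "whole of the verb-adjacent target plus the final particle of the other" are simplifying assumptions for illustration; stress is not modelled.

```python
# Assumed, non-exhaustive list of object pronouns (illustration only).
PRONOUNS = {"it", "them", "him", "her", "me", "us", "you"}


def particle_targets(verb, particle, obj):
    """Grammatical word orders for a verb-particle construction."""
    obj_words = obj.split()
    targets = [[verb] + obj_words + [particle]]  # particle after the object
    if obj.lower() not in PRONOUNS:
        # The verb-adjacent order is available only with a full noun phrase,
        # since Particle Movement is obligatory with pronoun objects.
        targets.insert(0, [verb, particle] + obj_words)
    return targets


def duplication_blends(verb, particle, obj):
    """Particle duplications that could arise by splicing two grammatical targets."""
    targets = particle_targets(verb, particle, obj)
    if len(targets) < 2:
        return []  # only one grammatical source, so no blend is possible
    adjacent, shifted = targets
    return [" ".join(adjacent + shifted[-1:])]


print(duplication_blends("put", "on", "my seat belt"))
print(duplication_blends("put", "on", "it"))
```

With a full noun phrase the sketch yields the duplicated form seen in (46c); with a pronoun it yields nothing, which is the asymmetry the corpus check is designed to test.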
All examples of particle constructions were located in this sample and each was classified according to whether the particle appeared next to the verb or not (14 versus 15 cases), and, if the particle appeared after the object, whether the object was a pronoun or noun. The
results of this latter analysis, the one of interest, are given in Table 4 along with the tally for the errors.

Table 4. Number of pronoun versus noun objects followed by particles in correct speech versus speech errors

Object:          Pronoun   Noun
Correct speech   11        4
Speech errors     0        4
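The counts in Table 4 can be checked against a chi-square test with Yates' correction for continuity, the test the text applies to this table. The sketch below uses the standard formula for a 2 x 2 table; the function name is hypothetical, and the p-value line relies on the fact that a chi-square variable with one degree of freedom is the square of a standard normal variable.

```python
import math


def yates_chi_square(a, b, c, d):
    """Yates-corrected chi-square for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (abs(a * d - b * c) - n / 2) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Rows: correct speech, speech errors. Columns: pronoun object, noun object.
chi2 = yates_chi_square(11, 4, 0, 4)
print(round(chi2, 2))  # 4.28

# Upper-tail probability for one degree of freedom.
p_value = math.erfc(math.sqrt(chi2 / 2))
print(p_value < 0.05)  # True
```

The statistic rounds to the 4.3 reported for this table. Several cells are small, so the result is best read as indicative rather than conclusive.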
As Table 4 shows, there is no bias toward noun objects when the particle is placed after the object in casual speech; in fact, just the opposite. The difference between the classifications of errors and correct utterances shown in Table 4 is significant by a chi-square test with the Yates correction for continuity (χ²(1) = 4.3, p