English Pages 359 [364] Year 1984
Intonation, Accent and Rhythm
Research in Text Theory / Untersuchungen zur Texttheorie
Editor: János S. Petöfi, Bielefeld
Advisory Board: Irena Bellert, Montreal; Maria-Elisabeth Conte, Pavia; Teun A. van Dijk, Amsterdam; Wolfgang U. Dressler, Wien; Peter Hartmann, Konstanz; Robert E. Longacre, Dallas; Roland Posner, Berlin; Hannes Rieser, Bielefeld
Volume 8
Walter de Gruyter · Berlin · New York 1984
Intonation, Accent and Rhythm Studies in Discourse Phonology
Edited by Dafydd Gibbon and Helmut Richter
Library of Congress Cataloging in Publication Data Main entry under title: Intonation, accent, and rhythm. (Research in text theory = Untersuchungen zur Texttheorie; v. 8) Includes indexes. 1. Prosodic analysis (Linguistics) — Addresses, essays, lectures. 2. Discourse analysis — Addresses, essays, lectures. I. Gibbon, Dafydd. II. Richter, Helmut, 1935. III. Series: Research in text theory; v. 8. P224.I5 1984 414 84-3212
CIP short-title entry of the Deutsche Bibliothek
Intonation, accent and rhythm: studies in discourse phonology / ed. by Dafydd Gibbon and Helmut Richter. -Berlin; New York: de Gruyter, 1984. (Research in text theory; Vol. 8) ISBN 3-11-009832-6 NE: Gibbon, Dafydd [Hrsg.]; GT
© Copyright 1984 by Walter de Gruyter & Co., Berlin 30. Printed in Germany. All rights of reprinting, photomechanical reproduction and photocopying, including of extracts, reserved. Typesetting: Dörlemann-Satz GmbH & Co. KG, Lemförde. Printing: Rotaprint-Druck W. Hildebrand, Berlin. Binding: Lüderitz & Bauer, Berlin.
CONTENTS

The Authors and their Affiliations
Dafydd Gibbon and Helmut Richter: Phonology and Discourse: a Variety of Approaches
Janet Bing: A Discourse Domain Identified by Intonation
L. Boves, B. L. ten Have, W. H. Vieregge: Automatic Transcription of Intonation in Dutch
David Brazil: The Intonation of Sentences Read Aloud
Alan Cruttenden: The Relevance of Intonational Misfits
Anne Cutler: Stress and Accent in Language Production and Understanding
Grzegorz Dogil: Grammatical Prerequisites to the Analysis of Speech Style: Fast/Casual Speech
Anthony Fox: Subordinating and Co-ordinating Intonation Structures in the Articulation of Discourse
Anna Fuchs: 'Deaccenting' and 'Default Accent'
Dafydd Gibbon: Intonation as an Adaptive Process
J. 't Hart: A Phonetic Approach to Intonation: from Pitch Contours to Intonation Patterns
W. Jassem, D. R. Hill, I. H. Witten: Isochrony in English Speech: its Statistical Validity and Linguistic Relevance
Gerald Knowles: Variable Strategies in Intonation
Manfred Krause: Recent Developments in Speech Signal Pitch Extraction
D. Robert Ladd: English Compound Stress
Hans-Heinrich Lieb: A Method for the Semantic Study of Syntactic Accents
Helmut Richter: An Observation Concerning Intensity as a Predictable Feature of Intonation
Mitsou Ronat: Logical Form and Prosodic Islands
Peter Winkler: Interrelations Between Fundamental Frequency and Other Acoustic Parameters of Emphatic Segments
Name Index
Subject Index
The Authors and their Affiliations

Janet Bing: Department of English, Old Dominion University, Norfolk, Virginia, USA
L. Boves: Instituut voor Fonetiek, Katholieke Universiteit, Nijmegen, Netherlands
David Brazil: Dept. of English Language and Literature, University of Birmingham, England
Alan Cruttenden: Dept. of General Linguistics, University of Manchester, England
Anne Cutler: Medical Research Council Applied Psychology Unit, Cambridge, England
Grzegorz Dogil: Fakultät für Linguistik und Literaturwissenschaft, Universität Bielefeld, Federal Republic of Germany
Anthony Fox: Dept. of Linguistics & Phonetics, University of Leeds, England
Anna Fuchs: Seminar für Deutsche Philologie, Universität Göttingen, Federal Republic of Germany
Dafydd Gibbon: Fakultät für Linguistik und Literaturwissenschaft, Universität Bielefeld, Federal Republic of Germany
Johan 't Hart: Instituut voor Perceptie Onderzoek, Eindhoven, Netherlands
B. L. ten Have: Instituut voor Fonetiek, Katholieke Universiteit, Nijmegen, Netherlands
D. R. Hill: Department of Acoustics, University of Calgary, Canada
Wiktor Jassem: Acoustic Phonetics Research Unit, Polish Academy of Science, Poznań, Poland
Gerald Knowles: School of English, University of Lancaster, England
Manfred Krause: Technische Universität Berlin, Federal Republic of Germany
D. Robert Ladd: Department of Experimental Psychology, University of Sussex, Brighton, England
Hans-Heinrich Lieb: Fachbereich Germanistik, Freie Universität Berlin, Federal Republic of Germany
Helmut Richter: Fachbereich Germanistik, Freie Universität Berlin, Federal Republic of Germany
Mitsou Ronat: Centre National de Recherche Scientifique, Paris, France
Wilhelm H. Vieregge: Instituut voor Fonetiek, Katholieke Universiteit, Nijmegen, Netherlands
Peter Winkler: Sozialwissenschaftliche Fakultät, Universität Konstanz, Federal Republic of Germany
I. H. Witten: Department of Acoustics, University of Calgary, Canada
DAFYDD GIBBON AND HELMUT RICHTER
Phonology and Discourse: a Variety of Approaches

Both 'phonology' and 'discourse' tend to arouse strong feelings among those of differing linguistic persuasions; experience has shown that the combination 'discourse phonology' raises as many problems as its constituents. However, the field which the contributions to this volume treat is reasonably clear: that of 'intonation' and related phenomena in relation to the constituents of discourse. It is to the credit of the contributors that they have found it possible to overcome some initial hesitation and take part in this cooperative venture.

'Discourse' is used here with both methodological and substantive implications. On the methodological side, it is not intended to imply a particular approach (such as 'discourse analysis'); rather, it indicates a general tendency to select from certain data types:

i. Natural data (Cutler, Fuchs, Gibbon, Jassem, Knowles, Winkler), as opposed to paradigmatically elicited or generated test data;
ii. Discourse response data (Bing, Cruttenden, Fuchs, Gibbon, Knowles, Ladd, Lieb, Richter, Ronat), associated with terms such as 'adjacency pair', 'natural response', 'proper response relation' (Lieb), as opposed to single sentences;
iii. Discourse oriented sentential data (Bing, Cruttenden, Fox, Knowles) such as vocatives, expletives, 'epithets', quotations and verba dicendi, subjective adverbs, 'tags', negation, appraisive 'superlatives', as opposed to sentential units (Dogil, Boves, 't Hart, Jassem);
iv. Data selected in the context of discourse types (or styles, speech 'registers'), such as reading aloud (Brazil, Boves, 't Hart, Jassem) or fast versus slow speech styles (Dogil).

On the substantive side, those descriptive categories for discourse which were used to characterise these data types are clearly in the forefront of attention in practically all the papers.
It is the sound-oriented, phonological and phonetic aspects which are the main substantive concern, however: intonation forms and structures, both in their own right and in relation to textual locutions; accent and 'stress' and their placement in words and sentences under specific discourse conditions such as repetition, anaphora, speech timing and its phonological implications. A further dimension is the conception of these categories as processes; this conception is inherent in the signal processing approach (Krause) and in experimental phonetics
in general, but other papers (cf. Cutler, Gibbon) use this aspect to extend the descriptive power of linguistic descriptions. Indeed, a concern with temporal organisation (e.g. pitch contours as a function of time, rhythm, tempo) and therefore, implicitly or explicitly, with 'processes' might be considered an a priori condition for any treatment of suprasegmental, prosodic, discourse phonetic or discourse phonological matters.

The contributors (except Cutler and Gibbon) are mainly concerned with the 'post-production' phases of phonetics and phonology: signal processing and the experimental phonetics papers on the one hand, and interpretative linguistic analysis on the other. Issues in articulatory phonetics are not dealt with; on the signal processing side, an introductory overview (Krause) is included in view of the increasing methodological importance of computer supported acoustic analysis of pitch.

There are three issues which stand out particularly, in the set of contributions taken as a whole, as being of lasting concern in this field:

i. The nature of intonational meanings, with a number of different approaches crystallising out. Two of the main lines might be thought of as the 'basic meaning' approach in various forms (cf. Bing, Cruttenden), with a 'system relative' version proposed by its critics (cf. Knowles), on the one hand, and the 'configurational' approaches on the other, including a 'pattern indexing' approach (cf. Fox, Gibbon) and a 'cohesion marking' approach with categories of 'focus', 'anaphora' and the like (Fuchs, Ladd, Ronat). A different framework is provided by Lieb in his formal reconstruction of 'speaker attitudes'.

ii. The concept of 'normal intonation'/'normal accentuation' (especially Fuchs, Winkler; also Ladd, Ronat, Lieb), in particular with respect to the position of an utterance within discourse (e.g. initial; 'first instance' vs. 'second instance') or in a specific discourse type (e.g. quotation; reading aloud).
Clarification of this notion takes place here by careful study of, among other things, a range of different 'non-normal' forms: contrastive accent, the hitherto poorly understood 'default accent' and other types (cf. Fuchs, Ronat); emphasis (Winkler).

iii. The autonomy of discourse phonological systems relative to specific locutionary domains such as the sentence (Gibbon, Knowles, Ronat; most others, implicite et passim). While not all would agree on details, Fuchs' statement would probably not be contested too hotly by the contributors: "The syntactic hierarchy of a sentence does not determine accent choice, but plays an important role as a framework for the choices to be effected." Generative phonology, which applied a non-autonomy hypothesis to prosodic features in the first two decades of its existence, has also adopted various versions of the autonomy hypothesis during the past decade (Dogil, Ronat).

It is perhaps the last of these issues which will ultimately provide a key to the solution of the first two. The question whether intonation structures can be reduced to constituent structures at other levels, or whether intonational meanings can be reduced to meaning categories applicable in the lexicon of the language concerned, or whether a given intonation or accentuation is 'normal' in terms of congruence with other language systems can only be answered satisfactorily when close attention is given to the 'autonomy of levels' issue. The following paragraphs are therefore concerned with isolating some of the concepts involved.

The central notion is that of 'level', a term which has been variously conceived in different linguistic methodologies as 'level of analysis', in method-conscious approaches, or as 'level of representation', in approaches concerned with formalised descriptions. A 'level of description' may be thought of as uniting both aspects. It is a function of a method of analysis, $M$; in the minimal case, $M = (O, D, C, S)$, a quadruple consisting of the linguistic observer, his data, his descriptive categories and the structure into which these enter.¹ The data are the observed events (utterance tokens); empirical judgments or measurements are a function of the triple $(O, D, C)$, and determine a universe of discourse. The roles of $D$ and $C$ need no further comment; the role of $O$ is also evident: a judgment or measurement is a contingent fact relative to an observer with certain 'observational properties' (e.g. 'aided by technical equipment') and cannot be automatically generalised to other observers. For exposition here, we can concentrate on $C$ and $S$ as the major determinants of linguistic levels. A few examples will clarify their role. In perceptual phonetic descriptions, $C$ may be the categories of the IPA matrix and $S$ a 'bead on a string' linear model of structure; in a phonemic description, $C$ is supplemented by a functional category of contrast, and $S$ by a structural principle of complementary distribution (as well as metatheoretical principles such as simplicity).
In morphology, a criterion of meaning similarity or identity is central. The levels of analysis in pre-1970 transformational grammars may be explained similarly: surface structure was determined by morphological similarity and the grouping criteria of traditional IC analysis; deep structure was determined in part by $S$ (generalisations over discontinuous constituents), in part by $C$ (both by ambiguity types and by paradigmatic similarity relations between sentences, based mainly on generalisations over valency properties of verbs: subcategorial and selectional restrictions). Later developments introduced anaphoric relations into $C$, then the 'natural response relation' mentioned above, which made questions of discourse phonology more accessible to this framework (cf. Ronat). These examples are particularly clear cases; the 'factual content' of other approaches may be similarly described.

¹ Gibbon, D. Perspectives of Intonation Analysis. Bern, Lang, 1976, p. 91.

It is clear that some levels of description are more closely related to each other than others; levels which have similar $C$ or similar $S$ may be said to be 'compatible'; a strong postulate would be that if levels $L_i$ and $L_j$ are compatible, and $L_j$ and $L_k$ are compatible ($i \neq j$, $j \neq k$, $i \neq k$), then $L_i$ and $L_k$ are compatible, i.e. compatibility is transitive. The details of this relation will not be discussed here. A 'coherent' linguistic description may then be said to consist of a set of compatible levels; these levels share a loose family resemblance by virtue of the transitivity of this relation. A semantic level, a level of intonation description, a perceptual phonetic level and others may therefore belong to the same coherent linguistic description.

A subset of levels in such a description will contain phonetic as well as functional categories in $C$; let us say that $C = (PC, FC)$. Forms occurring at levels $L_i$ and $L_j$ may enter into a 'concomitance' relation² if $PC_i$ and $PC_j$ are disjunct, and different $FC_i$ or $FC_j$ apply to the forms when $PC_i$ and $PC_j$ occur together than when $PC_i$ or $PC_j$ do not occur together. For example, a rise-fall pitch contour $PC_i$ in English may be interpreted as positive appraisal $FC_{i1}$ when it occurs with a word like 'lovely' ($PC_j$, which has a similar positive connotation, $FC_j$) or with an interjection 'mm' ($PC_k$, with no specific connotation of any kind, $FC_k$). However, with a word like 'John' ($PC_l$) used as a proper name in a vocative context $FC_l$, $PC_i$ may be interpreted as negative appraisal $FC_{i2}$; if 'John' is in a context like 'John's going to do it' ($FC_m$), then $PC_i$ may receive an interpretation $FC_{i3}$ which may be different from either $FC_{i1}$ or $FC_{i2}$ or both.

An entirely different set of cases arises when the use of different methods leads to 'incompatible' levels, in particular when $PC_i$ and $PC_j$ are disjunct and $O_i$ and $O_j$ are not identical (i.e. when the observational properties of $O_i$ and $O_j$ differ). A typical case is a perceptual phonetic 'transcription' as $PC_i$ vs. an experimental phonetic 'registration' as $PC_j$.
In these cases, an extension of the observer's empirical faculties by means of material tools and their supporting theories makes the phonetic descriptive categories at each level quite incommensurable; the levels may then be said to describe different domains of discourse. Between these types of incompatible level, assuming that $PC_i$ and $FC_j$ are not distinct, it is possible to establish a relation of 'correlation'. This is the basis of the idea of 'articulatory correlates', 'acoustic correlates', 'auditory correlates' of intonations and other forms which are established as mappings between empirically independent domains. It is conceivable (and explicitly postulated in the paper by Gibbon) that the articulatory and auditory levels may, in a discourse context, be analysed as compatible, not just correlated; the same assumption underlies the 'stylization' method used by 't Hart and Boves & al.

² Richter, H., & D. Wegner. "Die wechselseitige Ersetzbarkeit sprachlicher und nichtsprachlicher Zeichensysteme", in: Posner, R. und H.-P. Reinecke (eds.), Zeichensysteme, Wiesbaden, Athenaion, 1977.
The following sections are a contribution toward an up-to-date explication of the notion of 'correlate' (in the sense underlying approaches like those of Jassem, Richter, and Winkler), and of the changing conception of 'parameter' based on the $C_i$ and $C_j$ of the correlated levels.

Given a linguistic structure $\mathfrak{S}$ of a sentence of a language such as German or English, it will be a reasonable assumption that this structure consists of a finite number $s$ of linearly ordered elements $e$. So the structure can be represented as an ordered $s$-tuple

$$\mathfrak{S} = (e_1, e_2, \ldots, e_i, \ldots, e_s) \qquad (1)$$
If the structure is to possess acoustic correlates element by element, it must at least be the case that some set derived from acoustic data belonging to utterances of the sentence having $\mathfrak{S}$ as its structure is mapped into and onto $\mathfrak{S}$. So we have to assume a surjection (i.e. every value $e$ is assigned to at least one argument)

[…] (2)

with the co-ordination of elements of […] to elements of […]
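($x, y > 0$), we can mark the case that the set of scalars $v$ can be partitioned into $k$ subsets containing $n$ elements each and that the $v$ can be arranged (cf. Figure 2) into a matrix $\mathfrak{W}$ with elements $v_{ij}$ such that

Figure 2:

$$\mathfrak{W} = \begin{array}{c|ccccc} & t_1 & \cdots & t_j & \cdots & t_n \\ \hline P_1 & v_{11} & \cdots & v_{1j} & \cdots & v_{1n} \\ \vdots & \vdots & & \vdots & & \vdots \\ P_i & v_{i1} & \cdots & v_{ij} & \cdots & v_{in} \\ \vdots & \vdots & & \vdots & & \vdots \\ P_k & v_{k1} & \cdots & v_{kj} & \cdots & v_{kn} \end{array}$$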
$v_{ij}$ above $v_{i'j'}$, if $v_{ij} = v_{I,J}$ and $v_{i'j'} = v_{I+y,J}$, and $v_{ij}$ before $v_{i'j'}$, if $v_{ij} = v_{I,J}$ and $v_{i'j'} = v_{I,J+x}$.

Let us call $\mathfrak{W}$ the matrix of acoustic valuations of an utterance. The valuations $v_{ij} = f(P_i, t_j)$ can be said to be values of parameters $P_i$ at valuation times $t_j$. While the entries of $\mathfrak{M}$ are to be understood as the results of procedures such as $F_0$ extraction or intensity measurement, $\mathfrak{W}$ represents a variable accounting for the speech signal being in one respect or another problem-oriented. With the exception of a trivial transition from $\mathfrak{M}$ to $\mathfrak{W}$, this implies that parameters are no longer the dimensions of measurement themselves but varying bundles of these activated with varying 'density' along the temporal axis (cf. Figure 3).

[Figure 3]
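As a purely illustrative sketch of the matrix of acoustic valuations described above, the following Python fragment arranges k parameters against n valuation times, so that each entry plays the role of a valuation of one parameter at one time. The parameter names, the valuation times and every numerical value are assumptions made for this example; none of them comes from the text.

```python
# Hypothetical parameters P_1..P_k (rows) and valuation times t_1..t_n in seconds (columns).
parameters = ["F0_Hz", "intensity_dB"]
times = [0.00, 0.05, 0.10, 0.15]

# Invented valuations: W[i][j] stands for the valuation of parameter i at time j.
W = [
    [118.0, 124.0, 131.0, 127.0],  # hypothetical F0 values
    [61.0, 64.0, 66.0, 63.0],      # hypothetical intensity values
]


def valuation(param, t):
    """Return the valuation of parameter `param` at valuation time `t`."""
    return W[parameters.index(param)][times.index(t)]


def time_slice(t):
    """Return one column of W: the bundle of all parameter values at time `t`."""
    j = times.index(t)
    return {p: W[i][j] for i, p in enumerate(parameters)}
```

Reading a column rather than a single row is what makes a 'bundle' of measurement dimensions available at each valuation time in this toy layout.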
Variation, however, is not arbitrary even though the connection was not required to remain constant. What count are the topological constraints introduced with the transition from $\mathfrak{M}$ to $\mathfrak{W}$ as a formulation of the notion that parameters are expected to form characteristics which are both essential and not biased by looseness due to their partially accidental basis. Whether or not the above construction will succeed as an up-to-date explication of the concept of parameter in intonation research need not concern us too much. (Of course the scalars $v_{ij}$ can figure in equations as the values of parameters usually do.) The point is that the correlates $S_{ij}$ one is in search of can, in our view, be obtained only when one starts from $\mathfrak{W}$. Statements about which measurement dimension provides the relevant intonatory cues seem to be outdated. So let us introduce parameter combinations […]
Figure 4: Regression coefficient b = 0. Values of d completely random.
as shown in Fig. 2. Note that if the regression coefficient $b = 0$, then isochrony is indefinite, as shown in Figs. 3 and 4. We shall consider two models with two variables. In one of these, $d$ will be estimated from the sum of the mean durations of the constituent phone classes. This sum will be denoted by $\hat{d}$.

7.2. A model with three variables

Another possibility that has to be investigated is that $d$, the duration of the rhythm unit, depends both on $\hat{d}$, the mean cumulative duration of the constituent phone classes, and on their number $n$. The value of $\hat{d}$ may be expected to correlate highly with $n$, yet the interrelations between $\hat{d}$ and $n$ may be such that a better estimate of $d$ is obtained if both $\hat{d}$ and $n$ are assumed to have an effect on $d$. In a general form, this multiple association is expressed by a regression equation $d = a + b\hat{d} + cn$.

8. Analysis of Regression

8.1. Two variables: $d$ and $n$

8.1.1. Linear regression

In a bivariate regression model, $d$ is first made to depend on $n$ only. For no isochronism, the regression line can be made to pass through the origin and form a 45° angle with the abscissa by performing a linear transformation of both axes, with $(d - \bar{d})/\bar{p}$ on the ordinate and $(n - \bar{n})$ on the abscissa. The regression equation then takes the form

$$\frac{d - \bar{d}}{\bar{p}} = a + b(n - \bar{n}). \qquad (1)$$

Ideally, under this transformation, $a$ should equal zero, but in reality $a \approx 0$ because $\bar{p}$ is not calculated from exactly the same raw data as $\bar{d}$. As mentioned before, some rhythm units had to be left out because their duration was not measurable. Also, in a few cases, the total duration of the rhythm unit could be measured, though segmentation was doubtful so the individual values of $p$ were not measured. It will be seen that the resultant discrepancy is entirely negligible.

Table 6

Unit   mean $n-\bar{n}$   mean $(d-\bar{d})/\bar{p}$   σ($n-\bar{n}$)   σ($(d-\bar{d})/\bar{p}$)   r      r²·100%   a         b       arctg b
FOOT   0.00               0.00                         2.21             1.98                       0.72   51.7      -0.0026   0.644   32.8°
NRU    0.00               0.00                         1.63             1.47                       0.62   39.0      -0.0018   0.561   29.3°
ANA    0.00               0.00                         1.53             1.68                       0.83   69.6      -0.0036   0.916   42.5°
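The two limiting cases of eq. (1) can be made concrete with a small least-squares sketch. The data below are invented for illustration (a hypothetical mean phone duration of 80 ms and units of two to six phones); the helper names `fit_line` and `normalised_fit` are likewise assumptions, not part of the study. Under these assumptions, durations that grow by one mean phone duration per phone should give a slope of 1 (a 45° line, no isochrony), and identical durations should give a slope of 0 (strict isochrony).

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def normalised_fit(n_phones, durations, mean_phone_duration):
    """Fit (d - mean d) / p on (n - mean n), i.e. the form of eq. (1)."""
    n_bar = sum(n_phones) / len(n_phones)
    d_bar = sum(durations) / len(durations)
    xs = [n - n_bar for n in n_phones]
    ys = [(d - d_bar) / mean_phone_duration for d in durations]
    return fit_line(xs, ys)


p = 80.0                   # hypothetical mean phone duration in ms
n_phones = [2, 3, 4, 5, 6]  # hypothetical numbers of phones per rhythm unit

# No isochrony: each additional phone adds one mean phone duration.
a_none, b_none = normalised_fit(n_phones, [n * p for n in n_phones], p)

# Strict isochrony: every unit lasts the same time regardless of n.
a_strict, b_strict = normalised_fit(n_phones, [400.0] * len(n_phones), p)
```

Real data fall between these extremes, which is why the slope b (or its angle) can serve as a measure of the tendency towards isochrony.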
Table 6 gives the results of an analysis of regression with the variables $d$ and $n$, as expressed by eq. (1). The magnitude that is most directly related to isochrony is either the regression coefficient $b$ or the corresponding arctg $b$. The following conclusions can be drawn from Table 6:

(1) The regression coefficient for ANA is quite close to unity, and the corresponding arctg $b$, i.e., the angle of the regression line with the abscissa, is close to 45°; consequently there is very little isochrony in ANA.
(2) The NRU has a coefficient of regression which is nearly half (exactly .613) the coefficient of regression for ANA, and the angle of the regression line for NRU is 0.69 the angle for ANA.¹³ There is a distinct tendency towards isochrony in NRU, though it is not very close to strict isochrony.
(3) The regression coefficient for FOOT is intermediate, but closer to that of NRU.
(4) The coefficient of determination $r^2 \cdot 100\%$ indicates that though there is distinctly more variance unaccounted for in the regression of NRU than in the regression for ANA, both may be considered as satisfactory in the sense of their predictive power.

The fact that the coefficient of determination is high for ANA and lower for NRU makes it plausible that there is a factor which is active in the latter but absent in the former. Probably this factor is the distinction between final and nonfinal position. The ANA can, by definition, only stand in a nonfinal position. The coefficient of determination for the FOOT is intermediate between the other two and indicates that theory (A) is not, in a statistical sense, unacceptable. But it is shown to obliterate a distinction which is statistically very highly significant (viz. that between ANAs and NRUs). The isochrony effect is shown in Fig. 5.

¹³ Cf. above, Sec. 6, on the relative mean durations of NRU and ANA.
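The internal consistency of Table 6 can be checked with a few lines of Python. The sketch relies on the textbook identity for simple regression, slope = r × (s.d. of y) / (s.d. of x), and on the arctangent of the printed slope; the function names are assumptions introduced here, and because the printed figures are rounded the agreement can only be approximate.

```python
import math

# Values copied from Table 6: (r, s.d. of n - n_bar, s.d. of (d - d_bar)/p_bar,
# printed slope b, printed angle in degrees).
table6 = {
    "FOOT": (0.72, 2.21, 1.98, 0.644, 32.8),
    "NRU": (0.62, 1.63, 1.47, 0.561, 29.3),
    "ANA": (0.83, 1.53, 1.68, 0.916, 42.5),
}


def slope_from_r(r, sd_x, sd_y):
    """Slope of a simple regression line recovered from r and the two s.d.s."""
    return r * sd_y / sd_x


def angle_deg(b):
    """Angle of the regression line with the abscissa, in degrees."""
    return math.degrees(math.atan(b))


for unit, (r, sd_x, sd_y, b, angle) in table6.items():
    print(unit, round(slope_from_r(r, sd_x, sd_y), 3), round(angle_deg(b), 1))

# Ratio of the NRU angle to the ANA angle, as discussed in conclusion (2).
angle_ratio = table6["NRU"][4] / table6["ANA"][4]
```

The recomputed slopes differ from the printed ones only in the third decimal place, which is what rounding of r and the standard deviations would lead one to expect.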
8.1.2. Quadratic regression

Table 7 gives the results of a quadratic regression of $d$ on $n$ and $n^2$. After normalization of the variables, the regression equation here is of the form

$$\frac{d - \bar{d}}{\bar{p}} = a + b(n - \bar{n}) + c(n - \bar{n})^2. \qquad (2)$$

Table 7

Unit   mean $(n-\bar{n})^2$   σ($(n-\bar{n})^2$)   r[$(n-\bar{n})^2$, $d$]   r[$(n-\bar{n})$, $d$]   r[$(n-\bar{n})$, $(n-\bar{n})^2$]   a        b       c        R²·100%
FOOT   4.90                   7.14                 0.29                      0.72                    0.27                                -0.141   0.619   0.0282   52.7
NRU    2.67                   4.52                 0.37                      0.62                    0.47                                -0.078   0.523   0.0287   39.6
ANA    2.34                   3.81                 0.55                      0.83                    0.56                                0.117    0.844   0.0515   70.3
It can be seen from Table 7 that the coefficients of determination for all three types of rhythm unit are marginally better than in the linear model, which is more directly interpretable. Therefore, there is very little, if anything, to be gained from a quadratic regression model.

8.2. Two variables: $d$ and $\hat{d}$

It is also possible to estimate the duration of a rhythm unit from the cumulative average duration of the constituent phones. In other words, we now take the mean duration of each phone in the rhythm unit, as shown in Table 3 in the appropriate column, add the figures, obtaining $\hat{d}$, and estimate $d$ from this. If there is no isochrony, then, on an average, $d = \hat{d}$. In the case of strict isochrony, $d$ is constant, so there must be a 'coefficient of compression'. We have found it convenient to express the relation between $d$ and $\hat{d}$ by

$$\frac{d - \bar{d}}{\bar{p}} = a + b\,\frac{\hat{d}}{\bar{p}} \qquad (3)$$

because, again, the regression coefficient and its corresponding angle are easily interpretable.

Table 8

Unit   mean $\hat{d}/\bar{p}$   σ($\hat{d}/\bar{p}$)   r      r²·100%   a       b       arctg b
FOOT   5.74                     2.09                   0.80   63.8      -4.35   0.758   37.1°
NRU    4.25                     1.48                   0.73   52.8      -3.07   0.724   35.9°
ANA    2.92                     1.58                   0.90   80.4      -2.78   0.951   43.6°
The results of the analysis of regression are contained in Table 8. As the means for $\hat{d}$ are here normalized by $\bar{p}$, i.e., $\bar{p}_{FOOT}$, $\bar{p}_{NRU}$ and $\bar{p}_{ANA}$ respectively, they represent in fact the mean number of phones in the different rhythm units. As in Table 6, the correlation coefficient is highest for ANA and smallest for NRU, but each is distinctly higher than its counterpart in Table 6. The values of the coefficients of determination are therefore also each better. Thus, by taking the means for the various phone classes which go to make $\hat{d}$, we have accounted for part of the variance still unaccounted for in the previous bivariate models. Fig. 6 shows the isochrony effect.

8.3. Regression with three variables

Even though the simple regression models are quite satisfactory as judged by the high coefficients of determination, it is tempting to see whether some further improvement might not be achieved by predicting $d$ from both $\hat{d}$ and $n$.
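To see what the coefficients of the Section 8.2 regression (d against the summed mean phone durations) imply, the relation can be evaluated directly. The sketch below uses only the a and b values and the means printed in Table 8; the function names are assumptions made for the example. It shows two things: at the mean of the predictor the predicted normalized deviation is close to zero, and a unit whose summed mean phone durations grow by a given amount is predicted to lengthen by only b times that amount, which is one way of reading the 'coefficient of compression'.

```python
# Values copied from Table 8: (a, b, mean of the normalized summed mean phone durations).
table8 = {
    "FOOT": (-4.35, 0.758, 5.74),
    "NRU": (-3.07, 0.724, 4.25),
    "ANA": (-2.78, 0.951, 2.92),
}


def normalised_deviation(unit, x):
    """Right-hand side a + b*x of the Section 8.2 regression, x being the normalized predictor."""
    a, b, _ = table8[unit]
    return a + b * x


def lengthening(unit, extra):
    """Predicted lengthening (in mean phone durations) when the predictor grows by `extra`."""
    _, b, _ = table8[unit]
    return b * extra


for unit, (_, _, mean_x) in table8.items():
    # At the mean of the predictor the deviation should be roughly zero.
    print(unit, round(normalised_deviation(unit, mean_x), 3), lengthening(unit, 1.0))
```

On these figures one extra mean phone duration lengthens an anacrusis by nearly the full amount but a narrow rhythm unit by about three quarters of it.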
Figure 6: Linear regression of $d$ on $\hat{d}$ in Foot, NRU and Ana according to Table 8.
Figure 7: Linear regression of $d$ on $\hat{d}$. Regression lines brought to a common origin.
The following linear regression was tested:

$$\frac{d}{\bar{p}} = a + b\,\frac{\hat{d}}{\bar{p}} + cn, \qquad (4)$$

and Table 9 gives the results of the regression analysis; cf. also Fig. 7.

Table 9

Unit   $\bar{n}$   mean $d/\bar{p}$   mean $\hat{d}/\bar{p}$   σ($n$)   σ($\hat{d}/\bar{p}$)
FOOT   5.76        5.76               5.74                     2.21     2.09
NRU    4.25        4.25               4.25                     1.63     1.48
ANA    2.94        2.92               2.92                     1.53     1.58

Unit   r($n$, $\hat{d}/\bar{p}$)   r($n$, $d/\bar{p}$)   r($d/\bar{p}$, $\hat{d}/\bar{p}$)   a      b        c       r²·100%
FOOT   0.93                        0.72                  0.80                                1.33   -0.187   0.942   64.4
NRU    0.91                        0.62                  0.73                                1.11   -0.221   0.950   53.8
ANA    0.96                        0.83                  0.90                                0.17   -0.378   1.302   81.3

Ideally, the three means, those of $n$, $d/\bar{p}$ and $\hat{d}/\bar{p}$, should be equal. Again, the slight differences are due to small differences in the raw data. The correlation coefficients for $(n, \hat{d}/\bar{p})$ are naturally very high. Both for $(n, d/\bar{p})$ and $(d/\bar{p}, \hat{d}/\bar{p})$ the correlation coefficients are highest for ANA and lowest for NRU, but all values are high. As can be seen from the values of the coefficients of determination, the three-variate model accounts for more variance than any of the bivariate models, though it is only marginally better than that for $d$ and $\hat{d}$.
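The coefficients of determination for the three-variable model can be cross-checked against the pairwise correlations in Table 9. The sketch below uses the standard formula for the squared multiple correlation with two predictors; it is offered as a plausibility check, not as the authors' computation, and the function name is an assumption. Since the printed correlations are rounded to two decimals and the two predictors are highly correlated, agreement to within a point or two is all that can be expected.

```python
# Values copied from Table 9, in this order:
# r(n, d/p), r(d/p, d_hat/p), r(n, d_hat/p), printed coefficient of determination (%).
table9 = {
    "FOOT": (0.72, 0.80, 0.93, 64.4),
    "NRU": (0.62, 0.73, 0.91, 53.8),
    "ANA": (0.83, 0.90, 0.96, 81.3),
}


def multiple_r2(r_y1, r_y2, r_12):
    """Squared multiple correlation of y on two predictors from pairwise correlations."""
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)


recomputed = {
    unit: 100 * multiple_r2(r_yn, r_yd, r_nd)
    for unit, (r_yn, r_yd, r_nd, _) in table9.items()
}

for unit, value in recomputed.items():
    print(unit, round(value, 1), "printed:", table9[unit][3])
```

The small gain over the two-variable model with the summed mean phone durations reflects how little independent information the phone count adds once those durations are known.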
9. Some Linguistic Considerations

It was explained in Jassem (1949; 1952) that if the model of rhythm there proposed is accepted, then rhythm can very simply be indicated in the phonemic transcription of a running English text by observing the rules quoted above in Section 2. The beginning of Unit 30 in Halliday (1970) transcribed according to the rules reads as follows:

[izöaet tjiz tcö'it o'izit ta'pffidin s'maffls traepll ifaid'n3ö)n altar a'bacot öa'wedu) aidav'sent a'keibl]

The beginning of Unit 39 reads:

[ai'laik Öast'p3im baiöaeti'levnj3r scold os'treihan boi didjcD silt]

'Is that' /'izöaet/, 'cheese' /'tjiz/, 'earlier' /sliar/ are examples of NRUs that are co-extensive with TRUs because they include no anacrusis. 'To eat' /teo'it/, 'or is it' /o izit/, 'to put in' /ta'poadin/, 'if I'd known' /ifaidn3(Dn/ are examples of TRUs that begin with an anacrusis. No attempt has ever been made to indicate rhythm as described by Abercrombie (1967) and Witten (1977) in a running transcription of an English text. As mentioned earlier, both models are independent of syntax, but both admit interrelations between syntax and rhythm. Abercrombie's model, as applied by Halliday, sometimes results in very peculiar tone groups such as //if I'd known earlier about the wedding I'd have//sent a cable//. The
first tone group includes the subordinate clause plus the subject and part of the predicate of the main clause, the remainder of which forms the second tone group. Many peculiar tone group boundaries may be found in Halliday 1970. Here are some more examples: //he was grey and he was woolly and his//pride was inordinate, he danced on a sandbank in the//middle of Australia and he//went to the Big God Ngong// (p. 121). //on the Isle of Man you can//still ride in a horse-drawn tram// (p. 117). Such tone group boundaries, strange from the syntactical point of view, are due to the assumption that unstressed (unaccented) syllables always belong to the same rhythm unit as the preceding stressed (accented) syllables and to the assumption that silence (pause) is a marker of a tone-group boundary.¹⁴ Model (B) of English rhythm has a simpler relation to syntax and does not result in such disconcerting discrepancies between the phonological and the syntactical structure of running speech.
¹⁴ On a distributional view of the tone-group, see Jassem 1978.

10. Summary and Conclusions

A statistical method has been applied to express isochrony in quantitative terms, and an attempt has been made to find isochrony in the acoustic speech signal. It was assumed that if isochrony was at all detectable in the speech wave, it should affect the duration of the phones which constitute the rhythm units. Tape recordings of continuous, naturally spoken General British English ('RP'), consisting of a total of almost 2500 successive phones, served as experimental material. The duration of the phones was measured spectrographically and the phones were grouped into classes according to their mean duration. Two theories of English rhythm were tested: Abercrombie's, called (A), which postulates one type of quasi-isochronous rhythm unit, the FOOT, and Jassem's, called (B), which posits two types, viz. ANACRUSIS with no isochrony, and NARROW RHYTHM UNIT which tends towards isochrony. Four regression models were applied making the duration of a rhythm unit depend (a) linearly on the number of phones in the unit, (b) curvilinearly on the number of phones in the unit, (c) on the sum of the mean durations of the phones in the unit and (d) on both the sum of the mean durations of the phones in the unit and their number. The results of the regression analysis show that in all models the tendency towards isochrony is minimal in ANACRUSIS and quite distinct, if not very strong, in the NARROW RHYTHM UNIT. Isochrony is also present in the FEET, but the FOOT averages out and obliterates the distinction between ANACRUSIS and NARROW RHYTHM UNIT which is shown to be statistically very highly significant. In keeping with theory (B), rhythm and isochrony can be very simply indicated in running transcription of English text, which does not appear to be possible within theory (A). This is of particular importance for computer-controlled speech synthesis by rule. Using a very simple algorithm based on rules supplied by theory (B), the temporal organization of speech may be generated from a transcription indicating the incidence of accent and boundaries between TOTAL RHYTHM UNITS plus a table of mean phone durations. Theory (B) relates the syntactic component of spoken text to its phonological component much more simply than does theory (A).
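The remark about synthesis by rule can be illustrated with a deliberately simplified sketch. It is not the authors' algorithm, which is not given here. The phone-class labels, the mean phone durations and the reference durations are all invented; only the slopes are taken from Table 8. The compression rule itself (move a unit's duration away from a reference value by b times the deviation of its summed mean phone durations from that reference) is an assumption loosely modelled on the regression of Section 8.2.

```python
# Hypothetical table of mean phone durations in ms, keyed by invented class labels.
MEAN_DURATION_MS = {
    "short_vowel": 60.0,
    "long_vowel": 110.0,
    "stop": 70.0,
    "fricative": 90.0,
    "sonorant": 65.0,
}

# Slopes b copied from Table 8; the reference durations are invented for the example.
SLOPE = {"ANA": 0.951, "NRU": 0.724}
REFERENCE_MS = {"ANA": 200.0, "NRU": 330.0}


def summed_mean_duration(phone_classes):
    """Sum of the tabulated mean durations of the phones in one unit."""
    return sum(MEAN_DURATION_MS[c] for c in phone_classes)


def unit_duration(kind, phone_classes):
    """Assumed rule: compress deviations from the reference duration by the slope b."""
    d_hat = summed_mean_duration(phone_classes)
    ref = REFERENCE_MS[kind]
    return ref + SLOPE[kind] * (d_hat - ref)


def total_rhythm_unit(anacrusis, narrow_unit):
    """Duration of a total rhythm unit: optional anacrusis plus narrow rhythm unit."""
    total = unit_duration("NRU", narrow_unit)
    if anacrusis:
        total += unit_duration("ANA", anacrusis)
    return total


# Hypothetical unit: a two-phone anacrusis followed by a six-phone narrow rhythm unit.
ana = ["sonorant", "short_vowel"]
nru = ["stop", "long_vowel", "fricative", "stop", "short_vowel", "sonorant"]
duration_ms = total_rhythm_unit(ana, nru)
```

With these invented figures the long narrow rhythm unit comes out shorter than the plain sum of its mean phone durations, while the anacrusis stays close to its own sum, which is the qualitative pattern the regression results describe.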
Acknowledgements

The authors wish to express their appreciation of a grant from the National Research Council of Canada to the Department of Computer Science, University of Calgary, Alberta, which enabled WJ to work there during an extended visit to Canada, and to thank the British Council for supporting a shorter working visit by WJ to the University of Essex, Colchester and one by IHW to Poznań. The co-operation of Dr M. Krzysko and Mr P. Stolarski of the Computer Centre of Mickiewicz University, Poznań, in the computing labour is also gratefully appreciated.
References

Abercrombie, D. (1964). Syllable quantity and enclitics in English. In D. Abercrombie & al. (eds.), In Honour of Daniel Jones. Longmans: London. 216-222.
Abercrombie, D. (1967). Elements of General Phonetics. Edinburgh University Press: Edinburgh.
Abercrombie, D. (1973). A phonetician's view of verse structure. In Phonetics in Linguistics. Longman: London. 6-13.
Adams, C. (1979). English Speech Rhythm and the Foreign Learner. Mouton: The Hague.
Allan, G. D. (1968). On testing for certain stress-timing effects. UCLA Working Papers in Phonetics 10: 47-59.
Bolinger, D. L. (1965). Pitch accent and sentence rhythm. In D. L. Bolinger, Forms of English. Hokuou Publ. Co.: Tokyo. 139-180.
Gabriel, K. R. (1964). A procedure for testing the homogeneity of all sets of means in analysis of variance. Biometrics 20 (3): 459-477.
Halliday, M. A. K. (1970). A Course of Spoken English: Intonation. Oxford University Press: Oxford.
Hockett, C. F. (1955). A Manual of Phonology. Indiana University Publications in Anthropology and Linguistics, Memoir 11.
Jassem, W. (1949). Indication of rhythm in the transcription of Educated Southern English. Le Maître Phonétique III/92: 22-24.
Jassem, W. (1952). Stress in Modern English. Bulletin de la Société Linguistique Polonaise XII: 189-194.
Jassem, W. (1978). On the distributional analysis of pitch phenomena. Language and Speech 21: 362-372.
Jassem, W. (1980). Fonetyka języka angielskiego (English Phonetics). PWN: Warszawa, 7th ed.
Jassem, W. (1981). Podręcznik wymowy angielskiej (A Handbook of English Pronunciation). PWN: Warszawa, 7th ed.
Jassem, W. & Gibbon, D. (1980). Re-defining English accent and stress. Journal of the International Phonetic Association 10: 2-16.
Jones, D. (1976). Outline of English Phonetics. Heffer: Cambridge. 9th ed. repr.
Ladefoged, P. (1975). A Course in Phonetics. Harcourt, Brace, Jovanovich: New York.
Lea, W. A. (1974). Prosodic aids to speech recognition: IV. A general strategy for phonologically-guided speech understanding. Univac Rep. PX 10791.
Lehiste, I. (1973). Rhythmic units and syntactic units in production and perception. Journal of the Acoustical Society of America 54: 1228-1234.
Lehiste, I. (1975). The role of temporal factors in the establishment of linguistic units and boundaries. In W. U. Dressler & al. (eds.), Phonologica 1972. 115-122.
Lehiste, I. (1977). Isochrony reconsidered. Journal of Phonetics 5: 253-263.
O'Connor, J. D. (1965). The perception of time intervals. Progress Report, Sept. 1965. Phonetics Lab., UCL. 11-13.
O'Connor, J. D. (1967). Better English Pronunciation. Cambridge University Press: Cambridge.
O'Connor, J. D. (1968). The duration of the foot in relation to the number of component sound-segments. Progress Report, June 1968. Phonetics Lab., UCL. 1-6.
Pike, K. L. (1945). The Intonation of American English. University of Michigan Press: Ann Arbor.
Shen, Y. and Peterson, G. G. (1962). Isochronism in English. University of Buffalo Studies in Linguistics, Occasional Papers 9: 1-36.
Uldall, E. (1971). Isochronous stresses in R. P. In L. L. Hammerich et al. (eds.), Form and Substance. Akademisk Forlag: Odense. 205-210.
Witten, I. H. (1977). Flexible scheme for assigning timing and pitch to synthetic speech. Language and Speech 20: 240-260.
GERALD KNOWLES
Variable Strategies in Intonation

The best thing to do with a good idea is to knock it down and replace it with a better one. Of the different models of intonation that have been proposed, the 'British' model as put forward in the works of Kingdon (1958) and O'Connor and Arnold (1973) has been remarkably successful in getting to grips with the often elusive patterns of intonation. This model has probably achieved all it can be expected to do, based as it is on structuralist assumptions. Now that linguists have gone on from the study of structures and systems to the study of language in communication, we require rather more of an intonation theory, and need an improved model.

In order to move forward, we need new evidence, which will challenge the old ideas and force us to examine them afresh. British studies have dealt almost exclusively with RP, and there are just a few exceptions such as Brown et al. (1980). In this paper I shall take a diasystemic approach to intonation. The traditional model is unable to handle sociolinguistic variables, and trivially different patterns in closely related varieties of English are irreconcilably different: this I regard as the reductio ad absurdum of the 'tone' model. I shall be referring to several varieties of English, but mainly RP and Scouse. Scouse is the dialect of Liverpool and Merseyside which arose in the nineteenth century as the result of large scale immigration from Ireland (Knowles, 1975), and which remains an interesting hybrid of West Lancashire speech and Anglo-Irish. In order to analyse Scouse intonation, one has first to understand some of the ways in which Northern English and Anglo-Irish intonation differ from RP. Although this is of course a very partial diasystemic approach, it does avoid some of the false generalisations that can arise from a study of RP alone.
What happens to good ideas in practice is that they are received somewhat uncritically, and become orthodoxy and conventional wisdom, and anyone who dares to challenge them is liable to be branded a crank or a heretic. In 1775, Steele had the idea of transcribing English rhythm in a musical notation, and today, most linguists and teachers of English 'know' that English has an isochronous stress. For my own part, I cannot believe that speakers of English speak in musical bars of the kind described by Abercrombie (1965) or Halliday (1967): indeed, the rhythmical hypotheses I regularly use to interpret fundamental frequency contours are incompatible with isochrony. Again, the most obvious observations on intonation concern pitch movements, and it is widely taken for granted that by
segmenting pitch patterns in some way we can arrive at the irreducible units of intonation. On the contrary: we need to take several other factors into account, including rhythm, gradient, voice quality, and the way vowels and consonants fit the pitch contour. In this paper I shall retain as much as possible of the traditional approach, including terminology and symbols; but the effect of going outside RP is that a quite different network of relationships among familiar patterns emerges, and many of the assumptions on which traditional analyses are based are shown to be untenable.
1. Maps and Strategies

If we start with the lexico-grammatical or 'verbal' system, as linguists naturally do, and look outwards from there to intonation, the hypothesis that naturally suggests itself is that intonation is a function of something in the verbal system. The aim of a theory of intonation is simply to find rules to map something in the verbal system on to intonation. This approach is at first very successful: non-final sense groups are mapped on to low rises, yes/no questions on to higher rises, statements and commands on to falls, and so on. Exact equivalents for these mapping rules can be found in other languages. However, when these rules are tested against real data, they turn out to be stereotypes, with all sorts of hidden assumptions built into them.

The view taken here is that intonation is an autonomous semiotic system, which plays a rather different role than the verbal system. The speaker has not only to decide what to say, but how to convey it effectively to the addressee. He has several channels at his disposal (verbal, intonational and paralinguistic) and employs communicative strategies to combine the signals sent on each channel so that the total effect will be correctly interpreted by the hearer. Conventional linguistics concentrates on the content of the message that is conveyed: intonation is part of rhetoric, or the strategies employed to get that message across.

In the analysis of a complex speech event, it is not always easy to decide which channel does what. Perhaps few linguists would agree with the metrist who argued that the dactyl is a merry foot, citing as evidence the line Merrily, merrily shall I live now: but the literature abounds with examples in which the meaning of the words is ascribed to the intonation. Similarly we must not confuse the role of intonation with the total strategy of which it is a part.
For instance, intonation is important in strategies for conveying illocutionary force, but it is unlikely that intonation has illocutionary force in itself. The aim of a theory of intonation is to identify formal patterns and to identify their role in speech. It is then a problem for a theory of rhetoric
to show how these patterns are used in communicative strategies. In practice, it is impossible (and undesirable) to keep the two strictly apart, and although we shall be concentrating on a theory of intonation, we shall have to deal where appropriate with rhetoric more generally.

2. Accent

The simplest job that intonation has to do is to draw the hearer's attention to a part of a locution, and the gesture involved is an upward obtrusion in pitch. When this gesture is used to highlight a single syllable, it is conventionally called an accent: the pitch rises to a peak on the accented syllable and then returns to a lower pitch. The notion of accent is often associated with Bolinger (1958) but it is in fact one of the oldest ideas in linguistics. It can be traced from the Vedic tradition through several ancient traditions to comparative philology, where it proved of central importance from the time of Verner on. Bolinger's particular formulation of accent is not without its problems, and his accents A, B, C are prosodically complex and open to the same sort of objections as British tones. Although accents A, B, C have been used as a universal basis of comparison (Bolinger, 1978), their description does not seem to my mind to be sufficiently general to apply even to English outside the standardised varieties.

2.1. The Accent Contour

Although an accent highlights a whole syllable in principle, it seems to focus on certain parts more than others: on the vowel (or 'syllabic') rather than consonants, and on the first element of a falling diphthong (e.g. /ai/) or the second element of a rising diphthong (e.g. /ju/). Ultimately accent focuses on a single point which we can call the accent point. Several kinds of pattern focus here, but we shall deal here with only two, namely the pitch contour and the rhythm. The contour begins with a transition to the peak at the accent point, and then glides down again.
The terminology for British tones labels only the part of the contour following the accent point, so that this rising-falling contour is conventionally called a 'fall', and indicated with the grave sign (`). The RP fall, at least as it is usually described, has a relatively rapid fall from the accent point, and then the contour levels out at low pitch; this can be described as a 'concave' fall. In other kinds of English, notably in Northern English, the tone is correspondingly 'convex', beginning relatively flat after the accent point, and falling steeply later. The difference here is trivial, but unless it is recognised it can lead to a systematic failure to recognise the accented syllable in some kinds of English: Northerners might seem to have a perverse tendency to delay the fall and to accent the wrong syllable. Thus in Northern High Street, the pitch
movement on street might give the impression to a Southern ear (or to an analyst trained in RP intonation!) that the accent has shifted to street.

The rhythm of the accent is governed by a very general principle whereby tempo is reduced after a rhythmical peak, in this case the accent point. The initial transition is very rapid, giving the auditory impression of an instant jump from one pitch to another, so that the accent point is apparently at the very peak; the downward glide is much slower. This tempo change is particularly marked in RP, but in other varieties it may be less so. In Scouse the initial transition may be so slow that it is perceived as an upward glide. Again the difference is trivial, but unless it is recognised, a 'fall' with a slow transition can easily be mistaken for a 'rise-fall'.

Accent has the routine job of giving prosodic shape and rhythm to words by highlighting some syllables and not others, so that e.g. beggar is accented on the first syllable, and begin on the second. (This is of course commonly called 'stress': but 'stress' is a notorious Humpty Dumpty term, which means whatever the user intends it to mean on any given occasion, and which we shall consequently avoid.) Not all the accents which a word has when spoken in isolation appear when the word is in context in a locution, and this is the result of accent suppression rules.

Perhaps the best known suppression rule is the compounding rule. Accents are suppressed except on the first element of a compound, as in textbook examples `blackbird, `French teacher as opposed to black `bird or French `teacher. By calling this a 'rule', we do not mean that it has to operate whenever its conditions are met; it is a rhetorical device that the speaker may or may not use. If we introspect about compounds, we automatically carry out the rule in all cases where it can apply. It also applies in informal conversation.
But in any kind of public speaking or lecturing, it is frequently observed to fail, e.g. the Labour 'Party and 'Mersey 'side which are just two of many examples heard recently on the BBC. It is likely that in certain formal styles the compounding rule is cancelled or overridden by another rule which may be related to the delayed accent reported by Bolinger (1972: 643).

Accent is also used for foregrounding; or more precisely, items which are put into the background are deaccented. There are several cases of foregrounding, one of them being the parallelism rule, by which variables are accented and constants deaccented, e.g.:

Some of these unemployed teenagers are actually unemployable.
As the governor now is in effect the govern ment ... (/mant/)

Real examples like this tend to look outlandish compared to constructed examples of 'contrastive stress', but they are regularly formed. The constants, unemploy- and govern-, are deaccented and the accent falls on the rest of the words. This constant/variable distinction is often confused with given/new, as in:

A: Who painted the Mona Lisa?
B: Da "Vinci.

Here da Vinci is the variable in the frame X painted the Mona Lisa, and it
also happens to be new information. In many cases accentuation will parallel other cohesive devices such as the use of pro-forms, ellipsis and so on, but the fact that it is independently motivated is illustrated by such familiar examples as John hit Fred, and then 'he hit 'him. The fact that he and him are given triggers off pronominalization, but the fact that they are variables in the frame X hit Y keeps them accented.

2.2. Tails

The term tail is used by Kingdon (1958) to refer to any syllables following the last accented syllable of a tone-unit. The term can more usefully be used to refer to the stretch following the change of gradient in the accent contour: that is, the part that levels out at low pitch in the RP type, or the part that falls sharply and levels out in the Northern type. The item that begins the tail is the highest in a hierarchy of the following kind: (1) a suppressed accent, e.g. rocking ·horse (where (·) marks the tail), (2) an unreduced trailing syllable, e. g. stele books,
b. 'marked' (deaccented or contrastive)
The view of stress presented here also makes it possible to talk of deaccenting applied within constituents smaller than the sentence. This is seen in a pair of examples from Schmerling (1976: 55-6):

(6) a. This is the dóctor I was telling you about. ('normal')
    b. This is the doctor I was télling you about. ('normal' in medical context)

The problem that Schmerling points out is that both of these are in some sense 'normal stress'; out of context (6a) seems 'normal', but (6b) seems just as normal in the context of a hospital or a medical convention. Confined as she is to the Trager-Smith-Chomsky-Halle view of normal stress as a merely automatic consequence of the syntax of a sentence, Schmerling is prepared to use examples such as these as the basis for abandoning the notion of normal stress altogether. But to do that would be to throw away a valuable concept. Indeed, the first step toward treating this puzzle is to take 'normal stress' in the Hallidayan sense of the stress pattern that signals an unmarked focus.³ This makes it possible to speak of both stress patterns as normal, in the sense that both convey the focus this is NP. This focus is reflected in the rhythmic structure by the fact that at a higher level in the tree both versions, as
mately, of course, an explanation will have to be given for these distinctions as well. Third: It is well known that there is a certain amount of individual and dialect difference in assigning stress patterns to compounds. The data here reflect my own speech, but I have checked with other informants to avoid basing my statements on some idiosyncratic usage. In particular, I have checked not only individual terms, but, in accordance with the analysis presented here, pairs and groups of items as well (e.g. I have checked chocolate cake and apple cake together and find that many speakers make the distinction noted here). Question marks next to individual items in the data tables indicate those items in which there seems to be considerable disagreement about the stress pattern.
2 Any discrepancies between standard Liberman-Prince trees and these are intentional, but cannot be justified here.
3 For stress and focus see Halliday, 1967; Chomsky, 1971; Jackendoff, 1972; Wilson & Sperber, 1979; Ladd, 1980.
expected by the normal stress rules, have the s assigned to the rightmost NP:

(7) This is the doctor I was telling you about. [tree diagram: s assigned to the rightmost NP]

With the focus assigned, we can go on to assign either marked or normal stress within the strong NP constituent; doctor is either weaker or stronger than telling depending on whether it is or is not deaccented to refer to some medical context. Thus:

(8) a. the doctor I was telling you about [tree diagram]
    b. the doctor I was telling you about [tree diagram]

In short, the answer to Schmerling's puzzle is simple: sentences can exhibit both normal and marked stress at different levels of structure simultaneously.

2.2 This is the idea to be applied to the problem of compound stress. Specifically, my thesis in this paper is that compound stress represents the deaccenting of the head of the compound. Thus the normal or unmarked stress for the type of structure in e.g. green house would be as follows:
(9) [w green] [s house]
    They live in a green house. ('normal')

The reverse of this could be contrastive

(10) [s green] [w house]
     They live in a green house, not a grey one. ('marked': contrastive)
or, as in other cases of marked stress, it could also represent deaccenting, as in:

(11) [s green] [w house]
     I grew them in a greenhouse. ('marked': deaccented)

As I just showed, the deaccenting can apply within the compound without affecting the focus information conveyed at a higher level in the rhythmic structure of the sentence; that is, compound stress can be treated as marked or non-normal without in any way implying that it is thereby impossible for it to occur in a sentence with 'normal stress'.

2.3 At this point it is worth spending a paragraph or two to explain why it is specifically deaccenting that I think is involved in compound stress. As I showed in Ladd (1980), deaccenting cannot be seen simply as e.g. a syntactic rule that interacts with the normal stress rules in cases of coreference. In fact, it occurs in a wide variety of situations, and must be treated as making some independent semantic/pragmatic contribution to the interpretation of the sentence, like Hallidayan 'normal stress'. Unfortunately, space permits only a two-sentence summary of my earlier findings; the interested reader is referred to Ladd (1980: Chs. 3 & 4) for more detail. In brief, what deaccenting signals is that some specific reference to the context is necessary for a full or exact interpretation of the deaccented constituent. The actual details of the inference made in individual cases, such as 'coreference' or 'this is a medical context', are left to pragmatic interpretive strategies.

This meshes very well with recent work on the semantics of compounds by Downing (1977),⁴ Kay and Zimmer (1976), and Dowty (1979). What distinguishes these writers from earlier generative work on compounds (notably that of Lees [1960, 1970], Levi [1978], and Motsch [1970]) is that they do not seek to explain the specific relationships seen in compounds by positing some sort of underlying predicate relation between the two parts of the compound.
(For instance, steel warehouse is not represented as being underlyingly 'warehouse for steel', nor apple tree as derived from 'tree with apples'.) Instead, they posit a single general compounding relationship that leaves the specific relation to be inferred on the basis of the individual lexical items involved. To put it another way, the compound construction does not convey an explicit meaning that fully determines the interpretation of each compound, but only a rather inexplicit set of guidelines, as it were, for pragmatically inferring an interpretation.

4 While Downing's experimental study was primarily concerned with the creation of novel compounds, she found little support for the underlying-predicate approach to compound semantics; I do not feel that I distort her findings by including them here.

Relevant quotes from Kay and Zimmer, Dowty, and Downing are the following:

    The prototypic use of nominal compounds is to narrow the semantic coverage of the head noun to a smaller class. (Kay and Zimmer, 1976: 4)

    A novel compound αβ denotes some set (exactly which one we do not know) such that all members of this set are β's and are typically associated by some appropriately classificatory relation to an α. (Dowty, 1979: 319)

    The speaker tends to create the compound on the basis of a parameter significant for his categorization, rather than merely his description, of the entity in question. (Downing, 1977: 838)

The common thread running through these is something like the following: The compound construction signals that there is some relation between the attribute and the head which is relevant for classifying or categorizing the head, not merely describing it; a compound thus names some entity or category distinct from the entity or category named by the head alone.

This meshes very nicely with the function of deaccenting as described above. In general, deaccenting signals that some specific reference to the context is essential for a full or correct interpretation of the deaccented constituent; specifically in the case of a compound, the deaccenting of the head signals that in order to determine the category named by the compound, the head must be understood in the light of what Dowty calls the 'appropriately classificatory relation' between it and the attribute. In green house, for example, nothing special is signaled about the interpretation of house in this context; house is more precisely described, but not newly subcategorized. In greenhouse, on the other hand, house is deaccented to signal that it contributes only part of what is necessary for identifying the new category of things named by the compound as a whole.
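The contrast between green house and greenhouse can be made concrete with a small sketch. It assumes binary rhythmic trees whose sister nodes are labelled w (weak) and s (strong), and it takes the most prominent word to be the one reached by following the s branch down from the root. The names `Node`, `pair` and `main_stress` are hypothetical and serve only this illustration; they are not part of the analysis itself.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    label: str                       # "s" (strong), "w" (weak) or "root"
    word: Optional[str] = None       # set on terminal nodes only
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def main_stress(node: Node) -> str:
    """Follow the strong branch from the root down to a single word."""
    while node.word is None:
        node = node.left if node.left.label == "s" else node.right
    return node.word

def pair(attribute: str, head: str, deaccent_head: bool = False) -> Node:
    """Build a two-word constituent; the head is the second word.

    Unmarked pattern: attribute w, head s.
    Deaccented head:  attribute s, head w.
    """
    first, second = ("s", "w") if deaccent_head else ("w", "s")
    return Node("root",
                left=Node(first, word=attribute),
                right=Node(second, word=head))

phrase = pair("green", "house")                         # a green house
compound = pair("green", "house", deaccent_head=True)   # a greenhouse

print(main_stress(phrase))    # house
print(main_stress(compound))  # green
```

The only difference between the two structures is the labelling of the head, which is the sense in which compound stress is treated here as deaccenting of the head rather than as a separate stress rule.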
3.1 The hypothesis just presented is a fundamentally different type of analysis from the traditional description of compound stress. One of the reasons that the traditional description cannot account for exceptions is that, in effect, it cannot account for the regularity either. That is, it suggests no particular explanation of why compounds should be stressed one way or the other; it merely states an observed correlation between syntax and prosody. The analysis proposed here, by contrast, suggests an actual reason for this correlation, namely, a certain congruence between the information conveyed by the stress pattern and the information conveyed by the compound relation itself, as just illustrated with the case of green house and greenhouse.

One way to test this explanation, then, should be to see whether exceptions to the traditional rule exhibit some kind of mismatch between what the compound relation and the stress pattern convey. If my explanation is correct, then compounds with phrasal stress ought to be cases where the information conveyed by the deaccenting would be somehow inappropriate: say, cases where any subcategorizing effect of the attribute is relatively small. I will discuss three groups of cases which I think show this quite clearly.

3.2 The first set involves place names like those shown in (12). We might predict that these would take phrasal stress, since the head (Avenue, Road, etc.) is in no sense subcategorized by the attribute: Madison Avenue does not name a particular type of avenue, Olin Library does not denote a special category of library, the Golden Gate Bridge is a bridge, etc. As the data in (12) show, the prediction of phrasal stress on these is largely borne out.

There are, however, a few nouns that are deaccented in such compounds: street, house, town, land, and perhaps a few others. Considering these each in its own general semantic group, though, one can see that they are always the least specific or least marked. In city thoroughfare names, for example, we get at least vague expectations about the nature of the thoroughfare being named from most of the possible head nouns: we would expect an Avenue or Boulevard to be wide or important; a Road probably leads out of town; a Place or a Crescent is probably residential; and so on. Street, however, gives us no such information. It could be State Street, in the heart of downtown, or it could be Dogwood Street in some quiet suburb. There is, in other words, a real sense in which we do get less information about the category of things being named from Street than from any of the others, and hence more from the attribute; this is more typical of ordinary compounds, and is exactly what is signalled by the stress pattern.⁵

Comparable observations can also be made about the cases in (13), in which the attribute is the proper name of the inventor or discoverer of the entity or category named by the compound. The case of disease names is typical here: the relatively vague Syndrome and Disease (like Street) are deaccented but more specific words like Chorea and Palsy are not. While I cannot go through each of these cases in detail, it is nonetheless important to emphasize the nature of the prediction being made: the analysis does not claim to be able to make predictions about individual cases, which is what the traditional analysis purports to do, but only implicational predictions about groups of cases. If Syndrome and Disease and Street actually worked like all the others in their respective groups, the validity of the analysis would not be affected. The analysis predicts only that if one or two members of a particular semantically related group of head nouns are deaccented, they will be the least marked or least specific. Thus it is only if Palsy were deaccented and Syndrome were not that we would call the analysis into question or look for some further factor.

5 Quite some time after presenting this paper, I discovered that both this phenomenon and its explanation have been noted by non-linguist native speakers, as can be seen from the following passage: 'Why, in speaking of thoroughfares,' asked a correspondent of John o'London's Weekly in 1936, 'is it the custom to accent the proper name only in the case of a street? It is always Fleet street, Southampton Street, but Shoe lane, Farrington road, Fetter lane.' The paper's lexicographer, Jackdaw, answered: 'In a town the great majority of thoroughfares are streets; street, therefore the expected word, needs no emphasis, and the stress goes on the street's name. Lanes and roads, being much less common, these words are naturally given at least equal stress with their distinctive names; convenience begets habit.' ('The Street and the Stress', John o'London's Weekly, April 18, 1936, cited by Mencken, 1948.)

3.3 A second set of cases (shown in 14) involves the classification of culinary terms. As can be seen from just three cases (chocolate cake, apple cake, and apple pie), it is futile to try to explain the exceptions to the traditional Compound Rule in terms of individual lexical items, since apple can be either stressed or unstressed in attribute position, and cake can be either stressed or unstressed in head position, depending on the compound. Moreover, since all three seem to represent an underlying relation B made of A, the stress cannot be explained in Levi- or Lees-style syntactic terms either. Instead, what seems to be involved here is classification in terms of what one might call 'flavors' vs. 'categories'. Things to eat often come in a variety of flavors: ice cream, milk shakes, sandwiches, and souffles are all examples. For most purposes in the culinary taxonomy, the different flavors all count as 'the same'; that is, in the terms we have been using to discuss compounds and deaccenting, naming the flavor further describes, but does not further categorize. This is why many of these culinary compounds have phrasal stress.
In chocolate cáke and apple pie, in other words, cake and pie are the categories, and chocolate and apple are merely flavors. In apple cake, on the other hand, we do have a different category: the deaccenting signals something like 'this thing is cake only to the extent circumscribed by something else in the context, namely, apple'. The effect of the deaccenting here is thus like what we saw in greenhouse.

(12) Compound Place Names

'Phrasal stress'                  'Compound stress'
Madison Avenue                    State Street (downtown)
Trumansburg Road                  Dogwood Street (suburban)
Maple Drive                       Eastman House (Rochester museum)
Kingsford Crescent                Blair House (U.S. Govt. Official Guest House)
Marvin Gardens                    Andrews House (Brown Univ. Infirmary)
Park Place                        Faunce House (Brown Univ. student union)
Olin Library                      Dunster House (Harvard dorm)
Morrill Hall                      London Town (big)
Gannett Clinic                    Middletown (little)
Johnson Museum                    Baffin Land (old name for Baffin Island)
McGraw Tower                      Marie Byrd Land (section of Antarctica)
Rockefeller Center                Chicagoland (area around Chicago)
New York City                     Disneyland (California amusement park)
Enfield Village
Tompkins County
New York State
Baffin Island
Cayuga Lake
(the) Charles River
(the) Atlantic Ocean
(the) Sahara Desert
Golden Gate Bridge
Walt Disney World
(the) Erie Canal
Shea Stadium
Fenway Park
Penn Station
Harvard Square
Schoellkopf Field

(13) Compounds with Proper Names in Attribute Position

'Phrasal stress'                  'Compound stress'
Halley's Comet                    (the) Van Allen Belts
Planck's Constant (?)             (the) Peter Principle
Grimm's Law (?)                   (the) Sapir-Whorf Hypothesis (?)
(the) Monroe Doctrine (?)         Downs' Syndrome
Occam's Razor                     Parkinson's Disease
Huntington's Chorea               Skinner Box
Bell's Palsy                      Allen Wrench
Franklin Stove                    Plimsoll Line
Coleman Stove
Morse Code
Gutenberg Bible
Phillips (Head) Screwdriver
(14) Culinary Compounds
'Phrasal Stress': apple pie; blueberry pie; cherry pie; chocolate cake; vanilla ice cream; strawberry ice cream; cheese souffle; chocolate souffle; lemon souffle; grilled cheese sandwich; peanut butter & jelly sandwich; lemon sherbet; raspberry sherbet; coffee milk shake; whole wheat bread; rye bread
'Compound Stress': mud pie (?); apple cake; carrot cake; coffee cake; peanut butter (?); apple butter; sweet roll; egg roll; jelly roll; ice cream sandwich (?); tomato sauce; hot sauce; Worcestershire Sauce; white sauce; date and nut bread; zucchini bread
NB: stress on ice cream varies - what is indicated above is stress on the whole word ice cream without regard to which syllable.
If this seems too facile, there is a simple pragmatic test that seems to suggest that the distinction between flavors and categories is a real one. If the head of such a compound can be inserted into the frame 'Do you want a ___?' or 'Do you want some ___?' without misleading the addressee about what is being offered, then the attribute is a flavor. For instance, 'Do you want a sandwich?' is fine even if all the speaker really has available is, say, a cheese sandwich. On the other hand, if both the attribute and the head must be included in order not to mislead the addressee, then a separate category is involved; 'Do you want some bread?' is decidedly infelicitous if what the speaker has in mind to offer the addressee is banana bread. The reader is invited to try this test on the data in (14); while the results are not 100% consistent with the stress patterns, the correlation is quite considerable.
3.4 The final group of cases is provided by expressions where the head names an artifact of some sort, and the attribute names the material of which it is made. In general, these also have phrasal stress, as shown in (15).
This suggests that in these cases, as in those involving culinary flavors, the category named by the compound is essentially the category named by the head alone. To put it another way: the material of which an artifact is made is generally not relevant for classifying or categorizing it. There is independent evidence for this in Downing's study of the creation of new compounds. She suggested that 'naturally existing entities
(plants, animals and natural objects) are typically classified . . . on the basis of inherent characteristics; but synthetic objects are categorized in terms of the uses to which they may be put. This would seem to correlate with the fact that synthetic objects are typically created with some goal in mind, while natural entities generally are not' (Downing, 1977: 831). In those few cases of (15) which do have compound stress, it seems for the most part - e.g. glassware, leather goods, gingerbread man - that the material really is relevant for specifying the category being named.
(15) Material-plus-Artifact compounds
'Phrasal Stress': paper bag; cardboard box; silver candelabra; gold watch; tweed jacket; wool suit; cotton shirt; steel warehouse (made of steel); silk stockings; carbon steel; glass jaw; tin ear; silk purse; wooden nickel
'Compound Stress': glassware; leather goods; gingerbread man; cedar chest (?); aluminium foil (?)
3.5 At this point we are in a position to explain the minimal pair steel wárehouse / stéel warehouse. Since, to repeat Downing's words, we are more likely to categorize synthetic objects on the basis of the uses to which they may be put rather than on the basis of inherent characteristics, it follows that we categorize warehouses according to their intended contents, not the material of which they are made. Thus we interpret steel wárehouse as 'warehouse made of steel', because the stress pattern tells us that no subcategory is being named, whereas we interpret stéel warehouse as one for storing steel, first because the stress pattern tells us that warehouse is indeed being classified into some subcategory by steel, and second because B for storing A is a reasonable classificatory relation to infer between those two nouns. No underlying syntactic difference or abstract predicate need be posited to explain the interpretations here; they follow quite simply from inferences based on what we as speakers know about stress and about compounds.
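The inference just described can be made concrete in a small sketch. It is only an illustration under stated assumptions, not part of the analysis itself: the word lists MATERIALS and USE_RELATIONS, the class Compound, and the two function names are all invented here, and real speakers draw on far richer knowledge than two lookup tables.

```python
from dataclasses import dataclass

# Hypothetical lexical knowledge, invented for this illustration only.
MATERIALS = {"steel", "paper", "silk"}
USE_RELATIONS = {"warehouse": "for storing"}


@dataclass(frozen=True)
class Compound:
    attribute: str
    head: str
    head_deaccented: bool  # True = 'compound stress', False = 'phrasal stress'


def attribute_role(c: Compound) -> str:
    """A deaccented head signals that the attribute names a subcategory."""
    return "categorizes" if c.head_deaccented else "describes"


def interpret(c: Compound) -> str:
    """Infer a reading from the stress pattern plus (toy) world knowledge."""
    if attribute_role(c) == "describes":
        if c.attribute in MATERIALS:
            return f"{c.head} made of {c.attribute}"
        return f"{c.head} described as {c.attribute}"
    relation = USE_RELATIONS.get(c.head, "subcategorized by")
    return f"{c.head} {relation} {c.attribute}"


print(interpret(Compound("steel", "warehouse", head_deaccented=False)))
print(interpret(Compound("steel", "warehouse", head_deaccented=True)))
```

The point of the sketch is that the two readings differ only in the value of head_deaccented; no separate underlying structure is stored for either one.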
Once again it is important to emphasize the relative or implicational nature of the prediction made by the analysis presented here. I believe it is in principle impossible to predict stress patterns in individual cases solely on the basis of the two lexical items involved, or solely on some underlying
syntactic relation between the two. The relevant factor is whether the attribute categorizes or merely describes the head; to determine that, we may have to consider individual cases against the background of other possible attributes or other possible heads. Both apple cake and steel warehouse represent B made of A, but in the case of cake, the fact that it is made of apple categorizes it, when compared to other possibilities, whereas for warehouse, the fact that it is made of steel only describes, especially when compared to other possible relations between the two lexical items warehouse and steel.
4. The foregoing analysis of stress patterns in compounds has several points of interest. First, it explains rather than merely describes the rough correlation between compound syntax and so-called compound stress. Second, it makes the description of English simpler, by removing compound stress from the cases to be covered under 'normal stress' and subsuming it under the independently needed rubric of deaccenting. Third, it tends to provide independent confirmation of analyses of compounds like Dowty's which have a relatively impoverished semantics and a richer pragmatics, and gives no support to generative models like those of Levi and Lees. Finally, it may be possible to turn the analysis around - as in the case of 'flavors' vs. 'categories' - and use it as a tool for investigating taxonomies and markedness relations in the structure of the lexicon. For all these reasons I think it provides some genuine new insight into an intractable old problem.6
6 Limitations of space make it impossible for me to do more than mention the existence of two complicating factors. First is the likelihood that any treatment of the semantics of compounds must distinguish between the 'ordinary' semantic opacity in a compound like, say, greenhouse, and the semantic opacity involved in what may best be described as idioms, such as white elephant, French letter 'condom' (so also a number of other expressions involving ethnic slurs), swan song, wallflower, etc. (Note that both stress patterns are found in these.) Levi (1978: 11-12) argues for just such a distinction in connection with the semantic opacity of compounds. The implications of this for the analysis presented here are not entirely clear. The second complication is that purely phonological factors are sometimes involved to at least some extent in determining compound stress patterns. At least two types of cases come to mind. First, there is a tendency to stress very long compounds farther to the right than might otherwise be expected (e.g. travel expense reimbursement voucher, not travel expense reimbursement voucher, or maple syrup container distributor, not maple syrup container distributor). Second, it is likely that the leftward shift in short, common compounds such as óatmeal and íce cream (which are still pronounced oatméal and ice créam by conservative speakers) is related to the general leftward shift in nouns in general (e.g. cígarette, still pronounced cigarétte by conservative speakers). One might say that such cases are being treated in effect as non-compounds. This explanation is entirely consistent with the fact that many monomorphemic words in present-day English are known to have arisen from earlier compounds (e.g. daisy < day's + eye, hussy < house + wife, sheriff < shire + reeve).
References
Chomsky, N. (1971). Deep Structure, Surface Structure, and Semantic Interpretation. In Steinberg & Jakobovits (eds.), Semantics: An Interdisciplinary Reader in Philosophy, Linguistics, and Psychology. Cambridge: Cambridge University Press. 183-216.
Chomsky, N., and M. Halle (1968). The Sound Pattern of English. New York: Harper & Row.
Downing, P. (1977). On the Creation and Use of English Compound Nouns. Language 53: 810-842.
Dowty, D. (1979). Word Meaning and Montague Grammar. Dordrecht: D. Reidel.
Halliday, M.A.K. (1967). Notes on Transitivity and Theme in English (Part II). Journal of Linguistics 3: 199-244.
Jackendoff, R. (1975). Morphological and Semantic Regularities in the Lexicon. Language 51: 639-671.
Kay, P., and K. Zimmer (1976). On the Semantics of Compounds and Genitives in English. Unpublished paper, University of California, Berkeley.
Kingdon, R. (1958). The Groundwork of English Stress. London: Longmans.
Ladd, D. R. (1980). The Structure of Intonational Meaning: Evidence from English. Bloomington: Indiana University Press.
Lees, R. B. (1960). The Grammar of English Nominalizations. IJAL 26: Publication 12.
Lees, R. B. (1970). Problems in the Grammatical Analysis of English Nominal Compounds. In Bierwisch & Heidolph (eds.), Progress in Linguistics. The Hague: Mouton. 174-186.
Levi, J. N. (1978). The Syntax and Semantics of Complex Nominals. New York: Academic Press.
Liberman, M., and A. Prince (1977). On Stress and Linguistic Rhythm. Linguistic Inquiry 8: 249-336.
Mencken, H. L. (1948). American Street Names. American Speech 23: 81-88.
Motsch, W. (1970). Analyse von Komposita mit zwei nominalen Elementen. In Bierwisch & Heidolph (eds.), Progress in Linguistics. The Hague: Mouton. 208-223.
Poutsma, H. (1914). A Grammar of Late Modern English, Part II. Groningen: Noordhoff.
Quirk, R., S. Greenbaum, G. Leech, and J. Svartvik (1972). A Grammar of Contemporary English. New York & London: Seminar Press.
Schmerling, S. F. (1976). Aspects of English Sentence Stress. Austin: University of Texas Press.
Trager, G., and H. L. Smith (1951). An Outline of English Structure. Norman, Oklahoma: Battenburg Press.
Wilson, D., and D. Sperber (1979). Ordered Entailments: An Alternative to Presuppositional Theories. In Oh & Dinneen (eds.), Syntax and Semantics, vol. 11 (Presupposition). New York: Academic Press. 299-323.
HANS-HEINRICH LIEB
A Method for the Semantic Study of Syntactic Accents*
0 Introduction
It has always been a vexed question in the study of syntactic accents ('sentence stress', 'contrastive accent' etc.) how to find a method by which the semantic effects of accents can be established beyond vague generalities. The adequacy of any method can be judged only after the following question has been answered: What ARE the semantic effects of syntactic accents? In a recent study on 'Accent and Meaning' (forthcoming) I adopt the following answer:
(1) For each occurrence of a syntactic accent in a sentence there is (i) a speaker belief associated with the occurrence whose content involves a propositional attitude of the hearer, and there may be (ii) a second doxastic attitude of the speaker (not necessarily belief) that is associated with the occurrence of the accent and whose content does not involve the hearer.
I have no space here to defend or further explain this view. Given this conception, I was confronted with the problem of finding a method by which the attitude/content pairs associated with accent occurrences could be established in a systematic way. I developed the method characterized in this paper, the METHOD OF DIALOGUE SCHEMATA, whose basic features are as follows. Expression (2) denotes a dialogue schema, or rather, set of dialogue schemata:1
(2) A. Ich bezweifle, daß die Frau gekommen ist.
B. (1) Der Mǎnn ist gekommen.
(2) Der Mǎnn ist gekommen, nicht die Frau.2
* Adapted from Sections 1.3 and 1.4 of Lieb (forthcoming).
1 In Lieb (forthcoming) the language in which syntactic accents are studied is German. I here keep the German examples.
2 A. I doubt that the woman has come. B. (1) It is the man who has come. (2) It is the man who has come, not the woman.
Very roughly, a dialogue schema is a pair of sets of sentences. The second set contains a sentence with AT MOST ONE accent occurrence and may contain expansions of this sentence. Each sentence of the first set renders explicit a certain propositional attitude. In an actual dialogue based on the schema, an utterance of a sentence of the second set may or may not be a 'proper response' to an utterance of a sentence of the first set. If it is, we may attribute to the speaker of the sentence from the second set the belief that the hearer has the attitude that is made explicit in the sentence from the first set. Moreover, if utterances of the expanded sentence and utterances of the expansions are proper responses to utterances of the same sentences, we conclude that the expansions only make explicit what is part of the meaning of the expanded sentence. If certain additional requirements are met, these meaning parts plus the previously established speaker belief are identified as semantic effects of the accent occurrence in the expanded sentence, or of lack of accent if there is NO accent occurrence. The method of dialogue schemata consists, very roughly, in constructing appropriate dialogue schemata and eliciting judgments from native speakers on the 'response compatibility' of sentences of the second set with sentences of the first set; these judgments are then used for hypotheses that eventually lead to hypotheses on the semantic effects of syntactic accents in a language.
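The rough characterization above may be easier to follow with a toy model. The following sketch rests on several assumptions that are not part of the method as stated: sentences are plain strings, the accent on Mann is written with capitals, the names DialogueSchema, responds_to, attributed_attitudes and explicating_expansions are invented, and the judgments table is hypothetical rather than elicited data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DialogueSchema:
    first: frozenset       # sentences that make a propositional attitude explicit
    centre: str            # sentence with at most one accent occurrence
    expansions: frozenset  # expansions of the centre


def responds_to(sentence, schema, proper_response):
    """Sentences of the first set to which `sentence` is a proper response."""
    return {a for a in schema.first if proper_response(sentence, a)}


def attributed_attitudes(schema, proper_response):
    """Attitudes that the speaker of the centre is taken to ascribe to the hearer."""
    return responds_to(schema.centre, schema, proper_response)


def explicating_expansions(schema, proper_response):
    """Expansions that are proper responses to the same sentences as the centre."""
    target = responds_to(schema.centre, schema, proper_response)
    return {b for b in schema.expansions
            if responds_to(b, schema, proper_response) == target}


# Schema (2); capitals stand in for the accent on "Mann" (an assumption).
A1 = "Ich bezweifle, daß die Frau gekommen ist."
B1 = "Der MANN ist gekommen."
B2 = "Der MANN ist gekommen, nicht die Frau."
schema = DialogueSchema(frozenset({A1}), B1, frozenset({B2}))

# Hypothetical judgments: both pairings are treated as proper responses.
judgments = {(B1, A1): True, (B2, A1): True}
proper = lambda b, a: judgments.get((b, a), False)

print(attributed_attitudes(schema, proper))
print(explicating_expansions(schema, proper))
```

In this toy setting the expansion B2 counts as merely making explicit part of the meaning of B1 exactly when it is accepted in response to the same first-set sentences as B1.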
1 The Method of Dialogue Schemata: Basic Concepts
1.1 The concept of dialogue schema
A language or language variety D is taken simply as a set of 'idiolects' (in a defensible sense), each with its own system or systems S.3 A pair (A,B) of sets is a DIALOGUE SCHEMA in an idiolect system S if, and only if, A is not empty, S is a system of a 'spoken' idiolect, and the following conditions (a) to (e) are satisfied.
Condition (a). A consists of STRUCTURED INTERPRETED SYNTACTIC UNITS (SISU) of S, that is, of triples (f,s,u) such that f is a syntactic unit or concatenation of units of S; s is a syntactic structure of f in S; and u is a meaning that f has in S given structure s and some assignment of word meanings to the words that occur in f. More specifically, A consists of interpreted structured sentences of S.
3 I will use variables of various kinds, all of them italicized letters with or without numerical subscripts. To keep the degree of formality as low as possible I will interpret the variables only by informal hints in the text. For the same reason concepts will be introduced by definitions that are formally as unassuming as possible.
Condition (b). B consists of STRUCTURED PARTLY INTERPRETED SYNTACTIC UNITS (SPISU) of S, that is, of triples (f,s,v) such that f and s are as before but v is not a meaning but a POSSIBLE MINIMAL BASE for meanings of f. This concept is to be understood as follows: A MINIMAL BASE for a meaning determines the referential, the propositional and some of the 'illocutionary force' aspects of the meaning. All other aspects are left unspecified; in particular, the semantic effects of accent occurrences as characterized in (1) are not represented in meaning bases. (For a definition, cf. Lieb forthcoming: [112c].) A POSSIBLE minimal base is an entity 'of the same formal type' as a minimal base but which, as a matter of fact, may not be a base for any meaning of the sentence. (Cf. Lieb forthcoming: [112b].) In contradistinction to the elements of set A, no element of B has a complete meaning as one of its components; it is exactly the completion of meaning bases by accent effects that we wish to study by means of dialogue schemata.
Condition (c). There is exactly one function e of the following kind.
(i) The arguments of e are parts f2 of syntactic units such that each f2 is an argument of e if and only if, for some (f,s,u) in A or some (f1,s1,v) in B, f2 is a 'primitive' constituent of f (of f1) relative to the syntactic structure s (s1). (A primitive constituent is one that doesn't contain any other constituents.)
(ii) If f2 is an argument of e, e(f2) is a lexical meaning of f2 in S.
(iii) If f2 and f3 are arguments of e and f2 and f3 are 'occurrences' of the same word or words, e(f2) = e(f3).
(iv) For each (f,s,u) in A, if e1 is the subfunction of e whose arguments are parts of f, then u is a meaning of f relative to f, s, e1, and S.
(v) For each (f1,s1,v) in B, if e1 is the subfunction of e whose arguments are parts of f1, then there is a meaning u1 of f1 relative to f1, s1, e1, and S.
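Clauses (iii) to (v) of Condition (c) can be illustrated with a small sketch. It is a toy model under assumptions of my own: occurrences are modelled as (sentence label, position, word) triples, lexical meanings as English glosses, and the function names is_constant and subfunction are invented. The gloss for frau is a hypothetical counterpart of the gloss 'adult human male' used for mann.

```python
def is_constant(e):
    """Check clause (iii): occurrences of the same word get the same meaning.

    `e` maps occurrences, modelled as (sentence_label, position, word),
    to lexical meanings.
    """
    meaning_of = {}
    for (_label, _position, word), meaning in e.items():
        if meaning_of.setdefault(word, meaning) != meaning:
            return False
    return True


def subfunction(e, label):
    """Restrict e to the occurrences in one sentence (cf. e1 in (iv) and (v))."""
    return {occ: meaning for occ, meaning in e.items() if occ[0] == label}


# Part of a lexical interpretation for schema (2); glosses are illustrative.
e = {
    ("A", 4, "frau"): "adult human female",
    ("B2", 6, "frau"): "adult human female",
    ("B1", 1, "mann"): "adult human male",
    ("B2", 1, "mann"): "adult human male",
}

print(is_constant(e))
print(sorted(subfunction(e, "B2")))
```

A lexical interpretation that assigned different meanings to the two occurrences of frau would fail is_constant and so could not serve as the function e of a dialogue schema.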
Intuitively, function e assigns lexical meanings everywhere in the dialogue schema where lexical meanings can be assigned. (On our conception there are no 'meaningless' primitive constituents but some constituents may have an 'empty' meaning.) The same word in different places is assigned the same lexical meaning (condition [iii]); this guarantees constancy of lexical meanings throughout the entire dialogue schema. The meaning of a sentence in A is based on the assignment of lexical meanings, and for the partly interpreted sentences in B it is possible to obtain sentence meanings based on the assignment of lexical meanings. (Conditions [iv] and [v]; the concepts of sentence meaning and meaning base are understood as in Lieb forthcoming: [100] and [112].) Condition (d). Every sentence in A has the syntactic-semantic form:
EGO ATTITUDE THAT p, where EGO is an expression of S for self-reference by 'the speaker', such as ich in German or a morphological marking of first person singular; ATTITUDE is a verb of S that denotes a propositional attitude such as English believe, doubt, want etc.; THAT is an expression of S like English that which introduces the formulation of a content of the attitude (in case there are such expressions); and p is a formulation of a content of the attitude. EGO and p are identical for all sentences in A.
Condition (e). There is exactly one (f,s,v) in B such that: (i) there is at most one accent occurrence in f given structure s, and (ii) every other element (f1,s1,v1) of B is a PERMISSIBLE EXPANSION of (f,s,v) in S in the following sense. There is a SPISU (f2,s2,v2) of S such that (f1,s1) = (f,s) 'preceded by' or 'followed by' (f2,s2); v1 is v 'plus' v2, where "plus" denotes a certain semantic operation on meaning bases; and for any meaning u, if v1 is a base for u and (f1,s1,u) is uttered, the speaker expresses a propositional attitude whose content is partly formulated by the (f2,s2,v2)-part of (f1,s1,v1) and that involves the addressee of the utterance only if the addressee is explicitly referred to. (The concept of permissible expansion is exemplified by [2] and [1] in [2B].)
In this informal definition of "dialogue schema" two unique entities have been assumed: By the LEXICAL INTERPRETATION in S of a dialogue schema in S we understand the unique function e postulated in (c) that assigns lexical meanings to words of the sentences of the schema. By THE CENTRE in S of a dialogue schema in S we understand the unique element (f,s,v) of its second component that satisfies condition (e).
1.2 Notation for sets of dialogue schemata
Expression (2) is not a dialogue schema but the orthographic name of a schema. More correctly, (2) denotes a SET of dialogue schemata (A,B) in some German idiolect system (here left unspecified); this set is defined as follows.
(The rest of this subsection, though important for theoretical reasons, is not needed for an intuitive understanding of [2] and may be omitted on a first reading.) Each orthographic word in (2) uniquely denotes a phonological word of the idiolect system. The left-to-right sequence of orthographic words in a single line of (2) denotes the sequence of phonological words denoted by the orthographic words. Each line of (2) - orthographic words, punctuation signs, and accent names ("v") taken together - denotes the set of pairs (f,s) such that f is denoted by the left-to-right sequence of orthographic words in the line and s is a syntactic structure of f in the idiolect system that is compatible with a fixed traditional interpretation of the punctuation signs, a presupposed syntactic analysis of the idiolect system, and a fixed interpretation of the accent names. Lack of accent names means lack of accents. ('Syntactic stress', which is different from accent, is left unmarked.) For each line of (2) there are semantic specifications in the context of (2) (in particular, the translations in fn. 2). These specifications partly determine the meanings or meaning bases allowed for each pair (f,s) that is in the denotation of the line. Each line of (2) TOGETHER WITH its semantic specifications denotes the set of triples (f,s,u), for a line in (2A), or the set of triples (f,s,v), for a line in (2B), such that (f,s) is in the denotation of the line and u (or v) is allowed for (f,s) by the semantic specifications of the line. It is assumed that all denotations are non-empty. Expression (2) excluding "A", "B", "(1)", and "(2)" denotes the set of dialogue schemata (A,B) in the idiolect system such that: (i) each element of A is a triple (f,s,u) in the denotation of a line in (2A), and for each line in (2A) there is only one triple in its denotation that is an element of A; (ii) each element of B is a triple (f,s,v) in the denotation of a line in (2B), and for each line in (2B) there is only one triple in its denotation that is an element of B. The expressions "(2)", "(2A)", "(2B)", "(2B[1])", "(2B[2])" (the last four without "2" where this is supplied by context) are THREEFOLD AMBIGUOUS. (i) They are used to refer to the name of the set of dialogue schemata and to its parts. (ii) They are used to refer to the sets denoted by the name or its parts.
(In this sense, "(2A)" denotes the set of first components of dialogue schemata that are in the denotation of expression (2); "(2B[1])" denotes the set of triples (f,s,v) that is denoted by line [1] in (2B), etc.) (iii) They are used to refer to individual elements of the sets denoted by the name or its parts. Disambiguation is by context.
A dialogue schema in an idiolect system S is of limited interest if there are no corresponding schemata in the systems of all idiolects of a given language or language variety.
1.3 Dialogue schemata for sets of idiolects
A dialogue schema in a single idiolect system may be representative of a whole set of idiolects in the sense that there are 'corresponding' schemata in the systems of all idiolects in the set. This idea will now be made more precise. Consider a structured interpreted unit (f,s,u) in an idiolect system S1. On our conception the syntactic structure s contains a 'constituent structure' and a 'marking structure' that are complex constructs of syntactic
categories of S1. Each category is a set; it may be a set of syntactic units of S1. For instance, Noun-Group-in-S1, the set of noun groups of S1, may be such a category. In a different idiolect system S2 we may again have Noun-Group-in-S2 as a category. The two categories are 'analogous' in an obvious sense: if we assume a general relation 'f is a noun group in S', then the two categories are the sets of first-place members of the relation that belong to S1 and S2, respectively, as second-place members. I shall assume concepts of analogy of the type '--- in S1 is ANALOGOUS to ... in S2', where S1 and S2 are any idiolect systems, "---" is any formal or semantic category of S1, and "..." any formal or semantic category of S2. It is impossible to explicate these concepts in the present context. Basically the explication is to proceed as indicated in our 'noun group' example. For each category of S1 there is at most one category in S2 to which it is analogous. Analogous categories may still be different sets. This may well be the rule. Therefore, structured sentences in different idiolect systems will normally be non-identical even if the idiolects belong to the same language: however similar a sentence (f1,s1,u1) of S1 may be to a sentence (f2,s2,u2) of S2, if the structures s1 and s2 involve the categories Noun-Group-in-S1 and Noun-Group-in-S2, respectively, and these are different sets, the nonidentity carries over to the structures s1 and s2, hence, to the two triples. On the other hand, the triples may be identical except for the differences between analogous categories. We informally introduce the following notion: Let S1 and S2 be idiolect systems. Let (f,s,u) be an interpreted structured syntactic unit of S1. THE S1/S2-VERSION of (f,s,u) = the triple (f1,s1,u1) obtained from (f,s,u) by replacing in f, s, and u each category of S1 by the analogous category in S2 if there is such a category; otherwise (f1,s1,u1) is an appropriately chosen 'empty' entity.
(The definition for SPISU's (f,s,v) is analogous.) Obviously, the S1/S2-version of (f,s,u) need not be a SISU of S2; similarly, for the S1/S2-version of (f,s,v). Even if the S1/S2-version of (f,s,v) is a SPISU of S2, it may be an 'unsuccessful' one; there may be no 'completion' of the S1/S2-version, in the following sense:
(3) Let (f,s,v) be a structured partly interpreted syntactic unit of S. (f,s,u) is a COMPLETION of (f,s,v) in S iff
a. (f,s,u) is a structured interpreted syntactic unit of S;
b. for all e, if u is a meaning of f relative to f, s, e, and S, then v is a base for u relative to f, s, e, and S.
(e is an assignment of lexical meanings to words occurring in f. For the concepts of meaning and meaning base, cf. again Lieb forthcoming: [100] and [112].)
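The notions of version and completion can be sketched in a toy model. Everything in the sketch is an illustrative assumption: triples are represented as tuples of labels, categories as strings, the 'empty' entity as None, and the quantifier 'for all e' in (3) is restricted to a finite list of assignments. The predicates is_sisu, is_meaning and is_base are placeholders that a real grammar would have to supply.

```python
EMPTY = None  # stands in for the 'appropriately chosen empty entity'


def version(unit, categories, analogue):
    """Toy S1/S2-version of a triple.

    unit:       triple of tuples (stand-ins for f, s, and u or v)
    categories: the items that count as categories of S1
    analogue:   maps a category of S1 to its analogous category of S2
    """
    result = []
    for component in unit:
        converted = []
        for item in component:
            if item in categories:
                if item not in analogue:
                    return EMPTY  # some category has no analogue in S2
                item = analogue[item]
            converted.append(item)
        result.append(tuple(converted))
    return tuple(result)


def is_completion(full, partial, is_sisu, assignments, is_meaning, is_base):
    """Definition (3), with 'for all e' restricted to a finite list."""
    (f, s, u), (f_p, s_p, v) = full, partial
    if (f, s) != (f_p, s_p) or not is_sisu(full):
        return False
    return all(is_base(v, u, e) for e in assignments if is_meaning(u, e))


unit = (("der", "mann"), ("Noun-Group-in-S1",), ("base",))
categories = {"Noun-Group-in-S1"}
analogue = {"Noun-Group-in-S1": "Noun-Group-in-S2"}

print(version(unit, categories, analogue))
print(version(unit, categories, {}))
```

The second call returns the empty entity because the only category involved has no analogue, which is the situation in which a schema fails to carry over to another idiolect system.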
Using the concepts of version and completion, the notion of dialogue schemata in an idiolect system may be extended to SETS of idiolects such as languages and their varieties:
(4) (A,B) is a DIALOGUE SCHEMA IN S FOR D iff
a. S is a system of an idiolect in D;
b. (A,B) is a dialogue schema in S;
c. for every system S1 of any idiolect in D,
(i) the S/S1-version of every (f,s,u) ∈ A is a structured interpreted syntactic unit of S1;
(ii) the S/S1-version of every (f,s,v) ∈ B is a structured partly interpreted syntactic unit of S1;
(iii) there is a completion in S1 of the S/S1-version of every (f,s,v) ∈ B.
Intuitively, definition (4) singles out those dialogue schemata in a given idiolect system that are REPRESENTATIVE of an entire set of idiolects, such as a language: schemata for whose sentences there are exact correspondences in the systems of every idiolect of the set. This holds not only for the (partly interpreted) sentences in the second component of the schema but also for the sentences in its first component; this is important in view of the use we wish to make of dialogue schemata: every speaker of the language is to be a competent judge for either part of the schema both as a hearer and as a speaker; he must be able to answer the question whether the sentences in the second part of the schema are 'response compatible' with the sentences in the first.
1.4 Proper dialogue schemata
The notion of 'response compatibility' is based on the notion of 'proper response'; this notion is taken as basic. The proper response relation holds between triples (V,u,V1), where V is a speech event produced by speaker V1 and u is a meaning of V intended by V1: 'V with u as produced by V1 is a proper response to V2 with u1 as produced by V3'. Proper responses will not be explicitly characterized except for the following informal assumption:
(5) Postulate for the proper response relation. If V with u as produced by V1 is a proper response to V2 with u1 as produced by V3, then
a. V1 correctly understands u1 with respect to V3; and
b. u is motivated by V1's understanding of u1 and by V1's beliefs about V3 that are based on V1's understanding of u1.
The notion of response compatibility is construed as follows:
(6) Let (f1,s1,u1) be a structured interpreted syntactic unit of S1 and (f2,s2,u2) a SISU of S2. (f1,s1,u1) in S1 is RESPONSE COMPATIBLE with (f2,s2,u2) in S2 iff for every V, V1, V2, and V3, if
a. V is a normal utterance by V1 of (f1,s1,u1) in S1,
b. V2 is a normal utterance by V3 of (f2,s2,u2) in S2,
c. V is a reaction by V1 to V2,
d. V1 addresses V to V3,
then V with u1 as produced by V1 is a proper response to V2 with u2 as produced by V3.
(For the concept of normal utterance, cf. Lieb 1979: Sec. 2.) 'Proper' dialogue schemata are, very roughly, those schemata in which every completion of each element of the second component is response compatible with each element of the first component, regardless of idiolect system:
(7) (A,B) is a PROPER DIALOGUE SCHEMA in S for D iff
a. (A,B) is a dialogue schema in S for D;
b. for every (f1,s1,u1), (f2,s2,u2), S1, S2, if
(i) S1 is a system of an idiolect in D,
(ii) S2 is a system of an idiolect in D,
(iii) (f1,s1,u1) is a completion in S1 of the S/S1-version of some element of B,
(iv) (f2,s2,u2) is the S/S2-version of some element of A,
then (f1,s1,u1) in S1 is response compatible with (f2,s2,u2) in S2.
Three different idiolect systems S, S1, and S2 may be involved in a proper dialogue schema. System S is the one to which the sentences in the two components actually belong. In a practical application, this may be a system of a researcher who is also a native speaker. When (7) and (6) are seen in combination, system S1 is associated with a speaker of (completions of S/S1-versions of) sentences in the second component, i.e. of the partly interpreted sentences that contain accent occurrences. S2 is associated with a speaker of (S/S2-versions of) sentences in the first component that make explicit propositional attitudes and to whose utterances the S1-speaker reacts.
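The quantifier structure of clause (7b) can be made explicit in a short sketch. It is a toy rendering under stated assumptions: the set of idiolect systems is finite, clause (7a) is taken for granted, and version, completions and compatible are hypothetical stand-ins for the notions of version, completion (3) and response compatibility (6); the function name is_proper_schema is likewise invented.

```python
from itertools import product


def is_proper_schema(A, B, systems, version, completions, compatible):
    """Clause (7b) over a finite set of idiolect systems.

    version(unit, system)      -> the S/system-version of a unit
    completions(unit, system)  -> completions of a partly interpreted unit
    compatible(b, s1, a, s2)   -> response compatibility judgment, as in (6)
    """
    for s1, s2 in product(systems, repeat=2):
        for b, a in product(B, A):
            a_version = version(a, s2)
            for full in completions(version(b, s1), s1):
                if not compatible(full, s1, a_version, s2):
                    return False
    return True


# Toy stand-ins: units are labels, versions and completions are tuples.
A = {"A"}
B = {"B1", "B2"}
systems = ["S", "S1", "S2"]
version = lambda unit, system: (unit, system)
completions = lambda partial, system: [partial + ("completed",)]

accept_all = lambda b, s1, a, s2: True
reject_b2 = lambda b, s1, a, s2: b[0] != "B2"

print(is_proper_schema(A, B, systems, version, completions, accept_all))  # True
print(is_proper_schema(A, B, systems, version, completions, reject_b2))   # False
```

The double loop over systems reflects the point made in the text that the speaker of the second-component sentence (S1) and the speaker of the first-component sentence (S2) may use different idiolect systems, both possibly distinct from S.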
2 Exemplification and Comments
2.1 Steps 1 to 5
Every method is characterized by a series of procedural steps. I will not describe these steps in abstracto but give a schematic example from which the general nature of the steps may be inferred. The starting-point will not be a single sentence (structured and partly interpreted) but a set of sentences. This is due to the fact that in studying accents we concentrate on the pitch properties of intonation. Intonation structures of accented sentences will therefore be specified only with respect to pitch. Since on our conception each syntactic structure contains an intonation structure component, individual syntactic structures of accented sentences cannot be completely specified. We therefore consider not an individual sentence but a set of structured (partly) interpreted sentences that differ in non-pitch properties of their intonation structures. Correspondingly, we must consider a set of 'equivalent' dialogue schemata, not a single schema.
Starting point: The German idiolect system S* and B* = the set of structured partly interpreted sentences (f,s,v) of S* such that:
(i) f = der mann ist gekommen;
(ii) s is such that 'downward contrastive accent' ˇ occurs on mann and no other accent occurs anywhere else;
(iii) f is a declarative sentence of S* relative to structure s;
(iv) v is a possible minimal meaning base that involves, very roughly, 'adult human male' as a lexical meaning of mann; 'come' in a literal sense as a lexical meaning of gekommen; and 'empty' meanings for der and ist;
(v) the meaning of der mann is such that the speaker refers to exactly one person in any normal utterance of (f,s,u) if v is a base for u.
(This is only a rough outline of a definition of "B*".)
Step 1. A set of dialogue schemata (A,B) in S* is defined such that (i) the centre of each (A,B) in S* (cf. end of Sec. 1.1 for "centre") is an element of B*; (ii) each element of B* is the centre of some (A,B) in the set; (iii) certain other conditions are satisfied which would make the set of schemata 'permissible' in the sense of Lieb (forthcoming: [129]). Let us accept (2) as a name of this set. Line (B1) in (2), "Der Mǎnn ist gekommen.", denotes the centre of each dialogue schema in the set.
Step 2. (2) is adopted as a set of dialogue schemata in S* for German, either after appropriate testing or on the basis of a more general assumption.
H.-H. Lieb
Step 3. The linguist selects native speakers of German (informants) and makes sure that they correctly understand the sentences of the schemata (i. e. correctly identify the S*/S-versions of the sentences in their own German idiolect systems S, including the semantic component of the versions).
Step 4. The linguist presents the informants with the pairs of sentences ((B1), (A)), ((B2), (A)) of each schema (cf. [2]) and elicits for each pair judgments on the response compatibility of the first sentence with the second. This step requires great care. The linguist may have taped normal utterances of each sentence by a (the) speaker of idiolect system S*, produced in succession for the sentences of each pair. This serves only for identification of the sentences (taping may already be required by Step 3); it must not be mistaken as production of a dialogue, which it is clearly not. If orthographic representation is chosen, it is CLASSES of sentence pairs that are presented (cf. Sec. 1.2); in this way 'irrelevant detail' can be abstracted from right at the beginning. (On the other hand, the linguist cannot be certain that no relevant detail is lost in the abstraction process.)
Elicitation of judgments. There are two different kinds of judgments that may be elicited. In the first case, each informant is made to pass judgments on the following question: Is every completion of the version of the/a (B)-sentence in the German idiolect system of the informant response compatible with the version of the/an (A)-sentence in every German idiolect system he knows of or can imagine? In the second case, the question is generalized: Is every completion of the version of the/a (B)-sentence IN EVERY German idiolect system he knows of or can imagine compatible with the version of the/an (A)-sentence in every such system? Naturally, the judgments cannot be elicited by asking such questions.
In the first case, an appropriate question might have a form such as: 'Imagine you are having a conversation with another German. He tells you at one point when this is appropriate: [replaying of recording of (A)-sentence, or joint characterization of all (A)-sentences]. Would it be just normal for you to continue by [replaying of recording of selected (B)-sentence, or joint characterization of all selected (B)-sentences], provided that this is what you believe and what you see fit to tell the other person, and tell him in this tone of voice?'⁴ In the second case, an appropriate question might be of the following form: 'Imagine two Germans are having a conversation. One tells the other: [as before, for (A)-sentence]. Would it be just normal for the other to continue by [as before, for (B)-sentence] provided that this is what he believes and what he sees fit to tell the first person, and tell him in this tone of voice?' After presenting one of the questions the two recordings or sentence characterizations may be repeated in succession.
⁴ The final qualification is intended to make the informant abstract from intonational differences between the (A)-sentences and (B)-sentences that are unrelated to accent manifestations.
What the native speakers judge is the existence or non-existence of a proper response relation between utterances of the sentences taken either as sentences of their own German idiolect systems or as sentences of arbitrary German idiolect systems. The "just normal" part of the questions must be a formulation that agrees as closely as possible with the postulated properties of the proper response relation. The judgments elicited from the speakers are recorded; they exemplify the EMPIRICAL DATA for the semantic method of dialogue schemata.
Step 5. The judgments are evaluated and a hypothesis is formed concerning the status of (2) as a set of PROPER dialogue schemata in S* for German. Suppose that (2) is accepted as such. We then proceed as follows.
2.2 Steps 6 and 7
As pointed out in Sec. 0, the relevance of the method of dialogue schemata depends on the theoretical assumption that the semantic effects of syntactic accents are as postulated in (1), i. e. consist in the contribution of attitude/content pairs to sentence meanings. The next step exemplifies how such pairs may be identified.
Step 6. The status of (2) as a set of proper dialogue schemata in S* for German is used to identify attitude/content pairs of the following kind:
a. they are as required in (1i) or (1ii);
b. they may be posited only on the hypothesis that (2) is a set of proper schemata;
c. they do not involve any lexical meanings not present in B* (accents do not introduce lexical meanings).
There are at least two pairs that satisfy these conditions:
(8) a. Belief/there is a non-man whose coming is doubted by the hearer.
b. Belief/there is a non-man who has not come.
These pairs are established as follows (The following formulations are informal. A more precise formulation would heed the distinctions made in definitions [7] of "proper dialogue schema" and [6] of "response compatible". All utterances are to be 'normal' ones, cf. [6].)
Consider any dialogue schema in (2). In any utterance of (A) the speaker explicitly expresses the propositional attitude of doubt towards the coming of the person to whom he is referring by die frau in his utterance.⁵ Assume an utterance of (B2) that is a reaction to the utterance of (A) and is addressed to its speaker. As (B2) is response compatible with (A), the speaker of (B2) correctly understands the meaning of the utterance of (A) (cf. [5]). The speaker of (B2) is therefore entitled to the belief that the speaker of (A) doubts the coming of the person he is referring to by die frau in his utterance, and the speaker of (B2) will normally have this belief. On a proper construal of the proper response relation it should follow from the compatibility of (B2) and (A) that the speaker of (B2) refers by die frau in his utterance to the same person that the speaker of (A) refers to by die frau in his. By uttering (B2) the speaker of (B2) expresses his belief that this person is a woman. He will therefore normally have the belief that there is a woman whose coming is doubted by the speaker of (A), who is addressed by the speaker of (B2). It is obvious from the utterance of (B2) that the speaker believes that no woman is a man. Thus, in his utterance of (B2) the speaker will normally have the belief that there is some non-man whose coming is doubted by the hearer, which takes us to the attitude/content pair (8a) with respect to (B2). Now (B1) is also response compatible with (A). We conclude that the speaker of (B1) will normally have the belief that there is some non-man whose coming is doubted by the hearer also in an utterance of (B1) that is a reaction to an utterance of (A). We thus associate the attitude/content pair (8a) with (B1). (8b) is connected with (B2) in an obvious way without involving (A), and is then associated with (B1) by the argument used for (8a).
The attitude/content pairs in (8) are all we can get on the basis of dialogue schemata (2), and (8) is obviously not yet satisfactory. As a matter of fact, an improved version of (1) would have ruled (2) out to begin with. First, (2) is too specific with respect to the propositional attitude of the hearer. We should use only dialogue schemata that allow the checking of arbitrary propositional attitudes (Step 4). This may seem hard to achieve, given an indefinitely large number of propositional attitudes; a solution to this problem is indicated in Lieb (forthcoming: Sec. 5.3). Second, the schemata in (2) may be incomplete with respect to relevant expansions of their centres; there may be expansions that justify a belief that is stronger than (8b), i. e. whose content has (8b) as a logical consequence. The dialogue schemata in (2) are only an abbreviation of the schemata in Lieb (forthcoming: [135]), and these are indeed satisfactory. They establish two different attitude/content pairs which it is more reasonable to associate with (B1) in (2).
⁵ We assume a meaning of die frau such that the speaker refers to exactly one object by die frau in any normal utterance of the sentence.
We have not yet shown that the attitude/content pairs (8), if accepted, should be taken as semantic effects of the accent occurrence on mann. For this, an additional step is required.
Step 7. The pairs identified in Step 6 are compared with the pairs obtained by applying Steps 1 to 6 to some set Bi of triples (f,s,v) such that: f = der mann ist gekommen; s is a structure identical with a structure that f has in B* except that given s, either mann has an accent other than downward contrastive accent or a word other than mann has downward contrastive accent; and v is as in B*. Put in a nutshell, we change either the accent or the accent place and see what happens. If different attitude/content pairs are obtained, both the old and the new pairs are tentatively taken as semantic effects of the relevant accent occurrences. Otherwise, Step 7 is repeated.
This concludes our exemplification of the method. Generally, the method is applied in seven steps, referred to as 'Step 1', 'Step 2' etc., which are exemplified by the seven steps in the above example. Various aspects of the method are further developed in Lieb (forthcoming: Sec. 5.3).
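The systematic variation required by Step 7 (change either the accent or the accent place) can be pictured with a small enumeration. This is a sketch only: the function name is invented here, and the accent inventory passed in as `other_accents` is a hypothetical placeholder, since the text does not list the accents of the idiolect system.

```python
def step7_variants(words, accented_word, accent, other_accents):
    """Enumerate accent assignments that differ minimally from the starting point.

    Each variant is a dict mapping the single accented word to its accent.
    Either the accent on `accented_word` is replaced by another accent,
    or the original accent is moved to a different word.
    """
    variants = []
    for other in other_accents:                 # same place, different accent
        variants.append({accented_word: other})
    for word in words:                          # same accent, different place
        if word != accented_word:
            variants.append({word: accent})
    return variants


WORDS = ["der", "mann", "ist", "gekommen"]
# "accent X" and "accent Y" are placeholders, not accents named in the text.
VARIANTS = step7_variants(WORDS, "mann", "downward contrastive",
                          ["accent X", "accent Y"])
for variant in VARIANTS:
    print(variant)
```

Each variant would then be run through Steps 1 to 6 again, and the resulting attitude/content pairs compared with those obtained for the starting point.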
2.3 Comments
Comment 1. The method is dependent both on theory and on empirical fact; it is sound if the semantic effects of syntactic accents are correctly viewed as in (1).
Comment 2. The method is not a discovery procedure, for a number of reasons.
(i) The initial choice of dialogue schemata is crucial. No rigorous rules were formulated to govern this choice. Further restrictions are put on the initial set of schemata in Lieb (forthcoming: Sec. 5.3.4), but even they do not guarantee unique choices.
(ii) The basic relation of proper response requires further study and explication. The less it is restricted the greater are the uncertainties in actual application of the method.
(iii) The results of Steps 1 to 6 must be taken as tentative until all syntactic accents have been studied for all types of accent occurrences. Again, there is no discovery procedure for establishing the overall set of attitude/content pairs.
Comment 3. The attitude/content pairs associated by Step 6 with an EXPANSION of a dialogue centre may not automatically be associated with
the CENTRE: I have not been able to completely exclude the possibility of expansions that introduce 'extraneous' attitude/content pairs which as a matter of fact should not be included in meanings of the centre.
(i) An attitude/content pair is NOT associated with the set of centres if it is ruled out by the following test: For any completion of a centre, any utterance of the completion by the speaker, and any response by the HEARER to the utterance, if the hearer assumes in his utterance that the speaker has the attitude towards the content, then the HEARER's utterance is not a proper response to the speaker's utterance.
(ii) An attitude/content pair IS associated with the set of centres if it passes the following test: For any completion of the centre, and any utterance of the completion by the speaker, there is a proper response by the HEARER to the utterance in which the hearer assumes that the speaker has the attitude towards the content.
(Both conditions [i] and [ii] are sufficient ones. The two conditions extend, so to speak, dialogue schemata by allowing not just for a simple exchange A-B but for an exchange A-B-A.)
Comment 4. The method does not entail the view that the semantic effects of syntactic accents are exclusively 'discourse phenomena'. On the contrary, it accepts view (1) by which the semantic effects of an accent occurrence essentially consist in contributing attitude/content pairs to sentence meanings. The method exploits the possibility of discourses of a special type. A speaker uses a sentence with a meaning by which the speaker must have a certain belief concerning a hearer attitude, and he uses this sentence AFTER an explicit statement by the hearer that he (the hearer) actually has this attitude. We may thus get at necessary speaker assumptions by means of a discourse. However, the assumptions are required by the meaning of the speaker's sentence; they are not required by the hearer's previous utterance.
Put differently, we study phenomena in the domain of sentence meaning by means of discourses; these phenomena do not become discourse phenomena by the fact that they can be so studied. True enough, they are extremely important for the structuring of discourse but this does not force us to exclude them from sentence meaning. I take the position that discourse structure should be partly explained by sentence meaning; my conception of sentence meaning is construed accordingly. Comment 5. The method of dialogue schemata would not be suited for isolating attitude/content pairs of the speaker that are not expressed in proper responses to utterances of the addressee. However, I have been unable to discover such pairs. The most likely candidates would be pairs associated with 'all-new' utterances (where the speaker assumes that he is saying something completely new to the hearer); it turns out that this situation is covered by the method (cf. Lieb forthcoming: [175], Comment).
Comment 6. The method combines elements of two different informal approaches to the study of accents that have been frequently used in the literature. One is the investigation of non-contrastive accents by question-answer pairs, the other the study of contrastive accents via expansions of the sentences in which they occur. It can now be seen why these methods can be partly but not completely successful. A question of the usual kind is a request for additional information on what is already partly known or believed to be true. In asking a question, the speaker expresses propositional attitudes and their contents which are connected either with the request for information or with the speaker's doxastic background. The person who answers the question may assume that the addressee of his answer has the attitudes expressed by the question and may in turn express this assumption by using a certain accent. Thus, a question-answer pair may be used to establish attitude/content pairs as assumed in (1i). On the other hand, only a small number of propositional attitudes are associated with questions, which disqualifies the question-answer method as the only method for accent studies. Similarly, a speaker may express explicitly by an expansion of a sentence an attitude/content pair that would remain implicit in an utterance of the sentence. But such expansions cannot make explicit the attitudes with addressee-oriented content that are studied by the question-answer method; thus, the expansion method is also inadequate as the only method used.
Comment 7. The method of dialogue schemata requires systematic variation of accents and accent places over given syntactic units or concatenations of units (Step 7). This limits the usefulness of studying recordings of spontaneous speech: even with a huge data collection, a study based on such material can only approximate the systematic variation required by Step 7.
On the other hand, recordings of elicited material may be used in Steps 3 and 4 (presentation of sentence pairs), and recordings of actual discourse are useful for evaluating hypotheses on the overall semantic effect of a given accent. Comment 8. The method allows for the following situation AS A LIMITING CASE. (i) In Step 1 the linguist, who is a native speaker of D (the language or language variety investigated), defines a set of dialogue schemata whose sentences belong to a system S of an idiolect in D that is an idiolect of the linguist. (ii) In Step 2 the set is adopted as a set of dialogue schemata in S for D. (iii) In Step 3 the linguist is the only native speaker of D to be selected. Steps 4 and 5 are then applied as before, and Steps 6 and 7 remain unaffected; it is simply the case that the empirical data (speaker judgments) are
obtained in a particular situation in which the linguist may be tempted to make the data fit the theory. To minimize this danger, presentation of sentences, elicitation of judgments, and recording of judgments must be carefully kept apart, and intuitions concerning the semantic effects of accent occurrences that the linguist may have as a native speaker of the language must be recognized AND DISCARDED. It is a grave mistake to allow linguistic intuitions, which may have heuristic value in the very first stages of theory formation, to interfere with linguistic judgments in a situation of controlled data collection.
References
Lieb, H. (forthcoming). Accent and meaning. A study of syntactic accents, stress, and rhythm, with special reference to German.
Lieb, H. (1979). The universal speech function. A functional account of the relation between language and speech. In: Ezawa, K., Rensch, K. H.: Sprache und Sprechen. Festschrift für Eberhard Zwirner zum 80. Geburtstag. Tübingen: Niemeyer. 185-194.
HELMUT RICHTER
An Observation Concerning Intensity as a Predictable Feature of Intonation
The present pilot study has its main interest in the applicability to intensity of methods such as the trend technique, and in their descriptive power. This technique consists in finding a straight line which is optimal according to Legendre's criterion of least squares¹. It is most closely related to the theory of Bravais-Pearson's correlation coefficient² and therefore involves all the problems of adequacy in cases of a curvilinear relation between two variables, well known in behavioural statistics. The focus of the study will be on the empirical data rather than on reflexions on statistical methodology. Here, however, it must be said that (1) the incomplete fitting of a mathematical function is less importantly suspicious on account of its suggestion of a structure where there is none, than on account of its liability to fail to grasp a structure; (2) the relevant acoustic information about intonation may be "coded" in features of the speech signal which lie beyond uni-directional processes (involving either an increase or a decrease of one parameter). I can now add that fitting straight trend-lines to intonation curves really does seem to do a good job for purposes of automatic phonetic analysis (Rietveld and Boves, 1979, Takefuta, 1979). This illustrates the first point above, for there is no original straightness in, for example, the F0-curve. Moreover, it can be said that intensity is (re-)gaining considerable interest in intonation research. As to the above point (2), inspection of the material available to me soon led me to the question whether the relation between the slopes of the trend-lines preceding and succeeding an intensity peak might be indicative of the dynamics of speech.
First branch is introduced to refer to the part of the intensity curve which precedes an absolute maximum (peak), the curve being the graph of a grosso modo monotonically rising function, or, on the other hand, to the straight line adapted to that curve section, and second branch to refer to the graph of the grosso modo monotonically falling function or straight line following the peak. Then it can be stated that
- configurations with a first branch slope greater in its amount than that of the second branch will show the intensity peak after less than half of the total time involved or, in short, will be left-asymmetrical, while
- configurations with a second branch slope greater in its amount than that of the first branch will show the intensity peak after more than half of the total time involved or, in short, will be right-asymmetrical (see Fig. 1).
¹ The present applications of the technique have a classical forerunner in 'phonometry', a statistically oriented phonetics of the thirties (Zwirner and Zwirner, 196611, p. 186-188).
² For a detailed discussion of this theory with respect to phonometrical applications see Richter, 1974.
Figure 1: Left-asymmetry and right-asymmetry
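The comparison of branch slopes can be illustrated with a short sketch. It fits a least-squares straight line to the samples up to the peak and another to the samples from the peak onwards, then compares the amounts of the two slopes. The function names and the tolerance `tol` for calling a curve symmetrical are assumptions made for this illustration; the study itself relied on visual classification by two judges rather than on computed slopes.

```python
def least_squares_slope(ts, ys):
    """Slope of the straight line minimizing the sum of squared deviations."""
    n = len(ts)
    mean_t = sum(ts) / n
    mean_y = sum(ys) / n
    s_tt = sum((t - mean_t) ** 2 for t in ts)
    s_ty = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys))
    return s_ty / s_tt


def classify_asymmetry(ts, ys, tol=0.1):
    """Return 'L', 'R' or 'O' from the slopes of the first and second branch.

    `tol` (a hypothetical parameter) is the relative slope difference below
    which the configuration is treated as symmetrical.
    """
    peak = max(range(len(ys)), key=ys.__getitem__)   # index of absolute maximum
    if peak < 1 or peak > len(ys) - 2:
        raise ValueError("each branch needs at least two samples")
    first = abs(least_squares_slope(ts[:peak + 1], ys[:peak + 1]))
    second = abs(least_squares_slope(ts[peak:], ys[peak:]))
    if abs(first - second) <= tol * max(first, second):
        return "O"
    return "L" if first > second else "R"


TIMES = list(range(9))
EARLY_PEAK = [0, 15, 30, 25, 20, 15, 10, 5, 0]    # steep rise, slow fall
LATE_PEAK = list(reversed(EARLY_PEAK))            # slow rise, steep fall
CENTRAL_PEAK = [0, 10, 20, 30, 40, 30, 20, 10, 0]
for curve in (EARLY_PEAK, LATE_PEAK, CENTRAL_PEAK):
    print(classify_asymmetry(TIMES, curve))
```

With curves that start and end at the same level, a steeper first branch necessarily places the peak before the midpoint, which is the relation between slope and peak position stated in the text.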
It is obvious that these "polar" types of configuration are allied to visual perception in terms of gestalt. Being immunized (as a disciple of Eberhard Zwirner) against any idolatry of curves, I abbreviated the procedure by letting two subjects classify the events in the material³ according to whether they were
l visually left-asymmetrical,
r visually right-asymmetrical, or
n neutral (undecidable or visually symmetrical).
The instruction explicitly referred to the technique of least-square straight lines; nonetheless other stimulus properties, such as, for example, the area under the curve, may have influenced the judges. This open situation seems by no means invidious. Given that it is useful to examine the relation between pre-peak and post-peak intensity in terms of types of asymmetry, it is quite unsettled whether a comparison of slopes or a global visual response corresponds best to the relevant device in auditory perception. In the light of the initial point (1), one could even argue that judgements based on gestalt perception do not involve the fitting of a peculiar function, or that if fitting a straight line to a curve does not obscure a structure, reacting to the whole configuration will not have this effect either.
³ In fact my wife (a mathematician who segmented and measured registrations and wrote transcriptions for Zwirner's institutes for years) and myself were these judges.
Table 1: Two subjects' judgement of intensity-type
                    Judge 2
Judge 1        l       r       n       sum
l            201       2      25       228
r              3      77      22       102
n             22       9      79       110
sum          226      88     126       440 (number of curves to be judged)
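The convention adopted in this study, which gives priority to the non-neutral judgement of a pair, can be checked against the counts of Table 1 with a few lines of code. This is a sketch: the dictionary layout and the function name are choices made here, while the cell counts are those of the table, keyed as (judge 1, judge 2).

```python
# Cell counts of Table 1, keyed as (judgement of judge 1, judgement of judge 2).
TABLE_1 = {
    ("l", "l"): 201, ("l", "r"): 2,  ("l", "n"): 25,
    ("r", "l"): 3,   ("r", "r"): 77, ("r", "n"): 22,
    ("n", "l"): 22,  ("n", "r"): 9,  ("n", "n"): 79,
}


def combine(judgement_1, judgement_2):
    """Map a pair of judgements to one curve type, non-neutral votes first."""
    votes = {judgement_1, judgement_2} - {"n"}
    if votes == {"l"}:
        return "L"
    if votes == {"r"}:
        return "R"
    return "O"          # (n,n) or the contradictory pairs (l,r) and (r,l)


CURVE_TYPES = {"L": 0, "R": 0, "O": 0}
for (j1, j2), count in TABLE_1.items():
    CURVE_TYPES[combine(j1, j2)] += count

CONTRADICTORY = TABLE_1[("l", "r")] + TABLE_1[("r", "l")]
print(CURVE_TYPES, CONTRADICTORY, sum(TABLE_1.values()))
```

Because the contradictory pairs are few, sending them to the symmetrical class changes the totals very little compared with any other treatment of them.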
The degree of agreement between the two judges is shown in Table 1. Only very few judgements came out as contradictory (pairs (l,r), (r,l)): 5 pairs (about 1 percent) out of 440. There was little doubt that the neutral category was mainly applied to visually symmetrical configurations rather than in order to express the judges' being at a loss. Accordingly, the visual typology might be a triple one. The convention used in this pilot study in order to have one and only one specification for every curve somewhat arbitrarily gives priority to the non-neutral judgements. So (l,l), (l,n), and (n,l) were set equivalent to yield the left-asymmetrical or L-curves (248); (r,r), (r,n), and (n,r) were set equivalent to yield the right-asymmetrical or R-curves (108); similarly (n,n), (l,r), and (r,l) were set equivalent to yield the symmetrical or O-curves (84)⁴. The material used to test the supposition that there is some interest in the relation between pre-peak and post-peak intensity in terms of L, O, and R consisted in the intensity registrations of 256 utterances of the German ja and 184 utterances of the German nein by 30 subjects (19 male and 11 female students at the University of Bonn) under the following experimental conditions⁵:
In experiment I 10 subjects were given a set of 12 standardized yes-no questions, with the instruction to invent for each one a dialogue consisting of
A's question (standardized) → B's answer (ja or nein) → A's comment.
The performance of the dialogue (as a monologue, but 'as in radiodrama') was tape-recorded in a studio, which resulted in 10 × 12 = 120 utterances of ja and nein.
In experiment II 2 × 10 subjects were given one of two sets of 16 standardized comments, with the instruction to use each one in a fictitious dialogue consisting of
A's question ← B's answer (ja or nein) ← A's comment (standardized).
The remaining 2 × 10 × 16 = 320 elements of the present corpus were obtained by tape-recording these dialogues.⁶
⁴ In a more detailed investigation special attention would have to be paid to judge-specific tendencies. Obviously judge 1 was polarizing more sharply than judge 2. Another critical point is the obvious violation, as a result of our convention, of a consistent order #(l) > #(n) > #(r) in the diagonal and in both the marginal column and marginal line of Table 1 notwithstanding the difference in individual tendencies. At present, however, the only technical alternative to giving the non-neutral judgements priority would have consisted in haphazardly splitting the mixed judgement pairs such as (l,n).
Standardization of the experimental stimuli (questions or comments, respectively) was as follows: It was assumed that questions can be interest-changing, that is tend to alter the selective concern of the person asked (B) about aspects of his "world". As to experiment I, we reduced these aspects to two - the person asking (A) and one section or event of the environment - reducing the presumed continuum of degrees of interest to the polar values o (offen, open) and a (abgeschlossen, closed), and suppressed merely stabilizing questions (with zero-change). This results, combinatorially, in the 12 characteristics of questions, listed in Table 2 in the order in which they were given to the subjects. Here the first pair represents what we shall term B's inner situation presupposed by A, while the second pair represents what A intends to be B's inner situation. Within these pairs a value o or a in first position refers to B's interest in the environmental event, a value o or a in second position to B's interest in A.
(So (ao, aa) is to mean that A presupposes that B is uninterested in the relevant section of the environment but interested in A himself, A intending B to lose his interest in A as well.) The alteration
⁵ I did the experimental work at the Institut für Kommunikationsforschung und Phonetik (IKP), University of Bonn, together with Rainer Seidel and Dirk Wegner. Another colleague of the IKP, Bruno Fritsche, provided us with pitch and intensity registrations of the yes-no answers. The experiment, not primarily undertaken for intonation purposes, was sponsored by the Deutsche Forschungsgemeinschaft, Bonn-Bad Godesberg. Intensity was analyzed with an integration time of 10 ms and an input resistance of 500 Ohm, and registered with a speed of 5 cm/s on a logarithmic scale ranging from 0 to 40 dB.
⁶ In both experiments there was a second run prescribing nein where the subject had originally chosen ja, and ja where he or she had originally chosen nein. The data in the present paper are from the first run only. Experiments I and II in a sense complete the design of Richter, 1967 and Seidel, this design being A's question - + (+ Vv)]'
n being the number of cases (120 or 320, respectively) and h and v being the number of lines and columns¹⁰. A question to be answered before direct reference is made to the a/o-characteristics and alteration formulae seems to be: can the asymmetry-type be predicted better when one knows to what degree the item in question tends to make the subjects choose ja rather than nein in their simulated dialogues?
a) Asymmetry-type and ja-affinity
The items were trichotomized for each of the experiments (m: number of subjects responding with ja) into
ja-affinity weak: m < 4
ja-affinity medium¹¹: 4 ≤ m < 8
ja-affinity strong: m ≥ 8
These criteria have the result that 'weak' and 'strong' comprise the extreme quartiles in experiment I and the extreme 6 items (= 19.4 percent each) in experiment II; thus 'medium' comprises the middle half of ranked items in experiment I and 61.3 percent in the 'middle' of the ranking order of items of experiment II. Table 5 combines the four contingency tables to be distinguished (two experiments, ja and nein). Each contingency table gives the distribution of ja- or nein-occurrences upon pairs of asymmetry-type and level of ja-affinity. Percentages adding up to 100 per column are given in parentheses.
Table 5: Asymmetry-type vs. ja-affinity
Experiment I, ja
         weak          medium        strong        sum
R         2 (22.2)     13 (39.4)      6 (25.0)      21
O         4 (44.4)     11 (33.3)      8 (33.3)      23
L         3 (33.3)      9 (27.3)     10 (41.7)      22
sum       9 (99.9)     33 (100.0)    24 (100.0)     66

Experiment I, nein
         weak          medium        strong        sum
R         1 (4.8)       1 (3.7)       - (0)          2
O         2 (9.5)       2 (7.4)       - (0)          4
L        18 (85.7)     24 (88.9)      6 (100.0)     48
sum      21 (100.0)    27 (100.0)     6 (100.0)     54

Experiment II, ja
         weak          medium        strong        sum
R        11 (52.4)     34 (29.6)     24 (44.4)      69
O         4 (19.0)     23 (20.0)     10 (18.5)      37
L         6 (28.6)     58 (50.4)     20 (37.0)      84
sum      21 (100.0)   115 (100.0)    54 (99.9)     190

Experiment II, nein
         weak          medium        strong        sum
R         4 (10.3)     10 (11.8)      2 (33.3)      16
O         8 (20.5)     11 (12.9)      1 (16.7)      20
L        27 (69.2)     64 (75.3)      3 (50.0)      94
sum      39 (100.0)    85 (100.0)     6 (100.0)    130

¹⁰ The reader is assumed to be familiar with the χ²-procedure applied to contingency tables. Concerning the correlation coefficient as a measure of the goodness of estimating or predicting values of one variable on the basis of given values of another variable, see e.g. Richter, 1967, p. 334/5.
¹¹ The double criterion value was applied to item 17 of the comments (20 subjects).

Figure 2
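For readers who wish to retrace the χ²-procedure on the contingency tables of Table 5, the following sketch computes the Pearson chi-square and its degrees of freedom for the ja-table of experiment I. The function `corrected_contingency` is an assumption: it implements the common corrected contingency coefficient as one possible correlation-like estimate, and the formula actually used in the study is not reproduced in this section. Recomputed values need not coincide exactly with the printed ones.

```python
from math import sqrt


def chi_square(table):
    """Pearson chi-square, degrees of freedom and n for a table of counts."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(row_sums)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, n


def corrected_contingency(chi2, n, lines, columns):
    """Contingency coefficient divided by its maximum (an assumed estimate)."""
    k = min(lines, columns)
    return sqrt(chi2 / (chi2 + n)) / sqrt((k - 1) / k)


# Lines R, O, L; columns weak, medium, strong (experiment I, ja).
JA_EXPERIMENT_I = [
    [2, 13, 6],
    [4, 11, 8],
    [3, 9, 10],
]
CHI2, DF, N = chi_square(JA_EXPERIMENT_I)
print(round(CHI2, 4), DF, N, round(corrected_contingency(CHI2, N, 3, 3), 2))
```

With df = 4 the threshold for significance at P = .05 is 9.49, so a table whose chi-square stays below that value is reported as not significant.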
The correlation estimates are:
ja, experiment I: .23 (χ² = 2.3771, df = 4; n.s.)
nein, experiment I: .15 (χ² = .8687, df = 4; n.s.)
ja, experiment II: .23 (χ² = 6.8345, df = 4; n.s.)
nein, experiment II: .22 (χ² = 4.1316, df = 4; n.s.)
We shall use these values in order to assess gains (or losses) in predictability on the basis of different variables¹². Figure 2, where use has been made of the percentages in Table 5, reveals systematic variation of the intensity
¹² None of the chi-squares is significant, even though the tables for nein give rise to rather small 'expected values' in some cells which might lead to overestimations of the respective deviation from the 'observed values'. According to Lorenz's rule of thumb for evaluating single chi-square summands (comparing the summand with a criterion obtained by dividing the threshold value for significance, here for P = .05: 9.49, by the number of (inner) cells of the table, here: 9), in the table for ja, experiment II, (R, weak) can be said to be overrepresented more than accidentally, (R, medium) and (L, weak) can be said to be underrepresented more than accidentally; (R, strong) and (L, medium) (overrepresentation) and (L, strong) (underrepresentation) come close to fulfilling the criterion:
An important technical detail is that (non-)significance of a contingency table cannot simply be transferred to the derived correlation estimates. For the tests a, b, and c it should be kept in mind that a "factual" correlation would differ significantly from 0
in experiment I (df = 64): at the 5-percent level if it was .24 or greater, at the 1-percent level if it was .35 or greater
in experiment II (df
Table 9 (values as extracted; cell layout not recoverable):
0 00 -.13 +.20 +.22 -.14 sum of column 0
'intended' aa oa ao 4 0 1 2
8 +.50 -.05" 7 +.33 -.07' 6 s 5 -.50 -.33 +.75 -.08 0 -.57 0 +.43 0 -.14' 4 -.14 -.10 0 -.14 0 -.24' 3 -.33 +.13 -.40 -.60 -1.20' 2 -.57 -.50 -1.47' 1 -.40 0 -.40 -1.50 -.03 3 +.73> 0 -.47 -.86 -.10 +.22 0 +.13 -.33 0 -.13 +.20 +.44
sum of line ph aa 4 -.75 -.97 1 2 +.70 3 4
In Table 9 μ is given (inner cells) as a function of (alteration formula, 'presupposed') and of (alteration formula, 'intended') for the questions, and of (alteration formula, 'presupposed'), (alteration formula, 'intended'), and (alteration formula, 'resulting') for the comments.15 The lines and columns were ordered with increasing sums of μ. They were accordingly given
15 Item 1 was evaluated by m_μ = -.08.
Η. Richter
rank numbers ξ_h and ξ_v, and ξ was defined as the sum of these numbers for a cell:
ξ_hv := ξ_h + ξ_v (4)
Some correlation between μ and ξ must lead to a concentration of negative μ in the lower left area (low ξ_h, ξ_v combining to low ξ) and a concentration of positive μ in the upper right area (high ξ_h, ξ_v combining to high ξ) of the matrices. Cells with negative μ are marked by small triangles; so it can be seen empirically to what degree the concentration takes place16. As a result, we have a considerable further increase of coefficients:
'presupposed', experiment I r_μξ = +.80 (s. at 1-percent level, df = 10)
'intended', experiment I r_μξ = +.40 (n.s.)
(N = 12; m_ξ = 4.5, s_ξ = 1.3844; m_μ = -.016, s_μ = .3654)
'presupposed', experiment II r_μξ = +.46 (s. at 1-percent level, df = 29)
(N = 31; m_ξ = 6.936, s_ξ = 2.5645; m_μ = -.079, s_μ = .3566)
'intended', experiment II r_μξ = +.46 (s. at 1-percent level, df = 29)
'resulting', experiment II r_μξ = +.51 (s. at 1-percent level, df = 29)
(N = 31; m_ξ = 6.903, s_ξ = 2.5318 [different location of the empty cell for item number 1], m_μ and s_μ as above)17
What has been observed then, concerning the predictability of intensity patterns of ja on the basis of our interest-changing questions and comments on answers to such questions, is a general tendency for quantitative expressions of predictability (correlation coefficients |r| and r_μξ) to increase when the number of non-specified components of the vectorially
16 The correlation indeed depends on an order in the data, which can be shown with the following possible matrix where the technique would obviously fail:
-1 +1
+1 -1
0 0
0
0
0
17 Since the numbers of responses per item are small, μ might have come out pseudo-exact. Therefore correlations r_ξμ' were also calculated, μ' being round values attributed to the elements of the equivalence classes of positive μ (μ' = +1) and negative μ (μ' = -1); for μ = 0, μ' = 0. The outcome is very similar:
'presupposed', experiment I r_ξμ' = +.73
'intended', experiment I r_ξμ' = +.45
'presupposed', experiment II r_ξμ' = +.42
'intended', experiment II r_ξμ' = +.43
'resulting', experiment II r_ξμ' = +.50
(significances like those for r_ξμ, with the exception of .42 and .43, which are significant at the 5-percent level for df = 29).
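The scaling procedure behind r_μξ (rank the lines and columns by their sums of μ, add the two ranks per cell, and correlate the result with μ) can be sketched as follows. This is a minimal illustration, not the original computation: the two example matrices are hypothetical, the arrangement of the 'failing' matrix is one possible reading of the footnote, ties between equal sums are broken by input order (an assumption), and all function names are invented here.

```python
from statistics import mean


def pearson(xs, ys):
    """Product-moment correlation; returns 0.0 if either variable is constant."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def ranks_by_sum(sums):
    """Rank 1 goes to the smallest sum; ties keep their input order (assumption)."""
    order = sorted(range(len(sums)), key=lambda i: sums[i])
    ranks = [0] * len(sums)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


def r_mu_xi(mu, rounded=False):
    """Correlate the cell values mu with xi = line rank + column rank.

    mu is a list of lines; None marks an empty cell. With rounded=True each
    mu is replaced by its sign (+1, -1 or 0), as in the cruder variant.
    """
    line_sums = [sum(v for v in line if v is not None) for line in mu]
    col_sums = [sum(v for v in col if v is not None) for col in zip(*mu)]
    xi_line, xi_col = ranks_by_sum(line_sums), ranks_by_sum(col_sums)
    mus, xis = [], []
    for i, line in enumerate(mu):
        for j, value in enumerate(line):
            if value is None:
                continue
            if rounded:
                value = (value > 0) - (value < 0)
            mus.append(value)
            xis.append(xi_line[i] + xi_col[j])
    return pearson(mus, xis)


# Hypothetical matrix in which mu grows with both line and column sums.
ordered = [[-0.5, -0.1, 0.2], [-0.2, 0.0, 0.4], [0.1, 0.3, 0.6]]
# Hypothetical matrix with all line and column sums equal: no usable order.
unordered = [[-1, 1, 0], [1, -1, 0], [0, 0, 0]]
print(r_mu_xi(ordered), r_mu_xi(ordered, rounded=True), r_mu_xi(unordered))
```

In the second matrix every line and column sum is zero, so the ranks carry no information about μ and the coefficient collapses, which is the kind of failure the footnote on ordering points to.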
Intensity as a Predictable Feature
Figure 3: Maximal coefficients obtained (vertical axis, 0 to 1.0) plotted against the number of non-specified components (horizontal axis), for questions and comments. Legend: ja-affinity; a = alteration formula, b = a/o-characteristic, c = combined.
characterized questions and comments decreases. Figure 3 clearly brings out this general rule. In Figure 3 the maximal correlations obtained are plotted against the number of components not specified. The average values would have given rise to a very similar picture, with the exception of a/o-characteristics for comments, where the obtained minimal value of .05 forces the average (.17) under the value for mere ja-affinity (.23). This, however, is misleading rather than simply undesirable. As is suggested by the considerable span between minimal r_μξ (.40) and maximal r_μξ (.80) for questions, there can be more or less effective 'stimulus scaling' (in terms of ξ; 'response scaling', also tentative, is in terms of μ). More generally, there can be a grouping of data more or less effective for the purpose of revealing a covariation; so levelling the differentiation in the 'resulting'-component seems less appropriate (|r| = .05) than levelling the differentiation in 'intended' (|r| = .29) and in 'presupposed' (|r| = .27)18.
18
In this sense also a grouping according to 'type of transition' (sc. of change in equal positions of a pair in different components) is less effective than the groupings according to alteration formula and characteristic (questions):
For similar reasons I tend to be cautious about explaining the lesser degree of predictability for the comments in terms of phenomena such as difficulties on the subjects' side in coping with a more complex organization compared with the questions. In experiment II the verbal reactions to the comments were highly specific, so that missing a correlation as high as .80 may well be due to the research worker's difficulties in finding an effective or appropriate stimulus metric.

This leads to another ground for caution. When at the beginning of this text auditory perception was mentioned, any formulation was avoided which might suggest a direct perception of right-asymmetry vs. left-asymmetry. I shrink from interpretations in terms of, say, hesitating vs. precipitating sound gestures. These must be possible with nein as concomitant too; nein, however, turned out almost exclusively to be left-asymmetrical. Strictly speaking, the predictable entities are, as has been seen in the case of both the contingency tables and μ, proportions of responses rather than shades of the individual's reaction.

e) Asymmetry-type and F0

Intonation is carried by more parameters than intensity alone, and the present text was not intended as an argument "against F0 (or pitch)". In a forthcoming article19 I will, instead, study the covariation of situational characteristics, stated in terms of 'open' vs. 'closed', with F0 in greater detail than in the present observation regarding the asymmetries of intensity. There is one question left, however, to be answered immediately: to what degree is there a correlation between intensity and F0? Were this degree a high one, the observed regularities might eventually have to be attributed to F0 rather than to intensity.
18
(continued)
and are heads of constructions in the sense of X-bar theory. X usually ranges over {S, NP}, and i over {1, 2}, e.g. S^0 = S; S^1 or S̄ immediately dominates S, S^2 or S̿ immediately dominates S̄, etc. Ed.)

Chomsky's hypothesis, ECP, essentially requires that the empty element be properly governed; Kayne's insists on the relation between the antecedent and the empty element: the chain of 'superscripts' must not be broken; Guéron's, CCC, is relevant in cases where a variable is free in the 'presupposed' part of the sentence. Within the ECP framework, the natural claim is that prosodic binding creates an abstract governing category(x) (an abstract S?) which functions as an absolute barrier for government and prevents empty categories from being linked to their antecedents:

(40) Prosodic binding creates an abstract governing category7

Thus, (40), if true, might explain observation (18). In this case, one must note that although structure building rules(x) for LF may not be desirable
7 One can explain why the relation between an A-bound category and its antecedent is not affected by the 'governing category' created by prosodic binding by suggesting that in this case the coindexing takes place before the matching of syntax and prosody.
Logical Form and Prosodic Islands
if they are extensions of syntax - because of the projection principle(x) - they may be desirable if they are extensions of the prosodic component, where the projection principle is irrelevant. The prosodic restructuring of the domain of the empty category may be compared, to a certain extent, to the restructuring resulting from readjustment rules in phonosyntax. This hypothesis accounts for the data presented so far, involving WH-Movement, WH-Raising, Q-Movement, Q-Raising, etc. However, it does not explain why it is prosodic binding, and not other comparable phenomena, which creates prosodic islands. In (19) we saw that the flat contour following contrastive stress does not correspond to such an island. And there exist other rules which break the intonational curve into several prosodic constituents, e.g. parentheticals, without creating prosodic islands; (41) is perfect:

(41) Et qu'a-t-elle cru, dit Jean, que Paul a vu (e)?
(And what did she think, said John, that Paul has seen?)

Guéron's CCC, given in (39), predicts that a constituent included in the presupposed part of a sentence cannot contain a 'hole', a free variable. One may reasonably compare the notion of presupposition and the notion of 'relation to context'; they look similar. They are very close, but in fact, as noted above, not exactly identical: in (3), (4), (7), etc., we have seen that the deaccented element may carry new information. So it could be that the term presupposition is not really appropriate here. Because it is necessary, anyway, to explain why prosodic binding, and not parentheticals, creates prosodic islands, I will suggest slightly extending Guéron's CCC, replacing the term 'presupposition' by a more general one, say 'discourse-bound'. Constraint (39) could be rewritten as (42):

(42) A complete constituent X' may not contain a variable not bound in X. (A complete constituent (. . . = 27); a CC is discourse-bound.)
Guéron's hypothesis would have the advantage of including the present analysis in a general 'functional' analysis. However, her constraint is crucially concerned with variables. Therefore it cannot account for cases where empty categories, although A-bound, are not bound by quantifiers.8 Clitic-Placement(x) is one such case.9
8 Anaphors must receive special consideration. They are problematic in the context of this paper, for two reasons. First, it is difficult to find agreement among native speakers on such data, since it is difficult to construct natural discourse involving prosodic binding and anaphors; informants generally tend to reconstruct contrastive situations. Secondly, the acceptability (or not) of such data may be important for the model itself. See Appendix 2 to this article for further discussion.
9 Riny Huybregts (unpublished work) explores the hypothesis that the clitic is an A-binder. Similarly, Aoun, in his GLOW paper delivered at Göttingen (1981), explores this hypothesis.
Μ. Ronat
(43) (A: Paul est diabétique)
B: *Et Marie lui a laissé [manger (e) des gâteaux]
C: Et Marie l'a laissé (e) [manger des gâteaux]
(A: Paul is diabetic. B/C: And Mary let him eat cakes.)

(44) a. (A: On dit que Jean est étudiant aux Beaux-Arts)
B: Oui, je l'ai vu (e) [peindre aux Beaux-Arts]
(A: They say that John is a student at the School of Fine Arts. B: Yes, I saw him paint at the School of Fine Arts.)
b. (A: On dit que cette voiture a été peinte aux Beaux-Arts)
B: *Oui, je l'ai vu [PRO_arb peindre (e) aux Beaux-Arts]
(A: They say that this car was painted at the School of Fine Arts. B: Yes, I saw it being painted at the School of Fine Arts.)
IV Theoretical Considerations

On the basis of the data available so far, it seems very difficult to choose between the two hypotheses, extended ECP or CCC, the former having the advantage of empirical adequacy, and the latter the advantage of explaining the contrast (15.b)/(41). One can speculate that a solution may be found in a third hypothesis, which would subsume the advantages of the preceding ones. This solution will depend crucially on the place attributed to intonation in core grammar.

Previous studies either rejected intonation as being outside of the domain of competence (and it is true that emotions, for instance, are also expressed by means of intonation), or they included it inside the grammar, but as a part of the phonological component, very close to stress phenomena. See Liberman's work (op. cit.). Selkirk's (1978) proposals, however, constitute a first departure from the standard position: they establish that prosody must be represented by a set of metrical trees, independent of syntax, which expand autonomous prosodic categories. These prosodic categories are supposed to be matched with syntactic trees, at the level of surface structure, on the left side of the model (cf. Appendix 1 to this article. Ed.). But the data presented in this paper show that intonation must look at S-Structure, at Logical Form and perhaps at deep structure, too: obviously, it is not sufficient to propose a matching between prosodic trees and surface structure. The idea which seems natural, then, is that intonation may function as a component which is autonomous from but related to syntax, filtering sequences generated by the syntactic component from a 'third' dimension, to refer to a concept used by Vergnaud in another context.
The third solution might adopt a more abstract point of view than syntax and prosody, and say that Universal Grammar may define what counts as a governing category in each component. The rhythmic notion of 'repetition', for instance, could subsume the recursive nature of syntactic governing categories (NP and S)10 and the nature of discourse binding. Then (40) and (42) could be tentatively restated as (45):

(45) An Ā-bound category must be bound within all the governing categories in which it is embedded.

Needless to say, much work remains to be done in this area. In any case the evidence presented in this paper strongly supports a theory which postulates the existence of empty categories over one which does not, and strongly supports the linguistic reality of the antecedent/empty category relation, since it can be 'heard', indirectly. Moreover, the same evidence indicates that an important part of intonation must be treated as part of linguistic competence; it would be strange to postulate that rules of performance could be based on the PRO/WH-trace distinction.

Appendix 1: Terminological Explanations
Generative grammar is considered as a set of autonomous but interrelated subcomponents which establish correspondences between sound and meaning through syntax according to the following schema:

BASE
I Lexicon + Deep syntactic structure
II SYNTACTIC TRANSFORMATIONS
S(= Surface syntactic)-structure
PHONOLOGICAL INTERPRETATION / SEMANTIC INTERPRETATION
Surface structure / Logical Form

10 Notice that for the purpose of our discussion, the syntactic definition of governing category does not include the notion of 'accessible subject'. In fact, the examples presented in this paper show that the constituent under prosodic binding must contain the governor of the empty category. This must be taken into account in case (29).
'Logical Form' is the name of the component in which linguistic theory describes the syntactic aspects of semantic interpretation: either because the input of semantic interpretation crucially depends on syntactic structure, or because semantic rules present a syntactic form. The level of Logical Form essentially describes rules of quantification, negation, etc., and must be distinguished from 'Semantic Interpretation II', which takes meanings into account. Its formalism includes standard logic (predicate calculus, variables bound by quantifiers, etc.). It supposes the existence of semantic movement rules (invisible in surface structure) like 'Quantifier-Raising', which raises the quantifier to the beginning of the sentence. So the surface structure
Pierre est heureux avec une femme
may have a semantic representation similar to:
(∃x) x is a woman and Peter is happy with x

Other definitions:
- A-bound/Ā-bound: Bound by an argument/bound by a non-argument. An argument is a subject or a subcategorised (direct) complement of the verb; a non-argument is not in these positions (for instance, the WH-element in questions is in 'Comp', that is, attached to 'S' (the sentence)).
- Binding Theory (see Chomsky, 1981): The binding theory summarises in two principles the properties (complementary distribution) of pronouns and anaphors. Moreover, the relations of full and empty elements to their antecedents are explained by these principles.
- Case-marking: Case is assigned to NPs by their governor (see below), even in languages which do not show overt Case paradigms. Case is obligatory for full elements. Verbs and prepositions assign (in English) Objective and Oblique Case. Tense assigns Nominative Case; Genitive Case is found in [NP X] structures.
- Clitic-Placement is a syntactic transformation which moves the pronoun from the argument position to preverbal position.
- Core grammar: the grammar described by the schema above. It is supposed to be compatible with universal grammar conditions. Outside of core grammar are exceptions to general rules.
- Govern: Verbs and prepositions (properly) govern their subcategorised complements. Tense governs the subject of the sentence.
- Governing categories: constituents in which a governor can govern a complement (NP and S or S̄).
- Heavy-NP-Shift: (Stylistic?) transformation which inverts two complements of the verb, the longest going to the right of the shortest.
- Island: 'frozen' constituent, i.e. a constituent from which nothing can be extracted and to which no semantic rule can refer.
- Projection principle: this principle says that once a verb is described as having some subcategorisation properties in the lexicon, it must keep them during the whole derivation, including logical form.
- Structure building rules: create syntactic structures (consequently, change the subcategorisation of verbs) during the syntactic derivation of the sentences.
- Wanna contraction: Want + to can become wanna when the empty category is PRO (who do you want PRO to see (e)) but not when the empty category is a trace (who do you want (e) to go there).
- WH-Raising: Movement rule in Logical Form, 'moving' the WH-element in multiple questions (who said what, etc.).

The concept of 'empty category' is one of the most important in recent work in generative grammar. It refers to syntactic categories whose presence is indirectly attested by some grammatical rule, although they have no phonetic content. Empty categories are:
1) 'PRO', which represents the empty subject of infinitivals or gerunds (John thinks that [PRO to feed himself]/[PRO feeding himself] will be difficult);
2) 'traces' left by a syntactic constituent moved by a syntactic movement rule; for instance, 'WH-Movement' moves the WH-element from its deep structure position, you gave what to Paul, to what did you give (e) to Paul, where (e) is the 'empty trace' of what;
3) variables: empty elements at the level of logical form, bound by the antecedent quantifier (see example above). Note the analogy between WH-(e) and Q-x.

Appendix 2: Data for Anaphora and Default Accent

Personally I reject sentences containing a default accent on lexical anaphors. For instance, for me (46) is a well-formed discourse, whereas (47) is not:

(46) A: Paul a parlé pendant des heures au téléphone.
B: À propos, les Dupont ont cassé [le téléphone]
(A: Paul talked for hours on the phone. B: By the way, the Duponts broke the phone.)

(47) A: Paul et Marie ont parlé l'un de l'autre pendant des heures.
B: *À propos, les Dupont se souviennent [l'un de l'autre]
(A: Paul and Mary talked about each other for hours. B: By the way, the Duponts remember each other.)
The problem with (47b) is that one can get a derivative interpretation making it acceptable (Gibbon, personal communication). If the discourse is taken to mean that there is an opposition, a contrast, between to talk and to remember, implying for instance that Paul and Mary are good friends and the Duponts are not good friends anymore, then the discourse is acceptable. Moreover, one must avoid the interpretation: 'it is not true that the Duponts do not remember each other'.
If the anaphor/antecedent relation is affected by prosodic binding, then the binding theory can handle this case, since it says that an anaphor cannot be free in its governing category, while the CCC is inoperative, since no variable is involved. But in this case one can see a discrepancy between NP-traces and anaphors (see (35)-(36)), which must be taken into account in some way. Lack of space prevents us from discussing this discrepancy and other phenomena such as inalienables, idioms, discontinuous constituents, etc. These questions will be dealt with in a subsequent study.
References

Adjemian, J. C. (1978). A Functional Generative Theory of the Structure of French: Intonation and the Problem of Syntax. Unpublished dissertation: University of Washington.
Aoun, J., N. Hornstein & D. Sportiche (1981). On wide scope quantification. Journal of Linguistic Research 1 (3).
Bing, Janet M. (1979). Aspects of English Prosody. Ph.D.: University of Massachusetts, Amherst.
Bolinger, D. L. (1977). Meaning and Form. London: Longmans.
Chomsky, N. (1981). Lectures on Government and Binding: the Pisa Lectures. Foris: Dordrecht.
Guéron, J. (1980). Logical operators, complete constituents, and extraction transformations. In May & Koster (eds.), Levels of Syntactic Representation. Foris: Dordrecht.
Guéron, J. (1981). Remarques sur la représentation de la quantification. In Attal, P. (ed.), Actes du Colloque "Syntaxe et Sémantique", U. de Haute-Bretagne.
Halle, M. & Vergnaud, J. R. (1979). Metrical structure in Phonology: a fragment. MIT: unpublished paper.
Jaeggli, O. A. (1980). Remarks on To contraction. Linguistic Inquiry 11: 239-245.
Kayne, R. S. (1981a). ECP extensions. Linguistic Inquiry 12: 93-133.
Ladd, R. (1980). The Structure of Intonational Meaning. Bloomington: Indiana University Press.
Liberman, M. Y. (1975). The Intonational System of English. Ph.D.: Cambridge, Mass.: MIT.
Liberman & Prince, A. (1977). On stress and linguistic rhythm. Linguistic Inquiry 8: 249.
May, R. R. (1977). The Grammar of Quantification. Ph.D.: Cambridge, Mass.: MIT.
Safir, K., ed. (1979). Papers on Syllable Structure, Metrical Structure and Harmony Processes. MIT Working Papers in Linguistics I.
Selkirk, L. (1978). On prosodic structure and its relation to syntactic structure. Unpublished paper, Indiana University Linguistic Club.
Sportiche, D. & H. Koopmann (1981). Pronouns and the bisection principle. Unpublished paper, Montreal (UQAM).
PETER WINKLER
Interrelations Between Fundamental Frequency and Other Acoustic Parameters of Emphatic Segments
1. Aims
Following the classification of intonation and pitch contours into certain levels described by 't Hart & Collier (1975), f0 movements and perturbations (jitter) exist at the "atomistic" level of pitch or intonation patterns. It is evident that f0 is sometimes absent in small parts of utterances, and sometimes the listener does not hear a pitch, but the gestalt of an intonation contour and the general pattern of intonation result from properties other than f0 alone. Pitch contours that have rather different 'fine structures' with respect to their f0 variations do not appear to have any linguistic relevance (Lieberman 1974: 2434). It is possible that f0 movements and interruptions are perceived not as the pitch contour or as a fundamental sound of the voice, but as a paraphonetic marker, that is, as sound which is not specific for language units like phonemes or intonemes. The minimal pitch movements may be a supplementary criterion for speech perception; these phenomena are investigated exhaustively in reports concerning speech perception and the function of voice onset time (VOT). For example, Summerfield, Bailey, Seton & Dorman (1981) showed that the perceptual difference between /slit/ and /split/ is connected with the typical movement of intensity and f0 within the segments. In real, natural utterances there is a wide range of f0 changes which are necessary neither for the pitch contour nor for the phonematically specific VOT nor for the coarticulated phoneme boundary. These f0 changes are, in articulatory terms, to a high degree 'uneconomical'. If a voiced [z] in fluent speech is spoken as a devoiced but lenis [s] (or [z]) with the vocal cord vibrations starting within the segment, it is the normal coarticulatory effect and the usual realization of a phoneme boundary.
But if the whole segment [z] in connected speech is voiced, this part of the utterance sounds 'unusual'; the missing intrinsic voice onset 'means' something in a paraphonetic sense (something affective, personally typical, interactionally meaningful, sociolectally important, etc.). Surely there is no overt reflection about what has caused this sound. It manifests a specific character of the utterance which will be classified by the listener as 'emphatic' or 'conspicuous'. In such cases ". . . listeners are unable to differentiate the contours at any linguistic level though they may ascribe different emotional contexts to the contours." (Lieberman 1974: 2434). F0 changes within a segment are, in respect of paraphonetic information, not atomistic but essential features (if we interpret the term 'feature' not as a certain paraphonetic 'register' with a fixed meaning or a fixed acoustical structure).

Figure 1: Acoustic parameters of an emphatic utterance (female speaker). AKM = maxima of autocorrelation function; PGL = intensity; WGL = speed; RH0 = zero crossings; XTR = minimax approximation.

In Fig. 1 an example is given where the speaker inserts a coarticulated superfluous interruption of f0 between the boundary of [m] and [a]. Simultaneously the noise portion shifts (see parameters AKM = mean of the autocorrelation function, and RH0 = zero crossings). This sequence is a part of the utterance ". . . da haben die mir lauter R's gezeigt, und mal war's verkehrt 'rum und mal war's richtig 'rum . . ." (". . . they showed me nothing but R's, and sometimes it was the wrong way round and sometimes it was the right way round . . ."). The speaker has seen some laterally inverted occurrences of "R". The sequence is spoken in a colloquial style, very emphatically (ironically; perceived as odd) and with many deviations from the usual articulation. A paraphrase might be "This was a crazy test; I had to identify letters in normal and inverted position; I have no idea why this test was conducted". The German semantic unit "und mal war's" (sometimes it was) has completely switched over to the meaning "whatever it was, it was crazy". The meaning of this sequence is independent of the semantic units; the actual meaning is constituted by phonetic means at the moment of pronouncing the three words.

If we look for the acoustic peculiarities which caused this effect, we can exclude the pitch contour as a whole. The paraphonetic information lies well below the level of intonation, sometimes even below that of phonemic segments. Additional features like voice timbre are present only in small degrees (the spectral information of the speech material is limited to below 10 kHz).
We can also neglect the influence of accent (in the sense of Lieberman, 1974: 2434) or stress (in the sense of Jakobson, Fant & Halle 1965: 15). Because acoustic parameters are used both as linguistic and as paralinguistic markers, the paraphonetic information cannot be an attribute of some particular 'substance' (like the on-off character of f0 or the absolute signal-to-noise ratio). We chose a "dynamic" hypothesis: the paraphonetic information marks the manner of realizing the linguistically necessary configuration.

Let us continue with the same example. In observing the developmental characteristics of all parameters, some typical and some atypical curve types appear. The prototypical sound pattern of [t] shows a lack of f0 and, simultaneously, increasing-decreasing movements of intensity (PGL), speed (WGL), and zero crossings (RH0) likewise. Atypical is the strong increase in speed within the segment [s], but the form of f0, RH0 and intensity (PGL) seems to be 'normal'. We can check this configuration by listening to the tape: this segment indeed sounds striking. The paraphonetic information is marked by dynamically changing the phonematically necessary combination of acoustic parameters. These changes consist of short-term
replacements of the standard curve types by other curve types. The new configurations are not totally 'new' patterns, but they start from the phonematically predetermined constellation. The change has to produce a contrast effect in order to be audible as non-linguistic marking. The redundancy will be used to interpolate further information; this can be done by intensification, interruptions, supplements etc., in respect of one or more parameters. This results in changing the combination of curve types within a segment or at segment boundaries. It also follows from this that neither particular curve types nor particular parameters can carry the information alone, but rather the configuration as a whole. The resulting new combination may perhaps be prototypical for some types of affect, situation and so on. This will be ignored in the following analysis; the purpose is only to list and calculate the combination patterns and to look for prototypical differences between neutral and emphatic sequences.

To this end, neutral and emphatic utterances of two speakers were selected from a dyadic live conversation. The sequences were classified into seven curve types of four acoustic parameters at the segmental level for computing the probabilistic combinations and comparing the two sorts of utterances. Only the dynamic characteristics of parameters will be considered (such as curve direction: increasing or decreasing, etc.), not absolute frequency (Hz) or intensity (dB). The results describe the combinatory possibilities of f0 with other parameters in a comparison of neutral and emphatic segments.
2. Material and Methods

Material: Tape recordings of a non-prestructured dyadic conversation; no instructions were given to the speakers and there was no phonetic pre-evaluation or eliciting of particular conversational articulatory styles. (Material from the TAKE D of the Konstanz project "Analyse unmittelbarer Kommunikation und Interaktion als Zugang zum Problem der Entstehung sozialwissenschaftlicher Daten", financed by the Fritz-Thyssen-Foundation, headed by Professor Th. Luckmann and P. Gross.) Recording of the sequences took place in a normal echoic room (film studio), with separate microphones (Beyer Dynamic) and recorders (NAGRA, 19 cm/sec) for each speaker.

Acoustic analysis: Digital signal analysis (f0: autocorrelation method; intensity: smoothed absolute amounts; speed: differences x_(n+1) - x_n; RH0 = zero crossings); computer-aided segmentation and transcription (PDP 11/50 processor and software by the Institut für Phonetik und sprachliche Kommunikation der Universität München, headed by Prof. H. G. Tillmann).
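The four parameters named under "Acoustic analysis" can be illustrated with a small Python sketch. It is not the Munich software: the frame-wise averaging used for intensity and speed, the search range of 60 to 500 Hz, the voicing threshold of 0.3, and all function names are assumptions made for this illustration only.

```python
import math


def intensity(frame):
    """Mean absolute amplitude of the frame (a simple smoothing; assumption)."""
    return sum(abs(x) for x in frame) / len(frame)


def speed(frame):
    """Mean absolute first difference |x[n+1] - x[n]| (averaging is an assumption)."""
    return sum(abs(b - a) for a, b in zip(frame, frame[1:])) / (len(frame) - 1)


def zero_crossings(frame):
    """Number of sign changes between neighbouring samples."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))


def autocorr_f0(frame, sample_rate, fmin=60.0, fmax=500.0, voicing=0.3):
    """Estimate f0 from the highest normalized autocorrelation peak.

    Returns (f0 in Hz or None, peak value). None means that no peak reached
    the hypothetical voicing threshold, i.e. f0 is treated as absent.
    """
    energy = sum(x * x for x in frame)
    if energy == 0:
        return None, 0.0
    lag_min = int(sample_rate / fmax)
    lag_max = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_val = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        val = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag)) / energy
        if val > best_val:
            best_lag, best_val = lag, val
    if best_lag is None or best_val < voicing:
        return None, best_val
    return sample_rate / best_lag, best_val


# Synthetic test signal: 50 ms of a 200 Hz sine sampled at 8 kHz.
rate = 8000
frame = [math.sin(2 * math.pi * 200 * n / rate) for n in range(400)]
f0, peak = autocorr_f0(frame, rate)
print(f0, round(peak, 2), round(intensity(frame), 3), round(speed(frame), 3))
```

Applied frame by frame, such functions yield one contour per parameter, and it is the shape of these contours within a segment, not their absolute values, that the classification described next operates on.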
Selection and classification: 245 segments of 25 emphatic utterances; 244 segments of neutral utterances. 'Emphatic' is the category for all sequences which seem to be non-neutral, non-normal, non-standard or non-factual/detached. The curve plots were arranged into two categories with the support of listening to the tapes; after this the curves within the segments were stylized as:
/ increasing
→ stable or = 0
\ decreasing
Λ in-/decreasing
V de-/increasing
abruptly increasing
abruptly decreasing or finishing within the segment (likewise without considering the placement within the segment).
The curve characteristics were assessed for general tendencies; jitter and small deviations were ignored. The type of curve is related to the whole segment, which is constituted by auditive evaluation ('phonemic segment'), not by acoustic criteria. More than these seven curve types were anticipated, but did not appear. In the example of Fig. 1 the segment [t] has been classified as "Λ Λ Λ"; the segment [o] as "\ Λ / →"; [m] as "\ / Λ Λ". The frequency of the curve types was listed in a matrix; the combinations were calculated following the proposals of Altmann & Lehfeldt (1980: 295f.).
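The stylization and tallying step can be sketched roughly as follows. This is a hedged illustration only: the original classification was done by inspection of plots with auditory support, whereas the tolerance-based rule below, the restriction to five of the seven curve types (the two abrupt types are left out), and the names `stylize` and `combination_counts` are assumptions of this sketch. It counts raw combinations and does not reproduce the Altmann & Lehfeldt calculation.

```python
from collections import Counter


def stylize(values, tol=0.1):
    """Reduce a parameter contour within one segment to a curve type.

    Deviations up to tol are ignored, mimicking the disregard of jitter and
    small deviations. Returns one of '/', '→', '\\', 'Λ', 'V'.
    """
    lo, hi = min(values), max(values)
    if hi - lo <= tol:
        return "→"
    first, last = values[0], values[-1]
    i_hi, i_lo = values.index(hi), values.index(lo)
    inner = range(1, len(values) - 1)
    peak = i_hi in inner and hi - max(first, last) > tol
    dip = i_lo in inner and min(first, last) - lo > tol
    if peak and not dip:
        return "Λ"
    if dip and not peak:
        return "V"
    return "/" if last > first else "\\"


def combination_counts(segments, order=("F0", "PGL", "WGL", "RH0")):
    """Count how often each four-parameter combination of curve types occurs."""
    return Counter(tuple(stylize(seg[p]) for p in order) for seg in segments)


# Hypothetical segment: no f0, rise-fall in the other three parameters.
segment = {"F0": [0, 0, 0], "PGL": [1, 3, 1], "WGL": [1, 3, 1], "RH0": [1, 3, 1]}
print(combination_counts([segment, segment]))
```

Running the tally separately over the emphatic and the neutral segments would give two frequency matrices of the kind that the comparison in the results section relies on.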
3. Results
The 244 neutral segments are, in detail: 48 open vowels, 43 closed vowels, 25 voiced plosives, 9 unvoiced plosives, 20 voiced and 52 unvoiced fricatives, 32 nasals and 15 liquids. The usual phonetic classification of the em-

Table 1: Combinations of parameters and curve types, emphatic vs. neutral speech
type / parameter: F0 PGL WGL RH0
Σ it S KS r X!
emph. neuL emph. ncut emph. neut emph. neut emph. neut. emph. neut emph. neut. emph. neut.
\
/ M