Systems of Prosodic and Paralinguistic Features in English [Reprint 2021 ed.] 9783112414989, 9783112414972


English Pages 94 [96] Year 1964

SYSTEMS OF PROSODIC AND PARALINGUISTIC FEATURES IN ENGLISH

by

DAVID CRYSTAL and RANDOLPH QUIRK

JANUA LINGUARUM
STUDIA MEMORIAE NICOLAI VAN WIJK DEDICATA
edenda curat
CORNELIS H. VAN SCHOONEVELD
STANFORD UNIVERSITY

SERIES MINOR NR. XXXIX

1964
MOUTON & CO.
LONDON • THE HAGUE • PARIS

© Copyright 1964 by Mouton & Co., Publishers, The Hague, The Netherlands. No part of this book may be translated or reproduced in any form, by print, photoprint, microfilm, or any other means, without written permission from the publishers.

Printed in The Netherlands

ACKNOWLEDGMENTS

The debt we owe to previous contributors in the field is quite inadequately represented in our footnotes and other references. In much of the basic work underlying the study, we have enjoyed the co-operation of present and former colleagues on the Survey of English Usage, University College London, particularly Jan Svartvik (Assistant Director), Derek Davy (who has also helped us by a judicial reading of the manuscript), and J. P. L. Rusiecki (now of the University of Warsaw). We are grateful to Dr. Sarah Gudschinsky (Summer Institute of Linguistics) for detailed criticism of the manuscript and for highly skilled advice on specific points throughout; to R. W. P. Brasington and other members of the Department of Linguistics, University College of North Wales, especially for help in preparing the spectrographic material. More pervasively, we owe a great deal to the many discussions we have had with M. A. K. Halliday, A. C. Gimson, and other colleagues in the linguistic field at University College London. Above all, we would express our indebtedness to J. D. O'Connor, among whose former students we are both fortunate to count ourselves, and whose specialised knowledge and valuable time have been generously at our disposal in the preparation of this monograph. None of our colleagues and friends, however, can be held responsible for the inadequacies which stubbornly remain.

March 1964
D.C. - R.Q.

CONTENTS

Acknowledgments . . . 5
1. Introduction . . . 9
2. Developments in Paralinguistic Study . . . 14
3. Categories of Paralanguage . . . 32
4. Prosodic Features . . . 44
5. Conspectus . . . 63
Appendix: Spectrographic Evidence . . . 74
Index to Authors Cited . . . 93

1

INTRODUCTION

The present study seeks to set up a framework in which all the prosodic and paralinguistic features of connected speech can be accommodated and which can facilitate investigation of the relations that such features have to each other and to grammar. The framework is that postulated to account for the data observed in the Survey of English Usage, and the occasion will be used not only to explain the symbology appearing in current and forthcoming publications of the Survey's work but also to explain and justify our observing a particular set of distinctions rather than more or fewer distinctions or a different set of distinctions. We shall not attempt, however, to place within the framework every feature that could conceivably be of linguistic interest, though we trust that the placing of our own particular set will be sufficient to demonstrate the framework's total adequacy; still less can we attempt to demonstrate all the interrelationships of points in the framework at which prosodic and paralinguistic features operate. Special - but not, we trust, undue - attention will be given to certain aspects which have recently interested us or which have in the past received less adequate treatment than seems appropriate. In the latter connexion, we are particularly concerned to see a degree of systemic relationship among several of the "non-verbal" features comparable to that more commonly seen in only one of them, intonation. The entire formulation and the observation underlying it are naturally the joint product of a continual interplay of hypothesis and data. The data are recordings amounting to 30,000 words (over three hours) of spontaneous English speech in the form of discussion between a total of 31 educated British adults. The recordings are the property of the Survey of English Usage and are referred to by the code numbers they bear in the Survey collection;
they are accessible in University College London to interested scholars.

Two basic assumptions must be stated as underlying the present study. The first is that in examining a stretch of language one can distinguish between aspects of the phonic continuum which have a direct and identifying relationship to the words selected, and other phonic aspects which are essentially variable in relation to them. We may contrast the variations that can occur in respect of loudness, tempo and pitch in uttering "Dinner" with the constants (alveolar voiced plosion and nasality, for example) which are word-identifying. It is with the description of the "non-verbal phonology" that we are here concerned and that we are calling the "prosodic and paralinguistic" features of an utterance.1

1 See G. L. Trager, "Paralanguage: A First Approximation", Studies in Linguistics, 13 (1958), 1-12. The term "paralinguistics" was "suggested by A. A. Hill" (Trager, p. 4), but given its main currency in this paper by Trager. Our own use of the term is rather narrower than Trager's or Hill's (Introduction to Linguistic Structures, New York, 1958, pp. 408f.), as will be seen below on p. 12 and elsewhere.

The second assumption is that a basic distinction can be made - and is made by listeners habitually - between "personal" and "conventional" features; that is, between the features of a person's enunciation which are largely a physiologically-determined background characteristic and those features which he shares as communicative conventions with others in his speech community. There is no question, however, of a difference in substance between these classes of features. The same parameters can be postulated as operating in the same ways to produce both the personal and the conventional: a feature within the personal complex of one speaker may exist, identical in articulation and acoustic effect, only within the conventional complex of other speakers. The distinction thus lies in the individual's controllable use of features from the one complex (the conventional) against the background of the other complex (the personal, which - in common with other linguists - we term the "voice set").2

2 See D. Crystal, "A Perspective for Paralanguage": "the biologically-determined permanent background characteristic in a person's speech", Le Maître Phonétique, 78 (1963). The phenomenon in question is also discussed in B. Siertsema, "Timbre, Pitch and Intonation", Lingua, 11 (1962), 388-98.

Where a given isolatable feature (for instance, an intonation contour completed within a high pitch range) does not occur as a norm-characteristic of a given individual's voice set, it is crucial to the hypothesis that its occurrence has a conventional, non-random relation to context (for instance, correlating with "surprise" or "excitement"). That is, the complex constituting voice set is an individual's norm, but departures from that norm are not individual but part of the linguistic conventions of the speech community. It is because the features not constituting an individual's voice set have this potentiality for conventionalised, patterned correlation with such other aspects of the utterance as grammar that we wish to bring them (but not voice set) within the purview of the Survey of English Usage.3

3 Though voice set has no place in the present study, this is not to agree with those linguists who would place the description of voice set entirely outside linguistics. It should not be thought that "personal" and "conventional" are entirely mutually exclusive; thus there is some evidence that at the "personal" pole there can be some degree of conventionality - as, for instance, in the deliberate cultivation or eradication of a particular voice set. This is a field much in need of close investigation and one likely to yield much of linguistic interest, as P. L. Garvin has already shown (see below, p. 30).

In a speech situation, then, we may assume that we can (and do) rapidly "tune in" so as to recognise (and henceforth relatively discount)4 the participants' norms of voice quality, their relatively invariable complexes of voice set. Whatever the signalling capacity any component of a voice set may have outside that voice set, this is neutralised for its owner, who may therefore have to exaggerate the component or substitute another in the given conventional function. At a time, however, when little work has been done on the conventional contextual correlations of any such components, there is little point in speculating on the substitutions resorted to by individuals. It is sufficient for the present purpose merely to say that the substitutions enforced by voice-set characteristics mean that components recognised as having constant conventional value (as distinct from voice-set property) must necessarily be variable in substance: a pitch range which strikes the hearer as "high" will be higher in absolute terms if the speaker's voice set incorporates a high pitch feature.

4 Cases seem to be rare when it is difficult to make the judgment-adjustment quickly; for as long as the first hour of acquaintance, a man with a laryngeal or other throat affection may repeatedly seem to be offering hoarse confidences.

Despite variability in the substance by which they are manifested, then, the conventional use of prosodic and paralinguistic features carries with it (as condition or corollary, depending on the point of view) the necessity that we can - and habitually do - distinguish common characteristics linking some manifestations in contrast with others, and that systemic relationships can be recognised as operating between and within the features and classes of features which are here distinguished as prosodic and paralinguistic in the English speech community.

We are using the expressions "prosodic" and "paralinguistic" to denote a scale which has at its "most prosodic" end systems of features (for example, intonation contours) which can fairly easily be integrated with other aspects of linguistic structure, while at the "most paralinguistic" end there are the features most obviously remote from the possibility of integration with the linguistic structure proper (tremulous voice or clicks of annoyance, for example). Since, therefore, both expressions have this "more or less" character, there is no question of a sharp division between the two, and it would be prejudging the results of future careful research to make a clear-cut list of features undoubtedly playing a role in linguistic patterning and another list of features undoubtedly "beyond" the limits of describable linguistic structure. The system of "tension" is one which, as we shall see (Ch. 4, pp. 48 f.), is equivocal as between prosody and paralanguage. The system of pause is similarly equivocal, since its voiced exponents in particular range from a reasonably patterned distribution (for example, schwa at points of lexical selection) to a relatively random distribution of variously formed vocalisations as non-linguistic as a cough. The unequivocally labelled chapters must not, therefore, be understood as representing a fundamental decision as to the assignment of features to paralinguistic or prosodic status. They reflect rather the convenience of presentation in the form we have given to the present study,5 which is organized to draw attention to areas in which we particularly feel the need for fresh thinking.

5 The summary of classes, features and relationships given in Table 5 on pp. 66ff. somewhat better represents our current thinking on the gradient between the two extremes, though even here decisions enforced by a two-dimensional presentation cause some distortion.

2

DEVELOPMENTS IN PARALINGUISTIC STUDY

While there has been controversy enough about the nature and relationship of pitch contours, prominence, tempo and rhythm,1 much more has been agreed (and indeed known) about these features than about those which have been labelled "paralanguage". Before setting forth our own observations in this area, therefore, it would be well to summarise with some discussion the chief recent contributions which form a background and starting-point for our own work.

1 See for example the discussion in K. L. Pike, Intonation of American English (Ann Arbor, 1945); D. L. Bolinger, "Intonation: Levels versus Configurations", Word, 7 (1951), 199-210; "A Theory of Pitch Accent in English", Word, 14 (1958), 111-149; A. E. Sharp, "The Analysis of Stress and Juncture in English", TPS, 1960, 104-135; G. F. Arnold, Stress in English Words (Haarlem, 1957); D. Abercrombie, "Syllable Quantity and Enclitics in English", In Honour of Daniel Jones (London, 1964).

Many scholars have given incidental mention to the study of paralinguistic features (though grouping them variously and using various terminologies). During the first half of this century, phoneticians such as Sweet2 and Pike,3 and physiologists such as G. O. Russell4 hinted at the complexity of the analysis of those "nonspeech sounds" and "socially significant gradations ... which affect the meaning of utterances but are not organised into a rigidly limited set of contrastive units".5 There was little attempt made to systematise and classify these observations, however,6 and the definitions and descriptions which occur are usually incomplete, inaccurate and ambiguous, with no specifically linguistic orientation being consistently maintained. Sweet,7 having distinguished "expiration", "inspiration" and "clicks", goes on to mention five kinds of "throat sound", though only giving these a minimum of articulatory definition: breath (with an open "passive" glottis), voice (divided into "chest" and "head" types), whisper (the "intermediate between breath and voice"), glottal stop, and wheeze ("stage whisper"). He later gives a list of five "voice-qualities", clearness, dullness, nasality, wheeziness, gutturality, which may "characterise the speech of whole communities" as well as individuals; but there is little attempt made to relate these labels to each other or to provide much additional phonetic information. The main focus of phonetic interest was elsewhere. Further terms, such as "sepulchral" and "muffled", are given as tonal impressions only.

2 A Primer of Phonetics (Oxford, 1906).
3 Phonetics (Ann Arbor, 1943).
4 Speech and Voice (New York, 1931).
5 Pike, Intonation of American English, pp. 99ff.
6 Cf. Pike, Intonation, where the aim was only "a convenient means of rough identification of the voice qualities, not as an adequate analysis of their productive mechanisms", for which "considerable instrumental study was still necessary" (p. 99).
7 Op. cit., pp. 7ff.

Sapir8 is quite definite in his exclusion of paralanguage from linguistic study: "All that part of speech which falls out of the rigid articulatory framework" (of language) "is not speech in idea, but is merely a superadded, more or less instinctively determined vocal complication inseparable from speech in practice. All the individual colour of speech - personal emphasis, speed, personal cadence, personal pitch - is a non-linguistic fact, just as the incidental expression of desire and emotion are, for the most part, alien to linguistic expression." We would agree that voice set might well be excluded from linguistic study; but, if Sapir is here seeking to exclude also those prosodic and paralinguistic features which we distinguish as conventional, we would of course strongly dissent.

8 Language (New York, 1921), p. 47.

A. T. Weaver,9 developing suggestions of J. Rush, lists eight vocal qualities, and suggests other characteristics of speech. The qualities are aspirate (which he glosses as "breathy"), guttural ("throaty"), pectoral ("deep and hollow"), nasal, oral ("mouthy"), falsetto, normal ("the right tone for all ordinary informal situations"), and orotund ("with a round mouth"). The whole section is orientated towards the emotional states and rhetorical nuances accompanying paralanguage, with little attention to either physiology or systematic presentation. These deficiencies also mar his discussion of force (described as effusive, expulsive, and explosive), time (including pause, rhythm, and "singsong"), and pitch (including "monopitch", step up and down, and slide). He recognises the similarity to musical phenomena in some of his categorisations, but as with the qualities listed above, there is no objective definition and no attempt to distinguish personal from conventional effects or to verify any of these features by relating them to a corpus of spoken material.

9 Speech Forms and Principles (New York, 1942), Ch. 10.

Pike10 attempts to classify all articulatable sounds produced in the vocal tract, and to make good the "failure to develop classifications and terminology for nonspeech sounds in an inclusive system with speech sounds and marginal sounds". He suggests a procedure of investigation which we use in this paper with certain modifications:11 "auditory analysis with description in terms of articulatory movements supplemented by a few acoustic criteria" (p. 31). Imitation labels are seen as "convenient tags" in discussing, for example, whispered speech, vibrato, falsetto, vocal trill and murmur. Though he urges that phoneticians should begin to study the "nonphonemic factors ... which have met with little or no discussion" except among those concerned with speech-and-drama training, Pike's immediate concern is to classify only features carried by the syllable, and though at times his observations may be extended by implication to features over longer stretches, he does not formalise any procedure for systematising observations over such longer stretches. His survey of nonspeech sounds (pp. 32ff.) lists a number of vocal effects, many of which play some part in the formation of paralinguistic sounds in the manner suggested below,12 but his criteria for differentiation are not explicitly stated.

10 Phonetics, pp. 11ff.
11 Cf. pp. 32ff. below.
12 Cf. p. 37.

He is concerned with "speech sounds" merely as noises, and disregards their function deliberately. Thus one finds listed together the "interjectional or inarticulate utterances" of Bell's Visible Speech, with no comment made as to their obvious functional differences: "sighing, panting, fluttering, shuddering, sobbing; the sneer, yawn, gasp, hiccough, pang, moan; the murmur of ridicule, vexation, disgust; and so on" (p. 33). In The Intonation of American English, published soon after Phonetics, Pike adds a little more perspective in talking about voice qualities, but since his aim is only "rough identification", he makes little attempt at clear or complete definition or systematic exemplification. His definition of voice qualities as nonphonemic gradations is given above, p. 14. The polarities of certain parameters are named (tense/relaxed vocal cords, small/large throat opening, normal/falsetto utterance, whispered/aloud speech) though little additional information or description is given. Pike briefly mentions, without further description and discussion, breathiness, huskiness, "song", "clear" voice, strong and weak articulation, wide and narrow pitch intervals, and a number of other criteria (length of pause, rhythmicality, crescendo, loudness). Related physiological work, as might be expected, was concerned mainly with analysis of voice set, and G. Oscar Russell has13 an acoustico-physiological approach that produces many labels of interest for anyone who is trying to classify various vocal effects. There is much physiological detail and much interesting material based on acoustic experiments (relatively early and therefore necessarily approximate), but a linguistic orientation is lacking and there is too much reliance on personal impression in his labels for these to be of much help to linguists.14 H. M. 
Kaplan, more recently,15 shows awareness of this need for the accurate definition and description of physiological events; but his is a pathological approach, and he only deals with a selection of vocal effects with a 13

Op. cit. Cf. p. 191. Examples of such labels are: "dark,deep, barrel-like, hollow ... smooth, harmonious, pleasing, velvety..., p. 164. 16 Anatomy and Physiology of Speech (New York, 1960). 14

18

DEVELOPMENTS IN PARALINGUISTIC STUDY

view to understanding their abnormality in terms of voice set.16 However, his definitions of "stridency", "hoarseness" and "throatiness", for example, are valuable starting-points for further articulatory definition, though he limits them to voice set. T. Chiba and M. Kajiyama 17 provide detailed physiological statement to define two voice registers, "differing in the condition of the vocal cords and in the manner of vibration": chest register (divided into sharp voice - keen, ringing, energetic, powerfully penetrating; ordinary voice; and soft voice - dull, guttural, obscure); and head (falsetto) register, which is divided into two types. Whisper is also mentioned. They discuss pitch level and laryngeal, oral and pharyngeal cavities as determiners of voice set for male, female, youth and child. Although their treatment of these mechanisms is rather general, it might have provided a frame of reference for the treatment of paralanguage; in fact, however, paralanguage is not mentioned. Miss Siertsema, in a more theoretical paper on timbre,18 has recently distinguished voice timbre, "which can be compared to the particular sound of a musical instrument... the quality of the tone", from the pitch level of utterance, which "may be compared to the key in which a work of music is composed or played"; and both of these from the tune or speech melody, "the variation of pitches during the utterance" (pp. 388f.). Timbre for her, therefore, includes both voice set and certain voice qualifiers, which she concludes are entirely non-linguistic: "timbre ... is the universal, extra-linguistic factor par excellence in the musical stratum" (391); and she places it on the level of those kinesic phenomena which are unintended by the individual and which sometimes communicate in spite of him. Her position as to whether or not timbre is unintentional (i.e. part of voice set) is, however, left somewhat unclear, since further on in the same paper she allows the con16

It is noteworthy that much of the most interesting work in this field from a linguist's point of view concerns pathological states, whether the study is physiological or psychiatric; see below, pp. 20ff. Cf. also Johnson, Darley,,Spriestersbach, Diagnostic Methods in Speech Pathology (New York, 1963), especially pp. 133ff. 17 The Vowel, Its Nature and Structure (Tokyo, 1958; first published 1941). 18 See Ch. 1, note 2.

DEVELOPMENTS IN PARALINGUISTIC STUDY

19

tribution of timbre to "the expressiveness of the utterance". The absence of any definition of "expressiveness" means that there is no indication of how to resolve the ambiguity in her use of timbre. Her insistence on the universality of timbre would suggest, however, that she would be reluctant to include it within voice set, though the question of universality itself is left unanswered. At one point on p. 391, she says that timbre is "an often indispensable key to the interpretation" of an utterance and that "timbre habits may even differ per language"; at another, she holds that "it is a universal, extra-linguistic factor, we do not have to know the language to understand it". Among the few in Great Britain to have made a special study of the problems of paralanguage in any detail is J. C. Catford. His paper on phonation types is the most recent attempt to relate paralinguistic features to physiological movement and acoustic measurement. 19 As such it is to be warmly welcomed. "Phoneticians should be able to classify 'voice qualities' and other phonatory activities in as systematic a way as they classify supralaryngeal articulation" (§ 12). Catford begins this "preliminary survey" by classifying the basic laryngeal activities in speech in terms of stricture-type and location; vocal fold length, thickness and tension; upper larynx constriction; and vertical displacement of the larynx. Using "known" points of reference ("normal" voice, falsetto, two kinds of whisper), he makes a kinaesthetic-auditory exploration, confirming this in various degrees by laryngoscopy, air-flow data and spectrography. There is a detailed acousticophysiological description of breath, whisper, voice, creak and glottal stop, 20 followed by some examination of the combinations into which these usually enter, viz. breathy voice, whispery voice, whispery creak, voiced creak, whispery voiced creak. 
He gives a number of auditory judgments apart from the other defining data; and a useful theoretical statement which bears upon our own method (cf. p. 30): 18

"Phonation Types: The classification of some laryngeal components of speech production", In Honour of Daniel Jones (London, 1964). 20 For Catford's descriptions of certain of these, see below pp. 38ff.

20

DEVELOPMENTS IN PARALINGUISTIC STUDY

By paraphonological function we mean that the phonatory difference can be correlated directly (not via linguistic form) with contextual differences: an example is the difference between voice and whisper in English. This difference does not correlate with differences in linguistic form - but it does correlate with a contextual difference: voice is related to "normal" or "unmarked" context, whisper to what may be termed "conspiratorial" context. In both these types of function, the phonatory difference is contrastive in the linguistic sense. By non-phonological function we mean that the phonatory feature or difference is directly related to the situation - as a characteristic of the speaker as an individual, or of the language or dialect which the speaker is using: in this function, phonatory features may be indicative of the speaker's sex, age, health, social class, place of origin, etc. - but they are not contrastive in the linguistic sense.21 It will be noted that we do not follow Catford in restricting paralanguage to features directly correctable with context. In contrast with Europe, the United States has witnessed a very widespread interest in the subject of paralanguage over the last decade, and a number of American scholars, stimulated by an empirical approach to linguistics and the immediate requirements of modern psychiatry and sociology, have produced a veritable renaissance in study and a new perspective. For Trager and many others indebted to him, paralanguage is seen as part of the total communication situation, specifically as part of "metalinguistics". 22 Many of the investigators, however, are either non-linguists or are linguists who subordinate their professional interest in the special circumstances of publication in paralinguistics. 23 Approaches have been inconsistent in their utilisation of linguistic criteria and method, and the systematic and complete study which a 21

§ 20. The basic contrast here is with phonological function by which Catford means "that the phonatory difference can be correlated with a difference between grammatical or lexical forms". 22 "The full statement of the point-by-point and pattern-by-pattern relation between the language and any of the other cultural systems will contain all the 'meanings' of the linguistic forms, and will constitute the metalinguistics of that culture" (Trager, The Field of Linguistics, = Studies in Linguistics: Occasional Papers No. 1, 1949, p. 7). 23 Trager is the exception in attempting to give a coherent outline of the subject from a linguistic point of view; but his work is incomplete, and only a "first approximation". See Ch. 1, note 1.

DEVELOPMENTS IN PARALINGUISTIC STUDY

21

descriptive linguistic approach should involve has been waived for the specific and independent psychiatric aims (the establishment of personality traits in interview analysis) which have motivated such scholars as Hockett,24 Pittenger,25 Smith26 and McQuown. 27 Dissatisfaction with the results achieved by these scholars stems from certain aspects of their methodology and approach which may be summarised as follows : a) The degree of detail involved in the analytic procedure makes progress so slow as to preclude coverage of a sufficiently large corpus of spoken material to provide a reasonable statistical basis for descriptive statements. b) The narrow transcription, reflecting the refined analysis, is too complex typographically and too difficult to read and analyse by reason of the indiscriminate massing of relatively irrelevant data which obscure the basic patterns. When a description reaches such a degree of detail, it is open to some of the crucial objections that can be made to acoustic analysis using machines. c) There is no agreement as to the degree of delicacy to which the description of any given paralinguistic feature should be taken. This results in ambiguity and uncertainty as to the relative merits and importance of the different descriptions. d) There is little perceptible order in the presentation of the observed phenomena in terms of their linguistic significance, and insufficient exemplification to show the extent of systématisation in the material. e) The material used could not in any case provide a valid basis for linguistic statement, because it reflects only a small part of one register of an unusual kind of English, the doctor-patient relation24

C. F. Hockett, R. E. Pittenger, J. Danehy, "The First Five Minutes", A Sample of Microscopic Interview Analysis (Ithaca, 1960). 25 R. E. Pittenger, "Linguistic Analysis of Tone of Voice in Communication of Affect", Psychiatric Research Reports, 8 (1957), 41-54. 26 H. L. Smith's original paper, like Trager's, is specifically linguistic: The Communication Situation (Dept. of State Foreign Service Institute, 1953). A later paper, with Pittenger, has psychiatric orientation: "A Basis for Some Contributions of Linguistics to Psychiatry", Psychiatry, 20 (1957), 61-78. 27 N. McQuown, "Linguistic Transcription and Specification of Psychiatric Interview Material", Psychiatry, 20 (1957), 79-86.

22

DEVELOPMENTS IN PARALINGUISTIC STUDY

ship in a psychiatric context. Reliable descriptive statements of English need more than this; they need to determine and describe the norm before concentrating on departures from the norm. f) There is much disagreement in method and terminology between the various authors, so that although in aggregate the corpora presented are quite considerable, they cannot in fact be considered as a corpus, nor can even the results (for the reasons given in (e)) be confidently related to each other. g) The terminology of description is insufficiently defined in terms of relatively objective data, acoustic or articulatory; one also regrets that the labels for the categories do not always achieve a consistent technical status. The "imitation-labels" used (such as "hostile") are often unfortunate in giving a delusive sharpness and precision of discrimination to impressions which are highly subjective and only weakly susceptible of corroboration. Moreover, they carry the fundamentally undesirable implication that a given vocal qualifier stands in a one-to-one relation to its context, when in fact the relation is one-to-many (or many-to-one). Ascribing contextual meanings to the different vocal effects of paralanguage is the final stage of the study, after the raw material of the vocal effects constituting paralanguage has been catalogued, classified, and systematised, and after the interrelationships existing between the various systems and subsystems have been presented.28

28 Some further criticism may be found in Crystal, "Perspective", pp. 27ff. (see Ch. 1, note 2). Our criticisms, of course, apply chiefly to the practice rather than to the theory of these approaches, and then only to individuals at individual points. Many of the scholars, e.g. Smith, McQuown, Hockett, are well aware of the one-to-many nature of vocal qualifiers, though they use imitation-labels with inadequate definition nonetheless. We fully acknowledge the importance of the work done by these and the other scholars discussed below in developing the subject and in providing much of the stimulus for the present study. Above all, we would not wish to suggest that over-readiness to ascribe meanings to vocal qualifiers is a peculiar characteristic of recent American work; thus, for example, in Literary Style and Music (Thinker's Library edition 1950, pp. 49-57), Herbert Spencer distinguishes loudness, quality, or timbre, pitch, intervals, and rate of variation, with the sole view of attempting to correlate these variables with emotional and other characteristics. It is, indeed,

It may be of interest, in connection with the above remarks, to


give a brief description of the main points made and terminology used by these linguists in the articles written since 1953. The Communication Situation29 is the outline of a lecture which starts from the viewpoint of symbolics and communication systems in general. The communication systems which man uses are three: language, kinesics (gestures and motions) and vocalisations ("many of the events referred to under the catch-all term 'tone of voice'"). They are "systematised events which can be analysed, by and large, in contrasting pairs and which serve to help in the communication of 'overall' conditions of the individual, such as joy, sorrow; anger, pleasure; agreement, disagreement; alarm, reassurance; interest, disinterest; and the like". There are three main kinds: vocal differentiators ("the two most striking examples here are laughing and crying used as communication"), vocal identifiers (e.g. "'uh huh' as against 'uh-uh' "), and vocal qualifiers, which are explained in more detail. "The system seems to be composed of pairs of contrasting events, each component identifiable in terms of a single acoustic impression, which in turn is the result of a combination of readily determinable physiological occurrences."30 They may support or contradict the message being carried by the other two systems, language and kinesics: "It's not so much what he said, but how he said it." Their status is important: "It is essential to remember that the vocal qualifiers do not have any inherent meaning in the usual sense of that word. In the total context of communication they have a contributory function, but they must be analysed only on the level of differential meaning. In other words, each vocal qualifier is different from each other one and must be

endemic (and perfectly natural) in most approaches to the problem, from the most traditional to the most modern.
29 H. L. Smith, op. cit.
30 Apart from the rather sanguine use of "readily" and the repeated analogy to the discreteness of segmental phoneme oppositions, such statements, and many of those which follow, show that this group of scholars is certainly aware of the problems involved and the methods available for obtaining reliable data. This makes their failure to act upon these ideas in the practical analysis even more surprising.


so identified, but the same vocal qualifier might contribute to quite different total meanings" (p. 3).31 "The following elements of the system have so far been identified ... overloudness, oversoftness, overhigh pitch, overlow pitch, overfast tempo, overslow tempo, rasp, openness, drawling, clipping, singing, tonelessness, breaking, whispering." But the absence of acoustic data and only vague articulatory statement make one doubt the utility of these labels; e.g. rasp and openness "have to do physiologically with the amount of muscular tension under which the laryngeal apparatus is held". And the short descriptions of immediate contexts do not take us very far: openness "is associated with the 'tone of voice' of clergymen, politicians, and undertakers". The remainder of the paper then gives separate treatment to intonation patterns as a subsection of morphemics. R. P. Stockwell and J. D. Bowen apply a similar set of labels to Spanish in their article "Spanish Juncture and Intonation" (Language, 32, 1956, 290ff.). In the section headed "Vocalisations", there is a clear presentation, without articulatory or acoustic detail, of Smith's categories, together with brief but excellent Spanish illustrations in a full Trager-Smith prosodic transcription. Pittenger and Smith32 set out to apply the as yet imperfectly formulated theory of paralanguage to "the understanding of human behavior in the area of communication" (specifically in psychiatric work) using the technical advances in electronic sound recording and photography on "an enormous volume of observable data" (which is never presented). Language, "an arbitrary system of vocal symbols by which human beings, as members of a social group and participants in a given culture, interact and communicate ... does not involve all the possible sounds and sound qualities involved in communication" (p. 63). They then develop Smith's original position: "Intonation patterns ... 
plus the words, the vocal qualifiers, and the kinesics, taken together furnish the totality to which meaning can be assigned... It is the totality of the interrelation of the various components of language and of the other communication systems which is the basis for referential meaning... The other vocal phenomena which accompany language can be systematically analysed as qualities and noises separable from language itself" (p. 71).

31 Cf. note 30 above.
32 Op. cit., p. 61.

They realise the necessity of a norm, but they say little about it or its relation to the features distinguished from the norm. "In general, any spoken communication will be established on a level or base line of (1) intensity, (2) pitch-range over-all, (3) pitch intervals between the four pitch levels of intonation patterns, (4) degree of tension or laxness of vocal organs, (5) tempo for the uttering of the multiple sound elements within single words, and (6) tempo for the sequential march of words within the utterance" (p. 72). They make interesting suggestions about the co-occurrence of paralinguistic features, though without exemplification: "combinations of certain vocal qualifiers with each other and with certain intonation patterns recur with great frequency". The authors then go on to discuss "some of the numerous contextual uses" of the qualifiers (though this is "not to be considered as giving an exhaustive inventory of the 'meanings' of the events"), and modify their earlier definition of the qualifiers as "polar pairs": "There is some evidence in hand already that the pairs of vocal qualifiers may actually be the positive and negative intensities, to at least three degrees, of phenomena that might be covered by a single term". The variables (of pitch, prominence, etc.) are listed but not defined, being inadequately characterised by context-descriptions such as "Raised pitch is quite frequently used when adults talk to infants". Breaking (a "vocal differentiator") is, however, given a fuller description in terms of vocal cord tension (p. 74). Finally, there is mention of voice quality (the voice as anxious, hostile, etc.)
and voice set (thin, immature, aged, dispirited voice, etc.), but there is insufficient discussion to resolve what seems to be a confusion of criteria in the classification of effects under those heads. N. McQuown33 analyses one hour of spoken material (the first half hour in detail) for phonetics (modifying Pike), phonemics (ditto), "vocal modifiers" (based on Smith, 1953), and morphology (using Nida). His aim is a verbal profile of patient and analyst in a psychiatric situation, for which purpose he admits to "abandoning my role as linguist". He introduces the useful methodological principles of total accountability, replicability and verifiability,34 and admits that his own fragmentary presentation of material is unfortunate. He extends the Pittenger-Smith symbology and categories (e.g. adding a phonetic transcription which is very narrow and difficult to read), but ignores norms, concentrating on the departures from the norm of analyst and patient which "may be person-defining".35 The short example of transcription he gives makes use of imitation-labels, and again there is a minimum of definition, e.g. "warm", "comforting", "cheery", "inconclusive" intonation patterns; "rasp" is interpreted in one instance as "sympathetic"; "stutter" is "anxious"; "tonelessness" is "emotionless", etc., with no additional description given (pp. 82-3). McQuown indeed is worried about these labels, particularly in relation to "content analysis", and he realises the desirability of having definitions based on a substantial amount of physical data, but his paper does not develop the point. We are left with the statement that the different types of what he calls vocal modifiers are "easily distinguishable one from another", there being "very little room for disagreement".36 Pittenger37 analyses a section of interview of a few minutes' duration, discussing the systems of kinesics, language, and tone of voice (which includes voice set, voice quality, and vocalisations). He makes an interesting but vague division between voice set and voice quality, which is "the individual's own perception of the things that are stated in voice set".

33 Op. cit., pp. 79ff.
34 Pp. 79-80; cf. below, p. 32. Our own procedure, though similar to McQuown's, differs from his in various respects, for example in the amount allowed in under the heading of "total accountability".
35 Cf. E. Sapir, Language, 1921, pp. 43ff.; "Speech as a Personality Trait", Am. J. Sociol., 32 (1927), 892-905.
36 While it is true that his vocal modifiers are easy enough for the native to recognise, it is sanguine to say that there is little room for disagreement. In any case, such a statement does not contribute much to the very considerable problem of description.
37 Op. cit., pp. 41ff.

Thus he opposes as separate phenomena


the following: body build (set)/body image (quality); state of health/health image; age/status; sex/gender; human rhythm phase (such as sleeping, waking)/human rhythm image; toxic states/toxic status; location/locale, etc. There is no further discussion about the meaning of this set/quality division, nor of the word "image" in the definition of voice quality as "the image within the cultural or interaction setting". The following qualities are included, though they are not defined in any precise way: "tempo, rhythm, sloppy articulation, precise articulation, breathlessness, overvoicing, register range, intensity range, rasp and openness" (p. 45). He lists laughing, crying, and breaking as vocalisations, and gives a certain amount of phonetic definition to the vocal segregates (Smith's vocal identifiers). The only features he allows to be called vocal qualifiers are loudness, softness, overhigh and overlow pitch, clipping and drawl, with three degrees from the norm distinguished in each case. G. L. Trager again starts at the most general level with a definition of language and of his assumptions that language, "the principal mode of communication for human beings ... is always accompanied by other communication systems, that all culture is an interacting set of communications, and that communication as such results from and is a composite of all the specific communication systems as they occur in the total cultural complex".38 Speech results from activities which create a background of voice set ("the idiosyncratic, including the specific physiology of the speakers, and the total physical setting"). This is in the area of "prelinguistics".

38 Op. cit., p. 3.

"Against this background are measured three kinds of events employing the vocal apparatus": language; vocalisations ("variegated other noises, not having the structure of language"); and voice qualities ("modifications of all the language and other noises"), which constitute paralanguage and which function in systematic association with language. Paralanguage belongs to "metalinguistics". Trager's list of voice qualities as actual recognisable speech events is not simply a list of paired attributes; rather, "the pairs of terms are more properly descriptive of extremes between


which there are continua or several intermittent degrees" (p. 5). He admits a gradation in the following features: pitch range (spread upward or downward, and narrowed from above or below), vocal lip control (heavy or plain rasp and slight and full openness), glottis control (overvoicing, undervoicing, slight and heavy breathiness), pitch control (sharp and smooth transition), articulation control (forceful and relaxed), rhythm control (smooth and jerky), resonance (resonant and thin), and tempo (increased and decreased). He then, however, links voice qualities to voice set as "overall or background characteristics of the voice", contrasting these with the vocalisations, "actual specifically identifiable noises (sounds) or aspects of noises". The vocalisations are divided into three groups: vocal characterisers which we "talk through" (laughing, crying, snickering, giggling, whimpering and sobbing; yelling, moaning, groaning, whining, breaking, and belching); vocal qualifiers of intensity, pitch height and extent, each having three degrees on each side of a norm; and vocal segregates, where he adds hesitation features, and snorts and sniffs to the list Pittenger gives. These last are given a detailed "multi-dimensional" classification in terms of articulating organs or areas, making this paper rank with the work next to be considered as the main study giving due prominence to the articulatory mechanisms in the description of paralinguistic effects. The First Five Minutes39 involves a narrower transcription and a more detailed description of certain features than anything done previously, but though some new effects are presented, others are ignored because they did not happen to be present in the material. Silence is measured in tenths of a second, but the potentiality of measurement is ignored for other features.

39 Pittenger, Hockett and Danehy, op. cit.

Again the aim is psychiatric, with techniques drawn from the disciplines of psychiatry and anthropological linguistics, to "determine the complex correlations, not necessarily between signals and states, but rather between the linguist's specification of the signals and the psychiatrist's specification of the states". There are reasonably full articulatory definitions, for example of "throaty sigh", drawling/


clipping, breathiness; but precise correlates for degrees of these phenomena are still missing: "very short, 'average' duration, longer than average" are the points noted as variables for "throaty sigh". Other items listed include glottal closure, degree of mouth opening, inhalation and exhalation, a nasalised voiceless "gasp", rounded lips, pharyngeal constriction, spirantisation, apico-dental or -alveolar closure and other narrowly defined phonetic features. At a more general level, they list register effects, loudness effects, tempo variation, "squeeze", "sloppy" or "slurred" articulation; openness, rasp and glissando pitch control, and "breaking". There is no attempt made to erect a system classifying all these phenomena in terms of their interrelationships or their linguistic importance; they are rather mentioned as they occur in the material. The authors are aware of the difficulties: "the lack of technical terms to describe the 'manner of delivery' accurately ... and the technical or literary vocabulary to describe the attitudes that are being expressed". Finally, they suggest a breakdown of the conditions under which a speaker may utter a passage unusually fast, giving six possible meanings to the tempo change; but this is not applied to any other part of the material. In a recent book which shows acquaintance with The First Five Minutes but with little other work in which linguists have had a hand, Peter F. Ostwald40 sets up seven "gradients" to account for paralinguistic phenomena. They are rhythmicity (for "those characteristics of sound defined along the gradient" from "rhythmic" to "irregular"); intensity (from "loud" to "soft"); pitch ("high" - "low"); tone ("tonal" - "noisy"); speed ("fast" - "slow"); shape ("impulsive" - "reverberant"); and orderliness ("compact" - "expanded"). 
There is little evidence that these categories could satisfactorily accommodate speech-data or that the author has adequately considered their relationship to each other or indeed to a linguistic description.

40 Soundmaking: The Acoustic Communication of Emotion (Springfield, Ill., 1963), pp. 24-30.

There has been a growing interest recently in trying to determine the speech characteristics which form the basis of recognition, a


problem given welcome attention by Sapir41; some of the more important contributions are given below.42 These studies are not strictly relevant to our present account, though they provide important information for the background against which paralanguage works, and suggest a procedure for componential discrimination which is in some ways similar to our own. A recent paper by Garvin and Ladefoged43 attempts to characterise the organic speaker-diagnostic variables with a view to machine recognition, and lists the main organic factors amenable to experimental study: fundamental voice frequency, vocal quality ("some factors such as hoarseness or breathiness of the voice can be correlated with the shape of the glottal pulse"), cavity size and nasality. They allow that many of these may be learned, but do not consider the possibility of utilising these variables in a linguistically contrastive sense, this falling outside their scope. Ultimately they hope that "classes of speakers" will be definable in these terms. One interest that this study would have in common with our own would be in considering what substitution an individual might make in expounding a particular prosodic or paralinguistic feature if in fact his voice set was one which made habitual use of a parameter normally a key characteristic in the articulation of the feature in question. Enough has been said in this Chapter, we hope, to show the generally fluid and uncertain state of paralinguistic studies, and to confirm the view expressed in the most recent general handbook of English linguistics available at the time of writing that "investigation into these phenomena of paralanguage is not yet well enough established for us to give anything like an approximate inventory". It is little wonder, then, that debate should continue as to whether "paralinguistic phenomena should be treated as parts of linguistic units ... or as linguistic units themselves".44

41 "Speech as a Personality Trait", loc. cit.
42 Firth, J. R., "Personality and Language in Society", Sociolog. Rev., 42 (1950), 37-52; Shearme, J. N., and Holmes, J. N., "An Experiment Concerning the Recognition of Voices", Lang. and Speech, 3 (1959), 121-131; Allport, G. W., and Cantril, H., "Judging Personality from Voice", J. Soc. Psychol., 5 (1934), 37-55; Taylor, H. C., "Social Agreements on Personality Traits as Judged from Speech", J. Soc. Psychol., 5 (1934), 244-8; Ladefoged, P., and Broadbent, D. E., "Information Conveyed by Vowels", J. Acoust. Soc. Amer., 29 (1957), 98-104; Peterson, G. E., "The Information Bearing Elements of Speech", J. Acoust. Soc. Amer., 21 (1952), 629-37; Ladefoged, P., The Nature of Vowel Quality (Laboratorio de Fonetica Experimental, Coimbra, 1962).
43 "Speaker Identification and Message Identification in Speech Recognition", Phonetica, 9 (1963), 193-9.
44 M. W. Bloomfield and L. Newmark, A Linguistic Introduction to the History of English (New York, 1963), p. 83. For an illuminating summary of recent work on prosodic features and paralanguage (with valuably annotated bibliographical references), see K. L. Pike, Language in Relation to a Unified Theory of the Structure of Human Behavior (Glendale, Cal., 1960), Chapter 13.

3

CATEGORIES OF PARALANGUAGE

Earlier definitions of different vocal qualifiers have been criticised for failing to provide sufficiently tested auditory impressions or articulatory and acoustic data of any precision and comprehensiveness. The tendency has been towards the use of imitation-labels based on the unsupported impressions of the individual author. McQuown is one of the few to attempt the exhaustiveness and consistency required for study of the subject, and his methodological principles are practical and somewhat similar to our own procedure.1 Such a procedure depends inherently upon auditory as distinct from instrumental analysis and thus approaches Pike's position from a different angle: "Auditory analysis is essential to phonetic study since the ear can register all those features of sound waves, and only those features, which are above the threshold of audibility and therefore available to any speech community, whereas analysis by instruments must always be checked against auditory reaction because it has no criterion apart from judgments of the ear to indicate what movements or features of sound waves are below the threshold of perception."2

1 N. McQuown, op. cit. The principles are of total accountability ("everything on the tape must be categorised analytically and adequately rendered by the symbology"), replicability ("investigators can listen to the same tape, and, within the limits of human error, apply the same analytic categories (and their corresponding symbology) and come out with the same transcription, save for minor differences made inevitable by the elasticity of the analytic systems"), and verifiability ("Where there are differences in the transcription... investigators can refer to a particular symbol representing a particular sound-configuration on the tape and can ... iron out their differences").
2 Pike, Phonetics, p. 31; as a corollary, a further objection to instrumental analysis is that it too readily yields a mass of minute data, making it difficult for the linguist to discern pattern. One must add, however, that Pike elsewhere is of the opinion that "considerable instrumental study" is still necessary for work specifically in paralanguage (Intonation, p. 99). Cf. also Catford's work, referred to on p. 19 above.


We have necessarily had to use a predominantly auditory technique, both from a negative point of view (the lack of relevant machines, and lack of time - in view of the large corpus - to submit all material to machine analysis) and a positive one (the relative ease of obtaining useful results, despite a few remaining genuine points of query and marginal ambiguities). Our attitude in this is supported by a number of linguists. Daneš states: "Today, many linguists and phoneticians ... look more soberly and critically upon the possibilities of instrumental investigation, especially in regard to connected discourse."3 In his Phonetics, Pike naturally gives a great deal of articulatory detail in his treatment of "non-speech" and "marginal" sounds, and it is not his aim to attempt a phonemic study. It is unnecessary in our present study to place so much emphasis on detailed movements of the vocal apparatus, since our intention is at once more "phonemic" and less phonetic than his. The systematisation of the prosodic features which we recognise is naturally not as rigid as the system of English segmental phonemes, but the recognition of prosodic and paralinguistic features, principally utilising auditory analysis, has been based on a procedure which is similar to that used by linguists in determining the phoneme stock of a language. Only those qualifiers are recorded which are judged to be significant, namely those whose omission from an utterance would cause a linguistically-untrained native speaker of English to state that the utterance was "different" in meaning - though this by no means involves him in stating where the difference lies, or what meaning should be attributed to either utterance.4 On this basis it seems possible to erect various sub-systems which are not equivalent in the importance of the contrasts they carry.5 It is also possible to see some systematisation in a range of vocal effects that are the result of certain co-occurring activities of the vocal organs, but which are difficult to describe because of their insusceptibility to examination and their relatively infrequent individual occurrence. These are the voice "qualities" and voice "qualifications" (see Tables 4 and 5); and it seems necessary to introduce additional definition here, as it is at such points that indeterminacy (and hence ambiguity) is most likely to exist. It might be said that, for this paper, as soon as auditory analysis failed to produce agreement as to the nature of a paralinguistic feature, we had recourse to the additional procedures of determining and presenting articulatory and acoustic data. More reliance is placed on articulatory description in relation to the qualities and qualifications ("q features") than at any other point in this study (though naturally it is essential to have available a point of reference in the discussion of all sounds).6

3 F. Daneš, Intonace a věta ve spisovné češtině (Prague, 1957), p. 39. Cf. also W. Jassem, Intonation of Conversational English (Wroclaw, 1952), pp. 17-18: "Instrumental investigations are just as 'subjective' as direct observations..."; and M. Schubiger, English Intonation (Tübingen, 1958), p. 2.
4 Paralinguistic features are often reinforced, in the communication situation, by kinesic actions. The unavailability of these on magnetic tape means that ambiguities in interpretation are bound to arise. One example is in the use of creak. It is natural for a low-register voice to lapse easily into creak (cf. speaker SC on Survey Tape 5b. 51), and such occurrences would be regarded as voice set and ignored in analysis. But it is very difficult at times to distinguish such effects from genuine paralinguistic creak, such as might be found in the "disparaging" tone of voice which is often accompanied by grimace or shrug or hand movement; e.g.
I oh mq "you don't want one mq # ! d6 you # (m: low; q: creak)
or the common:
oh mq "/I don't know # mq # (m: low; q: creak)
In cases of indeterminacy, the indication of creak has either been queried as unverifiable, or has been accepted or rejected on other grounds; for instance, where there is strong prominence and fast tempo, creak would be unlikely to occur. (For description of creak, see p. 39).
5 Cf. Quirk and Crystal, "On Scales of Contrast in Connected English Speech", to appear in the J. R. Firth Memorial Volume, for an experimental procedure in erecting such sub-systems.
6 Cf. Pike, Phonetics, p. 31: "Description based on movements of the vocal apparatus, even though supplemented by acoustic terms, is more convenient than description rendered entirely by means of auditory acoustic judgments, since the latter lacks sufficient points of reference which can be defined without the necessity of establishing them in relation to standards that can be duplicated only by imitation."

The most practical and satisfactory method of analysis was (a) to study the parts of the articulatory mechanism involved in producing an agreed imitation of a given feature; (b) to postulate the relation of the individual organs to the total acoustic effect; and (c) to compare


the role of individual organs in producing other paralinguistic features. This resulted in the setting up of the following table of parameters:

TABLE 1

PARAMETER  ARTICULATORY EVENT                      POLARITY "A"             NORM   POLARITY "Z"

I          Extent of horizontal glottal movement   wide                            narrow: trill (slow ... fast);
                                                                                   friction (intensive ... slight)
II         Volume of supraglottal cavities         large (open)                    small (close)
III        Muscular tension of vocal organs        tense                           lax
IV         Vocal cord vibration                    present                         absent
V          Force of air pressure                   strong                          weak
VI         Type of air pressure                    in phase with syllable          out of phase (spasmodic)
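
As an aid to reading Table 1, its six parameters can be written out as a small data structure. The Python sketch below is illustrative only and forms no part of the original analysis: the class name `Parameter`, its field names, and the helper `bundle` are hypothetical, the values follow the order in which they appear in the table, and the `is_cline` flag merely records the statement in Note 4 that every parameter except IV is gradable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Parameter:
    """One row of Table 1 (field names are illustrative assumptions)."""
    numeral: str           # Roman numeral used in Table 1
    event: str             # articulatory event
    polarity_a: str        # label at polarity "A"
    polarity_z: str        # label at polarity "Z"
    is_cline: bool = True  # Note 4: every parameter except IV is gradable


TABLE_1 = (
    Parameter("I", "Extent of horizontal glottal movement", "wide", "narrow"),
    Parameter("II", "Volume of supraglottal cavities", "large (open)", "small (close)"),
    Parameter("III", "Muscular tension of vocal organs", "tense", "lax"),
    Parameter("IV", "Vocal cord vibration", "present", "absent", is_cline=False),
    Parameter("V", "Force of air pressure", "strong", "weak"),
    Parameter("VI", "Type of air pressure", "in phase with syllable",
              "out of phase (spasmodic)"),
)


def bundle(**poles):
    """Map parameter numerals to the label at the chosen pole ("A" or "Z").

    Hypothetical helper: it only looks labels up in TABLE_1 and makes no
    claim about which combinations correspond to attested features.
    """
    rows = {p.numeral: p for p in TABLE_1}
    result = {}
    for numeral, pole in poles.items():
        if numeral not in rows:
            raise KeyError(f"unknown parameter: {numeral}")
        if pole not in ("A", "Z"):
            raise ValueError(f"pole must be 'A' or 'Z', got {pole!r}")
        row = rows[numeral]
        result[numeral] = row.polarity_a if pole == "A" else row.polarity_z
    return result


# An arbitrary combination, chosen only to show the lookup.
print(bundle(III="A", IV="Z"))
```

A "bundle feature" in the sense of Note 2 would then be a set of such parameter settings taken together; which settings make up which feature is a matter for the Descriptions and is not encoded here.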

Notes on Table 1:
1) All judgments relating to the above parameters are made on the minimal basis of the syllable; auditory analysis determines the length of a segment affected by any polarity.
2) The Articulatory Events (which present an interesting analogy to distinctive features) have no status of their own in paralanguage but only insofar as they are constituent parts of a particular "bundle feature", and it is the bundle feature which can be directly interpreted as paralinguistic; cf. Note 4, on tension. Thus, the fact that presence/absence of vocal cord vibration is of phonemic importance in English does not affect the syllable-orientated approach here, where Parameter IV is concerned with whether or not the vocal cords are vibrating throughout a syllable or more. This breakdown is not, of course, limited to q features only; all


paralanguage (all speech, indeed) must involve some degree or other of all these parameters, but in most descriptions a given parameter is ignored unless the linguist regards it as crucial in a phonological contrast. Similarly, the function of these parameters in the constitution of such prosodic features as prominence (which involves increased air pressure) and pitch (which involves larynx-raising) can also be regarded as irrelevant.
3) The Parameters I-VI are to be read independently; that is, there is no necessary compatibility or mutual dependence among the items listed under "A" or those listed under "Z", though the Descriptions given below show that such dependences do exist and are the basis of the analysis. Tension can occur separately, and will be found having its own scale in Tables 4 and 5 (pp. 64 and 68). The degree of muscular tension is assessed irrespective of other accompanying features, for example prominence, though there may well be a high statistical correlation between degrees of the one and the other. There is an analogy with fortis/lenis as articulatory judgments; cf. Pike, Phonetics, p. 128: "Fortis articulation entails strong, tense movements within the types of articulation already described" (i.e. trills, whisper, falsetto); "weak articulation is lenis". Falsetto and tension also display interdependence; as B. Malmberg has said, high or "head" register will be obtained depending on the tension of the thyro-arytenoid muscle (Phonetics, New York, 1963, p. 24); cf. also Chiba and Kajiyama, op. cit. Air pressure is generally egressive but can be ingressive on occasion.
4) The gradations within each parameter except IV form a cline,7 on which arbitrary divisions can be imposed, as in Table 1; alternatively, points along the more/less continuum may be determined by various techniques of quantification.8 Quantification may be absolute or relative; but because of the limited physiological data available and the difficulty of obtaining measurable results,9 we have largely relied on relative judgments of the second type. The parameter polarities are taken as theoretical limits, with a potentially measurable physical correlation, varying - as, of course, does the norm - with each speaker. In fact, however, a given paralinguistic feature seems to be articulated by different speakers with very large agreement as to phonatory mechanism, and it is this that makes general articulatory descriptions possible. Idiosyncratic forms of paralinguistic features of course occur, but these are linked with a more general idiosyncrasy, i.e. in the voice set (cf. Ch. 1, note 4, and Garvin, loc. cit.).

7 For the recent use of "cline" in linguistics, see M. A. K. Halliday, "Categories of the Theory of Grammar", Word, 17 (1961), 248ff. See also D. L. Bolinger, Generality, Gradience and the All-or-None (The Hague, 1961), especially pp. 37ff.
8 Cf. C. E. Osgood and others, The Measurement of Meaning (Urbana, Ill., 1957), especially Ch. 3; E. Uldall, "Attitudinal Meanings Conveyed by Intonation Contours", Language and Speech, 3 (1960), 223-234.
9 Cf. also J. C. Catford, op. cit., § 10: "There are several good reasons for the relative neglect of phonation types. For the phoneticians these include (i) the difficulty of observing laryngeal activity, particularly by traditional kinaesthetic-auditory techniques, (ii) the fact that phoneticians have always been primarily concerned with setting up descriptive categories for phonic features which are utilised phonologically in languages. Since few languages make phonological use of more than two or three types of phonation, no great delicacy of description or classification has seemed to be called for."

DESCRIPTIONS OF VOICE QUALITIES AND QUALIFICATIONS

Description of the features involves labelling, and while we could have used an entirely abstract notation ("feature 1", "feature 2", etc.), we preferred to choose labels which were as close as possible to the "normal" associations of non-linguists' usage. In addition, care was taken not to ignore or contradict past usages in linguistics. Agreement as to the type of effect involved was reached by repeating the sound (vocally or as it occurred on tape in the material) to a number of people, and asking them whether it suggested a label. This usually produced a high level of agreement. Where, however, there was a negative response, a number of labels were then suggested, and the informants were asked to choose. It is significant that in all cases the labels which were preferred for a given voice quality fell within a single collocational range; for example, what some called resonant others called booming and others again resounding. In fact, there were virtually no alternative labels offered for huskiness, creak, whisper, breathiness, and falsetto. The labels for the qualifications caused more disagreement (see below, p. 41), and this was no doubt partly because of the spasmodic articulation, which was difficult to imitate, and also difficult to exemplify clearly in the corpus.10

10 Cf. Pike, Phonetics, p. 129: "Even if the phonetician can voluntarily simulate these sounds, they are apt to fall short of the genuine spasmodic production".


Voice qualities11

1) WHISPER: Strong, smooth, egressive, intra-arytenoid pressure, with relatively narrow glottal opening and complete absence of voicing. Close or open supraglottal cavities; may be lax or tense; increased air pressure brings increasing tenseness, resulting in harsh "stage" whispers.

2) BREATHINESS: Over-aspiration (excessive pressure being released as compared with normal articulation) particularly noticeable on vowels, and on those consonants where there is normally little aspiration. Longer than normal open phase in vocal cord vibration; usually very lax and open organs, with strong air pressure; normally egressive, but the effect of breathiness is always present when (as in some rapid speech) there is ingression.

Catford's definitions and acoustic data are given here for comparison: "15.2 Whisper: glottis constricted (estimated area, from the smallest possible chink up to about 25% of maximal glottal area). Critical rate of flow about 2.5 cl/sec, estimated critical velocity about 1900 cm/sec. Maximum rate of flow about 500 cl/sec. Turbulent flow, with projection of high-velocity jet into pharynx. Acoustic spectrum similar to breath but with considerably more concentration of acoustic energy into formant-like bands. Auditory effect: a relatively 'rich' hushing sound." "15.6... breathy voice: combination of breath + voice: glottis relatively wide open: turbulent airflow as for 'breath'12 plus vibration of vocal folds. The vocal folds do not meet at the centre line: they simply 'flap in the breeze'. Auditory effect, 'sigh-like' mixture of breath and voice: one form of voiced [h]."

11 Spectrograms 1-5 in the Appendix compare normal voice with whisper, breathiness, huskiness and creak.

12 Earlier in § 15, Catford has characterised "breath" as follows: "glottis widely open (estimated area of glottis about 60% to 95% of maximal glottal area). Critical rate of air-flow about 25 cl/sec., maximum about 890 cl/sec., estimated critical velocity about 240 cm/sec. Diffuse low-velocity turbulence..."


3) HUSKINESS: A friction (often accompanied by the "bunching together" of the pharyngeal wall with the root of the tongue) which varies in harshness according to the intensity of air pressure.13 Stronger pressure than whisper; irregular voicing; may be tense or lax, though normally lax does not occur except as voice set (and then especially in women); supraglottal volume varies and on this depends the resonance of the harshness; normally with smooth exhalation.

4) CREAK: Glottal trill with minimum breathflow; very narrow amplitude of opening, with glottis tense (supraglottals may be tense or lax); no resonance; may be given various vowel qualities by type of supraglottal opening; normally very low, though the trill can occur over a range of pitch; the voice easily lapses into creak at low pitch; in our description, creak is always voiced with smooth exhalation.14 A reasonable analogy is the sound made by a stick drawn along a picket fence (cf. Pike, Phonetics, p. 126), the taps in creak being of sufficiently low frequency to be separately distinguishable, as is clearly indicated on the spectrogram given on p. 81. A high pitched trill (rare) is thin and weak in effect, but it is noteworthy that the taps of the creak trill do not seem to increase in frequency with the raising of pitch.

Catford's description is as follows: "15.4 Creak: low frequency (down to about 40 cps) periodic vibration of a small section of the vocal folds. Mean rates of flow very low - of the order of 1.25 to 2 cl/sec. The precise physiological mechanism of creak is unknown, but only a very small section of the ligamental glottis, near the thyroid end, is involved. The auditory effect is of a rapid series of taps, like a stick being run along a railing." He then distinguishes voiced creak ("simultaneous voice + creak: a common type of voice in low toned parts of utterances in RP"), whispery creak (merely described as "whisper + creak"), and whispery voiced creak ("one form of 'beery' or 'whisky' voice").

13 Cf. Kaplan's definition of hoarseness (p. 168): "a rough, harsh quality of voice, and the pitch is relatively low". In his paper, "Observations on the Physiology of Hoarseness", P. Moore states: "When there is a breathy voice there is no closed phase in the accompanying action of the larynx;... hoarseness is present when there are irregularly variable consecutive vibrations of the vocal cords" (Proceedings of the Fourth International Congress of Phonetic Sciences, The Hague, 1962, p. 95).

14 "Laryngealization may conveniently be said to be trillization with superimposed voice" (Pike, Phonetics, p. 127).

5) FALSETTO: When this is not in voice set, it is usually restricted to men moving outside of their register (often with increased prominence and tempo). Pinched, narrow vocal cords, organs generally tense; weak air pressure causing a thin voice, which may have a quavery character (cf. tremulousness); exhalation may be smooth or spasmodic; most noticeable on vowels, and in the reduced loudness on certain consonants. Pike discusses this qualifier at some length, and part of his description is quoted here for comparison: "A certain type of vibration of the vocal cords is known as falsetto (i.e. false voice). In this formation a certain 'set' is given to the glottis which may be carried through other sound types - not just the vibratory trill. This characteristic is not subject to auditory articulation analysis, but certainly includes some type of reduced aperture and consequent diminished air stream; when sibilants are uttered with the vocal cords in position for falsetto, they are reduced in audibility just as they are when trillized. Passing from voice to false voice, and vice versa, provides the basis for yodeling. Most women seem incapable of using false voice..." (Phonetics, p. 128). Cf. Kaplan, op. cit., pp. 131 f.; Chiba and Kajiyama, op. cit., pp. 6 and 27 ff.

6) RESONANCE: Wide glottal opening with high amplitude of vocal cord vibration; larger than normal volume in supraglottal cavities; larynx usually low and mouth wide; smooth, strong egressive air pressure results in a resounding booming effect (cf. Smith's description as the voice of the "clergyman, politician, and undertaker")15 which may involve laxness and tremolo (in the musical sense); a harsher variety (if tense) might be classed as a "shout" (as opposed to a "yell"); the vowels and voiced continuants are the most noticeable carriers of this qualifier. The increased air pressure, usually smooth, naturally brings increased prominence.

15 H. L. Smith, The Communication Situation, loc. cit.

Voice qualifications16

Normally laughter, crying and similar phenomena exist sequentially with language; one stops talking to produce them. But it is still possible to talk through such features, or (from the opposite point of view) to introduce such features into speech. As there are a large number of vocal effects which can be subsumed under the general headings "laughter" and "crying" (and an even greater number of imitation-labels), the problem arises as to where arbitrary divisions in the cline of spasmodic emotion-markers should be made. Sometimes it may be difficult to distinguish the extremes of happiness and sadness in their range of vocal effects (sometimes even including kinesic features), and recourse must be had to the context or situation in making a decision as to their nature. But though the polarities of emotion may be expounded similarly on some occasions, it is easy in the majority of cases to distinguish laughter from crying. That is to say, acoustically, there is normally no disagreement as to the nature of the phenomenon at these extremes, and they may justifiably be taken as polarities.

A bigger problem is a listener's awareness of different kinds of laughter, crying, sobbing, etc. Thus, laughter may be perceived as (or suspected of) being half-hearted or even feigned. But since the articulatory action (on which our classification is made) seems to remain much the same, and since certainly the auditory impression is that the "kinds" of laughter have more in common than any single one has with any of the other main phenomena (crying, for instance), we have ignored any such "internal" distinctions.
Naturally, however, a wholly "mock" instance (where there is, so to say, deliberate non-simulation, as in "boo-hoo" with normal syllabic articulation) is not classed with the qualification system at all. The qualification "tremulous" is usually easy to distinguish, and its articulatory mechanism is reasonably clear-cut. The remaining divisions, giggle, sob, are terms covering areas of vocal effect, with the usual characteristics of cline phenomena, the centres being clear enough, the margins vague. It is not possible to say when giggle ends and laugh begins, or when cry ends and sob begins, though doubtless it would be possible to examine a great quantity of data and obtain some measurements (of pulse speed, air pressure, prominence, for example) which would be of value in establishing more objective gradations. It is doubtful, however, whether the results would justify the time and ingenuity involved.

16 Spectrograms 6 and 7 in the Appendix illustrate giggle and laugh.

All qualifications involve spasmodic air pressure, pulsating breath out of phase with the syllable (see Pike, Phonetics, p. 129). Articulatory correlatives can be studied by setting up parameters for degrees of pulsation type, pulsation speed, oral aspiration, nasal friction, air pressure, amplitude and frequency of vocal cord vibration, and volume and tension of supraglottal cavities. It is possible to make five broad divisions in the types of voice qualification which we would include in paralanguage.17 These divisions have been labelled on the basis of auditory agreement:

laugh - giggle - tremulousness - sob - cry

We may tabulate some of the more important articulatory components as follows:

17 Cf. Crystal, "Perspective", p. 26. It will be clear that we can make no use of a descriptive marker in terms of ingressive / egressive air flow and place of secondary release. There may be audible ingression at the end of a stretch of utterance, or in between the syllables of a stretch, but this is not marked in the transcription since it is merely a reflex feature of the egressive action in the immediate context. Similarly, one can class secondary release variation as irrelevant. In non-paralinguistic use, all the qualification sounds can be released simultaneously at the mouth upon glottal release or through the nasal cavity; but the latter channel alone is impossible accompanying language and hence is ignored for present purposes.


TABLE 2

Parameters                         laugh                giggle             tremulousness18    sob              cry
Pulsation type                     periodic             periodic           aperiodic          aperiodic        periodic
Pulsation speed                    slow                 fast               fast               slow             slow
Oral aspiration                    usually excessive    usually inaudible  usually inaudible  usually strong   usually excessive
Nasal friction                     variable             variable           inaudible          usually audible  strongly audible
Air pressure19                     usually very strong  usually strong     very weak          usually strong   usually very strong20
Vocal cord amplitude               usually very great   often great        usually small      often great      usually very great
Vocal cord vibration frequency     variable             variable           usually high       usually high     usually high
Volume of supraglottal cavities21  usually large        variable           variable           variable         usually large
Tension of supraglottal cavities   usually lax          usually tense      usually lax        usually tense    usually tense

18 One flap of tremulousness (which is not to be equated with the musical vibrato effect) is equivalent to a "catch" in the voice (i.e. one flap or brief roll of glottal trill). This effect is normally perceived on vowels; syllables are affected independently, but the total qualification is often perceived from a sequence of similarly affected syllables. Tremulousness is presumably the same feature as the "breaking" discussed by Smith, Hockett, and others; cf. Crystal, "Perspective", pp. 28-9.

19 There is no separate parameter for prominence, since for voice qualifications this is taken to be a function jointly of the force of the air pressure held at the throat before release and of the volume of the supraglottal cavities.

20 This is often accompanied by audible ingressive reflex.

21 This parameter is included hesitantly, since many shape-variations and degrees of volume can be found with all qualifications. There are certain correlations in the data which suggest corresponding volumes in the production of the qualifications concerned, but these have not been used in this table. Obviously, examples are infrequent in ordinary recorded material.
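The tabulation in Table 2 can also be held as a small data structure, which makes it easy to ask on which parameters any two qualifications differ. The following Python sketch is offered only as an illustration. The names QUALIFICATIONS, TABLE_2, value and distinguishing_parameters are assumptions introduced here, the cell values are the impressionistic labels of the table rather than measurements, and the footnote markers are omitted.

```python
# Table 2 held as rows of impressionistic labels (not measurements).
# All identifiers are illustrative assumptions, not part of the original analysis.
QUALIFICATIONS = ("laugh", "giggle", "tremulousness", "sob", "cry")

TABLE_2 = {
    "Pulsation type": ("periodic", "periodic", "aperiodic", "aperiodic", "periodic"),
    "Pulsation speed": ("slow", "fast", "fast", "slow", "slow"),
    "Oral aspiration": ("usually excessive", "usually inaudible", "usually inaudible",
                        "usually strong", "usually excessive"),
    "Nasal friction": ("variable", "variable", "inaudible",
                       "usually audible", "strongly audible"),
    "Air pressure": ("usually very strong", "usually strong", "very weak",
                     "usually strong", "usually very strong"),
    "Vocal cord amplitude": ("usually very great", "often great", "usually small",
                             "often great", "usually very great"),
    "Vocal cord vibration frequency": ("variable", "variable", "usually high",
                                       "usually high", "usually high"),
    "Volume of supraglottal cavities": ("usually large", "variable", "variable",
                                        "variable", "usually large"),
    "Tension of supraglottal cavities": ("usually lax", "usually tense", "usually lax",
                                         "usually tense", "usually tense"),
}


def value(qualification, parameter):
    """Return the Table 2 label for one qualification on one parameter."""
    return TABLE_2[parameter][QUALIFICATIONS.index(qualification)]


def distinguishing_parameters(a, b):
    """List, in table order, the parameters on which two qualifications differ."""
    return [p for p in TABLE_2 if value(a, p) != value(b, p)]


if __name__ == "__main__":
    print(distinguishing_parameters("laugh", "cry"))
```

On these labels, laugh and cry differ only in nasal friction, vocal cord vibration frequency, and supraglottal tension, which accords with the earlier remark that the two polarities may on occasion be expounded similarly.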

4

PROSODIC FEATURES

As distinct from the paralinguistic features, which seemed to call for some detailed treatment in terms of articulation and past study, we may deal somewhat more briefly with the prosodic features that we discriminate, pausing only on those categorisations which have been less commonly set up in the past. While tempo, prominence and pitch are permanent features of speech, and while there are individual profiles in terms of them in the voice set, variations are to a considerable extent patterned and conventional. Against the individual's norms in respect of these three features, replicability and agreed recognition make it seem significant to distinguish two contrast systems, "simple" and "complex". The simple system in each case provides for a contrast solely with the norm, while the complex system provides in addition for a contrast in terms of each feature's own variability. The systemic status of all these sets of terms is, as usual, validated by our ability to select a term from each without interdependence, but beyond awareness that some systems (and some terms from particular systems) are more commonly exploited than others, we have at present only begun work on rating their degrees of relevance.1

Since the norms vary from speaker to speaker, there is no question of an absolute measure for the steps away from the norm. It is conceivable that recognition could be made on the basis of an objectively measurable ratio (4x, 2x, x/2, x/4, for example, where x is a speaker's norm), but the theoretical status of such a ratio is doubtful since it seems likely that the ratio itself varies considerably from speaker to speaker. We feel more confident, therefore, with an assessment based on impression, supported as elsewhere in our analysis by agreed recognition and replicability.

1 See our paper, "On Scales of Contrast", loc. cit., and Professor Strang's remarks in Modern English Structure (London, 1962), esp. pp. 56ff.

For tempo over polysyllabic stretches of utterance, the simple system has four marked terms (the unmarked term being of course the norm): allegrissimo, allegro, lento, lentissimo; the complex system has two marked terms: accelerando, rallentando. The simple tempo system carried by the syllable and syllable segments has only two marked terms, "clipped" and "drawled", corresponding roughly to allegro and lento respectively. There is no complex tempo system for the single syllable.

For prominence, the simple system has again four marked terms: pianissimo, piano, forte, fortissimo. This is manifest over both polysyllabic stretches and at single syllables, except nuclear syllables which can carry only the two loud contrasts, forte (unmarked) and fortissimo. The complex system, operating over polysyllabic stretches and at the nucleus, has two marked terms: crescendo, diminuendo.

For pitch range, the simple system has for polysyllabic stretches two marked terms, low and high. For individual syllables, the position is rather more complicated, partly because the range system has to be related to the prominence system, but chiefly because it has also to be related to the intonation system in order to allow us to deal separately with tone contours and the pitch range of such contours while dealing adequately with each. That is to say, we do not regard a "high fall" and a "low fall" as two contours, but as phenomena involving selection from two systems: each is a "fall" in the tone system, but with one there is selected from the range system a high starting point and with the other a low starting point. The importance and justification of this approach were demonstrated in our paper referred to on the previous page.
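The inventories of marked terms just given for tempo and prominence depend on both the system (simple or complex) and the stretch of utterance that carries it. As an aid to reading, they may be set out as a lookup table. The Python sketch below is an illustration only: the identifiers MARKED_TERMS and marked_terms and the domain labels "polysyllabic", "syllable" and "nucleus" are assumptions of the sketch, and the entry for the nuclear syllable reflects one reading of the text, namely that forte is there the unmarked term, leaving fortissimo as the only marked one.

```python
# Marked terms by (feature, system, domain), as read from the description above.
# Identifiers and domain labels are illustrative assumptions.
LOUDNESS = ("pianissimo", "piano", "forte", "fortissimo")

MARKED_TERMS = {
    ("tempo", "simple", "polysyllabic"): ("allegrissimo", "allegro", "lento", "lentissimo"),
    ("tempo", "complex", "polysyllabic"): ("accelerando", "rallentando"),
    ("tempo", "simple", "syllable"): ("clipped", "drawled"),
    ("prominence", "simple", "polysyllabic"): LOUDNESS,
    ("prominence", "simple", "syllable"): LOUDNESS,
    # Assumed reading: at the nucleus forte is unmarked, so only fortissimo is marked.
    ("prominence", "simple", "nucleus"): ("fortissimo",),
    ("prominence", "complex", "polysyllabic"): ("crescendo", "diminuendo"),
    ("prominence", "complex", "nucleus"): ("crescendo", "diminuendo"),
}


def marked_terms(feature, system, domain):
    """Return the marked terms available, or an empty tuple if no such system is set up."""
    return MARKED_TERMS.get((feature, system, domain), ())


if __name__ == "__main__":
    # No complex tempo system is recognised for the single syllable.
    print(marked_terms("tempo", "complex", "syllable"))
    print(marked_terms("prominence", "complex", "nucleus"))
```

An empty result stands for the absence of a system in that domain, as with complex tempo on the single syllable.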
Individual forte syllables, whether nuclear or not, carry pitch range contrasts in a system of six marked terms: low drop, drop, continuance, booster, high booster, extra-high booster. The norm or unmarked term is the perceptible drop in pitch level that observers have often noted as characteristic of the progress of pitch in English.2 Undoubtedly the commonest marked terms in the simple system of range affecting prominent syllables are the "drop", "booster", and "high booster", and we have found it linguistically significant to formalise their relation to the pitch level of neighbouring syllables as follows. A "drop" is a pitch level lower than that of the preceding syllable and lower than the expected (unmarked) intersyllable drop would make it. A "booster" is a pitch level higher than that of the preceding syllable. A "high booster" is a pitch level higher than that of the next previous pitch-prominent syllable (or segment opener).

The special problems of subordinate intonation units (see pp. 52 ff.) require the recognition of a further range system with three marked terms, low, high, and extra-high, operating at the subordinate nucleus and relating this directly to the superordinate nucleus with respect to range. This system is regarded as quite separate from the general range system, the contrasts of which operate throughout the utterance, regardless of subordination.3 Finally, it should be noted that non-nuclear piano syllables seem to carry a system with only two marked terms, low and high.

The complex system for range has three marked terms: monotone, narrow, and wide. These contrasts are carried in the polysyllabic segment, and also in the nuclear syllable with the exception that here the "monotone" term is disregarded since the relevant manifestation is taken to be the exponent of a required term ("level") in the intonation system.

Having outlined three contrast classes which clearly have much in common as regards the operation of their contrast systems, we have reached a point where already we are making reference to the intonation system. Before exploring the latter, however, it will be convenient to deal briefly with the remaining contrast classes of which we find it necessary to take account: rhythmicality, tension, and pause.

2 Cf. R. Kingdon's "slowly descending series", Handbook of English Intonation (London, 1958), p. 3.

3 See below, pp. 53ff., for fuller description with examples.
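The relational definitions of "drop", "booster" and "high booster" given above lend themselves to a short procedural statement. The Python sketch below is a hedged illustration, not the procedure used in the analysis, which rests on auditory impression. It assumes that pitch levels are available as numbers on some relative scale (semitones, say), and it introduces two hypothetical parameters, expected_drop and tolerance, to stand for the unmarked intersyllable drop and the margin allowed around it. It further assumes that a high booster must exceed the immediately preceding syllable as well as the previous pitch-prominent one. The terms "low drop", "continuance" and "extra-high booster" are not defined relationally in the text and are therefore left unclassified.

```python
def classify_range(pitch, prev_pitch, prev_prominent_pitch,
                   expected_drop=1.0, tolerance=0.5):
    """Classify the pitch range of a prominent syllable relative to its neighbours.

    pitch                -- level of the syllable being classified
    prev_pitch           -- level of the immediately preceding syllable
    prev_prominent_pitch -- level of the previous pitch-prominent syllable
                            (or segment opener); None if there is none
    expected_drop, tolerance -- hypothetical values modelling the unmarked
                            intersyllable drop; both are assumptions of this sketch
    """
    if pitch > prev_pitch:
        # Assumption: a high booster also exceeds the preceding syllable.
        if prev_prominent_pitch is not None and pitch > prev_prominent_pitch:
            return "high booster"
        return "booster"
    if pitch < prev_pitch - expected_drop - tolerance:
        # Lower than the unmarked drop would make it.
        return "drop"
    if pitch < prev_pitch:
        # The slight descent taken as the norm (unmarked term).
        return "norm"
    # Level with the preceding syllable: not defined relationally in the text.
    return "unclassified"


if __name__ == "__main__":
    for args in [(12, 10, 11), (10.5, 10, 11), (9.2, 10, 11), (7, 10, 11)]:
        print(args, classify_range(*args))
```

Since the norm varies from speaker to speaker, any numerical values chosen for the two hypothetical parameters would have to be set for each speaker separately.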


Under rhythmicality, we group six characteristics which are clearly recognisable and promptly replicable. Their relationship to each other is not entirely certain but they seem best regarded as three polar pairs. The first polarity concerns one's awareness or otherwise of regular stress-timed pulses, and the terms are therefore simply "rhythmic" and "arhythmic". The second and third polarities concern modes of transition between arsis and thesis. The second involves transition with markedly replicable pitch variation, with extremes in a "spiky" movement on the one hand (with sharp or rapid variations in both pitch height and prominence) and in a "glissando" movement (with smooth, slurred variations) on the other. Both types of movement may be "upward" or "downward". That is, the arsis may be on a relatively low pitch level with a sharp or rapid rise ("spiky") or a smooth or slow rise ("glissando") to the thesis; or the arsis may be on relatively high pitch falling sharply or smoothly to the thesis:

[Figure: pitch-movement diagrams of the utterance "who would come on a night like this", contrasting "spiky" and "glissando" transitions]

The third polarity concerns a similar opposition in modes of transition but with exponence in prominence variation rather than in pitch variation, the terms being "staccato" and "legato". Thus "staccato" produces an effect analogous to "spiky" by sharp contrasts between a heavily prominent arsis and a very light thesis, such that the two extremes may seem discrete; at the other pole, "legato" is analogous to "glissando" in the smooth transition between the extremes of prominence on arsis and thesis. Despite, however, these suggested analogies between the poles of the second and third types of contrast, and despite the fact also that it seems usual for either of the transition modes "spiky" and "glissando" to co-occur with the "rhythmic" pole of the first type of contrast, there is no absolute interdependence between terms in any of the three types, nor of course between any two of the six characteristics. We therefore postulate the three pairs as marked polar terms in separate systems carried by the polysyllabic segment.

Just as there is a tendency for co-occurrence between terms in the rhythmicality systems, so also there is some co-occurrence of substantial features between the third rhythmicality system and the next prosodic system to be discussed, tension. With the latter, however, there is no relevant connexion with stress-timed pulsation as there was with the former, and the "slur" of which we will now speak has nothing to do with the slurred transition from arsis to thesis but a more general intersyllabic slur (as, for example, between thesis syllables) which it would be impossible (or wholly irrelevant) to connect with rhythm.

The auditory effect of tension is correlatable with departures from an individual's norm for connected speech along a scale of muscular tension obtaining in the articulatory organs (especially the supraglottals). This subsumes the tension manifested as a distinguishing feature in phonematic units. Four degrees of departure from the norm are distinguished and these are set up as the four marked terms of an independent tension system: "slurred", "lax", "tense", "precise". "Independent", it must be repeated, refers only to the descriptive status of the tension system. It is a commonplace that specific degrees of tension are observable with the terms of other systems; we claim that in such cases they are concomitant and not identifying characteristics. For example, in the preceding chapter (p. 36), it was pointed out that the "falsetto" term in the voice-quality system was accompanied by a general tenseness of the vocal organs. It might similarly be pointed out that the rallentando term in the complex subsystem of tempo is, in our observation, more often than not accompanied by laxness.
Quite apart from such partial or general co-occurrences, however, there is a vocal characteristic which is identified in terms of relative tension alone and it is this that the tension system is set up to accommodate. Degrees of tension have, no doubt, been the identifying characteristics that have prompted some linguists to categorise types of articulation as "sloppy", "dogmatic", "mincing" and the like. Although independent investigators might agree in large measure in making identifications with such a framework, there are obvious dangers (and the principle is basically unsatisfactory). Not only can different degrees of tension evoke sharply different reactions and attitudes from person to person; the attitude-labelling can also be strongly affected by the general context of the utterance, and we can readily lose conscious control over the bases of the labelling. No one would deny that it should be possible to make general statements about the meaning (or, more probably, meanings)4 of a degree of tension, but such correlation cannot be both the end-product and a basis of primary classification. We should begin therefore by attempting to categorise the tension features (as consistently and objectively as possible) on an articulatory basis, relative to the individual, and only subsequently look for the correlations between postulated "sames" of tension and formal items in the linguistic and situational context which will enable us to make statements of meaning.

Finally, we have features of pause, traditionally and in our view rightly regarded as having a patterned relation to the rest of linguistic behaviour.5 Here, we would wish to bring together for parallel (though certainly not conflated) consideration hesitation phenomena of two kinds, the one with silence as its exponent, the other ("voiced pause") having as exponents any of a small number of vocalisations of which schwa may be taken as the type. Observation and replicability alike suggest that length of silent pause is its relevant gradient characteristic.
We have no reason to believe, however, that absolute length is relevant, but rather that impressionistic relative length varies with the tempo norm of a given speaker and that the unit should not therefore be a particular number of microseconds but an interval (still of course measurable) related solely to an individual's tempo. It has seemed to us easiest to maintain consistency, and at the same time most linguistically significant, taking as the unit the interval of an individual's rhythm cycle from one arsis to the next. Two longer silent pauses are distinguished, the one being double the length of a unit pause, the other being three or more times the unit length. Shorter pauses than those of unit length appear to be replicably distinctive, and all such shorter pauses are classed together to constitute a fourth term. Our four terms in the silent pause system are thus "brief", "unit", "double" and "treble".

Voiced pauses have of course a wide range of manifestations in substance, but these manifestations can be generalised as involving three main characteristics: neutral vowel, nasality (alveolar or bilabial), and - less commonly - a glottal continuant. With the first, it seems possible to equate "length", [ə:], with the unit of silent pause, and the short form, [ə], with the term "brief"; more tentatively, [ən], [əm], [Pə:] may be equated with the unit term and repetitions of such vocalisations with the double and treble terms as the case may be. But attempts at a one-for-one relation between voiced and silent pauses are hazardous at this stage.

4 The same degree of tension, for example, can certainly accompany sarcasm and grief in two utterances by the same speaker.

5 For some recent observations, see F. Goldman-Eisler in Language and Speech, 1 (1958), pp. 226-31, and elsewhere; R. Quirk, The Study of the Mother Tongue (London, 1961), pp. 18-21.

Intonation

So far we have described the various systems of prosodic features in terms of the polysyllabic "segment" or of the nuclear or non-nuclear syllable, the tone-unit being of no direct relevance in determining their limits or constitution. We come now to the system which has the tone-unit, the most striking prosodic unit in English speech, as its actual matrix.6 For us, the tone-unit is a stretch of speech (of minimum length, one syllable) in which there is a climax of pitch prominence which takes the form of "nuclear" pitch movement or - in the case of level tones - pitch sustention. Such nuclei are of the following seven types: fall, rise, level, fall-rise, rise-fall, fall-plus-rise, and rise-plus-fall. The first three may be seen as forming a single group from the acoustic point of view; they are the "simple" nuclei, where the pitch movement is unidirectional or nil; the nucleus is generally realised on a single syllable, though the pitch movement or sustention may be continued on one or more further syllables which constitute the "tail" of the tone-unit. The next two are the "complex" nuclei, the essential characteristic being a rapid change in the direction of the pitch movement on or immediately after the syllable bearing pitch prominence; the second part of each complex nucleus is less prominent than the first and is often less fully realised in range. The final pair are the "compound" or "correlative" nuclear types, each having two nuclear points with a characteristic evenness of pitch pattern between them: a low "trough" between the fall and rise in the one case, a smooth arc of climb and descent or a sustained peak between the rise and fall in the other.

6 Cf. the portion of utterance containing Pike's "Primary Contour", Intonation of American English, pp. 25ff.; M. A. K. Halliday's "tone group", Archivum Linguisticum, 15 (1963), 1-28, especially pp. 6ff.

Little more need be said here on these nuclear points of tone-units, partly because the basic typology does not differ for the most part from that described in considerable detail in the standard works that have treated English intonation in terms of contours rather than phonemic levels,7 and partly because the principal features of our system have been presented and discussed with reference to particular problems elsewhere.8 In view of what has been said above, it will be realised that such terms as "high fall", "low rise", "narrow rise", or "sliding head" have no place in our intonation system, since variants like these are seen as involving parameters (pitch range, rhythmicality, as outlined above) which are best treated independently from the nuclear contours.

7 See, for example, J. D. O'Connor and G. F. Arnold, The Intonation of Colloquial English (London, 1961); W. Jassem, Intonation of Conversational English (Wroclaw, 1952); R. Kingdon, Groundwork of English Intonation (London, 1958); W. R. Lee, An English Intonation Reader (London, 1960); A. C. Gimson, An Introduction to the Pronunciation of English (London, 1962). Cf. also D. L. Bolinger, the articles in Word, 7 and 14, and the monograph Generality, Gradience and the All-or-None. For a recent systemic treatment, see M. A. K. Halliday in Archivum Linguisticum, 15, and Trans. Phil. Soc., 1963. On the "fall-rise" pattern, see the important article by A. E. Sharp in Phonetica, 2 (1958), 127-152, which differs from our treatment in some respects.

8 See Proceedings of the IXth International Congress of Linguists (The Hague, 1964), pp. 484ff., and our more recent paper, "On Scales of Contrast".

52


In one important respect, however, our description of intonation requires amplification here. While much has been written on contours or tunes constituting intonation units of some kind, little detailed attention has been given9 to such units in relationship with each other, either in terms of sequence or embedding. Yet it is noteworthy that there are patterns formed by combinations of tone-units in sequence, and these patterns function at a more abstract level than other features of intonation (such as nucleus or booster) because the basis of the relationship is the tone unit as a whole and not some aspect of one. Indeed, even if such relationships did not force themselves on our attention through configurations in substance itself, one might set up the existence of higher-order patterns as a hypothesis in view of the hierarchic structure of prosodic features that it is found necessary to postulate for other aspects of speech data.10 Many of these patterns have a high frequency of occurrence. Some, for example the "listing" intonation series, are closely tied to grammatical form and function; with others, there is no necessary one-to-one correlation with particular grammatical patterns.

The theory of subordination has been introduced to account for one set of external relationships in terms of the pitch movement of tone units immediately in sequence. Whatever the co-occurring or determining grammatical environment may be, subordination is recognised and defined solely on prosodic grounds.11 The primary characteristic of the subordinate tone-unit is that its pitch contour, while having a complete and independent shape within itself, falls broadly within the total contour presented in the superordinate tone-unit. It may either precede or follow the superordinate nucleus, singly or in combination with other subordinate

9 Among the exceptions, one might cite H. Palmer and F. Blandford, who speak of "repeated tone patterns (or co-ordinating sequences)" in A Grammar of Spoken English (Cambridge, 1939), pp. 24f.; J. L. M. Trim, who distinguishes "Major and Minor Tone Groups" in Le Maître Phonétique, 112 (1959), 26-29. Cf. also A. C. Gimson, op. cit., pp. 244ff.

10 Cf. "On Scales of Contrast". Acoustic evidence for the more important aspects of subordination is presented in spectrograms 8-19 in the Appendix.

11 For a further comment on the relevance of grammar, see pp. 60f. below.


53

units, having the same kind of systematisation (though this is reflected in our transcription with minor differences for reasons of notation economy). To determine whether one of two neighbouring tone-units is superordinate (TU1) or subordinate (TU2) the following criteria are used:

a) The nuclear type postulated as subordinate must repeat the direction of the nucleus in TU1, both nuclei usually being one of the two primary categories, fall or rise.12 If this direction is not similar, subordination is not possible, and the tone units must be treated as independent.

b) The width of nuclear movement in TU1 must be greater than that in TU2. The range disparity between the nuclear tones is the main factor in determining the subordinate partner, prominence being secondary and non-diagnostic. The types of subordination are based on the kind and degree of this disparity, which is perceived by comparing the starting-positions of the kinetic tones in TU1 and TU2. Either (i) TU2 will start and finish completely outside the range of TU1; or (ii) there will be an overlap; or (iii) TU2 will fall completely inside the range of TU1. For the purposes of this paper the latter two categories have been taken together: it is unlikely that much (if any) contrast exists between them. They seem closely correlatable in form and function, and the main contrast is undoubtedly between these and (i).

The full range of subordinate units is presented in Table 3 below. From this it will be seen that TU2 a and b are preposed subordinate units and that TU2 c and d are postposed units. The roman numerals by which they are further specified indicate where relevant the kinds of pitch relationship obtaining between TU1 and TU2 as set out in Table 3. It is usual to find a correlation between an increase in pitch width and an increase in prominence, though this does not affect the decision as to the type of subordination involved. There are a few

12 This aspect of the scaling of nuclear types is also discussed in "On Scales of Contrast". Complex tones (v, A) receive a treatment based on their potential relationship with one or other of the two main categories. This is described below, p. 59. For treatment of level tones, see p. 60.

54


cases, however, especially in TU2a and TU2b positions, where a subordinate unit (diagnosed by pitch width and supported by correspondence with grammatical subordination)13 has more prominence than the superordinate unit. But since it is very much more usual to find subordination corresponding to both reduced pitch and reduced prominence (as is evident from the spectrographic material in the Appendix), it seems reasonable to make the latter diagnostic in the case of level nuclei, where pitch range is not applicable. Examples of this are given on p. 60.

The development of the system may be represented as follows in three stages:

1) The width of a TU2 unit (or units) occurs in a higher or lower pitch range in relation to the "middle" range of TU1.

    TU2a                 TU2c
              TU1
    TU2b                 TU2d

2) If TU1 is a fall, each of the positions of TU2 then allows at least one subordinate fall whose range either overlaps or falls within TU1 (i, v, ix, xiii), or falls outside the range of TU1 (ii, vi, x, xiv) as outlined above. The possibilities now are

    TU2a i (ii)                 TU2c ix (x)
                     TU1
    TU2b v (vi)                 TU2d xiii (xiv)

(Bracketed numbers refer to the fall occurring outside TU1's range.)

3) But this is not yet the complete system, because each of the possibilities outlined in 2) has an alternative form with flattened or "narrow" pitch width, comparable to the m: narrow feature of the complex system of range in prosodic features. This narrowness occurs in all parts of the system, and results in the following completed diagram (for either category, fall or rise). The N indicates the narrowed forms.

    TU2a i (ii) Niii N(iv)                   TU2c ix (x) Nxi N(xii)
                                 TU1
    TU2b v (vi) Nvii N(viii)                 TU2d xiii (xiv) Nxv N(xvi)

13 On the latter, see pp. 60f.
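By way of illustration only, criteria a) and b) and the three-stage diagram can be sketched in code. Nothing in the sketch is the authors' own procedure: the numeric pitch values, the comparison of range midpoints to decide between the higher positions (TU2a, TU2c) and the lower ones (TU2b, TU2d), the treatment of ranges that merely touch as "outside", and the names Nucleus, label, classify and all_types are all assumptions made for the example. Narrowness is simply supplied by the analyst as a flag.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Optional

ROMAN = ["i", "ii", "iii", "iv", "v", "vi", "vii", "viii",
         "ix", "x", "xi", "xii", "xiii", "xiv", "xv", "xvi"]
POSITION_BASE = {"a": 0, "b": 4, "c": 8, "d": 12}


@dataclass
class Nucleus:
    direction: str  # "F" (fall) or "R" (rise)
    start: float    # pitch where the kinetic tone begins (hypothetical units)
    end: float      # pitch where it finishes

    @property
    def low(self) -> float:
        return min(self.start, self.end)

    @property
    def high(self) -> float:
        return max(self.start, self.end)

    @property
    def width(self) -> float:
        return self.high - self.low


def label(position: str, outside: bool, narrow: bool, direction: str) -> str:
    """Build a label such as 'TU2a N(iv) F' from the completed diagram."""
    index = POSITION_BASE[position] + (1 if outside else 0) + (2 if narrow else 0)
    numeral = f"({ROMAN[index]})" if outside else ROMAN[index]
    return f"TU2{position} {'N' if narrow else ''}{numeral} {direction}"


def classify(tu1: Nucleus, tu2: Nucleus, preposed: bool,
             narrow: bool = False) -> Optional[str]:
    """Return a subordination label for tu2, or None if it is independent."""
    # Criterion a): the subordinate nucleus repeats the direction of TU1.
    if tu1.direction != tu2.direction:
        return None
    # Criterion b): the nuclear movement in TU1 must be the wider one.
    if tu1.width <= tu2.width:
        return None
    # Overlapping and wholly-inside cases are taken together, as in the text.
    # Assumption: ranges that only touch at one point count as "outside".
    outside = tu2.low >= tu1.high or tu2.high <= tu1.low
    # Assumption: "higher" versus "lower" is decided by range midpoints.
    higher = (tu2.low + tu2.high) / 2 > (tu1.low + tu1.high) / 2
    position = {(True, True): "a", (True, False): "b",
                (False, True): "c", (False, False): "d"}[(preposed, higher)]
    return label(position, outside, narrow, tu1.direction)


def all_types() -> List[str]:
    """Enumerate every label allowed by the completed diagram."""
    return [label(p, outside, narrow, d)
            for p, narrow, outside, d
            in product("abcd", (False, True), (False, True), "FR")]
```

Under these assumptions a postposed low fall lying within the range of a wider fall is labelled TU2d xiii F, and all_types() yields sixteen labels for each of the two categories, fall and rise.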

55


Theoretically, then, it is possible to have thirty-two types of subordination to distinguish all the important variables of the rise and fall categories. In practice, however, not all of these numerous possibilities are realised with equal frequency. Many have occurred in the corpus examined for this paper; others have not occurred, but are imaginable; others are highly unlikely ever to occur (e.g. TU2a N(iv), TU2c N(xii) F). Even features in the latter category are included in the following table, however, to show the full potential of the phenomenon. Examples are given in the phonetic interlinear transcription as well as our own notation with maximal14

TABLE 3

[The body of Table 3 is not recoverable from this copy. For each of the sixteen subordinate types, TU2a i to N(iv), TU2b v to N(viii), TU2c ix to N(xii) and TU2d xiii to N(xvi), in both fall (F) and rise (R) forms, the table gives an example in phonetic interlinear transcription together with the corresponding notation.]

14 Maximal is important. We do not wish to obscure the fundamental relation of the sub-categories of nucleus type to the corresponding major categories. The delicacy which distinguishes / [LN'] (:!)' #, for example, to this maximal degree, merely indicates the place of the subordinate rise in the total system of subordination: its most important characteristic is that it is a rise and not a fall.

56


discrimination, using the following symbols: F = Fall; R = Rise; / = onset of tone unit; [ ] enclose subordinate unit; # terminates tone unit; E, H and L stand respectively for Extra-high, High and Low beginning-points for the pitch movement.

Notes on Table 3:

1) For the reasons given on p. 53, this Table subsumes overlapping subordinate tones as alternative forms in examples i, iii, v, vii, ix, xi, xiii, xv.

2) The patterns xiii F, xiv F, and xvi F are the most frequent types of subordination so far observed in the material, presumably because they approximate most nearly to a nuclear tail. But whereas a tail has little prominence and its only pitch movement is to follow directly that of the nucleus, the subordinate tone unit has a new pitch contour which always results in increased overall prominence as compared with a tail and hence in a clearly different significance as well. For example, one interpretation of the utterance I /told you [I /didn't Lwant to#]# is that it "has more to say" than I /told you I ·didn't ·want to# (where the raised period indicates the continuance of the nuclear tone on stressed syllables), by having - so to say - one and a half "information points" as against one.15 Cf. Lee Hultzen, "Information Points in Intonation", Phonetica, 4 (1959), 107-20. There is an obvious analogy to the helpful concept of "rheme" and "theme" as developed by Vachek and Firbas; see, for example, J. Firbas, "On the Communicative Value of the Modern English Finite Verb", Brno Studies in English, 3 (1961), 79ff.

Before going on to illustrate the scheme of units set out in Table 3, some explanatory comments are necessary on notational complications. Since we rank the modulation system of pitch range higher than that of subordination, we can dispense with the use of capital letters to indicate range where the subordinate nucleus in question falls within an "m" stretch of the relevant value; see the last example under B (a) on p. 58.

15 See the comparison of tail and subordination phenomena in Spectrograms 6 and 7 in the Appendix. Coincidentally, Dr M. A. K. Halliday tells us that he has reached a similar informal categorisation of a tonal contrast in English to that expressed above.
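The bracketing convention in the list of symbols lends itself to a simple mechanical check. The sketch below is offered only as an illustration and forms no part of the notation itself: it reads a plain-ASCII transcription (tone marks and boosters omitted) and lists each bracketed subordinate unit together with its depth of embedding. The function name subordinate_units and the sample string are hypothetical.

```python
from typing import List, Tuple


def subordinate_units(transcription: str) -> List[Tuple[int, str]]:
    """Return (depth, text) for each bracketed unit, in order of closing.

    Depth 1 is a unit subordinate to the main tone unit; depth 2 is a unit
    subordinate to a depth-1 unit, and so on.
    """
    open_positions: List[int] = []
    units: List[Tuple[int, str]] = []
    for index, char in enumerate(transcription):
        if char == "[":
            open_positions.append(index)
        elif char == "]":
            if not open_positions:
                raise ValueError("']' without a matching '['")
            start = open_positions.pop()
            # Depth is the number of brackets still open, plus this one.
            depth = len(open_positions) + 1
            units.append((depth, transcription[start + 1:index]))
    if open_positions:
        raise ValueError("'[' without a matching ']'")
    return units


# A simplified rendering of a postposed subordinate unit.
print(subordinate_units("I /told you [I /didn't Lwant to#]#"))
```

A unit embedded in another subordinate unit is reported first, with depth 2, since its closing bracket is reached before that of the unit containing it.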


57

A less convenient complication in the matter of range arises from the fact that certain characteristics systemically and notationally "marked" in superordinate structures appear to be far from systemically "marked" in subordinate ones, though in the interests of notational consistency it would obviously be inconvenient to reverse the notational practice in such cases. It might transpire, for example, that a range modulation (low, narrow, or both) was the expected accompaniment of a TU2F, despite the fact that (contrary to the general aims of our notation) we would still denote such instances of TU2 more heavily than what would then be the systemically "marked" term ['#].

It must be emphasised that the "simple" range symbols E, L and H relate the starting point of the subordinate nucleus only to that of the superordinate nucleus and that they are independent of the terms in the booster system. Unless, therefore, the subordinate nucleus begins with the onset (at which point booster symbols are excluded by the theory from any tone unit), it is possible for symbols from the two systems to co-occur. Booster symbols within a TU2 have the same value as they have outside it - the relation of pitch height to the pitch of the preceding syllable or segment-initiator. With preposed subordinate units, therefore, the boosters involve no complication; for example:

it /wouldn't be [H!any] use#

where the H relates any to use and the ! relates any to wouldn't. With postposed subordinate units, however, it will be clear that the consistently "backward" reference of boosters has the effect of making boosters refer only to pitch height of syllables within the TU2 itself.

To exemplify the more frequent instances of subordination, the basic patterns may be set out as follows:16

A. Simple Subordination, i.e. one TU2 either preceding or following a TU1:

16 The examples are from Survey text 5b. 51.

58


there /aren't many -murderers -executed [in /this -country

this is /still [the /El&w#]#

/[world] ¡wide#

which ¡[W 'are] termed#

B. Complex Subordination, i.e. more than one TU2 either preceding or following the TU1:

a) This is particularly typical of postposed TU2s, of which there are two kinds:

i) Where the first subordinated unit (TU2) has another unit subordinated to it in immediate sequence (TU3) which is marked accordingly as if it were a TU2 related to a TU1:

this is /not o"bligatory [Mr ¡LWilliams [as you've sug/gested#]#]#
