English Pages 381 [384] Year 2011
Tones and Features
Studies in Generative Grammar 107
Editors Harry van der Hulst Jan Koster Henk van Riemsdijk
De Gruyter Mouton
Tones and Features Phonetic and Phonological Perspectives edited by
John A. Goldsmith, Elizabeth Hume, and W. Leo Wetzels
The series Studies in Generative Grammar was formerly published by Foris Publications Holland.
ISBN 978-3-11-024621-6
e-ISBN 978-3-11-024622-3
ISSN 0167-4331

Library of Congress Cataloging-in-Publication Data
Tones and features : phonetic and phonological perspectives / edited by John A. Goldsmith, Elizabeth Hume, Leo Wetzels.
p. cm. -- (Studies in generative grammar; 107)
Includes bibliographical references and index.
ISBN 978-3-11-024621-6 (alk. paper)
1. Phonetics. 2. Grammar, Comparative and general--Phonology. I. Goldsmith, John A., 1951- II. Hume, Elizabeth V., 1956- III. Wetzels, Leo.
P217.T66 2011
414'.8--dc23
2011030930

Bibliographic information published by the Deutsche Nationalbibliothek
The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available in the Internet at http://dnb.d-nb.de.

© 2010 Walter de Gruyter GmbH & Co. KG, Berlin/Boston
Typesetting: RefineCatch Ltd, Bungay, Suffolk
Printing: Hubert & Co. GmbH & Co. KG, Göttingen
∞ Printed on acid-free paper
Printed in Germany
www.degruyter.com
Contents

Preface
John Goldsmith, Elizabeth Hume and W. Leo Wetzels

1. The representation and nature of tone

Do we need tone features?
G. N. Clements, Alexis Michaud and Cédric Patin

Rhythm, quantity and tone in the Kinyarwanda verb
John Goldsmith and Fidèle Mpiranya

Do tones have features?
Larry M. Hyman

Features impinging on tone
David Odden

Downstep and linguistic scaling in Dagara-Wulé
Annie Rialland and Penou-Achille Somé

2. The representation and nature of phonological features

Crossing the quantal boundaries of features: Subglottal resonances and Swabian diphthongs
Grzegorz Dogil, Steven M. Lulich, Andreas Madsack, and Wolfgang Wokurek

Voice assimilation in French obstruents: Categorical or gradient?
Pierre A. Hallé and Martine Adda-Decker

An acoustic study of the Korean fricatives /s, s'/: implications for the features [spread glottis] and [tense]
Hyunsoon Kim and Chae-Lim Park

Autosegmental spreading in Optimality Theory
John J. McCarthy

Evaluating the effectiveness of Unified Feature Theory and three other feature systems
Jeff Mielke, Lyra Magloughlin, and Elizabeth Hume

Language-independent bases of distinctive features
Rachid Ridouane, G. N. Clements and Rajesh Khatiwada

Representation of complex segments in Bulgarian
Jerzy Rubach

Proposals for a representation of sounds based on their main acoustico-perceptual properties
Jacqueline Vaissière

The representation of vowel features and vowel neutralization in Brazilian Portuguese (southern dialects)
W. Leo Wetzels

Index
Preface
The papers in this volume are all concerned with two current topics in phonology: the treatment of features, and the treatment of tone. Most of them grew out of a conference at the University of Chicago’s Paris Center in June of 2009 which was organized by friends and colleagues of Nick Clements in tribute to decades of contributions that he had made to the field of phonology, both in the United States and in France. Nick’s work served as a natural focus for the discussions and interactions that resulted in the papers that the reader will find in this book. We, the editors, would like to say a bit about Nick’s career and his work in order to set the context.

1. G. N. Clements
Nick was an undergraduate at Yale University, and received his PhD from the School of Oriental and African Studies, University of London, for a dissertation on the verbal syntax of Ewe in 1973, based on work that he did in the field. In the 1970s, he spent time as a post-doctoral scholar at MIT and then as a faculty member in the Department of Linguistics at Harvard University. Throughout this period he published a series of very influential articles and books on areas in phonological theory, a large portion of which involved linguistic problems arising out of the study of African languages. His work in this period played an essential role in the development of autosegmental phonology, and his work in the 1980s, when he was a professor of linguistics at Cornell University, was crucial in the development of many of the current views on features, feature geometry, sonority, and syllabification. He worked closely with students throughout this time at Cornell, including one of us, Elizabeth Hume. He also co-wrote books with several phonologists (Morris Halle, Jay Keyser, John Goldsmith) and collaborated on many research projects.

In 1991, Nick moved to Paris, where he and his wife, Annie Rialland, worked together on projects in phonetics, phonology, and many other things, both linguistic and not. Visiting Nick became an important thing for phonologists to do when they had the opportunity to come to Paris. Over the next twenty years or so Nick continued to work selflessly and generously
with students and more junior scholars, and was widely sought as an invited speaker at conferences. Nick passed away a few months after the conference, late in the summer of 2009. Many of his friends (and admirers) in the discipline of phonology had been able to express their admiration for his contributions through their papers and their kind words at the time of the conference in June. This book is offered as a more permanent but equally heartfelt statement of our affection and respect for Nick’s work in phonology and in linguistics more broadly.

2. Tone

The proper treatment of tonal systems has long been an area of great activity and curiosity for phonologists, and for several reasons. Tonal systems appear exotic at first blush to Western European linguists, and yet are common among languages of the world. The phonology of tone is rich and complex, in ways that other subdomains of phonology do not illustrate, and yet each step in our understanding of tonal systems has shed revelatory light on the proper treatment of other phonological systems. At every turn, tonal systems stretch our understanding of fundamental linguistic concepts: many languages exhibit tonal contrasts, in the sense that there are lexical contrasts that are physically realized as different patterns of fundamental frequency distributed globally over a word. But from a phonological point of view, words are not unanalyzable. Far from it: they are composed in an organized fashion from smaller pieces, some mixture of feet, syllables, and segments. Breaking a pitch pattern (when considering an entire word) into pieces that are logically related to phonological or morphological subpieces (which is ultimately ninety percent of a phonologist’s synchronic responsibility) has proven time and time again to be an enormous challenge in the arena of tone. One of the classic examples of this challenge can be found in Clements and Ford’s paper (1979) on Kikuyu tone.
In Kikuyu, the surface tone of each syllable is essentially the expression of the previous syllable’s tonal specification. Each syllable (often, though not always, a distinct morpheme) thus has an underlying (we are tempted to say, a logical) tone specification, but that specification is realized just slightly later in the word than the syllable that comprises the other part of the underlying form. Morphemes in such a system show utter disregard for any tendency to try to be realized in a uniform way across all occurrences; tones seem to assert their autonomy and the privileges that come with that, and use it to produce a sort of constant syncopation in the beat of syllable against tone.
Is tone, then, different from other phonological features? This question is directly posed by three papers in this volume, that by Nick Clements and colleagues, that by Larry Hyman, and that by David Odden. Each is written with the rich background of several decades of research on languages – largely African tone languages, at least as far as primary research is concerned, but also including the fruits of research done on Asian languages over decades as well. In the end, Clements, Michaud, and Patin conclude that tonal features may well be motivated in our studies of tonal systems, but the type of motivation is different in kind from that which is familiar from the study of other aspects of phonology. Hyman, for his part, is of a similar conviction: if tones are analyzed featurally in the ultimate model of phonology, it is not a step towards discovering ultimate similarity between tone and every other phonological thing: tone’s diversity in its range of behavior keeps it distinct from other parts of phonology. David Odden’s chapter also focuses on the motivation for tonal features. However, his focus is on the types of evidence used to motivate a given feature. Along these lines, he argues that tonal features, like other phonological features, are learned on the basis of phonological patterning rather than on the basis of the physical properties of the sounds (for related discussion, see Mielke 2008). Goldsmith and Mpiranya’s contribution addresses not features for tone, but rather one particular characteristic of tone that keeps it distinct from other aspects of phonology: tone’s tendency to shift its point of realization (among a word’s syllables) based on a global metrical structure which is erected on the entire word. 
This is similar to the pattern we alluded to just above in Kikuyu, but in Kinyarwanda, certain High tones shift their autosegmental association in order to appear in weak or strong rhythmic positions: a bit of evidence that rhythmicity is an important organization principle of tonal assignment, in at least some languages, much like that seen in accent assignment and rarely, if ever, seen in other aspects of a phonological system. The theme of rhythmicity is continued in the paper by Annie Rialland and Penou-Achille Somé. They hypothesize that there is a relationship between the linguistic scaling in Dagara-Wulé, as manifested in downstep sequences, and the musical scaling in the same culture, as found in an eighteen-key xylophone. They suggest that downstep scaling and xylophone scaling may share the property of being comprised of relatively equal steps, defined in terms of semitones.

3. Features
The hypothesis that the speech chain can be analyzed as a sequence of discrete segments or phonemes, themselves decomposable into a set of
phonological features, has been at the core of almost a century of research in the sound structure of human language. By virtue of their contrastive nature, phonological features function as the ultimate constitutive elements of the sound component in the sound-to-meaning mapping, while, being both restricted in number at the individual language level and recurrent across languages, their intrinsic characteristics are often associated with general properties of human anatomy and physiology. Apart from being distinctive, phonological features appear to be economical in the way they combine to construct phoneme systems and to express, individually or in combination, the regularity of alternating sound patterns, both historically and synchronically.

It was discovered by Stevens (1972) that small articulator movements in specific areas of the articulatory space may lead to large acoustic changes, whereas, in other regions, relatively large movements lead to only minor acoustic variations. Stevens’ quantal model of distinctive features forms the theoretical background of the study by Dogil and his colleagues, who discuss the function of subglottal resonances in the production and perception of diphthongs in a Swabian dialect of German. It is observed that Swabian speakers arrange their formant movements in such a way that the subglottal resonance region is crossed in the case of one diphthong and not the other. In Stevens’ model, the defining acoustic attributes of a feature are a direct consequence of its articulatory definition. The relation between articulation and acoustics is considered to be language-independent, although a feature may be enhanced language-specifically to produce additional cues that aid in its identification. As required by the naturalness condition, phonological features relate to measurable physical properties.
Therefore, to the extent that features can be shown to be universal, it is logical to ask what the defining categories of a given feature are that account for the full range of speech sounds characterized by it. This problem is explicitly addressed in the chapter by Ridouane, Clements, and Khatiwada, who posit the question of how [spread glottis] segments are phonetically implemented, and propose a language-independent articulatory and acoustic definition of this feature. Also following the insights of Stevens’ quantal theory, Vaissière elaborates a phonetic notation system based on the combination of acoustic and perceptual properties for five ‘reference’ vowels and discusses its advantages over Jones’ articulation-based referential system of cardinal vowels. Kim and Park address the issue of how the opposition between the Korean fricatives /s, s’/ is best characterized in phonetic terms. From their acoustic data they conclude that the most important parameter that distinguishes these sounds is frication duration, which is significantly longer in /s’/ than in /s/. They
propose that this difference is best expressed by reference to the feature [tense].

Discovering the smallest set of features able to describe the world’s sound patterns has been a central goal of phonological theory for close to a century, leading to the development of several different feature theories. The chapter by Mielke, Magloughlin, and Hume compares the effectiveness of six theories to classify actually occurring natural and unnatural classes of sounds. They show that a version of Unified Feature Theory (Clements and Hume 1995) with binary place features, as suggested by Nick Clements in 2009, performs better than other proposed theories.

Another important topic in feature research concerns the relation between the feature structure of phonological representations and phonological processes or constraints. How are segments, morphemes or words represented in terms of their feature composition, and which features pattern together in phonological processes and bear witness to their functional unity? Hallé and Adda-Decker study the latter question by examining whether voice assimilation in French consonant clusters is complete or partial. They show that, of the acoustic parameters involved in the assimilation process, voicing ratios change categorically, whereas secondary voicing cues remain totally or partially unaffected. They propose to describe voicing assimilation in French as a single-feature operation affecting the [voice] feature. Rubach addresses the question whether palatalized and velarized consonants should be treated as complex or as simplex segments in terms of their geometrical representation. Looking at Bulgarian data, he concludes that palatalization as well as velarization on coronals and labials are represented as separate secondary articulations. In his study on mid-vowel neutralizations in Brazilian Portuguese, Wetzels argues for a gradient four-height vowel system for this language.
The interaction between vowel neutralization and independent phonotactic generalizations suggests that vowel neutralization cannot be represented as the simple dissociation from the relevant contrastive aperture tier, but is best expressed by a mechanism of marked-to-unmarked feature substitution. McCarthy’s paper provides a detailed discussion of how vowel harmony should be accounted for in Optimality Theory. Since proposals for dealing with vowel harmony as embedded in parallel OT make implausible typological predictions, he proposes a theory of Serial Harmony that contains a specific proposal about the constraint that favors autosegmental spreading within a derivational ‘harmonic serialism’ approach to phonological processes.

In addition to the authors noted above and the participants at the 2009 Paris symposium, we would like to acknowledge others who contributed
to this tribute to our friend and colleague, Nick Clements. The University of Chicago generously provided its Paris Center where the symposium was held, and we would like to thank Françoise Meltzer and Sebastien Greppo, Director and Administrative Director of the Paris Center, respectively, for their invaluable assistance in organizing the event. We are also grateful to Deborah Morton of The Ohio State University Department of Linguistics for editorial help in preparing the manuscripts for publication, and to Julia Goldsmith for her assistance in creating the index. Likewise, our appreciation extends to the editorial staff at Mouton de Gruyter, including Julie Miess, and the late Ursula Kleinhenz for her enthusiastic support of this project.

John A. Goldsmith, Elizabeth Hume, W. Leo Wetzels

References

Clements, G.N. and Kevin C. Ford 1979. Kikuyu tone shift and its synchronic consequences. Linguistic Inquiry 10(2): 179–210.
Clements, G.N. and Elizabeth Hume 1995. The internal organization of speech sounds. In John A. Goldsmith (ed.), The Handbook of Phonological Theory, 245–306. Oxford: Blackwell.
Mielke, Jeff 2008. The Emergence of Distinctive Features. Oxford: Oxford University Press.
Stevens, K.N. 1972. The quantal nature of speech: Evidence from articulatory-acoustic data. In P.B. Denes and E.E. David Jr. (eds.), Human Communication: A Unified View, 51–66. New York: McGraw-Hill.
1. The representation and nature of tone
Do we need tone features?

G.N. Clements, Alexis Michaud, and Cédric Patin

Abstract. In the earliest work on tone languages, tones were treated as atomic units: High, Mid, Low, High Rising, etc. Universal tone features were introduced into phonological theory by Wang 1967 by analogy to the universal features commonly used in segmental phonology. The implicit claim was that features served the same functions in tonal phonology as in segmental phonology. However, with the advent of autosegmental phonology (Goldsmith 1976), much of the original motivation for tone features disappeared. Contour tones in many languages were reanalyzed as sequences of simple level tones, calling into question the need for tonal features such as [±falling]. Processes of tone copy such as L(ow) > H(igh) / __ H(igh) were reinterpreted as tone spreading instead of feature assimilation. At about the same time, a better understanding of downstep emerged which allowed many spurious tone levels to be eliminated. As a result, in spite of the vast amount of work on tone languages over the past thirty years, the number of phenomena that appear to require tone features has become significantly reduced, raising the issue whether the notion of tone features is at all useful. This paper first reviews the basic functions for which segmental features have been proposed, and then examines the evidence that tone features are needed to serve these or other functions in tone languages. The discussion focuses successively on level tones, contour tones, and register, building on examples from Africa and Asia. Our current evaluation of the evidence is that tone features, to the extent that they appear motivated at all, do not serve the same functions as segmental features.

1. Introduction
In this introduction, we review criteria that are commonly used in feature analysis in segmental phonology, and suggest that these criteria have not, in general, been successfully extended to tonal phonology. Some important functions of features in segmental phonology are summarized in Table 1.¹
Table 1. Some common functions of features in segmental phonology

Function | Role | Example (segments)
distinctive | distinguish phonemes/tonemes | /p/ and /b/ are distinguished by [±voice]
componential | define correlations (sets distinguished by one feature) | [–voiced] p t c k versus [+voiced] b d ɟ g
classificatory | define natural classes (rule targets, rule contexts) | [–sonorant] sounds are devoiced word-finally
dynamic | define natural changes (such as assimilation) | obstruents become [+voiced] before [+voiced] consonants
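The four functions in Table 1 can be illustrated with a small sketch. It is not part of the original analysis: the toy inventory is limited to the eight stops listed in the table, the place labels are assumptions added for illustration, and the assimilation function covers only these stops rather than obstruents in general.

```python
# Hypothetical toy inventory: the eight stops from Table 1, specified for
# [voiced] and for place. The place labels are illustrative assumptions.
SEGMENTS = {
    "p": {"voiced": False, "place": "labial"},
    "t": {"voiced": False, "place": "coronal"},
    "c": {"voiced": False, "place": "palatal"},
    "k": {"voiced": False, "place": "velar"},
    "b": {"voiced": True, "place": "labial"},
    "d": {"voiced": True, "place": "coronal"},
    "ɟ": {"voiced": True, "place": "palatal"},
    "g": {"voiced": True, "place": "velar"},
}


def distinguishing_features(a, b):
    """Distinctive function: the features on which two segments differ."""
    return sorted(f for f in SEGMENTS[a] if SEGMENTS[a][f] != SEGMENTS[b][f])


def correlation(feature):
    """Componential function: pairs of segments differing only in `feature`."""
    segs = list(SEGMENTS)
    return [
        (a, b)
        for i, a in enumerate(segs)
        for b in segs[i + 1:]
        if distinguishing_features(a, b) == [feature]
    ]


def natural_class(**spec):
    """Classificatory function: segments matching a feature specification."""
    return [
        seg for seg, feats in SEGMENTS.items()
        if all(feats[f] == value for f, value in spec.items())
    ]


def assimilate_voicing(word):
    """Dynamic function: a stop becomes [+voiced] before a [+voiced] stop.

    Symbols outside the toy inventory (e.g. vowels) are left untouched.
    """
    out = list(word)
    for i in range(len(out) - 1):
        cur = SEGMENTS.get(out[i])
        nxt = SEGMENTS.get(out[i + 1])
        if cur and nxt and nxt["voiced"] and not cur["voiced"]:
            # Replace with the voiced stop at the same place of articulation.
            out[i] = natural_class(voiced=True, place=cur["place"])[0]
    return "".join(out)
```

In this sketch a single feature, [voiced], does the work in all four functions, which is the point the table makes for segmental phonology.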
It is usually held, since the work of Jakobson, Fant and Halle (1952), that one small set of features largely satisfies all functions. We have illustrated this point by using the feature [±voiced] in the examples above. It is also usually believed that each feature has a distinct phonetic definition at the articulatory or acoustic/auditory level, specific enough to distinguish it from all other features, but broad enough to accommodate observed variation within and across languages. In this sense, features are both “concrete” and “abstract”. With very few exceptions, linguists have also maintained that features are universal, in the sense that the same features tend to recur across languages. Thus the feature [labial] is used distinctively to distinguish sounds like /p/ and /t/ in nearly all languages of the world. Such recurrence is explained by common characteristics of human physiology and audition.²

Although all the functions in Table 1 have been used in feature analysis at one time or another, the trend in more recent phonology has been to give priority to the last two functions: classificatory and dynamic. We will accordingly give these functions special consideration here. Feature theory as we understand it is concerned with the level of (categorical) phonology, in which feature contrasts are all-or-nothing, rather than gradient. Languages also have patterns of subphonemic assimilation or coarticulation which adjust values within given phonological categories. Such subphonemic variation does not fall within the classical functions of features as summarized in Table 1, and it should be obvious that any attempt to extend features into gradient phenomena runs a high risk of undermining other, more basic functions, such as distinctiveness.
Traditionally, rather high standards have been set for confirming proposed features or justifying new ones. The most widely-accepted features have been founded on careful study of evidence across many languages. Usual requirements on what counts as evidence for any proposed feature analysis include those in (1).

(1) a. phonetic motivation: processes cited in evidence for a feature are phonetically motivated.
    b. recurrence across languages: crucial evidence for a feature must be found in several unrelated languages.
    c. formal simplicity: the analyses supporting a given feature are formally and conceptually simple, avoiding multiple rules, brackets and braces, Greek letter variables, and the like.
    d. comprehensiveness: analyses supporting a given feature cover all the data, not just an arbitrary subset.

Proposed segmental features that did not receive support from analyses meeting these standards have not generally survived (many examples can be cited from the literature). The case for tone features, in general, has been much less convincing than for segmental features. One reason is that much earlier discussion was vitiated by an insufficient understanding of:

− “autosegmental” properties of tone: floating tones, compositional contour tones, toneless syllables, etc.
− downstep: for example, !H tones (downstepped High tones) being misinterpreted as M(id) tones
− intonational factors: downdrift, final lowering, overall “declination”
− contextual variation, e.g. H(igh) tones are often noncontrastively lower after M(id) or L(ow) tones
As a result, earlier analyses proposing assimilation rules must be reexamined with care. Our experience in the African domain is that most, if not all, do not involve formal assimilation processes at all. A second reason, bearing on more recent analysis, is that the best arguments for tone features have often not satisfied the requirements shown in (1). Feature analyses of tonal phenomena, on close examination, very often prove to be phonetically arbitrary; idiosyncratic to one language; complex (involving several rules, Greek-letter variables, abbreviatory devices, etc.);
and/or noncomprehensive (i.e. based on an arbitrary selection of “cherry-picked” data). A classic example in the early literature is Wang’s celebrated analysis of the Xiamen tone circle (Wang 1967; see critiques by Stahlke 1977, Chen 2000, among others). Wang devised an extremely clever feature system which allowed the essentially idiosyncratic tone sandhi system of Xiamen to be described in a single (but highly contrived) rule in the style of Chomsky & Halle 1968, involving angled braces, Greek letter variables, etc. Unfortunately, the analysis violated criteria (1a–c), viz. phonetic motivation, recurrence across languages, and formal simplicity. As it had no solid crosslinguistic basis, it was quickly and widely abandoned. The following question can and should be raised: when analyses not satisfying the criteria in (1) are eliminated, do there remain any convincing arguments for tone features?

2. The two-feature model

Though there have been many proposals for tone feature sets since Wang’s pioneering proposal (see Hyman 1973, Anderson 1978), recent work on this topic has converged on a model which we will term the Two-Feature Model. In its essentials, and abstracting from differences in notation and terminology from one writer to another, the Two-Feature Model posits two tone features, one dividing the tone space into two primary registers (upper and lower, or high and low), and the other dividing each primary register into secondary registers. The common core of many proposals since Yip [1980] 1990 and Clements 1983³ is shown in (2). This model applies straightforwardly to languages that contrast four level tones.

(2)              top   high   mid   low
    register     H     H      L     L
    subregister  h     l      h     l
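As a minimal sketch, the four-way system in (2) can be encoded as pairs of feature values. The class name, dictionary, and ranking function below are hypothetical conveniences for illustration, not notation from the literature.

```python
from typing import NamedTuple


class Tone(NamedTuple):
    register: str     # "H" (upper register) or "L" (lower register)
    subregister: str  # "h" (higher subregister) or "l" (lower subregister)


# The four level tones of (2), keyed by their conventional labels.
TONES = {
    "top": Tone("H", "h"),
    "high": Tone("H", "l"),
    "mid": Tone("L", "h"),
    "low": Tone("L", "l"),
}


def pitch_rank(name):
    """Relative height implied by the features: register outranks subregister."""
    tone = TONES[name]
    return 2 * (tone.register == "H") + (tone.subregister == "h")
```

Sorting the labels by `pitch_rank` in descending order recovers the sequence top, high, mid, low, which is the sense in which the subregister feature subdivides each register.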
We use the conventional terms “top”, “high”, “mid”, and “low” for the four tones of the Two-Feature Model in order to facilitate comparison among languages in this paper. The model outlined in (2) analyzes these four tones into two H-register tones, top and high, and two L-register tones, mid and low. Within each of these registers, the subregister features, as we will call them, divide tone into subregisters; thus the top and high tone levels are
assigned to the higher and lower subregisters of the H register, and the mid and low tones are likewise assigned to the higher and lower subregisters of the L register. The Two-Feature Model, like any model of tone features, makes a number of broad predictions. Thus:

− attested natural classes should be definable in terms of its features
− natural assimilation/dissimilation processes should be describable by a single feature change
− recurrent natural classes and assimilation/dissimilation processes which cannot be described by this model should be unattested (or should be independently explainable)
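These predictions can be made concrete with a short sketch that enumerates the classes and changes the model allows. The function names are hypothetical, and the sketch assumes that a natural class is any set of tones picked out by fixing zero, one, or two feature values.

```python
from itertools import product

# Feature values of the four tones under the Two-Feature Model:
# (register, subregister).
TONES = {
    "top": ("H", "h"),
    "high": ("H", "l"),
    "mid": ("L", "h"),
    "low": ("L", "l"),
}
FEATURES = ("register", "subregister")


def natural_classes():
    """All tone sets definable by fixing or leaving open each feature value."""
    classes = set()
    for reg, sub in product(("H", "L", None), ("h", "l", None)):
        members = frozenset(
            name for name, (r, s) in TONES.items()
            if reg in (None, r) and sub in (None, s)
        )
        classes.add(members)
    return classes


def changed_features(source, target):
    """Features whose values differ between two tones."""
    return [
        feature
        for feature, x, y in zip(FEATURES, TONES[source], TONES[target])
        if x != y
    ]
```

Under these assumptions the model defines nine classes. The set {top, high} is among them, while {high, mid} is not, and a change from high to mid alters both features rather than one. This is the kind of case that becomes relevant when single-feature assimilation is tested against data.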
We add two qualifications. First, more developed versions of the Two-Feature Model have proposed various feature-geometric groupings of tone features. We will not discuss these here, as we are concerned with evidence for tone features as such, not for their possible groupings. Second, there exist various subtheories of the Two-Feature Model. Some of these, such as the claim that contour tones group under a single Tonal Node, have been developed with a view to modeling Asian tone systems (most prominently those of Chinese dialects), while others were proposed on the basis of observations about African languages. Again, we will not discuss these subtheories here except to the extent that they bear directly on evidence for tone features.

3. Assimilation

As we have seen, much of the primary evidence for segmental features has come from assimilation processes in which a segment or class of segments acquires a feature of a neighboring segment or class of segments, becoming more like it, but not identical to it. (If it became identical to it we would be dealing with root node spreading or copying rather than feature spreading.) We draw a crucial distinction between (phonological) assimilation, which is category-changing, and phonetic assimilation, or coarticulation, which is gradient. A rule by which a L tone acquires a higher contextual variant before H in a language with just two contrastive tone levels, L and H, is not phonological. In contrast, a rule L → M in a language having the contrastive tone levels L, M, and H is neutralizing and therefore demonstrably category-changing. As we are concerned here with phonological features, we will be focusing exclusively on phonological assimilation.⁴

Now when we look through the Africanist literature, an astonishing observation is the virtual absence of clear cases of phonological assimilation in the above sense. The vast number of processes described in the literature since the advent of autosegmental phonology involve shifts in the alignment between tones and their segmental bearing units. Processes of apparent tone assimilation such as L → H / __ H are described as tone spreading rather than feature assimilation.

One apparent case of assimilation that has frequently been cited in the recent literature proves to be spurious. Yala, a Niger-Congo language spoken in Nigeria, has three distinctive tone levels: H(igh), M(id), and L(ow). This language has been described as having a phonological assimilation rule by which H tones are lowered to M after M or L (Bao 1999, Yip 2002, 2007, after Tsay 1994). According to the primary source for this language, Armstrong 1968, however, Yala has no such rule. Instead, Yala has a downstep system by which any tone downsteps a higher tone: M downsteps H, L downsteps H, and L downsteps M. Downstep is non-neutralizing, so that, e.g. a downstepped H remains higher than a M. Yala is typologically unusual, though not unique, in having a three-level tone system with downstep, but Armstrong’s careful description leaves no doubt that the lowering phenomenon involves downstep and not assimilation.⁵

Our search through the Africanist literature has turned up one possible example of an assimilation process. Unfortunately, all data comes from a single source, and it is possible that subsequent work on this language may yield different analyses. However, as it is the only example we have found to date, it is worth examining here. Bariba (also known as Baatonu), a Niger-Congo language spoken in Benin (Welmers 1952), has four contrastive tone levels.
We give these with their feature analysis under the Two-Feature Model in (3). (Tone labels “top”, “high”, “mid”, and “low” are identical to those of Welmers, but we have converted his tonal diacritics into ours, as given in the last line.)

(3)              top   high   mid   low
    register     H     H      L     L
    subregister  h     l      h     l
                 a̋     á      ā     à
By a regular rule, “a series of one or more high tones at the end of a word becomes mid after low at the end of a sentence” (Welmers 1952, 87). In
rule notation, this gives H₁ → M / L __ ]S. Examples are given in (4a–b) (alternating words are underlined):

(4) a. ná bɔ̀rá buā   ‘I broke a stick’ (bɔ̀rā ‘a stick’)
    b. ná bóó wá     ‘I saw a goat’ / ná bìì wā ‘I saw a child’
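The lowering rule can be sketched procedurally, as a rough illustration rather than Welmers’ own formalization. The sketch assumes that a sentence is given as a list of words, each a list of per-syllable tone labels, that the target is the run of H tones ending the sentence-final word, and that the conditioning L may stand in the same word or at the end of the preceding word (as in ‘I saw a child’). The tone transcriptions used with it are likewise simplified assumptions.

```python
def lower_final_high(sentence):
    """Apply H1 -> M / L __ ]S to a sentence.

    `sentence` is a list of words; each word is a list of tone labels
    drawn from "T" (top), "H", "M", "L". Returns a new list of words.
    """
    words = [list(word) for word in sentence]
    if not words or not words[-1]:
        return words

    last = words[-1]
    # Find where the word-final run of H tones begins.
    start = len(last)
    while start > 0 and last[start - 1] == "H":
        start -= 1
    if start == len(last):
        return words  # the sentence does not end in H

    # The tone immediately before the run: inside the word if there is
    # one, otherwise the final tone of the preceding word.
    if start > 0:
        preceding = last[start - 1]
    elif len(words) > 1 and words[-2]:
        preceding = words[-2][-1]
    else:
        preceding = None

    if preceding == "L":
        last[start:] = ["M"] * (len(last) - start)
    return words
```

With the word for ‘a stick’ transcribed as L H, the function lowers the H only when the word is sentence-final. A sentence-final H after an H-final word is left unchanged, matching the two conditions stated for the rule.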
Example (4a) illustrates one condition on the rule: the target H tone of /bɔ̀rá/ in ‘I broke a stick’ occurs after L, as required, but does not occur sentence-finally, and so it does not lower; in the second example (‘a stick’), however, both conditions are satisfied, and H lowers to M. (4b) illustrates the other condition: the target H tone of /wá/ in ‘I saw a goat’ occurs sentence-finally, but does not occur after a L tone, and so it does not lower; in the second example (‘I saw a child’), both conditions are satisfied, and the H tone lowers as expected.

Considering the formal analysis of this process, it is obvious that the Two-Feature Model provides no way of describing this assimilation as spreading. Consider the LH input sequence as analyzed into features in (5):

(5)              low    high
    register     L      H
    subregister  l      l
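The feature bundles of (3) and (5) can be written down directly, and the possible outcomes of spreading a single feature from the low tone onto the high tone can be enumerated. The following is a minimal sketch; the dictionary layout and the helper name `spread` are assumptions introduced here for illustration.

```python
# Two-Feature Model: tone label -> (register, subregister), as in (3).
TONES = {
    "top": ("H", "h"),
    "high": ("H", "l"),
    "mid": ("L", "h"),
    "low": ("L", "l"),
}
NAMES = {features: name for name, features in TONES.items()}


def spread(source, target, tier):
    """Return the tone that results when `target` takes over the value
    that `source` has on the given tier (hypothetical helper)."""
    source_register, source_subregister = TONES[source]
    target_register, target_subregister = TONES[target]
    if tier == "register":
        return NAMES[(source_register, target_subregister)]
    if tier == "subregister":
        return NAMES[(target_register, source_subregister)]
    raise ValueError(f"unknown tier: {tier}")


# Spreading either feature from low onto high, as in the L H input of (5).
outcomes = {tier: spread("low", "high", tier) for tier in ("register", "subregister")}
print(outcomes)  # {'register': 'low', 'subregister': 'high'}
```

Neither outcome is the mid tone that the Bariba rule requires.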
We cannot spread the L register feature from the L tone to the H tone, as this would change it to L, not M. Nor can we spread the l subregister feature from the L tone to the H tone, as this would change nothing (H would remain H).

Other analyses of the Bariba data are possible, and we briefly consider one here, in which what we have so far treated as a M tone is reanalyzed as a downstepped H tone.6 There is one piece of evidence for this analysis: according to Welmers’ data, there are no M-H sequences. (Welmers does not make this observation explicitly, so we cannot be sure whether such sequences could be found in other data, but for the sake of argument we will assume that this is an iron-clad rule.) We can see two straightforward interpretations for such a gap. One is that M is a downstepped H synchronically, in which case any H following it would necessarily be downstepped. The other is that M is synchronically M, as we have assumed up to now, but has evolved from an earlier stage in which M was !H (see Hyman 1993 and elsewhere for numerous examples of historical *!H > M shifts in West African languages). The absence of M-H sequences would then be a trace of the earlier status of M as a downstepped H.
Looking through Welmers’ description, we have found no further evidence for synchronic downstep in the Bariba data. If Bariba were a true downstepping language, we would expect iterating downsteps, but these are not found in the language. Welmers presents no sequences corresponding to H !H !H, as we find pervasively in classic downstep systems; we would expect that if the second of two successive M tones were produced on a new contrastive lower level in some examples, Welmers would have commented on it. Also, M does not lower any other tone, notably the top tone. A downstep analysis would therefore have to be restricted by rather tight conditions. In contrast, if M is really M, the only statement needed is a constraint prohibiting M-H sequences, which accounts for all the facts.

We conclude that Bariba offers a significant prima facie challenge to the Two-Feature Model, while admitting that further work on this language is needed before any definitive conclusion can be drawn.

4. Interactions between nonadjacent tones
We have so far examined possible cases of interactions between adjacent tones. A particularly crucial question for the Two-Feature Model concerns the existence of interactions between nonadjacent tones. We show the Two-Feature Model again in (2):

(2)              top    high   mid    low
    register     H      H      L      L
    subregister  h      l      h      l
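As a small illustration, the groupings implied by (2) can be computed by collecting the tones that share a value on each tier. The names `TWO_FEATURE_MODEL` and `natural_classes` are assumptions of this sketch, not terms from the literature.

```python
from collections import defaultdict

# Tone label -> (register, subregister), following (2).
TWO_FEATURE_MODEL = {
    "top": ("H", "h"),
    "high": ("H", "l"),
    "mid": ("L", "h"),
    "low": ("L", "l"),
}


def natural_classes(model):
    """Group tone labels by each feature value they share."""
    classes = defaultdict(list)
    for tone, (register, subregister) in model.items():
        classes[("register", register)].append(tone)
        classes[("subregister", subregister)].append(tone)
    return dict(classes)


for feature, members in natural_classes(TWO_FEATURE_MODEL).items():
    print(feature, members)
```

The subregister classes pair top with mid and high with low, in each case skipping an intermediate level.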
This model predicts that certain nonadjacent tones may form natural classes and participate in natural assimilations. In a four-level system, top and mid share the feature h on their tone tier, and high and low the feature l. Thus, under the Two-Feature Model we expect to find interactions between top and mid tones, on the one hand, and between high and low tones, on the other, in both cases skipping the intermediate tone. A few apparent cases of such interactions were cited in the early 1980s, all from African languages, and have been cited as evidence for the Two-Feature Model, but no new examples have been found since, as far as we know. Reexamination of the original cases would seem to be called for.

A small number of African languages, including Ewe and Igede, have alternations between non-adjacent tone levels. We will examine Ewe here, as it has often been cited as offering evidence for the Two-Feature Model
(Clements 1983, Odden 1995, Yip 2002). We will argue that while the alternations between nonadjacent tones in Ewe are genuine, they do not offer evidence for a feature analysis, either synchronically or historically.

The facts come from a rule of tone sandhi found in a variety of Ewe spoken in the town of Anyako, Ghana, as originally described by Clements 1977, 1978. While most varieties of Ewe have a surface three-level tone system, this variety has a fourth, extra-high level. We will call this the “top” level, consistent with our usage elsewhere in this paper. These four levels are characterized in the Two-Feature Model in the same way as the other four-level systems discussed so far (see (2) above).

The tone process of interest was stated by Clements 1978 as follows. Whenever an expected M tone is flanked by H tones on either side, it is replaced by a T(op) tone, which spreads to all flanking H tones except the very last. Examples are shown in (6).

(6) /ēkpé + mēɡbé/              → ēkpe̋ me̋ɡbé          ‘behind a stone’
     ‘stone’ ‘behind’
    /àtyíkē + dyí/              → àtyi̋ke̋ dyí           ‘on medicine’
     ‘medicine’ ‘on’
    /gā + hǒmē + gã́ + áɖé/      → gà hòme̋ gã́ a̋ɖé       ‘much money’
     ‘money’ ‘sum’ ‘large’ INDEF
    /nyɔ́nūví + á + wó + vá/     → nyɔ̋nű-vi̋ a̋ wő vá     ‘the girls came’
     ‘girl’ DEF PL ‘come’
In the first example, the M tone of the second word /mēɡbé/ ‘behind’ shifts to T since it is flanked by H tones. The second example shows that this sandhi process is not sensitive to the location of word boundaries (but see Clements 1978 for a discussion of syntactic conditions on this rule). In the third example, the targeted M tone is borne by the last syllable of /hǒmē/ ‘sum’; this M tone meets the left-context condition since the rising tone on the first syllable of /hǒmē/ consists formally of the two level tones LH (see Clements 1978 for further evidence for the analysis of contour tones in Ewe into sequences of level tones). The fourth example shows the iteration of T spreading across tones to the right.

This rule must be regarded as phonological since the Top, i.e. extra-high, tones created by this process contrast with surface high tones at the word level:

(7) /nú + nyā + lá/      → nű-nya̋-lá    ‘washer (wo)man’
     ‘thing’ ‘wash’ AGENT
    /nú + nyá + lá/      → nú-nyá-lá    ‘sage, scholar’
     ‘thing’ ‘know’ AGENT
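The sandhi process as stated above can be sketched over sequences of level tones. This is a minimal sketch under stated assumptions: tones are given one per syllable, contour tones are not modelled, “the very last” flanking H is taken to be the last H of the run to the right of the raised M, and the function name `anyako_m_raising` is hypothetical.

```python
def anyako_m_raising(tones):
    """Raise M to T between H tones and spread T onto flanking H tones,
    sparing the last H of the right-hand run.

    `tones` is a list of level-tone labels ("T", "H", "M", "L"),
    one per syllable. Illustrative sketch only.
    """
    out = list(tones)
    n = len(tones)
    for i in range(1, n - 1):
        if tones[i] == "M" and tones[i - 1] == "H" and tones[i + 1] == "H":
            out[i] = "T"
            # Spread leftwards over the whole run of H tones.
            j = i - 1
            while j >= 0 and tones[j] == "H":
                out[j] = "T"
                j -= 1
            # Find the end of the run of H tones to the right.
            k = i + 1
            while k < n and tones[k] == "H":
                k += 1
            # Spread rightwards, leaving the last H of that run unchanged.
            for m in range(i + 1, k - 1):
                out[m] = "T"
    return out


# 'the girls came': H M H H H H -> T T T T T H
print(anyako_m_raising(list("HMHHHH")))
# 'behind a stone': M H M H -> M T T H
print(anyako_m_raising(list("MHMH")))
# 'sage, scholar': H H H is unaffected
print(anyako_m_raising(list("HHH")))
```

The third example in (6), which involves a rising tone, is deliberately left out of this sketch.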
In Clements’ original analysis (1983), as recapitulated above, the tone-raising process involves two steps, both invoking tone features. First, the H register feature spreads from the H tones to the M tone, converting it into T. Second, the h subregister feature of the new T tone spreads to adjacent H tones, converting them into T tones (the last H tone is excluded from the spreading domain). It is the first of these processes that is crucial, as it gives evidence for tone assimilation between nonadjacent tone levels – prime evidence for the Two-Feature Model.

The analysis we have just summarized is simple, but it raises a number of problems. First, there is no apparent phonetic motivation for this process: not only does it not phonologize any detectable natural phonetic trend, it renders the location of the original M tone unrecoverable. Second, no other phonologically-conditioned raising process of this type has come to light; this process appears to be unique to Anyako Ewe, and is thus idiosyncratic. Third, though the analysis involves two rules, there is in fact no evidence that two distinct processes are involved; neither of the hypothesized rules applies elsewhere in the language. (Top tones arising from other sources do not spread to H tones.) Thus, the rule seems arbitrary in almost every respect. Notably, it does not satisfy the first three criteria for feature analysis as outlined in (1).

Are other analyses of these data possible? We will consider one here that draws on advances in our knowledge of West African tonal systems in both their synchronic and diachronic aspects. More recent work on tone systems has brought to light two common processes in West African languages. First, H tones commonly spread onto following L tone syllables, dislodging the L tone. This is a common source of downstep. Schematically, we can represent this process as H L H → H H ! H. Second, by a common process of H Tone Raising, H tones are raised to T before lower tones. Thus we find H → T / __ L in Gurma (Rialland 1981) and Yoruba (Laniran & Clements 2003).

There is some evidence that such processes may have been at work in the Ewe-speaking domain. Clements 1977 observes that some speakers of western dialects of Ewe (a zone which includes Anyako Ewe) use nondistinctive downstep. Welmers 1973: 91 observes distinctive downstep in some dialects, and observes that the last H preceding a downstep + H sequence is “considerably raised”. Accordingly, we suggest a historical scenario in which original H M H sequences underwent the following changes:
(8) Processes                                                   Result
    introduction of nondistinctive downstep                     H M !H
    H spread, downstep becomes distinctive                      H H !H
    H raising before downstep, rendering it nondistinctive      H T !H
    loss of downstep                                            H T H
    T spreads to all flanking H tones but the last              T T T
In this scenario, there would have been no historical stage in which M shifted directly to T. Any synchronic rule M → T would have to conflate two or three historical steps.

Inspired by this scenario, we suggest an alternative analysis in which M Raising is viewed as the “telescoped” product of several historical processes. In a first step, all consecutive H tones in the sandhi domain are collapsed into one; this is reminiscent of a cross-linguistic tendency commonly referred to as the Obligatory Contour Principle (see in particular Odden 1986, McCarthy 1986). The final H remains extraprosodic, perhaps as the result of a constraint prohibiting final T tones in the sandhi domain. Second, H M H sequences (where M is singly linked) are replaced by T: see Table 2.

Table 2. A sample derivation of ‘the girls came’, illustrating the reanalysis of M Raising as the product of several historical processes.

nyɔ́nūví a wó vá         H M H H H H     underlying representation
nyɔ́nūví á wó vá         H M H (H)       1. OCP(H), subject to extraprosodicity (no overt change)
nyɔ̋nű vi̋ a̋ wő vá        T (H)           2. replacement of H M H by T
This analysis is, of course, no more “natural” than the first. We have posited a rule of tone replacement, which has no phonetic motivation. However, it correctly describes the facts. Crucially, it does not rely on tone features at all.

Ewe is not the only African language which has been cited as offering evidence for interactions among nonadjacent tone levels. Perhaps the best-described of the remaining cases is Igede, an Idomoid (Benue-Congo,
Niger-Congo) language spoken in Nigeria (see Bergman 1971, Bergman & Bergman 1984). We have carefully reviewed the arguments for interactions among nonadjacent tone levels in this language as given by Stahlke 1977 and find them unconvincing. In any case, no actual synchronic analysis of this language has yet been proposed (Stahlke’s analysis blends description and historical speculation). Such an analysis is a necessary prerequisite to any theoretical conclusions about features.

In sum, examining the evidence from natural assimilations and predicted natural classes of tones, the Two-Feature Model appears to receive little if any support from African languages. Confirming cases are vanishingly few, and the best-known of them (Ewe) can be given alternative analyses not requiring tone features. We have also described a potential disconfirming case (Bariba). Perhaps the most striking observation to emerge from this review is the astonishingly small number of clearly-attested assimilation processes of any kind. Whether this reflects a significant fact about West African tonology, or merely shows that we have not yet looked at enough data, remains to be seen.

5. Register features in Asian languages
The concept of register has long been used in studies of Asian prosodic systems, with agreement regarding several distinct points. Specialists agree that Asian prosodic systems give evidence of register at the diachronic level: the present-day tonal system of numerous Far Eastern languages results from a tonal split conditioned by the voicing feature of initial consonants that created a ‘high’ and a ‘low’ register (Haudricourt 1972). The question we will raise here is whether register features in the sense of the Two-Feature Model are motivated at the synchronic level. In view of a rather substantial literature on this topic, this question might seem presumptuous were it not for our impression that much of the evidence cited in favor of register features suffers from the same shortcomings that we have discussed in the preceding sections in regard to African languages.

To help organize the discussion, we begin by proposing a simple typology of East Asian tone languages, inspired by the work of A.-G. Haudricourt 1954, 1972, M. Mazaudon 1977, 1978, M. Ferlus 1979, 1998, E. Pulleyblank 1978, and others. This is shown in Table 3.

Table 3. A simple typology of East Asian tone languages, recognizing 4 principal types

          voicing contrast   distinctive             distinctive       examples
          among initials?    phonation registers?    tone registers?
Type 1    +                  –                       –                 Early Middle Chinese (reconstructed)
Type 2    –                  +                       –                 Zhenhai
Type 3    –                  –                       +                 Cantonese (see below)
Type 4    –                  –                       –                 most Mandarin dialects; Vietnamese; Tamang

Each “type” is defined by the questions at the top of the table.

The first question is: Is there a voiced/voiceless contrast among initial consonants? In certain East Asian languages, mostly reconstructed, a distinctive voicing contrast is postulated in initial position (e.g. [d] vs. [t], [n] vs. [n̥]). This contrast transphonologized to a suprasegmental contrast in the history of most languages; it is preserved in some archaic languages (e.g. some dialects of Khmou).

The second question is: are there distinctive phonation registers? By “phonation register” we mean a contrast between two phonation types, such as breathy voice, creaky voice, and so on. Phonation registers usually include pitch distinctions: in particular, in languages for which reliable information is available, breathy voice always entails lowered pitch, especially at the beginning of the vowel. Various terms have been proposed for distinctive phonation types, including “growl” (Rose 1989, 1990). Phonetically, phonation register is often distributed over the initial segment and the rhyme. In this sense, phonation register can usually be best viewed as a “package” comprising a variety of phonatory, pitch, and other properties, and it may sometimes be difficult to determine which of these, if any, is the most basic in a linguistic or perceptual sense.

The third question is: are there distinctive tone registers? The putative category of languages with two distinctive tone registers consists of languages that allow at least some of their tones to be grouped into two sets (high vs. low register), such that any tone in the high register is realized with higher pitch than its counterpart(s) in the low register. In languages with distinctive tone registers, any phonation differences between a high-register tone and its low-register counterpart must be hypothesized to be derivative (redundant with the register contrast).

The typology set out in Table 3 is synchronic, not diachronic, and is not intended to be exhaustive. Further types and subtypes can be proposed, and some languages lie ambiguously on the border between two types. Interestingly, however, successive types in this table are often found to
constitute successive stages in historical evolutions. Also, since voicing contrasts are typically lost as tone registers become distinctive, there is no direct relation between consonant voicing and tone; this fact explains the absence of a further type with a voicing contrast and distinctive tone registers.

It should be noted that only type 3 languages as defined above can offer crucial evidence for a phonologically active tone register feature. Such evidence could not, of course, come from Type 1, 2 or 4 languages, which lack (synchronic) tone registers by definition.

In our experience, clear-cut examples of type 3 languages – “pure” tone register languages – are not easy to come by. Some alleged type 3 languages prove, on closer study, to be phonation register languages. In others, the proposed registers are historical and are no longer clearly separated at the synchronic level. Most East Asian languages remain poorly described at the phonetic level, so that the typological status of many cannot yet be determined. The small number of clear-cut type 3 languages may be due in part to insufficient documentation, but it could also be due to the historical instability of this type of system, as suggested by Mazaudon 1988.7

The defining properties of type 3 languages are the following:

1. no voicing contrast in initials
2. no phonation register
3. distinctive high vs. low tone registers, as schematized below:

                   melodic type 1   melodic type 2   melodic type 3   etc.
   high register   Ta               Ta               Ta               …
   low register    Tb               Tb               Tb               …

   In each column, Ta is realized with higher pitch than Tb (some tones may be unpaired).
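The three questions of Table 3 amount to a small lookup from answers to types. The sketch below is only an illustration; the names `Profile`, `TYPES`, and `classify` are hypothetical, and combinations that Table 3 does not list are returned as `None` rather than guessed at.

```python
from typing import NamedTuple, Optional


class Profile(NamedTuple):
    """Answers to the three questions heading Table 3."""
    voicing_contrast: bool      # voicing contrast among initials?
    phonation_registers: bool   # distinctive phonation registers?
    tone_registers: bool        # distinctive tone registers?


# The four principal types recognized in Table 3.
TYPES = {
    Profile(True, False, False): 1,
    Profile(False, True, False): 2,
    Profile(False, False, True): 3,
    Profile(False, False, False): 4,
}


def classify(profile: Profile) -> Optional[int]:
    """Return the Table 3 type, or None if the combination is not listed."""
    return TYPES.get(profile)


# The defining properties of a type 3 language.
print(classify(Profile(voicing_contrast=False,
                       phonation_registers=False,
                       tone_registers=True)))
```

A profile combining a voicing contrast with distinctive tone registers falls outside the table, in line with the absence of such a type noted in the text.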
As a candidate type 3 language we will examine Cantonese, a member of the Yue dialect group spoken in southern mainland China. This language is a prima facie example of a type 3 language as it has no voicing contrast in initial position, only marginal phonation effects at best, and a plausible organization into well-defined tone registers. Our main source of data is Hashimoto-Yue 1972, except that following Chen 2000 and other sources, we adopt the standard tone values given in the Hanyu Fangyin Zihui, 2nd ed. (1989).

There are several ways of pairing off Cantonese tones into registers in such a way as to satisfy the model of a type 3 tone language. The standard
pairings, based on Middle Chinese (i.e. etymological) categories, are shown in (9).

(9)                I            II      III     IVai    IVaii
    high register  [53]~[55]    [35]    [44]    [5q]    [4q]
    low register   [21]~[22]    [24]    [33]            [3q]
The [53]~[55] variants are conditioned by individual and morphosyntactic variables (Hashimoto-Yue 1972: 178–180, who considers the high falling variant [53] as underlying). Of course, this particular set of pairings has no analytical priority over any other in a purely synchronic analysis. The implicit assumption is that these are the most likely to form the basis of synchronic constraints and alternations. These pairings (as well as the alternatives) satisfy our third criterion for a Type 3 language. However, we have been unable to find any phonetic studies that confirm the pitch values above, which are partly conventional.

The crucial question for our purposes is whether or not Cantonese “activates” register distinctions in its phonology. That is, is there evidence for a feature such as [±high register] in Cantonese in the form of rules, alternations, etc.? Contrary to some statements in the literature, Cantonese has a rather rich system of tonal substitutions and tone sandhi, and two of these phenomena are particularly relevant to this question.

Cantonese tonal phonology is well known for its system of “changed” tones. According to this system, some words, mostly nouns, are produced with the changed tones 35 or (less productively) 55, instead of their basic lexical tones. This shift is usually associated with an added component of meaning, such as ‘familiar’ or ‘opposite’ (Hashimoto-Yue 1972: 93–98). Some examples are shown in (10).

(10) replacement by 35:
     魚 y̆y:21 → y̆y:35 ‘fish’
     李 leĭ23 → leĭ35 ‘plum’
     緞 ty:n22 → ty:n35 ‘satin’
     計 kɐĭ33 → kɐĭ35 ‘trick’

     replacement by 55:
     阿姨 A:44 ĭi:21 → A:44 ĭi:55 ‘aunt’
     長 tshœŋ21 ‘long’ → tshœŋ55 ‘short’
     遠 y̆y:n23 ‘far’ → y̆y:n55 ‘near’
     衫 sA:m53 → sA:m55 ‘clothes’
A feature-based analysis of the changed tones is possible, but requires a complex analysis with otherwise unmotivated “housekeeping” rules (see Bao 1999: 121–127, for an example).
A more interesting source of evidence for a register feature comes from a regular rule of tone sandhi which Hashimoto-Yue describes as follows (1972: 112): “a falling tone becomes a level tone if followed by another tone that begins at the same level, whether the latter is level or falling”. She states the following rules:

(11) 53 → 55 / __ 53/55/5
     21 → 22 / __ 21/22
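Hashimoto-Yue's prose generalization and the rules in (11) can be sketched over tones written as strings of pitch digits. The representation, the helper names `is_falling` and `cantonese_sandhi`, and the decision to read the context from the underlying (pre-sandhi) tones are assumptions of this sketch.

```python
def is_falling(tone):
    """A two-digit tone whose first pitch digit is higher than its second."""
    return len(tone) == 2 and tone[0] > tone[1]


def cantonese_sandhi(tones):
    """A falling tone becomes level before a tone that begins at the same level.

    `tones` is a list of pitch-digit strings such as "53", "55", "5", "21".
    Illustrative sketch only; contexts are checked on the input tones.
    """
    out = list(tones)
    for i in range(len(tones) - 1):
        current, following = tones[i], tones[i + 1]
        if is_falling(current) and following[0] == current[0]:
            out[i] = current[0] * 2  # e.g. "53" -> "55", "21" -> "22"
    return out


print(cantonese_sandhi(["53", "53"]))  # ['55', '53']
print(cantonese_sandhi(["21", "21"]))  # ['22', '21']
print(cantonese_sandhi(["53", "21"]))  # ['53', '21'] (no change)
```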
Some examples follow:

应该 ĭɪŋ53 kɔ:i53 → ĭɪŋ55 kɔ:i53 ‘should, must’
麻油 mA:21 ĭɐŭ21 → mA:22 ĭɐŭ21 ‘sesame oil’

Let us consider the analysis of these alternations. A rather simple analysis is possible under the Two-Feature Model, if we allow Greek-letter variables or an equivalent formal device to express the identity of two feature values, as in (12):

(12) register tier        [α register]      [α register]
                            /     \           /
     subregister tier      h       l         h …
                                   ↓
                                   h
This rule states that the low component of a falling tone shifts to high, provided it is followed by a tone beginning with a high component and that both tones belong to the same register. This analysis makes crucial use of both register features and subregister features, assigned to separate tiers. It correctly describes both cases. A notable aspect of this rule, however, is that it describes alternations among variants of the same tone. That is, as we saw in (11), [53] and [55] are variants of the same tone, as are [21] and [22]. The rules are therefore “subphonemic”, raising the question of whether they are phonological in the strict sense – that is, category-changing rules – or gradient phonetic rules. In the latter case, they would not constitute evidence for tone features, since features belong to the phonological level (see our introductory discussion). To make a clear case for a phonological alternation we would need a set of alternations between contrastive tones, such as [53] ~ [35] and [21] ~ [24]. Thus, in spite of the rather elegant analysis that can be obtained under the Two-Feature Model, these facts do not make a clear-cut case for features.
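For comparison, rule (12) can also be sketched over feature bundles, with the α variable rendered as an equality check on the register values of the two tones. The mapping of the pitch values 53, 55, 21, and 22 onto register and subregister features is an assumption made for this sketch, as is the function name `rule_12`.

```python
# Assumed encoding: (register, subregister melody).
T53 = ("H", ("h", "l"))
T55 = ("H", ("h", "h"))
T21 = ("L", ("h", "l"))
T22 = ("L", ("h", "h"))


def rule_12(tone, following):
    """Shift the l of a falling (h l) melody to h when the next tone has
    the same register (the alpha condition) and begins with h."""
    register, melody = tone
    next_register, next_melody = following
    same_register = register == next_register
    if melody == ("h", "l") and same_register and next_melody[0] == "h":
        return (register, ("h", "h"))
    return tone


print(rule_12(T53, T53) == T55)  # True
print(rule_12(T21, T21) == T22)  # True
print(rule_12(T53, T21) == T53)  # True: registers differ, so no change
```

Under this encoding the register check is what keeps 53 from leveling before 21.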
We know of no other alternations that support a feature-based analysis of Cantonese tones. However, certain static constraints described by Hashimoto-Yue (110–111) are most simply stated in terms of a low register feature, and possibly in terms of the level/contour distinction, if [53] and [21] are taken to be underlying8 (Roman numerals refer to the categories in (9)):

– unaspirated initial consonants do not occur in syllables with the low-register I and II tones (“contour” tones?)
– aspirated (voiceless) initial consonants do not occur in syllables with the low-register III and IV tones (“level” tones?)
– zero-initial syllables do not occur with low-register tones
These constraints, which are clearly phonological, might be taken as evidence for a low-register feature. However, static constraints have never carried the same weight in feature analysis as patterns of alternation, the question being whether they are actually internalized as phonological rules by native speakers.

We conclude that Cantonese does not offer a thoroughly convincing case for tone features. The interest of looking at these facts is that Cantonese represents one of the best candidates for a type 3 language that we have found.

We have also surveyed the literature on tone features in other Asian languages. Up to now, we have found that arguments for tone features typically suffer from difficulties which make arguments for a register feature less than fully convincing:

– evidence is often cited from what are actually Type 2 or 4 languages
– very many analyses do not satisfy the criteria for feature analysis outlined in (1)
One reason for these difficulties, in the Chinese domain at least, is the long history of phonetic evolution that has tended to destroy the original phonetic basis of the tone classes. This has frequently led to synchronically unintelligible tone systems. As Matthew Chen has put it, the “vast assortment of tonal alternations… defy classification and description let alone explanation. As one examines one Chinese dialect after another, one is left with the baffling impression of random and arbitrary substitution of one tone for another without any apparent articulatory, perceptual, or functional motivation” (Chen 2000, 81–82).

The near-absence of simple, phonetically motivated processes which can be used to motivate tone features contrasts with the wealth of convincing
crosslinguistic data justifying most segmental features. This may be the reason why most tonologists, whether traditionalist or autosegmentalist, have made little use of (universal) features in their analyses. As Moira Yip has tellingly observed, “Most work on tonal phonology skirts the issue of the features” (Yip 2007, 234).

6. Why is tone different?

Why is it that tones do not lend themselves as readily to feature analysis as segments? We suggest that the answer may lie in the monodimensional nature of level tones:

– segments are defined along many intersecting phonetic parameters (voicing, nasality, etc.); such free combinability of multiple properties may be the condition sine qua non for a successful feature analysis
– tone levels (and combinations thereof) are defined along a single parameter, F0; there is no acoustic (nor as yet, articulatory) evidence for intersecting phonetic dimensions in F0-based tone systems
The latter problem does not arise in phonation-tone register systems, in which phonation contrasts are often multidimensional involving several phonetic parameters (voicing, breathy voice, relative F0, vowel quality, etc.), and can usually be identified with independently-required segmental features.

Given the monodimensional nature of level tones, it is difficult to see how a universal tone feature analysis could “emerge” from exposure to the data. Unless “wired-in” by “Universal Grammar”, tone features must be based on observed patterns of alternation, which, as we have seen, are typically random and arbitrary across languages. In contrast, patterns based on segmental features, such as homorganic place assimilation, voicing assimilation, etc., frequently recur across languages (see Mielke 2008 for a description of recurrent patterns drawn from a database of 628 language varieties).

7. Conclusion
We have argued that the primitive unit in tonal analysis may be the simple tone level, as is assumed in much descriptive work. Tone levels can be directly
interpreted in the phonetics, without the mediation of features (Laniran & Clements 2003). Tone levels are themselves grouped into scales. (The issue whether all tone systems can be analyzed in terms of levels and scales is left open here.) Although this paper has argued against universal tone features, it has not argued against language-particular tone features, which are motivated in some languages. We propose as a null hypothesis (for tones as for segments) that features are not assumed unless there is positive evidence for them. (For proposed language-particular features in Vietnamese, involving several phonetic dimensions, see Brunelle 2009.)
Acknowledgments Many thanks to Jean-Michel Roynard for editorial assistance.
Notes

1. Another theoretically important function, namely bounding (defining the maximum number of contrasts), will not be discussed here.
2. Some linguists have maintained that features are innate in some (usually vaguely-defined) sense. However, recurrence across languages does not entail innateness, which is an independent hypothesis; for example, some current work is exploring the view that features can be developed out of experience (Mielke 2008). This issue is peripheral to the questions dealt with in this paper and will not be discussed further here.
3. Yip 1980 originally proposed two binary features called [±upper register] and [±raised]. However, since the development of feature-geometric versions of this model (Bao 1999, Chen 2000, and others), these have tended to be replaced by H and L, or h and l.
4. In a broader sense of the term “phonology”, any rule, categorical or gradient, which is language-specific might be regarded as phonological. This indeed was the view of Chomsky & Halle 1968, though it is less commonly adopted today.
5. The facts of Yala are summarized in Anderson 1978 and Clements 1983.
6. We are indebted to Larry Hyman for e-mail correspondence on this question.
7. Mazaudon’s Stage B languages correspond approximately to our type 3 languages.
8. However, we have not seen convincing evidence for taking either of the alternating tones [53]~[55] or [21]~[22] as basic.
References Anderson, Stephen R. 1978 Tone features. In: Fromkin, Victoria A. (ed.) Tone: a linguistic survey. New York/San Francisco/London: Academic Press, 133–176. Armstrong, Robert G. 1968 Yala (Ikom): A terraced-level language with three tones. Journal of West African Languages 5, 49–58. Bao, Zhiming 1999 The Structure of Tone. New York/Oxford: Oxford University Press. Bergman, Richard 1971 Vowel sandhi and word division in Igede. Journal of West African Languages 8, 13–25. Bergman, Richard & Bergman, Nancy 1984 Igede. In: Bendor-Samuel, John (ed.) Ten Nigerian tone systems. Jos and Kano: Institute of Linguistics and Centre for the Study of Nigerian Languages, 43–50. Brunelle, Marc 2009 Tone perception in Northern and Southern Vietnamese. Journal of Phonetics 37, 79–96. Chen, Matthew Y. 2000 Tone sandhi: Patterns across Chinese dialects. Cambridge: Cambridge University Press. Chomsky, Noam & Halle, Morris 1968 The Sound Pattern of English. New York: Harper & Row. Clements, Nick 1977 Four tones from three: the extra-high tone in Anlo Ewe. In: Kotey, P.F.A. & Der-Houssikian, H. (eds.), Language and Linguistic Problems in Africa. Columbia (South Carolina): Hornbeam Press, 168–191. 1978 Tone and syntax in Ewe. In: Napoli, D.J. (ed.) Elements of Tone, Stress, and Intonation. Washington: Georgetown University Press, 21–99. 1983 The hierarchical representation of tone features, in: Dihoff, Ivan R. (ed.) Current Approaches to African Linguistics. Dordrecht: Foris, 145–176. Ferlus, Michel 1979 Formation des registres et mutations consonantiques dans les langues mon-khmer. Mon-Khmer Studies 8, 1–76. 1998 Les systèmes de tons dans les langues viet-muong. Diachronica 15, 1–27. Goldsmith, John 1976 Autosegmental phonology. Ph.D. diss., M.I.T., New York: Garland Publishing, 1980.
Do we need tone features? 23 Hashimoto-Yue, Anne O. 1972 Phonology of Cantonese. Cambridge: Cambridge University Press. Haudricourt, André-Georges 1954 De l’origine des tons en vietnamien. Journal Asiatique 242, 69–82. 1972 Two-way and three-way splitting of tonal systems in some Far Eastern languages (Translated by Christopher Court) In: Harris, Jimmy G. & Noss, Richard B. (eds.), Tai phonetics and phonology. Bangkok: Central Institute of English Language, Mahidol University, 58–86. Hyman, Larry M. (ed.) 1973 Consonant Types and Tone. Los Angeles: Department of Linguistics, University of Southern California. Hyman, Larry M. 1993 Register tones and tonal geometry. In: van der Hulst, Harry & Snider, K. (eds.), The Phonology of Tone: the Representation of Tonal Register. Berlin & New York: Mouton de Gruyter, 75–108. Jakobson, Roman, Fant, Gunnar & Halle, Morris 1952 Preliminaries to Speech Analysis. Cambridge, Massachusetts: MIT Acoustics Laboratory. Laniran, Yetunde O. & Clements, Nick 2003 Downstep and high raising: interacting factors in Yorùbá tone production. Journal of Phonetics 31, 203–250. Linguistics Centre of the Department of Chinese, Beijing University (࣫Ҁᄺ Ё䇁㿔᭛ᄺ㋏䇁㿔ᄺᬭⷨᅸ) (ed.) 1989 ∝䇁ᮍ䷇ᄫ∛ [Phonetic Dictionary of Chinese Dialects], second edition. Beijing: Wenzi Gaige Publishing House (᭛ᄫᬍ䴽ߎ⠜⼒). Mazaudon, Martine 1977 Tibeto-Burman tonogenetics. Linguistics of the Tibeto-Burman Area 3, 1–123. 1978 Consonantal mutation and tonal split in the Tamang subfamily of Tibeto-Burman. Kailash 6, 157–179. 1988 An historical argument against tone features. Proceedings of Congress of the Linguistic Society of America. New Orleans. Available online: http://hal.archives-ouvertes.fr/halshs-00364901/ McCarthy, John 1986 OCP Effects: Gemination and Antigemination. Linguistic Inquiry 17. Mielke, Jeff 2008 The Emergence of Distinctive Features. Oxford: Oxford University Press. Odden, David 1986 On the role of the Obligatory Contour Principle in phonological theory. 
Language 62, 353–383. 1995 Tone: African languages. In: Goldsmith, John (ed.) Handbook of Phonological Theory. Oxford: Blackwell.
Pulleyblank, Edwin G. 1978 The nature of the Middle Chinese tones and their development to Early Mandarin. Journal of Chinese Linguistics 6, 173–203. Rialland, Annie 1981 Le système tonal du gurma (langue gur de Haute-Volta). Journal of African Languages and Linguistics 3, 39–64. Rose, Philip 1989 Phonetics and phonology of Yang tone phonation types in Zhenhai. Cahiers de linguistique – Asie Orientale 18, 229–245. 1990 Acoustics and phonology of complex tone sandhi: An analysis of disyllabic lexical tone sandhi in the Zhenhai variety of Wu Chinese. Phonetica 47, 1–35. Stahlke, Herbert 1977 Some problems with binary features for tone. International Journal of American Linguistics 43, 1–10. Tsay, Jane 1994 Phonological pitch. University of Arizona. Wang, William 1967 Phonological Features of Tones. International Journal of American Linguistics 33, 93–105. Welmers, William E. 1952 Notes on the structure of Bariba. Language 28, 82–103. 1973 African language structures. Berkeley: University of California Press. Yip, Moira 1990 The Tonal Phonology of Chinese. Garland Publishing, New York. Original edition. Cambridge, Massachusetts: Indiana University Linguistics Club. 2002 Tone. Cambridge, U.K.: Cambridge University Press. 2007 Tone. In: De Lacy, Paul (ed.) The Cambridge Handbook of Phonology. Cambridge: Cambridge University Press, 229–252.
Authors’ affiliations
G.N. (Nick) Clements, LPP (CNRS/Université Paris 3)
Alexis Michaud, LACITO (CNRS/Université Paris 3): [email protected]
Cédric Patin, LLF (CNRS/Université Paris 7): [email protected]
Rhythm, quantity and tone in the Kinyarwanda verb
John Goldsmith and Fidèle Mpiranya

1. Introduction
In this paper, we discuss some aspects of the tonology of the verbal inflectional system in Kinyarwanda. There is a considerable amount of literature on tone in Kinyarwanda and in Kirundi (for example, Sibomana 1974, Coupez 1980, Mpiranya 1998, Kimenyi 2002), two languages which are so similar that they can be considered dialects of a single language. We have benefited from previous analyses of both languages, and especially from work done in collaboration with Firmard Sabimana (see Goldsmith and Sabimana 1985) and with Jeanine Ntihirageza, both linguists and native speakers of Kirundi. Nonetheless, the focus in the present paper is Kinyarwanda, which is the native language of one of the present authors (FM). We wish to emphasize that even restricting ourselves to the material discussed below, there are some differences between Kirundi and Kinyarwanda, and while the differences are small, they are significant. Despite the considerable work that exists already on the tone of the verbal system, a number of important questions – even basic ones – remain relatively obscure, and we hope that the present study will contribute to a better understanding of them. We plan to present a more comprehensive account of the tonology of the verbal system in the future. We use the following abbreviations:

SM     Subject marker
TM     Tense marker
FOC    Focus marker
OM     Object marker
FV     Final vowel
inf    Infinitive marker
B      Basic (underlying) tone
Our goal has been to develop a formal account of tone which is as similar as possible to the analysis of tone in the other Bantu languages that are reasonably closely related. But the fact is that despite our bias in this regard,
the analysis that we present here is quite different from what we expected, and from those proposed for nearby Bantu languages. In keeping with some earlier analyses, our account leans heavily on postulating metrical structure established from left to right, needed in order to account for the shifting and spreading of high tone. But the most surprising aspect of this analysis is that there is no general tonology of the verbal High tone as such: each High tone has a behavior that is directly tied to its morphological status or origin, and the shift of High tone occurs both towards a metrically Weak and a metrically Strong position, depending on the morphological status of the High tone in question, a fact that we did not expect, and that we were, in retrospect, biased against. We will begin by sketching the overall analysis in general terms, and we describe the conclusions which we have reached. The motivation and justification will be presented over the course of the paper, and indeed, our reasons for formulating the generalizations as we do may not be entirely clear until the data is seen in detail.

1. The general structure of the Kinyarwanda verb is similar to that found in a range of familiar, and relatively closely related, Bantu tone languages; see Figure 1, where we present a schema of the Bantu verb – one that is incomplete, but sufficiently detailed for our present purposes.
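Position           Example   Gloss
Negative Marker    Ø
Subject Marker     tu        we
Tense Marker       Ø
Focus Marker       ra
Object Marker      ki        class 7
Object Marker      mu        class 1
Radical            bon       see
Extensions         er        applicative
Final Vowel        a         unmarked mood

[In the figure, the stem (Radical, Extensions, Final Vowel) and the macrostem (Object Markers plus stem) are bracketed above the template, and the Hroot domain and the Hpost domain are marked beneath it. Example form: tu-ra-kí-mú-bónera ‘we will see it for him’.]

Figure 1. Verbal structure tone windows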
2. Some morphemes have underlying tones and others do not.

3. There is a High/Low tonal contrast among the verb roots, although there is no evidence that what we might call Low toned verb roots bear a Low tone as such; they are best analyzed as bearing no tone. Speaking of a High/Low tonal contrast is a matter of convenience.

4. There is no lexical tonal contrast among the subject markers. In most environments, the Subject Marker (SM) appears on a low tone, the result of no tone associating with it. In a few environments, a High tone is associated with the Subject Marker.

5. There is a suffixal high tone, a suffixal morpheme which we indicate as Hpost, that appears in certain morphological environments. When there are no Object Markers in the verb, the suffix Hpost appears on the second syllable of the stem, but when there are OM prefixes, it appears further to the left. For specialists in historical or comparative Bantu tone, this tone is especially interesting. Its behavior is quite different from the verbal suffix High tone, or tones, that we observe in closely related Bantu languages. In particular, it is common to find a High tone that appears on the mora that follows the first mora of the verb radical, and in those languages in which there is a tonal contrast among the verb radicals, this High tone typically appears when the verb radical is Low (or toneless). This tone, however, never appears shifted to a position earlier in the word, as far as we are aware. In addition, there is a distinct High tone that is associated with the Final Vowel in a number of verbal patterns, such as the subjunctive. This difference does not naturally carry through to the Kinyarwanda system, as far as we can see at the present time.

6. There is a leftward shift of High tone in some cases that appears to be rhythmically motivated. If we group moras into groups of two from left to right, then it is natural to label one as strong and one as weak, even if the choice is a bit arbitrary. We label these feet as trochees (Strong-Weak). Hroot shifts leftward to a Strong position; Hpost shifts leftward to a Weak position: this is the conclusion that we mentioned just above that was surprising, and it will become clearer when we consider some specific examples.

7. Kinyarwanda is relatively conservative among the Bantu languages in maintaining a vowel length contrast, and it appears to us to be impossible to avoid speaking of moras in the analysis of the prosodic system. However, not all moras show the same behavior, and in some cases, the second mora of a long vowel acts differently than the mora of a short syllable in a weak position. That much is perhaps not
surprising: the first and the second mora in a bimoraic syllable may not have all the same privileges. But there are cases where a High tone that we might expect (based simply on counting moras, and distinguishing odd from even positions) to appear on the second mora of a long vowel will instead associate with the immediately following mora, which is to say, in the following syllable. We interpret this as an expression of quantity-sensitivity in the accentual system: in particular, if the left-to-right assignment of metrical positions should encounter (so to speak) a long (i.e., two-mora) vowel in a Strong position, it treats the long vowel as comprising the strong position of the trochee, with the weak position falling in the subsequent syllable.

One of the aspects of the verbal tone pattern that makes its analysis so difficult is the fact that there are few generalizations that hold for High tones in general. Instead, we find that in order to make sense of the data, we must talk about several different High tones – these tones are different in the sense that what makes the tones different is their grammatical function, rather than their phonetic description. In brief,

1. One of the High tones is the High tone that is part of a verb radical’s underlying form;
2. Another High tone is a formal marker (some would say, a formal morpheme, if we allow ourselves to speak of morphemes that do not have a specifiable sense or unique grammatical function) that appears typically to the right of the verb radical;
3. A third High tone is part of the negative prefix nti- (although in the surface representation, that High tone is typically associated with a different syllable).
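
The left-to-right trochaic footing and its quantity-sensitive variant, as described above, can be made concrete with a small sketch. It is illustrative only and not part of the analysis: the function name `label_moras`, the encoding of syllables as (text, mora count) pairs, and the treatment of nti as a single mora are assumptions. Since the text does not say what happens when a long vowel falls in a Weak position, the sketch assumes it is simply split mora by mora.

```python
from typing import List, Tuple

def label_moras(syllables: List[Tuple[str, int]],
                quantity_sensitive: bool) -> List[Tuple[str, str]]:
    """Label each syllable's moras as Strong (S) or Weak (W), left to right.

    syllables: (text, mora_count) pairs; mora_count is 1 (short) or 2 (long).
    Returns (text, labels) pairs, e.g. ("teek", "SS").
    """
    out = []
    expect = "S"  # trochees: every foot opens on a Strong position
    for text, moras in syllables:
        if quantity_sensitive and moras == 2 and expect == "S":
            # The whole long vowel fills the Strong position;
            # the Weak position falls on the following syllable.
            out.append((text, "SS"))
            expect = "W"
            continue
        labels = ""
        for _ in range(moras):  # plain mora-by-mora alternation (assumed default)
            labels += expect
            expect = "W" if expect == "S" else "S"
        out.append((text, labels))
    return out

# Hypothetical encoding of 'nti tu teek er a' (nti counted as one mora).
word = [("nti", 1), ("tu", 1), ("teek", 2), ("er", 1), ("a", 1)]
print(label_moras(word, quantity_sensitive=False))
print(label_moras(word, quantity_sensitive=True))
```

Under mora counting alone, the first Weak position after the prefixes is the second mora of teek; with quantity sensitivity it is the syllable er, which is the behavior the text attributes to the quantity-sensitive system.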
We will try to show that the principles that account for the appearance of each of these tones are different. We thus are not led to a set of rules which must be applied sequentially, as has often been the case in the analysis of other related Bantu languages. The analysis does not draw us towards an optimality theoretic analysis, either, because the complexities of the analysis involve morphological specifications that appear to be inconsistent with a view that makes strong claims about the universality of phonological constraints. We list in (1) the four distinct High tones, according to this analysis. One of the reasons for distinguishing these classes is that not only (as we have just said) the left to right position of each tone in the word is determined by different principles, but in addition, there is a sort of
competition among the tones, in the sense that when the post-radical High Hpost is present, the radical’s High tone does not appear (it is deleted, in generative terminology); and when the nti’s High tone appears, neither the radical’s tone nor the post-radical High tone appears. However, there are pairs of High tones that can co-exist: the Tense Marker raa’s High tone, for example, can occur along with the radical High tone.

(1) Types of High tone

name     Type                             “normal” position
Hneg     nti (negation)                   syllable after nti
Hpost    post-radical grammatical tone    syllable after radical
Hroot    radical lexical tone             first syllable of radical
HTM      tense marker ráa, záa            on TM
In Figure 1, we have given a schematic of the most important positions for morphemes in the Kinyarwanda verb. We have indicated towards the bottom the range of positions in which the Hroot tone can (or does) associate, and the range of positions for the Hpost. We are not quite certain as to whether these domains have a real status in the system, or whether the range of positions that we have indicated there is simply the logical consequence of the other rules and constraints posited in the grammar. There is one case below which suggests the former interpretation is correct, in connection with the tonal behavior of the inceptive (ráa) tense: viewing this tonal domain as having some linguistic reality would perhaps provide the best account for the placement of the radical High tone there. We draw the reader’s attention to the curious fact that while this analysis depends more heavily on tones’ morphological status than is found in analyses of related Bantu tone languages, the analysis is not thereby more concrete. That is, it is often the case that diachronic development leads a language from a situation in which a phonological effect is governed by phonological considerations only, to one, a little later, where the conditioning factor is not the phonological environment, but rather the specific morphological identity of the neighboring morphemes – velar softening in English, for example. In such cases, the triggering environment is present, visible, and directly observable. In the present case, however, the High tones that we observe do not wear their categorization on their sleeves, so to speak: it requires an analytic leap to decide that a given High tone in a given word is marked as Hneg or Hpost. The most complex aspect of the tone system is the shifting and reassociations of these tones. To understand this, we must distinguish between
the placement of the radical High tone, and the post-radical High tone. Both of these tones shift to the left, and in their reassociation they remain within the macrostem (which is to say, they remain to the right of the Tense Marker). But they shift according to different principles. By macrostem, we mean the part of the verb that begins after the Tense Marker, consisting of all Object Markers and the verb stem as well. In addition, the macrostem includes the secondary prefixes which appear in much the same position as Object Markers do; this is depicted graphically in Figure 1.

The radical High Hroot shifts to the beginning of the macrostem – that is, the first syllable of the macrostem. Actually, what we find on the surface suggests that it might be more appropriate to say that the radical High tone spreads to the first syllable of the macrostem. Essentially what we find is this: the radical High tone appears on (i.e., is associated with) the first syllable of the macrostem, but in addition, we may find the tone spread further to the right, as far to the right as the radical itself – the only condition being that the entire span of Highs must be odd in number (which here means one or three). Such a condition seems to make more sense on the view that the radical High spreads to the left, and is then delinked in a right-to-left fashion to satisfy a “parity” condition, to which we will return below. This is illustrated in Figure 2.

The post-radical High Hpost shifts leftward to a position within the macrostem which is an even-numbered position – but there are two slightly different principles that determine how we count. If the macrostem has 2 or more Object Markers (“Secondary” prefixes are counted in this, as they behave like Object Markers quite generally), then counting begins with the beginning of the macrostem; otherwise, counting begins at the beginning of the word. The even-numbered positions are “strong” in the sense that they attract the post-radical High: Hpost shifts to the leftmost even-numbered position in the macrostem. It is difficult to avoid the sense that the Hpost is a syncopated High tone – in music, the term syncopation refers to a prosodic impulse that is on an off-beat, or in present terms, a Weak metrical position. The shifting of association of this tone preserves this aspect of syncopation, and we suspect that this is an important fact.

1.1. Infinitive
We look first at the infinitive. Its negative is formed with the prefix -ta-, not nti-. In the tabular representation of the verbal tone pattern, we use “B” to indicate the basic or inherent tone of the verb radical.
[Figure 2 shows the foot marking and the association of Hroot, with the macrostem boxed, for the forms tu ra bon a, tu ra mu bon a, tu ra ki mu bon er a, and tu ra ki ha mu bon er a.]

Figure 2. Foot marking for Hroot association
(2)
Affirmative infinitive    Gloss

Low tone
ku rim a                  to cultivate
ku rer a                  to raise (children)
ku rog a                  to poison
ku rut a                  to surpass
ku geend a                to go

High tone
ku bón a                  to see
ku búr a                  to lack
ku bík a                  to crow
ku báag a                 to butcher
ku béer a                 to suit

Negative infinitive       Gloss

Low tone
ku tá rim á               not to cultivate
ku tá rer á               not to raise (children)
ku tá rog á               not to poison
ku tá rut á               not to surpass
ku tá geend á             not to go
ku tá geend án a          not to go with

High tone
ku tá bon á               not to see
ku tá bur á               not to lack
ku tá bik á               not to crow
ku tá baag á              not to butcher
ku tá beer á              not to suit
ku dá teek á              not to cook
ku dá teek ér a           not to cook for
(3) Basic tone assignment for each morphological pattern

                TM    Hroot    Hpost
infinitive      -     B        -

2. Present tense

2.1. Affirmative

In the simple case where there are no Object Markers (OMs), the basic or lexical tone marking of the verb radical appears on the radical itself. However, one of the central issues in Kinyarwanda morphotonology is how to account for what appears to be a shifting of a High tone’s position, or association, from the radical when we compare the surface tone pattern of verbs with no Object Markers (OMs) and verbs with a single OM. As noted above, we propose that this leftward shift is best understood in terms of a rhythmic pattern which is established by creating binary feet from left to right from the beginning of the word: including the negative prefix nti- in the case of Kinyarwanda. We look first at the affirmative present tense form of the verb.
It is clear that the High tone in these forms is the High tone of the verb radical, but it will be associated to a position to the left of the radical if there is such a position within the macrostem. Furthermore, this tone may appear associated with either one or three syllables: the maximum (odd) number possible if the tone’s association is not to move outside of its domain, defined as the macrostem up to the radical. Consider first the behavior of verb radicals with a short vowel, given in (5), and next the behavior of verb radicals with a long vowel, given in (6). The long vowel stems do not behave differently in any important way in this tense. In Figure 2, we present the foot construction made on these verbs, and one can see that the Hroot always associates to the first (i.e., leftmost) Strong position within the macrostem, which is indicated with a dotted-line box.

(4) Basic tone assignment for each morphological pattern

                                     TM    Hroot    Hpost
infinitive                           -     B        -
present tense affirmative (focus)    -     B        -

(5) Present tense affirmative, short vowel

Root: -rim- (Low tone: to cultivate)
                                     Singular subject              Plural subject
No OM                                n da rim a                    tu ra rim a
                                     u ra rim a                    mu ra rim a
                                     a ra rim a                    ba ra rim a
One OM ki (cl 7)                     n da ki rim a                 tu ra ki rim a
                                     u ra ki rim a                 mu ra ki rim a
                                     a ra ki rim a                 ba ra ki rim a
Two OMs ki mu (cl 7, 1)              n da ki mu rim ir a           tu ra ki mu rim ir a
                                     u ra ki mu rim ir a           mu ra ki mu rim ir a
                                     a ra ki mu rim ir a           ba ra ki mu rim ir a
Three OMs ki ha mu (cl 7, 16, 1)     n da ki ha mu rim ir a        tu ra ki ha mu rim ir a
                                     u ra ki ha mu rim ir a        mu ra ki ha mu rim ir a
                                     a ra ki ha mu rim ir a        ba ra ki ha mu rim ir a

Root: -bón- (High tone: to see)
                                     Singular subject              Plural subject
No OM                                n da bón a                    tu ra bón a
                                     u ra bón a                    mu ra bón a
                                     a ra bón a                    ba ra bón a
One OM mu (him/her)                  n da mú bon a                 tu ra mú bon a
                                     u ra mú bon a                 mu ra mú bon a
                                     a ra mú bon a                 ba ra mú bon a
Two OMs ki mu (cl 7, 1)              n da kí mú bón er a           tu ra kí mú bón er a
                                     u ra kí mú bón er a           mu ra kí mú bón er a
                                     a ra kí mú bón er a           ba ra kí mú bón er a
Three OMs ki ha mu (cl 7, 16, 1)     n da kí há mú bon er a        tu ra kí há mú bon er a
                                     u ra kí há mú bon er a        mu ra kí há mú bon er a
                                     a ra kí há mú bon er a        ba ra kí há mú bon er a
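
The parity condition on Hroot in this tense can be checked against the High-toned forms in (5) with a short sketch. This is an illustration under stated assumptions, not the authors' formalization: the names `hroot_span` and `mark` are hypothetical, syllables are counted rather than moras (which suffices for these forms), the domain is taken to be the Object Markers plus the first syllable of the radical, and High-toned syllables are shown in capitals.

```python
from typing import List

def hroot_span(n_object_markers: int) -> List[int]:
    """0-based macrostem syllable indices predicted to bear Hroot.

    Assumed domain: all Object Markers plus the first syllable of the radical.
    The span starts at the left edge of the macrostem and is the largest
    odd number of syllables that fits in the domain (one or three here).
    """
    domain = n_object_markers + 1
    span = domain if domain % 2 == 1 else domain - 1
    return list(range(span))

def mark(macrostem: List[str], n_object_markers: int) -> str:
    """Render a macrostem with High-toned syllables in capitals."""
    high = set(hroot_span(n_object_markers))
    return " ".join(s.upper() if i in high else s
                    for i, s in enumerate(macrostem))

print(mark(["bon", "a"], 0))                           # cf. tu ra bón a
print(mark(["mu", "bon", "a"], 1))                     # cf. tu ra mú bon a
print(mark(["ki", "mu", "bon", "er", "a"], 2))         # cf. tu ra kí mú bón er a
print(mark(["ki", "ha", "mu", "bon", "er", "a"], 3))   # cf. tu ra kí há mú bon er a
```

The sketch covers only the present tense affirmative; other tenses place the radical High differently.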
(6) Present tense affirmative, long vowel

Root: -geend- (Low tone: to go)
                                     Singular subject              Plural subject
No OM                                n da geend a                  tu ra geend a
                                     u ra geend a                  mu ra geend a
                                     a ra geend a                  ba ra geend a
One OM                               n da ha geend a               tu ra ha geend a
                                     u ra ha geend a               mu ra ha geend a
                                     a ra ha geend a               ba ra ha geend a

Root: -téek- (High tone: to cook)
                                     Singular subject              Plural subject
No OM                                n da téek a                   tu ra téek a
                                     u ra téek a                   mu ra téek a
                                     a ra téek a                   ba ra téek a
One OM                               n da gí teek a                tu ra gí teek a
                                     u ra gí teek a                mu ra gí teek a
                                     a ra gí teek a                ba ra gí teek a
Two OMs ki mu (cl 7, 1)              n da kí mú téek er a          tu ra kí mú téek er a
                                     u ra kí mú téek er a          mu ra kí mú téek er a
                                     a ra kí mú téek er a          ba ra kí mú téek er a
Three OMs ki ha mu (cl 7, 16, 1)     n da kí há mú teek er a       tu ra kí há mú teek er a
                                     u ra kí há mú teek er a       mu ra kí há mú teek er a
                                     a ra kí há mú teek er a       ba ra kí há mú teek er a

2.2. Negative
When we turn to the negative form of the present tense, we see a different pattern of a shifting High tone. When there is no OM, the suffixal High tone appears on the second syllable of the stem (-er-, in the cases examined here). However, when there is a single OM, a more complex set of data emerges when we compare long and short vowels in Kinyarwanda. In (8), we present these forms.

(7) Basic tone assignment for each morphological pattern

                                     TM    Hroot    Hpost
infinitive                           -     B        -
present tense affirmative (focus)    -     B        -
present tense negative               -     Ø        H
(8) Present tense negative

Short vowel, tone neutralized: -bón- (to see)
                             Singular subject              Plural subject
No OM                        sii m bon ér a                nti tu bon ér a
                             ntu u bon ér a                nti mu bon ér a
                             nta a bon ér a                nti ba bon ér a
One OM                       sii n ki bón er a             nti tu ki bón er a
                             ntu u ki bón er a             nti mu ki bón er a
                             nta a ki bón er a             nti ba ki bón er a
Two OMs, 3rd person          nta a ki mú bon er a          nti ba ki mú bon er a
Reflex OM, 3rd person        nti y ii bón er a             nti b ii bón er a
Three OMs, 3rd person        nta a ha kí mu bon er a       nti ba ha kí mu bon er a

Long vowel, tone neutralized: -téek- (to cook) (3rd person only)
                             Singular subject              Plural subject
No OM                        nt aa teek ér a               nti ba teek ér a
One OM                       nt aa mu téek er a            nti ba mu téek er a
Two OMs                      nt aa ki mú teek er a         nti ba ki mú teek er a
Reflex                       nti y ii téek er a            nti b ii téek er a
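
The placement of Hpost in the plural forms of (8) is consistent with a simple procedure: build trochees from left to right, quantity-sensitively, and associate Hpost with the leftmost Weak position inside the macrostem, starting the count at the macrostem when there are two or more OMs and at the beginning of the word otherwise. The sketch below is a hedged illustration of that reading, not the authors' own formalization. The name `hpost_syllable`, the (text, mora count) encoding, and the one-mora treatment of nti are assumptions, and the singular and reflexive forms, which involve fused prefix vowels, are not attempted.

```python
from typing import List, Tuple

Syllable = Tuple[str, int]  # (text, number of moras); an assumed encoding

def hpost_syllable(prefixes: List[Syllable],
                   object_markers: List[Syllable],
                   stem: List[Syllable]) -> str:
    """Return the text of the syllable predicted to carry Hpost."""
    macrostem = object_markers + stem
    from_macrostem = len(object_markers) >= 2
    # With two or more OMs, counting starts at the macrostem;
    # otherwise it starts at the beginning of the word.
    counted = macrostem if from_macrostem else prefixes + macrostem
    skip = 0 if from_macrostem else len(prefixes)
    expect = "S"
    for i, (text, moras) in enumerate(counted):
        if moras == 2 and expect == "S":
            labels = "SS"  # long vowel fills the Strong position
            expect = "W"
        else:
            labels = ""
            for _ in range(moras):
                labels += expect
                expect = "W" if expect == "S" else "S"
        if i >= skip and "W" in labels:
            return text  # leftmost Weak position inside the macrostem
    raise ValueError("no Weak position found in the macrostem")

nti_ba = [("nti", 1), ("ba", 1)]
short = [("bon", 1), ("er", 1), ("a", 1)]
long_ = [("teek", 2), ("er", 1), ("a", 1)]
print(hpost_syllable(nti_ba, [], short))             # cf. nti ba bon ér a
print(hpost_syllable(nti_ba, [("ki", 1)], short))    # cf. nti ba ki bón er a
print(hpost_syllable(nti_ba, [], long_))             # cf. nti ba teek ér a
print(hpost_syllable(nti_ba, [("mu", 1)], long_))    # cf. nti ba mu téek er a
```

When the function returns a long syllable, as in the last call, the Weak position is that syllable's first mora.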
In Figure 3, we observe the behavior of a tone that appears to shift leftward, though that is simply a metaphor derived from comparing different forms from the same inflectional paradigm. The situation is more complex when the vowel in the verb radical is long. A generalization that merely counts odd- and even-numbered positions fails to generate the correct data, and the details of this are shown in Figure 4.

[Figure 3 shows the rhythmic structure and the association of Hpost within the macrostem for the forms nti tu bon er a, nti tu ki bon er a, nti tu ki mu bon er a, and nti tu ki ha mu bon er a.]

Figure 3. Rhythmic structure in negative present tense (short vowel)

[Figure 4 contrasts quantity-insensitive foot assignment (incorrect) with quantity-sensitive foot assignment (correct) for nti tu teek er a, and shows the association of Hpost in nti tu gi teek er a, nti tu ki mu teek er a, and nti tu ki ha mu teek er a.]

Figure 4. Rhythmic structure in negative present tense (long vowel -teek-)

3. Inceptive: -ráa-

3.1. Kinyarwanda

This tense only exists in the negative in Kinyarwanda; see (10) and Figure 6. The verb radical keeps its lexical tone, High or Low, but in the presence of OM prefixes, the radical’s lexical tone is pulled leftward: if there is one OM, the tone is maintained on the root, and if there are two OMs, the tone is placed on the second OM (counting, as ever, from left to right). If there are 3 OMs, the tone is placed on the second OM, just as it is in the two OM case, but we find spreading of the High tone from that second OM to the verb radical. We note that in all cases, the High tone moves to an even-numbered position, and furthermore, if the original position of the High tone was in an even-numbered position (here, mora 6 of the word), it remains in place, and we find spreading from the 4th to the 6th position. If the High tone had been on an odd-numbered position, the tone moves, rather than spreads, leftward to the 4th mora.

In putting things this way, we have overlooked the fact that our description of quantity-sensitive rhythm-assignment does not correctly deal with the case with no OM, for both the short- and the long-vowel radicals, and we highlight this in Figure 5. Why do we find (b) in reality, and not (a)? This is clearly a radical High tone, not a Hpost, so according to the analysis presented here, it should associate with a strong position. Why does the ráa TM not take the first mora of the radical as the weak position in its foot? If we follow the analysis presented here, the TM ráa does just that, as it should, when there is one or more OM. So why do we find the situation as we do in Figure 5? The only answer we have is both partial and tentative: If the Hroot must associate with a strong position within the macrostem – and that is the heart of our present proposal – then the only such position is the one indicated in Figure 5, and it is to the right of the first mora of the verb stem. In no cases does a root’s High tone appear to the right of the first mora of the radical; that is what we indicated in Figure 1 above. If that generalization has some real status in the language, and the language uses that domain-based generalization to govern where the tone associates, then we have perhaps the basis of an account, or answer, to this question.

[Figure 5 shows two footings of nti ba ráa bon er a, with the tones Hneg, HTM and Hroot: (a) expected? and (b) observed.]

Figure 5. raa with no OMs
(9) Basic tone assignment for each morphological pattern

                                     nti    TM         Hroot    Hpost
infinitive                           -      -          B        -
present tense affirmative (focus)    -      -          B        -
present tense negative               H      -          Ø        H
inceptive                            H      H (ráa)    B        -
[Figure 6 shows the rhythmic structure, the macrostem, and the association of Hneg, HTM and Hroot for the forms (a) nti ba ráa bon a, (b) nti ba ráa bi bon a, (c) nti ba ráa bi ha bon a, and (d) nti ba ráa ki ha mu bon er a.]

Figure 6. Rhythmic structure in negative inceptive with High-toned radical
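
The syllable-level description of the inceptive given in the prose above (tone kept on the radical with no OM or one OM, placed on the second OM with two OMs, and spread from the second OM through the radical with three) can be written down directly. The sketch is only a restatement of that description; the function name `inceptive_hroot` and the 0-based indexing of macrostem syllables are assumptions, and nothing is claimed for more than three OMs.

```python
from typing import List

def inceptive_hroot(n_object_markers: int) -> List[int]:
    """0-based macrostem syllable indices bearing Hroot in the negative inceptive.

    Index n_object_markers is the first syllable of the radical.
    """
    radical = n_object_markers
    if n_object_markers < 2:
        return [radical]                  # tone stays on the radical
    if n_object_markers == 2:
        return [1]                        # tone moves to the second OM
    return list(range(1, radical + 1))    # second OM through the radical

for n in range(4):
    print(n, inceptive_hroot(n))
```

With three OMs the span covers three syllables, matching the odd-numbered span noted earlier for the radical High.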
(10) -ráa-

Low tone radical
         short vowel                          long vowel
No OM    nti ba ráa rim a                     nti ba ráa geend a
1 OM     nti ba ráa ha rim a                  nti ba ráa ha geend a
2 OM     nti ba ráa bi ha rim a               nti ba ráa bi ha geend an a
3 OM     nti ba ráa bi ha mu rim er a         nti ba ráa bi ha mu geend er a

High tone radical
         short vowel                          long vowel
No OM    nti ba ráa bón a                     nti ba ráa téek a
1 OM     nti ba ráa ha bón a                  nti ba ráa ha téek a
2 OM     nti ba ráa bi há bon a               nti ba ráa bi há teek a
3 OM     nti ba ráa bi há mú bón er a         nti ba ráa bi há mú téek er a

4. Future
The future is marked by TM záa/zaa in Kinyarwanda. The tone pattern behaves just like the parallel case of ráa.

4.1. Affirmative indicative future

In Kinyarwanda, nothing special happens in the case of 2 OMs, other than the shift to even-numbered positions. Note that in the affirmative, all syllables are Low:

(11) Future affirmative -zaa- or -záa-

Low tone radical
         short vowel                          long vowel
No OM    ba zaa rim a                         ba zaa geend a
1 OM     ba zaa ha rim a                      ba zaa ha geend a
2 OM     ba zaa bi ha rim a                   ba zaa bi ha geend an a
3 OM     ba zaa bi ha mu rim ir a             ba zaa bi ha mu geend an ir a

High tone radical
         short vowel                          long vowel
No OM    ba zaa bon a                         ba zaa teek a
1 OM     ba zaa ha bon a                      ba zaa ha teek a
2 OM     ba zaa bi ha bon a                   ba zaa bi ha teek a
3 OM     ba zaa bi ha mu bon er a             ba zaa bi ha mu teek er a
4.2. Negative

(12) Basic tone assignment for each morphological pattern

                                     nti    TM         Hroot    Hpost
infinitive                           -      -          B        -
present tense affirmative (focus)    -      -          B        -
present tense negative               H      -          Ø        H
inceptive negative                   H      H (ráa)    B        -
future affirmative                   -      -          Ø        -
future negative (non-focused)        H      H (záa)    B        -
(13) Future negative nti- + -záa-

Low tone radical
         short vowel                          long vowel
No OM    nti ba záa rim a                     nti ba záa geend a
1 OM     nti ba záa ha rim a                  nti ba záa ha geend a
2 OM     nti ba záa bi ha rim a               nti ba záa bi ha geend an a
3 OM     nti ba záa bi ha mu rim ir a         nti ba záa bi ha mu geend an ir a

High tone radical
         short vowel                          long vowel
No OM    nti ba záa bón a                     nti ba záa téek a
1 OM     nti ba záa ha bón a                  nti ba záa ha téek a
2 OM     nti ba záa bi há bon a               nti ba záa bi há teek a
3 OM     nti ba záa bi há mú bón er a         nti ba záa bi há mú téek er a

5. Far past

5.1. Far past affirmative
In the affirmative, there is neutralization between radicals of High and Low tone; both have a High tone (or in the non-focused forms, no tone, i.e. low tone).
(14) Far Past (non-focused) -á-

Low tone radical
         short vowel                          long vowel
No OM    ba á rim aga                         ba á geend aga
1 OM     ba á ha rim aga                      ba á ha geend aga
2 OM     ba á bi ha rim aga                   ba á bi ha geend an aga
3 OM     ba á bi ha mu rim ir aga             ba á bi ha mu geend an ir aga

High tone radical
         short vowel                          long vowel
No OM    ba á bon aga                         ba á teek aga
1 OM     ba á ha bon aga                      ba á ha teek aga
2 OM     ba á bi ha bon aga                   ba á bi ha teek aga
3 OM     ba á bi ha mu bon er aga             ba á bi ha mu teek er aga
(15) Far Past (focused) -ára-

Low tone radical
         short vowel                          long vowel
No OM    ba ára rím aga                       ba ára géend aga
1 OM     ba ára ha rím aga                    ba ára ha géend aga
2 OM     ba ára bi há rim aga                 ba ára bi há geend an aga
3 OM     ba ára bi há mu rim ir aga           ba ára bi há mu geend an ir aga

High tone radical
         short vowel                          long vowel
No OM    ba ára bón aga                       ba ára téek aga
1 OM     ba ára ha bón aga                    ba ára ha téek aga
2 OM     ba ára bi há bon aga                 ba ára bi há teek aga
3 OM     ba ára bi há mu bon er aga           ba ára bi há mu teek er aga

5.2. Far past negative
This form is necessarily non-focused, and (we believe) this is why there is no High tone on the radical (in the case of High toned verbs).
(16) Far Past (non-focused) -á-

Low tone radical
         short vowel                          long vowel
No OM    nti ba á rim aga                     nti ba á geend aga
1 OM     nti ba á ha rim aga                  nti ba á ha geend aga
2 OM     nti ba á bi ha rim aga               nti ba á bi ha geend an aga
3 OM     nti ba á bi ha mu rim ir aga         nti ba á bi ha mu geend an ir aga

High tone radical
         short vowel                          long vowel
No OM    nti ba á bon aga                     nti ba á teek aga
1 OM     nti ba á ha bon aga                  nti ba á ha teek aga
2 OM     nti ba á bi ha bon aga               nti ba á bi ha teek aga
3 OM     nti ba á bi ha mu bon er aga         nti ba á bi ha mu teek er aga
The negative Far Past does not have a focused form.

(17) Basic tone assignment for each morphological pattern

                                     nti    TM         Hroot    Hpost
infinitive                           -      -          B        -
present tense affirmative (focus)    -      -          B        -
present tense negative               H      -          Ø        H
inceptive negative                   H      H (ráa)    B        -
future affirmative                   -      -          Ø        -
future negative (non-focus)          H      H (záa)    B        -
far past affirmative (focus)         -      H (ára)    H        -
far past negative                    -      H (á)      Ø        -

6. Recent past

6.1. Recent past affirmative
The only difference with the Far Past here is that the TM is on a low tone.
(18) Recent Past (non-focused) -a-

Low tone radical
         short vowel                          long vowel
No OM    ba a rim aga                         ba a geend aga
1 OM     ba a ha rim aga                      ba a ha geend aga
2 OM     ba a bi ha rim aga                   ba a bi ha geend an aga
3 OM     ba a bi ha mu rim ir aga             ba a bi ha mu geend an ir aga

High tone radical
         short vowel                          long vowel
No OM    ba a bon aga                         ba a teek aga
1 OM     ba a ha bon aga                      ba a ha teek aga
2 OM     ba a bi ha bon aga                   ba a bi ha teek aga
3 OM     ba a bi ha mu bon er aga             ba a bi ha mu teek er aga
In the following forms, we note a sequence of three adjacent moras in each case, but on the surface this is not distinct from other two-mora vowels.

(19) Recent Past (focused) -aa-

Low tone radical
         short vowel                          long vowel
No OM    ba aa rim aga                        ba aa geend aga
1 OM     ba aa ha rim aga                     ba aa ha geend aga
2 OM     ba aa bi ha rim aga                  ba aa bi ha geend an aga
3 OM     ba aa bi ha mu rim ir aga            ba aa bi ha mu geend an ir aga

High tone radical
         short vowel                          long vowel
No OM    ba aa bón aga                        ba aa téek aga
1 OM     ba aa ha bón aga                     ba aa ha téek aga
2 OM     ba aa bi há bon aga                  ba aa bi há teek aga
3 OM     ba aa bi há mu bon er aga            ba aa bi há mu teek er aga
6.2. Recent past negative
(20) Recent Past (non-focused) -a-

Low tone radical
         short vowel                          long vowel
No OM    nti ba a rim aga                     nti ba a geend aga
1 OM     nti ba a ha rim aga                  nti ba a ha geend aga
2 OM     nti ba a bi ha rim aga               nti ba a bi ha geend an aga
3 OM     nti ba a bi ha mu rim ir aga         nti ba a bi ha mu geend an ir aga

High tone radical
         short vowel                          long vowel
No OM    nti ba a bon aga                     nti ba a teek aga
1 OM     nti ba a ha bon aga                  nti ba a ha teek aga
2 OM     nti ba a bi ha bon aga               nti ba a bi ha teek aga
3 OM     nti ba a bi ha mu bon er aga         nti ba a bi ha mu teek er aga
(21) Basic tone assignment for each morphological pattern
Patterns: infinitive; present tense affirmative (focus); present tense negative; inceptive negative; future affirmative; future negative (non-focus); far past affirmative (focus); far past negative; recent past affirmative (focus); recent past negative (non-focus)
nti: H H H
TM: H (ráa) H (záa) H (ára) H (á) H (áa)
Hroot: B B Ø B Ø B H Ø B Ø
Hpost: H

7. Subjunctive
In the affirmative subjunctive, we see the interaction between two generalizations: first, the special placement of a High on the second of two (or more) OMs, and second, the placement of the suffixal tone on an even-numbered mora, counting from the beginning of the word (as long as there are at least 4 moras to the word). However, this account does not yet cover all the data, as we illustrate in Figure 7. In the negative subjunctive, we see a situation in which the SM (subject marker) is associated with a High tone, which we analyze functionally as part of the negative nti- prefix.

(22) Subjunctive affirmative, Kinyarwanda
         short vowel              long vowel
No OM    ba rim é                 ba geend é
1 OM     ba ha rim é              ba ha geénd e
2 OM     ba bi há rim e           ba bi há geend an e
3 OM     ba bi há mu rim ir e     ba bi há mu geend an ir e
(23) Subjunctive negative, Kinyarwanda
         short vowel                  long vowel
No OM    nti bá rim e                 nti bá geend e
1 OM     nti bá ha rim e              nti bá ha geend e
2 OM     nti bá bi ha rim e           nti bá bi ha geend an e
3 OM     nti bá bi ha mu rim ir e     nti bá bi ha mu geend an ir e

8. Conclusion
We have only begun to deal with the complexities of tone assignment to the Kinyarwanda verb in this paper, but we hope that the material that we have presented is at the very least suggestive of how rhythmic structure may interact with tone association in Kinyarwanda, and by implication in Kirundi and perhaps in some other Lacustrine Bantu languages.
Figure 7. Rhythmic structure in affirmative subjunctive. [Diagram: rhythmic structure and Hpost association for ba rim e, ba ha rim e, ba bi ha rim e and ba bi ha mu rim ir e. The predicted association is correct for ba ha rim e; for the other three forms the predicted association is wrong and is shown beside the correct one.]
References

Coupez, André. 1980. Abrégé de grammaire rwanda. Butare: Institut national de recherche scientifique.
Goldsmith, John and Firmard Sabimana. 1985. The Kirundi verb. In: Francis Jouannet (ed.), Modèles en Tonologie (Kirundi et Kinyarwanda), 19–62. Paris: Editions du centre national de la recherche scientifique.
Kimenyi, Alexandre. 2002. A Tonal Grammar of Kinyarwanda: Autosegmental and Metrical Analysis. Lewiston: Edwin Mellen.
Mpiranya, Fidèle. 1998. Perspective fonctionnelle en linguistique comparée les langues bantu. Lyon: CEL.
Sibomana, Leonidas. 1974. Deskriptive Tonologie des Kinyarwanda. Hamburg: Helmut Buske Verlag.
Do tones have features?
Larry M. Hyman

1. Introduction: Three questions about tone
In this paper I address the question of whether tones have features. Given that most phonologists accept either binary features or privative elements in their analyses of segmental systems, it may appear surprising that such a question needs to be asked at all. However, as I discuss in Hyman (in press) and below, tone has certain properties that appear to be unique within phonological systems. Hence, it could also be that featural analyses of tones are not necessary, even if they are well-founded in consonant and vowel phonology. Before considering whether tones have features, there are two prior questions about tone which will bear on my conclusion:

(1) Question #1: Why isn’t tone universal?
    Question #2: Is tone different?
    Question #3: Do tones have features?

The first question is motivated by the fact that all languages exploit pitch in one way or another, so why not lexical or grammatical tone? It is generally assumed that somewhere around 40–50% of the world’s currently spoken languages are tonal, although the distribution is highly areal, covering most of Subsaharan Africa and East and Southeast Asia, as well as significant parts of Mexico, the Northwest Amazon, and New Guinea. There would seem to be several advantages for universal tone: First, tone presents few, if any, articulatory difficulties vs. consonants (which all languages have). Second, tone is acoustically (hence perceptually?) simple, F0, vs. consonants and vowels. Third, tone is acquired early (Li and Thompson 1978, Demuth 2003), such that nativists may even want to claim that human infants are prewired for it. Thus, if all of the languages of the world had tone, we would have no problem “explaining” why this is. The more interesting question, to which I will return in §5, is why tone isn’t universal.

The second question is whether tone is different.
In Hyman (in press) I suggested that tone is like segmental phonology in every way – only “more so”, in two different senses: (i) Quantitatively more so: tone does certain
things more frequently, to a greater extent, or more obviously (i.e. in a more straightforward fashion) than segmental phonology; (ii) Qualitatively more so: tone can do everything segments and non-tonal prosodies can do, but segments and non-tonal prosodies cannot do everything tone can do. This “more so” property contrasts with the articulatory and perceptual simplicity referred to in the previous paragraph. As Myers and Tsay (2003: 105–6) put it, “...tonal phenomena have the advantages of being both phonologically quite intricate and yet phonetically relatively straightforward (i.e. involving primarily a single perceptual dimension, although laryngeal physiology is admittedly more complex).” There is so much more you can do with tone. For example, as seen in the Giryama [Kenya] forms in (2), the tones of one word may be realized quite distantly on another (Philippson 1998: 321):

(2) a. ku-tsol-a ki-revu   ‘to choose a beard’   /-tsol-/ ‘choose’
    b. ku-on-a ki-révu     ‘to see a beard’      /-ón-/ ‘see’
       (the H is delinked from -on- and realized on -ré-)
In (2a) all of the TBUs are toneless, pronounced with L(ow) tone by default. In (2b), the H(igh) of the verb root /-ón-/ ‘see’ shifts long distance to the penult of the following word, which then ends with a H-L sequence. Put simply, segmental features and stress can’t do this. They are typically word-bounded or interact only locally at the juncture of words. Thus, no language has been known to transfer the nasality of a vowel to the penult of the following word. Similarly, one word does not normally assign stress to the next. While tone is capable of a rich lexical life as well, it has an equal potential at the phrase level, where the local and long-distance interaction of tones can produce a high degree of opacity (differences between inputs and outputs) and analytic open-endedness. In short, tone can do everything that segmental and accentual phonology can do, but the reverse is not true.

Some of this may be due to the fact that tone systems can be extremely paradigmatic or syntagmatic, exclusively lexical or grammatical. Thus consider the eight tone patterns of Iau [Indonesia: Papua] in (3).

(3) Tone   Nouns                    Verbs
    H      bé ‘father-in-law’       bá ‘came’                        totality of action punctual
    M      bē ‘fire’                bā ‘has come’                    resultative durative
    H↑H    bé ↑´ ‘snake’            bá ↑´ ‘might come’               totality of action incompletive
    LM     bè թ ‘path’              bà թ ‘came to get’               resultative punctual
    HL     bê ‘thorn’               bâ ‘came to end point’           telic punctual
    HM     bé թ ‘flower’            bá թ ‘still not at endpoint’     telic incompletive
    ML     bē` ‘small eel’          bā` ‘come (process)’             totality of action durative
    HLM    bê թ ‘tree fern’         bâ թ ‘sticking, attached to’     telic durative
As seen on the above monosyllables (where ↑´ = super-high tone), the same eight tones contrast paradigmatically on both word classes, although with a lexical function on nouns vs. a grammatical function on verbs (Bateman 1990: 35–36). Compare this with the representative final vs. penultimate H tone in the Chimwiini [Somalia] paradigm in (4):

(4)        singular                  plural
    1st    n-ji:lé ‘I ate’           chi-ji:lé ‘we ate’
    2nd    ji:lé ‘you sg. ate’       ni-ji:lé ‘you pl. ate’
    3rd    jí:le ‘s/he ate’          wa-jí:le ‘they ate’
The properties of Chimwiini are as follows (Kisseberth 2009): (i) there is grammatical tone only, i.e. no tonal contrasts on lexical morphemes such as noun stems or verb roots; (ii) H tone is limited to the last two moras; (iii) final H is morphologically conditioned, while penultimate H is the default; (iv) first and second person subjects condition final H vs. third person which takes the default penultimate H. As seen, the only difference between the second and third person singular [noun class 1] is tonal: ji:lé vs. jí:le. However, as seen now in (5), the final or penultimate H tone is a property of the phonological phrase:

(5) a. jile: n֍amá ‘you sg. ate meat’     jile ma-tu:ndá ‘you sg. ate fruit’
    b. jile: n֍áma ‘s/he ate meat’        jile ma-tú:nda ‘s/he ate fruit’

In fact, when there is wide focus, as in (6), each phonological phrase gets the appropriate final vs. penultimate H tone:

(6) a. Ø-wa-t֍ind֍il֍il֍e w-aaná ] n֍amá ] ka: chi-sú ]   ‘you sg. cut for the children meat with a knife’
    b. Ø-wa-t֍ind֍il֍il֍e w-áana ] n֍áma ] ka: chí-su ]   ‘s/he cut for the children meat with a knife’
Although phrasally realized, the Chimwiini final vs. penultimate patterns reflect an original tonal difference on the subject prefixes. Thus, compare the following from the Cahi dialect of Kirimi (where ↓ = downstep):

(7) a. /҂-k҂-túng-a/ → ҂-k҂-túng-á         ‘s/he is tying’
    b. /҂´-k҂-túng-a/ → ҂´-k҂´-↓túng-á     ‘you sg. are tying’
As seen, the second person subject prefix has a H tone, while the segmentally homophonous [noun class 1] third person singular subject prefix is toneless. This suggests the following implementation of the Chimwiini facts: (i) first and second person subject markers have an underlying /H/ tone; (ii) this H tone links to the last syllable of the phonological phrase; (iii) any phonological phrase lacking a H tone receives one on its penult. While tone is dense and paradigmatic in Iau, it is sparse and syntagmatic in Chimwiini – so much so that the question even arises as to what the final vs. penultimate H tone contrast is:

(8) a. morphology?   (a property of [+1st pers.] and [+2nd pers.] subject prefixes)
    b. phonology?    (property of the phonological phrase – H is semi-demarcative)
    c. syntax?       (property of the syntactic configurations which define the P-phrases)
    d. intonation?   (not likely that there would be a first/second person intonation)
Note also that since the final H tone targets the end of a phonological phrase, it is not like phrasal morphology, e.g. English -’s, which is restricted to the right edge of a syntactic noun phrase. Again, tone is different: there does not seem to be a segmental or metrical equivalent.
This, then, brings us to the third question: Do tones have features? If yes, are they universal “in the sense that all languages define their speech sounds in terms of a small feature set” (http://nickclements.free.fr/featuretheory.html)? If no, how do we talk about different tone height and contours and their laryngeal interactions? As Yip puts it:

“A satisfactory feature system for tone must meet the familiar criteria of characterizing all and only the contrasts of natural language, the appropriate natural classes, and allowing for a natural statement of phonological rules and historical change. In looking at East Asian tone systems the main issues are these: (a) How many different tone levels must be represented? (b) Are contour tones single units or sequences of level tones? (c) What is the relationship between tonal features and other features, especially laryngeal features?” (Yip 1995: 477; cf. Yip 2002: 40–41)
These and other issues will be addressed in subsequent sections. In §2 I will outline the issues involved in responding to this question. In the following two sections we will look at whether features can capture tonal alternations which arise in multiple tone-height systems, first concerning tonal morphology (§3) and second concerning abstract tonal phonology (§4). The conclusion in §5 is that although tone features may be occasionally useful, they are not essential. I end by suggesting that the existence of tone features is not compelling because of the greater autonomy of tones and their unreliable intersection with each other and with other features. This explains as well why tone is different and not universal.

2. Do tones have features?
In addressing the above question, the central issue of this paper, it should first be noted that there has been no shortage of proposals of tone features and tonal geometry. (See Anderson 1978, Bao 1999, Snider 1999, and Chen 2000: 96 for tone-feature catalogs.) However, there has been little agreement other than: (i) we would like to avoid features like [RISING] and [FALLING]; (ii) we ought in principle to distinguish natural classes of tones by features; (iii) we ought in principle to be able to capture the relation of tones to laryngeal features, e.g. voicing, breathiness, creakiness. However, at the same time, there has been a partial “disconnect” between tone features and tonal analysis: Tone features are barely mentioned, if at all, in most theoretical and descriptive treatments of tone. Tone features are, of course, mentioned in a textbook on tone, but read on:

“Although I have left unresolved many of the complex issues bearing on the choice of a feature system, in much of the rest of this book, it will not be necessary to look closely at the features of tone. Instead we will use just H, M, L, or tone integers, unless extra insights are to be gained by formulating the analysis in featural terms.” (Yip 2002: 64)
In actual practice, unless a researcher is specifically working on tone features, s/he is likely to avoid them. Thus compare two recent books on Chinese tonology, Bao (1999) vs. Chen (2000). Bao is specifically interested in developing a model of tonal geometry and tone features, which thus pervade the book. Chen, on the other hand, is interested in a typology of tone sandhi rules and how they apply, hence almost totally avoids features, using Hs and Ls instead.

Since tone and vowel height are both phonetically scalar, it is not surprising that similar problems arise in feature analyses. For example, the respective coalescence of /a+i/ and /a+u/ to [e] and [o] is hard to describe if /a/ is [+low] and /i/ and /u/ are [+high], since the desired output is [–high, –low]. Similarly, the coalescence of a HL or LH contour to [M] is hard to describe if H = [+STIFF] and L = [+SLACK], since the desired output is [–STIFF, –SLACK]. Scalar chain shifts such as i → e → ɛ and H → M → L are notorious problems for any binary system. Still, phonologists do not hesitate to use binary height features for vowels, but often not for tones.

The problem of tone features is largely ignored in two-height systems, where there is little advantage to using, say, [±UPPER] over H and L. Instead, the issue concerns the nature of the H/L contrast, which can be privative and/or binary, as in (9).

(9) a. /H, L/      e.g. Baule, Bole, Mende, Nara, Falam, Kuki-Thaadow, Siane, Sko, Tanacross, Barasana
    b. /H, Ø/      e.g. Afar, Chichewa, Kirundi, Ekoti, Kiwai, Tinputz, Una, Blackfoot, Navajo, Seneca
    c. /L, Ø/      e.g. Malinke (Kita), Ruund, E. Cham, Galo, Kham, Dogrib, Tahltan, Bora, Miraña
    d. /H, L, Ø/   e.g. Ga, Kinande, Margi, Sukuma, Tiriki, Munduruku, Puinave, Yagua
Another variant is to analyze level tones as /H/ vs. /Ø/, but contour tones as /HL/ and /LH/, as in Puinave: “L-tones are considered phonetic entities, which are therefore not specified lexically, except for the L-tones that are part of the contrastive contour tones” (Girón Higuita and Wetzels 2007).
Assuming the possibility of underspecification, similar analytical possibilities occur in three-height tone systems, as in (10).

(10) a. /H, M, L/
     b. /H, Ø, L/   /Ø, M, L/   /H, M, Ø/
     c. /H, M, L, Ø/
Beyond the above possibilities is the fact that in some systems M is a distinct third tone equally related to H and L, while in others M may be asymmetrically related to one of the tones. This produces output possibilities such as the following, where the ↑ and ↓ arrows represent raising and lowering, respectively:

(11) a. /H, M, L/     M is equally related to /H/ and /L/    e.g. Tangkhul Naga (pers. notes)
     b. /↑H, H, L/    M is a non-raised variant of /H/       e.g. Engenni (Thomas 1978)
     c. /H, ↓H, L/    M is a lowered variant of /H/          e.g. Kom (Hyman 2005)
     d. /H, ↑L, L/    M is a raised variant of /L/           e.g. Kpelle (Welmers 1962)
     e. /H, L, ↓L/    M is a non-lowered variant of /L/      e.g. Ewe (Smith 1973, Stahlke 1971, Clements 1978)
While some languages have three underlying contrastive tone heights (11a), others derive the third height by the indicated process in (11b-e). As indicated in (12), both Kom and Ik have two underlying, but three surface tone heights:

(12) a. Kom   /H, L/   L-H → L-M (→ M)   (Hyman 2005)
     b. Ik    /H, L/   L-H → M-H (→ M)   (Heine 1993)
Whereas Kom regularly lowers a H to M after L, Ik raises a L to M before H. Since the triggering tone may be lost, the M becomes surface-contrastive in both languages. Finally, M may derive from the simplification of a HL or LH contour tone, e.g. Babanki L-HL-H → L-M-H (Hyman 1979a: 23).

The above possibilities arise independent of whether the raising or lowering process creates only one additional pitch level (as in the cited languages) or whether there can be multiple upsteps and downsteps. The above all assumes that tone features define pitch levels rather than pitch changes. In a pitch-change system such as Clark’s (1978), the H, M and L tone heights could be represented as /↑/, /Ø/ and /↓/.

In principle, even more interpretations should be possible in systems with four or five surface-contrasting tone heights. Some such systems can be shown to derive from three (or even two) underlying tones, e.g. Ngamambo, whose four heights H, M, ↓M, L can be derived from /H/ and /L/ (Hyman 1986a). While it is sometimes possible to argue that the four (~ five) tone heights form “natural classes” (see below), equally common are cases such as in (13) where such evidence is weak or lacking:

(13) a. Five levels: Kam (Shidong) [China] (Edmondson and Gregerson 1992) (5 = highest, 1 = lowest)
        ݚa11 ‘thorn’   ݚa22 ‘eggplant’   ݚa33 ‘father’   ݚa44 ‘step over’   ݚa55 ‘cut down’
     b. Four level + five contour tones in Itunyoso Trique [Mexico] (Dicanio 2008)
        Level                   Falling            Rising
        ββe4 ‘hair’             li43 ‘small’       yãh45 ‘wax’
        nne3 ‘plough (n.)’      nne32 ‘water’      yah13 ‘dust’
        nne2 ‘to tell lie’      nne31 ‘meat’
        nne1 ‘naked’

Where multiple contrasting tone heights join into natural classes the assumption is that they share a feature. For this purpose numerous tone-feature proposals have appeared in the literature, among which those in the following table, based on Chen (2000: 96), where 5 = the highest and 1 = the lowest pitch:

(14)
                                        5 (=H)   4     3 (=M)   2     1 (=L)
a. Halle and Stevens (1971)   STIFF     +              –              –
                              SLACK     –              –              +
b. Yip (1980)                 UPPER     +        +              –     –
                              HIGH      +        –              +     –
c. Clements (1983)            ROW 1     h        h              l     l
                              ROW 2     h        l              h     l
d. Bao (1999)                 STIFF     +        +              –     –
                              SLACK     –        +              –     +
# of lgs. (out of 545) with n tone heights:   12 (n = 5)   26 (n = 4)   140 (n = 3)   367 (n = 2)
As seen in the top row, linguists often identify the tone heights with integers, as it is not even clear what to call the tones. Thus, in a four-height system, the middle two tones are sometimes called “raised mid” and “mid”, sometimes “mid” and “lowered mid”. There also is no agreement on which accents to use to indicate these two tones: While [ā] unambiguously indicates M tone in a three-height system, in a four-height system it sometimes indicates the lower of the two M tones, sometimes the higher. The numbers in the bottom line of (14) indicate how many tone systems I have catalogued out of 545 with five, four, three and two underlying tone heights. As seen, systems with more than three heights are relatively rare as compared with two- and three-height systems.

For the purpose of discussion let us assume the following feature system, with Pulleyblank’s (1986: 125) replacement of Yip’s HIGH with RAISED:

(15) Yip/Pulleyblank tone feature system (the second M, height 2, is a “lower-mid” tone)
              H     M     M     L
     UPPER    +     +     –     –
     RAISED   +     –     +     –
              4     3     2     1
The natural classes captured by such a system are the following:

(16) [+UPPER]     H, M    4, 3
     [–UPPER]     M, L    2, 1
     [+RAISED]    H, M    4, 2
     [–RAISED]    M, L    3, 1
The interesting groupings are those captured by [±RAISED], since the tone heights 4,2 and 3,1 are not contiguous. While such pairings are sometimes observed (see Gban in §3), there are problems inherent in this and the other feature proposals in (14):

(17) a. 5-height systems: no way to characterize a fifth contrasting tone height
     b. 4-height systems: no way to characterize the inner two tone heights (3,2) as a natural class
     c. 3-height systems: potential ambiguity between two kinds of mid tones (3 vs. 2)
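The point in (17b) can be checked mechanically. The following is a minimal sketch, assuming the feature values of table (15) and treating a natural class as any set of heights picked out by specifying one or both features; the names `FEATURES`, `natural_classes` and `is_natural_class` are illustrative and not part of any proposal cited here.

```python
from itertools import product

# Height -> (UPPER, RAISED), following table (15); an assumption of this sketch.
FEATURES = {4: ("+", "+"), 3: ("+", "-"), 2: ("-", "+"), 1: ("-", "-")}

def natural_classes(features=FEATURES):
    """Collect every set of heights defined by specifying UPPER, RAISED, or both."""
    classes = set()
    for upper, raised in product(("+", "-", None), repeat=2):
        if upper is None and raised is None:
            continue  # no feature specified: skip the trivial class of all tones
        members = frozenset(
            h for h, (u, r) in features.items()
            if (upper is None or u == upper) and (raised is None or r == raised)
        )
        classes.add(members)
    return classes

def is_natural_class(heights):
    """True if the given heights are exactly one feature-defined class."""
    return frozenset(heights) in natural_classes()

for pair in ({4, 3}, {2, 1}, {4, 2}, {3, 1}, {3, 2}, {4, 1}):
    print(sorted(pair, reverse=True), is_natural_class(pair))
```

Under these assumptions the non-contiguous pairs 4,2 and 3,1 come out as classes, while the inner pair 3,2 and the outer pair 4,1 do not.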
Prior to the establishment of the feature system in (15), when features such as [HIGH] and [LOW] were in currency, the general response to the problem
in (17a) was to propose a third feature such as MID (Wang 1967), to expand the inventory in the mid range, or EXTREME (Maddieson 1971) which, expanding the inventory at the top and bottom, has the dubious property of grouping 1,5 as a natural class. Concerning the problem in (17b), a [–EXTREME] specification, like [+MID], could group together the 3,2 tones in a four-height system. However, such features have not gained currency and appear almost as ad hoc as [αUPPER, –αRAISED]. Given that there are only five contrasting levels, the argument for three binary tone features is considerably weakened if there is no principled way to pare the eight logical feature combinations down to five height values. Of course, there is always the possibility that the same tone height might have different feature values in different tone systems, which brings us to the problem in (17c): The M tone in a three-height system can be either [+UPPER, –RAISED] or [–UPPER, +RAISED], an issue which is taken up in §3 and §4 below.

All of these problems raise the question of how abstract the tonal representations should be allowed to be: A scalar pitch system with 2, 3, 4 or 5 values would be much more concrete, hence arguably the more natural solution were it not for the general acceptance of binary features or privative “elements” in segmental phonology and elsewhere, e.g. in morphology (Corbett and Baerman 2006). In the following two sections we will take a close look at how the features in (15) fare in the analysis of selected three-height tone systems. §3 is concerned with tonal morphology and §4 with “abstract” tonal phonology. Both involve the potential featural ambiguity of phonetically identical M tones as [+UPPER, –RAISED] and [–UPPER, +RAISED], even in the same language. Although Bao (1999: 186) sees the dual representation of M as a virtue of the theory, we shall see that such tone features do not always yield a revealing account of M tone properties.

3. Tonal morphology and M tone

In this section we will examine how the tone features in (15) account for tonal morphology. Focus will be on tonal marking on verbs. One argument for tone features would be that they can function independently as tonal morphemes, e.g. marking the inflectional features of tense, aspect, mood, polarity, person and number. We begin with two four-level tone systems whose inflectional tones tell two quite different stories. The first is Iau, whose eight tone patterns in (3) were seen to be lexical on nouns, but morphologically determined on verbs, as in (18).
(18)                telic    totality of action    resultative
     punctual       HL       H                     LM
     durative       HLM      ML                    M
     incompletive   HM       H↑H
Although Iau verbs lend themselves to a paradigmatic display by morpheme features, the portmanteau tonal melodies do not appear to be further segmentable into single tones or features. A quite different situation is found in the subject pronoun tones in Gban [Ivory Coast], as reported by Zheltov (2005: 24):

(19)              present          past
                  sg.     pl.      sg.     pl.
     1st pers.    ɪ̃2      u2       ɪ̃4      u4      [+upper]
     2nd pers.    ɛɛ2     aa2      ɛɛ4     aa4     [+upper]
     3rd pers.    ɛ1      ɔ1       ɛ3      ɔ3      [–upper]
                  [–raised]        [+raised]
In the present tense, third person subject pronouns are marked by a 1 tone (=lowest), while first and second person pronouns have a 2 tone. In the past tense, each tone is two levels higher: third persons receive 3 tone, while first and second persons have 4 tone. In this case tone features work like a charm: As indicated, first/second persons can be assumed to be marked by [+UPPER] and third person by [–UPPER]. These pronouns receive a [–RAISED] specification in the present tense vs. a [+RAISED] specification in the past tense. (The same result would be achieved if we were to reverse [UPPER] and [RAISED] to mark tense and person, respectively.) It is cases like Gban which motivate Yip’s (1980) original proposal, based on tonal bifurcation in East and Southeast Asia: If [±UPPER] represents the original tonal opposition, often attributable to a laryngeal distinction in syllable finals, [±RAISED] can potentially modify the original contrast and provide the four-way opposition (which does not always produce four tone levels in the Asian cases). As (19) demonstrates, the same historical development has produced a four-height system whose natural classes include 1,2 (present tense), 3,4 (past tense), 1,3 (third person) and 2,4 (first and second person). Although Gban is a Mande language, similar four-level systems are found in other subgroups of Niger-Congo, e.g. in Igede [Nigeria; Benue-Congo] (Stahlke 1977: 5) and Wobe [Liberia; Kru] (Singler 1984).
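The Gban pattern can be restated as a small lookup. The sketch below is illustrative only and rests on two assumptions: the height values are those of table (15), and the features are assigned in the reversed way mentioned in the parenthesis above, with [UPPER] marking tense and [RAISED] marking person, since that is the assignment under which the heights of (15) match the attested tones. The names `HEIGHT`, `TENSE_UPPER`, `PERSON_RAISED` and `pronoun_tone` are hypothetical.

```python
# (UPPER, RAISED) -> tone height, following table (15).
HEIGHT = {("+", "+"): 4, ("+", "-"): 3, ("-", "+"): 2, ("-", "-"): 1}

# Assumed assignment for this sketch: tense contributes UPPER, person contributes RAISED.
TENSE_UPPER = {"present": "-", "past": "+"}
PERSON_RAISED = {1: "+", 2: "+", 3: "-"}

def pronoun_tone(person, tense):
    """Return the tone height (1-4) of a Gban subject pronoun under the assumed features."""
    return HEIGHT[(TENSE_UPPER[tense], PERSON_RAISED[person])]

for tense in ("present", "past"):
    for person in (1, 2, 3):
        print(tense, person, pronoun_tone(person, tense))
```

Each cell of the paradigm is then the product of two independent choices, which is the sense in which the features behave as separate tonal morphemes.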
Given the neatness of the Gban example, let us now consider how the features [UPPER] and [RAISED] function as tonal morphemes in three-height systems. A number of languages have the tonal properties in (20).

(20) a. noun stems contrast /H/, /M/ and /L/ lexically
     b. verb roots contrast only two levels lexically – but are realized with all three levels when inflectional features are spelled out

Again, it is the assignment of verb tones which is of interest. The relevant tone systems fall into two types, which are discussed in the following two subsections.

3.1. Type 1: H/M vs. M/L verb tones

In the first, represented by Day [Chad] (Nougayrol 1979), the two verb classes have the higher/lower variants H/M vs. M/L:

(21) a.                        /yuu/ [+u]           /yuu/ [–u]
                               ‘put on, wear’       ‘drink’
        completive     [+r]    yúú                  yūū
        incompletive   [–r]    yūū                  yùù

     b.                        /yuu, H/             /yuu, M/
                               ‘put on, wear’       ‘drink’
        completive             yúú                  yūū
        incompletive   L-      yūū                  yùù

     c.                        /yuu/ [+2]           /yuu/ [+1]
                               ‘put on, wear’       ‘drink’
        completive             yúú [+2]             yūū [+1]
        incompletive   [–1]    yūū [+1]             yùù [Ø]
In (21a) the lexical contrast is assumed to be [±UPPER], while the (in)completive aspect assigns [±RAISED]. This produces a situation where both [+UPPER, –RAISED] and [–UPPER, +RAISED] define phonetically identical M tones. The question is how one might account for the above facts without features. (21b) posits a lexical contrast between /H/ and /M/. The completive aspect is unmarked, while the incompletive aspect has a /L/ prefix which combines
with the lexical tone of the verb. The resulting LH and LM contours would then have to simplify to M and L, respectively. Since contours are rare in the language (Nougayrol 1979: 68), this is not problematic. A corresponding scalar solution is sketched in (21c), where it is assumed that the /H/ and /M/ verbs have values of [+2] and [+1], respectively. As seen, completive aspect is unmarked, while incompletive aspect contributes a value of [–1]. When the integers combine, there are again two sources of [+1] M tone and one source each of [+2] H and [Ø] L tone.

While all three analyses capture the limited data in (21), the question is how they fare when the verb is bi- or trisyllabic. The regular tone patterns are schematized in (22).

(22)                σ                σ−σ                         σ−σ−σ
     completive     M    H    HL     H-M    H-L    M-M    M-L    H-H-L
     incompletive   L    M    ML     M-M    M-L    L-M    L-ML   M-M-L
As seen, bi-syllabic verbs must end M or L. (The final contour of L-ML will be discussed shortly.) The one regular trisyllabic pattern shows that it is only the last syllable that is affected, with inflectional [±RAISED] targeting the H-H ~ M-M on the first two syllables. Let us, therefore, add to the analysis in (21a) that the final syllable is [–UPPER] and contrastively prespecified for [±RAISED]. This produces the feature specifications in (23).

(23)                         σ−σ        σ−σ        σ−σ        σ−σ
     underlying     UPPER    + –        + –        – –        – –
                    RAISED     +          –          +          –
     completive              H-M        H-L        M-M        M-L
                    UPPER    + –        + –        – –        – –
                    RAISED   + +        + –        + +        + –
     incompletive            M-M        M-L        L-M        *L-L (correct: L-ML)
                    UPPER    + –        + –        – –        – –
                    RAISED   – +        – –        – +        – –

As seen, all of the tones come out correctly except for the bottom right hand form, where completive M-L is predicted to alternate with L-L rather than the correct L-ML. The /H, M, L/ analysis in (21b) is better equipped to get the right output. Recall that in this analysis verb roots are /H/ vs. /M/.
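The featural derivation in (23) can be replayed step by step. The following is a minimal sketch under the assumptions just stated: [UPPER] is lexical on every syllable, the final syllable carries its own lexical [RAISED], and the aspect assigns [+RAISED] (completive) or [–RAISED] (incompletive) to all non-final syllables. The function and variable names are illustrative, and the attested forms are copied from (22).

```python
# (UPPER, RAISED) -> surface tone; both mixed specifications are phonetic M.
TONE = {("+", "+"): "H", ("+", "-"): "M", ("-", "+"): "M", ("-", "-"): "L"}

def realize(upper, final_raised, aspect):
    """Derive the surface tones of a polysyllabic Day verb under the analysis in (23).

    upper        -- lexical UPPER value for each syllable
    final_raised -- lexical RAISED value of the final syllable
    aspect       -- "completive" or "incompletive"
    """
    inflection = "+" if aspect == "completive" else "-"
    raised = [inflection] * (len(upper) - 1) + [final_raised]
    return "-".join(TONE[(u, r)] for u, r in zip(upper, raised))

# Lexical specifications paired with the attested (completive, incompletive) forms of (22).
VERBS = [
    (["+", "-"], "+", ("H-M", "M-M")),
    (["+", "-"], "-", ("H-L", "M-L")),
    (["-", "-"], "+", ("M-M", "L-M")),
    (["-", "-"], "-", ("M-L", "L-ML")),
    (["+", "+", "-"], "-", ("H-H-L", "M-M-L")),
]

# Collect every form where the featural prediction differs from the attested form.
mismatches = []
for upper, final_raised, attested in VERBS:
    predicted = (
        realize(upper, final_raised, "completive"),
        realize(upper, final_raised, "incompletive"),
    )
    for got, want in zip(predicted, attested):
        if got != want:
            mismatches.append((got, want))

print(mismatches)
```

On these assumptions the only mismatch is the incompletive of the M-L class, where the sketch yields L-L against attested L-ML, which is the gap discussed in the text.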
When the incompletive L is prefixed to M-M and M-L inputs, we obtain the intermediate representations LM-M and LM-L. The LM-M becomes L-M by delinking the M from the first syllable. Assuming that the same happens in the second case, all that needs to be said is that the delinked M reassociates to the second syllable to produce the ML contour. Since there is no input M in either the featural or scalar analyses, one might attempt to provide one by fully specifying verb roots, with a [+UPPER, +RAISED] /H/ verb becoming [+UPPER, –RAISED] M in the incompletive. (There would no longer be any need for a completive [+RAISED] prefix.) However, this still does not solve the problem. Since the M-L verb would have a [–UPPER, +RAISED] specification on its first syllable, the [–RAISED] incompletive prefix would only change the value of [RAISED], not delink it. We therefore would have to propose that the incompletive prefix is fully specified as [–UPPER, +RAISED]. What this does is make the analysis exactly identical to the /H, M, L/ analysis in (21b), where there was no need to refer to features at all. The same is true of the scalar analysis, where the [–1] incompletive prefix would have to contour with the [+1] M, as if it were a real tone, not a pitch-change feature. We conclude that there is no advantage of a featural analysis of tone in Day – or in Gokana [Nigeria] which has a similar system (Hyman 1985).

3.2. Type 2: H/L vs. M verb tones

There is a second type of system where nouns have a three-way lexical contrast between H, M and L and verbs a two-way contrast. While in the type 1 languages the two-way contrast is identifiable as a relatively higher vs. lower verb tone, in type 2 one verb class alternates between H and L, while the other is a non-alternating M. This type was first documented in Bamileke-Fe’fe’ (Hyman 1976); consider the H~L alternations on the first (= root) syllable of verbs in Leggbó (Hyman et al. 2002), where the second tone is suffixal:

(24)                  MCA/ORA          SRA              NEG
     Root tone:       /L/    /M/       /L/    /M/       /L/    /M/
     Perf./Prog.      H-M    M-M       L-M    M-M       H-M    M-M
     Habitual         L-L    M-L       L-L    M-L       H-M    M-M
     Irrealis         L-L    M-L       L-L    M-L       L-L    M-L

(MCA: main clause affirmative; SRA, ORA: subj./obj. relative affirmative; NEG: negation)
Unless we adopt an ad hoc feature such as MID or EXTREME, there is no synchronic reason why H and L should alternate to the exclusion of M. Paster’s (2003) solution is to propose that L is the underspecified tone in Leggbó such that H or L prefixes can be assigned to it. A M root would resist these prefixal tones since it is specified. The solution has some appeal as Leggbó has only a few LH and HL tonal contours, hence little need to prespecify L tone. However, it cannot work for Bamileke-Fe’fe’, which has numerous LM contours and floating L tones. While Hyman (1976) provided an abstract analysis involving floating H tones on both sides of the L, the alternative is to simply accept the arbitrariness of the H/L alternations, which represent morphological processes of “replacive” tone. In this respect they no more need to have a featural account than the replacive tone sandhi of Southern Min dialects, e.g. Xiamen 24, 44 → 22 → 21 → 53 → 44 (Chen 1987). Type 2 systems thus provide even less evidence for tone features than type 1.

4. Tonal phonology and M tone

While the previous section sought evidence for features from the behavior of tonal morphemes which are assigned to verb forms, in this section we shall seek purely phonological evidence for features in three-height tone systems. Since the systems in (14b-d) provide four distinct feature configurations, they also make the prediction that a three-height system could have two phonologically contrasting tones which are phonetically identical, as summarized in (25):

(25) a. /4/ and /3/ could be two kinds of phonetic H tone
     b. /3/ and /2/ could be two kinds of phonetic M tone
     c. /2/ and /1/ could be two kinds of phonetic L tone

In the following subsections we shall consider Villa Alta Yatzachi Zapotec, which represents (25c), and Kagwe (Dida), which represents (25b). The question will be whether tone features can be helpful in accounting for such behaviors.

4.1. Two kinds of L tone in Villa Alta Yatzachi Zapotec

According to Pike ([1948] 1975), Villa Alta Yatzachi Zapotec [Mexico] has three surface tones, H, M, and L, as well as HM and MH contours on monosyllabic words. However, there are two kinds of L tones: those which
Do tones have features? 65
remain L in context vs. those which alternate with M. Pike refers to these as class A vs. class B, respectively. In (26), these are identi¿ed as La and Lb: (26) a. Lb → b. La : Lb :
M /__ {M, H} bìa ‘cactus’ bìa ‘animal’
bìa gǀlƯ ‘old cactus’ bƯa gǀlƯ ‘old animal’
Rule (26a) says that class B L tones are raised to M before a M or H tone. As seen in (26b), there are actual minimal pairs, i.e. words which are phonetically identical in isolation but which have different behaviors in the raising context. Assuming that we do not want to identify the two L tones by means of a diacritic, as Pike does, there are two possible featural strategies we might attempt. The first in (27a) is to fully specify Lb as [–UPPER, +RAISED], a lower-mid (M) tone, featurally distinct from both M and L:

(27) a. Lb is fully specified as /M/

              UPPER   RAISED
        H       +       +
        M       +       –
        Lb      –       +
        L       –       –

     b. Lb is underspecified for [RAISED]

              UPPER   RAISED
        H       +       +
        M       –       +
        Lb      –
        L       –       –

The second strategy in (27b) is to underspecify Lb for exactly the feature that alternates, namely [RAISED]. This makes Lb featurally non-distinct from both /M/ and /L/. The rules needed under each analysis are formulated in (28).

(28) a. if Lb is fully specified as [–UPPER, +RAISED]
        [–UPPER, +RAISED] → [–RAISED] / __ [–UPPER, –RAISED]
     b. if Lb is underspecified for [RAISED]
        [o RAISED] → [α RAISED] / __ [α RAISED]

In (28a) the lower-mid tone becomes L when followed by L. Since the lowering has to occur also before pause, we would have to assume a prepausal L% boundary tone. In (28b), the underspecified [RAISED] feature acquires the same value as what follows it, thereby becoming [+RAISED] before H and M, but [–RAISED] before L(%). Except for the use of the alpha notation to represent feature spreading, both analyses seem reasonable up to this point.

Now consider a second process where the H of the second part of a compound is lowered to M after both La and Lb:
(29) a. /dè-/ (La) ‘denominalizer’ + /zíz¸/ ‘sweet’ → dèzīz¸ ‘a sweet’
     b. /nìs/ (Lb) ‘water’ + /yíݦ/ ‘fire’ → nīsyīҌ ‘kerosene’

Assuming this is assimilation rather than reduction (perhaps questionable), the rules would be as follows:

(30) a. if Lb is fully specified as fourth tone [–UPPER, +RAISED]
        [+UPPER] → [–RAISED] / [–UPPER] # __
     b. if Lb is underspecified for [RAISED]
        [+UPPER] → [–UPPER] / { [–UPPER, {–RAISED, o RAISED} ] } # __

Each of the above rules has a problem. In (30a), the change of feature value is not explicitly formalized as an assimilation, e.g. by spreading of a feature. Instead, [+UPPER] changes to [–RAISED] after [–UPPER]. The rule in (30b) can be expressed as the spreading of a preceding [–UPPER], but requires the awkward disjunction in the environment so that M tone, which is [–UPPER, +RAISED], does not condition the rule. Note that one cannot first fill in [o RAISED] as [–RAISED], since, as seen in (29b), [o RAISED] becomes [+RAISED] by the rule in (28b). It is thus not obvious that features are helpful in distinguishing the two kinds of L tone in this language.

4.2. Two kinds of M tone in Kagwe (Dida)

The problem is even more acute in Kagwe (Dida) [Ivory Coast], which has two types of M tone (Koopman and Sportiche 1982): /M/ (class A) alternates between M and H, while /M/ (class B) remains M. The rule in question is formulated in (31a).

(31) a. Ma → H / Ma __        otherwise Ma → M (= Mb)
     b. Ma :  lē ‘spear’          mànā lé ‘this spear’
              jō ‘child’          mànā jó ‘this child’
     c. Mb :  kpࡄݞ ‘bench’        mànā kpࡄݞ ‘this bank’
              lэթ ‘elephants’     mànā lэթ ‘these elephants’

As indicated, Ma becomes H after another Ma. Alternations are seen after the L-Ma word mànā ‘this/these’ in (31b). Mb tones do not change after mànā in (31c).
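The alternation in (31) can be made concrete with a short sketch. The Python fragment below is only an illustration and not part of Koopman and Sportiche's analysis: the strings "Ma" and "Mb" are used as bare class labels, the function name `kagwe_surface` is hypothetical, and the rule is assumed to inspect the underlying class of the immediately preceding tone. The data in (31) do not decide between this and an iterative formulation.

```python
# Illustrative sketch of rule (31a): Ma -> H / Ma __, otherwise Ma -> M.
# Assumption: the context is the *underlying* class of the preceding tone.

def kagwe_surface(tones):
    """Map a list of underlying tones ("H", "Ma", "Mb", "L") to surface tones."""
    surface = []
    for i, tone in enumerate(tones):
        if tone == "Ma":
            # Raise to H only when the preceding underlying tone is also Ma.
            after_ma = i > 0 and tones[i - 1] == "Ma"
            surface.append("H" if after_ma else "M")
        elif tone == "Mb":
            # Class B mid tones never alternate.
            surface.append("M")
        else:
            surface.append(tone)
    return surface


# 'this spear': L-Ma demonstrative followed by an Ma noun, as in (31b)
print(kagwe_surface(["L", "Ma", "Ma"]))  # ['L', 'M', 'H']
# 'these elephants': L-Ma demonstrative followed by an Mb noun, as in (31c)
print(kagwe_surface(["L", "Ma", "Mb"]))  # ['L', 'M', 'M']
```

Stated this way, the rule needs nothing beyond the two class labels, which is the point at issue in what follows: the question is whether any featural decomposition of Ma improves on this purely diacritic statement.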
As in the case of Zapotec Lb, two possible underlying representations of Ma are considered in (32).

(32) a. Ma is fully specified as /M/

              UPPER   RAISED
        H       +       +
        Ma      +       –
        Mb      –       +
        L       –       –

     b. Ma is underspecified for [UPPER]

              UPPER   RAISED
        H       +       +
        Ma              +
        Mb      –       +
        L       –       –

In (32a), Ma is fully specified as M vs. phonetically identical Mb, which has the features of a lower-mid. In (32b), Ma is underspecified for the feature which alternates, namely [UPPER], hence is non-distinct from both /H/ and /Mb/. The rules needed under each of these analyses are formulated in (33).

(33) a. Ma is fully specified as [+UPPER, –RAISED]
        [+UPPER, –RAISED] → [+RAISED] / [+UPPER, –RAISED] __
     b. Ma is underspecified for [UPPER]
        [o UPPER] → [+UPPER] / [o UPPER] __
        [o UPPER] → [–UPPER]

In (33a), the raising rule appears to be dissimilatory, perhaps an OCP effect? The question here is why the language would not permit a succession of abstract [+UPPER, –RAISED] tones, at the same time allowing phonetically identical [M-M] sequences from three other sources: /Ma-Mb/, /Mb-Ma/, /Mb-Mb/. The rule would make sense only if Kagwe has an output condition *[+UPPER, –RAISED], with all remaining such tones converting to [–UPPER, +RAISED]. However, this would be a very abstract analysis indeed. The rule in (33b) is even more suspect: Why should [o UPPER] become [+UPPER] only if preceded by another [o UPPER]?

While Koopman and Sportiche (1982) do point out that other Dida dialects have four contrasting tone heights as suggested by the matrix in (32a), there are other possible analyses of Ma. One is to treat Ma as /M/ and Mb as /Ø/. The dissimilation rule would thus become M → H / M __. Even better is to represent Ma either as a MH contour tone, as in (34a), or as a M tone followed by a floating H, as in (34b).

(34) a. Ma as a contour [autosegmental diagram: σ associated with M H]
     b. Ma as M + floating H [autosegmental diagram: σ associated with M, followed by floating H]
     c. Ma → H as plateauing [autosegmental diagram: σ σ with tones M H M H and a delinking mark (=)]

If Ma is analyzed as M followed by floating H, as in (34b), the “raising” rule can be formulated as a common case of H tone plateauing, as in (34c). In fact, one might even attempt such an interpretation of Villa Alta Yatzachi Zapotec Lb, which could be a L followed (preceded?) by a floating M. What this means is that featural analyses may in some cases be rendered unnecessary by the availability of contour representations and floating tones. Both of the representations in (34a,b) at least give a principled reason why Ma becomes H after another Ma.

4.3. Lowered or downstepped M tone?
In the preceding two subsections we have considered two three-height tone systems which have two classes of phonetically identical tones: La vs. Lb in Villa Alta Yatzachi Zapotec and Ma vs. Mb in Kagwe. While these Lb and Ma alternate with M and H, respectively, the output system still remains one of three tone heights. A slightly different situation is found in Jibu [Nigeria] (Van Dyken 1974: 89), whose “class 1” vs. “class 2” M tone properties are summarized and exemplified in (35).

(35) a. “a class 2 mid tone is lowered when it follows a class 1 mid tone.”
        tī ↓wa-֓ n žà ‘he is buying cloth’ (tī = M1, wa-֓ n = M2)
     b. “both a class 1 mid tone and a class 2 mid tone are lowered when they follow a lowered mid tone.”
        kū sā ↓bāi bū ‘he made bad thing’ (kū, sā, bū = M1, bāi = M2)

As indicated, Jibu appears to have a surface four-height system with the need to distinguish between two types of “M” tone. Since it is M2 which undergoes lowering, it seems appropriate to analyze it as involving a L+M sequence in one of the ways in (36).

(36) a. M2 as a contour [autosegmental diagram: σ associated with L M]
     b. M2 as floating L + linked M [autosegmental diagram: floating L, σ associated with M]

What is crucial in this process is that M2 establishes a new (lower) M level to which all subsequent M tones assimilate. Thus, in (35b), M1 bū is realized on the same level as the preceding tone of ↓bāi and not higher. The prediction of the [UPPER] and [RAISED] tone features is that the inner two tones of a four-height system should not be systematically related, since they bear opposite values of both features. In fact in every case I know where M assimilates to M after another M, the latter can be interpreted as a non-iteratively downstepped ↓M, as in Jibu, Gwari, Gokana, Ngamambo etc. (Hyman 1979a, 1986a), and possibly Bariba, the example which Clements, Michaud and Patin (this volume) cite. This observation raises the question of whether iterative ↓H, ↓M, and ↓L downsteps should be captured by a feature system vs. an independent register node or tier (cf. the same question concerning the relation between vowel height and ATR (Clements 1991)).

In summary, while M tones should provide unambiguous evidence for features, instead questions arise due to their phonological properties (recall (11)). For every case where tone features appear to be useful, or at least usable, there is another case where they either don’t provide any insight or run into difficulties. Why this may be so is the issue with which I conclude in §5.

5. Conclusion

From the preceding sections we conclude that the case for tonal features is not particularly strong. This is revealed both from the specific examples that have been examined as well as the widespread practice of referring to tones in terms of H, M, L or integers. Let us now revise and reorder the questions that were raised in (1) and ask:

(37) a. Why is tone different?
     b. Why is the case for tone features so weak?
     c. Why isn’t tone universal?

It turns out that the answer to all three questions is the same: Tone is different because of its greater diversity and autonomy compared to segmental phonology. Because of its diversity tone is hard to reduce to a single set of features that will do all tricks. Because of its autonomy, feature systems that have been proposed, even those which relate tones to laryngeal gestures, are not reliable except perhaps at the phonetic level. Given that tone is so diverse and so poorly “gridded in” with the rest of phonology, it is not a good candidate for universality. Let us consider the two notions of diversity and autonomy a bit further.

In the preceding sections we have caught only a glimpse of the extraordinary diversity of tone systems. Languages may treat tone as privative, /H, Ø/, equipollent, /H, L/, or both, /H, L, Ø/. Given that F0, the primary phonetic correlate of tone, is scalar, the question is whether some systems treat tone as “gradual”:

“Gradual oppositions are oppositions in which the members are characterized by various degrees or graduations of the same property. For example: the opposition between two different degrees of aperture in vowels... or between various degrees of tonality.... Gradual oppositions are relatively rare and not as important as privative oppositions.” (Trubetzkoy [1939] 1969: 75)
Because of the phonetically gradient nature of tone, the use of integers to represent tone heights has some appeal. Speakers are capable of distinguishing up to five tone heights and all of the pitch changes between them, whether as contours within a single syllable or as steps up and down between syllables. Preserving the pitch changes between syllables sometimes has interesting effects in tonal alternations. As seen in (38a), in the Leggbó ‘N1 of N2’ construction, if the second noun has a L prefix, it will be raised to M (the genitive marker /ā/ is optionally deleted):

(38) a. L-L → M-L
        gè-bòò ‘squirrel’        lídzīl ā gē-bòò ‘food of squirrel’
        lì-gwàl ‘leaf’           īzù ā līgwàl ‘odor of leaf’
     b. L-M → M-H
        lì-zōl ‘bird’            gѓ̖mmà ā līzól ‘beak’ (mouth of bird)
        gè-dī ‘palm’             ànààn ā gēdí ‘palm oil’
     c. L-M-M → L-H-M
        gѓ̖-kэmэ ‘disease’        ìzѓ̖īā gѓթkэmэ ‘cause of disease’
        ѓ̖-kāālā ‘European’       ѓ̖ttэ ā ѓթkáālā ‘house of European’
     d. M-M → M-M
        ѓթ-ppyā ‘market’         lèdzìl ā ѓթppyā ‘day of market’
        H-M → H-M
        lí-dzīl ‘food’           ѓթvvѓ̗n ā lídzīl ‘place of food’

As seen in (38b), if N2 has a L prefix and a M stem-initial syllable, the L-M sequence will become M-H. It is as if the raising process were one of upstep, ↑ /L-M/, designed to preserve the step up between the L and M syllables of the input. The fact that only the first syllable of a M-M stem is affected in (38c) is neatly accounted for by Steriade (2009): Although the output preserves the pitch change of /L-M/, there is no requirement to preserve the lack of a pitch change of a /M-M/ input. The examples in (38d) show that there is no change if the N2 has a M or H prefix.

To appreciate further how some tone systems care about such “syntagmatic faithfulness”, consider the realization of /L-HL-H/ in the following Grassfields Bantu languages [Cameroon]:

(39)    Language    Output    Process                 Reference
     a. Mankon      L-H-↑H    H-upstep                Leroy (1979)
     b. Babanki     L-M-H     HL-fusion               Hyman (1979b)
     c. Babadjou    L-H-↓H    H-downstep              (personal notes)
     d. Dschang     L-↓H-H    HL-fusion + downstep    Hyman and Tadadjeu (1976)
     e. Kom         L-M-M     H-lowering              Hyman (2005)
     f. Aghem       L-H-H     L-deletion              Hyman (1986b)

While all six languages simplify the HL input, thereby minimizing the number of ups and downs (Hyman 1979a: 24) and all but the last preserve a trace of both the H and the L, they make different choices as to what to preserve in terms of the syntagmatic relations. The upstep in Mankon is similar to what was seen in Leggbó: when the L of HL-H delinks, the rise to the next tone is preserved by means of upstepping the following H. Similarly, the step up is preserved in Babanki, this time by fusing the HL to a M tone. While the H-↓H in Babadjou realizes the drop that should have occurred between the two Hs, there is no pitch change between the second and third syllables in Bamileke-Dschang and Kom, which unambiguously encode the lost L, or in Aghem, which shows no trace of the L at all.

Having established some of the extraordinary diversity of tone systems, let us now address the issue of autonomy. Tone, of course, was the original autosegmental property (Goldsmith 1976), and there is no problem demonstrating the advantages of representing tone on a tier separate both from its TBU and from the segmental features. Although tones require segments in order to be pronounced, I would argue that tones are not reliably integrated into a system of articulatory or acoustic features the way consonants and vowels are. For example, [+high, –low] not only defines a class of high vowels, /i, ü, ݁, u/, with F1 and F2 defining a two-dimensional “gridded” vowel space, but also a systematic intersection with palatal and velar consonants (Chomsky and Halle 1968). [+UPPER, –RAISED], on the other hand, only defines a H tone, not a class of tones. We might therefore switch to [+STIFF, –SLACK] (Halle and Stevens 1971) to relate H tone to voiceless obstruents and implosives and L tone to voiced and breathy voiced obstruents. While intersections of tones with laryngeal features or phonation types (aspiration, breathiness, glottalization, voicing) appear to provide evidence that tone features are “gridded in”, note first that [±STIFF, ±SLACK] define only three possibilities, whereas there can be up to five contrasting tone heights.

More importantly, tone-laryngeal interactions are notoriously unreliable. As has been long known from diachronic studies in Southeast Asia and Athabaskan, the same laryngeal source can correspond diachronically to either H or L (see the various papers in Hargus and Rice 2005). Within Southern Bantu, so-called depressor consonants are not necessarily voiced (Schachter 1976, Traill 1990, Downing 2009). Even implosives, long held to be “pitch raisers”, show inconsistent tonal correspondences (Tang 2008). A particularly striking anti-phonetic case comes from Skou [Indonesia: Papua], where “there are no words with a L tone melody in which any syllable has a voiced stop onset” (Donohue 2004: 87). This is reminiscent of Newman’s (1974: 14) description of Kanakuru verbs, which are H-L after voiced obstruents, L-H after voiceless obstruents and implosives, and contrastively H-L vs. L-H when sonorant-initial.

While [UPPER] and [RAISED] and the comparable systems in (14c,d) were designed to mirror diachronic, laryngeally-induced tonal bifurcations in Chinese and elsewhere, the synchronic reflexes may involve a level vs. contour contrast, rather than producing a four-height tonal system:

(40) a. Thakali (Hari 1971: 26)

                   level   falling
        tense        H       HL
        lax          L       LHL

     b. Grebo (based on Newman 1986: 178)

                   level   rising
        [+RAISED]    M       MH
        [–RAISED]    L       LM

Starting with a /H/ vs. /HL/ contrast in Thakali [Nepal], a lower (“lax”) register adds an initial L feature which converts /H/ to L, but combines with the HL to produce a LHL contour tone. However, as Mazaudon and Michaud (2008: 253–4) point out for closely related Tamang, the 2 x 2 pairings are not always obvious. The same can be said about Chinese, where Bao’s (1999) tone sandhi analyses in terms of two sets of features {H, L} and {h, l}, as well as {hl} and {lh} contours, lend themselves to alternative interpretation and “do not come without complication” (Hyman 2003: 281).
In fact, it is not clear that diachronic developments inevitably lead to the positing of tone features. Mazaudon (1988: 1) argues that tones do not change by shared features, rather Jeder Ton hat seine eigene Geschichte [each tone has its own history]. As in the present paper, she finds little value in analyzing tones in terms of features:

“It seems to me that tones are simply different from segments and should be treated differently in the phonology.... My best present proposal would be that tones do not break up into features until the phonetic level, and that consequently these ‘features’ (which I propose to call ‘parameters’ to distinguish them clearly from distinctive features) are inaccessible to the phonology.” (Mazaudon 1988: 7)

Nowhere is this clearer than in those systems where one tone is arbitrarily replaced by another. As mentioned above, in non-phrase-final position in Xiamen every tone is replaced by an alternate tone, as follows: 24, 44 → 22 → 21 → 53 → 44 (Chen 1987). Despite attempts, any featural analysis of such scales is hopeless. Mortensen (2006) cites a number of other tone chains which are quite abstract and diverge significantly from following a phonetic scale such as L → M → H. I would argue that tone is capable of greater abstractness than segmental phonology – or, at least, that comparable abstract analyses are better supported in tone than elsewhere. This has to do with the greater extractability of pitch and tonal patterns than segmental distributions: Thus, the Xiamen “tone circle” is clearly productive, while the synchronic status of the Great English Vowel Shift is more controversial. The greater autonomy and extractability of tone are also responsible for its more extensive activity at the phrase level, as seen in morphophonemic alternations in Chinese, Otomanguean, African and other tone systems.

In short, tone is the most isolable gesture-based phonological property. This property is undoubtedly related to the fact that pitch also provides the best, if not universal expression of intonation, marking whole clauses and utterances. However, lexical, post-lexical, and intonational tones cannot be pronounced by themselves, unlike vowels and most consonants whose features may produce pronounceable segments of themselves. In fact, Harris and Lindsey (1995) and Harris (2004) have developed a “minimalist” approach to segmental features where no representation is unpronounceable. It is hard to see how this could be extended to tone, since a pitch feature cannot be pronounced by itself. While one might think that this would force tone to become inextricably tied to segments, just the reverse is true: Tone is highly independent (autosegmental) and free to enter into abstract relationships including many which defy a featural interpretation.

Of course tone is not alone in having these properties. Length and metrical stress, two other non-featural prosodic properties, also show high autonomy. However, neither vowel nor consonant length has the complexity of tone, as contrasts are normally limited to two values, long vs. short. While stress is both complex and abstract like tone, it is typically (definitionally?) word bound. We thus return to the initial observation: Tone can do everything that non-tonal phonology can do, but not vice-versa. While some languages require every word to have a H tone, like word-stress, no language requires every word to have a stop or a high vowel. Thus, if tones consist of features, they are the only features that can be obligatorily required of a word.

To conclude, there seems to be little advantage to treating tones other than the way that most tonologists treat them: as privative elements that are related to each other through their relative and scalar phonetic properties (cf. Mazaudon above). It thus may make most sense to adopt the integer system even for two-height systems: /H, L/ = /2, 1/, /H, M, L/ = /3, 2, 1/, and so forth.
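The Xiamen tone circle cited above gives a sense of what a purely integer-based, feature-free statement looks like in practice. The Python sketch below is an illustration only: it encodes just the five tone values given in the chain 24, 44 → 22 → 21 → 53 → 44, the names `XIAMEN_SANDHI`, `apply_sandhi` and `chain_from` are hypothetical, and the input is assumed to be a single sandhi domain whose last syllable is phrase-final and therefore keeps its citation tone.

```python
# Illustrative sketch: the Xiamen replacive chain as an arbitrary lookup table.
# Only the tone values quoted in the text are included; no features are used.
XIAMEN_SANDHI = {24: 22, 44: 22, 22: 21, 21: 53, 53: 44}


def apply_sandhi(phrase):
    """Replace every non-final tone by its sandhi alternate.

    Assumption: `phrase` is one sandhi domain, so only the last
    syllable retains its citation tone.
    """
    if not phrase:
        return []
    return [XIAMEN_SANDHI[tone] for tone in phrase[:-1]] + [phrase[-1]]


def chain_from(tone, steps):
    """Follow the replacement chain for a given number of steps."""
    path = [tone]
    for _ in range(steps):
        path.append(XIAMEN_SANDHI[path[-1]])
    return path


print(apply_sandhi([24, 53, 21]))  # [22, 44, 21]
print(chain_from(44, 4))           # [44, 22, 21, 53, 44]
```

The table is simply a list of pairs: nothing in it refers to height, register or contour, and tone 24 feeds the circle without ever being one of its outputs. That is in keeping with the conclusion drawn here that such alternations are stated over whole tones rather than over their putative features.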
References

Anderson, Stephen R.
  1978 Tone features. In: Victoria Fromkin (ed.), Tone: A Linguistic Survey, 133–175. New York: Academic Press.
Bao, Zhiming
  1999 The Structure of Tone. New York and Oxford: Oxford University Press.
Bateman, Janet
  1990 Iau segmental and tonal phonology. Miscellaneous Studies of Indonesian and other Languages in Indonesia (1): 29–42.
Chen, Matthew
  1987 The syntax of Xiamen tone sandhi. Phonology Yearbook 4: 109–149.
  2000 Tone Sandhi. Cambridge: Cambridge University Press.
Chomsky, Noam and Morris Halle
  1968 The Sound Pattern of English. New York: Harper and Row.
Clark, Mary M.
  1978 A dynamic treatment of tone with special attention to the tonal system of Igbo. Bloomington: IULC.
Clements, G. N.
  1978 Tone and syntax in Ewe. In: Donna Jo Napoli (ed.), Elements of Tone, Stress and Intonation, 21–99. Washington, D.C.: Georgetown University Press.
  1983 The hierarchical representation of tone features. In: Ivan R. Dihoff (ed.), Current Approaches to African Linguistics (vol. 1), 145–176. Dordrecht: Foris.
  1991 Vowel height assimilation in Bantu languages. Proceedings of the Seventeenth Annual Meeting of the Berkeley Linguistics Society, Special Session on African Language Structures, 25–64.
  2005 The role of features in phonological inventories. Presented at Journée “Les géometries de traits/Feature geometries”, Université de Paris 8 and Fédération Typologie et Universaux en Linguistique (TUL), Paris, 3 December 2005 (powerpoint in English).
Clements, G. N., Alexis Michaud and Cédric Patin
  2011 Do we need tone features? Paper presented at the Symposium on Tones and Features, University of Chicago Paris Center, June 18–19, 2009.
Corbett, Greville G. and Matthew Baerman
  2006 Prolegomena to a typology of morphological features. Morphology 16.231–246.
Demuth, Katherine
  2003 The acquisition of Bantu languages. In: Derek Nurse and Gérard Philippson (eds.), The Bantu Languages, 209–222. London: Routledge.
Dicanio, Christian
  2008 The phonetics and phonology of San Martin Itunyoso Trique. Ph.D. diss., University of California, Berkeley.
Donohue, Mark
  2004 A grammar of the Skou Language of New Guinea. Ms., National University of Singapore. http://rspas.anu.edu.au/~donohue/Skou/index.html
Downing, Laura J.
  2009 On pitch lowering not linked to voicing: Nguni and Shona group depressors. In: Michael Kenstowicz (ed.), Data and Theory: Papers in Phonology in Celebration of Charles W. Kisseberth. Language Sciences 31.179–198.
Edmondson, Jerold A. and Kenneth J. Gregerson
  1992 On five-level tone systems. In: Shin Ja J. Hwang and William R. Merrifield (eds.), Language in Context: Essays for Robert E. Longacre, 555–576. SIL and University of Texas at Arlington.
Girón Higuita, J.M. and W. Leo Wetzels
  2007 Tone in Wãnsöhöt (Puinave), Colombia. In: W. Leo Wetzels (ed.), Language Endangerment and Endangered Languages: Linguistic and Anthropological Studies with Special Emphasis on the Languages and Cultures of the Andean-Amazonian Border Area, 129–156. Leiden: CNWS.
Goldsmith, John
  1976 Autosegmental phonology. Ph.D. diss., Department of Linguistics, Massachusetts Institute of Technology.
Halle, Morris and Kenneth Stevens
  1971 A note on laryngeal features. Quarterly Progress Report (101): 198–213. Cambridge, MA: MIT Research Laboratory of Electronics.
Hargus, Sharon and Keren Rice (eds.)
  2005 Athabaskan Prosody. Amsterdam: John Benjamins.
Hari, Maria
  1971 A guide to Thakali tone. Part II to Guide to Tone in Nepal. Tribhuvan University, Kathmandu: SIL.
Harris, John
  2004 Release the captive coda: the foot as a domain of phonetic interpretation. In: J. Local, R. Ogden and R. Temple (eds.), Phonetic Interpretation: Papers in Laboratory Phonology 6, 103–129. Cambridge: Cambridge University Press.
Harris, John and Geoffrey Lindsey
  1995 The elements of phonological representation. In: Jacques Durand and Francis Katamba (eds.), Frontiers of Phonology: Atoms, Structures, Derivations, 34–79. Harlow, Essex: Longman.
Heine, Bernd
  1993 Ik Dictionary. Köln: Rüdiger Köppe Verlag.
Hyman, Larry M.
  1976 D’où vient le ton haut du bamileke-fe’fe’? In: Larry M. Hyman, Leon C. Jacobson and Russell G. Schuh (eds.), Papers in African Linguistics in Honor of Wm. E. Welmers, 123–134. Studies in African Linguistics, Supplement 6. Los Angeles: University of California, Los Angeles.
  1979a A reanalysis of tonal downstep. Journal of African Languages and Linguistics (1): 9–29.
  1979b Tonology of the Babanki noun. Studies in African Linguistics (10): 159–178.
  1985 A Theory of Phonological Weight. Dordrecht: Foris Publications.
  1986a The representation of multiple tone heights. In: Koen Bogers, Harry van der Hulst, and Maarten Mous (eds.), The Phonological Representation of Suprasegmentals, 109–152. Dordrecht: Foris.
  1986b Downstep deletion in Aghem. In: David Odden (ed.), Current Approaches to African Linguistics, vol. 4, 209–222. Dordrecht: Foris Publications.
  2003 Review of Bao, Zhiming. 1999. The Structure of Tone. New York and Oxford: Oxford University Press. Linguistic Typology (7): 279–285.
  2005 Initial vowel and prefix tone in Kom: Related to the Bantu Augment? In: Koen Bostoen and Jacky Maniacky (eds.), Studies in African Comparative Linguistics with Special Focus on Bantu and Mande: Essays in Honour of Y. Bastin and C. Grégoire, 313–341. Köln: Rüdiger Köppe Verlag.
  In press Tone: is it different? In: John Goldsmith, Jason Riggle and Alan Yu (eds.), The Handbook of Phonological Theory, 2nd Edition. Blackwell.
Hyman, Larry M., Heiko Narrog, Mary Paster, and Imelda Udoh
  2002 Leggbó verb inflection: A semantic and phonological particle analysis. Proceedings of the 28th Annual Berkeley Linguistic Society Meeting, 399–410.
Hyman, Larry M. and Maurice Tadadjeu
  1976 Floating tones in Mbam-Nkam. In: Larry M. Hyman (ed.), Studies in Bantu Tonology, 57–111. Southern California Occasional Papers in Linguistics 3. Los Angeles: University of Southern California.
Kisseberth, Charles W.
  2009 The theory of prosodic phrasing: the Chimwiini evidence. Paper presented at the 40th Annual Conference on African Linguistics, University of Illinois, Urbana-Champaign, April 9–11, 2009.
Koopman, Hilda and Dominique Sportiche
  1982 Le ton abstrait du Kagwe. In: Jonathan Kaye, Hilda Koopman and Dominique Sportiche (eds.), Projet sur les Langues Kru, 46–59. Montreal: UQAM.
Leroy, Jacqueline
  1979 A la recherche de tons perdus. Journal of African Languages and Linguistics (1): 55–71.
Li, Charles N. and Sandra A. Thompson
  1977 The acquisition of tone in Mandarin-Speaking Children. Journal of Child Language (4): 185–199.
Maddieson, Ian
  1971 The inventory of features. In: Ian Maddieson (ed.), Tone in Generative Phonology, 3–18. Research Notes 3. Ibadan: Department of Linguistics and Nigerian Languages, University of Ibadan.
Mazaudon, Martine
  1988 An historical argument against tone features. Paper presented at the Annual Meeting of the Linguistic Society of America, New Orleans.
  2003 Tamang. In: Graham Thurgood and Randy J. LaPolla (eds.), The Sino-Tibetan Languages, 291–314. London and New York: Routledge.
Mazaudon, Martine and Alexis Michaud
  2008 Tonal contrasts and initial consonants: A case study of Tamang, a “missing link” in tonogenesis. Phonetica (65): 231–256.
Mortensen, David R.
  2006 Logical and substantive scales in phonology. Ph.D. diss., University of California, Berkeley.
Myers, James and Jane Tsay
  2003 A formal functional model of tone. Language and Linguistics (4): 105–138.
Newman, Paul
  1974 The Kanakuru Grammar. Leeds: Institute of Modern English Language Studies, University of Leeds, in association with the West African Linguistic Society.
  1986 Contour tones as phonemic primes in Grebo. In: Koen Bogers, Harry van der Hulst, and Maarten Mous (eds.), The Phonological Representation of Suprasegmentals, 175–193. Dordrecht: Foris.
Nougayrol, Pierre
  1979 Le Day de Bouna (Tschad). I. Eléments de Description Linguistique. Paris: SELAF.
Paster, Mary
  2003 Tone specification in Leggbo. In: John M. Mugane (ed.), Linguistic Description: Typology and Representation of African Languages. Trends in African Linguistics (8): 139–150.
Philippson, Gérard
  1998 Tone reduction vs. metrical attraction in the evolution of Eastern Bantu systems. In: Larry M. Hyman and Charles W. Kisseberth (eds.), Theoretical Aspects of Bantu Tone, 315–329. Stanford: C.S.L.I.
Pike, Eunice Victoria
  1975 Problems in Zapotec tone analysis. In: Ruth M. Brend (ed.), Studies in Tone and Intonation by Members of the Summer Institute of Linguistics, University of Oklahoma, 84–99. Basel: S. Karger. Original edition, IJAL (14): 161–170, 1948.
Pulleyblank, Douglas
  1986 Tone in Lexical Phonology. Dordrecht: D. Reidel.
Schachter, Paul
  1976 An unnatural class of consonants in Siswati. In: Larry M. Hyman, Leon C. Jacobson and Russell G. Schuh (eds.), Papers in African Linguistics in Honor of Wm. E. Welmers, 211–220. Studies in African Linguistics, Supplement 6.
Singler, John Victor
  1984 On the underlying representation of contour tones in Wobe. Studies in African Linguistics (15): 59–75.
Smith, Neil
  1973 Tone in Ewe. In: Eric E. Fudge (ed.), Phonology. London: Penguin. Original edition, Quarterly Progress Report (88), 290–304. Cambridge, MA: MIT Research Laboratory of Electronics.
Snider, Keith
  1999 The geometry and features of tone. Dallas: Summer Institute of Linguistics.
Stahlke, Herbert
  1971 The noun prefix in Ewe. Studies in African Linguistics, Supplement 2, 141–159.
  1977 Some problems with binary features for tone. International Journal of American Linguistics (43): 1–10.
Steriade, Donca
  2009 Contour correspondence: tonal and segmental evidence. Paper presented at Tones and Features: A Symposium to Honor Nick Clements, Paris, June 18–19, 2009.
Tang, Katrina Elizabeth
  2008 The phonology and phonetics of consonant-tone interaction. Ph.D. diss., University of California, Los Angeles.
Thomas, Elaine
  1978 A Grammatical Description of the Engenni Language. University of Texas at Arlington: Summer Institute of Linguistics.
Traill, Anthony
  1990 Depression without depressors. South African Journal of African Languages (10): 166–172.
Trubetzkoy, N.S.
  1969 Grundzüge der Phonologie [Principles of Phonology]. Translated by Christiane A. M. Baltaxe. Berkeley: University of California Press. Original edition: Travaux du cercle linguistique de Prague 7, 1939.
van Dyken, Julia
  1974 Jibu. In: John Bendor-Samuel (ed.), Ten Nigerian Tone Systems, 87–92. Studies in Nigerian Languages, 4. Jos and Kano: Institute of Linguistics and Centre for the Study of Nigerian Languages.
Wang, William S.-Y.
  1967 Phonological features for tone. International Journal of American Linguistics (33): 93–105.
Welmers, William E.
  1962 The phonology of Kpelle. Journal of African Languages (1): 69–93.
  1973 African Language Structures. Berkeley and Los Angeles: University of California Press.
Yip, Moira
  1980 The tonal phonology of Chinese. Ph.D. diss., Department of Linguistics, Massachusetts Institute of Technology.
  1995 Tone in East Asian Languages. In: John Goldsmith (ed.), Handbook of Phonological Theory, 476–494. Oxford: Basil Blackwell.
  2002 Tone. Cambridge: Cambridge University Press.
Zheltov, Alexander
  2005 Le système des marqueurs de personnes en gban: Morphème syncrétique ou syncrétisme des morphèmes. Mandenkan (41): 23–28.
Features impinging on tone*
David Odden

A long-standing puzzle in phonological theory has been the nature of tone features. Chomsky and Halle (1968) offer features for most other phonological properties of language, but no proposals for tone were advanced there. Research into the nature of tone features has focused on three basic questions. First, how many levels and features exist in tone systems? Second, what natural classes and phonological changes are possible in tonal grammars? Third, how should segmental effects on tone be modeled? Although many proposals provide enlightening answers to individual questions, no proposal handles all of the facts satisfactorily. The purpose of this paper is to address the unity of tone features. I argue that the basic source of the problem of answering these questions lies in incorrect assumptions about the nature of features, specifically the assumption that there is a single set of predetermined features with a tight, universal mapping to phonetics. I argue for Radical Substance Free Phonology, a model where phonological features are learned on the basis of grammatically-demonstrated segment classes rather than on the basis of physical properties of the sounds themselves, making the case for such a theory from the domain of tone. Empirically, I show that, like voicing, vowel height is a feature relevant to synchronic tonal phonology, drawing primarily on facts from the Adamawa language Tupuri.
* Research on this paper was made possible in part with the support of CASTL, University of Tromsø. I would like to thank Dieudonne Ndjonka, who provided me with my data for Tupuri, and Molapisi Kagisano, who provided me with my data for Shua, and Mike Marlo, Charles Reiss and Bridget Samuels for comments on an earlier version of this paper. Earlier versions of this paper have been presented at the Universities of Tromsø, Amsterdam, Indiana, and Harvard, as well as at the symposium in honor of G. Nick Clements.
1. The nature of features
The point of theoretical departure for this investigation is the set of representational and computational assumptions which have characterized research in non-linear phonology for numerous years. Features are obviously essential in this theorizing, since they are the basis for grouping sounds together in phonological rules. There has been a long-standing question regarding the ontology of features, whether they are fundamentally phonetic descriptions of sounds which phonologies refer to, or purely formal in nature, only serving the purposes of phonological classification and lacking intrinsic phonetic content. The traditional viewpoint is that a phonetic classification of speech sounds should provide the conceptual underpinnings for phonological analysis, early exemplifications of this view being found in the works of Sweet, Sievers, Jespersen and Jones. The emergence of a distinction between phonetics (“speech”) and phonology (“language”) following de Saussure raises questions as to the proper relationship between phonetics and phonology. Trubetzkoy (1939) observes (p. 11) that “most of these [acoustic and articulatory] properties are quite unimportant for the phonologist...[who] needs to consider only that aspect of sound which fulfills a specific function in the system of language” [emphasis mine], and that (p. 13) “...the linguistic values of sounds to be examined by phonology are abstract in nature. They are above all relations, oppositions etc., quite intangible things, which can be neither perceived nor studied with the aid of the sense of hearing or touch”. The grasping of this distinction between phonetics and phonology leads to the essential question about the ontology of these relations. Trubetzkoy states (p. 91): “The question now is to examine what phonic properties form phonological (distinctive) oppositions in the various languages of the world”.
An important presupposition contained in this question is that phonetic properties do indeed form phonological oppositions, which is to say, Trubetzkoy (and others) assume that the elements of phonology are phonetically defined. Despite various differences between theories of phonology over many years, the assumption of phonetic-fundamentality has been at the root of most phonological theorizing. In the line of research pertaining to formal feature theory that starts with Jakobson, Fant and Halle (1952), up through Chomsky and Halle (1968), it has been a standard assumption that phonological classes are stated in terms of universal phonetically-defined features which are physical properties. For example, The Sound Pattern of English (SPE) (Chomsky and Halle 1968) states that “The total set of features is identical with the set of phonetic
properties that can in principle be controlled in speech; they represent the phonetic capabilities of man and, we would assume, are therefore the same for all languages.” (pp. 294–5), and that “the phonetic features are physical scales and may thus assume numerous coefficients, as determined by the rules of the phonological component” (p. 297). The perspective that features are fundamentally phonetic descriptions of language sounds, which happen to be used in phonological grammars, has been a major claim of generative grammar.1 A problem with the SPE theory of features is highlighted in Campbell (1974), namely the connection between round vowels and labial consonants. The formal problem is that labiality can trigger or be triggered by vocalic rounding, but feature theory does not explain this fact.
(1) Finnish: k → v / u__u (in weak-grade context)
    Tulu: ɨ → u / labial C C0 __
Finnish [v] is not round, nor are Tulu [p,m], so the changes in (1) are formally arbitrary, which was considered undesirable. In the autosegmental era, attention was paid to restricting rules and representations to disallow arbitrary changes. McCarthy (1988) nicely summarizes reasoning in this period with his observation that: “...phonological theory has made great progress toward this goal by adhering to two fundamental methodological premises. The first is that primary emphasis should be placed on studying phonological representations rather than rules. Simply put, if the representations are right, then the rules will follow.” [p. 84]
The problem, according to McCarthy, is that common nasal place assimilation “is predicted to be no more likely than an impossible one that assimilates any arbitrary set of three features, like [coronal], [nasal], and [sonorant]” (p. 86). The resolution to this problem lies in an interaction between a better theory of representations and a better theory of rules: “The idea that assimilation is spreading of an association line resolves the problem raised by (3). Assimilation is a common process because it is accomplished by an elementary operation of the theory – addition of an association line” [p. 86]
A representational theory such as that of Clements (1985) or Clements and Hume (1995) would then let representations dictate rules for us.
It is correct that some emphasis must be placed on the representational atoms of phonological grammars, but attention must also be paid to the theory of rules, since simply knowing that [coronal], [labial] and [dorsal] exhaust the constituent [place] does not thereby explain the impossibility of a rule copying any arbitrary set of three features, such as [coronal], [nasal], and [sonorant]. The computational issue can be addressed by positing, as Clements (1985: 244) states, that “assimilation processes only involve single nodes in tree structure”, which naturally leads to the conclusion of Clements and Hume (1995: 250) that “phonological rules perform single operations only”. Taken together, theories of rules and representations should delimit the class of possible phonological operations. The single-operation theory is intended to rule out a vast set of arbitrary phonological operations, so that only (2a), (3a) would be possible rules of post-nasal voicing or postvocalic spirantization.
(2) Post-nasal voicing
    a. [+voice] spreads from the [+nas] C to the following C
   *b. Ø → [+voice] inserted on the C following a [+nas] C
(3) Post-vocalic spirantization
    a. [+cont] spreads from V to the following C
   *b. Ø → [+cont] inserted on the C following V
The theory of “assimilation as spreading” leads to a problem if features are fairly precise descriptions of speech events. In the Sagey (1986) analysis of Tulu (4), [labial] spreads from a consonant to following ɨ, so that the vowel gains a labial specification. Then, the feature [round] is added by default, because [labial] is linked to a vowel, which gives an actually round vowel. Since there is just one multiply-linked token of [labial] in the representation, the labial stop itself should also be round; but there is no reason to believe that the phonetic output has round p.
(4) [Diagram, three stages: [labial] is linked to p; [labial] spreads from p to the following vowel; [round] is added by default to the vowel, which surfaces as u.]
It is implausible that the phonetic output is *[apʷtu], and similar issues arise in the unification of vowel fronting and coronal (Hume 1994), where /ot/ → [öt], not *[ötʸ]. If features are interpreted narrowly and strictly by the classical definitions, an ad hoc method of cleaning up outputs is required. Unified Features Theory allows a different solution to the problem in (5), by claiming that features are more abstract than in SPE, and are subject to more complex phonetic interpretation. Fine-grained interpretation, in the case of place features, depends not just on the terminal feature, but on its relationship to a dominating node, so labial immediately dominated by C-Place is realized as bilabial or labiodental but labial immediately dominated by V-Place is realized as lip rounding.
(5) [Diagram, three stages: [labial] is linked to the C-Place (CP) node of p; [labial] spreads to the V-Place (VP) node of the following vowel; [labial] under VP is interpreted as [round].]
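The node-dependent interpretation described above can be sketched in a few lines of Python. This is only an illustrative toy, not part of Unified Features Theory itself: the table `INTERPRETATION`, the function `interpret`, and the segment label `"V"` are hypothetical names introduced here, and the phonetic glosses simply paraphrase the text.

```python
# Hypothetical lookup: (abstract feature, dominating node) -> phonetic gloss.
# The glosses paraphrase the prose; they are not a claim about any real model.
INTERPRETATION = {
    ("labial", "C-place"): "bilabial or labiodental constriction",
    ("labial", "V-place"): "lip rounding",
}


def interpret(feature, dominating_node):
    """Return the phonetic gloss of a feature given the node that dominates it."""
    return INTERPRETATION[(feature, dominating_node)]


# One token of [labial], linked to the C-place of p and, after spreading,
# to the V-place of the following vowel (labelled "V" for illustration).
labial_links = {"p": "C-place", "V": "V-place"}

realizations = {
    segment: interpret("labial", node) for segment, node in labial_links.items()
}
```

Under these assumptions the single shared [labial] token yields a plain labial stop and a rounded vowel, so no separate clean-up of a rounded p is needed.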
The core, universal feature [labial], [coronal] etc. would be less definite from the physical perspective, and would be neutral as to interpreting labial as “round” with lip protrusion versus as labial compression / approximation. Analogous reasoning equates the features for consonantal coronal and vowel-frontness in Hume (1994), and voicing and low tone in Bradshaw (1999). Such abstracting away from phonetic specifics is crucial to realizing the goal of an empirically tenable representational theory which works in tandem with a computational theory that prohibits arbitrary feature-insertions such as (6).
(6) ɨ → u after a labial: C [+ant, –cor]   V [+hi, +back, +round]
    i → a after uvular, laryngeal: C [–ant, –cor]   V [+hi, –back, +low]
The possibility of viewing processes of fronting triggered by coronals and rounding triggered by labials as constituent-spreading depends
substantially on a degree of detachment between features and their phonetic content. There is a basic logical flaw in the idea of a formal principle whereby assimilation must be treated as spreading. Such a principle does not address the possibility of a non-assimilatory rule where o becomes ö before a labial, or i becomes a before a coronal. That is, attempting to restrict the formalism of just assimilatory rules will ultimately yield no restriction on phonology, since the fact of being an assimilatory rule is not a self-evident linguistic primary – it is an analyst’s concept. From the formal perspective, a crucial part of the logic is that arbitrary feature changes such as (7) with insertion of [–back] triggered by presence of Labial and insertion of [+low] triggered by Coronal should also not be possible rules.
(7) [Diagram, two starred rules: *[–bk] inserted under the V-Place of [o] in the context of a Place node dominating Lab; *[+low] inserted under the V-Place of [i] in the context of a Place node dominating Cor.]
In short, formal phonology must severely limit insertions, if McCarthy’s desideratum is to be realized. Allowing rules such as (7) under any guise undermines the claim that correct representations yield a more restrictive view of possible rules. A complete prohibition on feature insertion would face serious empirical problems, since insertion certainly exists: segments are inserted to provide well-formed syllables (onsets are created, vowel-epenthesis exists), unspecified features are filled in, and opposite values are inserted under OCP pressure. Understanding the nature of insertions is a vital part of understanding phonological computations. I do not focus on that matter here, and assume that strong limits on insertions are possible. The central point of this section is to show that a predictive formal theory cannot rely just on representations; it also needs a valid theory of how representations are acted on so as to rule out a vast range of direct feature changes of the type that would be possible in SPE theory. We consider the consequences of such a restrictive formal theory for the concept of features in section 4.
2. A substance-free perspective
Given that phonological rules operate in terms of variously-intersecting sets of segments accessed by features, two fundamental questions about features
are whether they are universally pre-wired and unlearned, and whether they are ontologically bound to phonetic substance. In the generative context, the dominant trend has followed Chomsky and Halle (1968), and presumes that the atoms of phonological representation are phonetically defined. Recent trends in phonology (exemplified by Archangeli and Pulleyblank 1994 and Hayes, Kirchner and Steriade 2004) have returned to the SPE program of denying the distinction between phonetics and phonology, elaborating the theory of phonology with teleological principles designed to model optimal production and perception within grammar. An alternative to features as phonetic descriptions is that they are formal, substance-free descriptions of the computational patterning of phonemes in a language, the classes which the sounds of a language are organized into for grammatical rules. One of the earliest phonological analyses of a language (Sanskrit), Pāṇini’s Aṣṭādhyāyī, is founded on a strictly algebraic system of referring to segment classes by specifying the first and last members of the class, as they appear in a phonologically-ordered list of Sanskrit segments. Thus the class “voiceless stops” is identified by the formula khay, and sibilants are çar; the class “voiceless aspirate” cannot be described in this system, which is appropriate since it plays no role in the grammar of Sanskrit as a distinct class. In the modern era, Hjelmslev (1939) is a well-known pregenerative proponent of the view that phonological analysis should not be founded on assumptions of phonetic substance. Thus a phonological theory which eschews reference to the physical manifestation of speech is certainly possible.
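The idea of naming a class purely by its endpoints in an ordered list can be illustrated with a small Python sketch. It is a deliberate simplification under stated assumptions: `ORDERED_SEGMENTS` is a made-up toy inventory, not the Sanskrit list, and `segment_class` and `is_describable` are hypothetical helper names. The sketch only shows why some sets are nameable in such a system and others are not.

```python
# Toy ordered inventory, invented for illustration (NOT the Sanskrit list).
ORDERED_SEGMENTS = ["a", "i", "u", "m", "n", "b", "d", "g", "p", "t", "k", "s"]


def segment_class(first, last, inventory=ORDERED_SEGMENTS):
    """Return every segment from `first` through `last` in list order."""
    start = inventory.index(first)
    end = inventory.index(last)
    if start > end:
        raise ValueError("first member must precede last member in the list")
    return inventory[start:end + 1]


def is_describable(target, inventory=ORDERED_SEGMENTS):
    """A non-empty set is nameable only if it is one contiguous stretch of the list."""
    positions = sorted(inventory.index(segment) for segment in set(target))
    return positions == list(range(positions[0], positions[-1] + 1))


voiceless_stops = segment_class("p", "k")
```

In this toy list the set {p, t, k} has a name, while {b, k} does not, because no single pair of endpoints picks out exactly those two segments. No phonetic property is consulted at any point.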
There have been such countervailing tendencies in generative phonology, for example Foley (1977), Dresher, Piggott, and Rice (1994), Harris and Lindsey (1995), Hale and Reiss (2000, 2008), Samuels (2009), which emphasize the formal computational aspects of phonology and which assert that phonology is autonomous from phonetics, thus the atoms of phonological computation and representation have only an accidental relationship to principles of production and perception. Such a perspective emphasizes the independence of synchronic grammatical computations from the historical causes of those computations, and places substance-dependent considerations outside of the domain of grammar and inside of the domain of the study of language change and acquisition. Within this substance-free tradition, the Parallel Structures Model (PSM) of representations (Morén 2003, 2006) advances a formal theory of segmental representation by minimizing the number of representational atoms (features) and maximizing relational resources (hierarchical structure), pursuing certain essential ideas of UFT to their logical limit. An important aspect of this model
of representation is the high degree of phonetic abstractness of features, especially manner-type features, where PSM eliminates numerous features such as “nasal”, “lateral”, “continuant”, “consonantal” and “sonorant” in favor of combinations of two abstractly defined features, “open” and “closed”, and freer combinations of these features with the nodes “V-place” and “C-place”. In PSM, nasal consonants across languages can, in principle, be represented in many different ways, effectively answering the question “are nasals [+continuant] or [–continuant]” by saying “either, depending on the facts of the language”. Pursuing the logic of this model even further, the theory of Radical Substance-Free Phonology (RSFP) in Odden (2006), Blaho (2008) posits an entirely substance-free formal theory of phonology, holding that a phonology is a strictly symbolic computational system, that the terminal representational primitive which a phonological computation operates on is the feature, but that features have no intrinsic physical definition or interpretation. Physical interpretation of a phonological representation is handled by the phonetic interpretive component, which has a wholly different nature from that of the phonological component. Phonological segments are the perceptual primitives that feature induction starts with: a child learning English knows that [p] is not [t], and must discover what formal properties distinguish these sounds (and unite these sounds in being opposed to [b], [d]). The general nature of that difference in a grammar is feature specification, and rules operate as usual, being stated in terms of features, because the device of features is mandated by the theory of rule syntax – which is the locus of grammatical universality.
The crucial difference between RSFP and theories using SPE or similar universal features is that in RSFP, the features used in a language and the relationship between physical properties of phoneme realization and featural analysis must be learned from grammatical patterns, and are not predetermined by acoustic or articulatory events. RSFP holds that principles of UG do not refer to specific features, and that UG is unaware of the substance of features. Phonetically-grounded facts, especially markedness, are outside the scope of what grammar explains, but are within the scope of non-grammatical theories of perception, language change and acquisition, which partially explain the data patterns that the rule-system generates. Functional factors are relevant only when they affect the actual data which the next generation uses to learn a particular grammar. If certain segments {a,b,d,f} function together in the operation of a rule, those segments have some feature(s) in common, e.g. [W]. Universal physical pre-definitions of phonological features are unnecessary to be able
to pronounce outputs or identify inputs: actual experience with the language is. Since the primary data for language acquisition are a complex function of numerous antecedent human factors, there are many (but not unlimited) possible mappings from phonetic fact to grammar. It follows from this that languages can give different featural analyses to the same phonetic fact. If the patterns of segment classes of two languages are substantially different, it is expected that the featural analyses of those segments will differ, even when the phonetic events that they map to are essentially the same. The logic of RSFP is inductive, working from the facts to the conclusion, and is not deductive, working from pre-existing conclusions to language-particular facts. The primary fact which the child knows and builds on is that the language has particular segments such as i, e, o, p, b, th, n, k, ŋ. When the facts of the language show that th, n pattern as a class to the exclusion of other segments, the child knows that those segments have some common feature that classifies them and can induce the class labeled [coronal]; when p, b, th, k act as a class separate from n, ŋ, the child knows that some other feature e.g. [oral] distinguishes th and n.2 Numerous competing feature systems to describe a given phonetic distinction could be induced, but such competition is always crosslinguistic. Thus nasals may be stops in one language and continuants in another; within a single language, a sound cannot simultaneously have and not have the same feature. The facts of the primary linguistic data determine what grammar is induced, and within a language there is a single, non-contradictory analysis of the facts (modulo the possibility of feature changing operations such as default fill-in applying at a particular derivational stage, a possibility available to substance-dependent theories of features as well).
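The inductive step described above, from observed classes to feature specifications, can be sketched as follows. This is a minimal illustration and not a model of acquisition: the function names `induce_features` and `distinguishing_features` are hypothetical, the feature labels are arbitrary tags (which is the point of the substance-free view), and the only data used are the two classes mentioned in the text.

```python
def induce_features(observed_classes):
    """Give every member of an observed class the (arbitrary) label of that class."""
    specs = {}
    for label, members in observed_classes.items():
        for segment in members:
            specs.setdefault(segment, set()).add(label)
    return specs


def distinguishing_features(specs, a, b):
    """Features carried by exactly one of the two segments."""
    return specs.get(a, set()) ^ specs.get(b, set())


# Classes that the grammar of the hypothetical language treats as units.
# The labels could equally be "F1" and "F2"; nothing depends on their names.
observed = {
    "coronal": {"th", "n"},
    "oral": {"p", "b", "th", "k"},
}

specs = induce_features(observed)
```

With these two classes, th and n share [coronal] and are told apart only by [oral]. A language whose rules grouped the same sounds differently would feed different classes into the same procedure and come out with a different featural analysis.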
While there cannot be a single theory of the substance of features across languages, there is a single theory of the syntax of features. Especially relevant to the concerns of this paper, the only predicted grammatical limits on natural classes and segment/tone effects are of the kind “could not be part of a linguistically computable rule”.
3. Tone features
There have been a number of contradictory proposals for tonal representations, including Wang (1967), Sampson (1969), Woo (1969) and Maddieson (1972), all of which are capable of describing 5 tone levels (an important consideration for a theory of tone, since there are languages with 5 distinctive levels). These theories differ in terms of the classes that they predict to be
possible, for example the proposal of Wang (1967) predicts that tone levels 1,2,4,5 excluding 3 could function as a class defined as [–Mid] whereas the competing theories do not allow this. Sampson (1969) and Maddieson (1972) predict that levels 1,2,3 can function as a class defined as [–Low] whereas the competitors do not. Woo (1969) predicts that levels 1,3,5 can be a class defined as [–Modify]. None of these theories predicts the well-known interaction between consonant voicing and tone (see below), which is addressed by the features of Halle and Stevens (1971), but on the other hand the latter theory does not handle more than 3 tone levels. One widely-adopted tone feature theory is the Yip-Pulleyblank model (8), which assumes a register feature [upper] dividing tone space into upper and lower registers, and [raised] which subdivides registers into higher and lower internal levels. All upper register tones are higher than any lower register tones, and a raised tone in the lower register is physically lower than the nonraised tone of the higher register.
(8)          SH   H    M    L
    upper    +    +    –    –
    raised   +    –    +    –
A significant empirical advantage of this system is that it explains a surprising phonological alternation, the physically-discontinuous assimilation of the feature [+upper], as Clements (1978) documents for Anlo Ewe. In that language, a Mid (M) tone becomes Superhi (SH) when flanked by H tones. What is surprising is that the M of the postposition [mēgbé] actually becomes higher than the triggering H. From the perspective of a featural analysis of tones in the raised/upper model, this is perfectly sensible as a rule assimilating the feature [upper].
(9) ākplɔ̄ mēgbé → ākplɔ̄ mēgbé ‘behind a spear’
    ēkpé mēgbé → ēkpe̋ me̋gbé ‘behind a stone’
    [Diagram: H M H, where M is [–upper, +raised] and the flanking H tones are [+upper, –raised]; [+upper] spreads onto M and the [–upper] of M is delinked.]
Since M is the raised tone of the lower register and H is the non-raised tone of the upper register, it follows that if M takes on just the register feature of H, then it becomes the raised tone of the upper register, which is SH.
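The arithmetic of this register assimilation can be checked with a short Python sketch. The feature values are those of table (8); everything else (the dictionary `TONES`, the helpers `tone_name` and `assimilate_register`) is a hypothetical encoding chosen for illustration, with `True` standing for "+" and `False` for "–".

```python
# Feature values from table (8): True = "+", False = "-".
TONES = {
    "SH": {"upper": True, "raised": True},
    "H": {"upper": True, "raised": False},
    "M": {"upper": False, "raised": True},
    "L": {"upper": False, "raised": False},
}


def tone_name(spec):
    """Look up the tone label that matches a full feature specification."""
    for name, features in TONES.items():
        if features == spec:
            return name
    raise ValueError("no tone has this specification")


def assimilate_register(target, trigger):
    """Copy only the register feature [upper] from trigger to target."""
    spec = dict(TONES[target])
    spec["upper"] = TONES[trigger]["upper"]
    return tone_name(spec)


result = assimilate_register("M", "H")
```

Copying just [upper] from H onto M leaves the [+raised] of M untouched, so the output is the raised tone of the upper register, SH, which is higher than the H that triggered the change.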
The prediction of RSFP, with respect to tones as well as other features, is that other arrangements of the same phonetic facts into phonological systems are possible, because numerous competing feature systems can be induced from the simple fact of having 4 tone levels, and the competition is only narrowed down by looking at natural class behavior. To see that this predicted outcome is realized, we turn to a language with a different treatment of 4 levels, namely Kikamba. The 4-level tonal space of this language is divided by a distinction between high and low tones, which are further differentiated by being plain versus [extreme], that is, at the outsides of the tonal space. Note that the feature [extreme], which groups together the “inner” and “outer” tones into natural classes, was one of the tone features proposed in Maddieson (1972). Following that model of tone representation, the highest tone in Kikamba, SH, is a [+extreme] H, and the lowest tone, SL, is a [+extreme] L.
(10) Kamba (Roberts-Kohno and Odden notes; Roberts-Kohno 2000)
             vғ   vࡉ   v    vࡍ
             SH   H    L    SL
    H        +    +    –    –
    extreme  +    –    –    +
This analysis of 4 tones is induced from the natural class patterns of the phonology, which show a paradigmatic connection between H and SH triggered by SL. Infinitives have a final SL, which spreads to the second mora of a long penult. Verbs are also lexically differentiated as to whether their first root mora has an underlying H versus L.
(11) L verbs
ko-kon-Ж ko-kѓѓճl-Ж H verbs ko-kélakely-Ж ko-kétek-Ж
‘hit’ ‘strain’ ‘tickle’ ‘occur’
ko-kэlэk-Ж ko-sitaЖk-Ж ko-kóolok-Ж ko-tálaЖƾg-Ж
‘stir’ ‘accuse’ ‘advance’ ‘count randomly’
As shown in (12), whenever H comes before SL, that H becomes SH, which spreads to a final SL vowel.
(12) ko-taկ l-aկ ko-tǢ-aկ /kotálЖ/
‘count’ ‘pluck’ /kokóТlyЖ/
ko-kĘТly-Ж
‘ask’
/ko-tǢá ճ /
The raising of H to SH before SL is easily comprehensible as the assimilation rule (13) if this language employs a feature [extreme] grouping SH and SL together.
(13) H V
V [+extreme]
Expressed in the raised/upper model, the rule would be an arbitrary feature change, which we have sought to rule out on theoretical grounds.
(14) [+upper]    [-upper]
        V           V
     [+raised]   [-raised]
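The contrast between the two analyses of H → SH before SL can be made concrete with a small check of whether a change counts as assimilation, meaning that every feature that changes ends up with the value the trigger already has. The sketch below is illustrative only. The [H]/[extreme] values follow table (10); the mapping of the four Kikamba levels onto [upper]/[raised] is an assumption made here for comparison (top level = [+upper, +raised], bottom level = [–upper, –raised]), and `is_assimilation` is a hypothetical helper.

```python
# Values from table (10): True = "+", False = "-".
EXTREME_SYSTEM = {
    "SH": {"H": True, "extreme": True},
    "H": {"H": True, "extreme": False},
    "L": {"H": False, "extreme": False},
    "SL": {"H": False, "extreme": True},
}

# Assumed re-encoding of the same four levels with [upper] and [raised].
UPPER_RAISED_SYSTEM = {
    "SH": {"upper": True, "raised": True},
    "H": {"upper": True, "raised": False},
    "L": {"upper": False, "raised": True},
    "SL": {"upper": False, "raised": False},
}


def is_assimilation(system, target, trigger, output):
    """True if every changed feature takes on the value found on the trigger."""
    changed = [
        feature
        for feature in system[target]
        if system[target][feature] != system[output][feature]
    ]
    return bool(changed) and all(
        system[output][feature] == system[trigger][feature] for feature in changed
    )


with_extreme = is_assimilation(EXTREME_SYSTEM, "H", "SL", "SH")
with_upper_raised = is_assimilation(UPPER_RAISED_SYSTEM, "H", "SL", "SH")
```

With [extreme], the target simply acquires the [+extreme] value of the following SL. With [upper]/[raised] under the assumed mapping, the target becomes [+raised] next to a [–raised] trigger, so the change cannot be stated as spreading.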
Other evidence for analysis of Kikamba tones in terms of [extreme] is seen in (15), which illustrates the fact that a lexical [extreme] specification in nouns is deleted when the noun is followed by a modifier.
(15) N maio mabaatá ekwaaseկ moemО
big N maio manѓկnѓկ mabaatá manѓկnѓկ ekwaasé enѓկnѓկ moemi monѓկ nѓկ
‘bananas’ ‘ducks’ ‘sweet potato’ ‘farmer’
This is a simple feature deletion with the feature [extreme], but is an arguably unformalizable rule under an upper-raised analysis.
(16) Extreme-theory:       [+extreme] → Ø / ___ .... X ]NP
     Upper-raised theory: *[αupper] → [-αraised] / ___ .... X ]NP
Finally in (17), certain verb forms cause a SL, which is just a specification [+extreme], to shift to the end of their complement, explaining the alternations on postverbal Moѓma and maio.
(17) maio máaMoѓma ngáatálá maiТ ngáatálá maio máaMoѓmЖ
‘bananas’ ‘of Moema’ ‘I will count bananas’ ‘I will count bananas of Moema’
Definite forms of nouns have SH tone on the first syllable, which of course involves a [+extreme] specification. The presence of SH then blocks the shift of SL from the verb.
(18) maկ io ngáatálá maկ io máaMoѓma
‘the bananas’ ‘I will count the bananas of Moema’
Blockage is expected given that [+extreme] originates from the verb and thus precedes the target phrase-final position, since there is an intervening [extreme] specification.
(19)
[+ex] ngáatálá maio máaMoѓmЖ
[+ex] [+ex] ngáatálá maկ io máaMoѓma
Thus the same surface tone system – four levels – is analyzed in different ways in different languages, supporting the claim of RSFP that features are learned and not universally identical. A similar point can be made with respect to segment/tone interactions. The best-known effect is the so-called depressor effect, whereby voiced consonants are associated with L-tone behavior. See Bradshaw (1999) for an extensive treatment and a theoretical account of the facts within UFT. An example is the pattern in Nguni languages such as Siswati, where H becomes a rising tone after a depressor. In (20), underlying H from the infinitive prefix (underlined) shifts to the antepenult.
(20) kú-fik-a   ‘to arrive’   ku-fík-el-a   ‘to arrive for’
     kú-gez-a   ‘to bathe’    ku-gěz-el-a   ‘to bathe for’
     kú-ge|-a   ‘to chop’     ku-ge|-êl-a   ‘to chop for’
When the onset of a syllable with a H is a depressor, H appears as a rising tone. That rising tone is sometimes eliminated by shifting H to the penult as in (21a), but this does not take place when the onset of the penult is also a depressor as in (21b).
(21) a. kú-ge|ela → kugé|ela → kugě|ela → kuge|êla
     b. kú-gezela → kugézela → kugězela
This connection between L tone behavior and voiced obstruents has been known for many years, at least since Maspéro (1912), and is recognised in
the Halle and Stevens feature system by reducing voicing and L tone to a single feature, [slack]. The Halle and Stevens account faces the problem that a complete identification of voicing and L tone accounts for the phonological relevance of voiced consonants in creating rising tones and blocking rise-decomposition in Siswati, but it does not account for the irrelevance of voiced consonants with respect to the rule shifting H to the antepenult, which operates across all consonants. This is resolved in the model of Bradshaw (1999) which equates L tone and voicing as one feature, L/voice, which allows multiple dominating nodes. When dominated by a tone node it is realized as L tone, and when dominated by a segmental laryngeal node it is realized as voicing.
(22) L tone:            Tone node dominating L/voi
     Voiced consonant:  root node dominating Laryngeal, which dominates L/voi
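This accounts for the facts of consonant-tone interaction in Siswati by a spreading rule.
(23) [Diagram: [L/v], linked to the Laryngeal node of the depressor consonant g in gezela, spreads to the Tone node of the following vowel, which also bears H.]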
The alternative would be an arbitrary feature-insertion rule, of the kind that nonlinear representations are supposed to render unnecessary.
(24) Ø → L / C[+voice] ____
Since rules can be plane- and tier-sensitive, this “predicts that there can be an interweaving of transparency and opacity” (Bradshaw 1999: 106) as seen in the differential blocking or transparency of tone shifts by intervening depressors.3 This then is the fundamental and most common interaction between tone and segmental content, that voiced consonants bring about L tone behavior.
The success of this explanation depends very much on relaxing the degree of phonetic specificness of features. In the next section, I turn to a different effect, one that is quite rare in synchronic phonologies and also much more abstract than the consonant / tone connection, namely the effect of vowel height on tone.
4. Vowel height and tone
I begin with methodologically-instructive data from the Khoisan language Shua. This language also has 4 tone levels, and certain facts suggest a historical sound change relating tone and vowel height. H, M and L tones are illustrated in (25), whose appearance is fairly unrestricted.
(25) LL LH
ML
HL HH
//àȕù //àà tùrù hàrí c’èé dàۤ dòá k’ùí !Ɨȕà pƝè //’njm̖ muթ ˾ u`˾ jíbè k u̗˾ u̖˾ shórì shúnú xwéé
‘fly (v)’ ‘fingernail’ ‘rat’ ‘dish’ ‘cry’ ‘turtle’ ‘grass’ ‘speak’ ‘flat’ ‘jump’ ‘cut’ ‘see’ ‘axe’ ‘exit’ ‘tobacco’ ‘breathe’ ‘white’
LM LHL
MM MH HM
/’àǀ ‘à˾ Ɨ˾ bìƝ khàâ khòê tshèê g//àî //ùmը //’a˾ Ư˾
‘snake’ ‘know’ ‘zebra’ ‘give’ ‘person’ ‘daytime’ ‘run’ ‘low’ ‘buy’
!huթ ˾ u̗˾ mwƝdí //’ámթ séƝ n/únj xáۤ zírá
‘push’ ‘moon’ ‘hit’ ‘take’ ‘thigh’ ‘lion’ ‘bird’
What is noticeable in (26) is that the Superhi tone appears almost exclusively on high vowels. (26) SH
//Õկ í kĦۤ njĦú xĦۤ Ħú
‘song’ ‘heavy’ ‘black’ ‘sand’ ‘send’
SS
shĦbĦ kaկ rÕկ
‘light’ ‘hard’
SL
ݦy˾ u̗ ̗ ˾ u˾̖
‘sit’
There are no melodic tones or alternations in the language to support a synchronic connection between vowel height and tone. It may be assumed that there was a historical change in the language explaining the uneven synchronic distribution of the tones, but that provides no warrant for encoding that relationship in the synchronic grammar. Induction of a featural relation between vowel height and tone requires a definite, categorial patterning in the grammar, which Shua lacks. A strong case for synchronically connecting tone and vowel height comes from Tupuri, a member of the Mbum group of Adamawa languages spoken in Chad and Cameroun. Tupuri also has 4 tone levels, Superhi (SH = vࡉ ), High (H=vғ ), Mid (M=vࡃ ) and Low (L=vҒ ), and the language presents a grammaticalized transplanar segment effect between vowel height and the sub-register feature [raised]. Static distributional evidence from nouns in Tupuri proves to be as unrevealing as it was in Shua. As the forms in (27a) show, there is a predominance of high vowels in nouns with superhi tone, but as (27b) shows, this is just a tendency.
(27) a. dĦ rÕկ ƾ s҂կk t҂կ b. faկ y kэկs
‘name’ ‘hair’ ‘ear’ ‘hole’ ‘field rat’ ‘card game’
h ҂կ ˾ tÕկ ƾ š҂կk tÕկ haկ r pѓկr
‘bone’ ‘house’ ‘smoke’ ‘head’ ‘palm leaf’ ‘priest’
There is also a tendency, visible in (28a), for high tone to appear on non-high vowels, but as (28b) shows, this is just a statistical tendency.

(28) a. Ҍã̗ y fѓ̗k káw pã̗ sám b. ѐѢ̗Ѣ̗ pѢ̗n š҂̗
‘bean’ ‘smile’ ‘relative’ ‘milk’ ‘sheep’ ‘race’ ‘beard’ ‘flour’
ѐáƾ kák láw sák tã̗ y hѢ̗Ѣ̗ š҂̗ ˾ tܞy
‘hare’ ‘chicken’ ‘afternoon’ ‘haunch’ ‘feather’ ‘grudge’ ‘horn’ ‘rabbit’
As (29a,b) show, mid and low tones appear freely on any vowel. (29) a. bƗy fэթэթ
‘testicle’ ‘amusement’
dэթ h эթ ˾ y
‘arm’ ‘nose’
hƗn k҂թ šѢթ ˾ Ѣթ ˾ fuթ ˾ y b. ƾwã` y jàw hìn wѢ̖l
‘calabash’ ‘wood’ ‘fish’ ‘fur’ ‘female’ ‘spear’ ‘brother’ ‘boy’
krƗƾ pѢթr tlnjm k҂թƾ jàk wày n҂̗ ˾ y҂̖
‘wing’ ‘horse’ ‘tongue’ ‘leg’ ‘mouth’ ‘dog’ ‘oil’ ‘middle’
Thus the non-alternating static lexical distribution is unrevealing of the grammar of Tupuri: no phonological rules affect tones in nouns, and there is no reason to posit any rules of grammar to account for these data. Verb tone is entirely different: tones in verbs alternate paradigmatically, so there is something which the grammar must account for. Unlike nouns, verbs have no lexically-determined tone. Instead, verbs receive their tones in a classical autosegmental fashion via concatenation of morphemes, which include floating-tone tense markers. Verb tone is entirely predictable according to the following informal rules. First, root vowels are generally M-toned. Second, an expected M on the syllable after the 3s pronoun becomes SH. Third, in the present tense the entire verb stem becomes H-toned and the 3s pronoun itself bears SH. The paradigm in (30) shows the pattern for monosyllabic roots.

(30)
infinitive    1s past      3s past     1s present    3s present
šƗƗ-gì        njì šƗƗ      Ɨ šaկaկ     njì šáá       aկ šáá       ‘dig’
dѓթf-gì       njì dѓթf     Ɨ dѓկf      njì dѓ̗f       aկ dѓ̗f       ‘cook’
bэթm-gì       njì bэթm     Ɨ bэկm      njì bэ̗m       aկ bэ̗m       ‘play’
þƯk-gì        njì þƯk      Ɨ þÕկk      njì þík       aկ þík       ‘pound’
ѐѢթk-gì       njì ѐѢթk     Ɨ ѐѢկk      njì ѐѢ̗k       aկ ѐѢ̗k       ‘think’
yѓթr-gì       njì yѓթr     Ɨ yѓկr      njì yѓ̗r       aկ yѓ̗r       ‘write’
Polysyllabic verbs show particularly clearly that the SH appearing after the 3s past pronoun affects only the first syllable of the verb, whereas the H of the present tense is manifested on all syllables of the verb.

(31)
infinitive     1s past       3s past      1s present     3s present
b҂թl҂թl-gì     njì b҂թl҂թl   Ɨ b҂կl҂թl    njì b҂̗l҂̗l     aկ b҂̗l҂̗l     ‘roll over’
kѢթlѓթr-gì     njì kѢթlѓթr   Ɨ kѢկlѓթr    njì kѢ̗lѓ̗r     aկ kѢ̗lѓ̗r     ‘draw’
The analysis of this pattern is that verb roots have no underlying tone, and by default any toneless vowel receives M. Pronouns have tones: the 1sg pronoun njì is invariantly L-toned, whereas the 3sg pronoun is toneless /a/ plus a floating SH. The floating SH of /a/ docks to a following vowel if it is toneless, and otherwise docks to the preceding toneless pronoun. Whether the following verb has a tone is determined by tense-aspect. The present/imperfective has a floating H which spreads to the vowels of the root. In the presence of such an inflectional H, the prefixal SH docks to the pronoun (a̋ šáá “he digs”). Otherwise the SH from the pronoun appears on the verb (ā ša̋a̋ “he dug”). Illustrative derivations are given in (32), where a tone mark standing alone (˝ or ´) is a floating tone and "n.a." marks a step that does not apply.

(32)
underlying              /lɔŋ-gì/     /njì lɔŋ/     /a ˝ lɔŋ/    /njì lɔŋ ´/    /a ˝ lɔŋ ´/
melody mapping          n.a.         n.a.          n.a.         [njì lɔ́ŋ]      a ˝ lɔ́ŋ
rightward SH docking    n.a.         n.a.          a lɔ̋ŋ        n.a.           n.a.
T´ docking              n.a.         n.a.          n.a.         n.a.           [a̋ lɔ́ŋ]
default M               [lɔ̄ŋ-gì]     [njì lɔ̄ŋ]     [ā lɔ̋ŋ]      n.a.           n.a.
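The ordered steps of the derivation in (32) can be sketched procedurally. The following is only an illustrative sketch under simplifying assumptions: the function name, the string labels for tones, and the representation of a verb as a list of syllable tone slots are all hypothetical conveniences, not part of the analysis itself.

```python
from typing import List, Optional, Tuple


def derive_verb_tones(pronoun: str, n_syllables: int, present: bool) -> Tuple[str, List[str]]:
    """Return (pronoun tone, root syllable tones) for a Tupuri verb form.

    Hypothetical encoding: tones are the strings "SH", "H", "M", "L";
    None stands for a toneless slot. Only the 1s and 3s pronouns
    discussed in the text are modelled.
    """
    if n_syllables < 1:
        raise ValueError("a verb root needs at least one syllable")

    pronoun_tone: Optional[str]
    floating: Optional[str]
    if pronoun == "1s":      # 1s pronoun: lexically L, no floating tone
        pronoun_tone, floating = "L", None
    elif pronoun == "3s":    # 3s pronoun: toneless /a/ plus floating SH
        pronoun_tone, floating = None, "SH"
    else:
        raise ValueError("only '1s' and '3s' are modelled in this sketch")

    # Underlying: verb roots are toneless.
    root: List[Optional[str]] = [None] * n_syllables

    # Melody mapping: the floating H of the present spreads to all root vowels.
    if present:
        root = ["H"] * n_syllables

    # Rightward SH docking: only onto a toneless first root syllable.
    if floating is not None and root[0] is None:
        root[0], floating = floating, None

    # Otherwise the floating tone docks onto the toneless pronoun.
    if floating is not None and pronoun_tone is None:
        pronoun_tone, floating = floating, None

    # Default M on anything still toneless.
    return pronoun_tone or "M", [tone or "M" for tone in root]


if __name__ == "__main__":
    for pron in ("1s", "3s"):
        for pres in (False, True):
            print(pron, "present" if pres else "past", derive_verb_tones(pron, 2, pres))
```

The ordering of the steps matters in this sketch: because melody mapping precedes SH docking, the floating SH finds the root already H-toned in the present and is forced back onto the pronoun.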
In one tense, the imperative, the tone of verbs is determined by properties of segments in the verb. Relevant segmental factors include both the voicing of the root-initial consonant and the height of the vowel. We begin by considering verb roots with a non-high vowel, where the consonant determines the verb’s tone. In (33a) we observe voiceless consonants conditioning a H tone, and in (33b) we see implosives and glottalized sonorants triggering H tone. (33) a. Ҍѓ̗ ˾k hэ̗t fѓ̗r káp sát tã̗ b. ьál ѐár w’ár
‘fry’ ‘eat fufu dry’ ‘return’ ‘plant’ ‘sweep’ ‘braid’ ‘nail’ ‘insult’ ‘kill’
Ҍэ̗ ˾ k há frѓ̗k klѓ̗w4 šѓ̗ ˾ѓ̗ ˾ tám ьѓ̗s ѐáw
‘braise’ ‘give’ ‘scratch’ ‘squeeze’ ‘cut’ ‘chew’ ‘divide’ ‘hold’
In contrast, (34a) shows a L tone when the initial consonant is a voiced obstruent, and (34b) shows the same tone after a (voiced) plain sonorant.

(34) a. bэ̖m dѓ̖ѓ̖
‘play’ ‘dip fufu’
bàr dѓ̖f
‘cover’ ‘make soup’
dэ̖k gѓ̗ ˾ѓ̗ ˾ gràk jѓ̖l b. lѓ̖ làà mà na̖a˾ ̖˾ rэ̖t wa˾̖a˾̖ yѓ̖r
‘repeat’ ‘raise’ ‘put across’ ‘stoop’ ‘fall’ ‘hear’ ‘beat’ ‘undress’ ‘burn’ ‘speak’ ‘write’
dà gэ̖ƾ gàs ja̖a˾ ̖˾ lэ̖ƾ làs màƾ nyàà ràƾ wàk yэ̖k
‘want’ ‘huddle’ ‘sift’ ‘fray’ ‘bite’ ‘maltreat’ ‘carry’ ‘take’ ‘promenade’ ‘scratch’ ‘bathe’
The examples in (35) show that when the root begins with a prenasalized consonant (which is always voiced), the verb’s tone is L. (35) mbѓ̖t mbàr ƾgàà
‘stretch’ ‘bear’ ‘decapitate’
mbэ̖k ndѓ̖p ƾgàp
‘diminish’ ‘¿ll a hole’ ‘measure’
Finally, (36) shows that in polysyllabic verbs, the tone of both syllables is determined by the consonant property of the root-initial syllable. (36) šárák ƾgàràk
‘tear up’ ‘dress up’
hárák ƾgàràs
‘break’ ‘undercook’
Thus with non-high vowels, the choice of H vs. L is determined by the voicing of the first consonant, as predicted by Bradshaw’s model of consonant-tone interaction. The particular arrangement of which consonants behave as tone depressors is unique to Tupuri, but similar to the pattern of other languages. Voiced obstruents are depressors, which is the common case, and implosives (which are predictably, and not necessarily phonologically, voiced) are not tone depressors. Voiced sonorants are also tone depressors, which is not the common pattern (but is attested in some languages); the fact that glottalized glides are non-depressors in contrast to plain glides has no known analog in other languages, since glottalized glides have not been attested in languages with consonant-tone effects.5

Conceptualized in terms of Yip-Pulleyblank features, the feature analysis of the 4 Tupuri tones is analogous to that of Ewe. Taking into consideration the segmental analogs, though, L/voice is the opposite of upper register; that is, in Tupuri a better name would be “low register”. H is treated as the non-raised tone of the upper register (SH being the raised tone in that register), and L is the lower tone of the lower register (M being the raised tone of that register).

(37)
                  v̋ (SH)   v́ (H)   v̄ (M)   v̀ (L)
L/voi (upper)     –        –       +       +
hi (raised)       +        –       +       –
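The feature table in (37) can be restated as a small lookup, which makes it easy to check what happens when a single feature value is changed. This is a minimal sketch; the class name, field names and helper functions are hypothetical labels chosen for illustration, and the two-valued encoding assumes binary, fully specified features.

```python
from typing import NamedTuple


class Tone(NamedTuple):
    low_register: bool  # the L/voice feature of (37); True corresponds to "+"
    raised: bool        # the hi/raised feature of (37); True corresponds to "+"


# Feature values copied from (37).
TONES = {
    "SH": Tone(low_register=False, raised=True),
    "H": Tone(low_register=False, raised=False),
    "M": Tone(low_register=True, raised=True),
    "L": Tone(low_register=True, raised=False),
}
NAMES = {features: name for name, features in TONES.items()}


def add_low_register(tone: str) -> str:
    """Tone obtained when L/voice is added to the given tone."""
    return NAMES[TONES[tone]._replace(low_register=True)]


def add_raised(tone: str) -> str:
    """Tone obtained when [+raised] is added to the given tone."""
    return NAMES[TONES[tone]._replace(raised=True)]


if __name__ == "__main__":
    print(add_low_register("H"))  # the lower-register counterpart of H
    print(add_raised("H"))        # the raised counterpart of H
```

Under this encoding, H and L differ only in the register feature, and H and SH differ only in [raised], which is the property that the tonal alternations discussed in this section depend on.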
The H ~ L alternation is thus a consonantally-triggered register change: L/voice, a.k.a. “–upper”, spreads qua register from the initial consonant, and combines with an existing [–raised] (i.e. H) tone to yield the lowest tone, L. The rule is essentially the same as the standard spreading of low register (L/voice) in (23), targeting a [–raised] tone in the imperative. (Spreading of low register from a consonant is not a general phenomenon in the language; it is morphologically restricted to the imperative.) The imperative tense itself is characterized by a floating H ([–raised]) tone, which spreads to all vowels. This results in the following contrast.

(38) [Diagram not reproduced: autosegmental representations of šárák and ngàràk. In both forms the Tone node is linked to [–raised]; in ngàràk, L/v is additionally linked to the Tone node.]
Whether or not [šárák] is further specified [–L/v] by default depends on whether features are binary and fully specified in the output.

However, when the vowel of the verb root is [+high], a different pattern is found. Consider first the examples in (39), where the initial consonant is a non-depressor and the syllable is either VV or VR (R = liquid, glide or nasal). Observe that the tone pattern of the syllable is H-SH.

(39) Ҍܞyկ þ ƭ ̗ ƭկ húlկ kíÕկ kúlկ ьílկ
‘arrange’ ‘start’ ‘cover’ ‘turn around’ ‘have blisters’ ‘entertain’
Ҍúrկ f u˜֤ yկ k҂̗rկ kíƾկ síÕկ ѐúĦ
‘Ày’ ‘pull’ ‘¿ght’ ‘spend the year’ ‘announce’ ‘pound’
I assume that sonorant coda consonants are moraic, thus the surface generalization is that the first mora has H and the second has SH. Compare the following examples with the same initial consonant and an obstruent coda: the verb in the imperative has just SH on its one mora.

(40) ҌÕկ k hѢկ k krѢկ k ѐѢկ k
‘pant’ ‘dry’ ‘scratch’ ‘think’
þÕկ k k҂կp tĦf
‘pound’ ‘cover’ ‘spit’
A compact statement of the distribution of tone in the imperative, when the vowel is [+high], is that the final mora bears SH tone. As the data of (41a,b,c) show, the phonatory properties of the root-initial consonant do not influence tone.

(41) a. b´҂lկ dƭ ̗ ƭկ gúmկ b. l҂̗ƾկ lѢ̗wկ r҂̗mկ rúĦ c. ndúĦ ndúlկ
‘open’ ‘deform’ ‘beat millet’ ‘try’ ‘taste’ ‘pinch’ ‘yell at’ ‘come’ ‘pierce’
b҂̗mկ gírկ gúnկ líƾկ m҂̗nկ ríƾկ wíÕկ ngílկ
‘thrash’ ‘stir sauce’ ‘witch’ ‘learn’ ‘break’ ‘advise’ ‘say’ ‘provoke’
Analogous monomoraic roots with an initial voiced consonant are seen in (42). (42) a. dĦk gĦp b. lÕկ k rĦk yѢկ k c. ngÕկ t
‘vomit’ ‘suffocate’ ‘swallow’ ‘ripen’ ‘dry in sun’ ‘turn’
g҂կs
‘taste’
lĦp w҂կt
‘immerse’ ‘swell up’
Finally, disyllabic verbs can be observed in (43). (43) b҂̗l҂կl
‘roll over’
ndúlĦp
‘overcook meat’
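The surface generalizations about imperative tone described so far can be summarized in a few lines. This sketch states only the observed distribution, not the rule-based analysis; the function name and its parameters are hypothetical, and the caller is assumed to supply the number of tone-bearing units (moras for high-vowel roots, on the assumption stated above that sonorant codas are moraic).

```python
from typing import List


def imperative_surface_tones(initial_is_depressor: bool,
                             vowel_is_high: bool,
                             n_units: int) -> List[str]:
    """Surface tones of a Tupuri imperative, one per tone-bearing unit.

    initial_is_depressor: True for voiced obstruents, plain voiced
        sonorants and prenasalized consonants; False for voiceless
        consonants, implosives and glottalized sonorants.
    vowel_is_high: True if the root vowel is [+high].
    n_units: number of tone-bearing units supplied by the caller.
    """
    if n_units < 1:
        raise ValueError("need at least one tone-bearing unit")
    if vowel_is_high:
        # High-vowel roots: the final unit bears SH and any preceding
        # unit bears H, whatever the initial consonant is.
        return ["H"] * (n_units - 1) + ["SH"]
    # Non-high roots: one tone throughout, chosen by the initial consonant.
    return ["L" if initial_is_depressor else "H"] * n_units


if __name__ == "__main__":
    print(imperative_surface_tones(False, False, 2))  # pattern of a root like 'tear up'
    print(imperative_surface_tones(True, False, 2))   # pattern of a root like 'dress up'
    print(imperative_surface_tones(True, True, 2))    # bimoraic high-vowel root
    print(imperative_surface_tones(False, True, 1))   # monomoraic high-vowel root
```

Note that the consonant class is simply ignored in the high-vowel branch, which is the descriptive fact that the analysis has to explain.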
The analysis of this pattern is as follows. A segmentally conditioned rule, Imperative Raising, changes the H tone of the imperative, bleeding the consonantally-triggered tone-lowering rule motivated above. By Imperative Raising, a [+high] vowel raises H to SH on the last mora, so the post-depressor tone is no longer [–raised]. In Yip-Pulleyblank terms, [+upper, –raised] becomes [+raised] when there is a high vowel in the root. The central theoretical question, then, is how the tone feature [+raised] can be acquired from a [+high] vowel, in a theory which prohibits arbitrary feature insertions.

The solution is simple, and is parallel to the equation of voicing and L tone (register). In Tupuri, the tonal feature [raised] and the vowel feature [hi] are one and the same feature. When linked to a tone node, [hi] is realized as the high tone in the register, i.e. [+raised], and when linked to a vowel place node, it is realized as a high vowel. The rule is formalized in (44).

(44) Imperative raising (in the imperative)
[Diagram not reproduced: the feature [hi/raised], linked to a V-place node, is also linked to a Tone node.]

The effect of this rule is seen in (45).

(45) [Diagram not reproduced: two Tone nodes (T T) over nd u, with [hi/raised] linked to Vpl; output = [ndú ].]

This solution to the problem of height-conditioned tone raising in Tupuri is possible only if features are abstract – they do not intrinsically describe specific physical events.

5. Conclusions
A central claim of Radical Substance Free Phonology is that features are not universally pre-defined, and that only the formal mechanism of features exists as part of universal grammar. Each feature must be learned on the basis of the fact that it correctly defines classes of segments within the grammar. A prediction of this claim is that a given phonetic fact or phonological contrast could be analysed into features in a number of different ways. We have seen above that this is the case, regarding the analysis of tone-heights.6

While the theory formally allows essentially any logically coherent organization of segments into classes, mediated by learned features, this does not mean that the theory predicts that all or many of those computational possibilities will actually be realized. As emphasized by Hale and Reiss (2008), attested languages are a function not just of the theory of computation, but also are indirectly the result of extragrammatical constraints on acquisition and language change, which determine the nature of the data which a child uses to induce a grammar. Grammars are created by children in response to language facts, so any factors that could diachronically affect the nature of the primary data could indirectly influence the shape of a synchronic grammar. Since phonetic factors obviously affect what the child hears and thus the inductive base for grammatical generalizations, there are reasons for grammatical processes to have a somewhat phonetically-natural appearance.

Is there a sensible functional explanation for why such a correlation between vowel height and tone raising would have arisen? The phenomenon of intrinsic pitch is well-known in phonetics – high vowels universally have higher F0 than comparable non-high vowels, on the order of 15 Hz (see Whalen and Levitt 1995 for a crosslinguistic study).
This is often explained by mechanical pulling on the larynx by the tongue, increasing vertical tension on the vocal cords, and is sometimes explained based on perception of F1 with reference to F0 (high vowels are ones where F1 and F0 are close, thus raising F0 in high vowels enhances this percept). This effect is generally believed to be imperceptible (Silverman 1987, Fowler and Brown 1997), but see Diehl and Kluender (1989a,b), who claim that intrinsic pitch is under speaker control. It is apparent that this phonetic tendency was in fact “noticed” and amplified by pre-modern Tupuri speakers, yielding the grammaticalization of an earlier low-level physically-based trend. While extremely rare,7 such patterns are essential to understanding the nature of “possible grammars”.

Notes

1. See Bromberger and Halle (2000) for a more contemporary affirmation of the phonetic grounding of features, in terms of an intention to produce an articulatory action.
2. Standard feature names may be conventionally kept; or, features may be labeled with arbitrary indices such as F5, as noted in Hall (2007). There is no significance to the name assigned to a feature in a language in RSFP, any more than SPE phonology claims that the “intrinsic name” for the tongue-blade raising feature is [coronal] rather than [lingual]. Whether features are binary or privative, on the other hand, is a fundamental question about UG.
3. The formal details of plane- and tier-sensitivity remain to be worked out, in the framework of the theory of adjacency conditions developed in Odden (1994).
4. Notice that the stem-initial consonant is a voiceless obstruent, a non-depressor, but the consonant immediately preceding the H-toned vowel is a sonorant, a depressor. The alternation between H and L is triggered by the root-initial consonant.
5. Words which might be thought to begin with a vowel have a noticeable phonetic glottal stop, which is preserved phrasally, and is included in the transcriptions here. Note that glottal stop, if it is phonologically present, does not behave as a tone-depressor, see e.g. Ҍѓ̗k ‘fry’. This is somewhat noteworthy, since in Kotoko (Odden 2007), glottal stop is a tone depressor.
6. A further example of feature-duality is the relationship between vowel laxness and L tone, demonstrated by Becker and Jurgec (nd) for Slovenian, which they argue has tone-lowering alternations triggered by lax mid vowels.
7. To the best of my knowledge, the only other case of synchronically-motivated tone / vowel-height connection is found in certain Japanese dialects, discussed in Nitta (2001).
References

Archangeli, Diana and Douglas Pulleyblank
1994 Grounded Phonology. Cambridge, MA: MIT Press.
Becker, Michael and Peter Jurgec
nd Interactions of tone and ATR in Slovenian. http://roa.rutgers.edu/files/995-1008/995-BECKER-0-0.PDF.
Blaho, Sylvia
2008 The Syntax of Phonology: A Radically Substance-Free Approach. Ph.D. diss., University of Tromsø.
Bradshaw, Mary
1999 A Crosslinguistic Study of Consonant-Tone Interaction. Ph.D. diss., The Ohio State University.
Bromberger, Sylvain and Morris Halle
2000 The ontology of phonology (revised). In: Noel Burton-Roberts, Philip Carr and Gerard Docherty (eds.), Phonological Knowledge: Conceptual and Empirical Issues, 18–37. Oxford: Oxford University Press.
Campbell, Lyle
1974 Phonological features: problems and proposals. Language 50: 52–65.
Chomsky, Noam and Morris Halle
1968 The Sound Pattern of English. New York: Harper & Row.
Clements, G. Nick
1978 Tone and syntax in Ewe. In: Donna Jo Napoli (ed.), Elements of Tone, Stress, and Intonation, 21–99. Washington: Georgetown University Press.
1983 The hierarchical representation of tone features. In: Ivan R. Dihoff (ed.), Current Approaches to African Linguistics, Volume 1, 145–176. Dordrecht: Foris.
1985 The geometry of phonological features. Phonology Yearbook 2: 225–52.
1991 Place of articulation in consonants and vowels: a unified theory. Working Papers of the Cornell Phonetics Laboratory 5: 77–123.
Clements, G. Nick and Elizabeth Hume
1995 The internal organization of speech sounds. In: John Goldsmith (ed.), The Handbook of Phonological Theory, 245–306. London: Blackwell.
Diehl, Randy and Keith Kluender
1989a On the objects of speech perception. Ecological Psychology 1: 121–144.
1989b Reply to commentators. Ecological Psychology 1: 195–225.
Dresher, B. Elan, Glyne Piggott and Keren Rice
1994 Contrast in phonology: overview. Toronto Working Papers in Linguistics 14: iii–xvii.
Foley, James
1977 Foundations of Theoretical Phonology. (Cambridge Studies in Linguistics 20.) Cambridge: Cambridge University Press.
Fowler, Carol A. and Julie M. Brown
1997 Intrinsic F0 differences in spoken and sung vowels and their perception by listeners. Perception and Psychophysics 59: 729–738.
Hale, Mark and Charles Reiss
2000 “Substance abuse” and “dysfunctionalism”: current trends in phonology. Linguistic Inquiry 31: 157–169.
2008 The Phonological Enterprise. Oxford: Oxford University Press.
Hall, Daniel Currie
2007 The role & representation of contrast in phonological theory. Ph.D. diss., University of Toronto.
Halle, Morris and Ken Stevens
1971 A note on laryngeal features. RLE Quarterly Progress Report 101: 198–312. MIT.
Harris, John and Geoff Lindsey
1995 The elements of phonological representation. In: Jacques Durand and Francis Katamba (eds.), Frontiers of Phonology: Atoms, Structures, Derivations, 34–79. London, New York: Longman.
Hayes, Bruce, Robert Kirchner and Donca Steriade
2004 Phonetically-Based Phonology. Cambridge: Cambridge University Press.
Hjelmslev, Louis
1939 Forme et substance linguistiques. In: Essais de linguistique II. Copenhagen: Nordisk Sprog- og Kulturforlag.
Hume, Elizabeth
1994 Front Vowels, Coronal Consonants and their Interaction in Nonlinear Phonology. New York: Garland.
Jakobson, Roman, C. Gunnar M. Fant and Morris Halle
1952 Preliminaries to Speech Analysis: The Distinctive Features and their Correlates. Technical Report 13. Massachusetts: Acoustics Laboratory, MIT.
Maddieson, Ian
1972 Tone system typology and distinctive features. In: André Rigault and René Charbonneau (eds.), Proceedings of the 7th International Congress of the Phonetic Sciences, 957–61. The Hague: Mouton.
Maspéro, Henri
1912 Etudes sur la phonétique historique de la langue annamite. Les initiales. Bulletin de l’Ecole française d’Extrême-Orient, Année 1912, Volume 12, Numéro 1: 1–124.
McCarthy, John
1988 Feature geometry and dependency: a review. Phonetica 43: 84–108.
Morén, Bruce
2003 The Parallel Structures Model of feature geometry. Working Papers of the Cornell Phonetics Laboratory 15: 194–270.
2006 Consonant-vowel interactions in Serbian: features, representations and constraint interactions. Lingua 116: 1198–1244.
Nitta, Tetsuo
2001 The accent systems in the Kanazawa dialect: the relationship between pitch and sound segments. In: Shigeki Kaji (ed.), Cross-Linguistic Studies of Tonal Phenomena, 153–185. Tokyo: ILCAA.
Odden, David
1994 Adjacency parameters in phonology. Language 70: 289–330.
2006 Phonology ex nihilo. Presented at University of Tromsø.
2007 The unnatural phonology of Zina Kotoko. In: Tomas Riad and Carlos Gussenhoven (eds.), Tones & Tunes Vol 1: Typological Studies in Word and Sentence Prosody, 63–89. Berlin: de Gruyter.
Pulleyblank, Douglas
1986 Tone in Lexical Phonology. Dordrecht: Reidel.
Roberts-Kohno, R. Ruth
2000 Kikamba phonology and morphology. Ph.D. diss., The Ohio State University.
Sagey, Elizabeth
1986 The representation of features and relations in nonlinear phonology. Ph.D. diss., Massachusetts Institute of Technology.
Sampson, Geoffrey
1969 A note on Wang’s “Phonological features of tone”. International Journal of American Linguistics 35: 62–6.
Samuels, Bridgett
2009 The structure of phonological theory. Ph.D. diss., Harvard University.
Silverman, Kim
1987 The structure and processing of fundamental frequency contours. Ph.D. diss., University of Cambridge.
Trubetzkoy, Nikolai S.
1969 Grundzüge der Phonologie [Principles of Phonology]. Translated by Christiane A. M. Baltaxe. Berkeley: University of California Press. Original edition: Travaux du cercle linguistique de Prague 7, 1939.
Wang, William
1967 Phonological features of tone. International Journal of American Linguistics 33: 93–105.
Whalen, Doug and Andrea Levitt
1995 The universality of intrinsic F0 of vowels. Journal of Phonetics 23: 349–366.
Woo, Nancy
1969 Prosody and phonology. Ph.D. diss., Massachusetts Institute of Technology.
Yip, Moira
1980 The tonal phonology of Chinese. Ph.D. diss., Massachusetts Institute of Technology.
Downstep and linguistic scaling in Dagara-Wulé

Annie Rialland and Penou-Achille Somé

1. Introduction
This chapter is in keeping with George N. Clements’s work on tonal languages and downstep, particularly his work with Yetunde Laniran on Yoruba downstep. Moreover, it concerns subjects which were very important in Nick’s heart and life: phonology, music and Africa. There is a tradition of analysing downstep in musical terms: intervals, register, key-lowering. The present chapter continues and extends this tradition, proposing a tighter bridge between linguistic and musical scalings, at least in some African tone languages.

Downstep has been studied in many languages and from various points of view. In this chapter, we concentrate on phonetic studies that deal with the nature of intervals in downstep, their calculation, and the whole “geometry” of the system with its reference lines. Depending upon the language, various types of scaling and reference lines have been found. However, other factors aside, all downstep calculations involve a constant ratio between downsteps. A constant ratio is precisely also a central characteristic of musical intervals.

We begin by presenting the various types of downstep which have been found, taking into account reference lines and scalings (both parameters are linked, as shown below in (1)). Then, by studying Dagara downstep, we propose that it can be viewed as being based on (roughly) equal intervals within a musical scale (semitones), in the same way as in another African language, Kono, whose tonal system shares many common points with that of Dagara (J. T. Hogan and M. Manyeh 1996). As downstep cannot be an isolated phenomenon, we expect other manifestations of musical-type intervals. Using evidence from a repetition task, we hypothesize that the remarkable parallelism found among the five speakers’ productions is due to a common linguistic scale involving musical-type intervals (tones, semitones, cents). Thus, we bring together two types of arguments in favor of a linguistic scaling based on a musical scaling. Finally, as a first step in comparing downstep scaling and Dagara-Wulé music scaling, we present a preliminary study of the scaling of an eighteen-key xylophone which also involves (roughly) equal intervals. We begin by providing some background on the studies of downstep (or “downdrift”), their scalings and the reference lines involved in their calculation.

2. Downsteps, scalings and reference lines
The phonetic implementation of downstep (either “automatic downstep”, also called “downdrift”, or non-automatic downstep, which is always termed “downstep”) has been studied in various languages. Since the seminal article by Liberman and Pierrehumbert (1984), the calculation of downstep has been found to involve a ratio (or some type of unit related in an exponential way to hertz, such as ERBs) and reference lines. Based on available studies of downstep, we can distinguish three types, depending upon the reference line involved in their calculation:
1) Downstep with an H tone reference line;
2) Downstep with an asymptote between the last H and the bottom of the speaker’s range;
3) Downstep without a reference line.
We will see that the whole geometry of a downstep system generally involves several reference lines with various roles (see below).

The best studied language with an H tone reference line is Yoruba. In-depth studies were performed by Y. Laniran and G. N. Clements (see Laniran 1992; Y. Laniran and G. N. Clements 2003). Yoruba is a three-tone language spoken in Nigeria. Its downstep (or downdrift) is triggered by L tones alternating with H tones. There is no non-automatic or distinctive downstep; for example, there is no downstep due to a floating L tone. The fact that this downstep is not distinctive is important, as it can be absent without any loss of distinctivity. Studying the phonetic realizations of downstep, Laniran and Clements (2003) found that the basic H tone value (given by H realizations in all-H tone sequences) provides a reference line for the realizations of downstepped H in a sequence of alternating H and L tones. This reference line is reached by the second or third downstepped H tone. In the following graph, reproduced from Laniran (1992), values of H in all-H-tone utterances are represented by empty circles. Black circles correspond to the F0 values of tone realizations in an utterance with alternating H and L tones, and triangles represent values of L tones in an all-L-tone utterance.
[Figure 1 plot not reproduced; vertical axis labelled from 60 to 110, horizontal axis showing the successive L and H tones of the utterance.]

Figure 1. F0 curves of all-H-tone utterances (empty circles), all-L-tone utterances (empty triangles) and utterances with alternating L and H tones (black circles). Reproduced from Laniran (1992).
In the alternating H and L realization, we observe that the second H, which is downstepped, is lowered almost to the basic H value and that the following one is right on the H tone line. Once this H tone reference line is reached, the following Hs are not lowered any more but instead are realized on the H tone reference line. We note also that the first HL interval is the largest one. Downstep strategies vary depending upon the speaker, thus determining the number and size of the downsteps above the “basic tone value” and the ways of “landing” on the H reference line (“soft” landing / “hard” landing). “Soft” landing, observed in one speaker out of three, refers to an asymptotic decay, while “hard” landing is a more abrupt pitch lowering.

In many languages, the lowering of H tones is not limited by an H reference line but is asymptotic to a reference line below the last H tone. In their article on English downstep, Liberman and Pierrehumbert (1984) recognize equal steps between downsteps, given an exponential scale based on a constant ratio d and a reference line. Consider the equation that they propose:

H_n = d(H_{n-1} - r) + r

H_n is the F0 of the nth downstep, d is a ratio (between 0 and 1), H_{n-1} is the value in hertz of the (n-1)th downstep, and r is a reference line and the asymptote of the system. The reference line is between the last H and the bottom of the pitch range. It is an abstract line, without linguistic meaning. In the space over the reference line, the ratio between a downstep and a following one is constant (0.8, for example). This means that the steps between downsteps are equal within an exponential scale related to the hertz scale.

A similar equation has been found to be a good predictor of downstep (more precisely downdrift) in Chichewa, a two-tone Bantu language (Myers 1996). In the same line of analysis, Pierrehumbert and Beckman (1988) proposed a calculation of Japanese downstep with equal steps and a reference line below the last H tone. However, the calculation is more complex, as a “conformed space” was introduced. In Spanish, Prieto et al. (1996) propose a model without an asymptote but with a limitation at a rather low level within speakers’ pitch ranges, which is generated by the equation itself.

A second asymptote (for the L tones) has been included in the calculation to account for downstep in two African languages: Igbo (Liberman et al. 1993) and Dschang Bamileke (Bird 1994). Igbo is a two-tone Kwa language with downdrift and phonological downstep. Interestingly, Liberman et al. (1993) note that their equation fits downdrift realizations but not downstep realizations, which indicates that, in Igbo, downstep and downdrift are phonetically different. Dschang-Bamileke is known for its complex tonology. It has no downdrift and the phonological nature of its downstep has been debated (“partial”/“total”). Bird’s mode of calculation is partly different from Liberman et al.’s and is applied to alternating L and !H tones and not to sequences of downsteps (H!H!...).

Van Heuven (2004) studied the realization of Dutch downstep, which is not an “automatic downstep” as in English, Spanish or Japanese. It does not result from the alternation of H and L tones but can be analyzed as triggered by floating L tones and H spreading, as in many African languages. Thus, sequences of downsteps in Dutch are realized as successions of terraces.
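The behaviour of the Liberman and Pierrehumbert equation quoted earlier in this section can be explored numerically. The sketch below is illustrative only: the function names and the starting values (200 Hz, d = 0.8, r = 100 Hz) are hypothetical choices, not measurements from any of the studies cited. It also shows, under the same assumptions, that when the reference line is set to zero a constant ratio in hertz corresponds to a constant interval in semitones.

```python
import math
from typing import List


def downstep_series(h0: float, d: float, r: float, n_steps: int) -> List[float]:
    """F0 values (Hz) of an initial H and n_steps successive downstepped Hs,
    using H_n = d * (H_{n-1} - r) + r."""
    values = [h0]
    for _ in range(n_steps):
        values.append(d * (values[-1] - r) + r)
    return values


def interval_in_semitones(f_high: float, f_low: float) -> float:
    """Size of the musical interval between two frequencies, in semitones."""
    return 12.0 * math.log2(f_high / f_low)


if __name__ == "__main__":
    # With a reference line, the distance above r shrinks by the ratio d,
    # so the series approaches r without reaching it.
    with_reference = downstep_series(h0=200.0, d=0.8, r=100.0, n_steps=5)
    print([round(v, 1) for v in with_reference])

    # With r = 0, each downstep is a constant ratio of the preceding one,
    # i.e. a constant interval on a logarithmic (semitone) scale.
    without_reference = downstep_series(h0=200.0, d=0.8, r=0.0, n_steps=5)
    steps = [interval_in_semitones(a, b)
             for a, b in zip(without_reference, without_reference[1:])]
    print([round(s, 3) for s in steps])
```

In the first series the intervals between successive values are unequal when measured in semitones from zero, whereas in the second they are identical, which is one way of seeing why a reference line is needed in some models and not in others.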
Van Heuven found that equal steps between downsteps can be retrieved, provided that measurements are given in ERB. The conversion of hertz into ERB involves a logarithm, which again means an exponential relationship. In this view, there is no need for a reference line in the equation. Moreover, Van Heuven noticed three reference values (or lines) in the system: 1) an H value, which is the value of the first H tone of an utterance, independently of the number of downsteps it contains; 2) a last H value, the value of the last H in a sequence of downsteps, independently of the number of downsteps; 3) a last L value, the ending point of all utterances. Note that these lines, which will also be found in Dagara (see section 2), do not intervene in the downstep calculation in Dutch.

The last study that we will mention, by J. T. Hogan and M. Manyeh (1996), concerns Kono, a two-tone Mandé language with automatic downstep (downdrift) and phonological downsteps. The utterances that were studied contained automatic as well as non-automatic downsteps (sequences such as H!H). These authors found equal steps between downsteps when measurements were in musical intervals. Consequently, they did not need a reference line in their calculation of downsteps. The relationship between two downsteps can simply be expressed by the number of semitones between them.

From the preceding review of the articles on downdrift/downstep, we can draw one important conclusion: equal steps without an asymptote (in ERB or semitones) have been found in languages with sequences of downsteps (H!H) or terraces, as in Dutch or Kono. In the next part, we consider Dagara downsteps, more precisely the realizations of “non-automatic” downsteps, i.e. downsteps triggered by a floating L tone, showing that they are “equal-step” downsteps based on measurements in musical intervals. We also consider Akan-Asante downsteps based on Dolphyne’s data (1994), proposing the same kind of analysis as in Dagara. Some data on Dagara downdrift are also considered, showing that downstep and downdrift with alternating H and L tones are implemented differently.

1. Dagara-Wulé downstep
The present study concerns the Wulé dialect of Dagara, spoken in Burkina Faso. Dagara is a Gur language of the Oti-Volta sub-family. The main references on Dagara-Wulé tonology and phonology are: Systématique du signi¿ant en Dagara, variété Wulé, by P.-A. Somé (1982), “L’inÀuence des consonnes sur les tons en dagara: langue voltaïque du Burkina Faso”, P. A. Somé (1997), and “Dagara downstep: How speakers get started” (A. Rialland and P.A. Some. (2000). The last publication provides our starting point in this chapter. Dagara-Wulé is a two-tone language, with many polar tone af¿xes. As a result, all L tone utterances are very short; it is not possible to get long sequences of L tones, which could provide a reliable L reference line, similar to the Yoruba one. We will consider only the H reference line, the relevant line for our purposes. Besides downdrift (or “automatic” downstep) triggered by L tones realized on a syllable, Dagara has phonological downstep due to Àoating L tones. These Àoating L tones occur in many words or across word boundaries and, consequently, an utterance could contain several downsteps, such as the following one:
Downstep and linguistic scaling in Dagara-Wulé 113
(1) dábá ƾmá!Ĕ jܭғ!Ď ló!ná
    HH   H’H   H’H   H’H
    HH   HLH   HLH   HLH
    1 1  1 2   2 3   3 4
    man turtledove egg fell down
    “The egg of the man’s turtle dove fell down”

The numbers represent the levels of the H tones in a traditional way: 1 being the highest H tone and 4 the lowest H tone. Downsteps are due to floating L tones, which are indicated with an underlined L in the second line of tonal notation. Note that a coda consonant is moraic and bears a tone. In this example, -Ĕ at the end of ƾmá!Ĕ and –Ď at the end of jѓ̗!Ď both bear a downstepped H tone. The phonological nature of these downsteps, due to floating L tones, is antagonistic to the presence of a H reference line similar to the Yoruba one, as they are distinctive and cannot simply be cancelled as is the case with Yoruba non-distinctive downstep. Considering the Dagara-Wulé system, the questions are the following: How does Dagara keep its phonological downsteps realized? How do the downstep intervals vary? Are they kept constant or not? Are there asymptotes, baselines? Are there anticipatory raisings associated with this type of downstep and the sequences of downsteps, and what is their nature (phonological, phonetic or paralinguistic)? Attempting to answer these questions, we will mainly consider two corpora. The first corpus contains sentences with an increasing number of downsteps as well as all-H-tone sentences of various lengths, and the second one includes a large set of sentences, randomly selected. They also differ in the task involved: reading for the first corpus and repetition after the second author for the second corpus.

2.1. Analysis of a read corpus

Consider the first corpus, which includes utterances with an increasing number of H tones, such as the following:

1. dábá “a man”
2. dábá bíé “a man’s child”
3. dábá bíé pܧғg táráná “a wife of a man’s child is arriving”
It also contains utterances with an increasing number of downsteps such as the following:
Annie Rialland and Penou-Achille Somé

3. dábá
   “a man”
4. dábá ƾmá!Ĕ
   man turtle-dove
   “a man’s turtle-dove”
5. dábá ƾmá!Ĕ jܭғ!Ď lón!á
   man turtle-dove egg fell down
   “the egg of the man’s turtle-dove fell down”
6. dábá ƾmá!Ĕ jܭғ!Ď pú!rá ƾmȓ!ná pݜғܧғ
   man turtle-dove egg burst sun in
   “the egg of the man’s turtle-dove bursts in the sun”
7. báá!rܭғ ƾmá!Ĕ jܭғ!Ď pú!rá ƾmȓ!ná pݜғܧғ
   Baare turtle-dove egg burst sun in
   “the egg of Baare’s turtle-dove bursts in the sun”
The sentences were read and recorded in various orders, interspersed with distractors, by three bilingual French-Dagara male speakers in Paris. Each sentence was presented at least twice and repeated three times. Some results based on this corpus have been presented in Rialland and Somé (2000). A fourth male speaker (speaker A) was recorded in Burkina Faso. In Rialland and Somé (2000), we found that, as in Yoruba, all-H-tone utterances are basically flat with an optional final lowering and that they do not exhibit any anticipatory raising associated with their length or the number of H tones that they contain. The flatness of all-H-tone utterances and the absence of anticipatory raising in these sentences are also confirmed by the second corpus and will be exemplified by examples taken from this corpus (see Figures 7 and 8). In the following paragraphs, we will refer to the regression lines of the all-H-tone utterances which were calculated in Rialland and Somé (2000). In utterances with downstep, we will consider the nature of the intervals and the question of the equality of intervals between downsteps. We begin with the following graphs that show the realization of five-downstep utterances (5D utterances) by four Dagara-Wulé speakers. The sentence is the following:

7. báá!rܭғ ƾmá!Ĕ jܭғ!Ď pú!rá ƾmȓ!ná pݜғܧғ
   Baare turtle-dove egg burst sun in
   “the egg of Baare’s turtle-dove bursts in the sun”
Measurements were taken so as to minimize consonantal influence and transitional effects (in general, in the middle of the vowels). F0 is measured on the following syllable when a downstep domain begins with a moraic consonant. Thus, the value of the second downstep (point 3 on the abscissa) is measured on the syllable jܭғ!Ď. The F0 value is taken on the second syllable when two syllables form the downstep domain, on ƾmȓ ! (point 5) in pú!rá ƾmȓ!ná, for example. Each point corresponds to the mean of 6 repetitions.

Figure 2. Downstep F0 measurements in 5D-utterances, as realized by 4 Dagara speakers (A, B, C and D). (Vertical axis: semitones; horizontal axis: successive H tones, numbered 1 to 6 and separated by downsteps.)

The unit chosen for this graph is the semitone (unlike in Rialland and Somé 2000). The conversion between hertz and semitones is based on the following equation: $f_{st} = 12 \log_2 (f_{hz}/127.09)$, with a 0 line at 127.09 Hz, a reference value which can be used for male and female voices (cf. Traunmüller and Eriksson 1995, Traunmüller 2005). This line is helpful for comparing speakers. The semitone is used as the unit in this graph since semitones are considered the best unit for speaker comparisons (see Nolan 2003, in particular) and since we expect semitones to be the appropriate units for our study of downstep (see Hogan and Manyeh, 1996, for Kono). The mean values of downstep intervals, referred to as i, are the following: speaker A: ⟨i⟩ = 1.8 st (σ = 0.3 st), speaker B: ⟨i⟩ = 2 st (σ = 0.2 st), speaker C: ⟨i⟩ = 1 st (σ = 0.2 st), speaker D: ⟨i⟩ = 1 st (σ = 0.2 st). The downsteps are roughly equal along the utterances (σ being between 0.2 and 0.3 st) but differ depending upon the speaker (mean values are between 1 and 2 st). This speaker-dependent difference is not surprising: it is related to the pitch range of each speaker. Two speakers (A and B) have a relatively large and similar pitch range (9 semitones), while being different in terms of global pitch height (speaker A being 7 semitones above speaker B). The two other speakers (C and D) have a much smaller pitch range (5 semitones) and differ
only slightly in terms of global pitch height (2 semitones). These values of downsteps (between 1 and 2 semitones) are rather small, as pointed out by various Africanists who listened to our recordings. Note that since downstep intervals in semitones are equal (or roughly equal), the ratio in hertz between any two successive downstepped H tones is constant (or nearly constant). This ratio is comparable to the constant d in Liberman and Pierrehumbert’s (1984) equation. Constant ratios could account for F0 values if the intervals in semitones remain stable, in the following way: $H_n = d \cdot H_{n-1}$. The difference with Liberman and Pierrehumbert’s equation is that there is no reference line (r), as in the calculation of musical intervals. In our 5D utterances, the mean values of the ratios between two downsteps are the following: Speaker A: 0.90, Speaker B: 0.89, Speaker C: 0.94, Speaker D: 0.94. These mean ratios account rather well for the measurements, as the standard deviation of the differences between measured values and predicted values based on these mean ratios is 2 Hz and as no difference between a measured value and a predicted value exceeds 4 Hz. We now consider graphs of utterances with between one and six downsteps by our four speakers. There is an overlay of three reference lines (the all-H-tone regression line is dashed and the last H reference line for
Figures 3a, b, c, d. Downstep F0 measurements in 1–7D utterances, as realized by four Dagara speakers: (a) Speaker A, (b) Speaker B, (c) Speaker C, (d) Speaker D. Three reference lines are overlaid: the all-H-tone regression line is dashed and the two last H tone lines (one up to 5D utterances, the second in 5–7D utterances) are plain. An arrow also indicates the distance between the highest H and the all-H-tone regression line. (Vertical axis: semitones; horizontal axis: successive H tones separated by downsteps.)
1 to 4 downsteps as well as the last H reference line for a higher number of downsteps are plain). An arrow indicates the distance between the highest H and the all-H-tone regression line. A visual examination of these graphs indicates that downsteps within a given sequence (with 1, 2 … or 7D) tend to be equal, except for the last one in short utterances, which is larger. Note that Dagara provides a mirror image of Yoruba, as in Dagara it is the last step which tends to be larger and not the first one. In order to provide a quantitative approach to the question of the equality of downsteps, we will consider the ratios between downsteps, which are directly related to the size of intervals in semitones, as seen above, together with the means of these ratios and their standard deviations. Speaker A has rather equal downsteps in 5D and 6D utterances (mean ratio = 0.91, σ = 0.03) and in 3D utterances (mean ratio = 0.90, σ = 0.03), excluding the last step, which is larger. These values are very close to the ones we found in 4D utterances. Speaker B has fairly stable downstep intervals in 5D utterances (mean ratio = 0.90, σ = 0.03) and in 3D utterances (mean ratio = 0.90), with only two steps considered. Speaker C also has rather equal intervals in 5D utterances (mean ratio = 0.94, σ = 0.02) and in 3D utterances (mean ratio = 0.92). Note that intervals are slightly larger in 3D utterances than in 5D utterances, which indicates a slight compression of the intervals in longer utterances. Speaker D behaves in the same way as speaker C: he has equal steps in 5D utterances (mean ratio = 0.94, σ = 0.02) and in 3D utterances (mean ratio = 0.92). The stability of the ratios between two downsteps, as expressed by their means and standard deviations, confirms the visual examination of the graph in semitones: the intervals between downsteps are fairly equal in semitones (and, as a consequence, in the ratios corresponding to the musical intervals). We now consider the reference lines.
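The link between equal semitone intervals and constant hertz ratios discussed above can be checked numerically. The following Python sketch is only an illustration, not the authors' procedure: it applies the conversion with the 127.09 Hz reference, turns a downstep interval of i semitones into the ratio d = 2^(−i/12), and builds a series in which each H is d times the preceding one. The function names and the 180 Hz starting value are assumptions introduced for the example; the intervals are the mean values reported for speakers A to D.

```python
import math

REF_HZ = 127.09  # 0-semitone reference line used in the chapter


def hz_to_st(f_hz, ref_hz=REF_HZ):
    """Convert a frequency in hertz to semitones relative to ref_hz."""
    return 12 * math.log2(f_hz / ref_hz)


def interval_to_ratio(interval_st):
    """Ratio d between two successive H tones for a lowering of interval_st semitones."""
    return 2 ** (-interval_st / 12)


def downstep_series(h1_hz, interval_st, n_downsteps):
    """Apply H_n = d * H_(n-1) repeatedly; no reference line is involved."""
    d = interval_to_ratio(interval_st)
    series = [h1_hz]
    for _ in range(n_downsteps):
        series.append(d * series[-1])
    return series


# Mean downstep intervals (in semitones) reported for the four speakers.
mean_intervals = {"A": 1.8, "B": 2.0, "C": 1.0, "D": 1.0}
ratios = {spk: round(interval_to_ratio(i), 2) for spk, i in mean_intervals.items()}
print(ratios)

# Hypothetical starting value of 180 Hz, chosen only for illustration.
series = downstep_series(180.0, 1.8, 5)
steps = [hz_to_st(a) - hz_to_st(b) for a, b in zip(series, series[1:])]
print([round(s, 2) for s in steps])
```

Under these assumptions, the ratios obtained from the mean intervals agree, to two decimals, with the mean ratios reported for the 5D utterances, and every step of the generated series has the same size in semitones.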
In Yoruba, realizations of all-H-tone utterances provide a reference line for the realization of H tones. In Dagara, the H tone lines overlaid on our graphs are regression lines of all-H-tone utterances. We observe that the position of this line varies greatly from one speaker to another. For example, it is rather low in speaker A’s range, close to the last downstep in 1 to 4D utterances. It is higher within other speakers’ ranges, in particular in that of speaker B. Thus, the H tone line varies in a speaker’s range and is clearly not an asymptote, as many downstepped H tones are realized below its level. We now consider the final H tone reference line. We observe that final H tones tend to be realized on the same line, independently of the number
of downsteps. The tendency for the last tone to be realized on a baseline provides an explanation for the larger step when the sentence is shorter: a bigger step is needed to reach this baseline. Moreover, a second H tone baseline can be recognized for the final H tone of speakers A and B when utterances get longer. These two speakers lower their voice one more degree in order to implement an increased number of downsteps (up to six or seven), while the maximum number of downsteps realized by the other speakers is five. The arrow on the left of each graph indicates the difference between the highest H tone and the H tone line, and consequently the maximum amount of anticipatory raising in the realization of the H tones. The maximum can reach 6 st or be much smaller (2 st). These data show that all speakers start higher than the H tone reference line, if there is at least one downstep. However, the amount of anticipatory raising varies. Our corpus, which includes sentences with an increasing number of words and downsteps, clearly favors this anticipatory raising. Nonetheless, it should be noted that there is a clear-cut difference between the absence of anticipatory raising in all-H-tone utterances and its presence in sequences with H and L tone alternations. The large variation in anticipatory raisings suggests a global adjustment in pitch range in order to accommodate a larger number of intervals. Let’s recapitulate our conclusions. Dagara-Wulé downstep intervals tend to be fairly constant except for the final one, when expressed in semitones. Further, speakers have different pitch ranges and intervals. It was also found that the all-H-tone line is not an asymptote. Speakers tend to reach a baseline at the end of the downstep sequence and increase the last step, if necessary to reach it. There is also a lower baseline for some speakers, at the end of utterances with more than five downsteps.
Moreover, anticipatory raising is always present but its amplitude varies, depending upon the speakers. In the following section we briefly consider downdrift in order to compare its patterning with downstep.

2.2. Comparison with downdrift
Consider downdrift realization in speaker A’s utterances with alternating LH tones. At first blush, it can be seen that the pattern shows asymptotic effects. One of the signatures of an asymptotic pattern is the fact that the first step is
Figure 4. F0 measurements in utterances with L and H tone alternations, as realized by speaker A. (Vertical axis: semitones; horizontal axis: 14 successive tones alternating L and H.)
the largest. In this case, it is between 3 and 5 semitones, while it was around 2 semitones in the downstep realizations of the same speaker (see Figure 3a). The following steps then decrease rapidly. After the third step, the lowering tends to be quite small until the end of the utterance, where a final lowering is observed, strongly pulling down the pitch of the last H tone and slightly modifying the pitch of the penultimate H tone. In this chapter, we do not attempt to determine the calculation of this downdrift. However, like the other downdrifts mentioned previously, it does not have equal steps in terms of semitones and its general profile is asymptotic. It can be compared to a progressively dampened oscillation.

2.3. Similarities with Akan-Asante downstep and downdrift
Akan-Asante downstep seems to share many properties with Dagara downstep. Like Dagara, Akan-Asante has two tones with downstep. Further, anticipatory raisings related to downstep realizations have been observed in this language (Stewart, 1965). We converted data taken from Dolphyne (1994) into semitones and plotted them as we did for Dagara. The following graph (Figure 5) shows realizations of 3D utterances by five Akan-Asante speakers. The graph clearly shows regularity and confirms “the fairly uniform pitch drop” between downsteps noticed by Dolphyne on the basis of her
Figure 5. Downstep F0 measurements in a 3D utterance by five speakers in Akan-Asante (based on Dolphyne’s data, 1994). (Vertical axis: semitones; horizontal axis: measurement points 1 to 4.)
measurements in hertz and her own knowledge of Akan-Asante, as a native speaker of the language. Akan-Asante also seems to be similar to Dagara in terms of the difference between downstep and downdrift. Moreover, Akan-Asante has the signature of an “asymptotic” downdrift: a large first step.
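The “large first step” signature can be illustrated by contrasting two ways of generating a falling series of H values. The Python sketch below is a simplified illustration under stated assumptions, not a model fitted to the Dagara or Akan-Asante data: the equal-step series uses H_n = d · H_(n−1), while the asymptotic series assumes that the reference line r enters as H_n − r = d · (H_(n−1) − r), one common way of writing a decay toward a reference line. All parameter values are hypothetical, and only H values are modelled (the intervening L tones of a downdrift sequence are left out).

```python
import math


def step_sizes_st(series_hz):
    """Size in semitones of each successive lowering in a series of F0 values (Hz)."""
    return [12 * math.log2(a / b) for a, b in zip(series_hz, series_hz[1:])]


def equal_step_series(h1_hz, d, n_steps):
    """H_n = d * H_(n-1): no reference line, so every step is the same musical interval."""
    series = [h1_hz]
    for _ in range(n_steps):
        series.append(d * series[-1])
    return series


def asymptotic_series(h1_hz, d, r_hz, n_steps):
    """H_n - r = d * (H_(n-1) - r): values decay toward the reference line r (assumed form)."""
    series = [h1_hz]
    for _ in range(n_steps):
        series.append(r_hz + d * (series[-1] - r_hz))
    return series


# Hypothetical parameter values, chosen only for illustration.
equal_steps = step_sizes_st(equal_step_series(200.0, 0.90, 5))
asymptotic_steps = step_sizes_st(asymptotic_series(200.0, 0.60, 120.0, 5))

print("equal-step pattern:", [round(s, 2) for s in equal_steps])
print("asymptotic pattern:", [round(s, 2) for s in asymptotic_steps])
```

With these assumed values, the first series keeps the same interval throughout, whereas in the second the first step is the largest and the following steps shrink as the values approach r.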
2.4. Analysis of a corpus of repetitions

The first corpus, produced by bilinguals, included sentences with an increasing number of words and downsteps. We found that their structure could be easily predicted and this could influence anticipatory raisings. The second corpus involves speakers who are not bilingual. Additional recording sessions were organized in Burkina Faso, in the Wulé dialect region. However, as monolingual Dagara speakers are illiterate, the second author, who is a native speaker of Dagara-Wulé, read the sentences and had the speakers repeat after him. In this way, the data came from what could be referred to as a “repetition task”. Each speaker was recorded independently. The sentences were presented in three different random orders and repeated three times. Thus, each sentence was recorded nine times. A variety of repetition tasks have been used in psycholinguistics. Repetition tasks and shadowing tasks (an extreme form of repetition task)
have been used to test various theories of speech processing and to explore links between perception and production (cf. Marslen-Wilson 1973, Mitterer and Ernestus 2008, among others). In these various tasks, it has been shown that speakers do not imitate but instead rely on their own semantic, syntactic and phonological processing. In fact, the difference between repetition and imitation belongs to everyday experience. When repeating, speakers keep their own variants of consonants, vowels and phonological rules (for example, their own pattern of dropping schwas in French), while when imitating, they attempt to reproduce precisely the speaker’s variants. Keeping these observations in mind, let’s consider the second corpus, which includes forty utterances of different lengths and tone patterns, such as the following:

(8) à nȓb´ܭ wá n!á
    the people came
    “The people came.”

(9) à bíé bá kúl é
    the child Neg. is gone Neg
    “The child is not gone.”

(10) à sáá mon!á kà zݜғmܧғ y!í
     the rain rained and fish got out
     “It rained and the fish got out.”

(11) à bìbìl kݜҒn wón wúló
     the child whoNeg. hear adviser
     ݜݦғlܧғnݜғ kà bܧҒzܻҒ `ܭmá wúl
     him-it-is that hole-red Hab. advis´ε.
     “The child who does not listen to an adviser is the one who is advised by the red hole (=tomb)”

(12) nààn nááb! bà dàng bè mág!ángd ε
     chief cow Neg. never be other-side-river Neg
     “The chief’s cow must never be on the other side of the river” (proverb)

We begin by comparing the same long utterance as produced by the five speakers (1–5). The values are plotted in hertz (Figure 6a), ERBs (Figure 6b) and semitones (Figure 6c). The utterance is:

(13) à dáán bèná ƾmܻҒnná !yíd á dààká dìoĔ
     “There is millet beer in the compound of Nminna in Daaka’s house.”
Figure 6. F0 curves of the utterance: à dáán bèná ƾmѡ̖nná !yíd á dààká dìoĔ “There is millet beer in the compound of Nminna in Daaka’s house.” by 5 speakers, in hertz (Figure 6a), in ERBs (Figure 6b), and semitones (Figure 6c). (Horizontal axis: measurement points 1 to 13.)
Each point corresponds to three repetitions. There is one measurement on short vowels and two on long vowels or long diphthongs such as ion [yܧѺ:] in dìoĔ. The tessitura of the speakers differs as follows. Three male speakers (speakers 2, 4, 5) have a low-pitched voice, one male speaker (speaker 3) has a higher-pitched voice than the three other male speakers, and the female speaker (speaker 1) has the highest tessitura. This sample of voices covers one octave. At first blush, it can be seen that there is a striking parallelism between the five realizations, which is even clearer when represented in ERBs or semitones. We evaluated the parallelism numerically, using coefficients of variation of the differences between curves. If this coefficient is 0%, it means that the distance between two curves does not vary and, consequently, that the two curves are parallel. We compared the realizations of speakers 1–3, two at a time, leaving aside speakers 4 and 5, who are quite similar to speaker 2. The mean coefficients of variation between the curves of these speakers are: hertz: ⟨cv⟩ = 24%, ERBs: ⟨cv⟩ = 11%, semitones: ⟨cv⟩ = 10%. The higher score for hertz was expected, since it is well known that hertz is not a suitable unit to compare speakers. There is a slight advantage to using semitones as opposed to ERBs in terms of parallelism among the three speakers. These results are consistent with Nolan’s (2003) results, which were based on an imitation task in English. Based on our results and Nolan’s similar results, we chose to use semitones as the unit for our study, keeping in mind that we might have obtained rather similar results with ERBs. The semitone was also a more convenient unit since it is a musical unit, which could then be used for comparison when considering musical scales. All speakers execute the same score with similar intervals in terms of semitones. The parallelism between the five realizations indicates that the process involved is a simple transposition.
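The parallelism measure described above can be sketched as follows. This Python example is a hedged illustration rather than the authors' computation: the two F0 curves are invented (the second is an exact transposition of the first by a constant frequency ratio), the ERB-rate conversion uses the Glasberg and Moore (1990) formula as an assumption, since the chapter does not say which formula was applied, and the population standard deviation is likewise an assumed choice.

```python
import math
import statistics


def hz_to_st(f_hz, ref_hz=127.09):
    """Semitones relative to the 127.09 Hz reference used in the chapter."""
    return 12 * math.log2(f_hz / ref_hz)


def hz_to_erb_rate(f_hz):
    """ERB-rate value; assumed formula (Glasberg and Moore 1990), not stated in the chapter."""
    return 21.4 * math.log10(1 + 0.00437 * f_hz)


def cv_of_differences(curve_a, curve_b):
    """Coefficient of variation (%) of the point-by-point distance between two curves."""
    diffs = [a - b for a, b in zip(curve_a, curve_b)]
    return 100 * statistics.pstdev(diffs) / abs(statistics.mean(diffs))


# Hypothetical curves: the high voice is the low voice transposed by a constant ratio.
low_voice_hz = [120.0, 150.0, 130.0, 140.0, 110.0, 100.0]
high_voice_hz = [1.8 * f for f in low_voice_hz]

cv = {}
for unit, convert in (("hertz", float), ("ERB", hz_to_erb_rate), ("semitones", hz_to_st)):
    cv[unit] = cv_of_differences(
        [convert(f) for f in high_voice_hz],
        [convert(f) for f in low_voice_hz],
    )
    print(f"{unit}: cv = {cv[unit]:.1f}%")
```

For an exact transposition, the coefficient is essentially 0% in semitones, larger in ERBs and largest in hertz. Real speakers do not transpose perfectly, which is why the coefficients reported above are all non-zero.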
There are two possible explanations for this parallelism. First, we can hypothesize that speakers extract the musical score directly from Achille Somé’s speech and conform their speech to his score. Second, we can hypothesize that they parse the sentence at all linguistic levels (phonetically, syntactically, etc.) and produce an analog of it, according to their linguistic knowledge and patterns. The second hypothesis implies that they produce the same score, because it is the score that they would have produced anyway, given their linguistic experience and the whole context. Traditional arguments in favor of the second hypothesis come from mistakes (Marslen-Wilson 1973). In fact, we also found mistakes in our corpus. One of the most common is the omission of the definite article or, conversely, the addition of a definite marker. This type of mistake supposes that the speakers
go through a complete linguistic path (semantic, syntactic, morphological…) in order to produce their utterances. This second hypothesis would imply that speakers encode similar intervals in similar contexts. Consider additional utterances by the five speakers, beginning with two all-H-tone utterances (except for the first tone) of different lengths (Figures 7 and 8). These sentences are rather flat (except for the L tone at the beginning), with a slight optional lowering on the last word of the longer utterances. There is no significant difference between the H maxima of these sentences (plus another all-H-tone utterance), despite their length differences (ANOVA: F(2,44) = 0.36, p = 0.7). These sentences were interspersed with many non-all-H sentences, which prevents any influence between them. The curves show that all speakers tend to be consistent in the production of their H tone line, independently of the length of the utterance. The data also confirm that there is no anticipatory raising associated with sequences of H tones. In all of these sentences, there is a L tone at the beginning and we assume that it has no influence on the following H tones except a local influence on the first H tone, which is lowered. This is verified in examples where an initial à is present in one form but absent in another; the presence of à with a L tone does not modify the pitch of the following H tones, except for the first one. Noticing the same fact in Yoruba, Laniran and Clements (2003) concluded that downstep is triggered only by the HL order and not by the reverse tonal order.
Figure 7. F0 curves of the all-H-tone utterance (except for the L at the beginning): à bíé bá kúlé “The child is not gone.” by 5 speakers. (Vertical axis: semitones; horizontal axis: tones 1 to 6, an initial L followed by H tones.)
Figure 8. F0 curves of the all-H-tone utterance (except for the L at the beginning): à dábá bíétáráná “A man’s child is arriving” by 5 speakers. (Vertical axis: semitones; horizontal axis: tones 1 to 8, an initial L followed by H tones.)
Let’s now consider an utterance with one downstep (Figure 9). The literal translation of this utterance is as follows:

à tݜғnbܧғ z!úó má ná
the work overcome me Assert
“I have too much work”
Figure 9. F0 curves of the 1D utterance: à t҂̗nbэ̗ z!úómáná “I have too much work” by 5 speakers. (Vertical axis: semitones; horizontal axis: tones 1 to 7.)
The single downstep in the utterance is quite large. Its interval corresponds to 3 or 4 semitones, depending upon the speaker. The realizations of the five speakers remain quite parallel. However, the realizations differ in terms of the amplitude of the anticipatory raisings, as shown by the graphs in Figure 10, which combine for each speaker two F0 curves shown previously: his/her mean F0 curve of an all-H-tone utterance (Figure 8) and his/her mean F0 curve of a 1D utterance (Figure 9). The F0 curves of all-H-tone utterances with a L at the beginning (or (L)H…) are represented by lines with empty circles, while lines with plain circles refer to F0 curves of 1D utterances. The figures have been ranked in descending order of anticipatory raisings, the first speaker (speaker 1) having the largest anticipatory raising and the last speaker (speaker 5) having almost none. These examples confirm our findings in the first corpus: anticipatory raisings vary considerably from one speaker to another and the H tone line does not provide an asymptote in the system. We now consider an utterance with two downsteps (Figure 11):

à sáánná w!ánȓná n!ܭғnd káán
the foreigner brought meat fat
“The foreigner brought fat meat”

The realization of this sentence illustrates the difference between the first downstep and the last one in a sequence with two downsteps: the mean value for the first downstep interval is 2 st while it is 4 st for the last one. We now consider an utterance with three downsteps, one of them triggered by a L tone and the two others by a floating L tone (Figure 12):

kܧғn b!á kpȓܭҒd- ݜғ!ܭғ
hunger NEG get into him NEG
“He is not hungry.”

The downstep interval is around 2 semitones. It can be noted that the dropping interval due to a low tone realized on a mora (on ܭҒ of kpȓѓ̖d-҂̗) is larger than the downstep interval. These data also confirm the previous findings that the last step of the downstep is not increased when the tone realizations get closer to a low baseline.
In this second corpus, we found that the speakers’ transpositions are parallel and the variations in the speakers’ pitch ranges are small. This parallelism could not be achieved if the “scores” played by all of the speakers
Figure 10. F0 curves of a (L)H… utterance (line with light circles) and of a 1D utterance (line with black circles) for five speakers: (a) Speaker 1, (b) Speaker 4, (c) Speaker 2, (d) Speaker 3, (e) Speaker 5. (Vertical axis: semitones; horizontal axis: measurement points 1 to 8.)

Figure 11. F0 curves of the 2D utterance: à sáánná w!ánȓnán!ܭғnd káán “The foreigner brought fat meat” (Vertical axis: semitones; horizontal axis: measurement points 1 to 10; speakers 1 to 5.)
Figure 12. F0 curves of the 3D utterance: kэ̗n b!á kpȓѓ̖d-҂̗!ܭғ “He is not hungry.”

were not based on similar intervals. The speakers parsed the “score” of the model in terms of intervals and transposed it within their tessitura. While the “score” was reproduced by the speakers in a similar manner, we showed that they vary in terms of tessitura, as well as in terms of their anticipatory raisings. This confirms that the raising is part of a general adjustment of the voice in order to make room for the realization of numerous intervals. While we expected to see variations in pitch range in the repetitions since pitch range is speaker-dependent, the variation is also linguistically significant. Expansion of pitch range, for example, is typically used in questions and focus. Reduction of pitch range is common in post-focus. In Dagara, there are important pitch range variations in discourse (foregrounding and backgrounding, in particular). We suggest that in these repetitions, pitch ranges and the intervals associated with them have been reproduced, probably because they belong to the linguistic system of Dagara.

4. Comparison with the scaling of the Dagara-Wulé eighteen-key xylophone
In this section, which is quite tentative, we attempt to compare linguistic and musical scaling in Dagara-Wulé, based on a preliminary study of the scaling of an eighteen-key xylophone belonging to the second author’s family.
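The interval analysis reported in this section can be retraced from the hertz values listed in Table 1. The following Python sketch is an illustration under stated assumptions, not the authors' procedure: it converts each key to semitones with the 127.09 Hz reference, takes the interval between neighbouring keys whose pitch is known (key 14 is missing), and summarizes all intervals except the last one. The variable and function names are hypothetical, and the use of the sample standard deviation is an assumption.

```python
import math
import statistics

REF_HZ = 127.09  # same 0-semitone reference as in the downstep measurements

# Hertz values as printed in Table 1, key 1 (highest) to key 17.
# None marks key 14, whose pitch could not be established.
KEY_HZ = [560, 506, 450, 393, 336, 292, 251, 223,
          190, 163, 147, 126, 110, None, 85, 73, 62]


def hz_to_st(f_hz, ref_hz=REF_HZ):
    """Convert a frequency in hertz to semitones relative to ref_hz."""
    return 12 * math.log2(f_hz / ref_hz)


def adjacent_intervals(key_hz):
    """Interval in semitones between neighbouring keys, skipping pairs with a missing key."""
    out = {}
    for k in range(len(key_hz) - 1):
        higher, lower = key_hz[k], key_hz[k + 1]
        if higher is not None and lower is not None:
            out[(k + 1, k + 2)] = hz_to_st(higher) - hz_to_st(lower)
    return out


intervals = adjacent_intervals(KEY_HZ)

# The last interval (keys 16/17) is set aside, as in the text.
all_but_last = [v for pair, v in intervals.items() if pair != (16, 17)]
mean_st = statistics.mean(all_but_last)
sd_st = statistics.stdev(all_but_last)  # sample standard deviation (assumed choice)

print(f"mean interval = {mean_st:.1f} st, standard deviation = {sd_st:.1f} st")
```

Because the printed hertz values are rounded, individual intervals recomputed this way need not match the printed intervals exactly, but the mean and standard deviation come out close to the reported 2.4 st and 0.4 st.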
The xylophone, which was in the family compound of the second author’s family in Dagara-Wulé region, was recorded over the phone from Paris. The keys were struck by a xylophone player in a decreasing order in terms of pitch, from the highest to lowest. Note that only seventeen keys out of the eighteen keys are played in the Dagara Wulé music. Thus, the eighteenth key was not considered in our test. F0 measurements were made on the stable part of each note with PRAAT (Boersma and Weenink, 2010), from narrow band spectrograms and spectrum slices. When the ¿rst harmonic was not available, F0 was inferred from the pitch difference between two successive harmonics. A musical notation was also provided by a French musician and singer who was not familiar with African temperaments. It includes usual notes such as A, B, etc. and + or – symbols to indicate whether the pitch of a key was higher or lower than the note used to transcribe it. This notation appears on line (1) in the table below. The table also shows the values of each key in hertz (line 2) and in semitones (line 3) with a baseline at 127.09 hz (the same as in the previous studies). The scaling is pentatonic, with a D note (or C#) recurring regularly after four other notes (keys: 1, 6, 11, 16). Note that the pitch of one key (key 14) could not be established. We now consider the intervals, as they have been transcribed by our French musician and established from the measurements: In the musical notation, intervals vary basically between 2 semitones and 2.5 semitones (except for the last one, which is larger). These two intervals do not seem to be organised into any recursive pattern within the pentatonic scale. The measured intervals show a slightly larger dispersion: from 1.8 st to 2.7 st. Again, no recursive pattern seems to emerge. The mean of these Table 1. Musical notation, hertz values, semitones values of a 17 key DagaraWule xylophone. key
1
2
3
4
5
6
7
8
B
A
G-
E
D–
B
A+
(2) hertz (3) st
C# 560 25.7
506 23.9
450 21.9
393 19.5
336 16.8
292 14.4
251 11.8
223 9.7
key
9
10
11
12
13
14
15
16
17
F# + 190
E
D
B+
A
?
E
D
B
163
147
126
110
?
85
73
62
7
4.3
2.5
–0.1
–2.5
?
–7
–9.6
–12,4
(1) note
(1) note (2) hertz (3) st
132
Annie Rialland and Penou-Achille Somé
Table 2. Intervals between the 17 xylophone keys, as transcribed by a French musician (line 1) and measured with PRAAT (line 2). The unit is the semitone. Intervals between keys
1/2
2/3
3/4
4/5
5/6
6/7
7/8
8/9
(1) Notation (2) Measur.
2 1.8
2 2
2.5 2.4
2.5 2.7
2.5 2.4
2.5 2.6
2.5 2.1
2.5 2.7
12/13 13/14 14/15
15/16
16/17
2 2.6
3 3.2
Intervals between keys
9/10
10/11
11/12
(1) Notation (2) Measur.
2.5 1.8
2 1.8
2.5 2.6
2.5 2.4
? ?
? ?
intervals (excluding the last one) is 2.4 semitones, with a standard deviation of 0.4 st. Based on the absence of recursive patterns and the rather small standard deviation, we suggest analyzing the Dagara-Wulé xylophone intervals as (roughly) equal, within a pentatonic scale. Consequently, we hypothesize that there is a relationship between the linguistic scaling in Dagara-Wulé, as manifested in downstep sequences, and the musical scaling in the same culture, as found in an eighteen-key xylophone. Both seem to share a common basis: (roughly) equal steps in terms of semitones. However, this common point might be coincidental, as downstep is widespread in African languages and there is a large variety of tunings found in African music.

5. Conclusion
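The semitone values on which the scaling comparison rests can be rechecked from the hertz values reported for the xylophone keys. The sketch below assumes the standard conversion st = 12 × log2(f / f_ref), with the 127.09 Hz baseline given in the text; this conversion appears to reproduce the semitone line of Table 1 to one decimal. The function and variable names are illustrative, key 14 (whose pitch could not be established) is treated as missing, and the use of a sample standard deviation is an assumption.

```python
from math import log2
from statistics import mean, stdev

BASELINE_HZ = 127.09  # reference frequency given in the text


def hz_to_semitones(f_hz, ref_hz=BASELINE_HZ):
    """Convert a frequency in hertz to semitones relative to ref_hz."""
    return 12 * log2(f_hz / ref_hz)


# Hertz values of keys 1-17 from Table 1; None marks key 14 (pitch not established).
hertz = [560, 506, 450, 393, 336, 292, 251, 223,
         190, 163, 147, 126, 110, None, 85, 73, 62]

semitones = [None if f is None else hz_to_semitones(f) for f in hertz]

# Intervals between adjacent keys, skipping pairs that involve the missing key.
intervals = [hi - lo for hi, lo in zip(semitones, semitones[1:])
             if hi is not None and lo is not None]

# Summary statistics excluding the last (larger) interval, as in the text.
mean_st = mean(intervals[:-1])
sd_st = stdev(intervals[:-1])
print(f"mean = {mean_st:.1f} st, sd = {sd_st:.1f} st")
```

Because the intervals are recomputed here from rounded hertz values, individual intervals can differ from the measured line of Table 2; only the overall mean and dispersion are meant to be comparable.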
In the first part of this chapter, we examined the realization of downsteps in Dagara-Wulé by five speakers and showed rather equal intervals between them when they are expressed in semitones. In the second part, it was shown that in a repetition task, the productions of the speakers were quite parallel within a musical scale (with tones, semitones, cents). These two sets of data converge towards a hypothesis: There is a linguistic pitch scaling in a language such as Dagara (with two tones and downstep) based on musical type intervals (defined by a ratio between two frequencies). This pitch scaling emerges in these two phenomena but does not determine the whole tone realization. Thus, an equal step-based downstep co-exists
with an asymptotic downdrift, an oscillating configuration which might be triggered by constraints on the production of alternating L and H tones. The Dagara culture is also well known for its xylophone music. Considering the scaling of the eighteen-key xylophone of Penu-Achille Somé’s family, we hypothesise that downstep scaling and xylophone scaling might have a common point: (roughly) equal steps, in terms of semitones. This study is quite tentative, being based on the scaling of one instrument, and could only be considered a first step in the investigation of relationships between linguistic and musical scalings in this culture. Analyzing Dagara pentatonic xylophone music in its various components as well as speech transposition on the xylophone would shed light on the role of intervals and reference lines in music and in speech.

References

Bird, Steven 1994 Automated Tone Transcription. In S. Bird (ed.), Proceedings of the First Meeting of the ACL Special Interest Group in Computational Phonology. Las Cruces (NM, USA): ACL.
Boersma, Paul, and David Weenink 2010 PRAAT: doing phonetics by computer. http://www.fon.hum.uva.nl/praat
Dolphyne, Florence A. 1994 A Phonetic and Phonological study of Downdrift and Downstep in Akan. ms.
van Heuven, Vincent J. 2004 Planning in speech melody: production and perception of downstep in Dutch. In H. Quené and V. J. van Heuven (eds.), On speech and language: Studies for Sieb G. Nooteboom, 83–93. LOT Occasional Series. Utrecht University.
Hogan, John T., and Morie Manyeh 1986 Study of Kono Tone Spacing. Phonetica 53: 221–229.
Laniran, Yetunde O. 1992 Intonation in tone languages: the phonetic implementation of tones in Yoruba. Ph.D. diss., Cornell University.
Laniran, Yetunde O., and George N. Clements 2003 Downstep and high raising: interacting factors in Yoruba tone production. Journal of Phonetics 31: 203–250.
Liberman, Mark, J. Michael Shultz, Soonhyun Hong, and Vincent Okeke 1993 The phonetic interpretation of tone in Igbo. Phonetica 50: 147–160.
Liberman, Mark, and Janet Pierrehumbert 1984 Intonational invariance under changes in pitch range and length. In M. Aronoff and R. T. Oehrle (eds.), Language and sound structure, 157–233. Cambridge, MA: MIT Press.
Marslen-Wilson, William 1973 Linguistic structure and speech shadowing at very short latencies. Nature 244: 522–523.
Mitterer, Holger, and Mirjam Ernestus 2008 The link between perception and production is phonological and abstract: Evidence from the shadowing task. Cognition 109: 163–173.
Myers, Scott 1996 Boundary tones and the phonetic implementation of tone in Chichewa. Studies in African Linguistics 25: 29–60.
Nolan, Francis 2003 An experimental evaluation of pitch scales. Proceedings of the 15th Congress of Phonetic Sciences, 771–774. Barcelona.
Pierrehumbert, Janet B., and Mary E. Beckman 1988 Japanese Tone Structure. Linguistic Inquiry Monographs 15. Cambridge (MA, USA): MIT Press.
Prieto, Pilar, Chilin Shih, and Holly Nibert 1996 Pitch downtrend in Spanish. Journal of Phonetics 24: 445–473.
Rialland, Annie, and Penu-Achille Somé 2000 Dagara downstep: how speakers get started. In V. Carstens and F. Parkinson (eds.), Advances in African Linguistics 251–262. 4. Trenton (N.J., U.S.A.): Africa World Press.
Somé, Penu-Achille 1982 Systématique du signifiant en Dagara: variété wúlé. Paris: L’Harmattan-ACCT.
Somé, Penu-Achille 1997 Influence des consonnes sur les tons du Dagara: langue voltaïque du Burkina Faso. Studies in African Linguistics 27–1: 3–47.
Stewart, John M. 1965 “The typology of the twi tone system”, preprint of the Bulletin of the Institute of African Studies I, Legon.
Traunmüller, Hartmut 2005 Auditory scales of frequency representations. http://www.ling.su.se/staff/hartmut/bark.htm
Traunmüller, Hartmut, and Anders Eriksson 1995 The perceptual evaluation of F0 excursions in speech as evidenced in liveliness estimations. Journal of the Acoustical Society of America 97, vol 3: 1905–191
2. The representation and nature of phonological features
Crossing the quantal boundaries of features: Subglottal resonances and Swabian diphthongs¹
Grzegorz Dogil, Steven M. Lulich, Andreas Madsack, and Wolfgang Wokurek¹

1. Introduction
In phonology, as it has been laid out since Trubetzkoy (1939/1969), distinctive features organize natural classes of sounds. Classes of sounds are considered natural if their members function together in phonological rules and sound laws across languages. These functional criteria for defining and choosing a set of distinctive features prevailed into generative models of phonology (Chomsky and Halle 1968); however, they have been substantially enriched by formal considerations such as feature hierarchy (Clements and Hume 1995) and feature economy (Clements 2003). The criteria of phonetic and physiological naturalness played a minor role in the systems of distinctive features, with the exception of considerations of auditory distinctiveness (Liljencrants and Lindblom 1972; Flemming 2005). The acoustic theory of speech production at its outset (Fant 1960) has defined a universal set of features which allowed an unconstrained set of speech sounds from which languages were supposed to select a subset of their distinctive oppositions (Jakobson, Fant and Halle 1952). Further research on distinctive features within the acoustic theory of speech production has led to the seminal discovery of the quantal theory of speech (Stevens 1972, 1989, 1998). In his at first hermetic but now textbook-proof argument, Stevens has shown that an acoustically motivated set of distinctive features is universally constrained by a set of nonlinear articulation-to-acoustic mappings characteristic of the human speech production apparatus. Stevens’ quantal model proved that equal movements of the articulators do not lead to equal movements in the acoustic parameters of speech. To the contrary, he discovered that some small articulator movements lead to large acoustic changes, and, in other areas of articulatory space, large movements lead to small variation in the acoustic parameters. Following Lulich (2010), we will name the regions in which a small articulatory
change leads to a large acoustic change “boundaries”, and we will name the areas in which there is small acoustic change in spite of large articulatory movements “states”. The boundary and its two flanking states form the basis of the definition of any distinctive feature within the quantal theory (Lulich 2010; Stevens and Keyser 2010). Moreover, the speech production system is constrained by the avoidance of boundary areas, because of the great acoustic instability caused by the movement of articulators across these areas. One set of natural, physiologically motivated boundaries and states is defined by the subglottal cavities (the trachea and the main bronchi). The subglottal airway, just like the vocal tract, has its own natural resonant frequencies. Unlike the vocal tract, the subglottal airway does not have articulators. Hence, subglottal resonances are roughly constant for each speaker and do not vary much within and across utterances of a single speaker. As such they are ideal as a set of boundaries by which a distinctive feature can be defined. Stevens (1998: 299–303) has proven that subglottal resonances (labeled Sg1, Sg2, etc. hereafter) when coupled with the supraglottal resonating system (giving rise to formants F1, F2, etc.) lead to formant discontinuities in strictly defined narrow-band frequency regions.² The discontinuities are not only spectrally visible (Stevens 1998; Stevens and Keyser 2010) but they also affect the perception of vowels and diphthongs (Lulich, Bachrach and Malyska 2007). The narrow-band regions of acoustic instability defined by subglottal resonances are an ideal candidate for what quantal theory considers as a boundary between +/– values of distinctive features. Indeed, convincing evidence, particularly for the feature [back], has been provided for English (Chi and Sonderegger 2004; Lulich 2010).
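The boundary-and-state idea can be made concrete for the feature [back], which the chapter goes on to define by whether F2 lies below Sg2 ([+back]) or above it ([–back]). The following is only a minimal sketch of that comparison: the function name, the treatment of a token falling exactly on the boundary, and all frequency values are hypothetical illustrations, not measurements or procedures from the study.

```python
def back_feature(f2_hz, sg2_hz):
    """Classify a vowel token by comparing its F2 with the speaker's Sg2."""
    if f2_hz < sg2_hz:
        return "+back"
    if f2_hz > sg2_hz:
        return "-back"
    return None  # F2 exactly on the boundary: no value assigned in this sketch


# Hypothetical values in Hz, for illustration only.
speaker_sg2 = 1400.0
tokens = {"front vowel": 1800.0, "back vowel": 1100.0}

labels = {name: back_feature(f2, speaker_sg2) for name, f2 in tokens.items()}
print(labels)
```

Since subglottal resonances are described as roughly constant for each speaker, a single Sg2 value per speaker is used as the boundary in this sketch.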
In this paper we will provide additional evidence for the “boundariness” character of subglottal resonances in German, a language with a particularly crowded vowel space. Moreover, we will show that this boundary is used to distinguish two types of otherwise indistinguishable diphthongs in a Swabian dialect of German.

2. Subglottal resonances
Recent studies have shown that subglottal resonances can cause discontinuities in formant trajectories (Chi and Sonderegger, 2007), are salient in speech perception (Lulich, Bachrach and Malyska, 2007), and are useful in speaker normalization (Wang, Lulich and Alwan, 2010), suggesting that variability
in the spectral characteristics of speech is constrained in ways not previously noticed. Specifically, it is argued that 1) for the same sound produced in different contexts or at different times, formants are free to vary, but only within frequency bands that are defined by the subglottal resonances; and 2) for sounds which differ by certain (place of articulation) distinctive features, certain formants must be in different frequency bands. For instance, given several productions of the front vowel [æ], the second formant (F2) is free to vary only within the band between the second and third subglottal resonances (Sg2 and Sg3), but in the back vowel [a] F2 must lie between the first and second subglottal resonances (Sg1 and Sg2). The feature [+/–back] is therefore thought to be defined by whether F2 is below Sg2 ([+back]) or above it ([–back]). An analysis of formant values in multiple languages published in the literature has shown that the second subglottal resonance (Sg2) lies at the boundary between [–back] and [+back] vowels (cf. Sonderegger 2004; Chi and Sonderegger 2004, for the analysis of 53 languages). Moreover, the individual formant values tend to avoid the individual subglottal resonance boundaries. The individual adult speakers of English (Chi and Sonderegger 2004) and Hungarian (Csapó et al. 2009) tended to produce [–back] vowels with F2 higher than Sg2 (F2 > Sg2), and [+back] vowels with F2 lower than Sg2 (F2 < Sg2). […] Sg2, and back vowels had F2 […] Sg1, whereas all other vowels had F1 […]

[…] (v-ratio > 0.9) on the one hand and “not fully voiced” (v-ratio in the [0, 0.9] range) on the other. However, the close similarity of the distributions within [0, 0.9] v-ratios between baseline and assimilation conditions does not hold across the board. It is found with stops, for both voicing and devoicing assimilation, and with fricatives for voicing assimilation, but not clearly for devoicing assimilation. In the latter situation, v-ratios tend to decrease within the [0, 0.9] range.
This suggests a different process than a simple exchange between the two categories proposed above or, perhaps, different v-ratio definitions of these categories. In section 4.5, we adopt a modeling approach to test further the categorical and graded accounts of voice assimilation.

4.5. v-ratio distributions: Modeling the changes caused by assimilation

By a categorical, discrete account of voice assimilation, there are two phonetically definable categories – voiced and voiceless – and the voice assimilation process simply is a switch from one category to the other. From a radical view of categorical assimilation, category switch always occurs in assimilation-licensing contexts. From a less radical view, there is either category switch or no category change. By the gradient, continuous account, assimilation can be viewed as a phonetic shift toward one of the two categories.⁷ How can we model these two contrasting views of assimilation with respect to the single parameter examined so far – v-ratio? Figures 2 and 3 illustrate possible “shift” and “switch” scenarios, respectively, in the case of voicing assimilations. Devoicing assimilations are assumed to yield symmetrical scenarios.

Figure 2. Shift model for underlyingly voiceless obstruents: made-up v-ratio distribution in voiceless context (UV-UV; plain line: baseline condition) and predicted distribution in voiced context (UV-V; dashed line: assimilation). Horizontal axis: v-ratio (0 to 1); vertical axis: frequency.

In these hypothetical scenarios, phonetic shift is modeled by an increase in v-ratio along the entire range of v-ratio values. This basically entails a rightward shift of the initial distribution. In particular, the leftmost peak that corresponds to the lowest v-ratios in the baseline condition (UV-UV) is shifted to the right by a constant amount. In our modeling, we have incorporated limit conditions and stochastic variation around a constant v-ratio shift d, as shown in (4).

(4) $f_a(v) = \alpha \times f_b\big(\max(0, \min(1, v + d + e(v)))\big)$

(where $v$ stands for v-ratio; $f_b(v)$ and $f_a(v)$ for the frequency of v-ratio $v$ in the baseline ($f_b$) and assimilation ($f_a$) conditions; the max and min terms implement the limit conditions by keeping the argument within [0, 1]; $e(v)$ is the stochastic variation around the shift; the scaling factor $\alpha$ ensures a constant cumulated frequency; $d$ is the mean v-ratio shift.)

The categorical switch scenario is modeled as a partial exchange between two posited categories, voiced and voiceless, with no category-internal changes with respect to v-ratio. In such a model, the definition of the two categories is critical: The category boundary between [–voice] and [+voice] must be specified in the v-ratio dimension. The data discussed in 4.4 suggested a category boundary at a rather high v-ratio (cf. Figure 1). This is illustrated as ‘boundary 2’ in Figure 3. For the sake of comparison, a low v-ratio boundary is illustrated as ‘boundary 1’, hence two variants of the switch model: ‘switch 1’ and ‘switch 2’ (see Figure 3). The exchange model shown in (5) ensures that the within-category distributions of v-ratio are left unchanged after assimilation has applied.

Figure 3. Category switch model for underlyingly voiceless obstruents: made-up v-ratio distribution for UV-UV (plain line: baseline) and predicted distributions for UV-V (assimilation) for two categorical boundaries (dashed line: boundary 1 at 0.2; dotted line: boundary 2 at 0.8). Horizontal axis: v-ratio (0 to 1); vertical axis: frequency.

(5) $f_a(v) = \begin{cases} f_b(v) \times r, & v < v_c \\ f_b(v) \times (1 - s), & v \geq v_c \end{cases}, \qquad s = (r - 1) \times \dfrac{\sum_{[0, v_c]} f_b(u)}{\sum_{[v_c, 1]} f_b(u)}$

(where $v_c$ stands for a boundary v-ratio between the hypothetical voiceless and voiced categories; $r$ and $s$ specify the amount of “exchange” between the two categories with no change in cumulated frequency.)
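The exchange in (5) can be sketched on a binned distribution. The example below assumes the v-ratio distribution is stored as ten equal-width bins (centers .05 to .95) and applies (5) with a boundary between the ninth and tenth bins; it also includes a plain root mean square deviation of the kind used to compare modeled and observed distributions. The function names, the bin layout, the baseline percentages, and the value of r are illustrative assumptions, not the authors' implementation or data, and the normalization behind the percentages reported in the chapter is not reproduced here.

```python
from math import sqrt


def switch_model(fb, boundary_bin, r):
    """Apply equation (5) to a binned baseline distribution fb.

    Bins with index < boundary_bin form the lower (voiceless) category and are
    scaled by r; the remaining bins are scaled by (1 - s), where s is chosen so
    that the cumulated frequency stays constant.
    """
    below = sum(fb[:boundary_bin])
    above = sum(fb[boundary_bin:])
    if above == 0:
        raise ValueError("upper category is empty; s is undefined")
    s = (r - 1) * below / above
    return [f * r if i < boundary_bin else f * (1 - s) for i, f in enumerate(fb)]


def rms_deviation(modeled, observed):
    """Root mean square deviation between two binned distributions."""
    if len(modeled) != len(observed):
        raise ValueError("distributions must have the same number of bins")
    return sqrt(sum((m - o) ** 2 for m, o in zip(modeled, observed)) / len(modeled))


# Hypothetical baseline (UV-UV) percentages for bins centered at .05, .15, ..., .95.
baseline = [40, 20, 10, 6, 4, 3, 3, 2, 2, 10]

# Boundary at v-ratio 0.9: bins 0-8 are "not fully voiced", bin 9 is "fully voiced".
predicted = switch_model(baseline, boundary_bin=9, r=0.5)

print(predicted)
print(sum(baseline), sum(predicted))
```

With r below 1, mass moves from the lower to the upper category while the shape of the distribution inside each category is left unchanged, which is the property that distinguishes the switch model from the shift model in (4).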
Which of these models best predicts the observed data? We computed, for each model and each underlying voicing, the v-ratio distribution in the assimilation condition predicted from the baseline distribution. This was done separately for stops and fricatives. The parameters d for the shift model (amount of shift), and r and s for the switch models (amount of exchange) were estimated so that modeled and observed assimilations yield the same overall v-ratio. Figures 4A–C provide an illustration for the voicing assimilation of voiceless stops. To compare the models, root mean square deviations between modeled and observed distributions were computed. The results are shown in Table 3. The switch model with [0, .9] and ].9, 1] v-ratios defining [–voice] and [+voice], respectively, clearly yields a better fit than the other two models, with a mean prediction error of about 2%. In detail, the adjustment is very good for all conditions except for devoicing in fricatives (5% error). A closer inspection of the data reveals that assimilation does not affect within-category mean v-ratios except for this latter condition.

Figure 4. v-ratio distributions for voiceless fricatives: (A) as observed in UV-UV (bold plain line) and UV-V (bold dashed line with triangles) contexts; (B) and (C) as observed in UV-V and predicted by the models (switch 1, switch 2, shift); (C) shows the [0, .9] interval of (B) with zoomed in frequencies. Horizontal axis: intervals (center v-ratio values, .05 to .95); vertical axis: frequency.

Table 3. Adjustment scores (RMS prediction errors in %) for the three models examined; for the switch models, the [–voice] and [+voice] categories are defined by ranges of v-ratio variation: [0, .1] and ].1, 1] for ‘switch 1,’ [0, .9] and ].9, 1] for ‘switch 2’

model      V-UV stop   V-UV fricative   UV-V stop   UV-V fricative   average fit
shift      16.9        11.7             14.4        17.3             15.1
switch 1   6.6         18.6             16.0        18.0             14.8
switch 2   0.9         5.1              1.4         1.5              2.2

Thus, for fricatives only, and for the devoicing direction of assimilation, there is a slight
trend toward gradient assimilation in terms of v-ratio. Yet, for the most part, voice assimilations in French seem categorical in nature with respect to the voicing ratio parameter. Moreover, the data suggest a rather narrow phonetic definition of the categories. Voiced obstruents seem to be fully voiced in terms of v-ratio, whereas the voiceless category can be loosely specified as “not fully voiced.” This points toward a default, unmarked [–voice] value of the [voice] feature. The marked value [+voice] is signaled phonetically by full voicing, with v-ratio = 1. But is v-ratio = 1 a sufficient condition for a segment to be [+voice]? Logically, that condition is necessary but perhaps not sufficient. The consequence in perception is that obstruents that are “not fully voiced” in terms of v-ratio (v-ratio < 1) should be perceived as [–voice], whereas obstruents that are fully voiced (v-ratio = 1) may be perceived as [+voice]. In the last section we examine recent perceptual data suggesting that v-ratio = 1 is not sufficient for the perceptual system to treat a segment as [+voice], at least in cases where the ambiguity between [+voice] and [–voice] cannot be resolved at the lexical level, that is, in cases where the surface form could correspond to different underlying forms, as in [sud] for either soude or soute.

5. Subtle traces of voicelessness

In a recent paper, Snoeren, Segui, and Hallé (2008) used cross-modal associative priming to test for the effect of voice assimilation on lexical access. They used potentially ambiguous words such as soute /sut/ ‘hold,’ which is confusable with soude /sud/ ‘soda’ when strongly assimilated, that is, when pronounced close to [sud]. Other examples of minimal pairs for final consonant voicing included trompe, jatte, bec, rite, bac, rate, etc. (There are only about twenty such minimal pairs in French.) Snoeren, Segui and Hallé
(2008) asked whether strongly voice-assimilated soute (pronounced close to [sud]) would activate the word “soute” not only at a phonological form level but further, at a lexical-conceptual level. In order to do so, they used natural assimilations of soute (pronounced in such utterances as une soute bondée ‘a crammed compartment’) that were very strongly assimilated (with v-ratio = 1). These word forms, extracted from the embedding utterances, were used as auditory primes in a cross-modal association priming experiment. For instance, “baggage” (‘luggage’) was paired with either soute pronounced [sud] or unrelated gratte. Other assimilated word forms such as jupe pronounced [ʒyb], which has no minimal pair for final consonant voicing, were used for comparison purposes: one possible outcome was indeed that only these unambiguous word forms would be accessed at a lexical-conceptual level. But the results clearly showed that unambiguous and potentially ambiguous word forms induced a comparable priming effect of about 40 ms. Hence, the lexical entry “soute” was activated by the strongly assimilated form [sud]. The critical question was whether the word form [sud] for plain soude would also activate “soute.” Indeed, spoken word recognition can be found to be relatively tolerant of mispronunciations (Bölte and Coenen 2000; Connine, Blasko, and Titone 1993; etc.). A second experiment showed that [sud] extracted from une soude brute ‘a raw soda’ did not prime “baggage” at all. The priming effect found with assimilated soute thus could not be due to form similarity with soude. The only possible explanation of these data was that strongly assimilated forms (with v-ratio = 1), such as soute pronounced [sud], retain something of their underlying [–voice] specification. Snoeren, Segui and Hallé (2008) therefore set out to analyze the detailed acoustic characteristics of the assimilated stimuli they used.
Table 4 summarizes the measurements that showed assimilated soute indeed retained something of /sut/. “V/(V+closure)” summarizes the classic durational cues to voicing in obstruents: Longer preceding vowel and shorter closure for voiced obstruents, hence larger V/(V+closure). It seems virtually unaffected by voice assimilation. F0 on the preceding vowel offset seems to almost neutralize. Finally, the amplitude of glottal pulsing seems weaker for assimilated soute than for plain soude, suggesting that gradiency in voicing may be reflected not only by graded temporal extension, as found in several studies (Barry and Teifour 1999; Gow and Im 2004; Jansen and Toft 2002), but also by graded amplitude of glottal pulsing.

Table 4. Acoustic measurements of plain soute and soude (bold face) and of strongly assimilated soute (v-ratio = 1) used in Snoeren, Segui and Hallé (2008)

                    V-V soude brute   UV-V soute bondée   UV-UV soute pleine
v-ratio             1                 1                   0.38
V/(V+closure)       0.605             0.564               0.568
F0 at V offset      224 Hz            231 Hz              249 Hz
energy in closure   69.1 dB           67.4 dB             65.2 dB

To summarize, Snoeren, Segui and Hallé’s (2008) study clearly suggested that v-ratio cannot account entirely for the patterns of assimilation that are found in natural speech. Whereas some acoustic parameters seem to vary
in an all-or-none manner – v-ratios change categorically, and durational parameters do not change – some others seem to vary in a graded manner (for example, amplitude of glottal pulsing). The picture of voicing assimilation is thus far more complex than previously thought.

6. Discussion

Let us summarize the observations that our corpus study made possible. The v-ratio means computed for the four voicing contacts suggested that voice-assimilated obstruents have intermediate v-ratios between those observed for their underlying voicing and those for the opposite voicing (Table 1). We showed that these means masked a different reality which could only be uncovered by examining distributions. Distributional data indeed suggested that assimilation takes place only part of the time but is complete, with respect to v-ratio, when it does take place. More precisely, two voicing categories may be defined phonetically (again, with respect to v-ratio): full-voicing and partial-voicing. Assimilation, when it takes place, is basically a switch between these two phonetic categories. How often does assimilation take place? A rough estimate can be obtained from the inspection, in Table 2, of the variation in frequency of the “full voicing” pattern according to voicing contact. Since the frequency of this pattern increased by about 48% from UV-UV to UV-V contacts, for both stops and fricatives, we may infer that voicing assimilation takes place about 48% of the time. Likewise, devoicing assimilation seems to occur about 30% of the time for stops but 67% of the time for fricatives, an asymmetry already noted in 4.2. In section 5, we noted that secondary cues to voicing must remain unaffected or partially unaffected by assimilation since listeners can recover the intended voicing of fully voiced items, such as soute pronounced [sud]. Indeed, acoustic measurements
revealed subtle differences between such items. In other words, apparently fully voice-assimilated forms retain traces of their underlying voicelessness. How can we reconcile the divergent observations for v-ratios and “secondary cues”? Such dissociation between primary and secondary cues is reminiscent of the recent findings of Goldrick and Blumstein (2006) on tongue twisters inducing slips of the tongue. They found that when “k” was erroneously produced as [g] or “g” as [k], traces of the targeted consonant’s VOT were found in the faulty productions. However, the “slip of the tongue” productions showed no traces of the targeted consonant in “local” secondary cues to voicing (F1 onset frequency, burst amplitude). As for the non-local cue examined – the following vowel’s duration – it was faithful to the targeted consonant. For example, erroneous [k]s for targeted “g”s had a slightly shorter VOT than [k]s for plain /k/s, but had F1 and burst characteristics typical of /k/s, and maintained the long following vowel duration observed for plain /g/s.⁸ (Symmetrical patterns obtained in the case of erroneous [g]s for targeted “k”s.) Goldrick and Blumstein claimed their data supported a “cascade” mechanism translating phonological planning into articulatory implementation: both the targeted and the slipped segment’s representations were activated during phonological planning, resulting in a mix of both during articulation implementation. They also found evidence of cascading activations between the posited lexical level (or “lexeme selection”) and phonological planning, all this supporting a cascade processing architecture across the board. The assimilation data can be analyzed within the same framework of speech production planning and articulation (Levelt 2002).
Following lexical selection, phonological planning may proceed in several steps: A first step may activate a canonical representation at level 1 posited in (1–2); when words are assembled together, contextual phonological processes may apply, activating level 2 representations in a subsequent step. Thus, similar to the tongue twister situation, the voice assimilation process (or, more generally, any phonological alternation process) entails the activation of several phonological representations and representation levels, cascading to the articulation implementation stage. Hence, the possible mixed articulation implementation. This, together with coarticulation effects, must contribute to phonetically mixed outputs. Like Goldrick and Blumstein (2006), Snoeren, Segui and Hallé (2008) found a dissociation of cues in the observed voice assimilations but at the same time, the observed patterns were quite different. In assimilating contexts, v-ratios changed categorically but F0 on the preceding vowel, and waveform amplitude during stop closure underwent
incomplete change, whereas preceding vowel duration did not change at all (just like Goldrick and Blumstein’s following vowel durations). Goldrick and Blumstein (2006) interpreted the observed dissociation of cues as revealing the role of subsyllabic assembly mechanisms in articulatory implementation. They regarded the fate of secondary cues as explained by a lesser perceptual motivation. But this explanation obviously lacks consistency: Why should some perceptually unimportant cues be completely neutralized and some others entirely maintained? We propose instead that the dissociation of cues is due to different time-courses of phonological planning and articulation implementation. For instance, the resistance to assimilation for flanking vowel duration might be due to an early step of metric/prosodic planning completed before the assimilation process switches the [voice] specification of the assimilated segment. In the same way, we might interpret the weaker glottal pulsing during closure in voicing assimilations than in plain voicing as due to a later-occurring specification of voicedness in the assimilation case. We are of course aware that these interpretations are for the time being quite speculative and that more specific research is necessary to address, for instance, the issue of timing within the phonological planning stage and its possible consequences for articulation implementation. Before closing, let us examine briefly the classical articulatory phonology account of assimilations in terms of gestural overlap (cf., for example, Browman and Goldstein 1992).
Whatever the gestures involved for voicing in French – plausibly, glottal opening-closing for voiceless obstruents and glottal critical adduction for voiced ones (see Best and Hallé 2010 for an overview) – the gestural overlap account predicts that assimilations occur in perception rather than in production, and are all the more likely to occur when speech rate is fast, or prosodic conditioning entails increased overlap. In other words, according to standard articulatory phonology, no discrete modification of phonological specification ever occurs in “phonological alternations”. Gestural specifications are neither deleted nor switched: gestures may only overlap and hide each other in perception, especially at fast rates. One might want to test for this contention in corpus data: Assimilation degree should be stronger at faster rates. We attempted to do this by separating the data into four C1#C2 duration ranges (from less than 120 ms to more than 240 ms) and found a trend toward more frequent assimilations for shorter durations. This leaves open the possibility that high speech rate favors assimilation. To conclude, our data seem more readily amenable to a discrete rather than graded account of voice assimilation. In the scenario we propose, the classic description of assimilation, as found in (1), applies within the phonological
planning stage in speech production: level 1 /sut/ produces level 2 [sud]. This takes us back to the typology of phonological alternations offered by Nick Clements: Voice assimilation belongs to the single-feature type. The qualifications we propose to his typology are twofold. First, we propose, following Goldrick and Blumstein (2006), that a cascading architecture characterizes the translation from phonological code to articulation: Both /sut/ and /sud/ feed articulation implementation. In that view, the assimilations that are incomplete at the phonetic level, either quantitatively for a single cue (e.g., v-ratio, amplitude of glottal pulsing) or in terms of dissociation between cues, reflect cascading translation from phonological planning to articulation implementation, with different time courses of activation/deactivation for different levels of representation. In other words, whereas the classic description of the assimilation process in (1) offers a static picture, we propose to consider the dynamics of its component parts. As a second qualification, we introduced the notion of occurrence in the application of a phonological process. Immediate context determines whether assimilation is applicable or not. Yet, it seems that the actual occurrence of assimilation requires further determinants. What determines whether assimilation takes place or not? This question is indeed open to future investigation on the licensing factors that might operate beyond immediate context.

Acknowledgments

This research was supported by an ANR grant (PHON-REP) to the first author.

Notes

1. Coarticulation is viewed here as a mechanical consequence of temporal overlap in articulation between consecutive sounds (Fowler and Saltzman 1993; Browman and Goldstein 1990, 1992). Coarticulation occurs with vowels (Öhman 1966; Magen 1993) or tones (Abramson 1979; Xu 1994), and indeed with consonants, in all the situations whereby sounds in contact differ in some phonetic dimension.
2. Should we consider the pronunciation [ɔptəniʁ] instead of [ɔbtəniʁ] for obtenir as a case of within-word voice assimilation? This is a matter of debate. From a synchronic point of view, we may argue that the lexical form of “obtenir” is simply stored as /ɔptənir/ and there is no phonological context around or within that word possibly licensing an alternation with the /ɔbtənir/ form. However, at the abstract morphophonemic level, obtenir contains the prefix {ob-}, hence the phoneme /b/. The fact that obtenir has a /p/ at a less abstract level can be captured by a transformation rule governing the alternation between /p/ and /b/ in {ob-}, that is, by an assimilation rule taking place between levels of representation. The case of médecin is different because its pronunciation can alternate between [medəsɛ̃] and [metsɛ̃] or [mɛtsɛ̃]. Interestingly, ‘é’ in médecin is pronounced [e] more often than [ɛ], although it should be [ɛ] in the closed syllable /mɛt/ of /mɛt.sɛ̃/. This deviant pronunciation is symptomatic of a morphophonemic level of representation in which ‘é’ is indeed /e/, as reflected in the surface forms of médical, médicament, etc.
3. Note that place assimilation may additionally occur in this example (Niebuhr, Lancia, and Meunier 2008).
4. A discussion of the reliability and precision of the measurements presented here falls outside the scope of this paper. There are indeed potential shortcomings in any automatic alignment system as well as in any automatic decision on acoustic voicing. (Manual labeling and measurement procedures are not error free either.) But the analyses proved to produce rather consistent and homogeneous patterns of results, which is about all that is needed for the present study.
5. We compared this voicing decision procedure with a procedure based on the harmonics-to-noise ratio (HNR: a measure of acoustic periodicity) exceeding a fixed threshold. We set this threshold to 0 dB, which corresponds to equal energy in the harmonics and in the noise. The two methods yielded similar patterns of results.
6. The opposite pattern we observe for French is also contrary to the naive intuition about regressive voice assimilation that the right edge of C1 should be affected by a following C2 with a different underlying voicing.
7. Similar ideas have been offered by Massaro and Cohen (1983) in a different context. They proposed a new test for categorical perception in which listeners had to rate stimuli of a /b/-/d/ continuum on a 1–5 scale, as 1 if they heard /b/ up to 5 if they heard /d/. Categorical perception predicts that subjects’ ratings to a given stimulus be distributed along the 1–5 scale as two modes centered on the extreme ratings 1 and 5, whereas continuous perception predicts a single mode centered on a rating value depending on the stimulus, from 1 for /b/s to 5 for /d/s. That is, continuous perception predicts a distributional shift from one stimulus to another, whereas categorical perception predicts a switch between the two modes 1 and 5.
8. In this study, the speech materials were strictly controlled and “distributions” restricted to limited dispersion around mean values. In other words, virtually all observed slips had slightly non-canonic VOT values.
References

Abramson, Arthur S. 1979 The coarticulation of tones: An acoustic study of Thai. In T. Thongkum, P. Kullavanijaya, V. Panupong, and T. Tingsabadh (eds.), Studies in Tai and Mon-Khmer Phonetics and Phonology in Honour of Eugénie J.A. Henderson, 1–9. Bangkok: Chulalongkorn University Press.
Adda-Decker, Martine, and Lori Lamel 1999 Pronunciation variants across system configuration, language and speaking style. Speech Communication 29 (2–4): 83–98.
Barry, Martin, and Ryad Teifour 1999 Temporal patterns in Arabic voicing assimilation. In Proceedings of the 14th International Congress of Phonetic Sciences, 2429–2432.
Best, Catherine T., and Pierre A. Hallé 2010 Perception of initial obstruent voicing is influenced by gestural organization. Journal of Phonetics 38: 109–126.
Bölte, Jens, and Else Coenen 2000 Domato primes paprika: Mismatching pseudowords activate semantic and phonological representations. In Proceedings of the SWAP Conference, 59–62. Nijmegen, The Netherlands.
Boersma, Paul 2001 Praat, a system for doing phonetics by computer. Glot International 5 (9/10): 341–345.
Browman, Catherine, and Louis Goldstein 1990 Gestural specification using dynamically-defined articulatory structures. Journal of Phonetics 18: 299–320.
Browman, Catherine, and Louis Goldstein 1992 Articulatory phonology: an overview. Phonetica 49: 155–180.
Burton, Martha W., and Karen E. Robblee 1997 A phonetic analysis of voicing assimilation in Russian. Journal of Phonetics 25: 97–114.
Clements, Georges N. 1985 The geometry of phonological features. Phonology Yearbook 2: 225–252.
Connine, Cynthia, Dawn Blasko, and Debra Titone 1993 Do the beginnings of spoken words have a special status in auditory word recognition? Journal of Memory and Language 32: 193–210.
Darcy, Isabelle, Franck Ramus, Anne Christophe, Katherine Kinzler, and Emmanuel Dupoux 2009 Phonological knowledge in compensation for native and non-native assimilation. In Franck Kügler, Caroline Féry, and Ruben van de Vijver (eds.), Variation and gradience in phonetics and phonology, 265–310. Berlin: Mouton De Gruyter.
Dilley, Laura C., and Mark A. Pitt 2007 A study of regressive place assimilation in spontaneous speech and its implications for spoken word recognition. Journal of the Acoustical Society of America 122 (4): 2340–2353.
Flemming, Edward 1997 Phonetic detail in phonology: Evidence from assimilation and coarticulation. In K. Suzuki and D. Elzinga (eds.), Southern Workshop on Optimality Theory: Features in OT. Coyote Papers.
Fouché, Pierre 1969 Traité de prononciation française. Paris: Klincksieck.
Fowler, Carol A., and Elliot Saltzman 1993 Coordination and coarticulation in speech production. Language and Speech 36 (2, 3): 171–195.
Gaskell, Gareth, and William Marslen-Wilson 1996 Phonological variation and inference in lexical access. Journal of Experimental Psychology: Human Perception and Performance 22: 144–158.
Gaskell, Gareth, and William Marslen-Wilson 2001 Lexical ambiguity resolution and spoken word recognition: Bridging the gap. Journal of Memory and Language 44: 325–349.
Gaskell, Gareth, and Natalie Snoeren 2008 The impact of strong assimilation on the perception of connected speech. Journal of Experimental Psychology: Human Perception and Performance 34 (6): 1632–1647.
Gauvain, Jean-Luc, Gilles Adda, Martine Adda-Decker, Alexandre Allauzen, Véronique Gendner, Lori Lamel, and Holger Schwenk 2005 Where are we in transcribing French broadcast news? In Proceedings of Interspeech 2005–Eurospeech, 1665–1668.
Goldrick, Matthew, and Sheila Blumstein 2006 Cascading activation from phonological planning to articulatory processes: Evidence from tongue twisters. Language and Cognitive Processes 21 (6): 649–683.
Gow, David W. 2001 Assimilation and anticipation in continuous spoken word recognition. Journal of Memory and Language 24: 133–159.
Gow, David W. 2002 Does English coronal place assimilation create lexical ambiguity? Journal of Experimental Psychology: Human Perception and Performance 28: 163–179.
Gow, David W., and Aaron M. Im 2004 A cross-linguistic examination of assimilation context effects. Journal of Memory and Language 51: 279–296.
Grammont, Maurice 1933 Traité de Phonétique. Paris: Delagrave.
Hallé, Pierre A., and Martine Adda-Decker 2007 Voicing assimilation in journalistic speech. In Proceedings of the 16th International Congress of Phonetic Sciences, 493–496.
Jansen, Wouter 2004 Laryngeal contrast and phonetic voicing: a laboratory phonology approach to English, Hungarian, and Dutch. Ph.D. diss., University of Groningen.
Jansen, Wouter, and Zoë Toft 2002 On sounds that like to be paired (after all): An acoustic investigation of Hungarian voicing assimilation. SOAS Working Papers in Linguistics 12: 19–52.
Lahiri, Aditi, and Henning Reetz 2010 Distinctive features: Phonological underspecification in representation and processing. Journal of Phonetics 38: 44–59.
Levelt, Willem 2002 Phonological encoding in speech production. In Carlos Gussenhoven and Natasha Warner (eds.), Papers in Laboratory Phonology VII, 87–99. Berlin: Mouton De Gruyter.
Magen, Harriet S. 1993 The extent of vowel-to-vowel coarticulation in English. Journal of Phonetics 25: 187–205.
Martinet, André 1955 Economie des changements phonétiques: traité de phonologie diachronique. Berne: Francke.
Massaro, Dominic W., and Michael M. Cohen 1983 Categorical or continuous speech perception: a new test. Speech Communication 2: 15–35.
Niebuhr, Oliver, Leonardo Lancia, and Christine Meunier 2008 On place assimilation in French sibilant sequences. In Proceedings of the 8th International Seminar on Speech Production, 221–224.
Öhman, Sven E. G. 1966 Coarticulation in VCV utterances: spectrographic measurements. Journal of the Acoustical Society of America 39 (1): 51–168.
Rigault, André 1967 L’assimilation consonantique de sonorité en français: étude acoustique et perceptuelle. In B. Hála, M. Romportel, and P. Janota (eds.), Proceedings of the 6th International Congress of Phonetic Sciences, 763–766. Prague: Academia.
Snoeren, Natalie, Pierre A. Hallé, and Juan Segui 2006 A voice for the voiceless: Production and perception of assimilated stops in French. Journal of Phonetics 34 (2): 241–268.
Snoeren, Natalie, Juan Segui, and Pierre A. Hallé 2008 On the role of regular phonological variation in lexical access: Evidence from voice assimilation in French. Cognition 108 (2): B512–B521.
Xu, Yi 1994 Production and perception of coarticulated tones. Journal of the Acoustical Society of America 95 (4): 2240–2253.
An acoustic study of the Korean fricatives /s, s'/: Implications for the features [spread glottis] and [tense]

Hyunsoon Kim and Chae-Lim Park

1. Introduction
Halle and Stevens (1971) classified the Korean non-fortis fricative as aspirated with the specification of [+spread glottis] (henceforth, [s.g.]) for glottal opening, and suggested that the fortis fricative /s'/ in Korean is specified for the feature [–s.g.], for glottal closing. Moreover, Kagaya’s (1974) fiberscopic data of the Korean fricatives showed that the maximum glottal opening of the non-fortis fricative is as wide as that of the aspirated stops /pʰ, tʰ, tsʰ, kʰ/ in word-initial position, though it reduces to almost half of that of the stops in word-medial position. In his acoustic data, aspiration was found after the non-fortis fricative when followed by the vowel /i/, /e/ or /a/ word-initially and word-medially, but such aspiration was not observed in the fortis fricative in the same contexts. From the phonetic data, Kagaya proposed that the non-fortis fricative is aspirated with the specification of [+s.g.] and that the fortis fricative is specified as [–s.g.] in line with Halle and Stevens (1971) (see Kim et al. (2010) for a more detailed literature review). However, based on recent stroboscopic cine-MRI data on the Korean fricatives, Kim, Maeda, and Honda (2011) have shown that the two fricatives are similar to the lenis and fortis coronal stops /t, ts, t', ts'/, not to the aspirated ones /tʰ, tsʰ/ in terms of glottal opening both word-initially and word-medially, and that aspiration occurs during transitions from a fricative to a vowel and from a vowel to a fricative, regardless of the phonation type of the fricatives. In addition, in the comparison of the phasing between the tongue apex and the glottal width of the fricatives with that of the aspirated stops /tʰ, tsʰ/ in Kim, Honda, and Maeda (2011), it was found that the tongue apex-glottal phasing of the non-fortis fricative is not like that of the aspirated stops.
Thus, Kim, Maeda, and Honda (2011) have proposed that the Korean non-fortis fricative is lenis (/s/), not aspirated (/sʰ/) and that the two fricatives are specified as [–s.g.]. Kim et al. (2010) have provided further acoustic and aerodynamic evidence for the feature specification of [–s.g.] in the fricatives. The acoustic
data have shown that the absence or presence of aspiration is not relevant for the distinction of the fricatives because aspiration can occur during transitions from the two fricatives to a following vowel both word-initially and word-medially, regardless of the phonation type of the consonants. The aerodynamic data have revealed that the fricatives are similar to the lenis and fortis coronal stops, not to the aspirated stops /tʰ, tsʰ/, in terms of airflow. According to Kim, Maeda, and Honda (2011), what differentiates the fricatives is the tensing of the tongue blade and the vocal folds during the oral constriction of the fricatives, in line with the newly defined feature [tense] in Kim, Maeda, and Honda (2010). The stroboscopic cine-MRI study of the fricatives has shown that oral constriction is narrower and longer, with the apex being closer to the roof of the mouth in /s'/ than in /s/, the pharyngeal width is longer in /s'/ than in /s/, and the highest tongue blade and glottal height is sustained longer in /s'/ than in /s/. It is proposed then that the concomitant tongue/larynx movements are incorporated into the feature [±tense]: the fortis /s'/ is specified as [+tense], like fortis and aspirated stops, and the lenis /s/ as [–tense] like lenis stops. The aerodynamic data of Kim et al. (2010) provide further evidence for the feature [tense] in the fricatives. Airflow resistance (that is, oral-constriction resistance) is significantly greater for /s'/ than /s/ during oral constriction. Given that “airflow resistance is directly related to the oral constriction shape and that it is consistently higher in /s'/ than in /s/,” Kim et al.
(2010: 154) have suggested that “the constriction during the frication of /s'/ is stronger than during that of /s/ in that the stronger the constriction is, the higher the resistance (e.g., Stevens 1998).” The stronger constriction during /s'/ is “articulatorily correlated with narrower and longer oral constriction” and “the higher or longer glottal raising in /s'/” in Kim, Maeda, and Honda (2011). The present paper is a follow-up study to the acoustic part of Kim et al. (2010) and examines whether the laryngeal characterization of the fricatives in terms of the features [s.g.] and [tense] is also acoustically supported. We extended the scope of the acoustic experiment of Kim et al. (2010), in which two subjects (one male and one female) took part, by recruiting ten native speakers of Seoul Korean (five male and five female). In addition, we investigated not only the presence/absence of aspiration and voicing of the fricatives, as in Kim et al. (2010), but also frication duration of the two fricatives and F0 at the beginning of a following vowel. If the fricatives are both specified as [–s.g.], we can say that aspiration has nothing to do with the distinction of the fricatives. Thus, one might expect aspiration to occur during transitions from a fricative to a following
vowel, regardless of the phonation type of the fricatives in both word-initial and word-medial positions, as shown in Kim et al. (2010). If the fricatives are differentiated by the feature [±tense], as in Kim, Maeda, and Honda (2011), then frication duration, which is articulatorily correlated with oral constriction, would be longer in /s'/ than in /s/ both word-initially and word-medially. In addition, given that the highest glottal position in /s'/ is often the same as in /s/, though the duration of the highest glottal position tends to be longer in /s'/ than in /s/ (Kim, Maeda, and Honda 2011), it is probable that F0 values at the voicing onset of a vowel would often be the same after /s/ and /s'/. We also examined whether voicing could occur in both /s/ and /s'/, as in Kim et al. (2010), or only in the intervocalic word-medial fricative /s/, as observed in Cho, Jun, and Ladefoged (2002) for their proposal that /s/ is lenis like its counterpart stops due to intervocalic voicing. This paper is structured as follows. In sections 2 and 3, we provide the method and results of our acoustic experiments and discuss the implications of the acoustic data, respectively. A brief conclusion is in section 4.

2. Acoustic experiments

2.1. Method
As in Kim et al. (2010), we put the two fricatives /s, s'/ in /_V_V/ where V is one of the eight Korean monophthongs /a, æ, ʌ, ɛ, o, u, ɨ, i/, as shown in (1).

(1) /sasa/    /sæsæ/    /sʌsʌ/    /sɛsɛ/    /soso/    /susu/    /sɨsɨ/    /sisi/
    /s'as'a/  /s'æs'æ/  /s'ʌs'ʌ/  /s'ɛs'ɛ/  /s'os'o/  /s'us'u/  /s'ɨs'ɨ/  /s'is'i/

The test words, which are all nonsense words, were embedded in the frame sentence /næka __ palɨmhapnita/ ‘I pronounce __.’ On a single page, sentences with the test words written in Korean orthography were randomized with two filler sentences at the top and the bottom. The sentences were read five times at a normal speech rate by ten subjects (five male, five female) all of whom were in their early 20s. The average age of our subjects was 24.5 years old.
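The stimulus design described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the vowel symbols and the apostrophe used for the fortis fricative are transcription choices made here, and the names `build_test_words` and `embed` are hypothetical helpers, not part of the original procedure. The token count assumes the ten subjects and five repetitions stated in the text.

```python
from itertools import product

# Assumed transcription of the eight monophthongs and the two fricatives in (1);
# the fortis fricative is written s' here.
VOWELS = ["a", "æ", "ʌ", "ɛ", "o", "u", "ɨ", "i"]
FRICATIVES = ["s", "s'"]

N_SUBJECTS = 10      # five male, five female
N_REPETITIONS = 5    # each sentence read five times


def build_test_words(fricatives=FRICATIVES, vowels=VOWELS):
    """Return the /_V_V/ nonsense words: same fricative and same vowel in both syllables."""
    return [f"{c}{v}{c}{v}" for c, v in product(fricatives, vowels)]


def embed(word, frame="næka {} palɨmhapnita"):
    """Place a test word in the carrier sentence ('I pronounce __.')."""
    return frame.format(word)


test_words = build_test_words()
n_tokens = len(test_words) * N_SUBJECTS * N_REPETITIONS

if __name__ == "__main__":
    for word in test_words:
        print(embed(word))
    print(len(test_words), "test words,", n_tokens, "recorded tokens")
```

Each recorded word contains one word-initial and one word-medial fricative, so every position is represented by the same number of fricative tokens as there are recorded words.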
Figure 1. Wide-band spectrogram of /sæsæ/ taken from a female subject. [Labeled intervals in the figure: /s/, aspiration, /æ/.]
Each subject familiarized him/herself with the test words by reading them a few times before recording, and then read them as naturally as possible during recording. A Shure SM57–LC microphone was connected to a PC (SONY-VGN-T236L/W) and Praat was used in recording the subjects. All 800 tokens obtained in this way (16 test words × 10 subjects × 5 repetitions) were then analyzed in Praat. Figure 1 shows how the duration of aspiration was measured after the offset of the fricative /s/ in /sæsæ/, as well as how frication during the oral constriction was measured word-initially and word-medially. The frication phase of the fricatives is marked by an arrow with a dotted line at the bottom of the spectrogram, and is identified by the major region of noise energy above 4 kHz as an alveolar fricative (e.g., Fant 1960; Kent and Read 2002). Aspiration following the frication noise is marked by an arrow with a solid line. The aspiration phase is identified by noise covering a broad range of frequencies with relatively weak energy. In addition, F0 values at the onset of a vowel following /s/ and /s'/ were measured both word-initially and word-medially. Also, we examined whether the vocal folds vibrated or not during the frication of the two fricatives.

2.2. Results

2.2.1. Frication
Table 1 presents the frication duration of the fricatives /s, s'/ averaged over five repetitions from our ten subjects both word-initially and word-medially in the context /_V_V/, where V is one of the eight Korean monophthongs /a, æ, ʌ, ɛ, o, u, ɨ, i/, as in (1).

Table 1. The average frication duration (ms) of the fricatives /s, s'/ (a) word-initially and (b) word-medially in /_V_V/, where V is one of the eight Korean monophthongs /a, æ, ʌ, ɛ, o, u, ɨ, i/.

a. Word-initial position
        /_a/    /_æ/    /_ʌ/    /_ɛ/    /_o/    /_u/    /_ɨ/    /_i/
/s/     56.5    61.5    61.3    63.6    66.4    92      95.3    101.4
/s'/    79.1    85.3    84.2    85.8    90.9    99.8    106.2   104.3

b. Word-medial position
        /a_a/   /æ_æ/   /ʌ_ʌ/   /ɛ_ɛ/   /o_o/   /u_u/   /ɨ_ɨ/   /i_i/
/s/     48      55.1    50.4    55.8    55.5    75.7    80.9    86.4
/s'/    106.1   108.7   107.4   105     112.6   119.6   129.6   129.5

We can note that the frication duration of /s'/ is longer than that of /s/ in all the vowel contexts both word-initially and word-medially. A paired samples two-tailed t-test showed that frication duration is significantly longer in the fortis fricative /s'/ than in the lenis /s/ both word-initially and word-medially (t(7) = –5.7, p < .0008 for /s/ vs. /s'/ in word-initial position; t(7) = –24.3, p < .0001 for /s/ vs. /s'/ in word-medial position). Another paired samples two-tailed t-test showed that the average frication duration of /s/ is significantly longer in word-initial position than in word-medial position (t(7) = 8.8, p < .0001). However, the frication duration of /s'/ is significantly greater in word-medial position than in word-initial position (t(7) = –24.8, p < .0001). In addition, we compared the frication duration of /s/ and /s'/ in each vowel context. As shown in Table 2, paired samples two-tailed t-tests revealed that frication duration is significantly longer in /s'/ than in /s/ both word-initially and word-medially before all the vowels except before /i/ in word-initial position.

Table 2. Paired samples two-tailed t-tests of frication duration in (a) word-initial and (b) word-medial fricatives /s/ vs. /s'/ in /_V_V/.

a. Word-initial position (/s/ vs. /s'/)
/_a/    t(49) = –9.7,    p < .0001
/_æ/    t(49) = –8.6,    p < .0001
/_ʌ/    t(49) = –9,      p < .0001
/_ɛ/    t(49) = –9.8,    p < .0001
/_o/    t(49) = –11.8,   p < .0001
/_u/    t(49) = –2.8,    p < .0067
/_ɨ/    t(49) = –4.1,    p < .0002
/_i/    t(49) = –1.1,    p > .2961

b. Word-medial position (/s/ vs. /s'/)
/a_a/   t(49) = –15.3,   p < .0001
/æ_æ/   t(49) = –14.8,   p < .0001
/ʌ_ʌ/   t(49) = –14.5,   p < .0001
/ɛ_ɛ/   t(49) = –14.6,   p < .0001
/o_o/   t(49) = –16.3,   p < .0001
/u_u/   t(49) = –13.5,   p < .0001
/ɨ_ɨ/   t(49) = –12.8,   p < .0001
/i_i/   t(49) = –12.5,   p < .0001

2.2.2. Aspiration

Table 3 presents the average aspiration duration after the offset of the fricatives word-initially and word-medially in the test words in (1). It is noteworthy that aspiration occurs not only after the offset of the fricative /s/ but also after that of the fortis fricative /s'/, no matter which vowel follows the two fricatives. In word-initial position, aspiration duration is the longest before the vowel /a/ after the offset of the fricatives /s/ and /s'/. In word-medial position, it is the longest before /ʌ/ after the offset of the fricative /s/ and before /ɛ/ after the offset of the fricative /s'/. No matter which vowel follows the fricatives, aspiration duration is longer after the offset of the fricative /s/ than /s'/ in Table 3.

Table 3. The average aspiration duration (ms) after the offset of the fricatives /s, s'/ (a) word-initially and (b) word-medially in /_V_V/.

a. Word-initial position
        /_a/    /_æ/    /_ʌ/    /_ɛ/    /_o/    /_u/    /_ɨ/    /_i/
/s/     20.1    16.6    18.9    17.9    18.7    11.7    11.4    10
/s'/    13.5    10.7    10.5    10.6    11.1    9.6     9.7     8.7

b. Word-medial position
        /a_a/   /æ_æ/   /ʌ_ʌ/   /ɛ_ɛ/   /o_o/   /u_u/   /ɨ_ɨ/   /i_i/
/s/     12      12.8    13.1    12.5    12.9    9.5     10      9.6
/s'/    10.8    11.3    10.3    11.9    10.7    8.7     9.4     8.5

A paired samples two-tailed t-test showed that aspiration duration is significantly longer after the offset of the lenis fricative /s/ than after the offset of the fortis /s'/ both word-initially and word-medially (t(7) = 4.9, p < .0017 for /s/ vs. /s'/ in word-initial position; t(7) = 4.8, p < .0019 for /s/ vs. /s'/ in word-medial position). Another paired samples two-tailed t-test showed that the average aspiration duration after the offset of /s/ is significantly longer in word-initial position than in word-medial position (t(7) = 4.4, p < .003). In contrast, the aspiration duration after the offset of /s'/ in word-initial position does not differ significantly from that in word-medial position (t(7) = .8, p > .4236). Multiple repeated measures ANOVAs with Vowel context as the main factor and aspiration duration as the dependent variable showed that the effect of vowel context on aspiration duration is highly significant after the offset of /s/ (F(7, 280) = 20.5, p < .0001 for word-initial /s/; F(7, 280) = 7.8, p < .0001 for word-medial /s/) and also after the offset of /s'/ (F(7, 280) = 6.3, p < .0001 for word-initial /s'/; F(7, 280) = 6.5, p < .0001 for word-medial /s'/). This indicates that aspiration duration is affected by vowel contexts after the offset of the two fricatives in both word-initial and word-medial positions. Paired samples two-tailed t-tests also revealed that aspiration duration is dependent on vowel contexts both word-initially and word-medially, regardless of the phonation type of the fricatives. For example, in word-initial position, aspiration duration after the offset of /s/ is significantly longer before /a/ than it is before /i/ (t(49) = 6.1, p < .0001), /ɨ/ (t(49) = 4.8, p < .0001) and /u/ (t(49) = 4.7, p < .0001), whereas the difference is not significant in the comparison of /_a/ vs. /_æ/ (t(49) = 1.7, p > .105), /_a/ vs. /_ʌ/ (t(49) = .9, p > .3977), /_a/ vs. /_ɛ/ (t(49) = 1.4, p > .1639), and /_a/ vs. /_o/ (t(49) = .8, p > .4463). In contrast, the comparison of aspiration duration after the offset of the fricative /s'/ in word-initial position shows that it is significantly longer (p < .05) before the vowel /a/ than before the other vowels.
In word-medial position, aspiration duration is also dependent on vowel contexts not only after the fricative /s/ but also after /s'/: aspiration duration after the two fricatives is significantly longer in /a_a/ than in /i_i/ (t(49) = 3, p < .0047 for /s/; t(49) = 3.6, p < .0008 for /s'/), /u_u/ (t(49) = 2.6, p < .0116 for /s/; t(49) = 3.1, p < .0036 for /s'/) and /ɨ_ɨ/ (t(49) = 2.9, p < .0061 for /s/; t(49) = 3.1, p < .0036 for /s'/), whereas the difference is not significant (p > .1 for both /s/ and /s'/) in the comparison of /a_a/ vs. /æ_æ/, /a_a/ vs. /ʌ_ʌ/, /a_a/ vs. /ɛ_ɛ/, and /a_a/ vs. /o_o/.

2.2.3. F0
The average F0 values at the voice onset of a vowel after the fricatives are presented in Table 4.

Table 4. The average F0 values at the voice onset of a vowel after the fricatives /s, s'/ (a) word-initially and (b) word-medially in /_V_V/ (unit: Hz).

a. Word-initial position
        /_a/    /_æ/    /_ʌ/    /_ɛ/    /_o/    /_u/    /_ɨ/    /_i/
/s/     194.6   203.6   199.3   201.8   195.6   208.6   204.3   207.3
/s'/    198.1   197.8   201.3   198     202.3   204.6   203.5   199.6

b. Word-medial position
        /a_a/   /æ_æ/   /ʌ_ʌ/   /ɛ_ɛ/   /o_o/   /u_u/   /ɨ_ɨ/   /i_i/
/s/     204.4   212.9   201.5   212.3   209.6   214.6   211.3   210.4
/s'/    204.2   206.6   206.9   206.3   211.8   211.4   215.2   203.1

We can note that F0 at the voice onset of a vowel is higher after /s'/ when followed by the vowel /a/, /ʌ/ and /o/ in word-initial position and when followed by /ʌ/, /o/ and /ɨ/ in word-medial position. Yet, it is not always higher after /s'/. For example, F0 is higher when the fricative /s/ is followed by /æ/, /ɛ/, /u/, /ɨ/, and /i/ in word-initial position, and by /æ/, /ɛ/, /u/, and /i/ in word-medial position. A paired samples two-tailed t-test showed that the average F0 value does not differ significantly between /s/ and /s'/ either word-initially or word-medially (t(7) = .7, p > .504 for /s/ vs. /s'/ in word-initial position; t(7) = .8, p > .439 for /s/ vs. /s'/ in word-medial position). We also compared F0 at the voice onset of each vowel after /s/ and /s'/ in word-initial and word-medial positions. Paired samples two-tailed t-tests showed that F0 at the voice onset of a vowel does not differ significantly between /s/ and /s'/ in either word-initial or word-medial position, no matter which vowel follows the fricatives, as shown in Table 5.

Table 5. Paired samples two-tailed t-tests of F0 values at vowel onsets after (a) word-initial and (b) word-medial fricatives /s/ vs. /s'/.

a. Word-initial position (/s/ vs. /s'/)
/_a/    t(49) = –1.5,   p > .1464
/_æ/    t(49) = 1.4,    p > .1572
/_ʌ/    t(49) = –.7,    p > .466
/_ɛ/    t(49) = 1.1,    p > .2861
/_o/    t(49) = –1.6,   p > .1214
/_u/    t(49) = 1.8,    p > .0834
/_ɨ/    t(49) = .2,     p > .8465
/_i/    t(49) = 1.8,    p > .0754

b. Word-medial position (/s/ vs. /s'/)
/a_a/   t(49) = .1,     p > .9388
/æ_æ/   t(49) = 1.5,    p > .1369
/ʌ_ʌ/   t(49) = –.8,    p > .4366
/ɛ_ɛ/   t(49) = 1.8,    p > .0709
/o_o/   t(49) = –.7,    p > .495
/u_u/   t(49) = 1.3,    p > .1845
/ɨ_ɨ/   t(49) = –3.8,   p > .0949
/i_i/   t(49) = 1.9,    p > .0603
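The paired comparisons with seven degrees of freedom reported in this section appear to pair the eight vowel-context means of /s/ with those of /s'/. Under that assumption, the following Python sketch recomputes the t statistic for the F0 means in Table 4 using only the standard library. The function name `paired_t` is a hypothetical helper, the critical value 2.365 is the standard two-tailed 5% value for seven degrees of freedom, and no p-values are computed here.

```python
import math


def paired_t(x, y):
    """Return (t, df) for a paired-samples t-test on two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    n = len(x)
    if n < 2:
        raise ValueError("need at least two pairs")
    diffs = [a - b for a, b in zip(x, y)]
    mean_diff = sum(diffs) / n
    # Sample variance of the differences (n - 1 in the denominator).
    var_diff = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    if var_diff == 0:
        raise ValueError("differences have zero variance")
    return mean_diff / math.sqrt(var_diff / n), n - 1


# Mean F0 (Hz) per vowel context, copied from Table 4.
F0_INITIAL_S = [194.6, 203.6, 199.3, 201.8, 195.6, 208.6, 204.3, 207.3]
F0_INITIAL_FORTIS = [198.1, 197.8, 201.3, 198.0, 202.3, 204.6, 203.5, 199.6]
F0_MEDIAL_S = [204.4, 212.9, 201.5, 212.3, 209.6, 214.6, 211.3, 210.4]
F0_MEDIAL_FORTIS = [204.2, 206.6, 206.9, 206.3, 211.8, 211.4, 215.2, 203.1]

# Two-tailed critical value of t for df = 7 at alpha = .05.
T_CRIT_DF7 = 2.365

t_initial, df_initial = paired_t(F0_INITIAL_S, F0_INITIAL_FORTIS)
t_medial, df_medial = paired_t(F0_MEDIAL_S, F0_MEDIAL_FORTIS)

if __name__ == "__main__":
    for label, t, df in (("word-initial", t_initial, df_initial),
                         ("word-medial", t_medial, df_medial)):
        verdict = "significant" if abs(t) > T_CRIT_DF7 else "not significant"
        print(f"{label}: t({df}) = {t:.2f}, {verdict} at alpha = .05")
```

Applied to the frication means in Table 1 or the aspiration means in Table 3, the same function can be used to check the corresponding t(7) values in the same way.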
2.2.4. Voicing
Figure 2 presents the number of tokens of the fricatives /s/ and /s'/ which have voice bars (i) at the beginning of, (ii) up to the middle of, and (iii) throughout the frication in (a) word-initial and (b) word-medial position. As shown in Figure 2, voicing occurs at the beginning of, up to the middle of, and also throughout the frication of the fricatives, regardless of phonation type, not only in word-initial position but also in word-medial position.

Figure 2. The number of tokens where voicing occurs (i) at the beginning of, (ii) up to the middle of, and (iii) throughout the frication of the fricatives /s, s'/ in (a) word-initial and (b) word-medial position. [Bar charts; y-axis: tokens; panels: in word-initial position, in word-medial position; bars for /s/ and /s'/.]

The percentage of voicing at the beginning of the frication of the two fricatives is 66% (249 for /s/ and 280 for /s'/ among 800 tokens) in word-initial position. The same percentage is observed in word-medial position (229 for /s/ and 301 for /s'/ among 800 tokens). However, the percentage of voicing observed up to the middle of the frication is reduced to 29% (112 for /s/ and 119 for /s'/ among 800 tokens) in word-initial position and 26% (113 for /s/ and 95 for /s'/ among 800 tokens) in word-medial position. The occurrence of complete voicing throughout the frication is much further reduced to 1% (10 for /s/ and 1 for /s'/ among 800 tokens) in word-initial position and 8% (58 for /s/ and 4 for /s'/ among 800 tokens) in word-medial position. In order to examine whether or not voicing of the two fricatives is dependent on vowels, we checked the frequency of voicing in each vowel context word-initially and word-medially. Table 6 shows the number of tokens where voicing occurs (i) at the beginning of, (ii) up to the middle of, and (iii) throughout the frication of the fricatives in each vowel context (a) word-initially and (b) word-medially. Voicing occurs at the beginning of the frication of the two fricatives, no matter which vowel follows the consonants. This is also true of voicing up to the middle of the frication, except that the tokens of voiced /s/ and /s'/ are relatively few before /u/ in word-initial position and in /u_u/ in word-medial position, respectively. In the case of complete voicing throughout the frication, voicing occurs in the fricative /s/ in word-medial position, in all the vowel contexts, though the tokens of voiced /s/ are relatively few in /o_o/ and /ɨ_ɨ/.
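The percentages quoted above follow directly from the token counts. The short Python sketch below redoes that arithmetic; the dictionary layout and the name `voicing_percentages` are illustrative choices made here, and the denominator of 800 is the number of fricative tokens per position stated in the text.

```python
TOKENS_PER_POSITION = 800  # /s/ and /s'/ tokens pooled, per position

# (count for /s/, count for /s'/) for each extent of voicing, as reported in the text.
VOICED_TOKENS = {
    "word-initial": {"beginning": (249, 280), "middle": (112, 119), "throughout": (10, 1)},
    "word-medial": {"beginning": (229, 301), "middle": (113, 95), "throughout": (58, 4)},
}


def voicing_percentages(counts, total=TOKENS_PER_POSITION):
    """Pool the /s/ and /s'/ counts and express them as whole-number percentages."""
    return {
        position: {
            extent: round(100 * (lenis + fortis) / total)
            for extent, (lenis, fortis) in extents.items()
        }
        for position, extents in counts.items()
    }


percentages = voicing_percentages(VOICED_TOKENS)

if __name__ == "__main__":
    for position, extents in percentages.items():
        for extent, pct in extents.items():
            print(f"{position}, voicing {extent}: {pct}%")
```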
3.
Discussion
We suggest that the present acoustic data support the laryngeal characterization of the fricatives in terms of the two binary features [±s.g.] and [±tense] in Kim, Maeda, and Honda (2011), as in Kim et al. (2010): the lenis fricative /s/ is speci¿ed as [-s.g., -tense] and the fortis fricative /sƍ/ as [-s.g., +tense]. First, the present acoustic data on aspiration duration con¿rm that the two fricatives /s, sƍ/ are speci¿ed as [-s.g.], because aspiration noise occurs during transitions from a fricative to a following vowel both word-initially and word-medially, regardless of phonation type, as shown in Kim, Maeda, and Honda (2011) and Kim et al. (2010). For example, in the tongue apexglottis phasing in /sasa/ and /s’as’a/, Kim, Maeda, and Honda (2011) have noted that aspiration could arise when glottal width is less than the distance of the tongue apex from the roof of the mouth, regardless of the phonation type
186
Hyunsoon Kim and Chae-Lim Park
Table 6. The number of tokens with voicing (i) at the beginning of, (ii) up to the middle of, and (iii) throughout the frication of the two fricatives /s, s’/ in each vowel context (a) word-initially and (b) word-medially.

a. Word-initial position
              /_a/  /_æ/  /_ʌ/  /_ɛ/  /_o/  /_u/  /_ɨ/  /_i/  total
/s/   (i)      29    32    37    31    30    46    39    33    249
      (ii)     18    16    11    17    19     4    11    16    112
      (iii)     3     1     2     2     1     0     0     1     10
/s’/  (i)      33    38    35    32    39    38    32    33    280
      (ii)     17    12    15    17    11    12    18    17    119
      (iii)     0     0     0     1     0     0     0     0      1

b. Word-medial position
              /a_a/ /æ_æ/ /ʌ_ʌ/ /ɛ_ɛ/ /o_o/ /u_u/ /ɨ_ɨ/ /i_i/ total
/s/   (i)      22    24    27    29    26    34    36    31    229
      (ii)     17    17    14    14    22    10    11    12    113
      (iii)    11     9     9     7     2    10     3     7     58
/s’/  (i)      31    40    34    37    36    46    43    34    301
      (ii)     18    10    15    13    14     3     7    15     95
      (iii)     1     0     1     0     0     1     0     1      4
of the two fricatives. Thus, aspiration could arise not only word-medially but also word-initially during transitions from a fricative to a vowel and from a vowel to a fricative, but its duration is likely to be shorter after /s’/. Shorter aspiration duration after /s’/ both word-initially and word-medially was confirmed in the present study (Table 3) as well as in Kim et al. (2010). Longer aspiration in /s/ than in /s’/ in both word-initial and word-medial positions, as in Table 3 and also in Kim et al. (2010), can be attributed to a longer transition or a slower speed of transition due to its wider glottal opening than that in the fortis fricative, in line with Kim, Maeda, and Honda (2011). Therefore, the wider glottal opening in the fricative /s/ takes a little
An acoustic study of the Korean fricatives /s, sƍ/ 187
longer to achieve the adduction necessary for a following vowel, resulting in longer aspiration than in the fortis fricative, which has a narrower glottal opening. Furthermore, note that aspiration duration is significantly longer after /s/ in word-initial position than in word-medial position, whereas the difference between the two positions is not significant after /s’/ in the present study, as was also the case in Kim et al. (2010). Longer aspiration duration after /s/ in word-initial position than in word-medial position can be attributed to a wider glottal opening during /s/ word-initially than word-medially. As shown in Kim, Maeda, and Honda (2011), the glottal width of the lenis fricative /s/ in word-initial position is almost twice as large as that in word-medial position in their two subjects. In contrast, the glottal width of the fortis fricative /s’/ does not change much, no matter whether the fricative is in word-initial or word-medial position in Kim, Maeda, and Honda (2011). Thus, we can expect aspiration duration after /s’/ not to differ significantly between word-initial and word-medial positions, as in the present study and also in Kim et al. (2010). Moreover, it is noteworthy that in word-initial position, aspiration duration after the fricative /s/ is significantly longer before /a/ than before /i/, /ɨ/ and /u/ in the present study. This can be given the same account, in line with Kim, Maeda, and Honda (2011): a longer transition or a slower speed of transition is expected from the fricative to the low vowel than to the high vowels, because the distance of the tongue apex from the roof of the mouth is greater at the transition from /s/ to the low vowel /a/ than to the high vowels. Recall also that aspiration duration after the two fricatives is significantly longer in /a_a/ than in /i_i/, /ɨ_ɨ/ and /u_u/ in word-medial position. The same account can be given of the longer aspiration duration after /s/ and /s’/ in /a_a/.
Given the data on aspiration duration in the present study and also in Kim et al. (2010), as well as the articulatory study in Kim, Maeda, and Honda (2011), we can say that aspiration does not give rise to the distinction of the two fricatives. Therefore, both the lenis /s/ and the fortis /s’/ are specified as [-s.g.] (see H. Kim (2011) for phonological arguments for the feature specification of [-s.g.] in the fricatives). The second piece of evidence for the feature [tense], as proposed in Kim, Maeda, and Honda (2011), comes from the data on frication duration in Tables 1 and 2. In the MRI study, it was found that during oral constriction, /s’/ as opposed to /s/ occurs with a longer oral constriction with the apex being closer to the roof of the mouth, longer pharyngeal width and a longer highest tongue blade. The difference in oral constriction duration between the two fricatives is acoustically correlated with that in frication duration
in the present study (see also Cho, Jun, and Ladefoged (2002) for longer frication duration in /s’/ than in /s/ when followed by the vowel /a/). Thus, the frication duration of the fricatives is considered an acoustic correlate of the feature [tense].
As for the difference in frication duration between /s/ and /s’/, which can be expressed by virtue of the feature [tense], we can refer to recent perception studies (e.g., S. Kim 1999; Lee and Iverson 2006) and loanword data (H. Kim 2007, 2009) in Korean. According to S. Kim (1999), Korean speakers are sensitive to acoustic differences in the frication duration of English [s]. Given that the frication duration is shorter in the English [s] in consonant clusters than in the single [s] (e.g., Klatt 1974), Korean speakers perceived the English [s] in consonant clusters as the lenis fricative /s/ and the single [s] as the fortis fricative /s’/. The results of the perception studies can be explained by reference to the feature [tense], as in the Korean adaptation of English [s] (H. Kim 2007, 2009). As shown in (2a), the English single [s], which is longer than [s] in consonant clusters, is borrowed as the fortis fricative /s’/ into Korean. Yet, short [s] in consonant clusters in English is borrowed as the lenis fricative /s/, as in (2b).
(2) Korean treatment of the English [s]
     English words    Korean adapted forms
a.   salad            s’æl.lʌ.tɨ
     sign             s’a.in
     excite           ik.s’a.i.thɨ
     bus              pʌ.s’ɨ
     kiss             khi.s’ɨ
b.   sky              sɨ.kha.i
     snap             sɨ.næp
     disco            ti.sɨ.kho
     display          ti.sɨ.phɨl.lɛ.i
According to H. Kim (2007, 2009), the subphonemic duration difference in the English [s] is interpreted in Korean in terms of the feature [±tense]. Hence, the longer duration of the English single [s] is interpreted as a cue to the [+tense] fricative /s’/, while the shorter duration of the English [s] occurring in consonant clusters is a cue to the [-tense] fricative /s/. The same is true of Korean adaptation of the French [s].
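The adaptation pattern in (2) can be stated as a very small decision rule. The sketch below is only an illustration: it assumes that "in a consonant cluster" can be approximated as "immediately followed by a consonant", and the vowel set, the function name `adapt_s` and the rough source-word segmentations are all hypothetical simplifications rather than part of H. Kim's analysis.

```python
# Hypothetical, simplified vowel inventory, used only to tell vowels from consonants.
VOWELS = set("aeiouæɛɪʌəɔ")


def adapt_s(segments, index):
    """Return the Korean fricative chosen for the source [s] at `index`.

    Assumption for this sketch: [s] counts as "in a consonant cluster"
    (and hence short) when the next segment is a consonant.
    """
    if segments[index] != "s":
        raise ValueError("segment at index is not [s]")
    following = segments[index + 1] if index + 1 < len(segments) else None
    in_cluster = following is not None and following not in VOWELS
    # Short cluster [s] -> lenis /s/ ([-tense]); longer single [s] -> fortis /s’/ ([+tense]).
    return "s" if in_cluster else "s’"


# Rough, illustrative segmentations of some source words from (2).
examples = {
    "sign": (list("saɪn"), 0),
    "bus": (list("bʌs"), 2),
    "sky": (list("skaɪ"), 0),
    "disco": (list("dɪsko"), 2),
}
adapted = {word: adapt_s(segs, i) for word, (segs, i) in examples.items()}
print(adapted)
```

The rule deliberately looks only at the following segment, which is enough to separate the (2a) words from the (2b) words; it says nothing about how frication duration is actually measured or perceived.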
Similar to English [s], the French [s] has a shorter frication duration in consonant clusters than when it is a single [s] (e.g., O’Shaughnessy 1981). As in the Korean adaptation of the English [s] in (2), H. Kim (2007) has proposed that the acoustic difference
in frication duration between the fricatives in terms of the feature [tense] also plays a major role when Korean speakers borrow the French fricative [s]. Thus, as shown in (3a), the French single [s], which is longer than [s] in consonant clusters, is borrowed as the fortis fricative /s’/ into Korean. Yet, the shorter [s] occurring in consonant clusters in French is borrowed as the lenis fricative /s/, as in (3b).
(3) Korean treatment of the French [s]
     French words     Korean adapted forms
a.   Sartre           s’a.lɨ.thɨ.lɨ
     Sorbonne         s’o.lɨ.pon.nɨ
     Seine            s’ɛn.nɨ
     Nice             ni.s’ɨ
     Provence         phɨ.lo.paŋ.s’ɨ
b.   Bastille         pa.sɨ.thi.ju
     Pasteur          pha.sɨ.thɛ.lɨ
     Jospin           tso.sɨ.phɛŋ
     Basque           pa.sɨ.khɨ
In short, Korean adaptation of the English and French [s] (H. Kim 2007, 2009), as well as perception studies (e.g., S. Kim 1999; Lee and Iverson 2006), indicates that the difference in frication duration between /s/ and /s’/ gives rise to the distinction of the two fricatives. Moreover, it is expressed by reference to the feature [tense].
The third type of evidence for the proposed specification of Korean /s, s’/ concerns F0 in the vowel onset after the fricatives which, in the present study, does not show the systematic variation found for stops. In the literature it has been reported that the onset value of F0 after fortis and aspirated stops is higher than that of lenis ones in Korean (e.g., C.-W. Kim 1965; Han and Weitzman 1970; Hardcastle 1973; Kagaya 1974; Cho, Jun, and Ladefoged 2002; M.-R. Kim, Beddor, and Horrocks 2002 among others). The systematic variation of the onset value of F0 after Korean stop consonants is articulatorily confirmed in the MRI data on larynx raising in Korean labial, coronal and dorsal stops (Kim, Honda, and Maeda 2005; Kim, Maeda, and Honda 2010): the glottis rises from low to high in the order lenis < aspirated (> *NASFRIC >> *NASLIQ >> *NASGLI >> *NASVOW For example, *NASLIQ is violated by [r̃].
If AGREE-R([nasal]) is ranked below *NASLIQ, then liquids will not undergo harmony. Under the further assumption that nasal spreading cannot skip over segments, liquids will block the propagation of nasality. In Johore Malay, AGREE-R([nasal]) is ranked between *NASLIQ and *NASGLI. AGREE fails because it has a “sour-grapes” property: it favors candidates with spreading that is fully successful, but it gives up on candidates where spreading is blocked (McCarthy 2003; Wilson 2003, 2004, 2006). For this reason, it predicts for Johore Malay that hypothetical /mawa/ will become [mãw̃ã], with total harmony, but hypothetical /mawara/ will become [mawara], with no harmony at all. The tableaux in (6) and (7) illustrate this prediction. When all AGREE violations can be eliminated (6), then they are. But when a blocking constraint prevents complete spreading (7), there is no spreading at all. (The sequences that violate AGREE have been underlined to make them easy to find. Tableaux are in comparative format (Prince 2002).)
John J. McCarthy
(6) Agree without blocker
      /mawa/      *NASLIQ   AGREE-R([nas])   IDENT([nas])
a. →  mãw̃ã                                   3
b.    mawa                  1W               L
c.    mãwa                  1W               1L
d.    mãw̃a                  1W               2L
(7) Sour-grapes effect of Agree with blocker
      /mawara/    *NASLIQ   AGREE-R([nas])   IDENT([nas])
a. →  mawara                1
b.    mãwara                1                1W
c.    mãw̃ara                1                2W
d.    mãw̃ãra                1                3W
e.    mãw̃ãr̃a      1W        1                4W
f.    mãw̃ãr̃ã      1W        L                5W
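The violation profiles in (7) can be recomputed mechanically, which makes the harmonic-bounding relation easy to check. The following is a minimal sketch under simplifying assumptions that are not McCarthy's own formalization: a candidate is a list of (segment, is-nasal) pairs, nasality always spreads rightward from the initial segment, and the helper names (`candidate`, `profile`, `harmonically_bounds`) are invented for illustration.

```python
LIQUIDS = {"r", "l"}


def candidate(segments, nasal_count):
    """Nasalize the first `nasal_count` segments (spreading from the initial nasal)."""
    return [(seg, i < nasal_count) for i, seg in enumerate(segments)]


def nas_liq(cand):
    """*NASLIQ: one mark per nasalized liquid."""
    return sum(1 for seg, nasal in cand if nasal and seg in LIQUIDS)


def agree_r(cand):
    """AGREE-R([nasal]): one mark per nasal segment immediately followed by an oral one."""
    return sum(1 for (_, n1), (_, n2) in zip(cand, cand[1:]) if n1 and not n2)


def ident(cand, underlying):
    """IDENT([nasal]): one mark per segment whose nasality differs from the input."""
    return sum(1 for (_, n1), (_, n2) in zip(cand, underlying) if n1 != n2)


def profile(cand, underlying):
    return (nas_liq(cand), agree_r(cand), ident(cand, underlying))


def harmonically_bounds(p, q):
    """True if profile p is never worse than q and strictly better on some constraint."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


segments = list("mawara")
underlying = candidate(segments, 1)  # only /m/ is nasal in the input
# Candidates (7a)-(7f): nasality covering the first 1..6 segments.
profiles = [profile(candidate(segments, k), underlying) for k in range(1, 7)]

for label, prof in zip("abcdef", profiles):
    print(label, prof)
print("(7a) bounds (7d):", harmonically_bounds(profiles[0], profiles[3]))
```

Because harmonic bounding is defined on violation profiles alone, the check is independent of ranking: a bounded candidate loses under every permutation of these three constraints.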
The intended winner is [mãw̃ãra] in (7d), but it is harmonically bounded by the candidate with no spreading, [mawara] in (7a). Therefore, the intended winner cannot actually win under any ranking of these constraints. Clearly, AGREE is unable to account for real languages like Johore Malay. Worse yet, it predicts the existence of languages with sour-grapes spreading like (6) and (7), and such languages are not attested. A devotee of AGREE might offer to solve this problem by building the blocking effect into the AGREE constraint itself, instead of deriving this effect from interaction with higher-ranking constraints like *NASLIQ. In Johore Malay, for instance, the AGREE constraint would have to prohibit any sequence of a nasal segment immediately followed by an oral vowel or glide: *[+nasal][–cons, –nasal]. Since [mãw̃ãra] satisfies this constraint but no candidate with less spreading does, it would do the job. This seemingly innocent analytic move misses the point of OT (Wilson 2003, 2004). The fundamental descriptive and explanatory goals of OT are (i) to derive complex patterns from the interaction of simple constraints and (ii) to derive language typology by permuting rankings. If AGREE in Johore Malay is defined as *[+nasal][–cons, –nasal], then we are deriving a more complex pattern by complicating a constraint and not by interaction. That becomes apparent when we look at a language with a different set of
Autosegmental spreading in Optimality Theory 199
blockers, such as Sundanese (Anderson 1972; Robins 1957). Because glides are blockers in Sundanese, a slightly different AGREE constraint will be required. If we adopt this constraint, then we are deriving language typology by constraint parametrization rather than ranking permutation. The move of redefining AGREE to incorporate the blocking conditions, while technically possible, is antithetical to sound explanation in OT.
2.2. Long-distance ALIGN
Alignment constraints require that the edges of linguistic structures coincide (McCarthy and Prince 1993; Prince and Smolensky 2004). When alignment constraints are evaluated gradiently, they can discriminate among candidates that are imperfectly aligned. Gradient alignment constraints have often been used to enforce autosegmental spreading by requiring an autosegment to be associated with the leftmost or rightmost segment in some domain (Archangeli and Pulleyblank 2002; Cole and Kisseberth 1995a, 1995b; Kirchner 1993; Pulleyblank 1996; Smolensky 1993; and many others). In Johore Malay, the gradient constraint ALIGN-R([nasal], word) ensures that every [nasal] autosegment is linked as far to the right as possible. In (8), the rightward spreading of [nasal] is indicated by underlining the segments associated with it:
(8) ALIGN-R([nasal], word) illustrated
      /mawara/    *NASLIQ   ALIGN-R([nasal], word)   IDENT([nasal])
a.    mawara                5W                       L
b.    mãwara                4W                       1L
c.    mãw̃ara                3W                       2L
d. →  mãw̃ãra                2                        3
e.    mãw̃ãr̃a      1W        1L                       4W
f.    mãw̃ãr̃ã      1W        L                        5W
Candidate (8d) wins because its [nasal] autosegment is linked to a segment that is only two segments away from the right edge of the word. (Diagram (3) illustrates). In candidates with more ALIGN violations, [nasal] has not spread as far, whereas candidates with fewer ALIGN violations contain the forbidden segment *[r̃].
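Gradient evaluation of ALIGN-R and the selection of (8d) can be illustrated with a few lines of code. This is a sketch under stated assumptions: candidates are lists of (segment, is-nasal) pairs with a single [nasal] span starting at the left edge, strict constraint ranking is modelled as lexicographic comparison of violation vectors, and all function names are hypothetical.

```python
LIQUIDS = {"r", "l"}


def candidate(segments, nasal_count):
    """Nasalize the first `nasal_count` segments of the word."""
    return [(seg, i < nasal_count) for i, seg in enumerate(segments)]


def nas_liq(cand, underlying):
    """*NASLIQ: one mark per nasalized liquid."""
    return sum(1 for seg, nasal in cand if nasal and seg in LIQUIDS)


def align_r(cand, underlying):
    """ALIGN-R([nasal], word): one mark per segment between the rightmost
    nasal segment and the right edge of the word."""
    nasal_positions = [i for i, (_, nasal) in enumerate(cand) if nasal]
    if not nasal_positions:
        return 0  # vacuously satisfied when there is no [nasal] to align
    return len(cand) - 1 - nasal_positions[-1]


def ident(cand, underlying):
    """IDENT([nasal]): one mark per segment whose nasality differs from the input."""
    return sum(1 for (_, n1), (_, n2) in zip(cand, underlying) if n1 != n2)


def evaluate(cands, underlying, ranking):
    """Return the candidate with the lexicographically smallest violation vector."""
    return min(cands, key=lambda c: tuple(con(c, underlying) for con in ranking))


segments = list("mawara")
underlying = candidate(segments, 1)
cands = [candidate(segments, k) for k in range(1, 7)]  # (8a)-(8f)
ranking = [nas_liq, align_r, ident]                    # *NASLIQ >> ALIGN-R >> IDENT

winner = evaluate(cands, underlying, ranking)
align_marks = [align_r(c, underlying) for c in cands]
print(align_marks)
print("winner index:", cands.index(winner))  # index 3 corresponds to (8d)
```

Comparing tuples left to right mirrors strict domination: a lower-ranked constraint only matters when all higher-ranked constraints tie.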
The blocking situation illustrated in (8) is the source of ALIGN’s problems as a theory of spreading in OT, as Wilson (2003, 2004, 2006) has shown. ALIGN creates an impetus to minimize the number of peripheral segments that are inaccessible to harmony because of an intervening blocker. Many imaginable ways of doing that – such as deleting segments, forgoing epenthesis, or choosing shorter allomorphs – are unattested but predicted to be possible under ranking permutation. These wrong predictions will be discussed in section 5, after SH has been presented.
3. The proposal: Serial Harmony
The theory of Serial Harmony (SH) has two novel elements: a proposal about the constraint that favors autosegmental spreading (section 3.1), and a derivational approach to phonological processes (section 3.2). The proposal is worked out here under the assumption that distinctive features are privative, since this seems like the most plausible view (see Lombardi 1991; Steriade 1993a, 1993b, 1995; Trigo 1993; among others). Whether this proposal can be made compatible with equipollent features remains to be determined.
3.1. Autosegmental spreading in SH
We saw in section 2 that the markedness constraint favoring autosegmental spreading is a crucial weakness of previous approaches to harmony in OT. SH’s constraint looks somewhat like one of those earlier constraints, AGREE, but there are important differences as a result of other assumptions I make. The constraint SHARE(F) requires adjacent elements (here, segments) to be linked to the same [F] autosegment:3
(9) SHARE(F)
Assign one violation mark for every pair of adjacent elements that are not linked to the same token of [F].
Example (10) illustrates the only way that a pair of adjacent segments can satisfy this constraint, while example (11) shows the several ways that a pair of segments can violate it. Below each form I show the simplified notation I will be using in the rest of this chapter.
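(10) Example: SHARE([nasal]) obeyed
     One [nas] linked to both segments of ma: [mã]
(11) Examples: SHARE([nasal]) violated
a.   [nas] linked to the first segment of ma only: [m|a]
b.   [nas] linked to the second segment of bã only: [b|ã]
c.   Two [nas] autosegments, one linked to each segment of mã: [m|ã]
d.   No [nas] in ba: [b|a]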
The three kinds of SHARE violation exemplified in (11) are: (a) and (b) a [nasal] autosegment is linked to one segment but not the other; (c) each segment is linked to a different [nasal] autosegment; (d) neither segment is linked to a [nasal] autosegment. In the simplified notation, these violations are indicated by a vertical bar between the offending segments. Like ALIGN-R([nasal], word), which it replaces, SHARE([nasal]) favors (10) over (11a) and (11c). Unlike ALIGN-R([nasal], word), SHARE([nasal]) also favors (10) over (11d), the form with no [nasal] feature to spread. This difference is addressed in section 3.2. And because it has no inherent directional sense, SHARE([nasal]) disfavors (11b) as much as (11a), whereas ALIGN-R([nasal], word) finds (11b) inoffensive. Limitations of space do not permit me to present SH’s theory of directionality, which is an obvious extension of recent proposals that the source segment in autosegmental spreading is the head of the featural domain (Cassimjee and Kisseberth 1997; Cole and Kisseberth 1995a; McCarthy 2004; Smolensky 1995, 1997, 2006).
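The definition of SHARE(F) in (9) translates directly into a counting procedure over adjacent pairs. In the sketch below, each segment is represented by an identifier for the [F] token it is linked to, or `None` if it is linked to none; this encoding and the name `share_violations` are assumptions made for illustration.

```python
def share_violations(links):
    """Count adjacent pairs that are not linked to the same token of [F].

    `links` gives, for each segment in order, an identifier of the [F]
    token it is linked to, or None if it is linked to no [F] token.
    """
    return sum(
        1
        for left, right in zip(links, links[1:])
        if left is None or left != right
    )


# The two-segment configurations of (10) and (11), with token identifiers 1 and 2.
examples = {
    "(10)  [mã]": [1, 1],         # one [nas] shared by both segments
    "(11a) [m|a]": [1, None],     # [nas] on the first segment only
    "(11b) [b|ã]": [None, 1],     # [nas] on the second segment only
    "(11c) [m|ã]": [1, 2],        # two different [nas] tokens
    "(11d) [b|a]": [None, None],  # no [nas] at all
}

for label, links in examples.items():
    print(label, share_violations(links))
```

Tracking token identity, rather than a plain nasal/oral flag, is what lets the function distinguish (10) from (11c): both contain two nasal segments, but only in (10) do they share one autosegment.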
3.2. SH and Harmonic Serialism
Harmonic Serialism (HS) is a version of OT in which GEN is limited to making one change at a time. Since inputs and outputs may differ in many ways, the output of each pass through HS’s GEN and EVAL is submitted as the input to another pass through GEN and EVAL, until no further changes are possible. HS was briefly considered by Prince and Smolensky (2004), but then set aside. Lately, the case for HS has been reopened (see Jesney to appear; Kimper to
appear; McCarthy 2000, 2002, 2007a, 2007b, 2007c, 2008a, 2008b; Pater to appear; Pruitt 2008; Wolf 2008). Besides Prince and Smolensky’s work, HS also has connections with other ideas about serial optimization (e.g., Black 1993; Chen 1999; Goldsmith 1990; 1993; Kenstowicz 1995; Kiparsky 2000; Norton 2003; Rubach 1997; Tesar 1995). An important aspect of the on-going HS research program is determining what it means to make “one change at a time”. Answering this question for the full range of phonological phenomena is beyond the scope of this chapter, but before analysis can proceed it is necessary to adopt some assumptions about how GEN manipulates autosegmental structures:
(12) Assumptions about GEN for autosegmental phonology in HS4
GEN’s set of operations consists of:
a. Insertions:
   – A feature and a single association line linking it to some pre-existing structure.
   – A single association line linking two elements of pre-existing structure.
b. Deletions:
   – A feature and a single association line linking it to some pre-existing structure.
   – An association line linking two elements of pre-existing structure.
Under these assumptions, GEN cannot supply a candidate that differs from the input by virtue of, say, spreading a feature from one segment and delinking it from another. This means that feature “flop” processes require two steps in an HS derivation (McCarthy 2007a: 91–93).
3.3. SH exemplified
We now have sufficient resources to work through an example in SH. The grammar of Johore Malay maps /mawara/ to [mãw̃ãra] by the succession of derivational steps shown in (13). At each step, the only candidates that are considered are those that differ from the step’s input by at most one GEN-imposed change. The grammar evaluates this limited set of candidates in exactly the same way as in parallel OT. The optimal form then becomes the
input to another pass through GEN, and so on until the unchanged candidate wins (“convergence”).
(13) SH derivation of /mawara/ → [mãw̃ãra] (cf. (8))
Step 1
      m|a|w|a|r|a      *NASLIQ   SHARE([nasal])   *NASGLI   *NASVOW   IDENT([nas])
a. →  mã|w|a|r|a                 4                          1         1
b.    m|a|w|a|r|a                5W                         L         L
c.    b|a|w|a|r|a                5W                         L         1
Step 2
      mã|w|a|r|a       *NASLIQ   SHARE([nasal])   *NASGLI   *NASVOW   IDENT([nas])
a. →  mãw̃|a|r|a                  3                1         1         1
b.    mã|w|a|r|a                 4W               L         1         L
Step 3
      mãw̃|a|r|a        *NASLIQ   SHARE([nasal])   *NASGLI   *NASVOW   IDENT([nas])
a. →  mãw̃ã|r|a                   2                1         2         1
b.    mãw̃|a|r|a                  3W               1         1L        L
Step 4 – Convergence
      mãw̃ã|r|a         *NASLIQ   SHARE([nasal])   *NASGLI   *NASVOW   IDENT([nas])
a. →  mãw̃ã|r|a                   2                1         2
b.    mãw̃ãr̃|a          1W        1L               1         2         1W
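The derivation in (13) is a loop: generate the candidates that are one operation away from the current form, pick the best under the ranking, and stop when the faithful candidate wins. The sketch below is a reduced illustration, not the OT-Help implementation: it assumes at most one [nasal] token per form (so adjacent nasal segments always share it), restricts GEN to inserting or deleting a single association line at the edge of the nasal span, and writes /m/ as a nasalized "b" internally. All names are hypothetical.

```python
VOWELS, GLIDES, LIQUIDS = set("aiueo"), set("wj"), set("rl")

# A form is a tuple of (base_segment, is_nasal) pairs. Assumption: a form has at
# most one [nasal] token, so two adjacent nasal segments always share it.


def share(form, _inp):
    """SHARE([nasal]): one mark per adjacent pair not sharing the [nasal] token."""
    return sum(1 for (_, l), (_, r) in zip(form, form[1:]) if not (l and r))


def nasal_class(members):
    """Build a markedness constraint against nasalized segments of one class."""
    return lambda form, _inp: sum(1 for seg, nas in form if nas and seg in members)


def ident(form, inp):
    """IDENT([nasal]), evaluated against the input of the current step."""
    return sum(1 for (_, a), (_, b) in zip(form, inp) if a != b)


# *NASLIQ >> SHARE([nasal]) >> *NASGLI >> *NASVOW >> IDENT([nasal])
RANKING = [nasal_class(LIQUIDS), share, nasal_class(GLIDES), nasal_class(VOWELS), ident]


def set_nasal(form, i, value):
    return form[:i] + ((form[i][0], value),) + form[i + 1:]


def gen(form):
    """Faithful candidate plus every candidate one operation away."""
    cands = [form]
    nasal = [nas for _, nas in form]
    for i, nas in enumerate(nasal):
        left = nasal[i - 1] if i > 0 else False
        right = nasal[i + 1] if i + 1 < len(nasal) else False
        if not nas and (left or right):
            cands.append(set_nasal(form, i, True))   # insert one association line
        elif nas and not (left and right):
            cands.append(set_nasal(form, i, False))  # delete a line at a span edge
    return cands


def derive(form, ranking=RANKING):
    """Iterate GEN and EVAL until the faithful candidate wins (convergence)."""
    steps = [form]
    while True:
        best = min(gen(form), key=lambda c: tuple(con(c, form) for con in ranking))
        if best == form:
            return steps
        form = best
        steps.append(form)


def show(form):
    """Render a form: nasal 'b' as 'm', other nasal segments with a combining tilde."""
    return "".join(
        "m" if nas and seg == "b" else seg + "\u0303" if nas else seg
        for seg, nas in form
    )


underlying = tuple((seg, i == 0) for i, seg in enumerate("bawara"))
steps = derive(underlying)
for step in steps:
    print(show(step))
```

Because each pass only accepts a candidate that is strictly better than the current form under the ranking, the loop cannot cycle; it halts at the first form that no single operation improves, which here is the form with spreading up to the liquid.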
3.4. A difference between HS and parallel OT
HS’s architecture imposes limitations on the kinds of mappings that languages can perform. Recall that SHARE([nasal]) favors [mã] over [b|a]. In parallel OT, SHARE([nasal]) can compel insertion and spreading of [nasal] to change /b|a/ into [mã], as shown in tableau (14).
(14) Spontaneous nasalization with SHARE([nasal]) in parallel OT
      b|a      SHARE([nas])   IDENT([nas])
a. →  mã                      2
b.    b|a      1W             L
c.    m|a      1W             1L
This prediction is obviously undesirable; languages with nasal harmony do not also have spontaneous nasalization in oral words. HS cannot produce this mapping with these constraints. (This claim has been verified using OT-Help 2, which is described in section 5.) The winning candidate [mã] differs from the input /ba/ by two changes: nasalization of one of the segments and spreading of [nasal] to the other. In HS, these two changes cannot be effected in a single pass through GEN. Starting with input /b|a/, the candidate set after the first pass through GEN includes faithful [b|a] and nasalized [m|a] or [b|ã] – but not [mã], which has both inserted [nasal] and spread it. Tableau (15) shows that SHARE([nasal]) does not favor either of these unfaithful candidates over [b|a].
(15) Convergence to [b|a] on first pass through GEN and EVAL
      /b|a/    SHARE([nas])   IDENT([nas])
a. →  b|a      1
b.    m|a      1              1W
c.    b|ã      1              1W
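The contrast between (14) and (15) comes down to which candidates GEN supplies. The sketch below compares an unrestricted candidate set with a one-operation candidate set for the two-segment input /b|a/, under the ranking SHARE([nasal]) >> IDENT([nasal]). It assumes a single possible [nasal] token and encodes a form as a tuple of nasal/oral flags; the function names are hypothetical.

```python
from itertools import product


def share(cand):
    """SHARE([nasal]) with at most one [nasal] token: a pair is fine only if both are nasal."""
    return sum(1 for left, right in zip(cand, cand[1:]) if not (left and right))


def ident(cand, inp):
    """IDENT([nasal]): one mark per segment whose nasality differs from the input."""
    return sum(1 for c, i in zip(cand, inp) if c != i)


def evaluate(cands, inp):
    """SHARE >> IDENT, as lexicographic comparison of violation vectors."""
    return min(cands, key=lambda c: (share(c), ident(c, inp)))


def parallel_gen(inp):
    """Parallel OT: every nasal/oral pattern is a candidate."""
    return [tuple(c) for c in product([False, True], repeat=len(inp))]


def hs_gen(inp):
    """HS: the faithful candidate plus candidates exactly one operation away."""
    cands = [inp]
    for i, nasal in enumerate(inp):
        if nasal:
            continue
        has_nasal_neighbour = (i > 0 and inp[i - 1]) or (i + 1 < len(inp) and inp[i + 1])
        # With no [nasal] in the form: insert the feature plus one association line.
        # With [nasal] present: add one association line to an adjacent segment.
        if not any(inp) or has_nasal_neighbour:
            cands.append(inp[:i] + (True,) + inp[i + 1:])
    return cands


oral_input = (False, False)                      # /b|a/
parallel_winner = evaluate(parallel_gen(oral_input), oral_input)
hs_winner = evaluate(hs_gen(oral_input), oral_input)
spread_winner = evaluate(hs_gen((True, False)), (True, False))

print("parallel OT:", parallel_winner)           # both segments nasal, i.e. [mã]
print("HS, first pass:", hs_winner)              # faithful [b|a]
print("HS from /m|a/:", spread_winner)           # spreading applies once [nasal] exists
```

The last line shows that the second step of the unwanted two-step path would be harmonically improving; the derivation never reaches it because the first step, inserting [nasal] alone, improves nothing.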
Clearly, there is no danger of SHARE([nasal]) causing spontaneous nasalization, since all three candidates violate this constraint equally. This example typifies the difference between parallel OT and HS. In parallel OT, the (spurious) advantage of spontaneous nasalization and spreading is realized immediately, and so the unwanted /ba/ → [mã] mapping is possible. In HS, however, any advantage accruing to spontaneous nasalization must be realized without the benefit of spreading, which comes later. HS has no capacity to look ahead to the more favorable result that can be achieved by spreading once [nasal] has been inserted. Since none of the constraints under discussion favors spontaneous nasalization, the /ba/ → [mã] mapping is impossible in HS with exactly the same constraints and representational assumptions that made it possible in parallel OT. Differences like this between parallel OT and HS form the basis for most arguments in support of HS in the literature cited at the beginning of this
section. This difference is also key to SH’s ability to avoid the problems of AGREE and ALIGN, as we will now see.
4. SH compared with AGREE
SH does not share AGREE’s sour-grapes problem described in section 2.1. This problem is AGREE’s inability to compel spreading that is less than complete because of an intervening blocking segment. AGREE has this problem because it is not satisfied unless the feature or tone spreads all the way to the periphery. That SHARE does not have this problem is apparent from (13). The mapping /mawara/ → [mãw̃ãra] is exactly the kind of situation where AGREE fails, since faithful [mawara] and the intended winner [mãw̃ãra] each violate AGREE once. But SHARE deals with this situation successfully because [m|a|w|a|r|a] has more violations than [mãw̃ã|r|a]. Another advantage of SHARE over AGREE is that it does not support feature deletion as an alternative to spreading. The violation of AGREE in /mawara/ could be eliminated by denasalizing the /m/. Thus, AGREE predicts the existence of a language where nasal harmony alternates with denasalization: /mawa/ → [mãw̃ã] vs. /mawara/ → [bawara]. No such language exists, and SHARE makes no such prediction. Step 1 of (13) shows that the mapping /mawara/ → [bawara] (candidate (c)) is harmonically bounded by the faithful mapping. Therefore, the constraints in (13), including SHARE([nasal]), can never cause denasalization under any ranking permutation.
5. SH compared with ALIGN
As I noted in section 2.2, a constraint like ALIGN-R([nasal], word) could in principle be satisfied not only by spreading [nasal] onto segments to its right but also by other methods. Wilson (2003, 2004, 2006) has identified several such methods, none of which actually occur. These “pathologies”, as he calls them, are problematic for a theory of harmony based on ALIGN, though, as I will argue, they are no problem in SH. All of the pathologies have one thing in common: they minimize the number of segments between the rightmost (or leftmost) segment in the [nasal] span and the edge of the word. Deleting a non-harmonizing segment comes to mind as one way of accomplishing that, but there are several others, including metathesis, affix repositioning, blocking of epenthesis, and selection of shorter allomorphs.5
All of the claims in this section about what SH can and cannot do have been verified with OT-Help 2 (Becker et al. 2009). There are principled methods for establishing the validity of typological claims in parallel OT (Prince 2006), but no such techniques exist for HS. Thus, typological claims in HS, such as those in this section, can be confirmed only by following all derivational paths for every ranking. OT-Help 2 implements an efficient algorithm of this type. Moreover, it does so from a user-defined GEN and CON, so it calculates and evaluates its own candidates, starting only with user-specified underlying representations. In the present instance, the typologies were calculated using all of the SH constraints in this chapter and operations equivalent to autosegmental spreading, deletion, metathesis, epenthesis, and morpheme spell-out, as appropriate.
5.1. Segmental deletion
This is the first of the pathologies that we will consider. Because ALIGN-R([nasal], word) is violated by any non-harmonizing segment that follows a nasal, it can be satisfied by deletion as well as spreading. Tableau (16) gives the ranking for a language that deletes non-harmonizing /r/ (and perhaps the vowel that follows it, depending on how ONSET is ranked). This type of harmony has never been observed, to my knowledge.
(16) Harmony by deletion pathology with ALIGN
      /mawara/    *NASLIQ   ALIGN-R([nasal], word)   MAX   IDENT([nas])
a. →  mãw̃ã.ã                                         1     4
b.    mãw̃ãra                2W                       L     3L
d.    mãw̃ãr̃ã      1W                                 L     5W
SH does not make this prediction. It avoids it by virtue of the hypothesis that segmental deletion is the result of gradual attrition that takes place over several derivational steps (McCarthy 2008a). This assumption is a very natural one in light of developments in feature geometry (Clements 1985) and parametric rule theory (Archangeli and Pulleyblank 1994). GEN can perform certain operations on feature-geometric structures, among which is deletion of feature-geometric class nodes. A segment has been deleted when all of its class nodes have been deleted, one by one. Thus, what we observe as total segmental deletion is the “telescoped” (Wang 1968) result of a series of reductive neutralization processes. This proposal explains why segmental
deletion is observed in coda position: codas are independently subject to deletion of the Place and Laryngeal nodes. With this hypothesis about segmental deletion, SH does not allow SHARE (or ALIGN) to compel segmental deletion. The argument is similar to the one in section 3.4: the first step in deleting a segment does not produce immediate improvement in performance on SHARE, and HS has no look-ahead ability. Imagine that the derivation has reached the point where [mãw̃ã|r|a] is the input to GEN. The form [mãw̃ã|a], with outright deletion of [r] and consequent elimination of a SHARE([nasal]) violation, is not among the candidates that GEN emits. There is a candidate in which [r] has lost its Place node, but the resulting Place-less segment still violates SHARE([nasal]). The deletion pathology arises in parallel OT because GEN produces candidates that differ from the underlying representation in many ways – for instance, from /mawara/, it directly produces [mãw̃ã.ã], which is optimal under the ranking in (16). In this tableau, [mãw̃ã.ã] is the global minimum of potential for further harmonic improvement. Parallel OT always finds this global minimum. HS’s GEN is incapable of such fell-swoop derivations. As a result, HS derivations sometimes get stuck at a local minimum of harmonic improvement potential. The evidence here and elsewhere (McCarthy 2007b, 2008a) shows that it is sometimes a good thing to get stuck.
5.2. Metathesis
Though there are skeptics, metathesis really does seem to be securely attested in synchronic phonology (Hume 2001). Certain factors are known to favor metathesis (Ultan 1978), and it is clear that harmony is not among them. Yet metathesis is a possible consequence of enforcement of ALIGN in parallel OT, as tableau (17) shows. Here, [r] and final [a] have metathesized to make [a] accessible to spreading of [nasal], thereby eliminating a violation of ALIGN.
(17) Metathesis pathology with ALIGN
      /mawara/    *NASLIQ   ALIGN-R([nasal], word)   LINEARITY   ID([nas])
a. →  mãw̃ã.ãr               1                        1           4
b.    mãw̃ãra                2W                       L           3L
c.    mãw̃ãr̃ã      1W        L                        L           5W
SH makes no such prediction. Metathesis and spreading are distinct operations that require different derivational steps, so the winner in (17) is never among
the candidates under consideration. Imagine once again that the derivation has reached the point where [mãw̃ã|r|a] is the input to GEN. The candidate set includes [mãw̃ã|a|r], with metathesis, and [mãw̃ãr̃|a], with spreading, but [mãw̃ã.ãr] is not possible at this step, because it differs from the input in two distinct ways. This result is similar to the one in (15): because there is no look-ahead, satisfaction of SHARE in HS will never be achieved with a two-step derivation that first sets up the conditions that make spreading possible and then spreads at the next step.
5.3. Epenthesis
Wilson also points out that parallel OT predicts a pathologic interaction between ALIGN and epenthesis. Because ALIGN disfavors segments that are inaccessible to spreading, epenthesis into an inaccessible position is also disfavored. For instance, suppose a language with nasal harmony also has vowel epenthesis, satisfying NO-CODA by inserting [i]. Obviously, NO-CODA dominates DEP. Suppose further that NO-CODA is ranked below ALIGN-R([nasal], word). In that case, epenthesis will be prevented if the epenthetic vowel is inaccessible to nasal harmony because of an intervening blocking segment:
(18) ALIGN-R([nasal], word) preventing epenthesis
      /mar/    *NASLIQ   ALIGN-R([nasal], word)   NO-CODA   DEP
a. →  mãr                1                        1
b.    mãri               2W                       L         1W
c.    mãr̃ĩ     1W        L                        L         1W
Words that contain no nasals vacuously satisfy ALIGN-R([nasal], word), so this constraint is irrelevant in such words. Thus, nasalless words are able to satisfy NO-CODA by vowel epenthesis: /pas/ → [pasi]. Furthermore, words that contain a nasal but no blockers will also undergo epenthesis, since the epenthetic vowel is accessible to nasal spreading:
(19) No blocker: /maw/ → [mãw̃ĩ]
      /maw/    *NASLIQ   ALIGN-R([nasal], word)   NO-CODA   DEP
a. →  mãw̃ĩ                                                  1
b.    mãw̃                                         1W        L
A language with this grammar would fit the following description: final consonants become onsets by vowel epenthesis, unless preceded at any distance by a nasal and a true consonant, in that order. This is an implausible prediction.

Epenthesis of a vowel and spreading of a feature onto that vowel are separate changes, so HS’s GEN cannot impose them simultaneously on a candidate. Rather, epenthesis and spreading must take place in separate steps, and hence the constraint hierarchy evaluates the consequences of epenthesis without knowing how spreading might subsequently affect the epenthetic vowel. It follows, then, that vowel epenthesis always adds a violation of SHARE([nasal]), regardless of context: [mã|r] vs. [mã|r|i], [mãw̃] vs. [mãw̃|i]. If SHARE([nasal]) is ranked above NO-CODA, then it will simply block epenthesis under all conditions, just as DEP will block epenthesis if ranked above NO-CODA. Ranking SHARE([nasal]) above NO-CODA may be a peculiar way of preventing epenthesis, but there is no pathology. There are languages with no vowel epenthesis, and the grammar just described is consistent with that fact.

5.4. Affix repositioning

By dominating affixal alignment constraints, markedness constraints can compel infixation (McCarthy and Prince 1993; Prince and Smolensky 2004; and others). They can even cause affixes to switch between prefixal and suffixal position (Fulmer 1997; Noyer 1993). ALIGN-R([nasal], word) is among the markedness constraints that could in principle have this effect, as Wilson observes. Its influence on affix placement is much like its influence on epenthesis. When the stem contains a nasal consonant followed by a blocker like [r], then an oral affix can be forced out of suffixal position to improve alignment of [nasal] (20a). But if the stem contains no [nasal] segments, then there is no threat of improper alignment, and so the affix can be a suffix, as is its wont (20b). The affix will also be suffixed if it is itself nasalizable and no blocker precedes it in the stem (20c). Nothing like this behavior has been observed among the known cases of phonologically-conditioned affix placement. It is presumably impossible.
(20) ALIGN-R([nasal], word) affecting affix placement

a. Prefixation when inaccessible to harmony

     /mar, o/       *NASLIQ   ALIGN-R([nasal], word)   ALIGN-R(-o, word)
     i.   → omãr              1                        3
     ii.    mãro              2W                       L
     iii.   mãr̃õ    1W        L                        L

b. Suffixation with no nasal to harmonize

     /par, o/       *NASLIQ   ALIGN-R([nasal], word)   ALIGN-R(-o, word)
     i.   → paro
     ii.    opar                                       3W

c. Suffixation when accessible to harmony

     /maw, o/       *NASLIQ   ALIGN-R([nasal], word)   ALIGN-R(-o, word)
     i.   → mãw̃õ
     ii.    omãw̃                                       3W
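The long-distance counting behind the ALIGN-R([nasal], word) columns of (18) and (20) can be made concrete with a minimal sketch. It assumes, purely for illustration, that a candidate is encoded as a plain string of segments together with the index span of its single [nasal] domain (or None when the word contains no nasal); the function name and this encoding are hypothetical and are not part of the analysis in the text.

```python
def align_r_nasal(segments, nasal_span):
    """Count ALIGN-R([nasal], word) violations for one candidate.

    segments:   plain string of segments, e.g. "omar"
    nasal_span: (first, last) indices of the [nasal] domain, or None
                (nasalization is encoded by the span, not by diacritics;
                this encoding is an assumption of the sketch)
    """
    if nasal_span is None:
        return 0  # no [nasal] feature: the constraint is vacuously satisfied
    last = nasal_span[1]
    # One violation per segment standing between the right edge of the
    # [nasal] domain and the right edge of the word.
    return len(segments) - 1 - last


# Candidates from (20a) and (20b), with their assumed nasal spans.
candidates = {
    "omãr": ("omar", (1, 2)),   # prefixed affix: only [r] follows the domain
    "mãro": ("maro", (0, 1)),   # suffixed oral affix: [r] and [o] follow
    "mãr̃õ": ("maro", (0, 3)),   # full spreading (ruled out by *NASLIQ)
    "paro": ("paro", None),     # no nasal in the stem
}

for label, (segments, span) in candidates.items():
    print(label, align_r_nasal(segments, span))
```

Because the count depends on everything to the right of the nasal domain, any material added at the right edge of a word with a blocked nasal raises the count, which is the source of the effects on epenthesis and affix placement discussed above.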
We will now look at how cases like this play out in SH. We first need a theory of phonology-morphology interaction in HS to serve as the basis for analyzing affix displacement phenomena. To this end, I adopt the framework of Wolf (2008). Wolf proceeds from the assumption that the input to the phonology consists of abstract morphemes represented by their morphosyntactic features – e.g., /DOG-PLURAL/. Spelling out each morpheme requires a single step of a HS derivation. Spell-out is compelled by the constraint MAX-M, which is satisfied when an abstract morpheme is spelled out by some formative. Affix displacement phenomena show that the location of spell-out is not predetermined. Thus, [dɔɡz], [dɔzɡ], [dzɔɡ], etc. are all legitimate candidates that satisfy MAX-M. The actual output [dɔɡz] is selected by the constraint MIRROR, which favors candidates where the phonological spell-out of a feature matches its location in morphosyntactic structure. Affix displacement is violation of MIRROR to satisfy some higher-ranking constraint.

We now have the resources necessary to study the consequences of SH for our hypothetical example. Small capitals – MAR, PAR, MAW – will be used for the morphosyntactic representation of roots, and the [o] suffix will spell out PLURAL. We begin with PAR. The input is the morphosyntactic structure [PAR PLURAL]. The first derivational step spells out the morphosyntactic representation PAR as the phonological string [par]. This change improves performance on the constraint MAX-M (see (21)), but because it introduces phonological structure where previously there was none, it brings violations of phonological markedness constraints, including SHARE([nasal]). (In subsequent examples, the root spell-out step will be omitted.)

(21) First step: [PAR PLURAL] → [par PLURAL]

     [PAR PLURAL]              *NASLIQ   MAX-M   SHARE([nas])
     a. → [p|a|r PLURAL]                 1       2
     b.   [PAR PLURAL]                   2W      L
Further improvement on MAX-M is possible by spelling out PLURAL as [o]. GEN offers candidates that differ in where PLURAL is spelled out, and MIRROR chooses the correct one. MIRROR is shown as separated from the rest of the tableau because its ranking cannot be determined by inspecting these candidates:

(22) Second step: [par PLURAL] → [paro]

     [par PLURAL]              *NASLIQ   MAX-M   SHARE([nas])   ||   MIRROR
     a. → [p|a|r|o]                              3
     b.   [p|a|r PLURAL]                 1W      2 L
     c.   [o|p|a|r]                              3                   3W
     d.   [p|o|a|r]                              3                   2W
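The step in (22) can be sketched as a small Harmonic Serialism loop. This is only an illustration under stated assumptions: GEN is limited here to spelling out one pending morpheme at some position (spreading and *NASLIQ are left out), SHARE([nasal]) is approximated as the number of adjacent segment pairs because no spreading ever applies, and MIRROR is placed at the bottom of the ranking even though the text notes that its ranking cannot be determined from these candidates. All names (`gen`, `derive`, `SPELL_OUT`, the state encoding) are hypothetical.

```python
SPELL_OUT = {"PLURAL": "o"}  # hypothetical exponent, following the text's example


def gen(state):
    """One-change-at-a-time GEN: the faithful candidate plus every way of
    spelling out the next pending morpheme somewhere in the string."""
    segments, pending, displacement = state
    candidates = [state]  # the faithful candidate comes first
    if pending:
        exponent = SPELL_OUT[pending[0]]
        for i in range(len(segments) + 1):
            spelled = segments[:i] + exponent + segments[i:]
            # Number of segments separating the exponent from suffixal position.
            shift = len(segments) - i
            candidates.append((spelled, pending[1:], displacement + shift))
    return candidates


def max_m(state):
    """MAX-M: one violation per morpheme not yet spelled out."""
    return len(state[1])


def share_nasal(state):
    """SHARE([nasal]) with no spreading: one violation per adjacent pair."""
    return max(len(state[0]) - 1, 0)


def mirror(state):
    """MIRROR: distance of the exponent from its morphosyntactic position."""
    return state[2]


RANKING = [max_m, share_nasal, mirror]  # MIRROR's position is an assumption


def derive(state):
    """Feed the winner back into GEN until the faithful candidate wins."""
    path = [state]
    while True:
        # Violation profiles are compared lexicographically, highest-ranked first;
        # ties go to the earliest candidate, i.e. the faithful one.
        best = min(gen(state), key=lambda cand: [con(cand) for con in RANKING])
        if best == state:
            return path  # convergence: no harmonic improvement is available
        state = best
        path.append(state)


for step in derive(("par", ("PLURAL",), 0)):
    print(step)
```

Starting from the root already spelled out as [par], the loop selects the suffixed candidate and then converges, since all placements of [o] tie on MAX-M and SHARE([nasal]) and only MIRROR distinguishes them.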
Since no further harmonic improvement is possible (relative to the constraints under discussion), the derivation converges on [paro] at the third step.

When the input to the second step contains a nasal, like [mar PLURAL], there is a choice between spelling out PLURAL or spreading [nasal]. Since MAX-M is ranked higher, spell-out takes precedence:

(23) Second step: [mar PLURAL] → [maro]

     [mar PLURAL]              *NASLIQ   MAX-M   SHARE([nas])   ||   MIRROR
     a. → [m|a|r|o]                              3
     b.   [m|a|r PLURAL]                 1W      2 L
     c.   [mã|r PLURAL]                  1W      1 L
     d.   [o|m|a|r]                              3                   3W
     e.   [m|o|a|r]                              3                   2W

This is the crucial tableau. It shows that SHARE([nasal]), unlike ALIGN in (20b), is unable to affect the placement of the affix. All placements of the affix [o] equally affect performance on SHARE([nasal]), adding one violation of it. Thus, there is no advantage to shifting this affix out of the position preferred by the constraint MIRROR.

It might seem that SHARE([nasal]) could affect affix placement by favoring [õm|a|r] or [mõ|a|r], but these are not legitimate candidates at the affix spell-out step. HS’s one-change-at-a-time GEN cannot simultaneously spell out a morpheme and spread a feature onto it. Although SHARE([nasal]) would make it advantageous to spell out [o] next to [m], that advantage cannot be discovered until it is too late, when the location of the affix has already been determined. An affix’s accessibility to autosegmental spreading is irrelevant to its placement, because the effect of spreading and the location of spell-out cannot be decided simultaneously, since it is impossible under HS for competing candidates to differ in both of these characteristics at the same time.

5.5. Allomorph selection

In phonologically conditioned allomorphy, a morpheme has two or more surface alternants that are selected for phonological reasons but cannot be derived from a common underlying form. In Korean, for example, the nominative suffix has two alternants, [i] and [ka]. There is no reasonable way of deriving them from a single underlying representation, but their occurrence is determined phonologically: [i] follows consonant-final stems and [ka] (voiced intervocalically to [ɡa]) follows vowel-final stems:

(24) Korean nominative suffix allomorphy

     cib-i      house-NOM
     cʰa-ɡa     car-NOM

Research in OT has led to the development of a theory of phonologically conditioned allomorphy based on the following premises (e.g., Burzio 1994; Hargus 1995; Hargus and Tuttle 1997; Mascaró 1996, 2007; Mester 1994; Perlmutter 1998; Tranel 1996a, 1996b, 1998):

(i) The allomorphs of a morpheme are listed together in the underlying representation: /cip-{i, ka}/, /cʰa-{i, ka}/.
(ii) GEN creates candidates that include all possible choices of an allomorph: [cib-i], [cip-ka], [cʰa-i], [cʰa-ɡa]. (Intervocalic voicing is an allophonic alternation that I will not be discussing here.)
(iii) Faithfulness constraints like MAX and DEP treat all allomorph choices equally.
(iv) So markedness constraints determine which allomorph is most harmonic.

In Korean, the markedness constraints ONSET and NO-CODA correctly favor [cib-i] and [cʰa-ɡa] over [cip-ka] and [cʰa-i], respectively. The following tableaux illustrate:

(25) Allomorph selection in Korean

a.   /cip-{i, ka}/       ONSET   NO-CODA
     i.  → ci.bi
     ii.   cip.ka                1W

b.   /cʰa-{i, ka}/       ONSET   NO-CODA
     i.  → cʰa.ɡa
     ii.   cʰa.i         1W
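The selection in (25) can be sketched as a minimal EVAL that compares violation profiles lexicographically. The sketch makes several simplifying assumptions that are not in the text: candidates are ASCII strings with syllable boundaries marked by a period ("cha" stands in for the aspirated stem), ONSET and NO-CODA are computed from the first and last symbol of each syllable, and ONSET is arbitrarily ranked first, since (25) does not establish a ranking between the two constraints.

```python
VOWELS = set("aeiou")


def onset(form):
    """ONSET: one violation per syllable that begins with a vowel."""
    return sum(syllable[0] in VOWELS for syllable in form.split("."))


def no_coda(form):
    """NO-CODA: one violation per syllable that ends in a consonant."""
    return sum(syllable[-1] not in VOWELS for syllable in form.split("."))


def evaluate(candidates, ranking):
    """Return the candidate with the best violation profile, comparing
    constraints from highest-ranked to lowest-ranked."""
    return min(candidates, key=lambda form: [con(form) for con in ranking])


RANKING = [onset, no_coda]  # relative order is an assumption of this sketch

print(evaluate(["ci.bi", "cip.ka"], RANKING))   # consonant-final stem
print(evaluate(["cha.ga", "cha.i"], RANKING))   # vowel-final stem
```

Since faithfulness treats both allomorphs alike, only the markedness profiles enter the comparison, and reversing the order of the two constraints would not change either winner here.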
Wilson shows that a pathology emerges when ALIGN-R([nasal], word) is allowed to participate in allomorph selection. This constraint will prefer the shorter suffix allomorph when the stem contains a [nasal] feature that cannot spread onto the suffix. Furthermore, it can exercise this preference even in a language that has no nasal harmony at all, since the potential effect of ALIGN-R([nasal], word) on allomorph selection is independent of its ranking with respect to faithfulness to [nasal].

The pseudo-Korean example in (26) illustrates. Although ONSET favors the allomorph [-ɡa] after vowel-final stems, its effect is overridden by ALIGN-R([nasal], word) when the stem contains a nasal consonant. But with roots that do not contain a nasal, ALIGN-R([nasal], word) is vacuously satisfied by both candidates, and ONSET favors [-ɡa].

(26) Allomorph selection pathology

     /mi-{i, ka}/      ALIGN-R([nasal], word)   ONSET
     a. → mi.i         2                        1
     b.   mi.ɡa        3W                       L
In a language with the ranking in (26), the choice between [i] and [ka] will be determined by ONSET except when the stem contains a nasal consonant at any distance, in which case the shorter allomorph will win despite the marked syllable structure it creates. Furthermore, this effect has nothing to do with the ranking of IDENT([nasal]) or any similar faithfulness constraint. It is therefore possible for ALIGN-R([nasal], word) to have this effect in languages without an inkling of nasal harmony. This prediction is surely an implausible one.

SHARE([nasal]) does not make these predictions. It simply favors the shorter allomorph, [i], since this allomorph introduces one SHARE([nasal]) violation while the longer allomorph [k|a] introduces two. SHARE([nasal]) has this effect regardless of whether the stem contains a nasal consonant:

(27) No pathology with SHARE([nasal])

a. No nasal in stem

     /t|a-{i, k|a}/          SHARE([nas])   ONSET
     i.  → t|a|i             2              1
     ii.   t|a|ɡ|a           3W             L

b. Nasal in stem

     /n|a|m|i-{i, k|a}/      SHARE([nas])   ONSET
     i.  → n|a|m|i|i         4              1
     ii.   n|a|m|i|ɡ|a       5W             L
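The SHARE([nasal]) columns of (27) can be checked with a very small sketch that reads violations directly off the bar notation, in which "|" separates adjacent segments that are not linked to the same [nasal] autosegment. The helper names and the ASCII "g" for the voiced velar are assumptions made for illustration.

```python
def share_violations(form):
    """SHARE([nasal]): one violation per '|', i.e. per pair of adjacent
    segments that do not share a [nasal] autosegment."""
    return form.count("|")


def allomorph_margin(stem, short, long):
    """Extra SHARE([nasal]) violations incurred by choosing the longer
    allomorph instead of the shorter one after the given stem."""
    with_short = share_violations(stem + "|" + short)
    with_long = share_violations(stem + "|" + long)
    return with_long - with_short


print(allomorph_margin("t|a", "i", "g|a"))      # oral stem, as in (27a): 1
print(allomorph_margin("n|a|m|i", "i", "g|a"))  # stem with nasals, as in (27b): 1
```

The margin is the same for both stems, which is the sense in which SHARE([nasal]) cannot distinguish bases that contain nasals from those that do not.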
This effect of SHARE([nasal]) in systems of allomorphy might seem a bit odd, but it is not pathological. As in the case of epenthesis (section 5.3), SHARE([nasal]) predicts a system that we already predict in another, more obvious way. The language in (27) is simply one where ONSET does not choose among allomorphs; the suffix always surfaces as [i] because SHARE([nasal]) favors the shorter allomorph consistently. Presumably the learner would be content to represent this suffix as just /i/ instead of taking the roundabout route in (27). But a language without allomorphy is a possible human language, so there is no pathological prediction being made.

Although (27) is a language without nasal harmony, the result is the same in a language with harmony. The reason is the same as in section 5.4: HS’s GEN is limited to doing one thing at a time. In Wolf’s (2008) theory, morpheme spell-out is one of the things that HS’s GEN can do. Since spell-out and spreading cannot occur simultaneously, the possible consequences of spreading cannot influence spell-out, so an allomorph’s amenability to spreading does not improve its chances. In general, SHARE([nasal]) favors shorter allomorphs, but it does so in a non-pathological way: it does not distinguish between bases that contain nasals and those that do not, so it cannot produce the odd long-distance affix-minimizing effect that ALIGN predicts.⁶

5.6. Summary

When SHARE and its associated representational assumptions are combined with HS, the pathologies identified by Wilson (2003, 2004, 2006) are resolved. The shift to SHARE eliminates the long-distance segment-counting effect of ALIGN, where a nasal anywhere in the word could affect the possibility of epenthesis, the location of an affix, or the selection of an allomorph. HS addresses the deletion and metathesis pathologies, and it also explains why inserting [nasal] is not a legitimate way of improving performance on SHARE([nasal]). Furthermore, HS denies SHARE the power to have even local effects on epenthesis or allomorph selection.

6. Conclusion
Harmonic Serialism has OT’s core properties: candidate competition judged by ranked, violable constraints. HS differs from parallel OT in two related respects: HS’s GEN is limited to making one change at a time, and the output is fed back into GEN until convergence. In their original discussion of HS, Prince and Smolensky (2004: 95–96) noted that “[i]t is an empirical question of no little interest how Gen is to be construed” and that “[t]here are constraints inherent in the limitation to a single operation”. This chapter is an exploration of that question and those constraints in the domain of autosegmental spreading processes.

I have argued that a particular approach to autosegmental spreading, embedded in HS and called Serial Harmony, is superior to alternatives embedded in parallel OT. The parallel OT theories of harmony make incorrect typological predictions, while Serial Harmony does not.

Notes

1. This work is much the better for the feedback I received from the participants in the UMass Phonology Grant Group in Fall, 2008: Diana Apoussidou, Emily Elfner, Karen Jesney, Peter Jurgec, Kevin Mullin, Kathryn Pruitt, Brian Smith, Wendell Kimper, and especially Joe Pater. Grace Delmolino provided welcome stylistic support. This research was funded by grant BCS-0813829 from the National Science Foundation to the University of Massachusetts Amherst.

2. In the earliest literature on autosegmental phonology such as Goldsmith (1976a, 1976b) or Clements and Ford (1979), spreading was effected by constraints rather than rules. In place of iteration, which makes sense for rules but not constraints, Clements and Ford recruit the Q variable of Halle (1975).

3. The definition of SHARE in (9) is intended to allow some leeway depending on how phenomena like neutral segments or problems like locality are handled. Thus, the “adjacent elements” referred to in the definition of SHARE could be feature-geometric V-Place nodes (Clements and Hume 1995), segments, moras, syllables, or other “P-bearing units” (Clements 1980, 1981). Adjacency is also an abstraction, as the adjacency parameters in Archangeli and Pulleyblank (1987, 1994) make clear.

4. Under the assumptions about GEN in (12), feature spreading is an iterative process, affecting one segment at a time. Nothing in this paper depends on that assumption, though Pruitt (2008) has argued that stress assignment must iterate in HS, while Walker (2008) presents evidence from Romance metaphony against iterative spreading.

5. Wilson cites one more pathological prediction of ALIGN. In a language with positional faithfulness to [nasal] in stressed syllables, such as Guaraní (Beckman 1998), stress could be shifted to minimize ALIGN([nasal]) violations. I do not address this here because it is one of many pathologies associated with positional faithfulness – pathologies that are eliminated in HS, as Jesney (to appear) demonstrates.

6. Wilson also points out a related prediction. If it dominates MAX-BR, ALIGN-R([nasal], word) can cause a reduplicative suffix to copy fewer segments when the stem contains a nasal consonant: /pataka-RED/ → [pataka-taka] versus /makasa-RED/ → [makasa-sa] (if other constraints favor a disyllabic reduplicant that can shrink to monosyllabic under duress). This behavior is also unattested, and cannot arise in SH. The reasoning is similar to the allomorphy case.
References

Anderson, Stephen R.
  1972 On nasalization in Sundanese. Linguistic Inquiry 3: 253–268.
  1980 Problems and perspectives in the description of vowel harmony. In: Robert Vago (ed.), Issues in Vowel Harmony, 1–48. Amsterdam: John Benjamins.
Archangeli, Diana and Douglas Pulleyblank
  1987 Minimal and maximal rules: Effects of tier scansion. In: Joyce McDonough and Bernadette Plunkett (eds.), Proceedings of the North East Linguistic Society 17, 16–35. Amherst, MA: GLSA Publications.
  1994 Grounded Phonology. Cambridge, MA: MIT Press.
  2002 Kinande vowel harmony: Domains, grounded conditions and one-sided alignment. Phonology 19: 139–188.
Bakovic, Eric
  2000 Harmony, dominance, and control. Ph.D. diss., Department of Linguistics, Rutgers University.
Becker, Michael, Patrick Pratt, Christopher Potts, Robert Staubs, John J. McCarthy and Joe Pater
  2009 OT-Help 2.0 [computer program].
Beckman, Jill
  1998 Positional faithfulness. Ph.D. diss., Department of Linguistics, University of Massachusetts Amherst.
Black, H. Andrew
  1993 Constraint-ranked derivation: A serial approach to optimization. Ph.D. diss., Department of Linguistics, University of California, Santa Cruz.
Burzio, Luigi
  1994 Metrical consistency. In: Eric Sven Ristad (ed.), Language Computations, 93–125. Providence, RI: American Mathematical Society.
Cassimjee, Farida and Charles Kisseberth
  1997 Optimal Domains Theory and Bantu tonology: A case study from Isixhosa and Shingazidja. In: Larry Hyman and Charles Kisseberth (eds.), Theoretical Aspects of Bantu Tone, 33–132. Stanford: CSLI Publications.
Chen, Matthew
  1999 Directionality constraints on derivation? In: Ben Hermans and Marc van Oostendorp (eds.), The Derivational Residue in Phonological Optimality Theory, 105–127. Amsterdam: John Benjamins.
Clements, G. N.
  1980 Vowel Harmony in Nonlinear Generative Phonology: An Autosegmental Model. Bloomington: Indiana University Linguistics Club Publications.
  1981 Akan vowel harmony: A nonlinear analysis. Harvard Studies in Phonology 2: 108–177.
  1985 The geometry of phonological features. Phonology Yearbook 2: 225–252.
Clements, G. N. and Kevin C. Ford
  1979 Kikuyu tone shift and its synchronic consequences. Linguistic Inquiry 10: 179–210.
Clements, G. N. and Elizabeth Hume
  1995 The internal organization of speech sounds. In: John A. Goldsmith (ed.), The Handbook of Phonological Theory, 245–306. Cambridge, MA, and Oxford, UK: Blackwell.
Cohn, Abigail
  1993 A survey of the phonology of the feature [nasal]. Working Papers of the Cornell Phonetics Laboratory 8: 141–203.
Cole, Jennifer S. and Charles Kisseberth
  1995a Nasal harmony in Optimal Domains Theory. Unpublished, University of Illinois.
  1995b An Optimal Domains theory of harmony. Unpublished, University of Illinois.
Eisner, Jason
  1999 Doing OT in a straitjacket. Unpublished, Johns Hopkins University.
Fulmer, S. Lee
  1997 Parallelism and planes in Optimality Theory. Ph.D. diss., Department of Linguistics, University of Arizona.
Goldsmith, John
  1976a Autosegmental phonology. Ph.D. diss., Department of Linguistics, MIT.
  1976b An overview of autosegmental phonology. Linguistic Analysis 2: 23–68.
  1990 Autosegmental and Metrical Phonology. Oxford and Cambridge, MA: Blackwell.
  1993 Harmonic phonology. In: John Goldsmith (ed.), The Last Phonological Rule: Reflections on Constraints and Derivations, 21–60. Chicago: University of Chicago Press.
Halle, Morris
  1975 Confessio grammatici. Language 51: 525–535.
Hargus, Sharon
  1995 The first person plural prefix in Babine–Witsuwit’en. Unpublished, University of Washington.
Hargus, Sharon and Siri G. Tuttle
  1997 Augmentation as affixation in Athabaskan languages. Phonology 14: 177–220.
Howard, Irwin
  1972 A directional theory of rule application in phonology. Ph.D. diss., Department of Linguistics, MIT.
Hume, Elizabeth
  2001 Metathesis: Formal and functional considerations. In: Elizabeth Hume, Norval Smith and Jeroen Van de Weijer (eds.), Surface Syllable Structure and Segment Sequencing, 1–25. Leiden: Holland Institute of Linguistics (HIL).
Jesney, Karen
  to appear Positional faithfulness, non-locality, and the Harmonic Serialism solution. Proceedings of NELS 39.
Johnson, C. Douglas
  1972 Formal Aspects of Phonological Description. The Hague: Mouton.
Kenstowicz, Michael
  1995 Cyclic vs. non-cyclic constraint evaluation. Phonology 12: 397–436.
Kenstowicz, Michael and Charles Kisseberth
  1977 Topics in Phonological Theory. New York: Academic Press.
Kimper, Wendell
  to appear Local optionality and Harmonic Serialism. Natural Language & Linguistic Theory.
Kiparsky, Paul
  2000 Opacity and cyclicity. The Linguistic Review 17: 351–367.
Kirchner, Robert
  1993 Turkish vowel harmony and disharmony: An Optimality Theoretic account. Unpublished, UCLA.
Lombardi, Linda
  1991 Laryngeal features and laryngeal neutralization. Ph.D. diss., Department of Linguistics, University of Massachusetts Amherst.
  1999 Positional faithfulness and voicing assimilation in Optimality Theory. Natural Language & Linguistic Theory 17: 267–302.
  2001 Why Place and Voice are different: Constraint-specific alternations in Optimality Theory. In: Linda Lombardi (ed.), Segmental Phonology in Optimality Theory: Constraints and Representations, 13–45. Cambridge: Cambridge University Press. Originally circulated in 1995.
Mascaró, Joan
  1996 External allomorphy as emergence of the unmarked. In: Jacques Durand and Bernard Laks (eds.), Current Trends in Phonology: Models and Methods, 473–483. Salford, Manchester: European Studies Research Institute, University of Salford.
  2007 External allomorphy and lexical representation. Linguistic Inquiry 38: 715–735.
McCarthy, John J.
  2000 Harmonic serialism and parallelism. In: Masako Hirotani (ed.), Proceedings of the North East Linguistics Society 30, 501–524. Amherst, MA: GLSA Publications.
  2002 A Thematic Guide to Optimality Theory. Cambridge: Cambridge University Press.
  2003 OT constraints are categorical. Phonology 20: 75–138.
  2004 Headed spans and autosegmental spreading. Unpublished, University of Massachusetts Amherst.
  2007a Hidden Generalizations: Phonological Opacity in Optimality Theory. London: Equinox Publishing.
  2007b Restraint of analysis. In: Sylvia Blaho, Patrik Bye and Martin Krämer (eds.), Freedom of Analysis, 203–231. Berlin and New York: Mouton de Gruyter.
  2007c Slouching towards optimality: Coda reduction in OT-CC. In: Phonological Society of Japan (ed.), Phonological Studies 10, 89–104. Tokyo: Kaitakusha.
  2008a The gradual path to cluster simplification. Phonology 25: 271–319.
  2008b The serial interaction of stress and syncope. Natural Language & Linguistic Theory 26: 499–546.
McCarthy, John J. and Alan Prince
  1993 Generalized Alignment. In: Geert Booij and Jaap van Marle (eds.), Yearbook of Morphology, 79–153. Dordrecht: Kluwer.
Mester, Armin
  1994 The quantitative trochee in Latin. Natural Language & Linguistic Theory 12: 1–61.
Norton, Russell J.
  2003 Derivational phonology and optimality phonology: Formal comparison and synthesis. Ph.D. diss., Department of Linguistics, University of Essex.
Noyer, Rolf
  1993 Mobile affixes in Huave: Optimality and morphological wellformedness. In: Erin Duncan, Donka Farkas and Philip Spaelti (eds.), The Proceedings of the West Coast Conference on Formal Linguistics 12, 67–82. Stanford, CA: Stanford Linguistics Association.
Onn, Farid M.
  1980 Aspects of Malay Phonology and Morphology: A Generative Approach. Kuala Lumpur: Universiti Kebangsaan Malaysia.
Pater, Joe
  to appear Serial Harmonic Grammar and Berber syllabification. In: Toni Borowsky, Shigeto Kawahara, Takahito Shinya and Mariko Sugahara (eds.), Prosody Matters: Essays in Honor of Lisa Selkirk. London: Equinox Publishing.
Perlmutter, David
  1998 Interfaces: Explanation of allomorphy and the architecture of grammars. In: Steven G. Lapointe, Diane K. Brentari and Patrick M. Farrell (eds.), Morphology and its Relation to Phonology and Syntax, 307–338. Stanford, CA: CSLI Publications.
Piggott, G. L.
  1992 Variability in feature dependency: The case of nasality. Natural Language & Linguistic Theory 10: 33–78.
Prince, Alan
  2002 Arguing optimality. In: Angela Carpenter, Andries Coetzee and Paul de Lacy (eds.), University of Massachusetts Occasional Papers in Linguistics 26: Papers in Optimality Theory II, 269–304. Amherst, MA: GLSA.
  2006 Implication and impossibility in grammatical systems: What it is and how to find it. Unpublished, Rutgers University.
Prince, Alan and Paul Smolensky
  2004 Reprint. Optimality Theory: Constraint Interaction in Generative Grammar. Malden, MA, and Oxford, UK: Blackwell. Originally circulated in 1993.
Pruitt, Kathryn
  2008 Iterative foot optimization and locality in stress systems. Unpublished, University of Massachusetts Amherst.
Pulleyblank, Douglas
  1989 Patterns of feature co-occurrence: The case of nasality. In: S. Lee Fulmer, M. Ishihara and Wendy Wiswall (eds.), Coyote Papers 9, 98–115. Tucson, AZ: Department of Linguistics, University of Arizona.
  1996 Neutral vowels in Optimality Theory: A comparison of Yoruba and Wolof. Canadian Journal of Linguistics 41: 295–347.
  2004 Harmony drivers: No disagreement allowed. In: Julie Larson and Mary Paster (eds.), Proceedings of the Twenty-eighth Annual Meeting of the Berkeley Linguistics Society, 249–267. Berkeley, CA: Berkeley Linguistics Society.
Robins, R. H.
  1957 Vowel nasality in Sundanese: A phonological and grammatical study. In: Philological Society of Great Britain (ed.), Studies in Linguistic Analysis, 87–103. Oxford: Blackwell.
Rubach, Jerzy
  1997 Extrasyllabic consonants in Polish: Derivational Optimality Theory. In: Iggy Roca (ed.), Derivations and Constraints in Phonology, 551–582. Oxford: Oxford University Press.
Schourup, Lawrence
  1972 Characteristics of vowel nasalization. Papers in Linguistics 5: 550–548.
Smolensky, Paul
  1993 Harmony, markedness, and phonological activity. Unpublished, University of Colorado.
  1995 On the structure of the constraint component Con of UG. Unpublished, Johns Hopkins University.
  1997 Constraint interaction in generative grammar II: Local conjunction, or random rules in Universal Grammar. Unpublished, Johns Hopkins University.
  2006 Optimality in phonology II: Harmonic completeness, local constraint conjunction, and feature-domain markedness. In: Paul Smolensky and Géraldine Legendre (eds.), The Harmonic Mind: From Neural Computation to Optimality-Theoretic Grammar, 585–720. Cambridge, MA: MIT Press/Bradford Books.
Steriade, Donca
  1993a Closure, release and nasal contours. In: Marie Huffman and Rena Krakow (eds.), Nasality. San Diego: Academic Press.
  1993b Orality and markedness. In: B. Keyser and J. Guenther (eds.), Papers from BLS 19. Berkeley: Berkeley Linguistic Society.
  1995 Underspecification and markedness. In: John Goldsmith (ed.), Handbook of Phonological Theory, 114–174. Cambridge, MA: Blackwell.
Tesar, Bruce
  1995 Computational Optimality Theory. Ph.D. diss., Department of Linguistics, University of Colorado.
Tranel, Bernard
  1996a Exceptionality in Optimality Theory and final consonants in French. In: Karen Zagona (ed.), Grammatical Theory and Romance Languages, 275–291. Amsterdam: John Benjamins.
  1996b French liaison and elision revisited: A unified account within Optimality Theory. In: Claudia Parodi, Carlos Quicoli, Mario Saltarelli and Maria Luisa Zubizarreta (eds.), Aspects of Romance Linguistics, 433–455. Washington, DC: Georgetown University Press.
  1998 Suppletion and OT: On the issue of the syntax/phonology interaction. In: Emily Curtis, James Lyle and Gabriel Webster (eds.), The Proceedings of the West Coast Conference on Formal Linguistics 16, 415–429. Stanford, CA: CSLI Publications.
Trigo, L.
  1993 The inherent structure of nasal segments. In: Marie Huffman and Rena Krakow (eds.), Nasality. San Diego: Academic Press.
Ultan, Russell
  1978 A typological view of metathesis. In: Joseph Greenberg (ed.), Universals of Human Language, 367–402 (vol. ii). Stanford: Stanford University Press.
Walker, Rachel
  1998 Nasalization, neutral segments, and opacity effects. Ph.D. diss., Department of Linguistics, University of California, Santa Cruz.
  2008 Non-myopic harmony and the nature of derivations. Unpublished, University of Southern California.
Wang, William S.-Y.
  1968 Vowel features, paired variables, and the English vowel shift. Language 44: 695–708.
Wilson, Colin
  2003 Unbounded spreading in OT (or, Unbounded spreading is local spreading iterated unboundedly). Unpublished, UCLA.
  2004 Analyzing unbounded spreading with constraints: Marks, targets, and derivations. Unpublished, UCLA.
  2006 Unbounded spreading is myopic. Unpublished, UCLA.
Wolf, Matthew
  2008 Optimal Interleaving: Serial phonology-morphology interaction in a constraint-based model. Ph.D. diss., Department of Linguistics, University of Massachusetts Amherst.
Evaluating the effectiveness of Unified Feature Theory and three other feature systems

Jeff Mielke, Lyra Magloughlin, and Elizabeth Hume

1. Introduction
This chapter is in part a response to a suggestion made by Nick Clements in 2009. Commenting on the results of a comparison of feature theories presented in Mielke (2008), Clements proposed the following in an e-mail message to the third author:

    ...the Clements & Hume [Unified Feature Theory] system fares rather poorly in capturing natural classes compared with the other systems discussed. If you look closely at the examples, you’ll notice... that most of the natural classes it fails to capture correspond to minus values of the features labial, coronal, and dorsal, in both vowels and consonants... If these are provided, I believe the system may perform better than the others... One way of doing so would be in terms of [the model proposed in Clements 2001], in which features are “potentially” binary but specified only as needed, marked values tend to be prominent, and only prominent features are projected onto separate tiers. Assuming that plus values are marked, the asymmetry between e.g. +labial and −labial is still captured... If a revised version of the model does indeed prove to be a competitive system, even for capturing natural classes (a criterion we did not emphasize), it would be well worth making this point somewhere in print.

We have thus carried out a large-scale investigation of the ability of several feature systems to define natural classes. One of these systems is a revised version of Unified Feature Theory (UFT, Clements and Hume 1995) in which place features are represented with both plus and minus values. As Clements predicts, such a model does indeed emerge as competitive. This chapter also examines two means by which unnatural classes are formally expressed in rule- and constraint-based theories of phonology: the union or subtraction of natural classes. Based on a survey of 1691 unnatural classes, both techniques are shown to be effective at modeling unnatural yet phonologically active classes.

2. Six feature systems
In this chapter we report on a comparison of the performance of six sets of distinctive features in accounting for recurrent phonologically active classes. This study builds on Mielke (2008), where three feature systems were compared. In the current work, we increase the number of feature systems to six and examine differences among the theories in more detail. Similar to the earlier investigation, we use the P-base database of phonologically active classes (Mielke 2008) to evaluate the systems. The six feature theories compared in this paper are:

1. Preliminaries to the Analysis of Speech (Jakobson, Fant, and Halle 1952)
2. The Sound Pattern of English (Chomsky and Halle 1968)
3. Problem Book in Phonology (Halle and Clements 1983)
4. Unified Feature Theory (Clements and Hume 1995)
5. Unified Feature Theory with binary place features
6. Unified Feature Theory with full specification of all features
The ¿rst three feature systems, all proposed by Morris Halle and colleagues, can be viewed as descendants of one another. UFT is distinct from these feature systems in its use of privative features and its emphasis on feature organization. We consider a version similar to the one proposed by Clements and Hume, as well as two variations of this system. The ¿rst of these uses binary place features, as suggested by Nick Clements. The other variant (with full speci¿cation of all features) was included for comparison with the others, and to our knowledge, has never been proposed in the literature. Figure 1 shows the features used in these feature systems and the relationships between them. Arrows connect features with their counterparts in the earlier feature system which match for the largest number of segments in P-base. The labels on the arrows indicate the degree of match between the two features, and dotted arrows are used for features that match for fewer than 90% of the segments. For example, the four systems largely agree on the features [continuant] and [interrupted/continuant], but the [distributed] feature found in three of the systems has no counterpart in the Preliminaries system, most closely matching [vocalic]. Adding a feature to a feature system creates an opportunity to handle additional phonologically active classes, but also gives the theory more power, thus creating an opportunity for it to specify phonetically unnatural
Figure 1. Relationships between features from four systems. [Diagram: arrows link each feature to its closest counterpart in the earlier system, labeled with the degree of match.]
Jeff Mielke, Lyra Magloughlin, and Elizabeth Hume
classes that may not actually participate in a sound pattern in any language. We consider two ways to evaluate the effectiveness of adding a feature. The first is to measure the performance of each feature system with a matching set of randomly-generated classes, with the idea that a good feature system will succeed with naturally-occurring phonologically active classes, but fail with randomly-generated classes. The real classes were the 6159 naturally-occurring phonologically active classes in P-base (Mielke 2008). Each of these is a subset of the segments in an inventory which undergo or trigger a phonological process, to the exclusion of the other segments in the inventory. For each of these classes a corresponding randomly-generated class was created by drawing a class of the same size from the same inventory as the real class. This is similar to the procedure used by Mackie and Mielke (2011) for randomly-generated inventories. Table 1 shows four phonologically active classes in Japanese with their matched randomly-generated classes, which were produced by drawing the same number of segments randomly from the segment inventory shown in Table 2.¹

Feature analyses were conducted on the 6159 real and 6159 randomly-generated classes, using the feature analysis algorithm described in Mielke (2008: ch. 3). Figure 2 shows the success rates of the six feature systems in specifying these classes. An ideal feature system would represent as many of the real classes as possible using a conjunction of feature values. One way to achieve this is for the system to make good choices about which features to include. Another way is to use a massive number of features, so that virtually any class can be represented. The analysis with randomly-generated classes is meant to safeguard against this.
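The matching procedure described above can be sketched in a few lines. This is a minimal illustration rather than the authors' code: the helper name `matched_random_class`, the ASCII stand-ins for segments, and the fixed seed are all assumptions made for the example.

```python
import random

def matched_random_class(inventory, real_class, rng):
    """Draw a random class with the same size as real_class from the same inventory.

    Hypothetical helper: sampling is without replacement, so the result is a
    set of distinct segments, just like a real phonologically active class.
    """
    return frozenset(rng.sample(sorted(inventory), len(real_class)))

# Toy inventory with ASCII stand-ins for segments (not the P-base transcription).
inventory = {"p", "b", "m", "t", "d", "s", "z", "n", "r", "k", "g", "h",
             "i", "e", "a", "o", "u", "j", "w"}
real_class = frozenset({"b", "d", "g", "z"})  # e.g. a class of voiced obstruents

rng = random.Random(0)  # fixed seed, only so the sketch is repeatable
random_class = matched_random_class(inventory, real_class, rng)

print(sorted(random_class))
```

Running the same feature analysis over both `real_class` and `random_class` for every class in the database gives the two success rates that are compared for each feature system.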
In Figure 2, a higher position along the y axis means more success with real classes, and a lower x value means that this success comes from well-chosen features rather than from brute force.

Table 1. Phonologically active classes in Japanese and matched randomly-generated classes. “X” in pattern descriptions represents the active class.

Phonological behavior   Active class   Random class
1. X → vls / C[–vc]__{C[–vc], #}   i ɯ   i ɡ
2. X → voiced / at start of non-initial morpheme without voiced obstruent   t k s ʃ h   ʃ n d s i
3. high vowels → vls / X __ {X, #}   p t k s ʃ h   n g b ʃ j e
4. /t k s ʃ h/ voiced at start of non-initial morpheme without X   b d ɡ z   ʃ s ɰ ɡ
Evaluating the effectiveness of Unified Feature Theory

Table 2. The segment inventory of Japanese

p b m t d s z n ɾ k ɡ ʃ h i e ɯ o a iː eː ɯː oː aː j ɰ
Going in chronological order, Preliminaries (Jakobson et al. 1952) handles about 60% of real classes and fewer than 5% of the random ones. The differences between the Preliminaries and SPE (Chomsky and Halle 1968)
Figure 2. Success rates with real and random classes. [Scatterplot titled “Naturalness according to six feature systems”: success with real classes (y-axis) against success with random classes (x-axis) for Preliminaries, SPE, H&C, UFT, UFT-place, and UFT-full.]
systems mostly involve adding features (see Figure 1) that were motivated by observed sound patterns, and this enabled SPE features to account for an additional 11% of real classes, with the by-product of accounting for another 2% of random classes as well. The Halle and Clements system (Halle and Clements 1983) was an update of SPE that involved removing seldom-used features and adding the feature [labial]. This resulted in small increases in the number of real and random classes accounted for. The UFT system (Clements and Hume 1995) involves many of the same features as the Halle and Clements system. However, many of the features were not treated as binary, and the emphasis was on feature organization rather than on capturing phonologically active classes. The original UFT system accounts for slightly more observed classes than Preliminaries, and represents the fewest random classes of all the feature systems. The UFT variant with binary place features (proposed by Nick Clements, as noted above) is a substantial improvement over the original UFT, and performs slightly better than the SPE and Halle and Clements systems, accounting for more observed classes, and fewer random classes. The full specification variant of UFT (proposed by no one, to our knowledge) accounts for more real and random classes than any of the other approaches.

The fact that three of the feature systems can represent 70–73% of the naturally-occurring classes suggests that this might be a natural threshold, and that 27–30% of naturally-occurring classes are not phonetically natural enough to be represented in terms of the best systems of phonetically-defined distinctive features. If this is the case, we expect the apparent improvements achieved by the full specification version of UFT to involve a seemingly random assortment of phonetically unnatural classes that happen to be handled by this feature system. We also expect the additional classes handled by other incremental changes (if the feature proposals were on the right track) to involve multiple instances of phonetically natural classes that could not be expressed in the earlier feature systems. This is investigated in the remainder of this section.

Comparing the total numbers of real and random classes handled by each feature system gives a very general comparison of the performance of the feature systems. The second, more specific approach is to inspect the phonologically active classes that are natural according to one feature system but not another, and to determine whether these include recurrent phonetically natural classes overlooked by one of the feature systems, or just an assortment of classes that happen to be natural in an overly strong feature system. Table 3 shows how many classes are natural according to combinations of the six feature systems. Of the 63 logically possible combinations, there are 28 that correspond to sets of classes natural in only those feature systems.
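The tabulation by combination of feature systems can be sketched as follows. This is an illustrative sketch under stated assumptions: the function name `tabulate_combinations`, the input format (a mapping from class identifiers to the set of systems that can describe the class), and the toy data are invented for the example and are not taken from P-base.

```python
from collections import Counter

SYSTEMS = ("Prelim", "SPE", "H&C", "UFT", "UFT-place", "UFT-full")

def tabulate_combinations(natural_in):
    """Count classes by the exact set of systems in which they are natural.

    natural_in maps a class identifier to the set of system names for which
    a feature description was found (a hypothetical input format).
    """
    return Counter(frozenset(systems) for systems in natural_in.values())

# Number of logically possible non-empty combinations of six systems.
n_combinations = 2 ** len(SYSTEMS) - 1

# Toy results for four invented classes.
natural_in = {
    "class_a": set(SYSTEMS),               # natural in all six systems
    "class_b": set(SYSTEMS) - {"Prelim"},  # natural in all except Preliminaries
    "class_c": set(SYSTEMS),
    "class_d": set(),                      # unnatural in all six systems
}
counts = tabulate_combinations(natural_in)

print(n_combinations)              # 63
print(counts[frozenset(SYSTEMS)])  # 2
```

Each key of `counts` corresponds to one row of a table like Table 3, with the empty set standing for the classes that no system can describe.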
Table 3. Classes that are natural according to combinations of feature systems

classes   median size   natural in
3110      4             all six systems
570       7             all except Prelim.
370       3             all except UFT
127       6             UFT, UFT-place, and UFT-full only
84        4
73        3
63        4
55        4
47        2
37        3
34        4
34        4
33        2
30        3
24        3
19        3
15        4
10        2
9         3
8         5
8         3
7         4
5         4
5         5
2         3
2         6
1         15
1         2
1367      5             none
6159      4             (all classes)

Classes natural in each system: Prelim. 3676 (59.7%); SPE 4347 (70.6%); H&C 4389 (71.3%); UFT 3921 (63.7%); UFT-place 4468 (72.5%); UFT-full 4651 (75.5%).

[The per-system check marks for the remaining rows are not recoverable here.]
The most frequent combination by far is the 3110 phonologically active classes that are natural according to all six feature systems. 1367 classes are unnatural according to all six. The remaining sets of classes are revealing about differences among the feature systems. For example, the 570 classes that are natural according to all feature systems except Preliminaries indicate
a gap in the coverage of this system. Similarly, the 370 classes that are natural for all except regular UFT indicate a different gap, while the 127 classes that are natural only according to the three versions of UFT indicate aspects of the UFT system that were a step in the right direction.

2.1. Comparing the Preliminaries and SPE systems
The combinations of feature sets shown in Table 3 include a total of 760 (714+46) classes that are natural according to SPE features but not according to Preliminaries features. Table 4 focuses on how classes are accounted for by the Preliminaries system and its two direct descendants. This subsection and the next one explore these classes in more detail in pairwise comparisons between the feature systems. The most frequent feature descriptions among these classes involve the feature [syllabic], which is included only in SPE: 164 × [−syllabic], 29 × [+syllabic]. Many of the other recurrent classes in this category involve [syllabic] in conjunction with other features, e.g., 23 × [+voice, −syllabic] and 17 × [−low, −syllabic]. The most frequent classes without [syllabic] involve [sonorant], another feature not included in the Preliminaries feature set, e.g., 14 × [+coronal, +sonorant], 13 × [−heightened subglottal pressure, +coronal, +sonorant], and 11 × [+consonantal, +sonorant]. Figure 3 compares how frequently each SPE feature is used in describing classes that are unnatural according to Preliminaries features (y-axis) with its frequency in describing classes that are natural according to both feature

Table 4. The Preliminaries feature system and its direct descendants

classes   median size   Preliminaries   SPE            H&C
3560      3             ✓               ✓              ✓
714       5             -               ✓              ✓
86        4             -               -              ✓
65        3             ✓               -              -
46        4             -               ✓              -
24        3             ✓               -              ✓
22        2             ✓               ✓              -
1633      5             -               -              -
6159      4             3676 (59.7%)    4347 (70.6%)   4389 (71.3%)
Figure 3. Usage of SPE features (vs. Preliminaries). [Two scatterplots of SPE feature values; x-axis: SPE and Preliminaries (n=3582); y-axis: SPE only (n=760).]
systems (x-axis). This is meant to highlight the particular features that allow SPE to represent more classes than Preliminaries, which appear above the diagonal because they are used more frequently for the classes that are not natural in the Preliminaries system. The first chart shows all SPE feature values, and the second chart shows only the feature values that have no counterpart in the Preliminaries system. As seen in Figure 3, [−syllabic] is used much more frequently to describe classes that are unnatural for Preliminaries. This is consistent with the observation above that the class [−syllabic] appears 164 times among the classes that SPE is able to account for but Preliminaries is not. [+syllabic] is used less frequently for SPE-only classes, probably because SPE’s improvements apply mostly to classes of consonants. [+sonorant] is also used more frequently to account for SPE-only classes, while [−sonorant] is used about the same in both sets of classes. [−heightened subglottal pressure] is also used more to account for SPE-only classes. Other features that have no counterpart in Preliminaries ([movement of glottal closure], [delayed release], [covered], [lateral], and [delayed release of secondary closure]) do not appear to be used more for SPE-only classes than for classes that are also natural using the feature system of Preliminaries. The most frequent recurring class involving [heightened subglottal pressure] is the class of coronal sonorants excluding /r/ (which is [+heightened subglottal pressure]), as seen in Table 5. 385 out of 628 language varieties in P-base (61.3%) have /r/, including all of the languages in Table 5. The fact that classes involving [heightened subglottal pressure] are dominated by classes requiring the [−] value to exclude /r/ suggests that a cross-cutting feature like [heightened subglottal pressure] is not well motivated, and these classes specifically involve the exclusion of a trill from sound patterns that

Table 5. Phonologically active classes defined as [−heightened subglottal pressure, +coronal, +sonorant] in SPE and unnatural in Preliminaries. References for languages referred to in examples from P-base are given in the appendix.

segments   excluded   languages
/n l/   /r/   Acehnese, Catalan, Dhaasanac, Ecuador Quichua (Puyo Pongo variety), Harar Oromo, Kalenjin, Kilivila, Maasai, Slovene, Sri Lanka Portuguese Creole, Yiddish
/n̪ l̪/   /r/   Faroese
/l Ǖ/   /r/   Uvbie
Figure 4. Usage of Preliminaries features (vs. SPE). [Scatterplot of Preliminaries feature values; x-axis: Preliminaries and SPE (n=3582); y-axis: Preliminaries only (n=89).]
involve other coronal sonorants. This observation is consistent with the abandonment of [heightened subglottal pressure] shortly after it was introduced.

Figure 4 compares Preliminaries features with those from SPE. There are no Preliminaries features without corresponding features in SPE (so there is only one chart in this figure), but not all of the feature definitions are the same. Ten of the twelve classes occurring more than once among the Preliminaries-only classes involve [grave/acute]. The most frequent of these are 5 × [unvoiced, grave] and 5 × [grave, non-vocalic]. As seen in Table 6, these are all grave classes consisting of labial and velar consonants. What is special about these ten cases is that they are all in languages with palatal consonants of the same manner class. 273 out of 628 language varieties (43.5%) in P-base have at least one palatal stop or nasal, including all of the languages in Table 6. These classes are handled by Preliminaries but not SPE because palatals are [acute] in Preliminaries but [−coronal] in SPE. While [grave/acute] was replaced by [±coronal], the boundary between the two feature values is not exactly the same in both systems, crucially involving the status of palatals. For more detailed discussion of the need for [grave], see Hyman (1972), Clements (1976), Odden (1978), and Hume (1994). See Mielke (2008: 158–161) for related discussion on differences in place feature boundaries across feature systems.
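The effect of palatals on the labial-plus-velar classes can be illustrated with a small naturalness check. The sketch below is a simplification and an assumption, not the algorithm of Mielke (2008: ch. 3): it treats a class as natural if the feature values shared by all of its members pick out exactly that class. The function names and the reduced place-feature specifications for /p t c k/ are toy values chosen for the example.

```python
def shared_values(segments, features):
    """Feature values common to every segment in the class."""
    value_sets = [set(features[s].items()) for s in segments]
    return set.intersection(*value_sets)

def is_natural(target, inventory, features):
    """True if some conjunction of feature values picks out exactly `target`.

    The values shared by all members form the most specific conjunction the
    class satisfies; if even that conjunction admits other segments, no
    conjunction of feature values can describe the class.
    """
    shared = shared_values(target, features)
    extension = {s for s in inventory if shared <= set(features[s].items())}
    return extension == set(target)

# Toy place specifications (simplified, for illustration only).
prelim = {
    "p": {"grave/acute": "grave", "compact/diffuse": "diffuse"},
    "t": {"grave/acute": "acute", "compact/diffuse": "diffuse"},
    "c": {"grave/acute": "acute", "compact/diffuse": "compact"},
    "k": {"grave/acute": "grave", "compact/diffuse": "compact"},
}
spe = {
    "p": {"coronal": "-", "anterior": "+", "high": "-", "back": "-"},
    "t": {"coronal": "+", "anterior": "+", "high": "-", "back": "-"},
    "c": {"coronal": "-", "anterior": "-", "high": "+", "back": "-"},
    "k": {"coronal": "-", "anterior": "-", "high": "+", "back": "+"},
}

with_palatal = {"p", "t", "c", "k"}
without_palatal = {"p", "t", "k"}
grave_class = {"p", "k"}

print(is_natural(grave_class, with_palatal, prelim))     # True: [grave]
print(is_natural(grave_class, with_palatal, spe))        # False: [-coronal] also admits c
print(is_natural(grave_class, without_palatal, spe))     # True: no palatal to exclude
```

In this toy setting the class /p k/ becomes unnatural for the SPE-style features only when the inventory contains a palatal, which mirrors the observation that all ten cases come from languages with palatal consonants of the same manner class.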
Table 6. Phonologically active classes defined as [unvoiced, grave], [consonantal, oral, grave], and [interrupted, grave] in Preliminaries and unnatural in SPE

segments   excluded   languages
/p k/   /c/   Nandi Kalenjin, Martuthunira, Midland Mixe (Puxmetacan), South Highland Mixe (Tlahuitoltepec), Muruwari
/b ɡ/   /ɟ/   Gooniyandi, Gunin/Kwini
/p m k ŋ/   /c ɲ/   Dieri (Diyari), Muruwari
/p b m k ɡ ŋ/   /c ɟ ɲ/   Khmuʔ
2.2. Comparing the SPE and Halle and Clements systems
The combinations of feature sets shown above in Table 3 include 110 classes that are natural according to the Halle and Clements system, but not according to the SPE system. Figure 5 shows that [labial], especially [−labial], is used more among the Halle and Clements-only classes. The other features common to both systems that are used more among these classes are primarily features for consonants, which are involved in most of the classes involving [−labial]. The Halle and Clements feature [labial], in conjunction with other features, is found in 92 phonologically active classes that cannot be handled by SPE, which has no feature [labial]. The most frequent of these include: 7 × [−labial, −back, −syllabic], 4 × [−labial, −back, −syllabic, −low], 4 × [−labial, +voice, −strident, −sonorant], 4 × [−labial, +voice, −continuant, −sonorant], and 4 × [−labial, +nasal]. The first two of these involve [−back] in addition to [−labial], as shown in Table 7. These are classes of coronal consonants, including palatals, the opposite of the [grave] classes in Table 6. This indicates that this use of [−labial] (in conjunction with [−back]) is better interpreted as a proxy for a different interpretation of the feature [coronal]. The other [−labial] classes with at least four instances more directly involve [−labial], as seen in Table 8. These are classes of coronals and velars to the exclusion of labials. This is effectively the lingual class that is not available in SPE but is available in some other feature systems. On the other hand, 68 classes that were natural in SPE features are unnatural using Halle and Clements features. However, most of these do not recur very often, and only three sets of natural classes recur three times ([−high, −heightened subglottal pressure, +coronal], [−heightened
Figure 5. Usage of Halle and Clements features (vs. SPE). [Two scatterplots, panel title “Feature usage for Halle and Clements vs. SPE”; x-axis: Halle and Clements and SPE (n=4274); y-axis: Halle and Clements only (n=110).]
Table 7. Phonologically active classes defined as [−labial, −back, −syllabic] or [−labial, −back, −syllabic, −low] in the Halle and Clements system and unnatural in SPE

segments   excluded   languages
/t d s n l r ʈ ɽ cç ç ɲ ʎ j/   /p b φ m w k ɡ x ɳ/   San Pedro de Cajas Junín Quechua, Tarma Quechua
/t̪ n̪ l̪ t n ɾ r l ɳ ɭ ɻ c ɲ j ʎ/   /p m w k ŋ/   Arabana
/t̪ n̪ d n ɾ l ɲ ɟ ɭ ɻ ɳ ɖ j ʎ/   /b m w g ŋ/   Gooniyandi
/t̪ s̪ n̪ r̪ t n φ r l ʃ cç ɲ ʎ j ʎ/   /p m w k x kʷ xʷ q/   Kumiái (Diegueño)
/t̪ n̪ l̪ t n ɾ r l ɭ ɻ ɳ ʈ c ɲ j ʎ/   /p m w k ŋ/   Wangkangurru
/t̪ d̪ n̪ s ɖ ɽ t ʃ d ɲ ʒ j/   /p b m w k ɡ ŋ ʔ h/   O’odham
/t d ts tɬ s n l ɾ r tʃ ʃ j/   /p b m w f k kʷ g h/   Tetelcingo Nahuatl
/t d ɗ s z n ɾ l c ɟ ɲ ʄ j/   /b ɓ m w k g ɠ ŋ h/   Tirmaga
/t s n ɾ tʃ j/   /p m w f k/   Asmat (Flamingo Bay dialect)
/t t’ d ts ts’ s ɬ n n’ l l’ tʃ t’ʃ d ʃ ʒ j j’/   /p p’ kʷ k’ʷ ɡʷ xʷ q q’ qʷ q’ʷ w ʔ h/   Coeur d’Alene
subglottal pressure, +voice, −nasal, −syllabic, +sonorant], and [+consonantal, +continuant]), and seven sets recur twice. Figure 6 shows that [−heightened subglottal pressure] and [+continuant] are used more for SPE-only classes than for the classes that are also natural according to Halle and Clements features.

Table 8. Phonologically active classes defined as [−labial, +voice, −strident, −sonorant], [−labial, +voice, −continuant, −sonorant], or [−labial, +nasal] in the Halle and Clements system and unnatural in SPE

segments   excluded   languages
/d ɡ/   /b/   Batibo Moghamo (Meta’), Koromfé, Supyire Senoufo, Yiddish
/n ɲ/   /m/   Hungarian, Northern Tepehuan
/n ɲ ŋ/   /m/   Nandi Kalenjin, Yidi
/d ɡ/   /b ɡb/   Dàgáárè
/d dʲ ɡ ɡʲ/   /b bʲ/   Irish
/d ɡʲ/   /b/   Muscat Arabic
Figure 6. Usage of SPE features (vs. the Halle and Clements system). [Two scatterplots, panel title “Feature usage for SPE vs. Halle and Clements”; x-axis: SPE and Halle and Clements (n=4274); y-axis: SPE only (n=68).]
Table 9. Phonologically active classes defined using [−heightened subglottal pressure] in SPE and unnatural in the Halle and Clements system

segments   excluded   languages
/ɾ l w ʋ j/   /r/   Okpe, Uvbie
/l ɭ w ʋ j/   /r/   Edo
/z l w j/   /r/   Mising
/ɾ l/   /r/   Agulis Armenian
/l j w/   /r/   Ehueun
/l/   /r rଜ/   Ukue
/ɣ l/   /r/   Epie
/t s n l/   /r/   Estonian
/t̪ d̪ s z n l/   /r/   Catalan
/t d ts dz s z n l/   /r/   Ukrainian
The feature [continuant] appears here because the Halle and Clements system handles it differently from the SPE feature system, particularly with respect to laterals (which are [−continuant] in SPE and [+continuant] in the Halle and Clements system). The most frequent types of classes involving [−heightened subglottal pressure] are shown in Table 9. As seen above in Table 5, the function of this feature is to exclude a trill from classes that would otherwise be expected to include it. This is consistent with the special conditions needed to produce a trill (Solé 2002) and with the quick abandonment of [heightened subglottal pressure] as a feature shared by aspirated consonants and other non-trills.

2.3. Comparing the Halle and Clements and UFT systems
There are 617 classes that are natural within the Halle and Clements system, but not within the UFT system, as summarized in Table 10. Of these, the Halle and Clements features [labial], [back], and [high], in conjunction with other features, are found in 451 natural classes that cannot be handled by UFT, which has the privative feature [Labial] but no [−Labial]. This can be seen in Figure 7, where the values of these three features are shown to describe a great number of classes that UFT is unable to account for. The most frequent of these include 38 × [−labial, +syllabic], 32 × [+back, +vocalic], 18 × [−high, +back], 12 × [−high, +nasal], 10 × [+high, −labial, +back], and 10 × [−high, −labial, +syllabic]. These are mostly subsets of
Table 10. Classes natural according to the Halle and Clements system and UFT

classes   median size   H&C            UFT
3767      4             ✓              ✓
617       3             ✓              -
149       5             -              ✓
1617      5             -              -
6159      4             4389 (71.3%)   3921 (63.7%)
vowels, which would require minus values of UFT’s place features to be natural classes in that feature system. The most frequent classes defined by [−labial, +syllabic] consist of front vowels and low vowels (but not nonlow back vowels, which are round). In addition, there are classes of unrounded vowels, including some nonlow back or central vowels (but not back round vowels), and classes of unrounded vowels (excluding rounded vowels, including front rounded vowels). Most of the rest are straightforwardly classes of back vowels. The 32 classes that are analyzed as [+back, +vocalic] in the Halle and Clements system but unnatural in the UFT system involve back vowels, which are [Dorsal], and low vowels, which are not, to the exclusion of front vowels. These would be natural in the UFT feature system if [−Coronal] were an option.

Conversely, 149 classes that were unnatural using Halle and Clements features are natural using UFT features. Figure 8 does not show many features standing far above the diagonal. The most frequent feature descriptions among these classes involve the feature [vocoid], which is similar, but not identical, to the opposite of [consonantal], e.g., 7 × [−vocoid] and 2 × [−vocoid, −spread glottis]. The primary difference between [+consonantal] and [−vocoid] concerns glottal consonants, which are [−consonantal] in the Halle and Clements system and [−vocoid] in UFT. The classes involving [−vocoid] are all examples of consonants, including glottals, as opposed to glides and vowels (Table 11). Other recurrent classes involve [approximant], a feature found only in UFT: [−approximant, −nasal] occurs twice. This is the class of obstruents plus /ʔ/ and/or /h/. This is probably better understood as a slightly different definition of “obstruent” that includes glottals. See Miller (2011) and Mielke (to appear) for discussions of glottals and the sonorant-obstruent
Figure 7. Usage of Halle and Clements features (vs. UFT). [Two scatterplots, panel title “Feature usage for Halle and Clements vs. Unified Feature Theory”; x-axis: Halle and Clements and Unified Feature Theory (n=3767); y-axis: Halle and Clements only (n=670).]

Figure 8. Usage of UFT features (vs. the Halle and Clements system). [Two scatterplots, panel title “Feature usage for Unified Feature Theory vs. Halle and Clements”; x-axis: Unified Feature Theory and Halle and Clements (n=3767); y-axis: Unified Feature Theory only (n=149).]
Table 11. Phonologically active classes defined using [−vocoid] in UFT and unnatural in the Halle and Clements system

segments   excluded   languages
Consonants including glottals   vowels and glides   Ilocano, Irish, Oneida, Thompson
Consonants including glottals   glides   Ecuador Quichua (Puyo Pongo variety)
Consonants including syllabic nasal   vowels   Dagur
boundary. [+approx] (vowels, liquids, and glides) occurs twice. Many of the other recurrent classes in this category involve [continuant] in conjunction with other features (6 × [−continuant, +sonorant], 3 × [−continuant, −distributed], etc.).

2.4. UFT: Comparing versions with and without binary place features
We have seen in the previous subsection that a weakness of UFT, in particular compared to the Halle and Clements system, is the absence of [−] values for place features, as observed by Clements in the quote at the beginning of this chapter. Going from the Halle and Clements system to UFT involves a reduction in the number of classes considered to be natural. As noted in the previous subsection, a big part of this is the lack of [−Labial] in UFT. By giving UFT’s place features both plus and minus values, the system is thus able to describe more classes as natural. Table 12 shows the phonologically active classes accounted for by the three versions of UFT. The only difference between UFT and UFT-place is the addition of minus values of place features, and the only difference between UFT-place and UFT-full is feature values specified in UFT-full but unspecified in UFT-place. Thus, the classes captured by UFT-full are a superset of the classes accounted for by UFT-place, and the classes accounted for by UFT-place are a superset of those accounted for by UFT. There are 547 more natural classes captured by UFT-place than by conventional UFT, including 167 classes involving [−Coronal], 151 involving [−Labial], and 106 involving [−Dorsal]. These include many of the grave classes in Table 6 (above), which made use of the feature [grave]
Table 12. UFT variants

classes   median size   UFT            UFT-place      UFT-full
3916      4             ✓              ✓              ✓
547       3             -              ✓              ✓
183       3             -              -              ✓
1504      5             -              -              -
6159      4             3921 (63.7%)   4468 (72.5%)   4651 (75.5%)
from Preliminaries, and many of the nonlabial classes in Table 8, which made use of [−labial] in the Halle and Clements system. Figure 9 shows that [−Coronal], [−Labial], and [−Dorsal] are the only feature values that are far above the diagonal.

2.5. UFT: Comparing versions with binary place features and full specification
Compared to the earlier feature systems, UFT is restrictive, in that it excludes minus values of certain features and does not have values for features that are dependents of privative features that are not present. It was seen in the previous subsection that including the minus values of the place features permits 547 more classes, mostly involving [−Coronal], [−Labial], and [−Dorsal]. An additional step is to specify all features for every segment (e.g. [distributed], [anterior], and [lateral] for non-coronals). In this scenario, the difference between the full-specification UFT system and the non-UFT feature systems is just the choice of features, and not the restrictions on specifying them. Full specification was achieved by giving a [−] in place of all unspecified values. There are 183 classes that are natural according to UFT-full, but not according to UFT-place features. The most frequent feature descriptions among these classes involve the feature [−syllabic], which accounts for 65 of the recurrent classes that UFT-full features handle, but UFT-place features do not. The feature [distributed] accounts for 75 of the recurrent natural classes in this category, [sonorant] for 51 classes. Figure 10 shows that versions of [−distributed], [−anterior], and [−coronal] are more frequent among the classes not accounted for by the other version of UFT.
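The effect of filling unspecified values with [−] can be sketched on a toy set of nasals. Everything in the example is an assumption made for illustration: the helper names, the ASCII segment labels (`ny` for a palatal nasal, `ng` for a velar nasal), the reduced feature set, and the simplified naturalness test (a class counts as natural if the values shared by its members pick out exactly that class).

```python
def fully_specify(features, all_features):
    """Give every segment a value for every feature, using '-' where none was specified."""
    return {seg: {f: vals.get(f, "-") for f in all_features}
            for seg, vals in features.items()}

def is_natural(target, inventory, features):
    """True if the feature values shared by `target` pick out exactly `target`."""
    shared = set.intersection(*(set(features[s].items()) for s in target))
    extension = {s for s in inventory if shared <= set(features[s].items())}
    return extension == set(target)

# Toy specifications: [distributed] is specified only for coronals.
underspecified = {
    "m":  {"nasal": "+"},
    "n":  {"nasal": "+", "distributed": "-"},
    "ny": {"nasal": "+", "distributed": "+"},
    "ng": {"nasal": "+"},
    "t":  {"nasal": "-", "distributed": "-"},
}
full = fully_specify(underspecified, ["nasal", "distributed"])

inventory = set(underspecified)
nonpalatal_nasals = {"m", "n", "ng"}

print(is_natural(nonpalatal_nasals, inventory, underspecified))  # False
print(is_natural(nonpalatal_nasals, inventory, full))            # True
```

With the filled-in values, the nonpalatal nasals become describable as [+nasal, −distributed], but only because labials and velars have been assigned [−distributed] by default, which is the dependency the text flags as possibly dubious.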
Figure 9. Usage of UFT features, with and without binary place features. [Two scatterplots, panel title “Feature usage for binary Unified Feature Theory”; x-axis: UFT with or without binary place features (n=3916); y-axis: Only UFT with binary place features (n=547).]

Figure 10. Usage of binary UFT features, with and without full specification. [Scatterplot; x-axis: Binary UFT with or without full specification (n=4463); y-axis: Only UFT with full specification (n=183).]
[distributed] and [anterior] are features that are only specified in coronals in the other UFT-based feature systems. They are used more when they are fully specified because there are a lot of segments that otherwise would not have values for them, not necessarily because there are additional phonetically natural classes that were neglected by the other feature systems. The feature bundles with three or more phonologically active classes that become natural if the rest of the UFT features are made binary are 4 × [+nasal, −distributed] (nonpalatal nasals, an analysis that depends, perhaps dubiously, on labials being [−distributed]), 4 × [−C-place Labial, −C-place Lingual] (vowels, glottals and glides, which in three of these cases pattern together by being transparent to spreading that is blocked by other segments), [−C-place anterior] (noncoronal consonants plus posterior coronals, which are active in the same sound pattern occurring in three varieties of Slavey), and 3 × [−C-place Lingual, −Labial] consonants (glottals and /j/). These classes, being the most frequent of all the classes gained by adding full specification, suggest that we are now scraping the bottom of the barrel.
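The feature-usage comparisons plotted in this section can be approximated as follows. The sketch assumes that usage is measured as the share of classes whose feature description mentions a given feature value; that reading, the function name `usage_rates`, and the toy descriptions are assumptions for illustration, not the procedure documented in the chapter.

```python
from collections import Counter

def usage_rates(descriptions):
    """Share of classes whose feature description includes each feature value."""
    counts = Counter(value for desc in descriptions for value in set(desc))
    return {value: n / len(descriptions) for value, n in counts.items()}

# Toy feature descriptions (invented): classes natural in only one system
# versus classes natural in both systems being compared.
only_one = [{"-syl"}, {"+voice", "-syl"}, {"+cor", "+son"}, {"-syl", "-low"}]
both = [{"+voice"}, {"-syl"}, {"+cor"}, {"+voice", "+cor"}]

rates_only = usage_rates(only_one)  # y-axis values
rates_both = usage_rates(both)      # x-axis values

# Feature values that would plot above the diagonal.
above_diagonal = sorted(v for v in rates_only
                        if rates_only[v] > rates_both.get(v, 0.0))
print(above_diagonal)  # ['+son', '-low', '-syl']
```

Feature values that land above the diagonal are the ones doing disproportionate work in the classes that only one of the two systems can describe.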
2.6. Discussion: improving UFT-place
While UFT-place appears to do the best out of three feature systems that handle similar numbers of phonologically active classes (as seen above in Figure 2), there are still 201 classes that are handled by SPE but not by UFT-place (Table 3, above). Figure 11 shows that the feature values frequently involved in the classes that are only natural for SPE are the major class feature values [+continuant], [−vocalic], the place feature values used to refer to labial+coronal classes [−high] and [+anterior], and also [−heightened subglottal pressure] and [+sonorant], which are not involved in as many recurrent classes. Most of these recurrent classes are also handled by the Halle and Clements system. Many of them involve subgroupings of place of articulation, including the two most frequent SPE classes that are unnatural in the UFT-place system: 8 × [−high, +nasal] and 6 × [+anterior, +nasal], both of which define classes of labial and coronal (but not velar) nasals. Other recurrent classes involve the major class features [vocalic] and [consonantal], which are defined somewhat differently in SPE and UFT: 5 × [−coronal, +continuant, −vocalic], 4 × [−consonantal], 4 × [−vocalic], 4 × [+vocalic], 4 × [+high, −consonantal, −vocalic]. At issue here are features that define slightly different classes, such as slightly different definitions for major class features, and the difference between [−high] and [−Dorsal]. The problem that these classes highlight is not the particular features chosen, but the expectation that a limited set of universally-defined features should be able to define the wide range of classes that are phonologically active. These examples suggest that traditional distinctive feature systems underdetermine the phonologically relevant boundaries between different places of articulation.

Figure 11. Usage of SPE features (vs. UFT-place features)
[Figure not reproduced: scatterplot of feature values; axis labels "SPE only (n=201)" and "SPE and binary UFT (n=4141)".]

2.7. Summary
Most of the historical changes between feature systems described above involved adding the ability to represent a recurrent phonetically natural class. The exception is the last step, going to UFT-full. This supports the idea that there may be a natural limit around 70–75%, beyond which there are mostly just non-recurrent phonetically unnatural classes. The next section explores two strategies for dealing with the residue of featurally unnatural classes.

3. Union and Subtraction
The phonologically active classes that are unnatural according to the feature systems explored in the previous section are generally not random assortments of segments. Many of them may be due to the effects of what were originally more than one sound pattern, and as such may be modeled effectively by using more than one natural class. This section investigates and interprets the relative success of rule- and constraint-based approaches to phonologically active classes that cannot be represented by a conjunction of UFT-place distinctive features. The OT-style conflict-based approach is compared with the SPE-style bracket notation. We examine featurally unnatural classes, to see how well they can be represented in terms of two natural classes, i.e., as the union of two natural classes, or as the result of subtracting one natural class from another.

Unnatural classes are groups of sounds that may be active in the phonology of a language but which cannot be represented as a single feature bundle in a particular feature theory. Unnatural classes that are phonologically active are not rare (Mielke 2008). A classic unnatural class is involved in the “ruki” rule of Sanskrit, whereby /s/ in verbal roots is retroflexed following any of the segments {r u k i} (Whitney 1960, Renou 1961, Zwicky 1970). Phonological theory has had ways of dealing with unnatural classes. For example, in rule-based phonology (e.g., Chomsky and Halle 1968), bracket notation allows rules to reference unnatural classes if they are the union of two or more classes which are themselves natural. In Optimality Theory (Prince and Smolensky 1993), constraint interaction permits unnatural classes to act together if a constraint referring to a natural class is opposed by one or more constraints referring to overlapping natural classes, as described by Flemming (2005). Caveats include the fact that both approaches allow any logically possible class of sounds to be represented (given a feature system rich enough to capture all the necessary segmental contrasts), and each can use the other’s approach too. The point here is about the relative merits of union and subtraction as methods for describing unnatural classes.
3.1. Predictions
The two approaches make different predictions about the frequency of occurrence of unnatural classes. The SPE-style approach predicts that the most frequent unnatural classes should be easily represented with the union of natural classes, while the OT-style approach predicts that the most frequent unnatural classes should be those which are easily represented through the subtraction of one natural class from another. Figures 12–13 show the segments that cause a preceding /ɛ/ to be realized as [æ] in a variety of Afrikaans spoken in the Transvaal and the Free State. Figure 12 shows how the class can be analyzed using UFT-place features as the union of two natural classes. In this approach, the unnatural class /k x r l/ is assembled by combining the natural class of voiceless dorsals (/k x/) with the natural class of coronal approximant consonants (/r l/). Figure 13 shows how the same class can be analyzed by subtracting one natural class from another. The unnatural class /k x r l/ is assembled in this approach by starting with the class of nonnasal nonvocoids and subtracting the class of nondorsal nonapproximants. Notice that an OT-style “union” account is possible using two markedness constraints that refer to different classes, and an SPE-style “subtraction” account is possible using rule ordering.
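The two analyses of the Afrikaans class can be sketched as set operations over a feature table. The following Python sketch is illustrative only: the inventory is the set of consonants shown in Figures 12–13, but the plus/minus values in `VALUES`, the ASCII stand-ins `H` and `N`, and the helper `natural_class` are assumptions made for this example rather than the P-base encoding.

```python
# Toy inventory: the consonants shown in Figures 12-13, with ASCII stand-ins
# "H" for the glottal fricative and "N" for the velar nasal.
# The +/- values are illustrative assumptions, not data taken from P-base.
FEATURES = ("voice", "Dorsal", "approximant", "C-place Coronal", "vocoid", "nasal")

VALUES = {
    "p": "------", "b": "+-----", "f": "------", "v": "+-----", "m": "+----+",
    "t": "---+--", "d": "+--+--", "s": "---+--", "n": "+--+-+",
    "r": "+-++--", "l": "+-++--",
    "k": "-+----", "x": "-+----", "N": "++---+",
    "j": "+-+-+-", "H": "+-----",
}

# Expand each value string into a {feature name: "+" or "-"} mapping.
INVENTORY = {seg: dict(zip(FEATURES, vals)) for seg, vals in VALUES.items()}


def natural_class(bundle):
    """Return every segment whose values match all feature values in bundle."""
    return {
        seg
        for seg, feats in INVENTORY.items()
        if all(feats[name] == value for name, value in bundle.items())
    }


# Union (bracket notation): voiceless dorsals plus coronal approximants.
UNION = natural_class({"voice": "-", "Dorsal": "+"}) | natural_class(
    {"approximant": "+", "C-place Coronal": "+"}
)

# Subtraction (constraint conflict): nonnasal nonvocoids minus
# nondorsal nonapproximants.
SUBTRACTION = natural_class({"vocoid": "-", "nasal": "-"}) - natural_class(
    {"approximant": "-", "Dorsal": "-"}
)

print(sorted(UNION), sorted(SUBTRACTION))
```

Under these assumed feature values, both operations pick out the same four segments, which is the sense in which this particular class can be analyzed either way.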
Figure 12. Union analysis of an unnatural class in a variety of Afrikaans (the segments that cause a preceding /ɛ/ to be realized as [æ])
[Figure not reproduced. Consonant inventory shown: p b f v m, t d s n r l, k x ŋ, j, ɦ.
Disjunction: [−voice, +Dorsal] ∨ [+approximant, +C-place Coronal]
Rule: /ɛ/ → [+open1] / ___ { [−voice, +Dorsal], [+approximant, +C-place Coronal] }]

Figure 13. Subtraction analysis of an unnatural class in a variety of Afrikaans
[Figure not reproduced. Consonant inventory as in Figure 12.
Subtraction: [−vocoid, −nasal] − [−approximant, −Dorsal]
Ranking: *æ[−approximant, −Dorsal] >> *ɛ[−vocoid, −nasal] >> IDENT[open1]]
Figures 14 and 15 illustrate phonologically active classes that each can be analyzed in only one of these two ways. Figure 14 shows the segments that cause a preceding nasal to be realized as [n] in Diola-Fogny (Sapir 1965). This unnatural class can be analyzed in UFT-place with the union of two natural classes. Figure 15 shows the segments that are nasalized before a nasalized vowel in Aoma (Elugbe 1989); this unnatural class can be analyzed in UFT-place by subtracting one natural class from another.

Figure 14. Union analysis of an unnatural class in Diola-Fogny: the segments that cause a preceding nasal to be realized as [n]
[Figure not reproduced. Consonant inventory shown: p b f m, t d s n ɹ l, c ɟ ɲ j, k g ŋ w, h.
Disjunction: [+Labial, +continuant] ∨ [+anterior, −lateral]
Rule: [+nasal] → [+Coronal] / ___ { [+Labial, +continuant], [+anterior, −lateral] }]

Figure 15. Subtraction analysis of an unnatural class in Aoma: the segments that are nasalized before nasal vowels
[Figure not reproduced. Consonant inventory shown: p b f v m, t d s z r ɹ l, k g x ɣ j, k͡p g͡b w, h.
Subtraction: [−nasal, −syllabic] − [−lateral, −spread, −vocoid]
Ranking: *[+nasal, −lateral, −spread, −vocoid] >> *[−nasal, −syllabic]Ṽ >> IDENT[nasal]]

3.2. Testing

The predictions of these two approaches to unnatural classes were tested using the 1691 P-base classes that are unnatural according to UFT-place. These phonologically active unnatural classes were compared with 1691 matching randomly generated classes. The random classes were generated as above by randomly drawing a class of the same size from the same inventory as the real class. We exclude the 14 randomly-generated classes that turned out to be natural according to the feature system and focus on the 1677 pairs of real and randomly occurring classes that are unnatural according to this feature system. The union approach (combining two natural classes) was able to handle 1173 real classes (69.9%), and the subtraction approach was able to handle 857 real classes (51.1%). While these results look favorable for the union approach, union is also better at handling randomly-generated classes. Union represents 434 of the random classes (25.9%), while subtraction represents only 129 random classes (7.7%). Thus, union successfully represents 2.7 times as many real unnatural classes as randomly-generated ones, whereas subtraction successfully represents 6.6 times as many. This suggests that union’s success with real unnatural classes is perhaps not due to being a good model of phonology, but due to being able to handle a lot of logically possible (but not necessarily attested) combinations of segments.

However, the randomly-generated classes that are handled by union and subtraction are mostly the very small classes. Two-segment classes are trivially represented by union (i.e., the class of segment A plus the class of segment B), but almost no randomly-generated classes of seven segments or larger can be represented by either technique, while both techniques can
represent about half of the real unnatural classes of seven or more segments. Figures 16–17 show the percentage of classes of different sizes that are handled by the two techniques. We conclude that very small classes have little value for testing these approaches to unnatural classes. Focusing on the classes that are large enough that their random counterparts are not trivially natural shows that both techniques are quite effective at modeling the observed unnatural classes. Union has a higher success rate, and the classes that are handled by subtraction are nearly a subset of the classes that are handled by union.

Figure 16. Representing unnatural real classes: union vs. subtraction
[Figure not reproduced. Panel title: Analysis of unnatural real classes by size. Horizontal axis: class size, 2 (n=261), 3 (n=321), 4 (n=229), 5 (n=175), 6 (n=129), 7+ (n=562). Vertical axis: analysis success rate, 0.0 to 1.0. Series: union only, subtraction only, union and subtraction.]

Figure 17. Representing unnatural random classes: union vs. subtraction
[Figure not reproduced. Panel title: Analysis of unnatural random classes by size. Axes and series as in Figure 16.]

Since many of the superficially unnatural classes of sounds can be understood in terms of the interaction of more than one sound pattern, deciding whether subtraction or union is more appropriate in a particular instance requires looking at each sound pattern in more detail (for this, see e.g., Hall 2010).

4. Conclusion
A central goal of an adequate feature system is to capture classes of sounds that pattern together in language systems, and ideally only those classes. Achieving this goal requires a theory powerful enough to represent naturally occurring classes, yet sufficiently constrained so that it does not predict unnatural classes. The six feature theories examined in this study display varying degrees of success in capturing the 6,159 naturally-occurring phonological classes drawn from P-base. Consistent with the prediction of Nick Clements, a version of UFT enhanced with binary place features emerged as competitive with the SPE and Halle and Clements systems. Each of these three systems was able to represent between 70% and 73% of the naturally occurring phonologically active classes. UFT-place performed slightly better than the SPE and Halle and Clements systems, accounting for more observed classes, and fewer random classes. We have suggested that this percentage may reflect distinctive feature theory’s natural threshold for capturing classes, and that the remaining 27–30% of real phonological classes are not phonetically natural enough to be expressed in terms of the best systems of phonetically-defined distinctive features.

Of the approximately 6,000 real classes used in this study, about 3,000 phonologically active classes are treated as natural by all six feature systems and about 1,300 classes are unnatural for all. As discussed above, the remaining classes reveal differences among the feature systems. In some cases, the differences are due to the absence of a feature in a theory, e.g. [labial] in SPE. Other differences reflect the fact that the partitioning of classes can vary from one system to another such that the same value of a feature with the same name targets different sounds depending on the system. This is seen with the use of [continuant] in the Halle and Clements system as opposed to SPE systems, as reflected, for example, in how laterals are classified. A slight variation in the definition of similar features also creates differences in how sounds are classified, e.g. [−high] and [−Dorsal].

Instead of converging on well-defined boundaries between places of articulation or major classes of segments, the different feature systems succeed because they define the boundaries in slightly different places, essentially defining the class of coronals with and without palatals (see Table 6) by using the features [grave], [+coronal], and [−labial, −back], and defining the class of consonants or obstruents with and without glottals and/or glides (see Table 11) by using features such as [−sonorant], [+consonantal], [−approximant] and [−vocoid]. These slightly different features succeed in allowing slight variations on familiar classes, but often do so circuitously, much like using [−heightened subglottal pressure] in order to define the class /n l/ in a language that also has /r/ (see Table 5). The issue that these differences highlight does not have as much to do with the particular features chosen, but rather with the view that a limited set of phonetically-defined features should be able to express the full range of observed classes. While the top theories do remarkably well, capturing almost 73% of observed classes, the fact that historical pathways can lead to the creation of phonetically unnatural classes and to slight variations on familiar classes means that no phonetically-based feature system with a small number of features will ever be able to represent all phonologically active classes (see Mielke 2008 for discussion).
Given the significant number of classes that are unnatural, yet phonologically active, our study also evaluated the success of rule- and constraint-based theories of phonology in expressing these classes by means of the union or subtraction of feature values. The results reveal that the union approach fares slightly better in capturing phonologically active unnatural classes.

We conclude that the Unified Feature Theory augmented with minus values of place features, as suggested by Nick Clements, is the most effective of the six systems we compared at describing phonologically active classes without overgenerating too much. Improving upon this system brings us to the point of adding slightly different versions of the same features, blurring the boundaries between two feature values, dealing with unnatural classes, or otherwise departing from the original notion of a small set of distinctive features to describe sound patterns. While these modifications are necessary in order to provide feature-based analyses of all observed sound patterns, we can conclude that UFT-place has achieved the goal of maximizing the range of recurrent phonetically natural classes of sounds that are phonologically active in the world’s languages and can be accounted for by a small set of distinctive features.
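The success rates and real-to-random ratios reported for union and subtraction in Section 3.2 can be recomputed from the counts given there. The following Python sketch does only that arithmetic; the names `N_PAIRS`, `HANDLED`, `summarize` and `SUMMARY` are illustrative and not part of the original study.

```python
# Counts reported in Section 3.2: 1677 real/random pairs of classes that are
# unnatural in UFT-place, and how many each technique can represent.
N_PAIRS = 1677
HANDLED = {
    "union": {"real": 1173, "random": 434},
    "subtraction": {"real": 857, "random": 129},
}


def summarize(counts, n_pairs=N_PAIRS):
    """Return success percentages and the real-to-random ratio, rounded to 0.1."""
    return {
        "real_pct": round(100 * counts["real"] / n_pairs, 1),
        "random_pct": round(100 * counts["random"] / n_pairs, 1),
        "real_to_random": round(counts["real"] / counts["random"], 1),
    }


SUMMARY = {name: summarize(counts) for name, counts in HANDLED.items()}

for name, row in SUMMARY.items():
    print(name, row)
```

The recomputed values agree with the percentages and ratios stated in the text: union has the higher raw success rate, while subtraction has the higher ratio of real to random classes.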
Appendix

5. References for languages from P-base mentioned in the chapter

Acehnese (Durie 1985)
Afrikaans spoken in the Transvaal and the Free State (Donaldson 1993)
Agulis Armenian (Vaux 1998)
Aoma (Elugbe 1989)
Arabana (Hercus 1994)
Asmat (Flamingo Bay dialect) (Voorhoeve 1965)
Batibo Moghamo (Meta’) (Stallcup 1978)
Catalan (Wheeler 1979)
Coeur d’Alene (Johnson 1975)
Dàgáárè (Bodomo 2000)
Dagur (Martin 1961)
Dhaasanac (Tosco 2001)
Dieri (Diyari) (Austin 1981)
Diola-Fogny (Sapir 1965)
Ecuador Quichua (Puyo Pongo variety) (Orr 1962)
Edo (Elugbe 1989)
Ehueun (Elugbe 1989)
Epie (Elugbe 1989)
Estonian (Harms 1962)
Faroese (Lockwood 1955)
Gooniyandi (McGregor 1990)
Gunin/Kwini (McGregor 1993)
Harar Oromo (Owens 1985)
Hungarian (Abondolo 1988)
Ilocano (Rubino 2000)
Irish (Ó Siadhail 1989)
Japanese (Vance 1987)
Kalenjin (Toweett 1979)
Khmuʔ (Smalley 1961)
Kilivila (Lawton 1993)
Koromfé (Rennison 1997)
Kumiái (Diegueño) (Gorbet 1976)
Maasai (Hollis 1971)
Martuthunira (Dench 1995)
Melayu Betawi (Ikranagara 1975)
Midland Mixe (Puxmetacan) (Wichmann 1995)
Mising (Prasad 1991)
Muruwari (Oates 1988)
Muscat Arabic (Glover 1989)
Nandi Kalenjin (Toweett 1979)
Northern Tepehuan (Willett 1988)
O’odham (Saxton 1979)
Okpe (Elugbe 1989)
Oneida (Michelson 1983)
San Pedro de Cajas Junín (Adelaar 1977)
Slovene (Herrity 2000)
South Highland Mixe (Tlahuitoltepec) (Wichmann 1995)
Sri Lanka Creole Portuguese (Smith 1981)
Supyire Senoufo (Carlson 1994)
Tarma Quechua (Adelaar 1977)
Tetelcingo Nahuatl (Tuggy 1979)
Thompson (Thompson and Thompson 1992)
Tirmaga (Bryant 1999)
Ukrainian (Bidwell 1967–68)
Ukue (Elugbe 1989)
Uvbie (Elugbe 1989)
Wangkangurru (Hercus 1994)
Yiddish (Katz 1987)
Yidiɲ (Dixon 1977)

Note

1. The voiceless~voiced pairs in Japanese include h~b.
References

Abondolo, D. M. 1988. Hungarian inflectional morphology. Budapest: Akadémiai Kiadó.
Adelaar, W. F. H. 1977. Tarma Quechua grammar, texts, dictionary. Lisse: The Peter de Ridder Press.
Austin, Peter. 1981. A grammar of Diyari, South Australia. Cambridge: Cambridge University Press.
Bidwell, Charles E. 1967–68. Outline of Ukrainian morphology. University of Pittsburgh.
Bodomo, Adams B. 2000. Dágááré. Muenchen: Lincom Europa.
Bryant, Michael Grayson. 1999. Aspects of Tirmaga grammar. Ann Arbor: UMI.
Carlson, Robert. 1994. A grammar of Supyire. Berlin: Mouton de Gruyter.
Chomsky, Noam, and Morris Halle. 1968. The Sound Pattern of English. Cambridge, Mass.: MIT Press.
Clements, G.N. 1976. Palatalization: Linking or assimilation? In Chicago Linguistic Society 12, 96–109.
Clements, G.N. 2001. Representational economy in constraint-based phonology. In Distinctive feature theory, ed. T. Alan Hall, 71–146. Berlin: Mouton de Gruyter.
Clements, G.N., and Elizabeth V. Hume. 1995. The internal organization of speech sounds. In The handbook of phonological theory, ed. John Goldsmith, 245–306. Cambridge Mass.: Blackwell.
Dench, Alan Charles. 1995. Martuthunira: A language of the Pilbara region of Western Australia. Canberra: Pacific Linguistics.
Dixon, R. M. W. 1977. A grammar of Yidiɲ. New York: Cambridge University Press.
Donaldson, Bruce C. 1993. A grammar of Afrikaans. New York: Mouton de Gruyter.
Durie, Mark. 1985. A grammar of Acehnese on the basis of a dialect of North Aceh. Bloomington/The Hague: Foris.
Elugbe, Ben Ohiomamhe. 1989. Comparative Edoid: Phonology and lexicon. Port: University of Port Harcourt Press.
Flemming, Edward S. 2005. Deriving natural classes in phonology. Lingua 115:287–309.
Glover, Bonnie Carol. 1989. The morphophonology of Muscat Arabic. Ann Arbor: UMI.
Gorbet, Larry Paul. 1976. A grammar of Diegueño nominals. New York: Garland Publishing, Inc.
Hall, Daniel Currie. 2010. Probing the unnatural. In Linguistics in the Netherlands 2010, ed. Jacqueline van Kampen and Rick Nouwen, 71–83. Amsterdam: John Benjamins.
Halle, Morris, and G.N. Clements. 1983. Problem Book in Phonology. Cambridge, Mass.: The MIT Press.
Harms, Robert T. 1962. Estonian grammar. Bloomington/The Hague: Mouton & Co./Indiana University.
Hercus, Luise A. 1994. A grammar of the Arabana-Wangkangurru language, Lake Eyre Basin, South Australia. Canberra: Pacific Linguistics.
Herrity, Peter. 2000. Slovene: A comprehensive grammar. New York: Routledge.
Hollis, Alfred C. 1971. The Masai: Their language and folklore. Freeport, NY: Books for Libraries Press.
Hume, Elizabeth V. 1994. Front vowels, coronal consonants and their interaction in nonlinear phonology. New York: Garland.
Hyman, Larry. 1972. The feature [grave] in phonological theory. Journal of Phonetics 1:329–337.
Ikranagara, Kay. 1975. Melayu Betawi grammar. Ann Arbor: UMI.
Jakobson, Roman, C. Gunnar M. Fant, and Morris Halle. 1952. Preliminaries to speech analysis: the distinctive features and their correlates. Cambridge, Mass.: MIT Press.
Johnson, Robert Erik. 1975. The role of phonetic detail in Coeur d’Alene phonology. Ann Arbor: UMI.
Katz, Dovid. 1987. Grammar of the Yiddish language. London: Duckworth.
Langacker, Ronald W., ed. 1979. Studies in Uto-Aztecan Grammar. Arlington: The Summer Institute of Linguistics and The University of Texas at Arlington.
Lawton, Ralph. 1993. Topics in the description of Kiriwina. Canberra: Pacific Linguistics.
Lockwood, W. B. 1955. An introduction to Modern Faroese. Köbenhavn: Ejnar Munskgaard.
Mackie, Scott, and Jeff Mielke. 2011. Feature economy in natural, random, and synthetic inventories. In Where do phonological contrasts come from?, ed. G. N. Clements and Rachid Ridouane. Amsterdam: John Benjamins.
Martin, Samuel Elmo. 1961. Dagur Mongolian grammar, texts, and lexicon; based on the speech of Peter Onon. Bloomington: Indiana University.
McGregor, William. 1990. A functional grammar of Gooniyandi. Philadelphia: John Benjamins Publishing Company.
McGregor, William B. 1993. Gunin/Kwini. München: Lincom Europa.
Michelson, Karin Eva. 1983. A comparative study of accent in the Five Nations Iroquoian languages. Ann Arbor: UMI.
Mielke, Jeff. 2008. The Emergence of Distinctive Features. Oxford: Oxford University Press.
Mielke, Jeff. To appear. A phonetically-based metric of sound similarity. Lingua.
Miller, Brett. 2011. Feature patterns: Their sources and status in grammar and reconstruction. Doctoral Dissertation, Trinity College, Dublin.
Oates, Lynette F. 1988. The Muruwari language. Canberra: Pacific Linguistics.
Odden, David. 1978. Further evidence for the feature [grave]. Linguistic Inquiry 9:141–144.
Orr, Carolyn. 1962. Equador Quichua phonology. In Studies in Ecuadorian Indian languages: I, ed. Benjamin Elson. Norman: Summer Institute of Linguistics of the University of Oklahoma.
Ó Siadhail, Mícheál. 1989. Modern Irish: Grammatical structure and dialectal variation. New York: Cambridge University Press.
Owens, Jonathan. 1985. A grammar of Harar Oromo (Northeastern Ethiopia). Hamburg: Helmut Buske Verlag.
Prasad, Bal Ram. 1991. Mising grammar. Mysore: Central Institute of Indian Languages.
Prince, Alan, and Paul Smolensky. 1993. Optimality theory: Constraint interaction in generative grammar. Ms, Rutgers University, New Brunswick and University of Colorado, Boulder.
Rennison, John R. 1997. Koromfe. New York: Routledge.
Renou, L. 1961. Grammaire Sanscrite. Paris: Adrien-Maisonneuve.
Rubino, Carl Ralph Galvez. 2000. Ilocano dictionary and grammar. Ilocano: University of Hawai’i Press.
Sapir, J. David. 1965. A grammar of Diola-Fogny. Cambridge: Cambridge University Press.
Saxton, Dean. 1979. Papago. In Langacker (1979).
Smalley, William A. 1961. Outline of Khmuʔ structure. New Haven: American Oriental Society.
Smith, Ian Russell. 1981. Sri Lanka Portuguese Creole phonology. Ann Arbor: UMI.
Solé, Maria-Josep. 2002. Aerodynamic characteristics of trills and phonological patterning. Journal of Phonetics 30:655–688.
Stallcup, Kenneth Lyell. 1978. A comparative perspective on the phonology and noun classification of three Cameroon Grassfields Bantu languages: Moghamo, Ngie, and Oshie. Ann Arbor: UMI.
Thompson, Laurence C., and M. Terry Thompson. 1992. The Thompson language. University of Montana Occasional Papers in Linguistics No. 8.
Tosco, Mauro. 2001. The Dhaasanac language: grammar, text, vocabulary of a Cushitic language of Ethiopia. Köln: Rüdiger Köppe Verlag.
Toweett, Taaitta. 1979. A study of Kalenjin linguistics. Nairobi: Kenya Literature Bureau.
Tuggy, David H. 1979. Tetelcingo Nahuatl. In Langacker (1979).
Vance, Timothy J. 1987. An introduction to Japanese phonology. Albany, N.Y.: State University of New York Press.
Vaux, Bert. 1998. The phonology of Armenian. Oxford: Clarendon Press.
Voorhoeve, C. L. 1965. The Flamingo Bay Dialect of the Asmat language. ’s-Gravenhage: Martinus Nijhoff.
Wheeler, Max. 1979. Phonology of Catalan. Oxford: Basil Blackwell.
Whitney, W.D. 1960. Sanskrit grammar [9th issue of 2nd ed.]. Cambridge, Mass.: Harvard University Press.
Wichmann, Søren. 1995. The relationship among the Mixe-Zoquean languages of Mexico. Salt Lake City: University of Utah Press.
Willett, Thomas Leslie. 1988. A reference grammar of Southeastern Tepehuan. Ann Arbor: UMI.
Zwicky, Arnold M. 1970. Greek-letter variables and the Sanskrit ruki class. Linguistic Inquiry 1:549–55.
Language-independent bases of distinctive features
Rachid Ridouane, G. N. Clements, and Rajesh Khatiwada

1. Introduction
A basic principle of human spoken language communication is phonological contrast: distinctions among discrete units that convey different grammatical, morphological or lexical meanings. Among these units, features have achieved wide success in the domain of phonological description and play a central role as the ultimate constitutive elements of phonological representation. Various principles are claimed to characterize these elements. They are universal in the sense that all languages define their speech sounds in terms of a small feature set. They are distinctive in that they commonly distinguish one phoneme from another. They delimit the number of theoretically possible speech sound contrasts within and across languages. They are economical in allowing relatively large phoneme systems to be defined in terms of a much smaller feature set. They define natural classes of sounds observed in recurrent phonological patterns.

A main theme is that existing distinctive features are generally satisfactory for phonological purposes but may not be phonetically adequate (e.g. Ladefoged 1973, 1980, 1993; Löfqvist and Yoshioka 1981). At least three positions have been taken: (1) Existing features should be improved by better phonetic definitions (e.g. Halle 1983; Stevens 1989, 2003; Stevens and Keyser 2010), (2) Existing features should be supplemented with phonetic features (e.g. Flemming 1995; Boersma 1998), and (3) Existing features should be replaced with phonetically more adequate primitives (e.g. gestures of Browman and Goldstein 1986). We take up the first position and argue that both the acoustic and articulatory structure of speech should be incorporated into the definition of phonological features.

Features are typically defined, according to the researcher, either in the acoustic-auditory domain (e.g. Jakobson, Fant and Halle 1952), or in the articulatory domain (e.g. Chomsky and Halle 1968). After several decades of research, these conflicting approaches have not yet led to any widely-accepted synthesis (Durand 2000). A problem for purely acoustic approaches is the widely-acknowledged difficulty in finding acoustic invariants for a number of fundamental features, such as those characterizing the major places of articulation. A problem for purely articulatory approaches is raised by the existence of articulator-independent features such as [continuant], which is implemented with different gestures according to the articulator employed (e.g. no invariant gesture is shared by the continuants [f], [s], and [x]). These and other problems suggest that neither a purely acoustic nor a purely articulatory account is self-sufficient.

In recent years, a new initiative has emerged within the framework of the Quantal Theory of speech, developed by K.N. Stevens and his colleagues (e.g. Stevens 1989, 2002, 2003, 2005; Stevens and Keyser 2010). Quantal theory claims that there are phonetic regions in which the relationship between an articulatory configuration and its corresponding acoustic output is not linear. These regions form the basis for a universal set of distinctive features, each of which corresponds to an articulatory-acoustic coupling within which the auditory system is insensitive to small articulatory movements. A main innovation of this theory is the equal status it accords to the acoustic, auditory, and articulatory dimensions of spoken language. For a feature to be recovered from a speech event, not only must its articulatory condition be met, but its acoustic definition must be satisfied, or else further enhancing attributes must be present. The defining acoustic attributes of a feature are a direct consequence of its articulatory definition. These are considered to be language-independent. The enhancing attributes of a feature are additional cues that aid in its identification. These may vary from language to language (Stevens and Keyser 2010).

Our objective will be to propose a language-independent phonetic definition of the feature [spread glottis]. We will show that an articulatory definition of this feature in terms of a single common glottal configuration or gesture would be insufficient to account for the full range of speech sounds characterized by this feature; an acoustic definition is also necessary.

1.1. The feature [spread glottis]

Laryngeal features for consonants are used to define the following phonologically distinctive dimensions: voicing, aspiration, and glottalization. These have been expressed with different sets of features. Depending on the particular authors, the differences between the features used are mainly due to the different interpretations of how these laryngeal dimensions are
physically produced. Voicing is de¿ned either by the feature [voice] or by a combination of the features [slack vocal cords]/[stiff vocal folds]. Glottalisation is de¿ned either by the feature [checked] or [constricted glottis]. Good reviews of the use of these features are provided by Keating (1988: 17–22), Lombardi (1991: 1–30), Jessen (1998: 117–136), and the references therein. Aspiration, which is the main concern of this paper, has been traditionally de¿ned as ‘a puff of air’ (Heffner 1950) or ‘breath’ (Jones 1964) following the release of a consonant. Lisker and Abramson (1964) relate the contrast of aspiration (and voicing) mainly to a different timing of laryngeal activity relative to the supralaryngeal constriction. According to them “… the feature of aspiration is directly related to the timing of voice onset…” (Lisker and Abramson 1967: 15). With the exception of Browman and Goldstein (1986), who incorporated this timing into phonological analysis, Lisker and Abramson’s framework had almost no impact on phonologists (the VOT criterion is discussed in more detail in section 3). Rather, phonologists used different timeless features to de¿ne aspiration: [tense], [heightened subglottal pressure], [spread glottis], or [aspirated]. Jakobson, Fant and Halle (1952) use the feature [tense] to distinguish aspirated from unaspirated stops in Germanic languages, such as English. Jessen (1998) provides the same analysis for German stops. Languages using both voicing and aspiration distinctively (such as Nepali, Thai and Hindi) are said to use both the feature [voice] and the feature [tense]. For Chomsky and Halle (1968) aspiration is represented by the feature [heightened subglottal pressure]. This feature, meant to represent the extra energy for aspiration, was highly controversial and has never been widely used. 
Data on subglottal pressure from several languages, such as Hindi (Dixit and Shipp 1985), have shown that aspirated stops are not systematically produced with heightened subglottal pressure compared to their unaspirated counterparts (but see Ladefoged and Maddieson (1996) for data on Igbo).

The feature [spread glottis] was first formally proposed as a phonological laryngeal feature by Halle and Stevens (1971), and has since achieved notable success among both phonologists and phoneticians (Ladefoged 1973; Kingston 1990; Iverson 1993; Kenstowicz 1994; Lombardi 1991, 1995; Iverson and Salmons 1995, 2003; Avery 1996; Jessen 1998; Avery and Idsardi 2001; Vaux and Samuels 2005; among others). The majority of linguists working within nonlinear phonology and Optimality Theory assume it exists as part of the universal set of features. [spread glottis] singles out classes that play a linguistic role. It has a lexical function in that it distinguishes similar words in many languages (e.g.
Language-independent bases of distinctive features 267
Standard Chinese: [pʰa] ‘flower’ vs. [pa] ‘eight’, Nepali [tʰiti] ‘condition’ vs. [titʰi] ‘date’). It has a phonological function in that it defines natural classes that figure in phonological patterns. In Nepali, for example, the two consonants in a CVC sequence may not both be aspirated, i.e., [spread glottis]: [tʰiti] ‘condition’, [titʰi] ‘date’, but: *[tʰitʰi]. The following types of segments are commonly assumed to bear the [spread glottis] specification:
− Voiceless aspirated stops (e.g. Standard Chinese): [pʰ]
− Voiced aspirated or breathy voiced stops (e.g. Nepali): [dʰ]
− Voiceless sonorants (e.g. Burmese): [n̥]
While this class of sounds is generally agreed upon, the [+spread glottis] specification has also been proposed for voiceless fricatives. For Halle and Stevens (1971), fricatives can exceptionally be [+spread glottis] as in Burmese, where it is required to distinguish plain /s/ from aspirated /sʰ/, but not in English, for instance, where these segments are specified as [–spread glottis]. Numerous researchers, however, argue that voiceless fricatives, including those occurring in languages where they do not contrast in terms of aspiration, should be specified as [+spread glottis] (e.g. Rice 1988; Kingston 1990; Cho 1993; Iverson and Salmons 1995; Vaux and Samuels 2005). Both phonological and phonetic data are claimed to motivate this analysis. It is claimed to be phonologically motivated since, in English for example, it allows a unified treatment of stop deaspiration after fricatives (e.g. in speed) and sonorant devoicing after fricatives (e.g. in sl̥im). The claim is that these clusters represent a sharing of [spread glottis]. It is phonetically motivated since articulatory data from various unrelated languages have shown that voiceless fricatives are produced with a large degree of glottal opening (Kingston 1990; Stevens 1998; see Ridouane 2003, for a review).

In the approach assumed here, a segment can be said to bear the feature [spread glottis] at the phonetic level only if it satisfies both its articulatory and acoustic definitions. We will first examine two proposed articulatory definitions of the feature [spread glottis], and will show that neither is able to account for the full class of aspirated sounds across languages. We will then propose a new articulatory definition coupled with an acoustic definition, and show that it covers all data.

2. Two proposed articulatory definitions of [spread glottis]

[spread glottis] is typically defined in the articulatory domain. In the current view, the basic correlate of this feature involves the spreading of the glottis,
as its name suggests. This view originates in the work of C.-W. Kim (1970), based on cineradiographic data from the Korean voiceless stop series: tense unaspirated, heavily aspirated, and lax slightly aspirated. Kim defined aspiration as a function of glottal opening amplitude. In his words (1970: 111):

[…] it seems to be safe to assume that aspiration is nothing but a function of the glottal opening at the time of release. This is to say that if a stop is n degree aspirated, it must have an n degree glottal opening at the time of release of the oral closure. […] no stop is aspirated if it has a small glottal opening, and […] a stop cannot be unaspirated if it has a large glottal opening at the time of the oral release.
In this view, the differences in aspiration duration among the three Korean stop series are due to different degrees of glottal opening. The heavily aspirated stops have the largest glottal opening (with a peak around 10 mm according to Kim’s cineradiographic tracings), the unaspirated stops have the smallest (less than 1 mm), and the lax slightly aspirated stops have an intermediate glottal opening (around 3 mm). Kim’s work was followed by a series of studies on the controlling factors of aspiration vs. non-aspiration. Two main theories can be distinguished from these studies: the glottal width theory, which views aspiration primarily as a function of degree of glottal width (e.g. Kagaya 1974, Hutters 1985); and the glottal timing theory, which views aspiration as a function of a specific temporal coordination of laryngeal gesture in relation to supralaryngeal events (e.g. Pétursson 1976; Löfqvist 1980). We consider each of these in turn and show that both fail to account for the full range of facts.

2.1. The glottal width theory

The relevance of the size of glottal opening for the presence vs. absence of aspiration has been widely demonstrated: in various languages, the voiceless aspirated stops are produced with a relatively large glottal opening gesture, whereas their unaspirated counterparts are produced with a glottal opening which is much narrower, being almost completed at the time of oral release. Table 1 lists some of the languages in which aspirated voiceless stops have been shown to be invariably produced with a wide open glottis. In Nepali, for example, where aspiration is distinctive, maximal glottal opening is greater in aspirated stops than in unaspirated stops (Figure 1).
Table 1. A list of languages where aspirated stops have been shown to require a large glottal opening amplitude

Language     References
Cantonese    Iwata et al. (1981)
Danish       Fukui and Hirose (1983); Hutters (1985)
English      Lisker and Abramson (1971); Löfqvist (1980); Cooper (1991)
Fukienese    Iwata et al. (1979)
German       Hoole, Pompino-Marschall, and Dames (1984); Jessen (1998)
Hindi        Benguerrel and Bhatia (1980); Dixit (1989)
Icelandic    Pétursson (1976); Löfqvist and Pétursson (1978)
Korean       Kagaya (1970); Kim (2005)
Maithili     Yadav (1984)
Swedish      Löfqvist and Pétursson (1978); Löfqvist and Yoshioka (1980)
Tibetan      Kjellin (1977)
Figure 1. States of the glottis during the production of the Nepali items [tʰiti] ‘condition’ (above) and [titʰi] ‘date’ (below). The figures show that the amplitude of glottal opening is larger during the production of aspirated [tʰ], compared to unaspirated [t]. (occ = occlusion phase, v = vowel, rel = release phase).
There is no doubt that differences in the size of glottal opening are relevant for the presence or absence of aspiration. A question is whether these size differences are the controlling factor of aspiration, rather than, say, interarticulatory timing differences between glottal and supralaryngeal gestures, as assumed by Löfqvist and colleagues. For Hutters (1985: 15), based on data from Danish, the production of aspiration is primarily a matter of the glottal gesture type rather than the timing between this gesture and supraglottal articulations: “The difference between aspirated and unaspirated stops in the timing of the explosion relative to the glottal gesture is primarily due to the different types of glottal gesture rather than to a different timing of the glottal and supraglottal articulations.” Similarly, Ladefoged (1993: 142) holds that “In general, the degree of aspiration (the amount of lag in the voice onset time) will depend on the degree of glottal aperture during the closure. The greater the opening of the vocal cords during a stop, the longer the amount of the following aspiration.” The feature [spread glottis], as used by most phonologists, is also assumed to entail the size of glottal spreading without incorporating timing dimensions (e.g. Goldsmith 1990; Kenstowicz 1994).
2.1.1. Problems with glottal width theory
There are at least three problems with the view that [spread glottis] should be defined in terms of glottal opening amplitude alone. First, not all aspirated sounds are produced with a wide glottal opening. In voiced aspirated stops, the glottis is only slightly open. This is the case for example in Nepali, shown in Figure 2. In this language, voiced aspirated stops are produced with a closed glottis during part of the closure phase and with a slightly open glottis during part of the closure and the release phase, with vocal folds vibrating throughout. Data from Hindi (Dixit 1989; Kagaya and Hirose 1975; and Benguerrel and Bhatia 1980) and Maithili (Yadav 1984) also show that aspirated voiced stops are produced with a narrow glottal opening. This limitation of the glottal width theory of [spread glottis] in characterizing voiced aspirates has already been pointed out by Ladefoged (1972: 77): “Since what are commonly called aspirated sounds can be made with two different degrees of glottal stricture (voiceless and murmur), it seems inadvisable to try to collapse the notion of aspiration within that of glottal stricture as has been suggested by Kim (1970).” This suggests that [spread glottis] should not be defined solely in terms of glottal width.
Figure 2. Maximal glottal opening during the production of intervocalic [dʰ] in [bidʰi] ‘procedure’ (left) and final [dʰ] in [bibidʰ] ‘variety’ (right), as produced by a native speaker of Nepali. This maximal opening is produced during the release. These two figures show that the glottis is only slightly open during aspirated voiced stops.
Second, aspiration as reflected in the VOT duration of pre-vocalic voiceless stops does not always covary with maximal degree of glottal opening. In Hindi, for example, aspiration duration is not proportional to the degree of glottal opening amplitude (Dixit 1989). In Tashlhiyt Berber, geminate stops /tt, kk/ and their singleton counterparts /t, k/, though they are produced with virtually identical VOT durations (56 ms for singletons and 50 ms for geminates), have different glottal opening amplitudes. A photoelectroglottographic (PGG) study, based on one subject, showed that the geminates are systematically produced with larger glottal opening than singletons (data and procedures of the PGG study are described in Ridouane 2003: Chapter 4). This is illustrated in Figure 3, which is arranged so as to show the glottal opening of a minimal pair involving a geminate and a singleton stop in intervocalic position (see also Figure 7). In addition to amplitude differences, singletons and geminates also differ in the timing of laryngeal-supralaryngeal gestures. While for singletons the peak of glottal opening occurred at or closely around the release, for geminates peak glottal opening is timed well before. For dentals, the interval from peak glottal opening to stop release varies between 0 and 10 ms for singletons and between 55 and 120 ms for geminates. For velars, the interval varies between –10 and 20 ms for singletons and between 55 and 70 ms for geminates (cf. Ridouane 2003). This suggests that radically different amplitude and timing patterns of glottal opening may lead to similar degrees of aspiration duration (this aspect of the glottal timing theory is dealt with in more detail in section 2.2.1 below).
Figure 3. Schematic illustration of the amplitude and duration of glottal opening during the production of a singleton and a geminate aspirated stop in Tashlhiyt. The vertical bar shows the point of oral release and the horizontal bar shows the degree of glottal opening at this point. (Axes: time in ms; glottal opening amplitude. Curves: geminate, singleton.)
A third problem with the glottal width theory of [+spread glottis] is that a wide glottal opening during the production of a stop does not always result in an aspirated sound. Examples of unaspirated sounds produced with wide glottal opening include at least unaspirated geminate stops in Icelandic (Ladefoged and Maddieson 1996), voiceless stops in Kabiye (Rialland et al. 2009), and the voiceless uvular stop /q/ in Tashlhiyt (Ridouane 2003). Icelandic has three types of voiceless stops: unaspirated (e.g. [pp] in [kʰɔppar] ‘young seal’), post-aspirated (e.g. [pʰ] in [kʰɔpʰar] ‘small pot’), and pre-aspirated (e.g. [hp] in [kʰɔhpar] ‘small pot’). As Ladefoged and Maddieson (1996: 71) showed, based on data from Ní Chasaide (1985), the degree of glottal aperture for post-aspirated [pʰ] is virtually identical to that of the unaspirated [pp], suggesting that glottal width alone is not the defining characteristic of aspiration in this language (see also Pétursson 1976). Kabiye has a contrast between voiceless unaspirated and voiced stops. Fiberscopic data, drawn from the production of one subject, show that voiced stops are produced with an adducted glottis and vibrating vocal folds. Voiceless stops, on the other hand, are sometimes produced with a large glottal opening, as shown in Figure 4. These stops, however, never display aspiration.
Figure 4. State of the glottis during the production of unaspirated [t] in Kabiye, showing wide glottal opening size during the occlusion phase (boxes 2, 3, 4). Utterance: [eti] ‘he demolishes’.
In Tashlhiyt, where aspiration is not distinctive, /t/ and /k/ are aspirated, whereas the dorsopharyngealized /tˤ/ and the uvular /q/ are systematically produced with no aspiration and a VOT duration of less than 30 ms. Fiberscopic data from two subjects showed, however, that /q/ is produced with the largest glottal opening, whereas /tˤ/ is produced with the smallest.
Figure 5. Averaged glottal pattern for Tashlhiyt voiceless stops in three word positions, based on 5 repetitions from one native speaker. The figure shows that /q/, though unaspirated, is produced with a wider glottal opening, compared to phonetically aspirated [tʰ] and [kʰ] and unaspirated dorsopharyngealised [tˤ]. (Panels: /t/, /k/, /q/, /tˤ/. Curves: initial, intervocalic, and final position. Vertical axis: degree of glottal opening in arbitrary units. Scale bar: 40 ms.)
/t/ and /k/ display intermediate amplitudes (data and procedures of the fiberscopic study are described in Ridouane 2003: Chapter 3). The reason why the Icelandic geminate /pp/, the Kabiye dental stop /t/, and the Tashlhiyt /q/ are not aspirated although they are produced with a large glottal opening can be related to how this laryngeal opening is aligned relative to oral release. In Tashlhiyt /q/, for example, the peak glottal opening is reached during the closure phase, so that by the time this stop is released, the glottal opening is so small that voicing for the following vowel starts a few milliseconds later, thus yielding an unaspirated stop. This suggests that the timing relationship between laryngeal and supralaryngeal gestures is an important factor in the control of aspiration. As Kingston (1990) posits, aspiration can be implemented when the feature [spread glottis] is tightly “bound” to the release of a stop. In light of such facts, it might appear necessary to abandon the glottal width theory for the glottal timing theory, according to which aspiration is a function of the alignment of peak glottal opening with the point of release.

2.2. The glottal timing theory

Studies by Löfqvist and colleagues on various Germanic languages have argued rather persuasively for the importance of laryngeal–oral timing relationships in contrasting aspirated and unaspirated plosives (Löfqvist 1980; Löfqvist and Pétursson 1978; Löfqvist and Yoshioka 1981; Munhall and Löfqvist 1992; Yoshioka, Löfqvist, and Hirose 1981). Based on data from Swedish, Löfqvist (1980) showed that the timing of laryngeal gesture in relation to supralaryngeal events is the primary factor in the control of aspiration: “Even if differences in peak glottal opening were a regular phenomenon in the production of different stop categories, it should be noted that, in the published studies, these size differences always appear to be accompanied by timing differences [...].
Thus it appears to be unwarranted to claim that the size difference is more basic than the timing difference.” Specifically, he showed that if the glottal opening gesture starts at implosion and peak glottal opening occurs early during stop closure the stop is unaspirated, whereas if peak glottal opening occurs late during closure, aspiration results. For Löfqvist and Yoshioka (1981: 31): “Specifying glottal states along dimension of spread/constricted glottis and stiff/slack vocal cords [Halle and Stevens 1971] would thus not only seem to be at variance with the phonetic facts, but also to introduce unnecessary complications. The difference between postaspirated and unaspirated voiceless stops is rather
one of interarticulator timing than of spread versus constricted glottis.” Adopting this view, however, requires that timing relations or other dynamic information be incorporated into feature representation. The theoretical issue here is whether timing must be specified in the definition of the feature itself, at the level at which it is coordinated with other features as in Steriade’s aperture node model (1994), or at the level of gestural coordination in the sense of Browman and Goldstein’s (1986) articulatory phonology. We show below that this is an unnecessary complication, and that timing relations or other dynamic information need not be included in feature definitions.

2.2.1. Problems with glottal timing theory
Including timing information in the definition of [spread glottis] would not be satisfactory to account for the full class of aspirated sounds, for at least three reasons: (1) in some aspirated sounds, peak glottal opening is not aligned with the release; (2) in fricative-stop clusters, aspiration can result from different interarticulatory timings; and (3) a voiceless stop can be produced with a wide glottal opening at the point of release without being aspirated.

The first problem is illustrated by the voiceless sonorants, normally defined as [+spread glottis]. Voiceless sonorants are contrastive in several languages, such as Icelandic [ni:ta] ‘to use’ vs. [n̥i:ta] ‘to knot’ and Burmese [naޞ] ‘pain’ vs. [n̥aޞ] ‘nose’. Data on airflow during the production of voiceless sonorants in Burmese suggest a relatively wide glottal aperture. According to Ladefoged and Maddieson (1996: 113): “There is a high volume of airflow ... suggesting that these nasals are produced with a wide open glottis and might therefore be characterized as aspirated”. PGG data from Icelandic also show that voiceless sonorants are sometimes produced with a wide glottal opening size (Bombien 2006). The problem with the glottal timing theory is that for these aspirated sounds, peak glottal opening is not aligned with the release.

The second problem for the glottal timing theory of [spread glottis] concerns the way it accounts for the presence or absence of aspiration in fricative-stop clusters (e.g. English speed). A common phonological account for stop de-aspiration in this context is that word-initial clusters contain a single specification of [spread glottis] shared between the fricative and the stop (Kingston 1990; Iverson and Salmons 1995). This analysis echoes the PGG and electromyographic studies of Löfqvist and colleagues on the time-course of glottal movement in consonant clusters in some Germanic languages (see e.g. Löfqvist 1990, for overview). These studies showed that
a word-initial fricative-stop cluster (e.g. [#sk] in ‘I may scale’) is produced with only one glottal opening-closing gesture, with the peak reached during the fricative. In heteromorphemic sequences (e.g. [s#kʰ] in ‘my ace caves’), however, each fricative and stop requires a separate laryngeal peak. In other words, the stop [k] is aspirated in a heteromorphemic sequence but not in a tautomorphemic sequence, because it is associated with a separate glottal opening peak in the former but not in the latter. For Browman and Goldstein (1986), this single-peaked glottal opening is a phonological regularity of syllable-initial position in English, suggesting that it is a property of the whole syllable onset. They capture the relevant timing of laryngeal-oral coordination for stops and fricative-stop clusters in the following rule:

If a fricative gesture is present, coordinate the peak glottal opening with the midpoint of the fricative. Otherwise, coordinate the peak glottal opening with the release of the stop gesture.
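As a reading aid, the rule quoted above can be written out as a small function. This is only a minimal sketch: the function name, the parameter names, and the use of milliseconds are assumptions made for illustration and do not come from Browman and Goldstein (1986).

```python
from typing import Optional, Tuple


def peak_glottal_opening_time(
    stop_release_ms: float,
    fricative_ms: Optional[Tuple[float, float]] = None,
) -> float:
    """Predicted time of peak glottal opening under the quoted rule.

    stop_release_ms: time of the stop release (hypothetical parameter).
    fricative_ms: (onset, offset) of a fricative gesture in the same
        onset, or None if no fricative is present (hypothetical parameter).
    """
    if fricative_ms is not None:
        # Fricative present: the peak is coordinated with the fricative midpoint.
        onset, offset = fricative_ms
        return (onset + offset) / 2.0
    # No fricative: the peak is coordinated with the stop release.
    return stop_release_ms


# Illustrative timings only (not measurements from any study).
plain_stop = peak_glottal_opening_time(stop_release_ms=180.0)
cluster = peak_glottal_opening_time(stop_release_ms=180.0, fricative_ms=(0.0, 100.0))
```

With these made-up timings the predicted peak falls at the release for the plain stop and at the middle of the fricative for the cluster, which is the prediction that the instrumental studies can then be compared against.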
This rule has been tested over various clusters in German by Hoole, Fuchs and Dahlmeier (2003). Their results suggest that Browman and Goldstein’s (1986) rule is not completely accurate. The only generalization that could be made from their study is that if a fricative is present, the peak glottal opening almost always occurs within the fricative. In Tashlhiyt, the PGG analysis of fricative-stop as well as stop-fricative clusters also shows that the peak glottal opening is not systematically coordinated with the midpoint of the fricative (Ridouane et al. 2006). As in German, the only generalization that can be drawn from the Tashlhiyt data is that peak glottal opening is almost always located within the fricative, both for stop-fricative and for fricative-stop sequences. The timing of this opening peak tends to shift to a relatively earlier point in the fricative when it follows a stop (at 23.49% of the fricative) and to a later point in the fricative when it precedes a stop (at 66.06% of the fricative), regardless of the word boundary location. What is more relevant to the issue at hand is that Tashlhiyt stops are aspirated following fricatives in word-initial position (e.g. #sk in [skʰijf] ‘make someone smoke’). Nevertheless, the results make clear that only one glottal opening peak occurs. In other words, stops can be aspirated after /s/ even if they share a single glottal gesture with the preceding fricative, and even if the peak glottal opening is not timed to coincide with the release of the stop. This pattern may well be related to the above-cited fact that voiceless geminates also show aspiration, in fact a very similar amount of aspiration to the singleton consonants: when the stop is released in [kkV] and [skV], the glottis has about the same size as it has in [kV], yielding similar aspiration values in all cases (Figure 6). In sum, different interarticulatory
Figure 6. Schematic illustration of the degree of glottal opening at the point of release for pre-vocalic /k/, /kk/ and /#sk/. (Curves: /#sk/, /k/, /kk/. Marked level: glottal width at release.)
timings can result in the presence of aspiration after /s/: on the one hand, a large amplitude and a delay in peak glottal opening relative to fricative onset (as in Tashlhiyt); on the other, two peak glottal openings, each corresponding to one of the two obstruents (as in English).

The third problem with the glottal timing theory is that it predicts aspiration in cases where stops satisfy the articulatory requirement, though no aspiration is present acoustically. We examine two cases: unaspirated voiceless stops in utterance-final position and voiceless stops followed by an obstruent. In Tashlhiyt, voiceless stops are not aspirated in word-final position. The patterns of glottal dynamics of these stops show that the glottis starts opening at the closure onset and continues to open towards a respiratory open position so that when stops are released, the glottis is largely open. This is illustrated in Figure 7 with the final unaspirated [k#] of [ifik] ‘he gave you’. A similar wide glottis configuration has also been reported for voiceless word-final stops in Moroccan Arabic (Zeroual 2000), English (Sawashima 1970), Korean (Sawashima et al. 1980), Maithili (Yadav 1984), and Swedish (Lindqvist 1972). In none of these languages, however, do voiceless stops display aspiration in this context. Stops are also unaspirated when followed by a fricative, as in English [læps], [dɛpθ], or a stop as in English [daktr]. Though the glottal configuration of English stops in these positions has not been explicitly examined, one can infer from the PGG studies on English clusters that these stops are produced while the glottis is largely open. In the glottographic curves presented in Yoshioka, Löfqvist, and Hirose (1981), for instance, the first unaspirated /k/ of [sks#k] in ‘He masks cave’, is produced with a larger glottal opening than the second /k/, which is aspirated!
Voiceless words in Tashlhiyt provide additional evidence that a stop can be produced with a large glottal opening at the point of release without being aspirated (e.g. in [tfkt] ‘you gave’,
Figure 7. States of the glottis during the production of Tashlhiyt word-final unaspirated [k] in [ifik] ‘he gave you’. The figure shows that a segment may be produced with a large glottal opening at stop release without being aspirated.
[tkkststt] ‘you took it off’, [tsskʃftstt] ‘you dried it’). The stop consonants in [tkkststt], for instance, are not followed by aspiration noise. Yet, PGG examination of the abduction patterns during the production of this item indicates that the glottis is largely open at the release of these stops. This is illustrated in Figure 8, which shows the averaged glottographic pattern for the item [tkkststt]. As we can see, word-internal voiceless stops are released while the glottis is almost maximally open (the peak opening is reached during the two fricatives contained in the item). Data from voiceless stops followed by a voiceless obstruent show that a segment may satisfy the articulatory definition of [spread glottis] without satisfying the acoustic definition. This is because the glottal function is constrained by the degree of the constriction within the supraglottal vocal cavity. For the glottal opening to be manifested as aspiration, it must be timed to coincide, at least in part, with an unobstructed vocal tract (cf. Dixit 1993). That is, there must be no narrower constriction in the supralaryngeal cavities. This requirement holds for aspirated stops in prevocalic position and before non-homorganic sonorants. It is not satisfied, however, by stops before obstruents.

3. On the acoustics of [spread glottis]
As already mentioned, one well-established acoustic criterion for distinguishing aspirated from unaspirated voiceless stops is the notion of positive
Figure 8. Averaged glottographic pattern for the item [tkkststt] ‘you took it off’, as realised by a speaker of Tashlhiyt. The pattern indicates the duration, degree, and number of glottal-opening peaks. The vertical axis shows the amount of light in arbitrary units. The dashed lines delimit the onset and offset of each segment. The figure also displays the number of amplitude peaks as well as the number of abduction and adduction velocity peaks. The number of repetitions is indicated between parentheses. Arrows show how large the glottal opening is near the offset of the voiceless unaspirated stops /kk/ and /t/. (Axes: time in s; PGG2F (V). Panel annotations: amplitude peaks 3.3; abduction velocity peaks 5.8; adduction velocity peaks 4.3; n = 7. Segments: t, kk, s, t, s, tt.)
VOT, which is longer in the former (Lisker and Abramson 1964). While this is a highly effective measure for differentiating pre-vocalic aspirated and unaspirated stops in various languages, a number of problems arise in defining aspiration in terms of VOT alone (Bhatia 1976; Dixit 1989; Tsui and Ciocca 2000; Cho and Ladefoged 1999; Jessen 2001; Vaux and Samuels 2005; Mikuteit and Reetz 2007). First, VOT theory cannot account for the presence of aspiration in word-final position in the languages which maintain the contrast in this position (e.g. Eastern Armenian (Vaux 1998), Nepali minimal pairs like [ruk] ‘stop!’ (imp.) vs. [rukʰ] ‘tree’ (Bandhu et al. 1971)), since there is no onset of voicing in this context. Second, VOT does not provide for a distinction between plain voiced and aspirated voiced stops such as /d/ vs. /dʰ/ in Hindi, Maithili, and Nepali since they are produced with vibrating vocal folds throughout. Third, positive VOT alone cannot distinguish aspirated from unaspirated stops in languages contrasting
ejectives and aspirated segments (e.g. Athabaskan languages, Oowekyala, Lezgian, Haisla, Hupa). In the Lezgian example, shown in Figure 9, aspirated /kʰ/ and ejective /k'/ have virtually identical VOT durations (see also data from Hupa presented in Cho and Ladefoged (1999), where aspirated /kʰ/ has a VOT duration of 84 ms and ejective /k'/ a duration of 80 ms).

The acoustic information occurring within the period from stop release to voicing onset in aspirated stops is important both for characterizing and recovering the feature [spread glottis]. In Cantonese, for instance, Tsui and Ciocca (2000) showed that VOT per se is not a sufficient cue to the perception of aspiration. They manipulated the duration of the VOT interval of naturally produced initial aspirated and unaspirated stops to create long VOT conditions with or without aspiration noise between the release and the onset of voicing. They found that long VOT stimuli, created by adding a silent interval between the burst and the onset of voicing of unaspirated stops, were perceived as unaspirated stops by native listeners. Following Fant (1973), we recognize three phases in the interval from the aspirated stop release to the onset of voicing: (1) an aperiodic transient noise known as the release burst, when the pressure behind the constriction is released and the resulting abrupt increase in volume velocity excites the entire vocal tract; (2) a frication segment, when turbulent noise generated at the supraglottal constriction excites primarily the cavity in front of the constriction; and (3) an aspirated segment, when turbulent noise generated near the approximating vocal folds excites the entire vocal tract (see also Stevens 1998: 457–465).1 Adopting this view, we define aspiration as glottal frication,
Figure 9. Acoustic waveforms and spectrograms illustrating the virtually identical VOT durations of an aspirated [kʰ] in [kʰymekar] ‘help, pl.’ (left) and ejective [k'] in [sik'ar] ‘fox-pl.’ (right). In this example, aspirated [kʰ] has a VOT duration of 53 ms and ejective [k'] a VOT duration of 58 ms (courtesy of Ioana Chitoran).
displayed as a mid- and high-frequency formant pattern partly masked by noise.2 In other words, we contend, contra Kim’s (1970) view, that the aperiodic energy corresponding to aspiration noise is created not at the point of constriction of the following vowel but at the glottis. This acoustic definition makes it possible to distinguish aspirated stops not only from the ejectives presented above, but also from affricated stops, which arise through stop assibilation. There is good evidence from various languages lacking an aspiration contrast that the VOT of /t/ is longer when followed by /i/ than when followed by /a/. /ti/ in this context is allophonically realized as [tsi]. Languages in which this is the case include Maori (Maclagan et al. 2009), Moroccan Arabic (Shoul 2007), Japanese, Romanian, Cheyenne, Efik, and Canadian French (Kim 2001; see also Hall and Hamann 2006, for a cross-linguistic review). These affricated stops, like aspirated stops, are produced with a positive voicing lag. They differ, however, in the acoustic information occurring within this lag. The turbulent noise produced after the burst of affricated stops is created not at the glottis, but at the point of constriction for the following vowel, whose configuration is formed through coarticulation during the stop. Their affricate-like properties involve supralaryngeal constriction (Kim 2001). Affricated stops may be produced with a large glottal opening, but the glottal area is greater than the oral constriction area, so that the frication noise generated at the oral constriction becomes dominant over aspiration noise at the glottis.

The [spread glottis] contrast may be signaled by additional acoustic cues, which may vary depending on the speaker and the structural context in which the feature occurs. In pre-vocalic position, where aspirated sounds are most commonly attested, aspiration may be additionally cued by increased F0 values in the first periods of the following vowel.
According to Stevens (1998), this increased fundamental frequency of glottal vibrations is presumably a result of increased stiffness of the vocal folds that is an attribute of these voiceless consonants. Languages with this cue include Cantonese (Zee 1980), Mandarin Chinese (Iwata and Hirose 1976), and Nepali (Clements and Khatiwada 2007). Aspiration can also be cued by a greater difference in amplitude between the first and second harmonics in the first few glottal pulses of the vowel. Languages with this acoustic cue include English (Chapin Ringo 1988), German (Jessen 1998), and Nepali (Clements and Khatiwada 2007).

4. A new proposal: Combine articulation and acoustics

In the approach assumed here, a segment can be said to bear a distinctive feature F at the phonetic level only if it satisfies both its articulatory and
Rachid Ridouane, G. N. Clements, and Rajesh Khatiwada
acoustic definition. We have shown that an articulatory definition of the feature [+spread glottis] in terms of a single common glottal configuration or gesture is problematic and would be insufficient to account for the full range of speech sounds characterized by this feature. Indeed, different glottal sizes and different interarticulator timings of laryngeal and supralaryngeal gestures can result in aspiration. From this point of view, we suggest that the class of aspirated segments must be defined in both articulatory and acoustic terms, as in (1).

(1) Defining attributes of [+spread glottis]:
a. (Articulatory) presence of a glottal noise source
b. (Acoustic) presence of aspiration noise, i.e. aperiodic energy in the second and higher formants, with a duration of around 30 ms or more in deliberate speech.

The suggested definition does not require that timing relations or other dynamic information be included in feature definitions. Timing relations follow from the requirement that the acoustic goal associated with the feature be manifested in the signal. This requirement is satisfied by voiceless stops in contexts where they are followed by significant aspiration: stops before vowels, stops before sonorants, and stops in final position (for languages contrasting aspirated and unaspirated stops in this context). Figure 10 illustrates the characteristics of the aspiration phase that follows a pre-vocalic voiceless stop. F2, F3, and F4 patterns are visible during this phase. The requirement is also satisfied by voiced aspirated stops. In Nepali /dha/, for example, a fricative segment and a voiced aspirated segment can be seen in the transition from the voiced stop release to a vowel (Figure 11). This definition of aspiration extends to the Burmese voiceless nasals mentioned earlier and to aspirated fricatives that have been documented in Burmese (Ladefoged and Maddieson 1996) and Korean (Kagaya 1974).
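The conjunctive character of the definition in (1) can be made concrete with a minimal sketch. This is only an illustration under stated assumptions, not part of the proposal itself: the `Segment` record, its field names, and the sample values are hypothetical, and the hard 30 ms cut-off simplifies the "around 30 ms or more in deliberate speech" of (1b).

```python
from dataclasses import dataclass

# Assumption: (1b) says "around 30 ms or more"; a hard threshold is a simplification.
MIN_ASPIRATION_MS = 30.0


@dataclass
class Segment:
    """Hypothetical phonetic description of a segment (illustrative fields only)."""
    label: str
    glottal_noise_source: bool   # (1a) articulatory attribute
    aspiration_noise_ms: float   # (1b) duration of aperiodic energy in F2 and above


def is_spread_glottis(seg: Segment, min_ms: float = MIN_ASPIRATION_MS) -> bool:
    """Return True only if both the articulatory and the acoustic attribute hold."""
    return seg.glottal_noise_source and seg.aspiration_noise_ms >= min_ms


# Invented sample values, not measurements from any language.
aspirated_stop = Segment("aspirated stop", True, 60.0)
short_lag_stop = Segment("stop with very short noise phase", True, 10.0)
oral_noise_only = Segment("segment with oral noise source only", False, 0.0)

for seg in (aspirated_stop, short_lag_stop, oral_noise_only):
    print(seg.label, is_spread_glottis(seg))
```

Neither attribute is sufficient alone in this sketch: the function returns `True` only when a glottal noise source is present and the aspiration noise is long enough.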
It is not satisfied, however, by stops which are not followed by aspiration, even if they are produced with a spread glottis: pre-vocalic voiceless stops with open glottis but no aspiration (as in Kabiye) and stops before fricatives (as in English and Tashlhiyt). The suggested definition is not satisfied by plain fricatives, even though they are produced with a spread glottis. The reason is that their glottal opening tends to coincide with a narrower supralaryngeal constriction, so that oral noise becomes dominant over glottal noise. This large glottal opening, which may be considered as an enhancing gesture, is due to the aerodynamics of these segments, according to Löfqvist & McGarr (1987:
Figure 10. Spectrogram of Tashlhiyt [tʰutʰid] ‘she passes’ (scale: 0–5 kHz) illustrating the acoustic characteristics of the aspiration phase that follows the burst of a voiceless stop.
399): “… a large glottal opening [in fricatives] not only prevents voicing but also reduces laryngeal resistance to air flow and assists in the build-up of oral pressure necessary for driving the noise source.” Differences in glottal opening amplitude between anterior and posterior fricatives in languages such as Moroccan Arabic (Zeroual 2000) and Tashlhiyt (Ridouane 2003) provide additional evidence that the constrictions at the glottal and supraglottal levels have to be adjusted to meet the aerodynamic requirements of fricatives. While
Figure 11. Spectrogram of the Nepali item [dʰada] (scale: 0–5 kHz) illustrating the acoustic characteristics of the aspiration phase following a voiced aspirated stop.
[f] and [s] exhibit almost the same glottal opening, the backer fricatives, produced with a less narrow constriction, are clearly produced with a larger opening amplitude, yielding a relationship of the form f=s